[jira] [Created] (CARBONDATA-3524) support compaction by GLOBAL_SORT

2019-09-23 Thread QiangCai (Jira)
QiangCai created CARBONDATA-3524:


 Summary: support compaction by GLOBAL_SORT
 Key: CARBONDATA-3524
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3524
 Project: CarbonData
  Issue Type: Improvement
Reporter: QiangCai


[Background]

For a GLOBAL_SORT table, segments are currently compacted using LOCAL_SORT.

[Motivation]

Compaction may therefore degrade query performance. It would be better to 
compact using GLOBAL_SORT so that query performance is preserved.
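A minimal sketch of the scenario, assuming CarbonData's existing DDL (table and column names are illustrative only):

```sql
-- GLOBAL_SORT table: each data load produces a globally sorted segment
CREATE TABLE sales (id INT, city STRING)
STORED BY 'carbondata'
TBLPROPERTIES ('SORT_COLUMNS'='city', 'SORT_SCOPE'='GLOBAL_SORT');

-- Today this compaction re-sorts the merged segment with LOCAL_SORT;
-- the proposal is for it to honor the table's GLOBAL_SORT scope instead.
ALTER TABLE sales COMPACT 'MAJOR';
```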



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-3523) Should store file size into index file

2019-09-19 Thread QiangCai (Jira)
QiangCai created CARBONDATA-3523:


 Summary: Should store file size into index file
 Key: CARBONDATA-3523
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3523
 Project: CarbonData
  Issue Type: Improvement
Reporter: QiangCai








[jira] [Resolved] (CARBONDATA-3348) support alter table property SORT_COLUMNS

2019-05-15 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-3348.
--
Resolution: Fixed

> support alter table property SORT_COLUMNS
> -
>
> Key: CARBONDATA-3348
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3348
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Assigned] (CARBONDATA-3371) Compaction show ArrayIndexOutOfBoundsException after sort_columns modification

2019-05-05 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3371:


Assignee: QiangCai

> Compaction show ArrayIndexOutOfBoundsException after sort_columns modification
> --
>
> Key: CARBONDATA-3371
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3371
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>
> 2019-05-05 15:26:39 ERROR DataTypeUtil:619 - Cannot convert� Z�w} to SHORT 
> type value. Wrong length: 8, expected 2
> 2019-05-05 15:26:39 ERROR DataTypeUtil:621 - Problem while converting data 
> type� Z�w} 
> 2019-05-05 15:26:39 ERROR CompactionResultSortProcessor:185 - 3
> java.lang.ArrayIndexOutOfBoundsException: 3
>  at 
> org.apache.carbondata.core.scan.wrappers.ByteArrayWrapper.getNoDictionaryKeyByIndex(ByteArrayWrapper.java:81)
>  at 
> org.apache.carbondata.processing.merger.CompactionResultSortProcessor.prepareRowObjectForSorting(CompactionResultSortProcessor.java:332)
>  at 
> org.apache.carbondata.processing.merger.CompactionResultSortProcessor.processResult(CompactionResultSortProcessor.java:250)
>  at 
> org.apache.carbondata.processing.merger.CompactionResultSortProcessor.execute(CompactionResultSortProcessor.java:175)
>  at 
> org.apache.carbondata.spark.rdd.CarbonMergerRDD$$anon$1.<init>(CarbonMergerRDD.scala:226)
>  at 
> org.apache.carbondata.spark.rdd.CarbonMergerRDD.internalCompute(CarbonMergerRDD.scala:84)
>  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:82)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:108)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-05-05 15:26:39 ERROR CarbonMergerRDD:233 - Compaction Failed





[jira] [Created] (CARBONDATA-3371) Compaction show ArrayIndexOutOfBoundsException after sort_columns modification

2019-05-05 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3371:


 Summary: Compaction show ArrayIndexOutOfBoundsException after 
sort_columns modification
 Key: CARBONDATA-3371
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3371
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai


2019-05-05 15:26:39 ERROR DataTypeUtil:619 - Cannot convert� Z�w} to SHORT 
type value. Wrong length: 8, expected 2
2019-05-05 15:26:39 ERROR DataTypeUtil:621 - Problem while converting data 
type� Z�w} 
2019-05-05 15:26:39 ERROR CompactionResultSortProcessor:185 - 3
java.lang.ArrayIndexOutOfBoundsException: 3
 at 
org.apache.carbondata.core.scan.wrappers.ByteArrayWrapper.getNoDictionaryKeyByIndex(ByteArrayWrapper.java:81)
 at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.prepareRowObjectForSorting(CompactionResultSortProcessor.java:332)
 at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.processResult(CompactionResultSortProcessor.java:250)
 at 
org.apache.carbondata.processing.merger.CompactionResultSortProcessor.execute(CompactionResultSortProcessor.java:175)
 at 
org.apache.carbondata.spark.rdd.CarbonMergerRDD$$anon$1.<init>(CarbonMergerRDD.scala:226)
 at 
org.apache.carbondata.spark.rdd.CarbonMergerRDD.internalCompute(CarbonMergerRDD.scala:84)
 at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:82)
 at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
 at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
 at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-05-05 15:26:39 ERROR CarbonMergerRDD:233 - Compaction Failed





[jira] [Updated] (CARBONDATA-3347) support SORT_COLUMNS modification

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3347:
-
Description: 
*Background*

Currently, SORT_COLUMNS cannot be modified after a table is created. To 
modify SORT_COLUMNS on a table, we have to create a new table and migrate 
the data. If the data volume is huge, the migration takes a long time and 
may even impact the user's business.

SORT_SCOPE in the table properties can already be modified, and a new 
SORT_SCOPE can be specified during data loading. The Carbon index file 
marks whether each segment is sorted or not, so different segments may 
have different SORT_SCOPE.

*Motivation*

After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
according to their business needs. Historical segments will keep the old 
SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
one if needed.

We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
when creating the table, because the modification will consume significant 
resources to re-sort the data of old segments.

Please check the design doc for more detail.

[^sort_columns modification_v2.pdf]
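As an illustration of the intended usage (hypothetical table and column names; the commands follow CarbonData's existing ALTER TABLE and custom-compaction syntax, as an assumption rather than the final design):

```sql
-- Change the sort columns; only newly loaded segments use the new order
ALTER TABLE sales SET TBLPROPERTIES ('SORT_COLUMNS'='city,id');

-- SORT_SCOPE can likewise be changed at the table level
ALTER TABLE sales SET TBLPROPERTIES ('SORT_SCOPE'='LOCAL_SORT');

-- Optionally re-sort selected historical segments one by one
ALTER TABLE sales COMPACT 'CUSTOM' WHERE SEGMENT.ID IN (0);
```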

  was:
*Background*

Currently, SORT_COLUMNS cannot be modified after a table is created. To 
modify SORT_COLUMNS on a table, we have to create a new table and migrate 
the data. If the data volume is huge, the migration takes a long time and 
may even impact the user's business.

SORT_SCOPE in the table properties can already be modified, and a new 
SORT_SCOPE can be specified during data loading. The Carbon index file 
marks whether each segment is sorted or not, so different segments may 
have different SORT_SCOPE.

*Motivation*

After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
according to their business needs. Historical segments will keep the old 
SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
one if needed.

We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
when creating the table, because the modification will consume significant 
resources to re-sort the data of old segments.

Please check the design doc for more detail.

[^sort_columns modification.pdf]

[^sort_columns modification_v2.pdf]


> support SORT_COLUMNS modification
> -
>
> Key: CARBONDATA-3347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3347
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: sort_columns modification.pdf, sort_columns 
> modification_v2.pdf
>
>
> *Background*
> Currently, SORT_COLUMNS cannot be modified after a table is created. To 
> modify SORT_COLUMNS on a table, we have to create a new table and migrate 
> the data. If the data volume is huge, the migration takes a long time and 
> may even impact the user's business.
> SORT_SCOPE in the table properties can already be modified, and a new 
> SORT_SCOPE can be specified during data loading. The Carbon index file 
> marks whether each segment is sorted or not, so different segments may 
> have different SORT_SCOPE.
> *Motivation*
> After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
> according to their business needs. Historical segments will keep the old 
> SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
> one if needed.
> We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
> when creating the table, because the modification will consume significant 
> resources to re-sort the data of old segments.
>  
> Please check the design doc for more detail.
> [^sort_columns modification_v2.pdf]





[jira] [Updated] (CARBONDATA-3347) support SORT_COLUMNS modification

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3347:
-
Description: 
*Background*

Currently, SORT_COLUMNS cannot be modified after a table is created. To 
modify SORT_COLUMNS on a table, we have to create a new table and migrate 
the data. If the data volume is huge, the migration takes a long time and 
may even impact the user's business.

SORT_SCOPE in the table properties can already be modified, and a new 
SORT_SCOPE can be specified during data loading. The Carbon index file 
marks whether each segment is sorted or not, so different segments may 
have different SORT_SCOPE.

*Motivation*

After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
according to their business needs. Historical segments will keep the old 
SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
one if needed.

We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
when creating the table, because the modification will consume significant 
resources to re-sort the data of old segments.

Please check the design doc for more detail.

[^sort_columns modification.pdf]

[^sort_columns modification_v2.pdf]

  was:
*Background*

Currently, SORT_COLUMNS cannot be modified after a table is created. To 
modify SORT_COLUMNS on a table, we have to create a new table and migrate 
the data. If the data volume is huge, the migration takes a long time and 
may even impact the user's business.

SORT_SCOPE in the table properties can already be modified, and a new 
SORT_SCOPE can be specified during data loading. The Carbon index file 
marks whether each segment is sorted or not, so different segments may 
have different SORT_SCOPE.

*Motivation*

After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
according to their business needs. Historical segments will keep the old 
SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
one if needed.

We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
when creating the table, because the modification will consume significant 
resources to re-sort the data of old segments.

Please check the design doc for more detail.

[^sort_columns modification.pdf]


> support SORT_COLUMNS modification
> -
>
> Key: CARBONDATA-3347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3347
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: sort_columns modification.pdf, sort_columns 
> modification_v2.pdf
>
>
> *Background*
> Currently, SORT_COLUMNS cannot be modified after a table is created. To 
> modify SORT_COLUMNS on a table, we have to create a new table and migrate 
> the data. If the data volume is huge, the migration takes a long time and 
> may even impact the user's business.
> SORT_SCOPE in the table properties can already be modified, and a new 
> SORT_SCOPE can be specified during data loading. The Carbon index file 
> marks whether each segment is sorted or not, so different segments may 
> have different SORT_SCOPE.
> *Motivation*
> After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
> according to their business needs. Historical segments will keep the old 
> SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
> one if needed.
> We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
> when creating the table, because the modification will consume significant 
> resources to re-sort the data of old segments.
>  
> Please check the design doc for more detail.
> [^sort_columns modification.pdf]
> [^sort_columns modification_v2.pdf]





[jira] [Updated] (CARBONDATA-3347) support SORT_COLUMNS modification

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3347:
-
Attachment: sort_columns modification_v2.pdf

> support SORT_COLUMNS modification
> -
>
> Key: CARBONDATA-3347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3347
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: sort_columns modification.pdf, sort_columns 
> modification_v2.pdf
>
>
> *Background*
> Currently, SORT_COLUMNS cannot be modified after a table is created. To 
> modify SORT_COLUMNS on a table, we have to create a new table and migrate 
> the data. If the data volume is huge, the migration takes a long time and 
> may even impact the user's business.
> SORT_SCOPE in the table properties can already be modified, and a new 
> SORT_SCOPE can be specified during data loading. The Carbon index file 
> marks whether each segment is sorted or not, so different segments may 
> have different SORT_SCOPE.
> *Motivation*
> After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
> according to their business needs. Historical segments will keep the old 
> SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
> one if needed.
> We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
> when creating the table, because the modification will consume significant 
> resources to re-sort the data of old segments.
>  
> Please check the design doc for more detail.
> [^sort_columns modification.pdf]





[jira] [Assigned] (CARBONDATA-3347) support SORT_COLUMNS modification

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3347:


Assignee: QiangCai

> support SORT_COLUMNS modification
> -
>
> Key: CARBONDATA-3347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3347
> Project: CarbonData
>  Issue Type: New Feature
>  Components: spark-integration
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: sort_columns modification.pdf
>
>
> *Background*
> Currently, SORT_COLUMNS cannot be modified after a table is created. To 
> modify SORT_COLUMNS on a table, we have to create a new table and migrate 
> the data. If the data volume is huge, the migration takes a long time and 
> may even impact the user's business.
> SORT_SCOPE in the table properties can already be modified, and a new 
> SORT_SCOPE can be specified during data loading. The Carbon index file 
> marks whether each segment is sorted or not, so different segments may 
> have different SORT_SCOPE.
> *Motivation*
> After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
> according to their business needs. Historical segments will keep the old 
> SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
> one if needed.
> We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
> when creating the table, because the modification will consume significant 
> resources to re-sort the data of old segments.
>  
> Please check the design doc for more detail.
> [^sort_columns modification.pdf]





[jira] [Updated] (CARBONDATA-3350) enhance custom compaction to support resort single segment

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3350:
-
Parent Issue: CARBONDATA-3347  (was: CARBONDATA-3343)

> enhance custom compaction to support resort single segment
> --
>
> Key: CARBONDATA-3350
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3350
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>






[jira] [Assigned] (CARBONDATA-3343) Support Compaction for Range Sort

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3343:


Assignee: (was: QiangCai)

> Support Compaction for Range Sort
> -
>
> Key: CARBONDATA-3343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3343
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: MANISH NALLA
>Priority: Major
> Attachments: Support Compaction for Range.docx
>
>
> CarbonData supports compaction for all sort scopes based on taskIds, i.e., 
> we group the partitions (carbondata files) of different segments that have 
> the same taskId into one task and then compact them. But this is not the 
> correct way to handle compaction in the case of Range Sort, where the data 
> is divided into different ranges for different segments.
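A sketch of why taskId grouping breaks down for range-sorted segments (table name and range values are illustrative assumptions):

```sql
-- With RANGE_COLUMN, each segment is partitioned by value ranges, e.g.:
--   segment 0: task 0 -> id in [1, 100],   task 1 -> id in [101, 200]
--   segment 1: task 0 -> id in [50, 150],  task 1 -> id in [151, 300]
-- Grouping by taskId would merge [1, 100] with [50, 150]; the merged file
-- no longer covers one disjoint range, which defeats range-based pruning.
CREATE TABLE t (id INT, name STRING)
STORED BY 'carbondata'
TBLPROPERTIES ('RANGE_COLUMN'='id');
```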





[jira] [Updated] (CARBONDATA-3349) add is_sorted and sort_columns information into show segments

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3349:
-
Parent Issue: CARBONDATA-3347  (was: CARBONDATA-3343)

> add is_sorted and sort_columns information into show segments
> -
>
> Key: CARBONDATA-3349
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3349
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>






[jira] [Updated] (CARBONDATA-3348) support alter table property SORT_COLUMNS

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3348:
-
Parent Issue: CARBONDATA-3347  (was: CARBONDATA-3343)

> support alter table property SORT_COLUMNS
> -
>
> Key: CARBONDATA-3348
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3348
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>






[jira] [Assigned] (CARBONDATA-3343) Support Compaction for Range Sort

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3343:


Assignee: QiangCai

> Support Compaction for Range Sort
> -
>
> Key: CARBONDATA-3343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3343
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: MANISH NALLA
>Assignee: QiangCai
>Priority: Major
> Attachments: Support Compaction for Range.docx
>
>
> CarbonData supports compaction for all sort scopes based on taskIds, i.e., 
> we group the partitions (carbondata files) of different segments that have 
> the same taskId into one task and then compact them. But this is not the 
> correct way to handle compaction in the case of Range Sort, where the data 
> is divided into different ranges for different segments.





[jira] [Created] (CARBONDATA-3350) enhance custom compaction to support resort single segment

2019-04-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3350:


 Summary: enhance custom compaction to support resort single segment
 Key: CARBONDATA-3350
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3350
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Updated] (CARBONDATA-3349) add is_sorted and sort_columns information into show segments

2019-04-09 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3349:
-
Summary: add is_sorted and sort_columns information into show segments  
(was: Show Segments add is_sorted and sort_columns information)

> add is_sorted and sort_columns information into show segments
> -
>
> Key: CARBONDATA-3349
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3349
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>






[jira] [Created] (CARBONDATA-3349) Show Segments add is_sorted and sort_columns information

2019-04-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3349:


 Summary: Show Segments add is_sorted and sort_columns information
 Key: CARBONDATA-3349
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3349
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Created] (CARBONDATA-3348) support alter table property SORT_COLUMNS

2019-04-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3348:


 Summary: support alter table property SORT_COLUMNS
 Key: CARBONDATA-3348
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3348
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Created] (CARBONDATA-3347) support SORT_COLUMNS modification

2019-04-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3347:


 Summary: support SORT_COLUMNS modification
 Key: CARBONDATA-3347
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3347
 Project: CarbonData
  Issue Type: New Feature
  Components: spark-integration
Reporter: QiangCai
 Attachments: sort_columns modification.pdf

*Background*

Currently, SORT_COLUMNS cannot be modified after a table is created. To 
modify SORT_COLUMNS on a table, we have to create a new table and migrate 
the data. If the data volume is huge, the migration takes a long time and 
may even impact the user's business.

SORT_SCOPE in the table properties can already be modified, and a new 
SORT_SCOPE can be specified during data loading. The Carbon index file 
marks whether each segment is sorted or not, so different segments may 
have different SORT_SCOPE.

*Motivation*

After the table is created, the user can adjust SORT_SCOPE/SORT_COLUMNS 
according to their business needs. Historical segments will keep the old 
SORT_SCOPE/SORT_COLUMNS, but the user can also re-sort old segments one by 
one if needed.

We still suggest that the user choose a proper SORT_SCOPE/SORT_COLUMNS 
when creating the table, because the modification will consume significant 
resources to re-sort the data of old segments.

Please check the design doc for more detail.

[^sort_columns modification.pdf]





[jira] [Resolved] (CARBONDATA-3317) Executing 'show segments' command throws NPE when spark streaming app write data to new stream segment.

2019-03-18 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3317?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-3317.
--
Resolution: Fixed

> Executing 'show segments' command throws NPE when spark streaming app write 
> data to new stream segment.
> ---
>
> Key: CARBONDATA-3317
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3317
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.6.0
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: 1.6.0
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> When a Spark streaming app starts to create a new stream segment, it does 
> not create the carbonindex file until data has been written successfully; 
> if the 'show segments' command is executed during that window, it throws 
> an NPE.





[jira] [Resolved] (CARBONDATA-3299) Updated value is not being reflected in the Desc formatted Command.

2019-02-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-3299.
--
Resolution: Fixed

> Updated value is not being reflected in the Desc formatted Command.
> ---
>
> Key: CARBONDATA-3299
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3299
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Shivam Goyal
>Priority: Minor
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> After changing the carbon properties related to compaction, the changed 
> value is not reflected in the DESC FORMATTED command.





[jira] [Resolved] (CARBONDATA-3276) Compacting table that do not exist should throw NoSuchTableException instead of MalformedCarbonCommandException

2019-02-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-3276.
--
Resolution: Fixed

> Compacting table that do not exist should throw NoSuchTableException instead 
> of MalformedCarbonCommandException
> ---
>
> Key: CARBONDATA-3276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3276
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Chenjian Qiu
>Priority: Major
>  Time Spent: 4h 10m
>  Remaining Estimate: 0h
>
> Compacting a table that does not exist should throw NoSuchTableException 
> instead of MalformedCarbonCommandException("Operation not allowed : ALTER 
> TABLE table_name COMPACT 'MAJOR'"), which is confusing.





[jira] [Closed] (CARBONDATA-841) improve the compress encoding for numeric type column to give good performance

2019-02-02 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai closed CARBONDATA-841.
---
Resolution: Fixed

> improve the compress encoding for numeric type column to give good performance
> --
>
> Key: CARBONDATA-841
> URL: https://issues.apache.org/jira/browse/CARBONDATA-841
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>
> Currently, no-dictionary columns use LV (length-value) encoding, which is 
> not the best choice for numeric type columns.





[jira] [Closed] (CARBONDATA-436) Make blocklet size configuration respect to the actual size (in terms of byte) of the blocklet

2019-02-02 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-436?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai closed CARBONDATA-436.
---
Resolution: Fixed

> Make blocklet size configuration respect to the actual size (in terms of 
> byte) of the blocklet
> --
>
> Key: CARBONDATA-436
> URL: https://issues.apache.org/jira/browse/CARBONDATA-436
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: suo tong
>Assignee: QiangCai
>Priority: Major
>
> Currently, the blocklet size is based on the row count within the blocklet. 
> The default value (12) is small for HDFS I/O, but simply increasing it may 
> cause too many Young-GC pauses when scanning many columns. Instead, we can 
> extend the configuration to respect the actual byte size of the blocklet.
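One possible shape of a size-based configuration (the property name and value below are illustrative assumptions, not the final design):

```sql
-- Configure blocklet size by actual data size (value in MB)
-- rather than by row count
CREATE TABLE t (id INT, name STRING)
STORED BY 'carbondata'
TBLPROPERTIES ('TABLE_BLOCKLET_SIZE'='64');
```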





[jira] [Closed] (CARBONDATA-1136) After compaction, the select query is not showing data

2019-02-02 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai closed CARBONDATA-1136.

Resolution: Fixed

> After compaction, the select query is not showing data
> --
>
> Key: CARBONDATA-1136
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1136
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> After compaction, the select query is not showing data
> create table part_major_compact(a String, b int) partitioned by (c int) 
> stored by 'carbondata' 
> tblproperties('PARTITION_TYPE'='LIST','LIST_INFO'='1,2')
> insert into part_major_compact select 'a', 2, 3 from originTable limit 1
> insert into part_major_compact select 'b', 3, 4 from originTable limit 1
> insert into part_major_compact select 'c', 4, 5 from originTable limit 1
> insert into part_major_compact select 'd', 1, 2 from originTable limit 1
> alter table part_major_compact compact 'major'
> select * from part_major_compact where c = 4





[jira] [Closed] (CARBONDATA-1572) Support Streaming Ingest

2019-02-02 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai closed CARBONDATA-1572.

Resolution: Fixed

> Support Streaming Ingest
> 
>
> Key: CARBONDATA-1572
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1572
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: CarbonData Streaming Ingest_v1.6.pdf
>
>  Time Spent: 14h 20m
>  Remaining Estimate: 0h
>
> CarbonData should support streaming ingest.
> [^CarbonData Streaming Ingest_v1.6.pdf]





[jira] [Resolved] (CARBONDATA-3263) Should update doc for RANGE_COLUMN feature

2019-02-02 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-3263.
--
Resolution: Fixed

> Should update doc for RANGE_COLUMN feature
> --
>
> Key: CARBONDATA-3263
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3263
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>






[jira] [Updated] (CARBONDATA-3269) Range_column throwing ArrayIndexOutOfBoundsException when using KryoSerializer

2019-01-24 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-3269:
-
Description: 
Reproduce:

For the range_column feature, when we set "spark.serializer" to 
"org.apache.spark.serializer.KryoSerializer", data loading throws an 
ArrayIndexOutOfBoundsException.

Exception:

2019-01-25 13:00:19 ERROR DataLoadProcessorStepOnSpark$:367 - Data Loading 
failed for table carbon_range_column4
 java.lang.ArrayIndexOutOfBoundsException: 5
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 2019-01-25 13:00:19 ERROR TaskContextImpl:91 - Error in TaskFailureListener
 org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: 
Data Loading failed for table carbon_range_column4
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.org$apache$carbondata$spark$load$DataLoadProcessorStepOnSpark$$wrapException(DataLoadProcessorStepOnSpark.scala:368)
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:215)
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:210)
 at org.apache.spark.TaskContext$$anon$2.onTaskFailure(TaskContext.scala:144)
 at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
 at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
 at 
org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
 at 
org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
 at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:106)
 at org.apache.spark.scheduler.Task.run(Task.scala:113)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
 Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 ... 4 more

  was:
2019-01-25 13:00:19 ERROR DataLoadProcessorStepOnSpark$:367 - Data Loading 
failed for table carbon_range_column4
java.lang.ArrayIndexOutOfBoundsException: 5
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-01-25 13:00:19 ERROR TaskContextImpl:91 - Error in TaskFailureListener
org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: 
Data Loading failed for table carbon_range_column4
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.org$apache$carbondata$spark$load$DataLoadProcessorStepOnSpark$$wrapException(DataLoadProcessorStepOnSpark.scala:368)
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:215)
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:210)
 at org.apache.spark.TaskContext$$anon$2.onTaskFailure(TaskContext.scala:144)
 at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
 at 

[jira] [Created] (CARBONDATA-3269) Range_column throwing ArrayIndexOutOfBoundsException when using KryoSerializer

2019-01-24 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3269:


 Summary: Range_column throwing ArrayIndexOutOfBoundsException when 
using KryoSerializer
 Key: CARBONDATA-3269
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3269
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai


2019-01-25 13:00:19 ERROR DataLoadProcessorStepOnSpark$:367 - Data Loading 
failed for table carbon_range_column4
java.lang.ArrayIndexOutOfBoundsException: 5
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
2019-01-25 13:00:19 ERROR TaskContextImpl:91 - Error in TaskFailureListener
org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException: 
Data Loading failed for table carbon_range_column4
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.org$apache$carbondata$spark$load$DataLoadProcessorStepOnSpark$$wrapException(DataLoadProcessorStepOnSpark.scala:368)
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:215)
 at 
org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:210)
 at org.apache.spark.TaskContext$$anon$2.onTaskFailure(TaskContext.scala:144)
 at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
 at 
org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
 at 
org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
 at 
org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
 at 
scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
 at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
 at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
 at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:106)
 at org.apache.spark.scheduler.Task.run(Task.scala:113)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 ... 4 more





[jira] [Assigned] (CARBONDATA-3269) Range_column throwing ArrayIndexOutOfBoundsException when using KryoSerializer

2019-01-24 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3269:


Assignee: QiangCai

> Range_column throwing ArrayIndexOutOfBoundsException when using KryoSerializer
> --
>
> Key: CARBONDATA-3269
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3269
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Critical
>
> 2019-01-25 13:00:19 ERROR DataLoadProcessorStepOnSpark$:367 - Data Loading 
> failed for table carbon_range_column4
> java.lang.ArrayIndexOutOfBoundsException: 5
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:108)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> 2019-01-25 13:00:19 ERROR TaskContextImpl:91 - Error in TaskFailureListener
> org.apache.carbondata.processing.loading.exception.CarbonDataLoadingException:
>  Data Loading failed for table carbon_range_column4
>  at 
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$.org$apache$carbondata$spark$load$DataLoadProcessorStepOnSpark$$wrapException(DataLoadProcessorStepOnSpark.scala:368)
>  at 
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:215)
>  at 
> org.apache.carbondata.spark.load.DataLoadProcessorStepOnSpark$$anonfun$convertFunc$3.apply(DataLoadProcessorStepOnSpark.scala:210)
>  at org.apache.spark.TaskContext$$anon$2.onTaskFailure(TaskContext.scala:144)
>  at 
> org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
>  at 
> org.apache.spark.TaskContextImpl$$anonfun$markTaskFailed$1.apply(TaskContextImpl.scala:107)
>  at 
> org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:130)
>  at 
> org.apache.spark.TaskContextImpl$$anonfun$invokeListeners$1.apply(TaskContextImpl.scala:128)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at 
> org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:128)
>  at org.apache.spark.TaskContextImpl.markTaskFailed(TaskContextImpl.scala:106)
>  at org.apache.spark.scheduler.Task.run(Task.scala:113)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
>  at 
> org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
>  at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
>  at org.apache.spark.scheduler.Task.run(Task.scala:108)
>  ... 4 more





[jira] [Assigned] (CARBONDATA-3263) Should update doc for RANGE_COLUMN feature

2019-01-21 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3263:


Assignee: QiangCai

> Should update doc for RANGE_COLUMN feature
> --
>
> Key: CARBONDATA-3263
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3263
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>






[jira] [Created] (CARBONDATA-3263) Should update doc for RANGE_COLUMN feature

2019-01-21 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3263:


 Summary: Should update doc for RANGE_COLUMN feature
 Key: CARBONDATA-3263
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3263
 Project: CarbonData
  Issue Type: Improvement
Reporter: QiangCai








[jira] [Created] (CARBONDATA-3242) Range_Column should be table level property

2019-01-10 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3242:


 Summary: Range_Column should be table level property
 Key: CARBONDATA-3242
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3242
 Project: CarbonData
  Issue Type: Improvement
Reporter: QiangCai








[jira] [Created] (CARBONDATA-3239) Throwing ArrayIndexOutOfBoundsException in DataSkewRangePartitioner

2019-01-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3239:


 Summary: Throwing ArrayIndexOutOfBoundsException in 
DataSkewRangePartitioner
 Key: CARBONDATA-3239
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3239
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Reporter: QiangCai


2019-01-10 15:31:21 ERROR DataLoadProcessorStepOnSpark$:367 - Data Loading 
failed for table carbon_range_column4
java.lang.ArrayIndexOutOfBoundsException: 1
 at 
org.apache.spark.DataSkewRangePartitioner$$anonfun$initialize$1.apply$mcVI$sp(DataSkewRangePartitioner.scala:223)
 at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
 at 
org.apache.spark.DataSkewRangePartitioner.initialize(DataSkewRangePartitioner.scala:222)
 at 
org.apache.spark.DataSkewRangePartitioner.getPartition(DataSkewRangePartitioner.scala:234)
 at 
org.apache.spark.shuffle.sort.BypassMergeSortShuffleWriter.write(BypassMergeSortShuffleWriter.java:151)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:96)
 at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:53)
 at org.apache.spark.scheduler.Task.run(Task.scala:108)
 at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)





[jira] [Assigned] (CARBONDATA-3219) support range partition the input data for local_sort/global sort data loading

2019-01-01 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3219?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3219:


Assignee: QiangCai

> support range partition the input data for local_sort/global sort data loading
> --
>
> Key: CARBONDATA-3219
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3219
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Affects Versions: 1.5.2
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: RANGE_COLUMN.pdf
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> For a global_sort/local_sort table, the load data command adds a RANGE_COLUMN option:
> load data inpath '<path>' into table <table> options('RANGE_COLUMN'='<range column>')
> It will range-partition the input data.
> Design doc:
> [^RANGE_COLUMN.pdf]
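The range partitioning of input rows described above can be sketched as follows (a minimal Python illustration of partitioning by sorted split bounds; the function name and bound values are assumptions, not CarbonData's actual DataSkewRangePartitioner):

```python
import bisect

def range_partition(keys, bounds):
    # Assign each row to a partition by binary-searching the sorted split
    # bounds on the range column; len(bounds) bounds yield len(bounds) + 1
    # partitions, so rows with nearby key values land in the same partition.
    return [bisect.bisect_right(bounds, k) for k in keys]

# Rows with range-column values 1, 5 and 9 fall into partitions 0, 1 and 2
# when the split bounds are [3, 7].
print(range_partition([1, 5, 9], [3, 7]))
```

Because each partition then covers a contiguous value range, a later sort within each partition produces globally range-clustered data files.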





[jira] [Assigned] (CARBONDATA-3220) Should support presto to read stream segment data

2019-01-01 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-3220:


Assignee: QiangCai

> Should support presto to read stream segment data
> -
>
> Key: CARBONDATA-3220
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3220
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 1.5.2
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>
> Currently, the Presto integration does not support reading data from streaming segments.





[jira] [Created] (CARBONDATA-3220) Should support presto to read stream segment data

2019-01-01 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3220:


 Summary: Should support presto to read stream segment data
 Key: CARBONDATA-3220
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3220
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 1.5.2
Reporter: QiangCai


Currently, the Presto integration does not support reading data from streaming segments.





[jira] [Created] (CARBONDATA-3219) support range partition the input data for local_sort/global sort data loading

2019-01-01 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3219:


 Summary: support range partition the input data for 
local_sort/global sort data loading
 Key: CARBONDATA-3219
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3219
 Project: CarbonData
  Issue Type: Improvement
  Components: data-load
Affects Versions: 1.5.2
Reporter: QiangCai
 Attachments: RANGE_COLUMN.pdf

For a global_sort/local_sort table, the load data command adds a RANGE_COLUMN option:
load data inpath '<path>' into table <table> options('RANGE_COLUMN'='<range column>')
It will range-partition the input data.

Design doc:

[^RANGE_COLUMN.pdf]





[jira] [Created] (CARBONDATA-3059) the size of carbondata file is much smaller than table_blocksize

2018-10-30 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3059:


 Summary: the size of carbondata file is much smaller than 
table_blocksize
 Key: CARBONDATA-3059
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3059
 Project: CarbonData
  Issue Type: Bug
  Components: data-load
Affects Versions: 1.5.1
Reporter: QiangCai
 Attachments: image-2018-10-30-15-56-52-108.png

Create a table with table_blocksize = 128 (MB), but the size of each carbondata file is only about 55 MB.

For example: 58.4 MB + 50.7 MB = 109.1 MB, which is less than the threshold 128 MB - 128 MB * 10% = 115.2 MB.

Why does the system split the data into two files?

!image-2018-10-30-15-56-52-108.png!





[jira] [Created] (CARBONDATA-3021) Streaming throw Unsupported data type exception

2018-10-17 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-3021:


 Summary: Streaming throw Unsupported data type exception
 Key: CARBONDATA-3021
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3021
 Project: CarbonData
  Issue Type: Bug
Affects Versions: 1.5.0
Reporter: QiangCai


at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:343)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anon$1.run(StreamExecution.scala:206)
Caused by: org.apache.carbondata.streaming.CarbonStreamException: Job failed to 
write data file
at 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:288)
at 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply(CarbonAppendableStreamSink.scala:238)
at 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply(CarbonAppendableStreamSink.scala:238)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
at 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$.writeDataFileJob(CarbonAppendableStreamSink.scala:238)
at 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink.addBatch(CarbonAppendableStreamSink.scala:133)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply$mcV$sp(StreamExecution.scala:666)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch$1.apply(StreamExecution.scala:666)
at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatch(StreamExecution.scala:665)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(StreamExecution.scala:306)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1$$anonfun$apply$mcZ$sp$1.apply(StreamExecution.scala:294)
at 
org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:279)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
at 
org.apache.spark.sql.execution.streaming.StreamExecution$$anonfun$org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches$1.apply$mcZ$sp(StreamExecution.scala:294)
at 
org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
at 
org.apache.spark.sql.execution.streaming.StreamExecution.org$apache$spark$sql$execution$streaming$StreamExecution$$runBatches(StreamExecution.scala:290)
... 1 more
Caused by: java.lang.IllegalArgumentException: Unsupported data type: LONG
at 
org.apache.carbondata.core.util.comparator.Comparator.getComparatorByDataTypeForMeasure(Comparator.java:73)
at 
org.apache.carbondata.streaming.segment.StreamSegment.mergeBatchMinMax(StreamSegment.java:471)
at 
org.apache.carbondata.streaming.segment.StreamSegment.updateStreamFileIndex(StreamSegment.java:610)
at 
org.apache.carbondata.streaming.segment.StreamSegment.updateIndexFile(StreamSegment.java:627)
at 
org.apache.spark.sql.execution.streaming.CarbonAppendableStreamSink$$anonfun$writeDataFileJob$1.apply$mcV$sp(CarbonAppendableStreamSink.scala:277)
... 20 more





[jira] [Created] (CARBONDATA-2984) streaming throw NPE when there is no data in the task of a batch

2018-09-28 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2984:


 Summary: streaming throw NPE when there is no data in the task of 
a batch
 Key: CARBONDATA-2984
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2984
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai


!746438440.jpg!





[jira] [Updated] (CARBONDATA-2853) Add min/max index for streaming segment

2018-09-10 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2853:
-
Description: Streaming index file in stream segment adds min/max meta index 
for each streaming file during streaming ingestion. So the filter query can use 
the min/max index to prune the streaming files to reduce the number of the 
spark tasks in the driver side. Streaming file adds min/max into the blocklet 
header, so the filter query can skip data during scanning file.  (was: 
Streaming index file in stream segment adds min/max meta index for each 
streaming file during streaming ingestion. So the filter query can use the 
file-level min/max index to prune the streaming files to reduce the number of 
the spark tasks.)
Summary: Add min/max index for streaming segment  (was: Add file-level 
min/max index for streaming segment)

> Add min/max index for streaming segment
> ---
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: streaming_minmax_v2.pdf
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Streaming index file in stream segment adds min/max meta index for each 
> streaming file during streaming ingestion. So the filter query can use the 
> min/max index to prune the streaming files to reduce the number of the spark 
> tasks in the driver side. Streaming file adds min/max into the blocklet 
> header, so the filter query can skip data during scanning file.
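The driver-side file pruning described above can be sketched as follows (a minimal illustration under assumed names; CarbonData's actual pruning reads min/max from the stream index file and blocklet headers rather than this simplified structure):

```python
def prune_streaming_files(file_minmax, lower, upper):
    # Keep only the streaming files whose [min, max] range can intersect the
    # filter range [lower, upper]; the rest are skipped entirely, so fewer
    # Spark tasks are launched and less data is scanned.
    return [name for name, (mn, mx) in file_minmax.items()
            if mx >= lower and mn <= upper]

# Only "part-0" can contain values in [5, 15], so "part-1" is pruned.
print(prune_streaming_files({"part-0": (0, 10), "part-1": (20, 30)}, 5, 15))
```

The same interval-overlap test applied per blocklet inside a surviving file gives the in-file data skipping the description mentions.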





[jira] [Updated] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-09-10 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2853:
-
Attachment: (was: streaming_minmax.pdf)

> Add file-level min/max index for streaming segment
> --
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: streaming_minmax_v2.pdf
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Streaming index file in stream segment adds file-level min/max meta index for 
> each streaming file during streaming ingestion. So the filter query can use 
> the file-level min/max index to prune the streaming files to reduce the 
> number of the spark tasks.





[jira] [Updated] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-09-10 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2853:
-
Attachment: streaming_minmax_v2.pdf

> Add file-level min/max index for streaming segment
> --
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: streaming_minmax_v2.pdf
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Streaming index file in stream segment adds file-level min/max meta index for 
> each streaming file during streaming ingestion. So the filter query can use 
> the file-level min/max index to prune the streaming files to reduce the 
> number of the spark tasks.





[jira] [Updated] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-09-10 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2853:
-
Description: Streaming index file in stream segment adds min/max meta index 
for each streaming file during streaming ingestion. So the filter query can use 
the file-level min/max index to prune the streaming files to reduce the number 
of the spark tasks.  (was: Streaming index file in stream segment adds 
file-level min/max meta index for each streaming file during streaming 
ingestion. So the filter query can use the file-level min/max index to prune 
the streaming files to reduce the number of the spark tasks.)

> Add file-level min/max index for streaming segment
> --
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: streaming_minmax_v2.pdf
>
>  Time Spent: 8h 50m
>  Remaining Estimate: 0h
>
> Streaming index file in stream segment adds min/max meta index for each 
> streaming file during streaming ingestion. So the filter query can use the 
> file-level min/max index to prune the streaming files to reduce the number of 
> the spark tasks.





[jira] [Created] (CARBONDATA-2923) should log the info of the min/max identification on streaming table

2018-09-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2923:


 Summary: should log the info of the min/max identification on 
streaming table
 Key: CARBONDATA-2923
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2923
 Project: CarbonData
  Issue Type: Improvement
Affects Versions: 1.5.0
Reporter: QiangCai


Currently, the query does not log any information about min/max identification on the 
streaming table, so we cannot tell whether the streaming min/max index is working 
correctly or not.





[jira] [Created] (CARBONDATA-2917) Should support binary datatype

2018-09-04 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2917:


 Summary: Should support binary datatype
 Key: CARBONDATA-2917
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2917
 Project: CarbonData
  Issue Type: Improvement
  Components: file-format
Affects Versions: 1.5.0
Reporter: QiangCai








[jira] [Created] (CARBONDATA-2884) Should rename the methods of ByteUtil class to avoid the misuse

2018-08-23 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2884:


 Summary: Should rename the methods of ByteUtil class to avoid the 
misuse
 Key: CARBONDATA-2884
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2884
 Project: CarbonData
  Issue Type: Improvement
  Components: core
Reporter: QiangCai


The toBytes method executes an XOR operation on the data.

So the result is not the byte array of the real value.

It is better to rename the methods of the ByteUtil class to avoid misuse.
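The XOR in question is the common sign-bit flip used to make signed values sortable by unsigned byte comparison; a minimal Python sketch of the idea (the function name and the 64-bit width are assumptions for illustration, not ByteUtil's exact code):

```python
import struct

def to_sortable_bytes(value):
    # Pack as big-endian signed 64-bit, then XOR the sign bit of the first
    # byte so that unsigned byte-wise comparison matches signed numeric
    # order. The result is therefore NOT the byte array of the plain value.
    raw = bytearray(struct.pack(">q", value))
    raw[0] ^= 0x80
    return bytes(raw)

# Byte-wise order now matches numeric order across the sign boundary...
assert to_sortable_bytes(-1) < to_sortable_bytes(0) < to_sortable_bytes(1)
# ...but the bytes differ from the plain two's-complement encoding, which is
# exactly the misuse risk if the method is named like a plain converter.
assert to_sortable_bytes(1) != struct.pack(">q", 1)
```

A name like `toXorBytes` or `toSortableBytes` would make the transformed output explicit to callers.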





[jira] [Assigned] (CARBONDATA-2884) Should rename the methods of ByteUtil class to avoid the misuse

2018-08-23 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-2884:


Assignee: QiangCai

> Should rename the methods of ByteUtil class to avoid the misuse
> ---
>
> Key: CARBONDATA-2884
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2884
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>
> The toBytes method executes an XOR operation on the data.
> So the result is not the byte array of the real value.
> It is better to rename the methods of the ByteUtil class to avoid misuse.





[jira] [Updated] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-08-23 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2853:
-
Description: During streaming ingestion, the streaming index file in the 
stream segment records a file-level min/max meta index for each streaming 
file. Filter queries can then use this index to prune the streaming files 
and reduce the number of Spark tasks.

> Add file-level min/max index for streaming segment
> --
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: streaming_minmax.pdf
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> During streaming ingestion, the streaming index file in the stream segment 
> records a file-level min/max meta index for each streaming file. Filter 
> queries can then use this index to prune the streaming files and reduce the 
> number of Spark tasks.
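The pruning idea can be sketched in a few lines of Python (an illustration of the concept only, not the CarbonData implementation; the function and argument names are hypothetical):

```python
def prune_files(file_minmax, lower, upper):
    """Keep only files whose [min, max] value range overlaps the filter
    range [lower, upper]; the rest need no Spark task at all."""
    return [name for name, (lo, hi) in file_minmax.items()
            if hi >= lower and lo <= upper]

# Three streaming files with per-file min/max metadata for one column.
minmax = {"part-0": (1, 10), "part-1": (11, 20), "part-2": (21, 30)}
```

For example, a filter equivalent to `12 <= c <= 15` overlaps only `part-1`, so the other two files can be skipped before any task is scheduled.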





[jira] [Updated] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-08-23 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2853:
-
Attachment: streaming_minmax.pdf

> Add file-level min/max index for streaming segment
> --
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: streaming_minmax.pdf
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2854) Release table status file lock before delete physical files when execute 'clean files' command

2018-08-14 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2854.
--
Resolution: Fixed

> Release table status file lock before delete physical files when execute 
> 'clean files' command
> --
>
> Key: CARBONDATA-2854
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2854
> Project: CarbonData
>  Issue Type: Bug
>  Components: spark-integration
>Affects Versions: 1.4.0, 1.5.0
>Reporter: Zhichao  Zhang
>Assignee: Zhichao  Zhang
>Priority: Minor
> Fix For: 1.5.0
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> Release the table status file lock before deleting physical files when 
> executing the 'clean files' command. Otherwise the table status file stays 
> locked while the physical files are deleted, which may take a long time, 
> and other operations will fail to acquire the table status file lock.





[jira] [Created] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-08-13 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2853:


 Summary: Add file-level min/max index for streaming segment
 Key: CARBONDATA-2853
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
 Project: CarbonData
  Issue Type: Sub-task
Affects Versions: 1.5.0
Reporter: QiangCai








[jira] [Assigned] (CARBONDATA-2853) Add file-level min/max index for streaming segment

2018-08-13 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2853?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-2853:


Assignee: QiangCai

> Add file-level min/max index for streaming segment
> --
>
> Key: CARBONDATA-2853
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2853
> Project: CarbonData
>  Issue Type: Sub-task
>Affects Versions: 1.5.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>






[jira] [Created] (CARBONDATA-2776) Support ingesting data from Kafka service

2018-07-24 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2776:


 Summary: Support ingesting data from Kafka service
 Key: CARBONDATA-2776
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2776
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Resolved] (CARBONDATA-2377) Support message throttling in search mode

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2377.
--
Resolution: Fixed

> Support message throttling in search mode
> -
>
> Key: CARBONDATA-2377
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2377
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
>  Time Spent: 7h 20m
>  Remaining Estimate: 0h
>
> In the search mode concurrent query scenario, we should control the number 
> of requests sent to a Worker to prevent it from being overloaded.
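One common way to implement such throttling is to cap the number of in-flight requests and reject the excess. The sketch below is a generic Python illustration of that pattern; `RequestThrottle` is a hypothetical name, not a CarbonData class.

```python
import threading

class RequestThrottle:
    """Reject a request when `limit` requests are already in flight."""

    def __init__(self, limit: int):
        # The semaphore's counter is the number of free request slots.
        self._sem = threading.Semaphore(limit)

    def run(self, fn, *args):
        # Non-blocking acquire: fail fast instead of queueing work on
        # an already-overloaded worker.
        if not self._sem.acquire(blocking=False):
            raise RuntimeError("worker overloaded, request rejected")
        try:
            return fn(*args)
        finally:
            self._sem.release()
```

Rejecting early keeps the worker's latency bounded; callers can retry or fall back rather than pile up behind a saturated worker.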





[jira] [Resolved] (CARBONDATA-2752) Carbon provide Zeppelin support

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2752?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2752.
--
Resolution: Fixed

> Carbon provide Zeppelin support
> ---
>
> Key: CARBONDATA-2752
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2752
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Ajith S
>Assignee: Ajith S
>Priority: Major
> Fix For: 1.5.0
>
> Attachments: image-2018-07-18-17-09-04-583.png
>
>  Time Spent: 4.5h
>  Remaining Estimate: 0h
>
> *Apache Zeppelin* is a popular open web-based notebook that enables 
> interactive data analytics. It is a favored solution for providing a UI 
> frontend, as it already supports engines like Spark. Carbon can leverage it 
> to provide a UI for its operations. After CARBONDATA-2688, which provides a 
> carbon REST server, we can add UI support from Zeppelin to deliver a 
> complete solution.
> Reference: [https://zeppelin.apache.org/]
> +Proposed solution:+
> !image-2018-07-18-17-09-04-583.png!
>  
> This JIRA proposes to add a Carbon-based interpreter for Zeppelin.





[jira] [Resolved] (CARBONDATA-2759) Add Bad_Records_Options to STMPROPERTIES for Streaming Table

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2759.
--
Resolution: Fixed

> Add Bad_Records_Options to STMPROPERTIES for Streaming Table
> 
>
> Key: CARBONDATA-2759
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2759
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthumurugesh
>Assignee: Indhumathi Muthumurugesh
>Priority: Major
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2160) Compacted streaming segment's Merged To is empty

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2160.
--
Resolution: Fixed
  Assignee: QiangCai

> Compacted streaming segment's Merged To is empty
> 
>
> Key: CARBONDATA-2160
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2160
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>
> |SegmentSequenceId|Status|Load Start Time|Load End Time|Merged To|File Format|
> |7|Success|2018-02-11 02:02:37.105|2018-02-11 02:02:37.293|NA|COLUMNAR_V3|
> |6|Success|2018-02-11 02:02:15.068|2018-02-11 02:02:15.194|NA|COLUMNAR_V3|
> |5|Compacted|2018-02-11 02:02:15.062|2018-02-11 02:02:37.102|NA|ROW_V1|
> |4|Compacted|2018-02-11 02:02:06.981|2018-02-11 02:02:15.062|NA|ROW_V1|
> |3|Success|2018-02-11 02:02:04.348|2018-02-11 02:02:04.462|NA|COLUMNAR_V3|
> |2|Success|2018-02-11 02:02:04.072|2018-02-11 02:02:04.346|NA|COLUMNAR_V3|
> |1|Compacted|2018-02-11 02:01:50.188|2018-02-11 02:02:04.066|NA|ROW_V1|
> |0|Compacted|2018-02-11 02:01:44.06|2018-02-11 02:01:50.188|NA|ROW_V1|





[jira] [Resolved] (CARBONDATA-2067) Streaming hand off operation throw NullPointerException

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2067.
--
Resolution: Fixed

> Streaming hand off operation throw NullPointerException
> ---
>
> Key: CARBONDATA-2067
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2067
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 40m
>  Remaining Estimate: 0h
>
> 18/01/23 16:01:10 ERROR CompactionResultSortProcessor: Executor task launch 
> worker for task 0 Compaction failed: null java.lang.NullPointerException at 
> org.apache.carbondata.processing.util.CarbonDataProcessorUtil.getLocalDataFolderLocation(CarbonDataProcessorUtil.java:152)
>  at 
> org.apache.carbondata.processing.merger.CompactionResultSortProcessor.initTempStoreLocation(CompactionResultSortProcessor.java:424)
>  at 
> org.apache.carbondata.processing.merger.CompactionResultSortProcessor.execute(CompactionResultSortProcessor.java:156)
>  at 
> org.apache.carbondata.streaming.StreamHandoffRDD.internalCompute(StreamHandoffRDD.scala:113)
>  at org.apache.carbondata.spark.rdd.CarbonRDD.compute(CarbonRDD.scala:60) at 
> org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323) at 
> org.apache.spark.rdd.RDD.iterator(RDD.scala:287) at 
> org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87) at 
> org.apache.spark.scheduler.Task.run(Task.scala:108) at 
> org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338) at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
>  at java.lang.Thread.run(Thread.java:745)





[jira] [Resolved] (CARBONDATA-1780) Create configuration from SparkSession for data loading

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1780.
--
Resolution: Fixed
  Assignee: QiangCai

> Create configuration from SparkSession for data loading
> ---
>
> Key: CARBONDATA-1780
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1780
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Create configuration from SparkSession for data loading





[jira] [Resolved] (CARBONDATA-1587) Integrate with Kafka and Spark structured streaming to ingest data

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1587.
--
Resolution: Fixed
  Assignee: QiangCai

> Integrate with Kafka and Spark structured streaming to ingest data
> --
>
> Key: CARBONDATA-1587
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1587
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Assignee: QiangCai
>Priority: Major
>
> Should support ingesting data from Kafka via Spark Structured Streaming into 
> a carbon streaming table.
> The solution should provide end-to-end exactly-once semantics and fault 
> tolerance.





[jira] [Resolved] (CARBONDATA-1261) load sql add 'header' option

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1261.
--
Resolution: Fixed
  Assignee: QiangCai

> load sql add 'header' option
> 
>
> Key: CARBONDATA-1261
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1261
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> When loading CSV files that have no header row and whose column order 
> matches the table schema, adding 'header'='false' to the load data SQL 
> avoids requiring the user to provide the file header.
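The behavior can be illustrated with a small Python sketch (a hypothetical helper, not CarbonData code): with header disabled, the columns are simply mapped positionally onto the table schema.

```python
import csv
import io

def load_csv(text, schema, header=True):
    """Parse CSV text into dict rows.

    When header=False, the columns are mapped positionally onto the
    table schema, so the user need not repeat the header anywhere.
    """
    rows = list(csv.reader(io.StringIO(text)))
    names = rows[0] if header else schema
    data = rows[1:] if header else rows
    return [dict(zip(names, row)) for row in data]
```

So a file containing only data rows loads cleanly as long as its column order matches the declared schema.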





[jira] [Resolved] (CARBONDATA-1475) fixed dependencies for Intellij Idea

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1475.
--
Resolution: Fixed
  Assignee: QiangCai

> fixed dependencies for Intellij Idea
> 
>
> Key: CARBONDATA-1475
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1475
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
> Attachments: Selection_007.png
>
>  Time Spent: 3h 20m
>  Remaining Estimate: 0h
>
> When the spark-1.6 profile is chosen, the libraries still contain 
> scala-2.11 dependencies.
> !Selection_007.png|thumbnail!





[jira] [Resolved] (CARBONDATA-1407) The default end key is not correct for plain-text dimension

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1407.
--
Resolution: Fixed
  Assignee: QiangCai

> The default end key is not correct for plain-text dimension
> ---
>
> Key: CARBONDATA-1407
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1407
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 1.2.0
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> Currently the default end key is 127 (0x7F); it should be changed to 255 (0xFF).





[jira] [Resolved] (CARBONDATA-1253) Sort_columns should not support float,double,decimal

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1253.
--
Resolution: Fixed
  Assignee: QiangCai

> Sort_columns should not support float,double,decimal
> 
>
> Key: CARBONDATA-1253
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1253
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-1228) the query result of double is not correct

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-1228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-1228.
--
Resolution: Fixed
  Assignee: QiangCai

> the query result of double is not correct
> -
>
> Key: CARBONDATA-1228
> URL: https://issues.apache.org/jira/browse/CARBONDATA-1228
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-886) remove all redundant local variable

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-886.
-
Resolution: Fixed

> remove all redundant local variable
> ---
>
> Key: CARBONDATA-886
> URL: https://issues.apache.org/jira/browse/CARBONDATA-886
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-842) when SORT_COLUMN is empty, no need to sort data.

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-842.
-
Resolution: Fixed

> when SORT_COLUMN is empty, no need to sort data.
> 
>
> Key: CARBONDATA-842
> URL: https://issues.apache.org/jira/browse/CARBONDATA-842
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 50m
>  Remaining Estimate: 0h
>






[jira] [Closed] (CARBONDATA-765) DataFrame writer needs to drop the table first, otherwise loading reports table not found

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai closed CARBONDATA-765.
---
Resolution: Fixed
  Assignee: (was: QiangCai)

> DataFrame writer needs to drop the table first, otherwise loading reports table not found
> -
>
> Key: CARBONDATA-765
> URL: https://issues.apache.org/jira/browse/CARBONDATA-765
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Priority: Major
>
> The DataFrame writer needs to drop the table first, otherwise loading reports that the table is not found.





[jira] [Resolved] (CARBONDATA-782) Support SORT_COLUMNS

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-782?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-782.
-
Resolution: Fixed

> Support SORT_COLUMNS
> 
>
> Key: CARBONDATA-782
> URL: https://issues.apache.org/jira/browse/CARBONDATA-782
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 13h 50m
>  Remaining Estimate: 0h
>
> The tasks of SORT_COLUMNS:
> 1. Support creating a table with the sort_columns property,
> e.g. tblproperties('sort_columns' = 'col7,col3').
> A table with the SORT_COLUMNS property will be sorted by those columns, in 
> the order in which SORT_COLUMNS lists them.
> 2. Change the encoding rule of SORT_COLUMNS.
> First, the column encoding rules stay consistent with the previous behavior.
> Second, if a SORT_COLUMNS column was previously a measure, it will now be 
> created as a dimension, and this dimension is a no-dictionary column 
> (better to use direct-dictionary where applicable).
> Third, the SORT_COLUMNS dimensions have RLE and ROWID pages, while other 
> dimensions have only RLE (not sorted).
> 3. The start/end key should be composed of SORT_COLUMNS.
> Use SORT_COLUMNS to build the start/end key during data loading and select 
> queries.
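The ordering rule in task 1 amounts to a multi-column sort key built in the declared SORT_COLUMNS order. A minimal Python sketch of that idea (illustrative only, not CarbonData's sort implementation):

```python
def sort_rows(rows, sort_columns):
    # Rows are ordered by the SORT_COLUMNS values, comparing the first
    # listed column first (e.g. 'col7' before 'col3' in the example
    # tblproperties above), exactly like a composite sort key.
    return sorted(rows, key=lambda row: tuple(row[c] for c in sort_columns))
```

Because the key tuple follows the declared order, `'sort_columns' = 'col7,col3'` and `'sort_columns' = 'col3,col7'` generally produce different physical orderings.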





[jira] [Closed] (CARBONDATA-763) Add L5 loading support, global sorting like HBase

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai closed CARBONDATA-763.
---
Resolution: Fixed

already implemented

> Add L5 loading support, global sorting like HBase
> -
>
> Key: CARBONDATA-763
> URL: https://issues.apache.org/jira/browse/CARBONDATA-763
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: QiangCai
>Priority: Major
>
> Add L5 loading support, global sorting like HBase





[jira] [Resolved] (CARBONDATA-2175) Query with Lucene DataMap while filters contain match UDF

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2175.
--
Resolution: Fixed
  Assignee: Jacky Li  (was: QiangCai)

> Query with Lucene DataMap while filters contain match UDF 
> --
>
> Key: CARBONDATA-2175
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2175
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: Jacky Li
>Priority: Major
>
> Query with the Lucene DataMap when the filters contain the match UDF.
> 
> Limitation:
> 1. Initially, only one match UDF is supported in the where condition.
>  





[jira] [Resolved] (CARBONDATA-2173) Build Lucene DataMap for the segment

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2173?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2173.
--
Resolution: Fixed

> Build Lucene DataMap for the segment 
> -
>
> Key: CARBONDATA-2173
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2173
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>
> load data should build Lucene DataMap for the segment 





[jira] [Resolved] (CARBONDATA-2172) Create Lucene DataMap with 'text_columns' property

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2172.
--
Resolution: Fixed

> Create Lucene DataMap with 'text_columns' property 
> ---
>
> Key: CARBONDATA-2172
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2172
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> Create a Lucene DataMap with the 'text_columns' property and build the 
> Lucene DataMap for all existing segments:
> create datamap  on  
>  using 'lucene' 
>  dmproperties('text_columns'='col1,col2')





[jira] [Resolved] (CARBONDATA-2220) Reduce unnecessary audit log

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2220.
--
Resolution: Fixed
  Assignee: Jacky Li

> Reduce unnecessary audit log
> 
>
> Key: CARBONDATA-2220
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2220
> Project: CarbonData
>  Issue Type: Improvement
>Affects Versions: 1.3.1
>Reporter: QiangCai
>Assignee: Jacky Li
>Priority: Minor
>  Time Spent: 3h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2311) Should avoid to append data to "streaming finish" streaming

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2311.
--
Resolution: Fixed
  Assignee: QiangCai

> Should avoid to append data to "streaming finish" streaming
> ---
>
> Key: CARBONDATA-2311
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2311
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 5h
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2326) Statistics feature throw NPE when spark.sql.execution.id is null

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2326.
--
Resolution: Fixed

> Statistics feature throw NPE  when spark.sql.execution.id is null
> -
>
> Key: CARBONDATA-2326
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2326
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (CARBONDATA-2326) Statistics feature throw NPE when spark.sql.execution.id is null

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2326?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-2326:


Assignee: QiangCai

> Statistics feature throw NPE  when spark.sql.execution.id is null
> -
>
> Key: CARBONDATA-2326
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2326
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Minor
>  Time Spent: 1h
>  Remaining Estimate: 0h
>






[jira] [Assigned] (CARBONDATA-2363) Should add CarbonStreamingQueryListener to SparkSession

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-2363:


Assignee: QiangCai

> Should add CarbonStreamingQueryListener to SparkSession
> ---
>
> Key: CARBONDATA-2363
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2363
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2363) Should add CarbonStreamingQueryListener to SparkSession

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2363.
--
Resolution: Fixed

> Should add CarbonStreamingQueryListener to SparkSession
> ---
>
> Key: CARBONDATA-2363
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2363
> Project: CarbonData
>  Issue Type: Bug
>Reporter: QiangCai
>Priority: Major
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>






[jira] [Resolved] (CARBONDATA-2718) Support SQL interface in REST API

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2718.
--
Resolution: Fixed

> Support SQL interface in REST API
> -
>
> Key: CARBONDATA-2718
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2718
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Assignee: Jacky Li
>Priority: Major
>






[jira] [Resolved] (CARBONDATA-2705) CarbonStore Java API and implementation

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2705.
--
Resolution: Fixed

> CarbonStore Java API and implementation
> ---
>
> Key: CARBONDATA-2705
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2705
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Jacky Li
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> Support two implementations:
> 1. LocalCarbonStore for usage in local mode
> 2. DistributedCarbonStore leveraging multiple servers (Master and Workers) 
> via RPC





[jira] [Assigned] (CARBONDATA-2691) add RESTful API

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2691?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai reassigned CARBONDATA-2691:


Assignee: QiangCai

> add RESTful API
> ---
>
> Key: CARBONDATA-2691
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2691
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>






[jira] [Resolved] (CARBONDATA-2692) add filter expression parser

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2692.
--
Resolution: Fixed
  Assignee: QiangCai

> add filter expression parser
> 
>
> Key: CARBONDATA-2692
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2692
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Assignee: QiangCai
>Priority: Major
>






[jira] [Resolved] (CARBONDATA-2690) basic framework: use java to rewrite master and work

2018-07-22 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2690.
--
Resolution: Fixed

> basic framework: use java to rewrite master and work
> 
>
> Key: CARBONDATA-2690
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2690
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>






[jira] [Created] (CARBONDATA-2767) Query take more than 5 seconds for RACK_LOCAL

2018-07-22 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2767:


 Summary: Query take more than 5 seconds for RACK_LOCAL
 Key: CARBONDATA-2767
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2767
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai


If the Spark cluster and the Hadoop cluster are two different machine 
clusters, the Spark tasks will run in RACK_LOCAL mode, so there is no need 
to provide preferred locations to the tasks.
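The fix amounts to reporting preferred locations only when data hosts and executor hosts can actually overlap. A hedged Python sketch of the idea (names are hypothetical; Spark's real hook is an RDD's preferred-locations mechanism):

```python
def preferred_locations(block_hosts, executor_hosts):
    # Report preferred hosts only where the data hosts overlap the
    # compute hosts. For separate Spark and Hadoop clusters there is
    # no overlap, so return an empty preference: the scheduler can then
    # launch tasks immediately instead of waiting out locality delays.
    executor_set = set(executor_hosts)
    return [h for h in block_hosts if h in executor_set]
```

Returning no preference avoids the scheduler's locality-wait timeout, which is what inflated the query time past 5 seconds in the RACK_LOCAL case.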





[jira] [Updated] (CARBONDATA-2690) basic framework: use java to rewrite master and work

2018-07-05 Thread QiangCai (JIRA)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2690?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai updated CARBONDATA-2690:
-
Summary: basic framework: use java to rewrite master and work  (was: Basic 
framework: use java to rewrite master and work)

> basic framework: use java to rewrite master and work
> 
>
> Key: CARBONDATA-2690
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2690
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: QiangCai
>Priority: Major
>






[jira] [Created] (CARBONDATA-2692) add filter expression parser

2018-07-05 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2692:


 Summary: add filter expression parser
 Key: CARBONDATA-2692
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2692
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Created] (CARBONDATA-2691) add RESTful API

2018-07-05 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2691:


 Summary: add RESTful API
 Key: CARBONDATA-2691
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2691
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Created] (CARBONDATA-2690) Basic framework: use java to rewrite master and work

2018-07-05 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2690:


 Summary: Basic framework: use java to rewrite master and work
 Key: CARBONDATA-2690
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2690
 Project: CarbonData
  Issue Type: Sub-task
Reporter: QiangCai








[jira] [Resolved] (CARBONDATA-2524) Support create carbonReader with default projection

2018-05-28 Thread QiangCai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2524.
--
   Resolution: Fixed
Fix Version/s: 1.4.0

> Support create carbonReader with default projection
> ---
>
> Key: CARBONDATA-2524
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2524
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
> Fix For: 1.4.0
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> Support create carbonReader with default projection





[jira] [Resolved] (CARBONDATA-2499) Validate the visible/invisible status of datamap

2018-05-28 Thread QiangCai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2499.
--
Resolution: Fixed

> Validate the visible/invisible status of datamap
> 
>
> Key: CARBONDATA-2499
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2499
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: xubo245
>Assignee: xubo245
>Priority: Major
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> The JIRA https://issues.apache.org/jira/browse/CARBONDATA-2380 does not check the 
> visible/invisible status of the datamap, so no exception will be thrown if a later 
> code change affects that status. We should add test cases for this function.
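The missing test could take roughly this shape (a hypothetical stand-in for the datamap status, not the real CarbonData API): toggling visibility must be reflected in the stored status, and a regression should fail loudly.

```java
public class DataMapVisibilityTest {
    // Hypothetical model of the visible/invisible state the issue
    // wants covered by tests; the real CarbonData types differ.
    enum Status { VISIBLE, INVISIBLE }

    static final class DataMapState {
        private Status status = Status.VISIBLE;  // visible by default
        void setVisible(boolean visible) {
            status = visible ? Status.VISIBLE : Status.INVISIBLE;
        }
        Status status() { return status; }
    }

    public static void main(String[] args) {
        DataMapState dm = new DataMapState();
        // The kind of check CARBONDATA-2380 left untested:
        if (dm.status() != Status.VISIBLE)
            throw new AssertionError("default status should be VISIBLE");
        dm.setVisible(false);
        if (dm.status() != Status.INVISIBLE)
            throw new AssertionError("setVisible(false) not reflected");
        dm.setVisible(true);
        if (dm.status() != Status.VISIBLE)
            throw new AssertionError("setVisible(true) not reflected");
        System.out.println("visibility checks passed");
    }
}
```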





[jira] [Resolved] (CARBONDATA-2416) Index DataMap should support immediate load and deferred load when creating the DataMap

2018-05-09 Thread QiangCai (JIRA)

 [ 
https://issues.apache.org/jira/browse/CARBONDATA-2416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

QiangCai resolved CARBONDATA-2416.
--
Resolution: Fixed

> Index DataMap should support immediate load and deferred load when creating 
> the DataMap
> ---
>
> Key: CARBONDATA-2416
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2416
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jacky Li
>Priority: Major
> Fix For: 1.4.0
>
>  Time Spent: 10h
>  Remaining Estimate: 0h
>
> For 'preaggregate' and 'timeseries' datamaps, carbon loads the datamap as 
> soon as the user creates it, but for 'lucene' and 'bloomfilter' it does not. This 
> behavior should be aligned, otherwise users will be confused. 
> A better option is to let the user choose, when creating the datamap, whether to 
> load it immediately or defer the load (and refresh it manually later).
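The proposed option can be modeled in a few lines (a hypothetical sketch, not CarbonData's actual datamap API): creation either triggers the build right away or leaves it for an explicit refresh.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DataMapCreation {
    // Hypothetical model of the proposal: creating a datamap either
    // builds it immediately or defers the build until the user runs
    // a manual refresh.
    static final class DataMap {
        private final AtomicBoolean built = new AtomicBoolean(false);
        void build() { built.set(true); }   // stands in for the real index build
        boolean isBuilt() { return built.get(); }
    }

    static DataMap create(boolean deferredRebuild) {
        DataMap dm = new DataMap();
        if (!deferredRebuild) {
            dm.build();   // 'preaggregate'-style: load on create
        }                 // 'lucene'-style deferred: wait for refresh
        return dm;
    }

    public static void main(String[] args) {
        DataMap immediate = create(false);
        DataMap deferred = create(true);
        System.out.println(immediate.isBuilt());  // true
        System.out.println(deferred.isBuilt());   // false
        deferred.build();                         // manual refresh later
        System.out.println(deferred.isBuilt());   // true
    }
}
```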





[jira] [Created] (CARBONDATA-2403) Should re-factory the implement of data map refresh.

2018-04-26 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2403:


 Summary: Should re-factory the implement of data map refresh.
 Key: CARBONDATA-2403
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2403
 Project: CarbonData
  Issue Type: Improvement
Reporter: QiangCai








[jira] [Created] (CARBONDATA-2390) Refresh Lucene data map for the exists table with data

2018-04-23 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2390:


 Summary: Refresh Lucene data map for the exists table with data
 Key: CARBONDATA-2390
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2390
 Project: CarbonData
  Issue Type: Improvement
  Components: data-load
Reporter: QiangCai


If the table already has data from before the creation of the Lucene data map, we should 
use the Refresh command to build the data map incrementally.





[jira] [Created] (CARBONDATA-2363) Should add CarbonStreamingQueryListener to SparkSession

2018-04-19 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2363:


 Summary: Should add CarbonStreamingQueryListener to SparkSession
 Key: CARBONDATA-2363
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2363
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai








[jira] [Created] (CARBONDATA-2326) Statistics feature throw NPE when spark.sql.execution.id is null

2018-04-09 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2326:


 Summary: Statistics feature throw NPE  when spark.sql.execution.id 
is null
 Key: CARBONDATA-2326
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2326
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai
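The defensive pattern such a fix typically needs can be sketched as follows (the property name comes from the issue title; the surrounding helper is hypothetical, not the actual CarbonData code): any statistics keyed on the execution id must tolerate its absence instead of dereferencing null.

```java
import java.util.Properties;

public class ExecutionIdGuard {
    // Hypothetical sketch: "spark.sql.execution.id" is absent on some
    // code paths, so code reading it must null-check before use rather
    // than dereference the value directly (the NPE from the report).
    static String executionIdOrDefault(Properties localProps) {
        String id = localProps.getProperty("spark.sql.execution.id");
        return id != null ? id : "no-execution-id";  // avoid the NPE
    }

    public static void main(String[] args) {
        Properties props = new Properties();             // id not set
        System.out.println(executionIdOrDefault(props)); // no-execution-id
        props.setProperty("spark.sql.execution.id", "42");
        System.out.println(executionIdOrDefault(props)); // 42
    }
}
```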








[jira] [Created] (CARBONDATA-2319) carbon_scan_time and carbon_IO_time are incorrect in task statistics

2018-04-07 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2319:


 Summary: carbon_scan_time and carbon_IO_time are incorrect in task 
statistics
 Key: CARBONDATA-2319
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2319
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai


carbon_scan_time and carbon_IO_time are incorrect:

 
  query_id:              5385749464281
  task_id:               0
  start_time:            2018-04-08 10:52:09.013
  total_time:            47ms
  load_blocks_time:      0ms
  load_dictionary_time:  0ms
  carbon_scan_time:      -1ms
  carbon_IO_time:        -1ms
  scan_blocks_num:       1
  total_blocklets:       1
  valid_blocklets:       1
  total_pages:           1
  scanned_pages:         0
  valid_pages:           1
  result_size:           10
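One plausible way a "-1ms" value appears (a hypothetical illustration of the failure mode, not the actual CarbonData statistics code): the statistic defaults to a -1 sentinel and the scan/IO timer is never recorded for the task, so the sentinel leaks into the report.

```java
public class TaskStatistics {
    // Hypothetical sketch: a -1 sentinel meaning "never recorded"
    // prints as "-1ms" when the timer update is skipped, matching
    // the values seen in the report above.
    private long scanTimeMs = -1;   // sentinel: timer not yet recorded

    void recordScanTime(long ms) { scanTimeMs = ms; }

    String report() { return scanTimeMs + "ms"; }

    public static void main(String[] args) {
        TaskStatistics stats = new TaskStatistics();
        System.out.println(stats.report());   // -1ms (timer never recorded)
        stats.recordScanTime(47);
        System.out.println(stats.report());   // 47ms
    }
}
```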





[jira] [Created] (CARBONDATA-2311) Should avoid to append data to "streaming finish" streaming

2018-04-03 Thread QiangCai (JIRA)
QiangCai created CARBONDATA-2311:


 Summary: Should avoid to append data to "streaming finish" 
streaming
 Key: CARBONDATA-2311
 URL: https://issues.apache.org/jira/browse/CARBONDATA-2311
 Project: CarbonData
  Issue Type: Bug
Reporter: QiangCai







