[GitHub] [carbondata] Indhumathi27 commented on pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.merge.index.i

2020-06-18 Thread GitBox


Indhumathi27 commented on pull request #3776:
URL: https://github.com/apache/carbondata/pull/3776#issuecomment-64696


   LGTM



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


Indhumathi27 commented on pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646429823


   LGTM







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646116733


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1447/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646113706


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3172/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC deltafiles processing

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#issuecomment-646097526


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3171/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC deltafiles processing

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793#issuecomment-646096200


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1445/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646020176


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1444/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646019412


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3169/
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646009456


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1446/
   







[GitHub] [carbondata] marchpure opened a new pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC deltafiles processing

2020-06-18 Thread GitBox


marchpure opened a new pull request #3793:
URL: https://github.com/apache/carbondata/pull/3793


   ### Why is this PR needed?
In the CDC flow, the parallelism of deltafile processing equals the 
number of executors. This insufficient parallelism limits CDC's performance.

### What changes were proposed in this PR?
Set the parallelism of deltafile processing to the configured 
value of 'spark.sql.shuffle.partitions'.
Notably, this does not increase the number of deltafiles, because the 
deltafiles are combined afterwards.
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   







[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization

2020-06-18 Thread GitBox


CarbonDataQA1 commented on pull request #3789:
URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646006463


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3170/
   







[jira] [Created] (CARBONDATA-3858) Increase the parallelism of CDC deltafiles processing

2020-06-18 Thread Xingjun Hao (Jira)
Xingjun Hao created CARBONDATA-3858:
---

 Summary: Increase the parallelism of CDC deltafiles processing
 Key: CARBONDATA-3858
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3858
 Project: CarbonData
  Issue Type: Improvement
Reporter: Xingjun Hao


In the CDC flow, the parallelism of deltafile processing is the same as the 
number of executors, which heavily restricts parallelism. The insufficient 
parallelism limits CPU utilization and hampers CDC's performance.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


Indhumathi27 commented on a change in pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#discussion_r442174299



##
File path: docs/segment-management-on-carbondata.md
##
@@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy 
transaction managem
   SHOW SEGMENTS ON CarbonDatabase.CarbonTable
   ```
 
+  Show lastest 10 visible segments

Review comment:
   Yes. I think it is better to display the latest segments based on timestamp 
when the query has a limit.









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


Indhumathi27 commented on a change in pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#discussion_r442174299



##
File path: docs/segment-management-on-carbondata.md
##
@@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy 
transaction managem
   SHOW SEGMENTS ON CarbonDatabase.CarbonTable
   ```
 
+  Show lastest 10 visible segments

Review comment:
   Yes. I think it is better to display the latest segments based on timestamp 
when a limit is given.









[GitHub] [carbondata] marchpure commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


marchpure commented on a change in pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#discussion_r442154411



##
File path: docs/segment-management-on-carbondata.md
##
@@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy 
transaction managem
   SHOW SEGMENTS ON CarbonDatabase.CarbonTable
   ```
 
+  Show lastest 10 visible segments

Review comment:
   Yeah. Currently, the segments shown are ordered by loadname desc, in 
the form of:
   +---+---------+-----------------------+
   |ID |Status   |Load Start Time        |
   +---+---------+-----------------------+
   |5  |Compacted|2020-06-18 04:20:09.041|
   |4.1|Success  |2020-06-18 04:20:09.041|
   |4  |Compacted|2020-06-18 04:20:08.69 |
   |3  |Compacted|2020-06-18 04:20:07.622|
   |2.1|Compacted|2020-06-18 04:20:07.622|
   |2  |Compacted|2020-06-18 04:20:07.226|









[GitHub] [carbondata] marchpure commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


marchpure commented on a change in pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#discussion_r442151742



##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segment/ShowSegmentTestCase.scala
##
@@ -191,7 +214,11 @@ class ShowSegmentTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("drop table if exists a")
 sql("create table a(a string) stored as carbondata")
 sql("insert into a select 'k'")
+sql("insert into a select 'j'")
+sql("insert into a select 'k'")
 val rows = sql("show segments for table a").collect()
+assert(sql(s"show segments for table a").collect().length == 3)

Review comment:
   modified

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segmentreading/TestSegmentReading.scala
##
@@ -251,13 +251,16 @@ class TestSegmentReading extends QueryTest with 
BeforeAndAfterAll {
 s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE 
carbon_table_show_seg OPTIONS
 |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
   val df = sql("SHOW SEGMENTS for table carbon_table_show_seg as select * 
from carbon_table_show_seg_segments")
+  sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from 
carbon_table_show_seg_segments").show()

Review comment:
   modified









[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command

2020-06-18 Thread GitBox


Indhumathi27 commented on a change in pull request #3792:
URL: https://github.com/apache/carbondata/pull/3792#discussion_r442128902



##
File path: docs/segment-management-on-carbondata.md
##
@@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy 
transaction managem
   SHOW SEGMENTS ON CarbonDatabase.CarbonTable
   ```
 
+  Show lastest 10 visible segments

Review comment:
   Even after compaction, it should show the latest segments, right? For 
example, if there are 6 segments, then after major compaction, show segments with 
limit 2 will display only segments 4.1 and 5, whereas the latest segments are 0.2 
and 0.3. I think we should get the latest segments based on timestamp. 
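
The difference between the two orderings discussed in this comment can be sketched in plain Scala. This is only an illustration under stated assumptions, not CarbonData's implementation: `SegmentRow`, `LatestSegments`, and both helper functions are hypothetical names, the sample timestamps are invented, and treating the segment ID as a number is a simplification of the "loadname desc" ordering.

```scala
import java.sql.Timestamp

// Hypothetical model of one SHOW SEGMENTS row; not a CarbonData class.
final case class SegmentRow(id: String, status: String, loadStartTime: Timestamp)

object LatestSegments {
  // Newest load start time first, then keep the first `limit` rows.
  def latestByTimestamp(rows: Seq[SegmentRow], limit: Int): Seq[SegmentRow] =
    rows.sortBy(r => -r.loadStartTime.getTime).take(limit)

  // Highest segment ID first. Parsing the ID as a number is an assumption
  // made for this sketch; it is adequate for IDs such as "4.1" or "0.2".
  def latestById(rows: Seq[SegmentRow], limit: Int): Seq[SegmentRow] =
    rows.sortBy(r => -r.id.toDouble).take(limit)

  // Invented sample: the IDs follow the review comment, the timestamps are made up
  // so that the compacted segments 0.2 and 0.3 are the most recent ones.
  val sample: Seq[SegmentRow] = Seq(
    SegmentRow("5", "Success", Timestamp.valueOf("2020-06-18 04:20:05")),
    SegmentRow("4.1", "Success", Timestamp.valueOf("2020-06-18 04:20:06")),
    SegmentRow("0.2", "Success", Timestamp.valueOf("2020-06-18 04:20:08")),
    SegmentRow("0.3", "Success", Timestamp.valueOf("2020-06-18 04:20:09"))
  )
}
```

With this invented sample, `LatestSegments.latestById(LatestSegments.sample, 2)` keeps segments 5 and 4.1, while `LatestSegments.latestByTimestamp(LatestSegments.sample, 2)` keeps 0.3 and 0.2, which is the behaviour the comment asks for.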

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segment/ShowSegmentTestCase.scala
##
@@ -191,7 +214,11 @@ class ShowSegmentTestCase extends QueryTest with 
BeforeAndAfterAll {
 sql("drop table if exists a")
 sql("create table a(a string) stored as carbondata")
 sql("insert into a select 'k'")
+sql("insert into a select 'j'")
+sql("insert into a select 'k'")
 val rows = sql("show segments for table a").collect()
+assert(sql(s"show segments for table a").collect().length == 3)

Review comment:
   ```suggestion
   assert(rows.length == 3)
   ```

##
File path: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segmentreading/TestSegmentReading.scala
##
@@ -251,13 +251,16 @@ class TestSegmentReading extends QueryTest with 
BeforeAndAfterAll {
 s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE 
carbon_table_show_seg OPTIONS
 |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin)
   val df = sql("SHOW SEGMENTS for table carbon_table_show_seg as select * 
from carbon_table_show_seg_segments")
+  sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from 
carbon_table_show_seg_segments").show()

Review comment:
   Can remove this line









[jira] [Created] (CARBONDATA-3857) Implement delete and update feature in carbondata SDK.

2020-06-18 Thread Karanpreet Singh (Jira)
Karanpreet Singh created CARBONDATA-3857:


 Summary: Implement delete and update feature in carbondata SDK.
 Key: CARBONDATA-3857
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3857
 Project: CarbonData
  Issue Type: New Feature
Reporter: Karanpreet Singh
 Attachments: Implement delete and update feature in carbondata SDK.pdf

Please find the design document attached.


