[GitHub] [carbondata] Indhumathi27 commented on pull request #3776: [CARBONDATA-3834]Segment directory and the segment file in metadata are not created for partitioned table when 'carbon.merge.index.i
Indhumathi27 commented on pull request #3776: URL: https://github.com/apache/carbondata/pull/3776#issuecomment-64696 LGTM This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] Indhumathi27 commented on pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
Indhumathi27 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646429823 LGTM
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646116733 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1447/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646113706 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3172/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC deltafiles processing
CarbonDataQA1 commented on pull request #3793: URL: https://github.com/apache/carbondata/pull/3793#issuecomment-646097526 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3171/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC deltafiles processing
CarbonDataQA1 commented on pull request #3793: URL: https://github.com/apache/carbondata/pull/3793#issuecomment-646096200 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1445/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
CarbonDataQA1 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646020176 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1444/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
CarbonDataQA1 commented on pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#issuecomment-646019412 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3169/
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646009456 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/1446/
[GitHub] [carbondata] marchpure opened a new pull request #3793: [CARBONDATA-3858] Increase the parallelism of CDC deltafiles processing
marchpure opened a new pull request #3793: URL: https://github.com/apache/carbondata/pull/3793

### Why is this PR needed?
In the CDC flow, the parallelism of processing deltafiles is the same as the number of executors. This insufficient parallelism limits CDC's performance.

### What changes were proposed in this PR?
Set the parallelism of processing deltafiles to the configured value of 'spark.sql.shuffle.partitions'. Notably, this does not increase the number of deltafiles, because the deltafiles are combined.

### Does this PR introduce any user interface change?
- No

### Is any new testcase added?
- Yes
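The effect of the task count on how much work can run side by side can be illustrated with a minimal plain-Scala sketch. This is not the PR's implementation: the object `DeltaFileParallelism`, the helper `assignToTasks`, and the file names are hypothetical, and the value 200 is only Spark's default for 'spark.sql.shuffle.partitions', used here as an assumed setting.

```scala
object DeltaFileParallelism {
  // Hypothetical helper: spread delta files round-robin over at most
  // `parallelism` tasks. Each inner Seq is the work of one task.
  def assignToTasks(deltaFiles: Seq[String], parallelism: Int): Seq[Seq[String]] = {
    require(parallelism > 0, "parallelism must be positive")
    deltaFiles.zipWithIndex
      .groupBy { case (_, idx) => idx % parallelism } // task index per file
      .toSeq
      .sortBy(_._1)                                   // stable task order
      .map { case (_, files) => files.map(_._1) }     // drop the indices
  }

  def main(args: Array[String]): Unit = {
    // Illustrative file names only.
    val files = (0 until 8).map(i => s"delta-$i")

    // Parallelism tied to an assumed executor count of 2: two tasks, four files each.
    println(assignToTasks(files, 2).map(_.size))

    // Parallelism taken from an assumed shuffle-partition setting of 200:
    // every file gets its own task, so all eight can be processed concurrently.
    println(assignToTasks(files, 200).map(_.size))
  }
}
```

With few executors, each task has to walk through several files one after another; with the larger setting, the number of concurrently processed files is bounded only by the file count and the available cores.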
[GitHub] [carbondata] CarbonDataQA1 commented on pull request #3789: [WIP] Store Size Optimization
CarbonDataQA1 commented on pull request #3789: URL: https://github.com/apache/carbondata/pull/3789#issuecomment-646006463 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/3170/
[jira] [Created] (CARBONDATA-3858) Increase the parallelism of CDC deltafiles processing
Xingjun Hao created CARBONDATA-3858: --- Summary: Increase the parallelism of CDC deltafiles processing Key: CARBONDATA-3858 URL: https://issues.apache.org/jira/browse/CARBONDATA-3858 Project: CarbonData Issue Type: Improvement Reporter: Xingjun Hao In the CDC flow, the parallelism of deltafiles processing is the same as the number of executors, which heavily restricts parallelism. The insufficient parallelism limits CPU utilization and hampers CDC's performance. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
Indhumathi27 commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442174299 ## File path: docs/segment-management-on-carbondata.md ## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: Yes, I think it is better to display the latest segments based on timestamp for a query with a limit.
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
Indhumathi27 commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442174299 ## File path: docs/segment-management-on-carbondata.md ## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: Yes, I think it is better to display the latest segments based on timestamp when a limit is given.
[GitHub] [carbondata] marchpure commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
marchpure commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442154411 ## File path: docs/segment-management-on-carbondata.md ## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: Yeah. Currently, the segments shown are ordered by load name in descending order, in this form:

+---+---------+-----------------------+
|ID |Status   |Load Start Time        |
+---+---------+-----------------------+
|5  |Compacted|2020-06-18 04:20:09.041|
|4.1|Success  |2020-06-18 04:20:09.041|
|4  |Compacted|2020-06-18 04:20:08.69 |
|3  |Compacted|2020-06-18 04:20:07.622|
|2.1|Compacted|2020-06-18 04:20:07.622|
|2  |Compacted|2020-06-18 04:20:07.226|
+---+---------+-----------------------+
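The two orderings discussed in this thread (by load name, as described above, and by load start time, as proposed in the review) can be compared with a small plain-Scala sketch. It is an illustration only, not CarbonData's implementation: `SegmentRow`, `latestByLoadName`, and `latestByLoadStartTime` are hypothetical names, and the sample rows are copied from the output quoted in this message.

```scala
import java.sql.Timestamp

object ShowSegmentsOrdering {
  // Hypothetical row type mirroring the ID, Status and Load Start Time columns.
  final case class SegmentRow(id: String, status: String, loadStartTime: Timestamp)

  // "4.1" -> (4, 1) and "4" -> (4, 0), so IDs compare numerically, not as text.
  private def idKey(id: String): (Int, Int) = id.split('.') match {
    case Array(major)        => (major.toInt, 0)
    case Array(major, minor) => (major.toInt, minor.toInt)
    case _ => throw new IllegalArgumentException(s"unexpected segment id: $id")
  }

  // Ordering described in the comment: load name (segment ID) descending, then limit.
  def latestByLoadName(rows: Seq[SegmentRow], limit: Int): Seq[SegmentRow] =
    rows.sortBy(r => idKey(r.id))(Ordering[(Int, Int)].reverse).take(limit)

  // Ordering proposed in the review: load start time descending, then limit.
  def latestByLoadStartTime(rows: Seq[SegmentRow], limit: Int): Seq[SegmentRow] =
    rows.sortBy(_.loadStartTime.getTime)(Ordering[Long].reverse).take(limit)

  // Sample rows taken from the quoted SHOW SEGMENTS output.
  val sample: Seq[SegmentRow] = Seq(
    SegmentRow("5", "Compacted", Timestamp.valueOf("2020-06-18 04:20:09.041")),
    SegmentRow("4.1", "Success", Timestamp.valueOf("2020-06-18 04:20:09.041")),
    SegmentRow("4", "Compacted", Timestamp.valueOf("2020-06-18 04:20:08.69")),
    SegmentRow("3", "Compacted", Timestamp.valueOf("2020-06-18 04:20:07.622")),
    SegmentRow("2.1", "Compacted", Timestamp.valueOf("2020-06-18 04:20:07.622")),
    SegmentRow("2", "Compacted", Timestamp.valueOf("2020-06-18 04:20:07.226"))
  )

  def main(args: Array[String]): Unit = {
    println(latestByLoadName(sample, 2).map(_.id))
    println(latestByLoadStartTime(sample, 2).map(_.id))
  }
}
```

In this sample, segments 5 and 4.1 share the same load start time, as do 3 and 2.1, so a timestamp-only ordering leaves their relative order to the sort's tie handling; a real implementation would likely need a secondary key.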
[GitHub] [carbondata] marchpure commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
marchpure commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442151742 ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segment/ShowSegmentTestCase.scala ## @@ -191,7 +214,11 @@ class ShowSegmentTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table if exists a") sql("create table a(a string) stored as carbondata") sql("insert into a select 'k'") +sql("insert into a select 'j'") +sql("insert into a select 'k'") val rows = sql("show segments for table a").collect() +assert(sql(s"show segments for table a").collect().length == 3) Review comment: modified ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segmentreading/TestSegmentReading.scala ## @@ -251,13 +251,16 @@ class TestSegmentReading extends QueryTest with BeforeAndAfterAll { s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE carbon_table_show_seg OPTIONS |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) val df = sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments") + sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments").show() Review comment: modified
[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3792: [CARBONDATA-3856] Support the LIMIT operator for show segments command
Indhumathi27 commented on a change in pull request #3792: URL: https://github.com/apache/carbondata/pull/3792#discussion_r442128902 ## File path: docs/segment-management-on-carbondata.md ## @@ -54,6 +54,12 @@ concept which helps to maintain consistency of data and easy transaction managem SHOW SEGMENTS ON CarbonDatabase.CarbonTable ``` + Show lastest 10 visible segments Review comment: Even after compaction, it should show the latest segments, right? For example, if there are 6 segments, then after major compaction, show segments with limit 2 will display only segments 4.1 and 5, whereas the latest segments are 0.2 and 0.3. I think we should get the latest segments based on timestamp. ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segment/ShowSegmentTestCase.scala ## @@ -191,7 +214,11 @@ class ShowSegmentTestCase extends QueryTest with BeforeAndAfterAll { sql("drop table if exists a") sql("create table a(a string) stored as carbondata") sql("insert into a select 'k'") +sql("insert into a select 'j'") +sql("insert into a select 'k'") val rows = sql("show segments for table a").collect() +assert(sql(s"show segments for table a").collect().length == 3) Review comment: ```suggestion assert(rows.length == 3) ``` ## File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/segmentreading/TestSegmentReading.scala ## @@ -251,13 +251,16 @@ class TestSegmentReading extends QueryTest with BeforeAndAfterAll { s"""LOAD DATA local inpath '$resourcesPath/data.csv' INTO TABLE carbon_table_show_seg OPTIONS |('DELIMITER'= ',', 'QUOTECHAR'= '\"')""".stripMargin) val df = sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments") + sql("SHOW SEGMENTS for table carbon_table_show_seg as select * from carbon_table_show_seg_segments").show() Review comment: This line can be removed.
[jira] [Created] (CARBONDATA-3857) Implement delete and update feature in carbondata SDK.
Karanpreet Singh created CARBONDATA-3857: Summary: Implement delete and update feature in carbondata SDK. Key: CARBONDATA-3857 URL: https://issues.apache.org/jira/browse/CARBONDATA-3857 Project: CarbonData Issue Type: New Feature Reporter: Karanpreet Singh Attachments: Implement delete and update feature in carbondata SDK.pdf Please find the design document attached.