[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733503678

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3142/

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733502985

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4896/
[GitHub] [carbondata] akashrn5 commented on a change in pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI
akashrn5 commented on a change in pull request #4015: URL: https://github.com/apache/carbondata/pull/4015#discussion_r530123834

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithInsertOverwrite.scala ##

@@ -0,0 +1,79 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+class TestSIWithInsertOverwrite extends QueryTest with BeforeAndAfterEach {
+
+  override protected def beforeEach(): Unit = {
+    sql("drop table if exists maintable")
+    sql("create table maintable(name string, Id int, address string) stored as carbondata")
+    sql("drop index if exists maintable_si on maintable")
+    sql("CREATE INDEX maintable_si on table maintable (address) as 'carbondata'")
+  }
+
+  test("test insert overwrite with SI") {
+    sql("insert into maintable select 'nihal',1,'nko'")
+    sql("insert into maintable select 'brinjal',2,'valid'")
+    checkAnswer(sql("select count(*) from maintable_si WHERE address='nko'"), Seq(Row(1)))
+    checkAnswer(sql("select address from maintable_si"), Seq(Row("nko"), Row("valid")))

Review comment: the above two checkAnswer calls are not required, as this is a basic test and is covered in many test cases

## File path: index/secondary-index/src/test/scala/org/apache/carbondata/spark/testsuite/secondaryindex/TestSIWithInsertOverwrite.scala ##

@@ -0,0 +1,79 @@
+package org.apache.carbondata.spark.testsuite.secondaryindex
+
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterEach
+
+class TestSIWithInsertOverwrite extends QueryTest with BeforeAndAfterEach {
+
+  override protected def beforeEach(): Unit = {
+    sql("drop table if exists maintable")
+    sql("create table maintable(name string, Id int, address string) stored as carbondata")
+    sql("drop index if exists maintable_si on maintable")
+    sql("CREATE INDEX maintable_si on table maintable (address) as 'carbondata'")
+  }
+
+  test("test insert overwrite with SI") {
+    sql("insert into maintable select 'nihal',1,'nko'")
+    sql("insert into maintable select 'brinjal',2,'valid'")
+    checkAnswer(sql("select count(*) from maintable_si WHERE address='nko'"), Seq(Row(1)))
+    checkAnswer(sql("select address from maintable_si"), Seq(Row("nko"), Row("valid")))
+    sql("insert overwrite table maintable select 'nihal', 1, 'asdfa'")
+    checkAnswer(sql("select count(*) from maintable_si WHERE address='nko'"), Seq(Row(0)))
+    checkAnswer(sql("select address from maintable_si"), Seq(Row("asdfa")))
+    checkAnswer(sql("select * from maintable"), Seq(Row("nihal", 1, "asdfa")))
+  }
+
+  test("test insert overwrite with CTAS and SI") {
+    sql("insert into maintable select 'nihal',1,'nko'")
+    sql("drop table if exists ctas_maintable")
+    sql("CREATE TABLE ctas_maintable " +
+      "STORED AS carbondata as select * from maintable")
+    checkAnswer(sql("select count(*) from ctas_maintable"), Seq(Row(1)))
+    assert(sql("show indexes on table ctas_maintable").collect().isEmpty)
+    sql("CREATE INDEX ctas_maintable_si on table ctas_maintable (address) as 'carbondata'")

Review comment: line 48, 49 not
[GitHub] [carbondata] Zhangshunyu commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-733497266

retest this please
[GitHub] [carbondata] Zhangshunyu closed pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
Zhangshunyu closed pull request #4020: URL: https://github.com/apache/carbondata/pull/4020
[GitHub] [carbondata] shenjiayu17 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
shenjiayu17 commented on a change in pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#discussion_r530130471

## File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java ##

@@ -26,4 +26,31 @@ private GeoConstants() {
   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)";
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";
+
+  // Regular expression to parse input polylines for IN_POLYLINE_LIST
+  public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)";
+
+  // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST
+  public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))";
+
+  // delimiter of input points or ranges
+  public static final String DEFAULT_DELIMITER = ",";
+
+  // conversion factor of angle to radian
+  public static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  public static final double EARTH_RADIUS = 6371004.0;

Review comment: Done
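As a sanity check outside CarbonData, the polygon pattern in the diff above can be exercised directly with java.util.regex. Only the regex string comes from the diff; the class and method names below are hypothetical, for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PolygonRegexDemo {

    // Same pattern string as the POLYGON_REG_EXPRESSION constant above:
    // a lookbehind for "POLYGON ((" and a lookahead for "))" so the match
    // covers only the coordinate list between the double parentheses.
    static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";

    // Returns the coordinate list inside a WKT-style polygon, or null if nothing matches.
    static String extractPolygonContent(String input) {
        Matcher m = Pattern.compile(POLYGON_REG_EXPRESSION).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(
            extractPolygonContent("POLYGON ((116.32 40.04, 116.35 40.04, 116.32 40.04))"));
        // prints: 116.32 40.04, 116.35 40.04, 116.32 40.04
    }
}
```

Because both lookarounds are zero-width, the delimiters themselves are excluded from the match, which is why no further trimming of "POLYGON ((" is needed before splitting on DEFAULT_DELIMITER.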
[jira] [Resolved] (CARBONDATA-4029) After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
[ https://issues.apache.org/jira/browse/CARBONDATA-4029?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ajantha Bhat resolved CARBONDATA-4029.
--------------------------------------
    Fix Version/s: 2.2.0
       Resolution: Fixed

> After delete in the table which has Alter-added SDK segments, then the count(*) is 0.
> --------------------------------------------------------------------------------------
>
>                 Key: CARBONDATA-4029
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4029
>             Project: CarbonData
>          Issue Type: Bug
>    Affects Versions: 2.0.0
>         Environment: 3 node FI cluster
>            Reporter: Prasanna Ravichandran
>            Priority: Minor
>             Fix For: 2.2.0
>
>         Attachments: Primitive.rar
>
>          Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Do delete on a table which has alter-added SDK segments; then count(*) is 0. count(*) stays 0 even if any number of SDK segments are added after it.
>
> Test queries:
> drop table if exists external_primitive;
> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> -- before executing the below alter add segment, place the attached SDK files in hdfs at /sdkfiles/primitive2 folder
> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon');
> select * from external_primitive;
> delete from external_primitive where id =2;
> select * from external_primitive;
>
> Console output:
> /> drop table if exists external_primitive;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.586 seconds)
> /> create table external_primitive (id int, name string, rank smallint, salary double, active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.774 seconds)
> /> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive2','format'='carbon'); select * from external_primitive;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (1.077 seconds)
> INFO : Execution ID: 320
> +-----+------+------+-----------------+--------+------------+------------------------+-----------+-------+
> | id  | name | rank | salary          | active | dob        | doj                    | city      | dept  |
> +-----+------+------+-----------------+--------+------------+------------------------+-----------+-------+
> | 1 | AAA | 3 | 3444345.66 | true | 1979-12-09 | 2011-02-10 01:00:20.0 | Pune | IT |
> | 2 | BBB | 2 | 543124.66 | false | 1987-02-19 | 2017-01-01 12:00:20.0 | Bangalore | DATA |
> | 3 | CCC | 1 | 787878.888 | false | 1982-05-12 | 2015-12-01 02:20:20.0 | Pune | DATA |
> | 4 | DDD | 1 | 9.24 | true | 1981-04-09 | 2000-01-15 07:00:20.0 | Delhi | MAINS |
> | 5 | EEE | 3 | 545656.99 | true | 1987-12-09 | 2017-11-25 04:00:20.0 | Delhi | IT |
> | 6 | FFF | 2 | 768678.0 | false | 1987-12-20 | 2017-01-10 05:00:20.0 | Bangalore | DATA |
> | 7 | GGG | 3 | 765665.0 | true | 1983-06-12 | 2017-01-01 02:00:20.0 | Pune | IT |
> | 8 | HHH | 2 | 567567.66 | false | 1979-01-12 | 1995-01-01 12:00:20.0 | Bangalore | DATA |
> | 9 | III | 2 | 787878.767 | true | 1985-02-19 | 2005-08-15 01:00:20.0 | Pune | DATA |
> | 10 | JJJ | 3 | 887877.14 | true | 2000-05-19 | 2016-10-10 12:00:20.0 | Bangalore | MAINS |
> | 18 |  | 3 | 7.86786786787E9 | true | 1980-10-05 | 1995-10-07 22:00:20.0 | Bangalore | IT |
> | 19 |  | 2 | 5464545.33 | true | 1986-06-06 | 2008-08-15 01:00:20.0 | Delhi | DATA |
> | 20 | NULL | 3 | 7867867.34 | true | 2000-05-01 | 2014-01-18 12:00:20.0 | Bangalore | MAINS |
> +-----+------+------+-----------------+--------+------------+------------------------+-----------+-------+
> 13 rows selected (2.458 seconds)
> /> delete from external_primitive where id =2; select * from external_primitive;
> INFO : Execution ID: 322
> +--------------------+
> | Deleted Row Count  |
> +--------------------+
> | 1                  |
> +--------------------+
> 1 row selected (3.723 seconds)
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> | id  | name  | rank  | salary  | active  | dob  | doj  | city  | dept  |
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> No rows selected (1.531 seconds)
> /> alter table external_primitive add segment options('path'='hdfs://hacluster/sdkfiles/primitive3','format'='carbon'); select * from external_primitive;
> +---------+
> | Result  |
> +---------+
> +---------+
> No rows selected (0.766 seconds)
> +-----+-------+-------+---------+---------+------+------+-------+-------+
> | id | name | rank | salary | active | dob | doj | city | dept
[GitHub] [carbondata] asfgit closed pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
asfgit closed pull request #4024: URL: https://github.com/apache/carbondata/pull/4024
[GitHub] [carbondata] ajantha-bhat commented on pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
ajantha-bhat commented on pull request #4024: URL: https://github.com/apache/carbondata/pull/4024#issuecomment-733466550

Already reviewed in #4009 LGTM
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiayu Shen updated CARBONDATA-4051:
-----------------------------------
    Attachment:     (was: CarbonData Spatial Index Design Doc v2.docx)

> Geo spatial index algorithm improvement and UDFs enhancement
> ------------------------------------------------------------
>
>                 Key: CARBONDATA-4051
>                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
>             Project: CarbonData
>          Issue Type: New Feature
>            Reporter: Jiayu Shen
>            Priority: Minor
>         Attachments: CarbonData Spatial Index Design Doc v2.docx, Genex Cloud Carbon Spatial Index Specification.docx
>
>          Time Spent: 2h 20m
>  Remaining Estimate: 0h
>
> The requirement is from SEQ; related algorithms are provided by group Discovery.
> 1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
> timevalue BIGINT,
> longitude LONG,
> latitude LONG) COMMENT "This is a GeoTable"
> STORED AS carbondata
> TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
> 'SPATIAL_INDEX.mygeohash.type'='geohash',
> 'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
> 'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
> 'SPATIAL_INDEX.mygeohash.gridSize'='50',
> 'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs
> query filter UDFs:
> * _*InPolygonList (List polygonList, OperationType opType)*_
> * _*InPolylineList (List polylineList, Float bufferInMeter)*_
> * _*InPolygonRangeList (List RangeList, OperationType opType)*_
> *operation only supports:*
> * *"OR", meaning the union of two polygons*
> * *"AND", meaning the intersection of two polygons*
> geo util UDFs:
> * _*GeoIdToGridXy(Long geoId) : Pair*_
> * _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
> * _*GeoIdToLatLng(Long geoId) : Pair*_
> * _*ToUpperLayerGeoId(Long geoId) : Long*_
> * _*ToRangeList (String polygon) : List*_
> 3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO. For example,
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below, where '855280799612' is generated internally,
> +------------+---------+---------+--------+
> |mygeohash   |timevalue|longitude|latitude|
> +------------+---------+---------+--------+
> |855280799612|157542840|116285807|40084087|
> +------------+---------+---------+--------+
> but now it is
> +---------+---------+---------+--------+
> |mygeohash|timevalue|longitude|latitude|
> +---------+---------+---------+--------+
> |0        |157542840|116285807|40084087|
> +---------+---------+---------+--------+{code}

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
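The LatLngToGeoId/GeoIdToLatLng util UDFs listed above map between a single geo id and a pair of grid coordinates. As an illustration only, here is a generic Z-order (Morton) bit-interleaving sketch in Java; this is not CarbonData's actual geohash algorithm, and the method names are hypothetical:

```java
public class ZOrderSketch {

    // Interleave the bits of two 32-bit grid coordinates into one 64-bit id:
    // longitude bits land on even bit positions, latitude bits on odd positions,
    // so nearby grid cells get numerically close ids.
    static long latLngToGeoId(long latGrid, long lonGrid) {
        long geoId = 0L;
        for (int i = 0; i < 32; i++) {
            geoId |= ((lonGrid >> i) & 1L) << (2 * i);      // even bit
            geoId |= ((latGrid >> i) & 1L) << (2 * i + 1);  // odd bit
        }
        return geoId;
    }

    // Inverse mapping: de-interleave the id back into {latGrid, lonGrid}.
    static long[] geoIdToLatLng(long geoId) {
        long lat = 0L;
        long lon = 0L;
        for (int i = 0; i < 32; i++) {
            lon |= ((geoId >> (2 * i)) & 1L) << i;
            lat |= ((geoId >> (2 * i + 1)) & 1L) << i;
        }
        return new long[] { lat, lon };
    }

    public static void main(String[] args) {
        long id = latLngToGeoId(3L, 5L);        // 27 under this bit layout
        long[] back = geoIdToLatLng(id);
        System.out.println(id + " -> lat=" + back[0] + ", lon=" + back[1]);
    }
}
```

A scheme like this also makes a ToUpperLayerGeoId-style operation cheap: shifting the id right by two bits halves the resolution of both coordinates at once.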
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jiayu Shen updated CARBONDATA-4051:
-----------------------------------
    Attachment: CarbonData Spatial Index Design Doc v2.docx
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
CarbonDataQA2 commented on pull request #4024: URL: https://github.com/apache/carbondata/pull/4024#issuecomment-733459907

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4894/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
CarbonDataQA2 commented on pull request #4024: URL: https://github.com/apache/carbondata/pull/4024#issuecomment-733459252

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3140/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-733449700

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4895/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
CarbonDataQA2 commented on pull request #4025: URL: https://github.com/apache/carbondata/pull/4025#issuecomment-733449135

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3141/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4021: [CARBONDATA-4057] Support Complex DataType when Save DataFrame with MODE.OVERWRITE
CarbonDataQA2 commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-733447024

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3139/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4021: [CARBONDATA-4057] Support Complex DataType when Save DataFrame with MODE.OVERWRITE
CarbonDataQA2 commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-733446467

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4893/
[GitHub] [carbondata] marchpure opened a new pull request #4025: [WIP] Make TableStatus/UpdateTableStatus/SegmentFile Smaller
marchpure opened a new pull request #4025: URL: https://github.com/apache/carbondata/pull/4025

### Why is this PR needed?

### What changes were proposed in this PR?

### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)

### Is any new testcase added?
- No
- Yes
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733433233

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3138/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement
CarbonDataQA2 commented on pull request #4012: URL: https://github.com/apache/carbondata/pull/4012#issuecomment-733432993

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4892/
[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4021: [CARBONDATA-4057] Support Complex DataType when Save DataFrame with MODE.OVERWRITE
ajantha-bhat commented on a change in pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#discussion_r530077716

## File path: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDataFrameWriter.scala ##

@@ -74,6 +74,9 @@ class CarbonDataFrameWriter(sqlContext: SQLContext, val dataFrame: DataFrame) {
   case decimal: DecimalType => s"decimal(${decimal.precision}, ${decimal.scale})"
   case BooleanType => CarbonType.BOOLEAN.getName
   case BinaryType => CarbonType.BINARY.getName
+  case ArrayType(elementType, _) => sparkType.simpleString
+  case StructType(fields) => sparkType.simpleString
+  case MapType(keyType, valueType, _) => sparkType.simpleString

Review comment: Long string columns (CarbonType.VARCHAR) also seem to be missing; can you test and add them as well if required?
[GitHub] [carbondata] ajantha-bhat commented on pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
ajantha-bhat commented on pull request #4024: URL: https://github.com/apache/carbondata/pull/4024#issuecomment-733429807

retest this please
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4021: [CARBONDATA-4057] Support Complex DataType when Save DataFrame with MODE.OVERWRITE
CarbonDataQA2 commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-733149879

Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3137/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4021: [CARBONDATA-4057] Support Complex DataType when Save DataFrame with MODE.OVERWRITE
CarbonDataQA2 commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-733149233

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4891/
[jira] [Created] (CARBONDATA-4057) Support Complex DataType when Save DataFrame
Jiayu Shen created CARBONDATA-4057:
-----------------------------------

             Summary: Support Complex DataType when Save DataFrame
                 Key: CARBONDATA-4057
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4057
             Project: CarbonData
          Issue Type: New Feature
            Reporter: Jiayu Shen

Currently, complex data types are not supported when df.mode(overwrite).save is triggered; this should be supported.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
CarbonDataQA2 commented on pull request #4024: URL: https://github.com/apache/carbondata/pull/4024#issuecomment-733056113

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3136/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
CarbonDataQA2 commented on pull request #4024: URL: https://github.com/apache/carbondata/pull/4024#issuecomment-733052535

Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4890/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
CarbonDataQA2 commented on pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#issuecomment-733019583

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3135/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
CarbonDataQA2 commented on pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#issuecomment-733012325

Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4889/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-732989642

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3134/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4023: [WIP]added testcase for custom compaction for SI table
CarbonDataQA2 commented on pull request #4023: URL: https://github.com/apache/carbondata/pull/4023#issuecomment-732987853

Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3133/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-732986999 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4888/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4023: [WIP]added testcase for custom compaction for SI table
CarbonDataQA2 commented on pull request #4023: URL: https://github.com/apache/carbondata/pull/4023#issuecomment-732984227 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4887/
[GitHub] [carbondata] Karan980 closed pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement
Karan980 closed pull request #4009: URL: https://github.com/apache/carbondata/pull/4009
[GitHub] [carbondata] Karan980 opened a new pull request #4024: [CARBONDATA-4029] Fix oldTimeStamp issue in alter table add segment query.
Karan980 opened a new pull request #4024: URL: https://github.com/apache/carbondata/pull/4024 ### Why is this PR needed? Earlier, the timestamp in the names of carbondata files was in nanoseconds; currently the timestamp is in milliseconds. When an old SDK file segment is added to a table through the alter table add segment query, it is treated as an invalid block because its timestamp is in nanoseconds. ### What changes were proposed in this PR? Removed the update validation for SDK-written files. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes
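For illustration, the unit mismatch described in this PR can be seen purely from the magnitudes involved. The sketch below is hypothetical (the function and the sample values are assumptions, not CarbonData's actual code): a validity check that assumes millisecond-epoch timestamps will misclassify a nanosecond-epoch timestamp taken from an old SDK file name.

```python
# Hypothetical sketch: guessing the unit of an epoch timestamp by magnitude.
# A millisecond-based validity check fails for nanosecond-named files.
def timestamp_unit(ts: int) -> str:
    """Classify an epoch timestamp by its order of magnitude."""
    if ts >= 10**18:
        return "nanoseconds"
    if ts >= 10**15:
        return "microseconds"
    if ts >= 10**12:
        return "milliseconds"
    return "seconds"

old_sdk_name_ts = 1_506_384_000_000_000_000  # assumed old-style name, ns
current_name_ts = 1_606_384_000_000          # assumed current name, ms

print(timestamp_unit(old_sdk_name_ts))   # nanoseconds
print(timestamp_unit(current_name_ts))   # milliseconds
```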
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
CarbonDataQA2 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-732973817 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4886/
[GitHub] [carbondata] ajantha-bhat edited a comment on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement
ajantha-bhat edited a comment on pull request #4009: URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732971703 @Karan980 : I tried to merge, but when I check out using `git fetch origin pull/4009/head:4009`, this 4009 points to some old PR. I guess there is some problem with this PR. Can you verify once? If it is the same problem, just raise a new PR.
[GitHub] [carbondata] ajantha-bhat commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement
ajantha-bhat commented on pull request #4009: URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732971703 @Karan980 : I tried to merge, but when I check out using `git fetch origin pull/4009/head:4009`, this 4009 points to some old PR. I guess there is some problem with this PR. Can you verify once. If it is the same problem, just raise a new PR.
[jira] [Resolved] (CARBONDATA-4053) Alter table rename column failed
[ https://issues.apache.org/jira/browse/CARBONDATA-4053?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ajantha Bhat resolved CARBONDATA-4053. Fix Version/s: 2.2.0 Resolution: Fixed
> Alter table rename column failed
> Key: CARBONDATA-4053
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4053
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Major
> Fix For: 2.2.0
> Attachments: 截图.PNG
> Time Spent: 2h 50m
> Remaining Estimate: 0h
>
> Alter table rename column failed because content in tblproperties was
> incorrectly replaced by the new column name, even where that content was
> not related to the column name.
> !截图.PNG!
-- This message was sent by Atlassian Jira (v8.3.4#803005)
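The pitfall behind this issue, a blind substring replace of a short column name inside property values, can be sketched as follows. This is an illustrative sketch only (the function names and the property value are assumptions, not the code changed by the PR): for a column named "a", a plain string replace corrupts every property value containing the letter, whereas a token-wise replace over the comma-separated column list touches only exact matches.

```python
# Hypothetical sketch of the rename bug: substring vs token-wise replace.
def rename_naive(value: str, old: str, new: str) -> str:
    # Buggy: replaces every occurrence of `old`, even inside other names.
    return value.replace(old, new)

def rename_tokenwise(value: str, old: str, new: str) -> str:
    # Safe: only rewrite tokens that are exactly the old column name.
    tokens = [t.strip() for t in value.split(",")]
    return ",".join(new if t == old else t for t in tokens)

props = "a,salary,name"  # assumed SORT_COLUMNS-style property value
print(rename_naive(props, "a", "a1"))      # a1,sa1la1ry,na1me  (corrupted)
print(rename_tokenwise(props, "a", "a1"))  # a1,salary,name
```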
[GitHub] [carbondata] asfgit closed pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
asfgit closed pull request #4019: URL: https://github.com/apache/carbondata/pull/4019
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
CarbonDataQA2 commented on pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732958986 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3131/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
CarbonDataQA2 commented on pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732958502 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4885/
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
nihal0107 commented on a change in pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#discussion_r52951 ## File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala ## @@ -660,6 +660,22 @@ class BloomCoarseGrainIndexFunctionSuite sql(s"SELECT * FROM $normalTable WHERE salary='1040'")) } + test("test drop index when more than one bloom index exists") { +sql(s"CREATE TABLE $bloomSampleTable " + + "(id int,name string,salary int)STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id')") +sql(s"CREATE index index1 ON TABLE $bloomSampleTable(id) as 'bloomfilter' " + + "PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')") +sql(s"CREATE index index2 ON TABLE $bloomSampleTable (name) as 'bloomfilter' " + + "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')") +sql(s"insert into $bloomSampleTable values(1,'nihal',20)") +sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect() +checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1") Review comment: done ## File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala ## @@ -660,6 +660,22 @@ class BloomCoarseGrainIndexFunctionSuite sql(s"SELECT * FROM $normalTable WHERE salary='1040'")) } + test("test drop index when more than one bloom index exists") { +sql(s"CREATE TABLE $bloomSampleTable " + + "(id int,name string,salary int)STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id')") +sql(s"CREATE index index1 ON TABLE $bloomSampleTable(id) as 'bloomfilter' " + + "PROPERTIES ( 'BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')") +sql(s"CREATE index index2 ON TABLE $bloomSampleTable (name) as 'bloomfilter' " + + "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')") +sql(s"insert into $bloomSampleTable values(1,'nihal',20)") +sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect() 
+checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1") +checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index2") +sql(s"drop index index1 on $bloomSampleTable") +sql(s"show indexes on table $bloomSampleTable").show() Review comment: done
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists
nihal0107 commented on a change in pull request #4000: URL: https://github.com/apache/carbondata/pull/4000#discussion_r529518625 ## File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala ## @@ -184,10 +184,10 @@ private[sql] case class DropIndexCommand( parentCarbonTable = getRefreshedParentTable(sparkSession, dbName) val indexMetadata = parentCarbonTable.getIndexMetadata if (null != indexMetadata && null != indexMetadata.getIndexesMap) { - val hasCgFgIndexes = -!(indexMetadata.getIndexesMap.size() == 1 && - indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName)) - if (hasCgFgIndexes) { + val hasCgFgIndexes = indexMetadata.getIndexesMap.size() != 0 && Review comment: added.
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: Attachment: CarbonData Spatial Index Design Doc v2.docx
> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
> Attachments: CarbonData Spatial Index Design Doc v2.docx, Genex Cloud Carbon Spatial Index Specification.docx
> Time Spent: 2h
> Remaining Estimate: 0h
>
> The requirement is from SEQ; related algorithms are provided by group Discovery.
> 1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
>   timevalue BIGINT,
>   longitude LONG,
>   latitude LONG) COMMENT "This is a GeoTable"
> STORED AS carbondata
> TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
>   'SPATIAL_INDEX.mygeohash.type'='geohash',
>   'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
>   'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
>   'SPATIAL_INDEX.mygeohash.gridSize'='50',
>   'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs.
> Query filter UDFs:
> * InPolygonList (List polygonList, OperationType opType)
> * InPolylineList (List polylineList, Float bufferInMeter)
> * InPolygonRangeList (List RangeList, OperationType opType)
> The operation only supports:
> * "OR", meaning the union of two polygons
> * "AND", meaning the intersection of two polygons
> Geo util UDFs:
> * GeoIdToGridXy(Long geoId) : Pair
> * LatLngToGeoId(Long latitude, Long longitude) : Long
> * GeoIdToLatLng(Long geoId) : Pair
> * ToUpperLayerGeoId(Long geoId) : Long
> * ToRangeList (String polygon) : List
> 3. Currently GeoID is a column created internally for spatial tables; this PR will support the GeoID column being customized during LOAD/INSERT INTO. For example,
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below, where '855280799612' is generated internally:
> |mygeohash   |timevalue|longitude|latitude|
> |855280799612|157542840|116285807|40084087|
> but now it is:
> |mygeohash|timevalue|longitude|latitude|
> |0        |157542840|116285807|40084087|
> {code}
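To make the LatLngToGeoId/GeoIdToLatLng pairing concrete, here is a rough sketch of the general idea behind such grid encodings, interleaving the bits of two integer coordinates into a single Z-order value. This is an assumption-laden illustration, not the algorithm this PR actually implements (the real encoding also involves originLatitude, gridSize, and conversionRatio):

```python
# Illustrative Z-order (bit-interleaving) encoding of two integer
# coordinates into one id, and the inverse decoding. Not CarbonData code.
def lat_lng_to_geo_id(lat: int, lng: int, bits: int = 32) -> int:
    geo_id = 0
    for i in range(bits):
        geo_id |= ((lng >> i) & 1) << (2 * i)      # even bits <- longitude
        geo_id |= ((lat >> i) & 1) << (2 * i + 1)  # odd bits  <- latitude
    return geo_id

def geo_id_to_lat_lng(geo_id: int, bits: int = 32):
    lat = lng = 0
    for i in range(bits):
        lng |= ((geo_id >> (2 * i)) & 1) << i
        lat |= ((geo_id >> (2 * i + 1)) & 1) << i
    return lat, lng

# Round trip with the scaled sample coordinates from the example above.
gid = lat_lng_to_geo_id(40084087, 116285807)
print(geo_id_to_lat_lng(gid))  # (40084087, 116285807)
```

The useful property of such an encoding is that nearby grid cells tend to share id prefixes, which is what makes range-list filters like InPolygonRangeList effective as index predicates.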
[GitHub] [carbondata] nihal0107 opened a new pull request #4023: [WIP]added testcase for custom compaction for SI table
nihal0107 opened a new pull request #4023: URL: https://github.com/apache/carbondata/pull/4023 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4021: [WIP] Support Complex DataType in DataFrame Save
CarbonDataQA2 commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-732920382 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3130/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4021: [WIP] Support Complex DataType in DataFrame Save
CarbonDataQA2 commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-732917707 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4884/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
CarbonDataQA2 commented on pull request #4022: URL: https://github.com/apache/carbondata/pull/4022#issuecomment-732902364 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3132/
[GitHub] [carbondata] Karan980 opened a new pull request #4022: [CARBONDATA-4056] Added global sort for data files merge operation in SI segments.
Karan980 opened a new pull request #4022: URL: https://github.com/apache/carbondata/pull/4022 ### Why is this PR needed? Earlier, global sort was not supported during the data files merge operation of SI segments. So if an SI is created with global sort and carbon.si.segment.merge is true, the data files in the SI segments are merged but the globally sorted data becomes disordered. ### What changes were proposed in this PR? Added global sort for the data files merge operation in SI segments. ### Does this PR introduce any user interface change? - No ### Is any new testcase added? - Yes
[jira] [Created] (CARBONDATA-4056) Adding global sort support for SI segments data files merge operation.
Karan created CARBONDATA-4056: Summary: Adding global sort support for SI segments data files merge operation. Key: CARBONDATA-4056 URL: https://issues.apache.org/jira/browse/CARBONDATA-4056 Project: CarbonData Issue Type: New Feature Components: other Affects Versions: 2.0.0 Reporter: Karan Fix For: 2.0.1 Enabling the carbon property carbon.si.segment.merge helps reduce the number of carbondata files in SI segments. When an SI is created with sort scope as global sort and this property is enabled, the data in the SI segments must be globally sorted after the data files are merged.
[GitHub] [carbondata] jack86596 commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
jack86596 commented on a change in pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#discussion_r529470116 ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala ## @@ -472,10 +483,10 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll { def createPartitionTableAndLoad(): Unit = { sql( "CREATE TABLE rename_partition (empno int, empname String, designation String," + -" doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int," + -" deptname String," + -" projectjoindate Timestamp, projectenddate Timestamp,attendance int," + -" utilization int,salary int) PARTITIONED BY (projectcode int) STORED AS carbondata") + " doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int," + Review comment: Done.
[GitHub] [carbondata] jack86596 commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
jack86596 commented on a change in pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#discussion_r529470015 ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala ## @@ -98,7 +108,7 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll { // Bucket Column sql("DROP TABLE IF EXISTS rename_bucket") sql("CREATE TABLE rename_bucket (ID Int, date Timestamp, country String, name String)" + - " STORED AS carbondata TBLPROPERTIES ('BUCKET_NUMBER'='4', 'BUCKET_COLUMNS'='name')") +" STORED AS carbondata TBLPROPERTIES ('BUCKET_NUMBER'='4', 'BUCKET_COLUMNS'='name')") Review comment: Done. ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala ## @@ -81,7 +91,7 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll { // Non-Partition Column with Complex Datatype sql("DROP TABLE IF EXISTS rename_complextype") sql(s"create table rename_complextype(mapcol map," + - s" arraycol array) stored as carbondata") +s" arraycol array) stored as carbondata") Review comment: Done.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
CarbonDataQA2 commented on pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732839012 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4882/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
CarbonDataQA2 commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732837957 Build Failed with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3127/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
CarbonDataQA2 commented on pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732832812 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3128/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
CarbonDataQA2 commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732822942 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4881/
[GitHub] [carbondata] marchpure commented on pull request #4021: [WIP] Support Complex DataType in DataFrame Save
marchpure commented on pull request #4021: URL: https://github.com/apache/carbondata/pull/4021#issuecomment-732811550 retest this please
[GitHub] [carbondata] ajantha-bhat commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement
ajantha-bhat commented on pull request #4009: URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732804667 LGTM
[GitHub] [carbondata] marchpure commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
marchpure commented on a change in pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#discussion_r529413543 ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala ## @@ -98,7 +108,7 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll { // Bucket Column sql("DROP TABLE IF EXISTS rename_bucket") sql("CREATE TABLE rename_bucket (ID Int, date Timestamp, country String, name String)" + - " STORED AS carbondata TBLPROPERTIES ('BUCKET_NUMBER'='4', 'BUCKET_COLUMNS'='name')") +" STORED AS carbondata TBLPROPERTIES ('BUCKET_NUMBER'='4', 'BUCKET_COLUMNS'='name')") Review comment: please revert this change of format ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala ## @@ -81,7 +91,7 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll { // Non-Partition Column with Complex Datatype sql("DROP TABLE IF EXISTS rename_complextype") sql(s"create table rename_complextype(mapcol map," + - s" arraycol array) stored as carbondata") +s" arraycol array) stored as carbondata") Review comment: please revert this change of format ## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala ## @@ -472,10 +483,10 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll { def createPartitionTableAndLoad(): Unit = { sql( "CREATE TABLE rename_partition (empno int, empname String, designation String," + -" doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int," + -" deptname String," + -" projectjoindate Timestamp, projectenddate Timestamp,attendance int," + -" utilization int,salary int) PARTITIONED BY (projectcode int) STORED AS carbondata") + " doj Timestamp, workgroupcategory int, workgroupcategoryname String, deptno int," + 
Review comment: please revert this change of format
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and entry to table status when there is no update
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-732803344 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4878/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [CARBONDATA-4055]Fix creation of empty segment directory and entry to table status when there is no update
CarbonDataQA2 commented on pull request #4018: URL: https://github.com/apache/carbondata/pull/4018#issuecomment-732802293 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3125/
[GitHub] [carbondata] ajantha-bhat commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
ajantha-bhat commented on pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732801828 LGTM. Thanks for your contribution.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-732790935 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3124/
[GitHub] [carbondata] marchpure opened a new pull request #4021: [WIP] Support Complex DataType in DataFrame Save
marchpure opened a new pull request #4021: URL: https://github.com/apache/carbondata/pull/4021 ### Why is this PR needed? ### What changes were proposed in this PR? ### Does this PR introduce any user interface change? - No - Yes. (please explain the change and update document) ### Is any new testcase added? - No - Yes
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4005: [CARBONDATA-3978] Trash Folder support in carbondata
CarbonDataQA2 commented on pull request #4005: URL: https://github.com/apache/carbondata/pull/4005#issuecomment-732789701 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4877/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark
CarbonDataQA2 commented on pull request #4004: URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732771764 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3123/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark
CarbonDataQA2 commented on pull request #4004: URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732771366 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4876/
[GitHub] [carbondata] jack86596 commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
jack86596 commented on a change in pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#discussion_r529342790

## File path: integration/spark/src/main/scala/org/apache/spark/util/AlterTableUtil.scala

@@ -332,12 +332,26 @@ object AlterTableUtil {
      tableProperties: mutable.Map[String, String],
      oldColumnName: String,
      newColumnName: String): Unit = {
+    val columnProperties = Seq("NO_INVERTED_INDEX",
+      "INVERTED_INDEX",
+      "INDEX_COLUMNS",
+      "COLUMN_META_CACHE",
+      "DICTIONARY_INCLUDE",

Review comment: Removed "DICTIONARY_INCLUDE" and "DICTIONARY_EXCLUDE". All properties that refer to column names are now included in the list. "INDEX_COLUMNS" and "SPATIAL_INDEX" apply only to bloom index and spatial index tables, and column rename is blocked for those two kinds of tables, so it is fine to leave these two properties out of the list.
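The discussion above hinges on renaming a column only where it appears as a whole entry inside a comma-separated table property value; a naive substring replace misbehaves for a one-letter column name such as "a" (the CARBONDATA-4053 scenario), because the letter also occurs inside other column names. A language-agnostic sketch of the safer, whole-entry matching (Python here purely for illustration; the function and property names are assumptions, not CarbonData's actual Scala API in AlterTableUtil):

```python
def rename_column_in_properties(table_properties, column_properties,
                                old_name, new_name):
    """Rename a column inside comma-separated table property values.

    Only whole entries are matched, so renaming column 'a' leaves an
    unrelated entry like 'name' untouched even though it contains the
    letter 'a'. Illustrative sketch only, not CarbonData's real code.
    """
    for prop in column_properties:
        if prop in table_properties:
            entries = [e.strip() for e in table_properties[prop].split(",")]
            entries = [new_name if e.lower() == old_name.lower() else e
                       for e in entries]
            table_properties[prop] = ",".join(entries)
    return table_properties

props = {"COLUMN_META_CACHE": "a,name", "NO_INVERTED_INDEX": "name"}
rename_column_in_properties(
    props, ["COLUMN_META_CACHE", "NO_INVERTED_INDEX"], "a", "a1")
# props["COLUMN_META_CACHE"] -> "a1,name"; "name" is not corrupted.
```

A substring-based replace would instead turn "name" into "na1me", which is the kind of corruption whole-entry matching avoids.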
[GitHub] [carbondata] jack86596 commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"
jack86596 commented on a change in pull request #4019: URL: https://github.com/apache/carbondata/pull/4019#discussion_r529334930

## File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala

@@ -38,6 +38,15 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll {
     assert(null == carbonTable.getColumnByName("empname"))
   }

+  test("CARBONDATA-4053 test rename column when column name is a") {
+    sql("create table simple_table(a int) stored as carbondata")
+    sql("alter table simple_table change a a1 int")
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable("default", "simple_table")

Review comment: Done. Modified the test case; it now covers the scenario you mentioned.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction
CarbonDataQA2 commented on pull request #4020: URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732761517 Build Failed with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4879/
[jira] [Created] (CARBONDATA-4055) Empty segment created and unnecessary entry to table status in update
Akash R Nilugal created CARBONDATA-4055:
---
Summary: Empty segment created and unnecessary entry to table status in update
Key: CARBONDATA-4055
URL: https://issues.apache.org/jira/browse/CARBONDATA-4055
Project: CarbonData
Issue Type: Bug
Reporter: Akash R Nilugal
Assignee: Akash R Nilugal

When an update command is executed and no data is actually updated, empty segment directories are created and a stale in-progress entry is added to the table status; these segment directories are not removed even by the clean files operation.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
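The bug described in CARBONDATA-4055 comes down to a missing guard: the update path should create a segment directory and a table-status entry only after it knows at least one row was actually updated. A minimal sketch of such a guard (Python for illustration; the function names and callback shape are assumptions, not CarbonData's real update flow):

```python
def commit_update(updated_row_count, create_segment, add_status_entry):
    """Commit an update only if it touched at least one row.

    Without this guard, an update matching zero rows leaves behind an
    empty segment directory and a stale in-progress table-status entry.
    Illustrative sketch; not CarbonData's actual implementation.
    """
    if updated_row_count == 0:
        return None  # no-op update: create nothing, record nothing
    segment = create_segment()      # only now allocate a segment dir
    add_status_entry(segment)       # and register it in table status
    return segment
```

With this ordering, a no-op update exits before any filesystem or metadata side effect occurs, so there is nothing for a later clean-files pass to miss.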
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segment
CarbonDataQA2 commented on pull request #4009: URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732732764 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4875/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segment
CarbonDataQA2 commented on pull request #4009: URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732731912 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3122/