[jira] [Updated] (CARBONDATA-4347) Improve performance when delete empty partition directory
[ https://issues.apache.org/jira/browse/CARBONDATA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4347:
---
Description:
We printed a stack trace while deleting multiple segments and got:
!https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2022/6/6/s00494122/40313f6afbd24bdcbb2239bdaf054272/image.png!
When deleting multiple segments, _deletePhysicalPartition_ is called for each segment, and _deleteEmptyPartitionFolders_ is called for each carbonindex file. But location.getParent (the partition directory) may be the same for two segments, so the same work is repeated.
{code:java}
for (Map.Entry<String, List<String>> entry : locationMap.entrySet()) {
  if (partitionSpecs != null) {
    Path location = new Path(entry.getKey());
    boolean exists = pathExistsInPartitionSpec(partitionSpecs, location);
    if (!exists) {
      FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(location.toString()));
      for (String carbonDataFile : entry.getValue()) {
        FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(carbonDataFile));
      }
    }
    CarbonFile path = FileFactory.getCarbonFile(location.getParent().toString());
    deleteEmptyPartitionFolders(path);
  }
}
{code}
The fix is to collect all partition directories related to the deleted segments, then delete the empty directories once, after all segments have been deleted.

> Improve performance when delete empty partition directory
> -
>
> Key: CARBONDATA-4347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4347
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jiayu Shen
> Priority: Major
> Fix For: 2.3.1
>

-- This message was sent by Atlassian Jira (v8.20.10#820010)
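The proposed de-duplication can be sketched as below. `collectParentDirs` is a hypothetical helper, not the actual patch (the real fix works on CarbonData's `locationMap` and `FileFactory`); it only illustrates collecting the distinct parent partition directories so that `deleteEmptyPartitionFolders` runs once per directory instead of once per carbonindex file.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionCleanup {

    // Collect the distinct parent (partition) directories of all index-file
    // locations across the deleted segments. The caller can then invoke
    // deleteEmptyPartitionFolders once per entry of the returned set, after
    // all segments have been deleted.
    public static Set<String> collectParentDirs(List<String> indexFileLocations) {
        Set<String> parents = new LinkedHashSet<>();
        for (String location : indexFileLocations) {
            int slash = location.lastIndexOf('/');
            if (slash > 0) {
                parents.add(location.substring(0, slash));
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Two segments share the partition directory /tbl/p=1,
        // so only two parent directories are visited, not three.
        List<String> locations = Arrays.asList(
            "/tbl/p=1/seg_0", "/tbl/p=1/seg_1", "/tbl/p=2/seg_0");
        System.out.println(collectParentDirs(locations));
    }
}
```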
[jira] [Updated] (CARBONDATA-4347) Improve performance when delete empty partition directory
[ https://issues.apache.org/jira/browse/CARBONDATA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4347: --- Fix Version/s: 2.3.1
[jira] [Created] (CARBONDATA-4347) Improve performance when delete empty partition directory
Jiayu Shen created CARBONDATA-4347: -- Summary: Improve performance when delete empty partition directory Key: CARBONDATA-4347 URL: https://issues.apache.org/jira/browse/CARBONDATA-4347 Project: CarbonData Issue Type: Improvement Reporter: Jiayu Shen
[jira] [Created] (CARBONDATA-4251) optimize clean index file performance
Jiayu Shen created CARBONDATA-4251: -- Summary: optimize clean index file performance Key: CARBONDATA-4251 URL: https://issues.apache.org/jira/browse/CARBONDATA-4251 Project: CarbonData Issue Type: Improvement Components: core Affects Versions: 2.2.0 Reporter: Jiayu Shen Fix For: 2.2.1
When the clean file operation cleans up data, it tries to delete every carbonindex and carbonmergeindex file that ever existed, even though many carbonindex files have already been deleted after being merged into a carbonmergeindex. Considering that tens of thousands of carbonindex files may have existed after compaction completes, the clean file command can take several hours. We only need to clean up the files that currently exist, whether carbonmergeindex or carbonindex files.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
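The intended optimization can be sketched with a hypothetical helper (names and types are illustrative, not the actual CarbonData code): instead of issuing a delete for every index file that ever existed, restrict the clean-up to files actually present on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CleanIndexFiles {

    // historicalIndexFiles: every carbonindex/carbonmergeindex name recorded
    // in segment metadata, including files long since merged away.
    // filesOnDisk: the names actually present in the segment directory.
    // Only the intersection needs a delete call.
    public static List<String> filesToDelete(List<String> historicalIndexFiles,
                                             Set<String> filesOnDisk) {
        List<String> result = new ArrayList<>();
        for (String file : historicalIndexFiles) {
            if (filesOnDisk.contains(file)) {
                result.add(file);
            }
        }
        return result;
    }
}
```

With tens of thousands of historical entries but only a handful of surviving files, this turns most would-be remote delete calls into a cheap in-memory lookup.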
[jira] [Updated] (CARBONDATA-4176) Fail to obtain AK/SK when creating a table on S3/OBS
[ https://issues.apache.org/jira/browse/CARBONDATA-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4176: --- Issue Type: Bug (was: New Feature) Priority: Minor (was: Major)
[jira] [Created] (CARBONDATA-4176) Fail to obtain AK/SK when creating a table on S3/OBS
Jiayu Shen created CARBONDATA-4176: -- Summary: Fail to obtain AK/SK when creating a table on S3/OBS Key: CARBONDATA-4176 URL: https://issues.apache.org/jira/browse/CARBONDATA-4176 Project: CarbonData Issue Type: New Feature Reporter: Jiayu Shen
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; the related algorithms are provided by the Discovery Team.
1. Replace the geohash encoding algorithm and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
Query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolylineList (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
The operation type only supports:
* *"OR", meaning the union of two polygons*
* *"AND", meaning the intersection of two polygons*
Geo util UDFs:
* _*GeoIdToGridXy(Long geoId) : Pair*_
* _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
* _*GeoIdToLatLng(Long geoId) : Pair*_
* _*ToUpperLayerGeoId(Long geoId) : Long*_
* _*ToRangeList (String polygon) : List*_
3. Currently GeoID is a column created internally for spatial tables; this PR will allow the GeoID column to be customized during LOAD/INSERT INTO. For example,
{code:java}
INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;

It used to be as below; '855280799612' is generated internally:
|mygeohash   |timevalue|longitude|latitude|
|855280799612|157542840|116285807|40084087|
but now is:
|mygeohash|timevalue|longitude|latitude|
|0        |157542840|116285807|40084087|
{code}

> Geo spatial index algorithm improvement and UDFs enhancement
> -
>
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
> Attachments: CarbonData Spatial Index Design Doc v2.docx
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
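For illustration only (this is not CarbonData's actual encoding algorithm): a geohash-style GeoID is typically built by interleaving the bits of the two grid coordinates into one integer, which is what makes UDF pairs such as LatLngToGeoId/GeoIdToLatLng invertible and lets ToUpperLayerGeoId coarsen a cell by dropping low-order bits.

```java
public class GeoIdDemo {

    // Interleave the low 32 bits of gridX and gridY into a 64-bit GeoID:
    // bit i of gridX goes to bit 2i, bit i of gridY goes to bit 2i+1.
    public static long interleave(long gridX, long gridY) {
        long id = 0;
        for (int i = 0; i < 32; i++) {
            id |= ((gridX >> i) & 1L) << (2 * i);
            id |= ((gridY >> i) & 1L) << (2 * i + 1);
        }
        return id;
    }

    // Inverse operation: recover {gridX, gridY} from a GeoID.
    public static long[] deinterleave(long geoId) {
        long x = 0, y = 0;
        for (int i = 0; i < 32; i++) {
            x |= ((geoId >> (2 * i)) & 1L) << i;
            y |= ((geoId >> (2 * i + 1)) & 1L) << i;
        }
        return new long[]{x, y};
    }

    public static void main(String[] args) {
        // Round-trip using the longitude/latitude grid values from the
        // INSERT example above (treated here as raw grid coordinates).
        long id = interleave(116285807L, 40084087L);
        long[] xy = deinterleave(id);
        System.out.println(id + " -> " + xy[0] + ", " + xy[1]);
    }
}
```

Encoding this way also gives the locality property spatial indexes rely on: nearby cells share high-order GeoID bits, so a polygon can be covered by a small list of GeoID ranges, as in InPolygonRangeList.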
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: (was: Genex Cloud Carbon Spatial Index Specification.docx)
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: (was: CarbonData Spatial Index Design Doc v2.docx)
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: CarbonData Spatial Index Design Doc v2.docx
[jira] [Created] (CARBONDATA-4057) Support Complex DataType when Save DataFrame
Jiayu Shen created CARBONDATA-4057: -- Summary: Support Complex DataType when Save DataFrame Key: CARBONDATA-4057 URL: https://issues.apache.org/jira/browse/CARBONDATA-4057 Project: CarbonData Issue Type: New Feature Reporter: Jiayu Shen Currently, complex data types are not supported when triggering df.mode(overwrite).save; this should be supported.
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: CarbonData Spatial Index Design Doc v2.docx
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: Genex Cloud Carbon Spatial Index Specification.docx
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolylineList (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

was:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

was:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

was:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
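The filter UDFs listed above take a list of shapes plus an operation type restricted to "OR" and "AND". A generic way to realize such a filter is a point-in-polygon test combined across the list: "OR" keeps a row inside any polygon, "AND" only inside all of them. The sketch below uses standard ray casting; the class and method names are invented for illustration and are not CarbonData's InPolygonList implementation.

```java
import java.util.List;

// Hypothetical sketch of an InPolygonList-style filter: membership per polygon,
// then combined with the "OR"/"AND" operation type.
public class PolygonFilterSketch {
    // Ray-casting point-in-polygon test; poly holds vertices as [x0,y0, x1,y1, ...].
    static boolean inPolygon(double px, double py, double[] poly) {
        boolean inside = false;
        int n = poly.length / 2;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            double xi = poly[2 * i], yi = poly[2 * i + 1];
            double xj = poly[2 * j], yj = poly[2 * j + 1];
            // Edge crosses the horizontal ray from (px, py)? Toggle on each crossing.
            if ((yi > py) != (yj > py)
                    && px < (xj - xi) * (py - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    // "OR": row qualifies if inside any polygon; "AND": only if inside every polygon.
    static boolean inPolygonList(double px, double py, List<double[]> polys, String opType) {
        boolean acc = opType.equals("AND");
        for (double[] poly : polys) {
            boolean in = inPolygon(px, py, poly);
            acc = opType.equals("AND") ? (acc && in) : (acc || in);
        }
        return acc;
    }

    public static void main(String[] args) {
        double[] unitSquare = {0, 0, 1, 0, 1, 1, 0, 1};
        System.out.println(inPolygon(0.5, 0.25, unitSquare));
    }
}
```

In a real engine the per-polygon test would typically be preceded by a coarse GeoID-range prune, so the exact point test only runs on candidate rows.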
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

was:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

was:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo spatial UDFs:
{code:java}
// query filter UDFs
{code}

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Created] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
Jiayu Shen created CARBONDATA-4051:
---
Summary: Geo spatial index algorithm improvement and UDFs enhancement
Key: CARBONDATA-4051
URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
Project: CarbonData
Issue Type: New Feature
Reporter: Jiayu Shen

This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo spatial UDFs:
{code:java}
// query filter UDFs
{code}
[jira] [Resolved] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen resolved CARBONDATA-4034.
Resolution: Resolved

> Improve the time-consuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
> Time Spent: 17h 10m
> Remaining Estimate: 0h
>
> In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
> {code:java}
> 2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
> 2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
> 2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
> 2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].{code}
> In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.
[jira] [Updated] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4034:
---
Description:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
{code:java}
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].{code}
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

was:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

> Improve the time-consuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
> Time Spent: 17h 10m
> Remaining Estimate: 0h
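The issue attributes the long gap in the log to per-block repetition inside _performDeleteDeltaCompaction_, but does not show the fix itself. One generic way to remove that kind of repetition is to list the delete-delta files once and bucket them by block up front, so each block's deltas are looked up from a map instead of re-scanned. The sketch below illustrates only that grouping idea; the class name, method name, and file-name shape are assumptions, not CarbonData code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: bucket delete-delta file names by block in one pass.
public class DeltaGroupingSketch {
    // Assumed name shape for illustration: "<blockName>-<timestamp>.deletedelta".
    // A single pass builds block -> [delta files], replacing a per-block re-scan
    // of the whole listing (O(blocks * files)) with one O(files) grouping.
    static Map<String, List<String>> groupByBlock(List<String> deltaFiles) {
        Map<String, List<String>> byBlock = new HashMap<>();
        for (String file : deltaFiles) {
            String block = file.substring(0, file.lastIndexOf('-'));
            byBlock.computeIfAbsent(block, k -> new ArrayList<>()).add(file);
        }
        return byBlock;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byBlock = groupByBlock(Arrays.asList(
                "blk1-100.deletedelta", "blk1-200.deletedelta", "blk2-100.deletedelta"));
        System.out.println(byBlock);
    }
}
```

With many segments and blocks, collapsing repeated directory scans into one grouping pass is exactly the kind of change that turns a minutes-long gap like the one in the log into seconds.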
[jira] [Updated] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4034:
---
Summary: Improve the time-consuming of Horizontal Compaction for update (was: Improve the time-comsuming of Horizontal Compaction for update)

> Improve the time-consuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Updated] (CARBONDATA-4034) Improve the time-comsuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4034:
---
Description:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

was:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

> Improve the time-comsuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Created] (CARBONDATA-4034) Improve the time consumption of Horizontal Compaction for update
Jiayu Shen created CARBONDATA-4034: -- Summary: Improve the time consumption of Horizontal Compaction for update
Key: CARBONDATA-4034
URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
Project: CarbonData
Issue Type: Bug
Reporter: Jiayu Shen
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). The time cost of one such case is shown in the log below.
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the interval between the second and third log lines (the gap before Horizontal Delete Compaction starts), by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.
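The issue does not show the actual change made to _performDeleteDeltaCompaction_. As a hedged illustration only, the usual fix for this kind of per-block slowdown is to stop repeating metadata work for every block and instead collect everything in a single pass, then process each group once. The sketch below shows that grouping pattern; the class, record, and method names are invented for illustration and are not CarbonData APIs.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch (not CarbonData's actual code): instead of scanning
// segment metadata once per block, build a block -> delete-delta-files map
// in a single pass, then compact each group exactly once.
public class DeleteDeltaGrouping {
    // One delete-delta file, identified by the block it belongs to.
    record DeltaFile(String blockId, String path) {}

    // Single pass over all delta files: group every file under its block id.
    static Map<String, List<String>> groupByBlock(List<DeltaFile> allDeltaFiles) {
        return allDeltaFiles.stream()
            .collect(Collectors.groupingBy(
                DeltaFile::blockId,
                Collectors.mapping(DeltaFile::path, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<DeltaFile> files = List.of(
            new DeltaFile("part-0-1", "1.deletedelta"),
            new DeltaFile("part-0-1", "2.deletedelta"),
            new DeltaFile("part-0-2", "3.deletedelta"));
        Map<String, List<String>> grouped = groupByBlock(files);
        // Each block is now visited once, regardless of how many delta files it has.
        System.out.println(grouped.get("part-0-1").size()); // 2
        System.out.println(grouped.size()); // 2
    }
}
```

With many segments, this turns O(blocks) repeated listings into one pass plus per-group work, which matches the kind of gap shown between the second and third log lines above.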