[jira] [Updated] (CARBONDATA-4347) Improve performance when delete empty partition directory
[ https://issues.apache.org/jira/browse/CARBONDATA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4347:
---
Description:
We printed a stack trace while deleting multiple segments and got:
!https://clouddevops.huawei.com/vision-file-storage/api/file/download/upload-v2/2022/6/6/s00494122/40313f6afbd24bdcbb2239bdaf054272/image.png!
When deleting multiple segments, _deletePhysicalPartition_ is called for each segment, and _deleteEmptyPartitionFolders_ is called for each carbonindex file. But location.getParent (the partition directory) may be the same for two segments, so the same work is repeated.
{code:java}
for (Map.Entry<String, List<String>> entry : locationMap.entrySet()) {
  if (partitionSpecs != null) {
    Path location = new Path(entry.getKey());
    boolean exists = pathExistsInPartitionSpec(partitionSpecs, location);
    if (!exists) {
      FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(location.toString()));
      for (String carbonDataFile : entry.getValue()) {
        FileFactory.deleteAllCarbonFilesOfDir(FileFactory.getCarbonFile(carbonDataFile));
      }
    }
    CarbonFile path = FileFactory.getCarbonFile(location.getParent().toString());
    deleteEmptyPartitionFolders(path);
  }
}
{code}
The fix is to collect all partition directories related to the deleted segments, then delete the empty directories once, after all segments have been deleted.

> Improve performance when delete empty partition directory
> -
>
> Key: CARBONDATA-4347
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4347
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Jiayu Shen
> Priority: Major
> Fix For: 2.3.1
>

-- This message was sent by Atlassian Jira (v8.20.10#820010)
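The proposed de-duplication can be sketched as below. `collectParentDirs` is a hypothetical helper, not the actual patch (the real fix works on CarbonData's `locationMap` and `FileFactory`); it only illustrates collecting the distinct parent partition directories so that `deleteEmptyPartitionFolders` runs once per directory instead of once per carbonindex file.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PartitionCleanup {

    // Collect the distinct parent (partition) directories of all index-file
    // locations across the deleted segments. The caller can then invoke
    // deleteEmptyPartitionFolders once per entry of the returned set, after
    // all segments have been deleted.
    public static Set<String> collectParentDirs(List<String> indexFileLocations) {
        Set<String> parents = new LinkedHashSet<>();
        for (String location : indexFileLocations) {
            int slash = location.lastIndexOf('/');
            if (slash > 0) {
                parents.add(location.substring(0, slash));
            }
        }
        return parents;
    }

    public static void main(String[] args) {
        // Two segments share the partition directory /tbl/p=1,
        // so only two parent directories are visited, not three.
        List<String> locations = Arrays.asList(
            "/tbl/p=1/seg_0", "/tbl/p=1/seg_1", "/tbl/p=2/seg_0");
        System.out.println(collectParentDirs(locations));
    }
}
```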
[jira] [Updated] (CARBONDATA-4347) Improve performance when delete empty partition directory
[ https://issues.apache.org/jira/browse/CARBONDATA-4347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4347: --- Fix Version/s: 2.3.1
[jira] [Created] (CARBONDATA-4347) Improve performance when delete empty partition directory
Jiayu Shen created CARBONDATA-4347: -- Summary: Improve performance when delete empty partition directory Key: CARBONDATA-4347 URL: https://issues.apache.org/jira/browse/CARBONDATA-4347 Project: CarbonData Issue Type: Improvement Reporter: Jiayu Shen
[jira] [Created] (CARBONDATA-4251) optimize clean index file performance
Jiayu Shen created CARBONDATA-4251: -- Summary: optimize clean index file performance Key: CARBONDATA-4251 URL: https://issues.apache.org/jira/browse/CARBONDATA-4251 Project: CarbonData Issue Type: Improvement Components: core Affects Versions: 2.2.0 Reporter: Jiayu Shen Fix For: 2.2.1
When the clean file operation cleans up data, it tries to delete every carbonindex and carbonmergeindex file that ever existed, even though many carbonindex files have already been deleted after being merged into a carbonmergeindex. Considering that tens of thousands of carbonindex files may have existed after compaction completes, the clean file command can take several hours. We only need to clean up the files that currently exist, whether carbonmergeindex or carbonindex files.
-- This message was sent by Atlassian Jira (v8.3.4#803005)
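The intended optimization can be sketched with a hypothetical helper (names and types are illustrative, not the actual CarbonData code): instead of issuing a delete for every index file that ever existed, restrict the clean-up to files actually present on disk.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class CleanIndexFiles {

    // historicalIndexFiles: every carbonindex/carbonmergeindex name recorded
    // in segment metadata, including files long since merged away.
    // filesOnDisk: the names actually present in the segment directory.
    // Only the intersection needs a delete call.
    public static List<String> filesToDelete(List<String> historicalIndexFiles,
                                             Set<String> filesOnDisk) {
        List<String> result = new ArrayList<>();
        for (String file : historicalIndexFiles) {
            if (filesOnDisk.contains(file)) {
                result.add(file);
            }
        }
        return result;
    }
}
```

With tens of thousands of historical entries but only a handful of surviving files, this turns most would-be remote delete calls into a cheap in-memory lookup.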
[jira] [Updated] (CARBONDATA-4176) Fail to obtain AK/SK when creating a table on S3/OBS
[ https://issues.apache.org/jira/browse/CARBONDATA-4176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4176: --- Issue Type: Bug (was: New Feature) Priority: Minor (was: Major)
[jira] [Created] (CARBONDATA-4176) Fail to obtain AK/SK when creating a table on S3/OBS
Jiayu Shen created CARBONDATA-4176: -- Summary: Fail to obtain AK/SK when creating a table on S3/OBS Key: CARBONDATA-4176 URL: https://issues.apache.org/jira/browse/CARBONDATA-4176 Project: CarbonData Issue Type: New Feature Reporter: Jiayu Shen
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; the related algorithms are provided by the Discovery Team.
1. Replace the geohash encoding algorithm and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
Query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolylineList (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
The operation type only supports:
* *"OR", meaning the union of two polygons*
* *"AND", meaning the intersection of two polygons*
Geo util UDFs:
* _*GeoIdToGridXy(Long geoId) : Pair*_
* _*LatLngToGeoId(Long latitude, Long longitude) : Long*_
* _*GeoIdToLatLng(Long geoId) : Pair*_
* _*ToUpperLayerGeoId(Long geoId) : Long*_
* _*ToRangeList (String polygon) : List*_
3. Currently GeoID is a column created internally for spatial tables; this PR will allow the GeoID column to be customized during LOAD/INSERT INTO. For example,
{code:java}
INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;

It used to be as below; '855280799612' is generated internally:
|mygeohash   |timevalue|longitude|latitude|
|855280799612|157542840|116285807|40084087|
but now is:
|mygeohash|timevalue|longitude|latitude|
|0        |157542840|116285807|40084087|
{code}

> Geo spatial index algorithm improvement and UDFs enhancement
> -
>
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
> Attachments: CarbonData Spatial Index Design Doc v2.docx
>
> Time Spent: 4h 20m
> Remaining Estimate: 0h
>

-- This message was sent by Atlassian Jira (v8.3.4#803005)
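For illustration only (this is not CarbonData's actual encoding algorithm): a geohash-style GeoID is typically built by interleaving the bits of the two grid coordinates into one integer, which is what makes UDF pairs such as LatLngToGeoId/GeoIdToLatLng invertible and lets ToUpperLayerGeoId coarsen a cell by dropping low-order bits.

```java
public class GeoIdDemo {

    // Interleave the low 32 bits of gridX and gridY into a 64-bit GeoID:
    // bit i of gridX goes to bit 2i, bit i of gridY goes to bit 2i+1.
    public static long interleave(long gridX, long gridY) {
        long id = 0;
        for (int i = 0; i < 32; i++) {
            id |= ((gridX >> i) & 1L) << (2 * i);
            id |= ((gridY >> i) & 1L) << (2 * i + 1);
        }
        return id;
    }

    // Inverse operation: recover {gridX, gridY} from a GeoID.
    public static long[] deinterleave(long geoId) {
        long x = 0, y = 0;
        for (int i = 0; i < 32; i++) {
            x |= ((geoId >> (2 * i)) & 1L) << i;
            y |= ((geoId >> (2 * i + 1)) & 1L) << i;
        }
        return new long[]{x, y};
    }

    public static void main(String[] args) {
        // Round-trip using the longitude/latitude grid values from the
        // INSERT example above (treated here as raw grid coordinates).
        long id = interleave(116285807L, 40084087L);
        long[] xy = deinterleave(id);
        System.out.println(id + " -> " + xy[0] + ", " + xy[1]);
    }
}
```

Encoding this way also gives the locality property spatial indexes rely on: nearby cells share high-order GeoID bits, so a polygon can be covered by a small list of GeoID ranges, as in InPolygonRangeList.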
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: (was: Genex Cloud Carbon Spatial Index Specification.docx)
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: (was: CarbonData Spatial Index Design Doc v2.docx)
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: CarbonData Spatial Index Design Doc v2.docx
[jira] [Created] (CARBONDATA-4057) Support Complex DataType when Save DataFrame
Jiayu Shen created CARBONDATA-4057: -- Summary: Support Complex DataType when Save DataFrame Key: CARBONDATA-4057 URL: https://issues.apache.org/jira/browse/CARBONDATA-4057 Project: CarbonData Issue Type: New Feature Reporter: Jiayu Shen Currently, complex data types are not supported when triggering df.mode(overwrite).save; this should be supported.
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: CarbonData Spatial Index Design Doc v2.docx
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051: --- Attachment: Genex Cloud Carbon Spatial Index Specification.docx
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolylineList (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

was:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor

-- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

was:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
The requirement is from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add geo query UDFs.
query filter UDFs:
* _*InPolygonList (List polygonList, OperationType opType)*_
* _*InPolyline (List polylineList, Float bufferInMeter)*_
* _*InPolygonRangeList (List RangeList, OperationType opType)*_
*The operation type only supports "OR" and "AND".*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing the GeoID column during LOAD/INSERT INTO.

was:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
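The filter UDFs listed above take a list of shapes plus an operation type restricted to "OR" and "AND". A generic way to realize such a filter is a point-in-polygon test combined across the list: "OR" keeps a row inside any polygon, "AND" only inside all of them. The sketch below uses standard ray casting; the class and method names are invented for illustration and are not CarbonData's InPolygonList implementation.

```java
import java.util.List;

// Hypothetical sketch of an InPolygonList-style filter: membership per polygon,
// then combined with the "OR"/"AND" operation type.
public class PolygonFilterSketch {
    // Ray-casting point-in-polygon test; poly holds vertices as [x0,y0, x1,y1, ...].
    static boolean inPolygon(double px, double py, double[] poly) {
        boolean inside = false;
        int n = poly.length / 2;
        for (int i = 0, j = n - 1; i < n; j = i++) {
            double xi = poly[2 * i], yi = poly[2 * i + 1];
            double xj = poly[2 * j], yj = poly[2 * j + 1];
            // Edge crosses the horizontal ray from (px, py)? Toggle on each crossing.
            if ((yi > py) != (yj > py)
                    && px < (xj - xi) * (py - yi) / (yj - yi) + xi) {
                inside = !inside;
            }
        }
        return inside;
    }

    // "OR": row qualifies if inside any polygon; "AND": only if inside every polygon.
    static boolean inPolygonList(double px, double py, List<double[]> polys, String opType) {
        boolean acc = opType.equals("AND");
        for (double[] poly : polys) {
            boolean in = inPolygon(px, py, poly);
            acc = opType.equals("AND") ? (acc && in) : (acc || in);
        }
        return acc;
    }

    public static void main(String[] args) {
        double[] unitSquare = {0, 0, 1, 0, 1, 1, 0, 1};
        System.out.println(inPolygon(0.5, 0.25, unitSquare));
    }
}
```

In a real engine the per-polygon test would typically be preceded by a coarse GeoID-range prune, so the exact point test only runs on candidate rows.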
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

was:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
[ https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4051:
---
Description:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo UDFs.
query filter UDFs:
* *InPolygonList (List polygonList, OperationType opType)*
* *InPolyline (List polylineList, Float bufferInMeter)*
* *InPolygonRangeList (List RangeList)*
geo util UDFs:
* *GeoIdToGridXy(Long geoId) : Pair*
* *LatLngToGeoId(Long latitude, Long longitude) : Long*
* *GeoIdToLatLng(Long geoId) : Pair*
* *ToUpperLayerGeoId(Long geoId) : Long*
* *ToRangeList (String polygon) : List*
3. Currently GeoID is a column created internally for spatial tables; this PR will support customizing GeoID during LOAD/INSERT INTO.

was:
This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo spatial UDFs:
{code:java}
// query filter UDFs
{code}

> Geo spatial index algorithm improvement and UDFs enhancement
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
> Issue Type: New Feature
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Created] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement
Jiayu Shen created CARBONDATA-4051:
---
Summary: Geo spatial index algorithm improvement and UDFs enhancement
Key: CARBONDATA-4051
URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
Project: CarbonData
Issue Type: New Feature
Reporter: Jiayu Shen

This is a requirement from SEQ; related algorithms are provided by group Discovery.
1. Replace the geohash encoding algorithm, and reduce the required properties of CREATE TABLE. For example,
{code:java}
CREATE TABLE geoTable(
  timevalue BIGINT,
  longitude LONG,
  latitude LONG) COMMENT "This is a GeoTable"
STORED AS carbondata
TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
  'SPATIAL_INDEX.mygeohash.type'='geohash',
  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
  'SPATIAL_INDEX.mygeohash.gridSize'='50',
  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
2. Add some geo spatial UDFs:
{code:java}
// query filter UDFs
{code}
[jira] [Resolved] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen resolved CARBONDATA-4034.
Resolution: Resolved

> Improve the time-consuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
> Time Spent: 17h 10m
> Remaining Estimate: 0h
>
> In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
> {code:java}
> 2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
> 2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
> 2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
> 2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].{code}
> In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.
[jira] [Updated] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4034:
---
Description:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
{code:java}
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].{code}
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

was:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

> Improve the time-consuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
> Time Spent: 17h 10m
> Remaining Estimate: 0h
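The issue attributes the long gap in the log to per-block repetition inside _performDeleteDeltaCompaction_, but does not show the fix itself. One generic way to remove that kind of repetition is to list the delete-delta files once and bucket them by block up front, so each block's deltas are looked up from a map instead of re-scanned. The sketch below illustrates only that grouping idea; the class name, method name, and file-name shape are assumptions, not CarbonData code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: bucket delete-delta file names by block in one pass.
public class DeltaGroupingSketch {
    // Assumed name shape for illustration: "<blockName>-<timestamp>.deletedelta".
    // A single pass builds block -> [delta files], replacing a per-block re-scan
    // of the whole listing (O(blocks * files)) with one O(files) grouping.
    static Map<String, List<String>> groupByBlock(List<String> deltaFiles) {
        Map<String, List<String>> byBlock = new HashMap<>();
        for (String file : deltaFiles) {
            String block = file.substring(0, file.lastIndexOf('-'));
            byBlock.computeIfAbsent(block, k -> new ArrayList<>()).add(file);
        }
        return byBlock;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byBlock = groupByBlock(Arrays.asList(
                "blk1-100.deletedelta", "blk1-200.deletedelta", "blk2-100.deletedelta"));
        System.out.println(byBlock);
    }
}
```

With many segments and blocks, collapsing repeated directory scans into one grouping pass is exactly the kind of change that turns a minutes-long gap like the one in the log into seconds.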
[jira] [Updated] (CARBONDATA-4034) Improve the time-consuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4034:
---
Summary: Improve the time-consuming of Horizontal Compaction for update (was: Improve the time-comsuming of Horizontal Compaction for update)

> Improve the time-consuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Updated] (CARBONDATA-4034) Improve the time-comsuming of Horizontal Compaction for update
[ https://issues.apache.org/jira/browse/CARBONDATA-4034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jiayu Shen updated CARBONDATA-4034:
---
Description:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

was:
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). Here is a case whose cost is shown in the log:
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the gap between the second and third lines of the log by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.

> Improve the time-comsuming of Horizontal Compaction for update
> Key: CARBONDATA-4034
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
> Project: CarbonData
> Issue Type: Bug
> Reporter: Jiayu Shen
> Priority: Minor
[jira] [Created] (CARBONDATA-4034) Improve the time consumption of Horizontal Compaction for update
Jiayu Shen created CARBONDATA-4034: -- Summary: Improve the time consumption of Horizontal Compaction for update
Key: CARBONDATA-4034
URL: https://issues.apache.org/jira/browse/CARBONDATA-4034
Project: CarbonData
Issue Type: Bug
Reporter: Jiayu Shen
In the update flow, horizontal compaction becomes significantly slower when an update involves many segments (or many blocks). The time cost of one such case is shown in the log below.
2020-10-10 09:38:10,466 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 09:50:25,718 | INFO | [OperationManager-Background-Pool-28] | Horizontal Update Compaction operation completed for [ods_oms.oms_wh_outbound_order].
2020-10-10 10:15:44,302 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation started for [ods_oms.oms_wh_outbound_order]
2020-10-10 10:15:54,874 | INFO | [OperationManager-Background-Pool-28] | Horizontal Delete Compaction operation completed for [ods_oms.oms_wh_outbound_order].
In this PR, we optimize the interval between the second and third log lines (the gap before Horizontal Delete Compaction starts), by optimizing the method _performDeleteDeltaCompaction_ in the horizontal compaction flow.
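The issue does not show the actual change made to _performDeleteDeltaCompaction_. As a hedged illustration only, the usual fix for this kind of per-block slowdown is to stop repeating metadata work for every block and instead collect everything in a single pass, then process each group once. The sketch below shows that grouping pattern; the class, record, and method names are invented for illustration and are not CarbonData APIs.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical sketch (not CarbonData's actual code): instead of scanning
// segment metadata once per block, build a block -> delete-delta-files map
// in a single pass, then compact each group exactly once.
public class DeleteDeltaGrouping {
    // One delete-delta file, identified by the block it belongs to.
    record DeltaFile(String blockId, String path) {}

    // Single pass over all delta files: group every file under its block id.
    static Map<String, List<String>> groupByBlock(List<DeltaFile> allDeltaFiles) {
        return allDeltaFiles.stream()
            .collect(Collectors.groupingBy(
                DeltaFile::blockId,
                Collectors.mapping(DeltaFile::path, Collectors.toList())));
    }

    public static void main(String[] args) {
        List<DeltaFile> files = List.of(
            new DeltaFile("part-0-1", "1.deletedelta"),
            new DeltaFile("part-0-1", "2.deletedelta"),
            new DeltaFile("part-0-2", "3.deletedelta"));
        Map<String, List<String>> grouped = groupByBlock(files);
        // Each block is now visited once, regardless of how many delta files it has.
        System.out.println(grouped.get("part-0-1").size()); // 2
        System.out.println(grouped.size()); // 2
    }
}
```

With many segments, this turns O(blocks) repeated listings into one pass plus per-group work, which matches the kind of gap shown between the second and third log lines above.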