[GitHub] [carbondata] CarbonDataQA1 commented on issue #3566: [CARBONDATA-3492]: Added prepriming in the Index Server Documentation

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3566: [CARBONDATA-3492]: Added prepriming in 
the Index Server Documentation
URL: https://github.com/apache/carbondata/pull/3566#issuecomment-572912915
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1580/
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] ajantha-bhat commented on issue #3573: [CARBONDATA-3661] Fix target file size check fail when upload local file to carbon store

2020-01-09 Thread GitBox
ajantha-bhat commented on issue #3573: [CARBONDATA-3661] Fix target file size 
check fail when upload local file to carbon store
URL: https://github.com/apache/carbondata/pull/3573#issuecomment-572904579
 
 
   LGTM




[GitHub] [carbondata] kunal642 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not running

2020-01-09 Thread GitBox
kunal642 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update 
does not load cache in memory, behavior inconsistent with scenario when index 
server is not running
URL: https://github.com/apache/carbondata/pull/3511#issuecomment-572894082
 
 
   LGTM




[GitHub] [carbondata] QiangCai commented on issue #3573: [CARBONDATA-3661] Fix target file size check fail when upload local file to carbon store

2020-01-09 Thread GitBox
QiangCai commented on issue #3573: [CARBONDATA-3661] Fix target file size check 
fail when upload local file to carbon store
URL: https://github.com/apache/carbondata/pull/3573#issuecomment-572892276
 
 
   LGTM




[GitHub] [carbondata] akashrn5 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not running

2020-01-09 Thread GitBox
akashrn5 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update 
does not load cache in memory, behavior inconsistent with scenario when index 
server is not running
URL: https://github.com/apache/carbondata/pull/3511#issuecomment-572891357
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3573: [CARBONDATA-3661] Fix target file size check fail when upload local file to carbon store

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3573: [CARBONDATA-3661] Fix target file size 
check fail when upload local file to carbon store
URL: https://github.com/apache/carbondata/pull/3573#issuecomment-572891277
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1578/
   




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3566: [CARBONDATA-3492]: Added prepriming in the Index Server Documentation

2020-01-09 Thread GitBox
vikramahuja1001 commented on a change in pull request #3566: [CARBONDATA-3492]: 
Added prepriming in the Index Server Documentation
URL: https://github.com/apache/carbondata/pull/3566#discussion_r365086686
 
 

 ##
 File path: docs/index-server.md
 ##
 @@ -119,6 +119,16 @@ be written to file.
 The user can set the location for these file by using 
'carbon.indexserver.temp.path'. By default
 table path would be used to write the files.
 
+## Prepriming
+The caching of the datamaps in the Index Server will start once the query is 
fired on the table for
+the first time. All the datamaps will be loaded if a count(*) query is fired 
and only the required
 
 Review comment:
   done




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3566: [CARBONDATA-3492]: Added prepriming in the Index Server Documentation

2020-01-09 Thread GitBox
vikramahuja1001 commented on a change in pull request #3566: [CARBONDATA-3492]: 
Added prepriming in the Index Server Documentation
URL: https://github.com/apache/carbondata/pull/3566#discussion_r365086674
 
 

 ##
 File path: docs/index-server.md
 ##
 @@ -119,6 +119,16 @@ be written to file.
 The user can set the location for these file by using 
'carbon.indexserver.temp.path'. By default
 table path would be used to write the files.
 
+## Prepriming
+The caching of the datamaps in the Index Server will start once the query is 
fired on the table for
+the first time. All the datamaps will be loaded if a count(*) query is fired 
and only the required
+will be loaded for any filter query. Unless the query is fired on the table 
there will be no caching, 
+which reduces the performance of the first time query. To improve the 
performance of the first time 
+query, cache can be preprimed in the Index Server. During prepriming the 
datamaps can be loaded in the 
+Index Server during load command before a select query is fired on the table.
 
 Review comment:
   done




[GitHub] [carbondata] vikramahuja1001 commented on a change in pull request #3566: [CARBONDATA-3492]: Added prepriming in the Index Server Documentation

2020-01-09 Thread GitBox
vikramahuja1001 commented on a change in pull request #3566: [CARBONDATA-3492]: 
Added prepriming in the Index Server Documentation
URL: https://github.com/apache/carbondata/pull/3566#discussion_r365086695
 
 

 ##
 File path: docs/index-server.md
 ##
 @@ -119,6 +119,16 @@ be written to file.
 The user can set the location for these file by using 
'carbon.indexserver.temp.path'. By default
 table path would be used to write the files.
 
+## Prepriming
+The caching of the datamaps in the Index Server will start once the query is 
fired on the table for
+the first time. All the datamaps will be loaded if a count(*) query is fired 
and only the required
+will be loaded for any filter query. Unless the query is fired on the table 
there will be no caching, 
+which reduces the performance of the first time query. To improve the 
performance of the first time 
+query, cache can be preprimed in the Index Server. During prepriming the 
datamaps can be loaded in the 
+Index Server during load command before a select query is fired on the table.
+
+The user can enable prepriming by using 'carbon.indexserver.enable.prepriming'.
 
 Review comment:
   done
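
The difference between caching on first query and prepriming at load time, as described in the quoted documentation, can be sketched with a toy model. This is only an illustration: the `IndexCache` class, its methods, and the segment names are hypothetical and are not CarbonData's Index Server code; the only name taken from the documentation is the idea behind 'carbon.indexserver.enable.prepriming'.

```python
class IndexCache:
    """Toy stand-in for an index cache; not CarbonData's implementation."""

    def __init__(self, enable_prepriming=False):
        # Hypothetical flag mirroring the idea of 'carbon.indexserver.enable.prepriming'.
        self.enable_prepriming = enable_prepriming
        self.cached_segments = set()
        self.cold_loads = 0  # index loads paid for by a query rather than by a data load

    def load_data(self, segment):
        # With prepriming enabled, the index of the new segment is cached at load time.
        if self.enable_prepriming:
            self.cached_segments.add(segment)

    def query(self, segments):
        # Any segment not yet cached is loaded on demand, which slows this query down.
        for segment in segments:
            if segment not in self.cached_segments:
                self.cold_loads += 1
                self.cached_segments.add(segment)


lazy = IndexCache(enable_prepriming=False)
primed = IndexCache(enable_prepriming=True)
for cache in (lazy, primed):
    cache.load_data("segment_0")
    cache.load_data("segment_1")
    cache.query(["segment_0", "segment_1"])
```

In this model the first query against `lazy` pays for both index loads, while the first query against `primed` finds everything already cached.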




[jira] [Resolved] (CARBONDATA-3660) Throw FileNotFoundException when concurrent loading

2020-01-09 Thread Zhi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhi Liu resolved CARBONDATA-3660.
-
Resolution: Fixed

> Throw FileNotFoundException when concurrent loading
> ---
>
> Key: CARBONDATA-3660
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3660
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Zhi Liu
>Priority: Major
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> 2020-01-09 14:42:47 ERROR CarbonFactDataWriterImplV3:390 - Problem while writing the index file
> org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
>     at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2772)
>     at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2721)
>     at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.commitCurrentFile(AbstractFactDataWriter.java:277)
>     at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:387)
>     at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:508)
>     at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:233)
>     at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:211)
>     at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:175)
>     at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:129)
>     at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:52)
>     at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:278)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:266)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>     at java.lang.Thread.run(Thread.java:748)
> Caused by: java.io.FileNotFoundException: D:\Workspace\Carbon-Master\integration\flink\target\test-classes\data\temp\8df45d4dc38449c69147083cdbe79e4d\part-0-156_batchno0-0-null-1578552167465.carbondata (The system cannot find the path specified.)
>     at java.io.FileOutputStream.open0(Native Method)
>     at java.io.FileOutputStream.open(FileOutputStream.java:270)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
>     at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
>     at org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile.getDataOutputStream(LocalCarbonFile.java:371)
>     at org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile.getDataOutputStream(LocalCarbonFile.java:365)
>     at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:231)
>     at org.apache.carbondata.core.util.CarbonUtil.copyLocalFileToCarbonStore(CarbonUtil.java:2799)
>     at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2756)
>     ... 15 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (CARBONDATA-3655) Support set base64 string as struct field value.

2020-01-09 Thread Zhi Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhi Liu resolved CARBONDATA-3655.
-
Resolution: Fixed

> Support set base64 string as struct field value.
> 
>
> Key: CARBONDATA-3655
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3655
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Zhi Liu
>Priority: Major
>  Time Spent: 6h 10m
>  Remaining Estimate: 0h
>
> Currently, a struct field value can only be set as a delimiter-separated 
> string, which sometimes does not work well for struct fields.
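
One way a delimiter-separated value can go wrong, and how a base64-encoded value avoids it, is sketched below. Everything here is an assumption made for illustration: the `$` delimiter, the JSON payload inside the base64 string, and the function names are hypothetical, and the issue does not state which encoding layout CarbonData uses.

```python
import base64
import json

DELIMITER = "$"  # hypothetical struct field delimiter


def encode_with_delimiter(fields):
    # Joins the field values; breaks if a value itself contains the delimiter.
    return DELIMITER.join(fields)


def decode_with_delimiter(text):
    return text.split(DELIMITER)


def encode_with_base64(fields):
    # Serialize the fields (JSON is an assumption), then base64-encode the bytes.
    payload = json.dumps(fields).encode("utf-8")
    return base64.b64encode(payload).decode("ascii")


def decode_with_base64(text):
    return json.loads(base64.b64decode(text).decode("utf-8"))


fields = ["price", "US$5"]  # the second value contains the delimiter
delimiter_round_trip = decode_with_delimiter(encode_with_delimiter(fields))
base64_round_trip = decode_with_base64(encode_with_base64(fields))
```

The delimiter round trip splits `US$5` into two values, while the base64 round trip returns the original two fields, because the base64 alphabet never contains the delimiter.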





[GitHub] [carbondata] CarbonDataQA1 commented on issue #3572: [HOTFIX] Fix carbon file name duplicate problem

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3572: [HOTFIX] Fix carbon file name duplicate 
problem
URL: https://github.com/apache/carbondata/pull/3572#issuecomment-572890147
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1577/
   




[jira] [Created] (CARBONDATA-3661) Fix target file size check fail when upload local file to carbon store

2020-01-09 Thread Zhi Liu (Jira)
Zhi Liu created CARBONDATA-3661:
---

 Summary: Fix target file size check fail when upload local file to 
carbon store
 Key: CARBONDATA-3661
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3661
 Project: CarbonData
  Issue Type: Bug
Reporter: Zhi Liu


Multiple Flink tasks writing carbon data may use the same carbon data file name, 
which causes the target file size check to fail when the local file is uploaded 
to the carbon store.
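
The naming collision can be sketched as follows. This is a hypothetical illustration, not the fix in the pull request: `shared_name`, `unique_name`, and the name layout are made up, and only the `.carbondata` extension and the timestamp value are taken from the log in CARBONDATA-3660.

```python
import uuid


def shared_name(part_no, timestamp_ms):
    # Built only from values that two concurrent tasks can share.
    return f"part-{part_no}-{timestamp_ms}.carbondata"


def unique_name(part_no, timestamp_ms, task_id):
    # Hypothetical remedy: add a per-task id and a random suffix to the name.
    return f"part-{part_no}-{task_id}-{timestamp_ms}-{uuid.uuid4().hex}.carbondata"


ts = 1578552167465
# Two tasks starting in the same millisecond produce the same file name.
collision = shared_name(0, ts) == shared_name(0, ts)
# With a task id and random suffix, four tasks produce four distinct names.
names = {unique_name(0, ts, task_id) for task_id in range(4)}
```

When two writers share a name, both upload to the same target path, so the size of the target file need not match either local file.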





[GitHub] [carbondata] vikramahuja1001 commented on issue #3466: [CARBONDATA-3586] [CARBONDATA-3587] [CARBONDATA-3595]:Adding valid segments into segments to be refreshed map before inserting segments

2020-01-09 Thread GitBox
vikramahuja1001 commented on issue #3466: [CARBONDATA-3586] [CARBONDATA-3587] 
[CARBONDATA-3595]:Adding valid segments into segments to be refreshed map 
before inserting segments to index server
URL: https://github.com/apache/carbondata/pull/3466#issuecomment-572888219
 
 
   retest this please




[GitHub] [carbondata] niuge01 commented on issue #3573: [HOTFIX] Fix carbon file name duplicate problem

2020-01-09 Thread GitBox
niuge01 commented on issue #3573: [HOTFIX] Fix carbon file name duplicate 
problem
URL: https://github.com/apache/carbondata/pull/3573#issuecomment-572873423
 
 
   please test this




[GitHub] [carbondata] niuge01 closed pull request #3572: [HOTFIX] Fix carbon file name duplicate problem

2020-01-09 Thread GitBox
niuge01 closed pull request #3572: [HOTFIX] Fix carbon file name duplicate 
problem
URL: https://github.com/apache/carbondata/pull/3572
 
 
   




[GitHub] [carbondata] niuge01 opened a new pull request #3573: [HOTFIX] Fix carbon file name duplicate problem

2020-01-09 Thread GitBox
niuge01 opened a new pull request #3573: [HOTFIX] Fix carbon file name 
duplicate problem
URL: https://github.com/apache/carbondata/pull/3573
 
 
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[jira] [Comment Edited] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011544#comment-17011544
 ] 

Venugopal Reddy K edited comment on CARBONDATA-3548 at 1/10/20 4:51 AM:


Updated for:
 # Algorithm description.
 # Modified the polygon UDF syntax to:
IN_POLYGON('116.321011 40.123503, 116.137676 39.947911, 116.560993 39.935276, 
116.321011 40.123503')
 # Used an IN filter expression with a LIST expression containing all the 
geohashIds to be filtered, instead of a RANGE filter expression, as this 
improves query performance significantly.
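
To make the polygon argument concrete, the sketch below parses the coordinate string from the UDF call above and tests a point against it with a standard ray-casting check. This is not how CarbonData evaluates IN_POLYGON (the comment says it filters on a list of geohashIds); `parse_polygon` and `contains` are hypothetical helpers, and the "longitude latitude" ordering of each pair is an assumption.

```python
def parse_polygon(text):
    """Parse 'lon lat, lon lat, ...' into a list of (lon, lat) tuples."""
    points = []
    for pair in text.split(","):
        lon, lat = pair.split()
        points.append((float(lon), float(lat)))
    return points


def contains(polygon, lon, lat):
    """Ray casting: count edges crossed by a ray going towards larger longitude."""
    inside = False
    # The polygon is closed (first point == last point), so consecutive pairs
    # cover every edge.
    for (x1, y1), (x2, y2) in zip(polygon, polygon[1:]):
        if (y1 > lat) != (y2 > lat):
            crossing = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < crossing:
                inside = not inside
    return inside


polygon = parse_polygon(
    "116.321011 40.123503, 116.137676 39.947911, "
    "116.560993 39.935276, 116.321011 40.123503"
)
inside_point = contains(polygon, 116.34, 40.0)
outside_point = contains(polygon, 117.0, 40.0)
```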


was (Author: venureddy):
Updated for
 # algorithm description.
 # Used IN filter expression with a LIST expression containing all the 
geohashIds to be filtered instead of RANGE filter expression as this improves 
the query performance significantly.

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf, 
> Geospatial Index Design Doc-OpenSource.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h
>
> In general, a database may contain geographical location data. For instance, 
> telecom operators need to perform analytics based on a particular region, 
> cell tower IDs (within a region), and/or geographical locations for 
> a particular period of time. At present, Carbon does not have native support 
> for storing geographical locations/coordinates or for filter queries based on 
> them. The longitude and latitude of coordinates can, however, be treated as 
> independent columns, sorted hierarchically, and stored.
>
> But when longitude and latitude are treated independently, 2D space 
> is linearized, i.e., points in the two-dimensional domain are ordered by 
> sorting first on longitude and then on latitude. Thus, data is not ordered by 
> geospatial proximity. Hence range queries require a lot of IO operations and 
> query performance is degraded.
>
> To alleviate this, we can use a z-order curve to store geospatial data 
> points. This ensures that geographically nearer points are present in the same 
> block/blocklet. This reduces the IO operations for range queries and improves 
> query performance. It can also support polygon queries on geodata. The attached 
> design document describes this in detail.
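
A z-order (Morton) code of the kind mentioned in the description can be sketched by quantizing each coordinate to an integer grid and interleaving the bits. This is a minimal sketch under stated assumptions: the 16-bit grid, the coordinate ranges, and the function names are hypothetical, and the attached design document may define the encoding differently.

```python
def interleave_bits(x, y, bits=16):
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code


def quantize(value, low, high, bits=16):
    """Map a coordinate in [low, high] to an integer grid cell index."""
    cells = (1 << bits) - 1
    return int((value - low) / (high - low) * cells)


def z_order(longitude, latitude, bits=16):
    """Return the z-order code of a point; the grid resolution is an assumption."""
    x = quantize(longitude, -180.0, 180.0, bits)
    y = quantize(latitude, -90.0, 90.0, bits)
    return interleave_bits(x, y, bits)


# Sorting by the code instead of by (longitude, latitude) orders points along the curve.
points = [(116.321011, 40.123503), (116.137676, 39.947911), (-73.98, 40.75)]
ordered = sorted(points, key=lambda p: z_order(p[0], p[1]))
```

Because both coordinates contribute to the high-order bits of the code, points that share a coarse grid cell also share a code prefix, which is what lets a sort on the code group them together.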





[GitHub] [carbondata] niuge01 opened a new pull request #3572: [HOTFIX] Fix carbon file name duplicate problem

2020-01-09 Thread GitBox
niuge01 opened a new pull request #3572: [HOTFIX] Fix carbon file name 
duplicate problem
URL: https://github.com/apache/carbondata/pull/3572
 
 
### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf, 
> Geospatial Index Design Doc-OpenSource.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h





[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: (was: Geospatial Index Design Doc-OpenSource-Version 
2.0.pdf)

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf, 
> Geospatial Index Design Doc-OpenSource.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h





[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: Geospatial Index Design Doc-OpenSource.pdf

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h





[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: (was: Geospatial Index Design Doc-OpenSource.pdf)

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h





[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h





[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: (was: Geospatial Index Design Doc-OpenSource-Version 
2.0.pdf)

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h





[GitHub] [carbondata] IceMimosa commented on issue #3430: [CARBONDATA-3565] Fix complex binary data broken issue when loading dataframe data

2020-01-09 Thread GitBox
IceMimosa commented on issue #3430: [CARBONDATA-3565] Fix complex binary data 
broken issue when loading dataframe data
URL: https://github.com/apache/carbondata/pull/3430#issuecomment-572863921
 
 
   @niuge01 I tried it and it does not work, because the default BinaryDecoder 
is `""`.




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only Matched partitioned segments for filter on Partitioned table

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3568: [CARBONDATA-3658] Prune and Cache only 
Matched partitioned segments for filter on Partitioned table
URL: https://github.com/apache/carbondata/pull/3568#issuecomment-572850496
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1576/
   




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #3566: [CARBONDATA-3492]: Added prepriming in the Index Server Documentation

2020-01-09 Thread GitBox
Indhumathi27 commented on a change in pull request #3566: [CARBONDATA-3492]: 
Added prepriming in the Index Server Documentation
URL: https://github.com/apache/carbondata/pull/3566#discussion_r365046175
 
 

 ##
 File path: docs/index-server.md
 ##
 @@ -119,6 +119,16 @@ be written to file.
 The user can set the location for these file by using 
'carbon.indexserver.temp.path'. By default
 table path would be used to write the files.
 
+## Prepriming
+The caching of the datamaps in the Index Server will start once the query is 
fired on the table for
+the first time. All the datamaps will be loaded if a count(*) query is fired 
and only the required
+will be loaded for any filter query. Unless the query is fired on the table 
there will be no caching, 
+which reduces the performance of the first time query. To improve the 
performance of the first time 
+query, cache can be preprimed in the Index Server. During prepriming the 
datamaps can be loaded in the 
+Index Server during load command before a select query is fired on the table.
 
 Review comment:
   `during load command` => I think this should cover both the Load and Insert Into 
commands. Please change.




[GitHub] [carbondata] niuge01 commented on issue #3430: [CARBONDATA-3565] Fix complex binary data broken issue when loading dataframe data

2020-01-09 Thread GitBox
niuge01 commented on issue #3430: [CARBONDATA-3565] Fix complex binary data 
broken issue when loading dataframe data
URL: https://github.com/apache/carbondata/pull/3430#issuecomment-572825802
 
 
   @IceMimosa 
   This problem may be fixed in PR 3564; please check it.
   https://github.com/apache/carbondata/pull/3564




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572637232
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1575/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3466: [CARBONDATA-3586] [CARBONDATA-3587] [CARBONDATA-3595]:Adding valid segments into segments to be refreshed map before inserting segments to

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3466: [CARBONDATA-3586] [CARBONDATA-3587] 
[CARBONDATA-3595]:Adding valid segments into segments to be refreshed map 
before inserting segments to index server
URL: https://github.com/apache/carbondata/pull/3466#issuecomment-572609764
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1568/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572597036
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1574/
   




[GitHub] [carbondata] jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.

2020-01-09 Thread GitBox
jackylk commented on a change in pull request #3571: [CARBONDATA-3659] Fix 
issues with alluxio without host and port.
URL: https://github.com/apache/carbondata/pull/3571#discussion_r364779297
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
 ##
 @@ -554,6 +555,21 @@ public short getDefaultReplication() {
 return fileSystem.getDefaultReplication(path);
   }
 
+  @Override
+  public boolean equals(Object o) {
+if (this == o) return true;
 
 Review comment:
   add `{` and `}`
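
   The requested brace style can be sketched as follows. The diff shows only the first line of the method, so this is a generic `equals`/`hashCode` pair on a hypothetical `PathHolder` class with a single assumed `path` field, not the code from the PR:

```java
import java.util.Objects;

// Hypothetical class used only to illustrate the brace style.
public final class PathHolder {
    private final String path;

    public PathHolder(String path) {
        this.path = path;
    }

    @Override
    public boolean equals(Object o) {
        // Every branch is wrapped in braces, even single statements.
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        PathHolder that = (PathHolder) o;
        return Objects.equals(path, that.path);
    }

    @Override
    public int hashCode() {
        // Kept consistent with equals: equal objects share a hash code.
        return Objects.hash(path);
    }
}
```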




[GitHub] [carbondata] akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2020-01-09 Thread GitBox
akkio-97 commented on a change in pull request #3515: [CARBONDATA-3623]: Fixed 
global sort compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#discussion_r364769297
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/carbondata/spark/load/DataLoadProcessBuilderOnSpark.scala
 ##
 @@ -443,23 +438,18 @@ object DataLoadProcessBuilderOnSpark {
   .asScala
   .map(_.getColName)
   .toArray
+
+/**
+ * 
[[org.apache.spark.sql.catalyst.expressions.objects.ValidateExternalType]] 
validates the
+ * datatype of column data and corresponding datatype in schema provided 
to create dataframe.
+ * Since carbonScanRDD gives Long data for timestamp column and 
corresponding column datatype in
+ * schema is Timestamp, this validation fails if we use createDataFrame 
API which takes rdd as
+ * input. Hence, We need to give the List[Row] compatible with the schema 
datatypes. So using
+ * the createDataFrame API which takes List[Row] and schema as input.
+ */
 val schema = SparkTypeConverter.createSparkSchema(carbonTable, columns)
-val rdd: RDD[InternalRow] = new CarbonScanRDD[CarbonRow](
-  sparkSession,
-  columnProjection = new CarbonProjection(columns),
-  null,
-  carbonTable.getAbsoluteTableIdentifier,
-  carbonTable.getTableInfo.serialize,
-  carbonTable.getTableInfo,
-  new CarbonInputMetrics,
-  null,
-  classOf[SparkDataTypeConverterImpl],
-  classOf[CarbonRowReadSupport],
-  splits.asJava)
-  .map { row =>
-new GenericInternalRow(row.getData.asInstanceOf[Array[Any]])
-  }
-SparkSQLUtil.execute(rdd, schema, sparkSession)
+val listOfRows = 
sparkSession.sqlContext.table(carbonTable.getTableName).collect().toList.asJava
 
 Review comment:
   This method is called by compaction and by the insert into stage command, so 
the carbonTable ought to be the source table.




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort compaction failure on timestamp column

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3515: [CARBONDATA-3623]: Fixed global sort 
compaction failure on timestamp column
URL: https://github.com/apache/carbondata/pull/3515#issuecomment-572583789
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1573/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3561: [HOTFIX] Fix INSERT STAGE footer read error

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3561: [HOTFIX] Fix INSERT STAGE footer read 
error
URL: https://github.com/apache/carbondata/pull/3561#issuecomment-572577165
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1571/
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions for rephrasing certain parts 
of it. To be clear, I have attached the modified text for all 3 paragraphs. Can 
we rephrase like this -->
   
   There are many open-source implementations for spatial indexing and for 
processing spatial queries. CarbonData implements spatial indexing in a different 
way. Its core idea is to use raster data. A raster is made up of a matrix of 
cells organized into rows and columns (called a grid). Each cell represents a 
coordinate, and the index for that coordinate is generated from its longitude 
and latitude, like the Z-order curve.

   CarbonData rasterizes the user data during data load into segments. A set of 
latitude and longitude values represents a grid range. The size of the grid can 
be configured. Hence, the coordinates loaded are often discrete and not 
continuous.
   
   The figure below shows the relationship between the grid and the points 
residing in it. The black point represents the center point of the grid, and the 
red points are coordinates at arbitrary positions inside the grid. The red 
points can be replaced by the center point of the grid to indicate that the 
points lie within the grid. During data load, CarbonData generates an index for 
each coordinate according to the row and column of the grid (in the raster) where 
that coordinate lies. These indexes are the same as Z order. For the detailed 
conversion algorithm, please refer to the design documents of the spatial index.
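
   As a rough illustration of the rasterization step described above, the sketch below maps a coordinate value to the row or column of the grid cell containing it and returns that cell's center as the stand-in for every point in the cell. It is not the conversion algorithm from the design document: the class `GridCellSketch`, its method names, and the `origin` and `cellSize` parameters are assumptions made for this example.

```java
public final class GridCellSketch {
    private GridCellSketch() {}

    /**
     * Index (row for latitude, column for longitude) of the grid cell
     * that contains the given coordinate value.
     */
    public static long cellIndex(double value, double origin, double cellSize) {
        return (long) Math.floor((value - origin) / cellSize);
    }

    /** Center of the cell with the given index along one axis. */
    public static double cellCenter(long index, double origin, double cellSize) {
        return origin + (index + 0.5) * cellSize;
    }

    public static void main(String[] args) {
        double origin = 116.0;   // assumed lower bound of the raster, in degrees
        double cellSize = 0.5;   // assumed grid size, in degrees

        // Two different longitudes that fall into the same column.
        long colA = cellIndex(116.80, origin, cellSize);
        long colB = cellIndex(116.95, origin, cellSize);
        System.out.println(colA == colB);

        // Both are represented by the center of that column.
        System.out.println(cellCenter(colA, origin, cellSize));
    }
}
```

   The row and column obtained this way are the integers from which a Z-order style index can then be derived by interleaving their bits.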
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not run

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: 
Update does not load cache in memory, behavior inconsistent with scenario when 
index server is not running
URL: https://github.com/apache/carbondata/pull/3511#issuecomment-572569167
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1570/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3537: [CARBONDATA-3646] [CARBONDATA-3647]: Fix query failure with Index Server

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3537: [CARBONDATA-3646] [CARBONDATA-3647]: 
Fix query failure with Index Server
URL: https://github.com/apache/carbondata/pull/3537#issuecomment-572568660
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1569/
   




[GitHub] [carbondata] asfgit closed pull request #3393: [CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension

2020-01-09 Thread GitBox
asfgit closed pull request #3393: [CARBONDATA-3503][Carbon2] Adapt to 
SparkSessionExtension
URL: https://github.com/apache/carbondata/pull/3393
 
 
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3565: Changes to show metacache command

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3565: Changes to show metacache command
URL: https://github.com/apache/carbondata/pull/3565#issuecomment-572546261
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1567/
   




[GitHub] [carbondata] QiangCai commented on a change in pull request #3393: [CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension

2020-01-09 Thread GitBox
QiangCai commented on a change in pull request #3393: 
[CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension
URL: https://github.com/apache/carbondata/pull/3393#discussion_r364714651
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/spark/sql/test/util/CarbonQueryTest.scala
 ##
 @@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.test.util
+
+import java.util.{Locale, ServiceLoader, TimeZone}
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{DataFrame, Row, SQLContext}
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.util.sideBySide
+import org.apache.spark.sql.test.{TestQueryExecutor, TestQueryExecutorRegister}
+import org.apache.spark.util.Utils
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class CarbonQueryTest extends PlanTest {
 
 Review comment:
   @ravipesala this PR still needs to use CarbonSession for some test cases.
   I will raise a new PR to fix that, so I will merge this one first.




[GitHub] [carbondata] QiangCai commented on issue #3393: [CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension

2020-01-09 Thread GitBox
QiangCai commented on issue #3393: [CARBONDATA-3503][Carbon2] Adapt to 
SparkSessionExtension
URL: https://github.com/apache/carbondata/pull/3393#issuecomment-572542227
 
 
   LGTM




[GitHub] [carbondata] QiangCai commented on a change in pull request #3393: [CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension

2020-01-09 Thread GitBox
QiangCai commented on a change in pull request #3393: 
[CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension
URL: https://github.com/apache/carbondata/pull/3393#discussion_r364714651
 
 

 ##
 File path: 
integration/spark-common/src/main/scala/org/apache/spark/sql/test/util/CarbonQueryTest.scala
 ##
 @@ -0,0 +1,212 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.test.util
+
+import java.util.{Locale, ServiceLoader, TimeZone}
+
+import scala.collection.JavaConverters._
+
+import org.apache.spark.sql.{DataFrame, Row, SQLContext}
+import org.apache.spark.sql.catalyst.plans.logical
+import org.apache.spark.sql.catalyst.util.sideBySide
+import org.apache.spark.sql.test.{TestQueryExecutor, TestQueryExecutorRegister}
+import org.apache.spark.util.Utils
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+
+class CarbonQueryTest extends PlanTest {
 
 Review comment:
   @ravipesala this PR still needs to use CarbonSession for some test cases.
   I will raise another PR to fix that, so I will merge this one first.




[GitHub] [carbondata] QiangCai removed a comment on issue #3393: [CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension

2020-01-09 Thread GitBox
QiangCai removed a comment on issue #3393: [CARBONDATA-3503][Carbon2] Adapt to 
SparkSessionExtension
URL: https://github.com/apache/carbondata/pull/3393#issuecomment-572504556
 
 
   @ajithme  please rebase to the latest master and fix the comments




[GitHub] [carbondata] jackylk commented on a change in pull request #3561: [HOTFIX] Fix INSERT STAGE footer read error

2020-01-09 Thread GitBox
jackylk commented on a change in pull request #3561: [HOTFIX] Fix INSERT STAGE 
footer read error
URL: https://github.com/apache/carbondata/pull/3561#discussion_r364712006
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/reader/CarbonIndexFileReader.java
 ##
 @@ -112,4 +112,5 @@ public void openThriftReader(byte[] fileData) throws 
IOException {
   public boolean hasNext() throws IOException {
 return thriftReader.hasNext();
   }
+
 
 Review comment:
   fixed




[GitHub] [carbondata] jackylk commented on a change in pull request #3561: [HOTFIX] Fix INSERT STAGE footer read error

2020-01-09 Thread GitBox
jackylk commented on a change in pull request #3561: [HOTFIX] Fix INSERT STAGE 
footer read error
URL: https://github.com/apache/carbondata/pull/3561#discussion_r364712733
 
 

 ##
 File path: 
core/src/main/java/org/apache/carbondata/core/util/path/CarbonTablePath.java
 ##
 @@ -714,4 +714,18 @@ public static String generateBadRecordsPath(String 
badLogStoreLocation, String s
   CarbonCommonConstants.FILE_SEPARATOR + taskNo;
 }
   }
+
+  /**
+   * Return the parent path of the input file.
+   * For example, if input file path is /user/warehouse/t1/file.carbondata
+   * then return will be /user/warehouse/t1
+   */
+  public static String getParentPath(String dataFilePath) {
+int endIndex = 
dataFilePath.lastIndexOf(CarbonCommonConstants.FILE_SEPARATOR);
+if (endIndex > -1) {
 
 Review comment:
   I think this is better, especially for debugging
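
   For readers following the thread, here is a minimal sketch of how a helper like this could behave. The diff above is cut off after the `if`, so both branches below are assumptions rather than the code in the PR. `ParentPathSketch` and its local `FILE_SEPARATOR` are hypothetical stand-ins, with the separator assumed to be "/".

```java
// Sketch only: the method body after `if (endIndex > -1) {` is not shown in
// the diff, so both branches here are assumptions, not the PR's code.
final class ParentPathSketch {
  // Hypothetical stand-in for CarbonCommonConstants.FILE_SEPARATOR, assumed "/".
  static final String FILE_SEPARATOR = "/";

  // Return everything before the last separator of the given path.
  static String getParentPath(String dataFilePath) {
    int endIndex = dataFilePath.lastIndexOf(FILE_SEPARATOR);
    if (endIndex > -1) {
      return dataFilePath.substring(0, endIndex);
    }
    // Assumption: a path without any separator is returned unchanged.
    return dataFilePath;
  }

  public static void main(String[] args) {
    // Example from the Javadoc in the diff: prints /user/warehouse/t1
    System.out.println(getParentPath("/user/warehouse/t1/file.carbondata"));
  }
}
```

   Keeping the index in a named local variable, as the diff does, makes it easy to inspect in a debugger.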




[GitHub] [carbondata] vikramahuja1001 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: Update does not load cache in memory, behavior inconsistent with scenario when index server is not r

2020-01-09 Thread GitBox
vikramahuja1001 commented on issue #3511: [CARBONDATA-3620][CARBONDATA-3622]: 
Update does not load cache in memory, behavior inconsistent with scenario when 
index server is not running
URL: https://github.com/apache/carbondata/pull/3511#issuecomment-572538686
 
 
   retest this please




[GitHub] [carbondata] vikramahuja1001 commented on issue #3537: [CARBONDATA-3646] [CARBONDATA-3647]: Fix query failure with Index Server

2020-01-09 Thread GitBox
vikramahuja1001 commented on issue #3537: [CARBONDATA-3646] [CARBONDATA-3647]: 
Fix query failure with Index Server
URL: https://github.com/apache/carbondata/pull/3537#issuecomment-572538636
 
 
   retest this please




[GitHub] [carbondata] vikramahuja1001 commented on issue #3466: [CARBONDATA-3586] [CARBONDATA-3587] [CARBONDATA-3595]:Adding valid segments into segments to be refreshed map before inserting segments

2020-01-09 Thread GitBox
vikramahuja1001 commented on issue #3466: [CARBONDATA-3586] [CARBONDATA-3587] 
[CARBONDATA-3595]:Adding valid segments into segments to be refreshed map 
before inserting segments to index server
URL: https://github.com/apache/carbondata/pull/3466#issuecomment-572538267
 
 
   retest this please




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   `There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for that coordinate is generated using longitude 
and latitude pair, like the Z order curve.`
`CarbonData requires rasterization of data before 
loading into segments. A set of latitude and longitude represents a grid range. 
The size of the grid can be configured. Hence, the coordinates loaded are often 
discrete and not continuous.`
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   `There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for that coordinate is generated using longitude 
and latitude, like the Z order curve.`
`CarbonData requires rasterization of data before 
loading into segments. A set of latitude and longitude represents a grid range. 
The size of the grid can be configured. Hence, the coordinates loaded are often 
discrete and not continuous.`
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   `There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for the coordinate is generated using longitude 
and latitude pair, like the Z order curve.`
`CarbonData implements a grid spatial index. It 
requires rasterization of data before loading into segments. A set of latitude 
and longitude represents a grid range. And the size of the grid can be 
configured. So the coordinates of the points loaded are often discrete and not 
continuous.`
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   `There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for the coordinate is generated using longitude 
and latitude pair, like the Z order curve.`
   
`CarbonData implements a grid spatial index. It 
requires rasterization of data before loading into segments. A set of latitude 
and longitude represents a grid range. And the size of the grid can be 
configured. So the coordinates of the points loaded are often discrete and not 
continuous.`
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   `There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for the coordinate is generated using longitude 
and latitude pair, like the Z order curve.`
`CarbonData implements a grid spatial index. It 
requires rasterization of data before loading into segments. A set of latitude 
and longitude represents a grid range. And the size of the grid can be 
configured. So the coordinates of the points loaded are often discrete and not 
continuous.`
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3571: [CARBONDATA-3659] Fix issues with 
alluxio without host and port.
URL: https://github.com/apache/carbondata/pull/3571#issuecomment-572532495
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1565/
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   `There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for the coordinate is generated using longitude 
and latitude pair, like the Z order curve.`
`CarbonData implements a grid spatial index. It 
requires rasterization of data before loading into segments. A set of latitude 
and longitude represents a grid range. The size of the grid can be configured. 
So the coordinates of the loaded points are often discrete and not continuous.`
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for the coordinate is generated using longitude 
and latitude pair, like the Z order curve.
   
 CarbonData implements a grid spatial index. It requires 
rasterization of data before loading into segments. A set of latitude and 
longitude represents a grid range. The size of the grid can be configured. So 
the coordinates of the loaded points are often discrete and not continuous. 
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364701741
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   I have read the text and have a few suggestions on rephrasing certain parts 
of it. To be clear, I have attached the modified text below. Can we rephrase 
it like this -->
   There are many open-source implementations for spatial indexing and to 
process spatial queries. CarbonData implements a different way of spatial 
index. Its core idea is to use the raster data. Raster is made up of a matrix 
of cells organized into rows and columns (called a grid). Each cell represents 
a coordinate. And the index for the coordinate is generated using longitude 
and latitude pair, like the Z order curve.
CarbonData implements a grid spatial index. It requires 
rasterization of data before loading into segments. A set of latitude and 
longitude represents a grid range. The size of the grid can be configured. So 
the coordinates of the loaded points are often discrete and not continuous. 
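
   To make the grid and Z order idea concrete, here is a rough sketch of how a longitude/latitude pair might be snapped to a grid cell and the cell's column and row interleaved into a single index. It is an illustration only, not CarbonData's actual algorithm (that is described in the spatial index design document). The class name, the origin at (0, 0), the cell size in degrees, and the 16 bits per axis are all assumptions.

```java
// Illustrative sketch of a grid cell lookup plus Z-order (Morton) encoding.
// All names and parameters are hypothetical, not taken from CarbonData.
final class GridZOrderSketch {
  // Map a coordinate to a zero-based cell index along one axis.
  static int cellIndex(double value, double origin, double cellSize) {
    return (int) Math.floor((value - origin) / cellSize);
  }

  // Interleave the low 16 bits of column and row: column bits land on even
  // bit positions and row bits on odd positions, giving a Z-order code.
  static long interleave(int column, int row) {
    long code = 0L;
    for (int bit = 0; bit < 16; bit++) {
      code |= ((long) ((column >> bit) & 1)) << (2 * bit);
      code |= ((long) ((row >> bit) & 1)) << (2 * bit + 1);
    }
    return code;
  }

  public static void main(String[] args) {
    // Assumed grid: origin (0, 0) with 0.5-degree cells.
    int column = cellIndex(1.3, 0.0, 0.5); // longitude 1.3 falls in column 2
    int row = cellIndex(0.7, 0.0, 0.5);    // latitude 0.7 falls in row 1
    System.out.println(interleave(column, row)); // prints 6
  }
}
```

   Every point that falls in the same cell gets the same index, which is why the loaded coordinates end up discrete rather than continuous.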
   




[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364687660
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been 
gridded when it is load into segments. A set of latitude and longitude 
represents a grid range, the size of the grid can be specified artificially. So 
the coordinates of the loaded points are often discrete and not continuous. 
+
+The grid and point relationship is like that black point is the middle of a 
grid, the red dot is just inside the grid. The red point is inside the grid, it 
can be replaced by the center point of the grid, indicating that the point is 
within the grid. Therefore, the coordinates of points in a grid are replaced by 
black points in the middle. This is the characteristic of data load.  At the 
same time of data load, carbondata will generate hash ID according to the 
coordinates of rows and columns of the grid. These hash IDs are the same as Z 
order when querying. Detailed conversion algorithm can refer to the design 
documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata 
use the polygon and spatial region information passed in when creating a table 
build a quad tree. The nodes in the quad tree are composed of hash ids 
generated by the row and column information projected in the polygon area and 
group photo in map area. When the query polygon area is not disjon from the 
grid center point, the grid is considered selected.  In the following figure, 
user select a quadrilateral polygon,  The grid with the center point in the 
region will generate a quadtree. A list of line with continuous properties will 
be generated in the query process, like [97->97  99->99  102->102  104->111  
120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  
225->225  228->229], each part of the list represents a continuous grid area. 
Carbondata use that line list to prune and filtered. About the detail can be 
search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the 
mode has been open. 
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 
'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional 
information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 
'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash', 
+'INDEX_HANDLER.mygeohash.type'='geohash',   
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',   
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.gridSize'='50',   
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',   
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',   
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',   
+'INDEX_HANDLER.mygeohash.conversionRatio'='100');
+```
+
+| **Property**  | **Description**  
|
+| - | :--- 
|
+| INDEX_HANDLER | Custom index handler. This handler allows user to  create a 
new column from the set of schema columns. Newly created column name  is same 
as that of handler name. Type and sourcecolumns propert

[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364686383
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been 
gridded when it is load into segments. A set of latitude and longitude 
represents a grid range, the size of the grid can be specified artificially. So 
the coordinates of the loaded points are often discrete and not continuous. 
+
+The grid and point relationship is like that black point is the middle of a 
grid, the red dot is just inside the grid. The red point is inside the grid, it 
can be replaced by the center point of the grid, indicating that the point is 
within the grid. Therefore, the coordinates of points in a grid are replaced by 
black points in the middle. This is the characteristic of data load.  At the 
same time of data load, carbondata will generate hash ID according to the 
coordinates of rows and columns of the grid. These hash IDs are the same as Z 
order when querying. Detailed conversion algorithm can refer to the design 
documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata 
use the polygon and spatial region information passed in when creating a table 
build a quad tree. The nodes in the quad tree are composed of hash ids 
generated by the row and column information projected in the polygon area and 
group photo in map area. When the query polygon area is not disjon from the 
grid center point, the grid is considered selected.  In the following figure, 
user select a quadrilateral polygon,  The grid with the center point in the 
region will generate a quadtree. A list of line with continuous properties will 
be generated in the query process, like [97->97  99->99  102->102  104->111  
120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  
225->225  228->229], each part of the list represents a continuous grid area. 
Carbondata use that line list to prune and filtered. About the detail can be 
search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the 
mode has been open. 
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 
'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional 
information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 
'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash', 
+'INDEX_HANDLER.mygeohash.type'='geohash',   
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',   
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.gridSize'='50',   
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',   
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',   
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',   
+'INDEX_HANDLER.mygeohash.conversionRatio'='100');
+```
+
+| **Property**  | **Description**  
|
+| - | :--- 
|
+| INDEX_HANDLER | Custom index handler. This handler allows user to  create a 
new column from the set of schema columns. Newly created column name  is same 
as that of handler name. Type and sourcecolumns propert

[GitHub] [carbondata] vikramahuja1001 commented on issue #3565: Changes to show metacache command

2020-01-09 Thread GitBox
vikramahuja1001 commented on issue #3565: Changes to show metacache command
URL: https://github.com/apache/carbondata/pull/3565#issuecomment-572517130
 
 
   retest this please


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364685343
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been 
gridded when it is load into segments. A set of latitude and longitude 
represents a grid range, the size of the grid can be specified artificially. So 
the coordinates of the loaded points are often discrete and not continuous. 
+
+The grid and point relationship is like that black point is the middle of a 
grid, the red dot is just inside the grid. The red point is inside the grid, it 
can be replaced by the center point of the grid, indicating that the point is 
within the grid. Therefore, the coordinates of points in a grid are replaced by 
black points in the middle. This is the characteristic of data load.  At the 
same time of data load, carbondata will generate hash ID according to the 
coordinates of rows and columns of the grid. These hash IDs are the same as Z 
order when querying. Detailed conversion algorithm can refer to the design 
documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata 
use the polygon and spatial region information passed in when creating a table 
build a quad tree. The nodes in the quad tree are composed of hash ids 
generated by the row and column information projected in the polygon area and 
group photo in map area. When the query polygon area is not disjon from the 
grid center point, the grid is considered selected.  In the following figure, 
user select a quadrilateral polygon,  The grid with the center point in the 
region will generate a quadtree. A list of line with continuous properties will 
be generated in the query process, like [97->97  99->99  102->102  104->111  
120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  
225->225  228->229], each part of the list represents a continuous grid area. 
Carbondata use that line list to prune and filtered. About the detail can be 
search under https://issues.apache.org/jira/browse/CARBONDATA-3548
+
+![File Directory Structure](../docs/images/spatial-index-2.png?raw=true)
+
+
+
+## Installation and Deployment
+
+Build source with modules geo open, can open "pom.xml" and check whether the 
mode has been open. 
+
+![File Directory Structure](../docs/images/spatial-index-3.png?raw=true)
+
+Then you can get the "carbondata-geo-2.0.0-SNAPSHOT.jar" keep this jar and 
'jst-core.jar' to your carbonlib path.
+
+## Basic Command
+
+### Create Table
+
+spatial index need to appoint the source column and other regional 
information. carbon will create a Invisible hash id column.
+
+example
+
+```
+create table source_index(id BIGINT, latitude long, longitude long) stored by 
'carbondata' TBLPROPERTIES (
+'INDEX_HANDLER'='mygeohash', 
+'INDEX_HANDLER.mygeohash.type'='geohash',   
+'INDEX_HANDLER.mygeohash.sourcecolumns'='longitude, latitude',   
+'INDEX_HANDLER.mygeohash.originLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.gridSize'='50',   
+'INDEX_HANDLER.mygeohash.minLongitude'='1.811865',   
+'INDEX_HANDLER.mygeohash.maxLongitude'='2.782233',   
+'INDEX_HANDLER.mygeohash.minLatitude'='19.832277',   
+'INDEX_HANDLER.mygeohash.maxLatitude'='20.225281',   
+'INDEX_HANDLER.mygeohash.conversionRatio'='100');
+```
+
+| **Property**  | **Description**  
|
+| - | :--- 
|
+| INDEX_HANDLER | Custom index handler. This handler allows user to  create a 
new column from the set of schema columns. Newly created column name  is same 
as that of handler name. Type and sourcecolumns propert

[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364684380
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been 
gridded when it is load into segments. A set of latitude and longitude 
represents a grid range, the size of the grid can be specified artificially. So 
the coordinates of the loaded points are often discrete and not continuous. 
+
+The grid and point relationship is like that black point is the middle of a 
grid, the red dot is just inside the grid. The red point is inside the grid, it 
can be replaced by the center point of the grid, indicating that the point is 
within the grid. Therefore, the coordinates of points in a grid are replaced by 
black points in the middle. This is the characteristic of data load.  At the 
same time of data load, carbondata will generate hash ID according to the 
coordinates of rows and columns of the grid. These hash IDs are the same as Z 
order when querying. Detailed conversion algorithm can refer to the design 
documents of spatial index.
+
+![File Directory Structure](../docs/images/spatial-index-1.png?raw=true)
+
+When querying, the user enters the true space polygon coordinates, carbondata 
use the polygon and spatial region information passed in when creating a table 
build a quad tree. The nodes in the quad tree are composed of hash ids 
generated by the row and column information projected in the polygon area and 
group photo in map area. When the query polygon area is not disjon from the 
grid center point, the grid is considered selected.  In the following figure, 
user select a quadrilateral polygon,  The grid with the center point in the 
region will generate a quadtree. A list of line with continuous properties will 
be generated in the query process, like [97->97  99->99  102->102  104->111  
120->120  122->123  151->151  157->158  159->159  192->208  210->210  216->216  
225->225  228->229], each part of the list represents a continuous grid area. 
Carbondata use that line list to prune and filtered. About the detail can be 
search under https://issues.apache.org/jira/browse/CARBONDATA-3548
 
 Review comment:
   "When the query polygon area is not disjon " - disjon can be changed to 
disjoint




[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364682173
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
+
+CarbonData implements a grid spatial index. It requires that the data has been 
gridded when it is load into segments. A set of latitude and longitude 
represents a grid range, the size of the grid can be specified artificially. So 
the coordinates of the loaded points are often discrete and not continuous. 
 
 Review comment:
   "data has been gridded" - This can be changed to "data has been arranged as 
grid"




[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364681826
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   "it's also regionally continuous."  --> this is confusing and can be 
rephrased.




[GitHub] [carbondata] chetandb commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
chetandb commented on a change in pull request #3520: [WIP]add spatial-index 
user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364681431
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
 
 Review comment:
   " What does carbondata implement spatial index" should be changed to  "How 
does carbondata implement spatial index"




[GitHub] [carbondata] asfgit closed pull request #3570: [CARBONDATA-3660] Fix FileNotFound error when concurrent loading

2020-01-09 Thread GitBox
asfgit closed pull request #3570: [CARBONDATA-3660] Fix FileNotFound error when 
concurrent loading
URL: https://github.com/apache/carbondata/pull/3570
 
 
   




[jira] [Created] (CARBONDATA-3660) Throw FileNotFoundException when concurrent loading

2020-01-09 Thread Zhi Liu (Jira)
Zhi Liu created CARBONDATA-3660:
---

 Summary: Throw FileNotFoundException when concurrent loading
 Key: CARBONDATA-3660
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3660
 Project: CarbonData
  Issue Type: Bug
Reporter: Zhi Liu


2020-01-09 14:42:47 ERROR CarbonFactDataWriterImplV3:390 - Problem while writing the index file
org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
    at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2772)
    at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2721)
    at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.commitCurrentFile(AbstractFactDataWriter.java:277)
    at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:387)
    at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:508)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:233)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:211)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:175)
    at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:129)
    at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:52)
    at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:278)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.FileNotFoundException: D:\Workspace\Carbon-Master\integration\flink\target\test-classes\data\temp\8df45d4dc38449c69147083cdbe79e4d\part-0-156_batchno0-0-null-1578552167465.carbondata (The system cannot find the path specified.)
    at java.io.FileOutputStream.open0(Native Method)
    at java.io.FileOutputStream.open(FileOutputStream.java:270)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
    at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
    at org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile.getDataOutputStream(LocalCarbonFile.java:371)
    at org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile.getDataOutputStream(LocalCarbonFile.java:365)
    at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:231)
    at org.apache.carbondata.core.util.CarbonUtil.copyLocalFileToCarbonStore(CarbonUtil.java:2799)
    at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2756)
    ... 15 more



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA1 commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To 
Avoid Conflicts when concurrently write data by SDK
URL: https://github.com/apache/carbondata/pull/3567#issuecomment-572507478
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1564/
   




[GitHub] [carbondata] QiangCai commented on issue #3570: [HOTFIX] Fix FileNotFound error when concurrent loading

2020-01-09 Thread GitBox
QiangCai commented on issue #3570: [HOTFIX] Fix FileNotFound error when 
concurrent loading
URL: https://github.com/apache/carbondata/pull/3570#issuecomment-572507213
 
 
   LGTM




[GitHub] [carbondata] QiangCai commented on issue #3393: [CARBONDATA-3503][Carbon2] Adapt to SparkSessionExtension

2020-01-09 Thread GitBox
QiangCai commented on issue #3393: [CARBONDATA-3503][Carbon2] Adapt to 
SparkSessionExtension
URL: https://github.com/apache/carbondata/pull/3393#issuecomment-572504556
 
 
   @ajithme please rebase onto the latest master and address the comments




[jira] [Created] (CARBONDATA-3659) alluxio without host and port cannot read or write data in carbon.

2020-01-09 Thread Ravindra Pesala (Jira)
Ravindra Pesala created CARBONDATA-3659:
---

 Summary: alluxio without host and port cannot read or write data 
in carbon.
 Key: CARBONDATA-3659
 URL: https://issues.apache.org/jira/browse/CARBONDATA-3659
 Project: CarbonData
  Issue Type: New Feature
Reporter: Ravindra Pesala


When an Alluxio path is provided without host and port, such as 
alluxio:///user/warehouse, Carbon cannot read or write data because path 
comparison fails and extracting the parent path fails.





[GitHub] [carbondata] ravipesala opened a new pull request #3571: [CARBONDATA-3659] Fix issues with alluxio without host and port.

2020-01-09 Thread GitBox
ravipesala opened a new pull request #3571: [CARBONDATA-3659] Fix issues with 
alluxio without host and port.
URL: https://github.com/apache/carbondata/pull/3571
 
 
### Why is this PR needed?
When an Alluxio path is provided without host and port, such as 
alluxio:///user/warehouse, Carbon cannot read or write data because path 
comparison fails and extracting the parent path fails.

### What changes were proposed in this PR?
 Use Path objects to compare paths, and use string utilities to extract the 
parent path.
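
For illustration only, and not the code from this PR: a minimal sketch of the idea, assuming `java.net.URI` as a stand-in for Hadoop's `Path` so that it runs with the JDK alone. It shows how a raw string comparison can fail for a hostless URI when the other side has been normalized, while a comparison of parsed components does not, and how a parent can be taken with plain string handling. The class name `AlluxioPathSketch` and the methods `samePath` and `parentOf` are hypothetical.

```java
import java.net.URI;
import java.util.Objects;

// Hypothetical illustration; the real fix lives in CarbonData and uses Hadoop's Path.
public class AlluxioPathSketch {

    // Compare two locations by their parsed components (scheme, authority, path)
    // rather than by their raw string form.
    static boolean samePath(String left, String right) {
        URI a = URI.create(left);
        URI b = URI.create(right);
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority())
            && Objects.equals(a.getPath(), b.getPath());
    }

    // Take the parent by cutting at the last '/' and leaving the prefix untouched.
    // Assumption: the input has no trailing slash.
    static String parentOf(String location) {
        int cut = location.lastIndexOf('/');
        return cut <= 0 ? location : location.substring(0, cut);
    }

    public static void main(String[] args) {
        String hostless = "alluxio:///user/warehouse";
        String normalized = "alluxio:/user/warehouse";
        System.out.println("raw strings equal: " + hostless.equals(normalized));
        System.out.println("components equal:  " + samePath(hostless, normalized));
        System.out.println("parent:            " + parentOf(hostless));
    }
}
```

The sketch ignores query and fragment parts and does not resolve `.` or `..` segments; a production comparison would need to decide how to treat those.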
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3518: [DOC] add performance-tuning with codegen parameters support

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3518: [DOC] add performance-tuning with 
codegen parameters support
URL: https://github.com/apache/carbondata/pull/3518#issuecomment-572498675
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1563/
   




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3570: [HOTFIX] Fix FileNotFound error when concurrent loading

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3570: [HOTFIX] Fix FileNotFound error when 
concurrent loading
URL: https://github.com/apache/carbondata/pull/3570#issuecomment-572496793
 
 
   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1560/
   




[GitHub] [carbondata] asfgit closed pull request #3378: [CARBONDATA-3514] Support Spark 2.4.4 integration

2020-01-09 Thread GitBox
asfgit closed pull request #3378: [CARBONDATA-3514] Support Spark 2.4.4 
integration
URL: https://github.com/apache/carbondata/pull/3378
 
 
   




[GitHub] [carbondata] ravipesala commented on issue #3378: [CARBONDATA-3514] Support Spark 2.4.4 integration

2020-01-09 Thread GitBox
ravipesala commented on issue #3378: [CARBONDATA-3514] Support Spark 2.4.4 
integration
URL: https://github.com/apache/carbondata/pull/3378#issuecomment-572492636
 
 
   LGTM




[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-572479378
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1561/
   




[GitHub] [carbondata] Zhangshunyu commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To Avoid Conflicts when concurrently write data by SDK

2020-01-09 Thread GitBox
Zhangshunyu commented on issue #3567: [CARBONDATA-3656] set Default TaskNo To 
Avoid Conflicts when concurrently write data by SDK
URL: https://github.com/apache/carbondata/pull/3567#issuecomment-572471972
 
 
   retest this please




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364629046
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
+
+There are many components that implement spatial indexing, like GeoSpark that 
use GeoMesa format for spatial query. now carbondata implements  a different 
way of spatial index, more like an UDF.  Its core is to use grid coordinates to 
generate coordinate based hash ID, like Z order, it's also regionally 
continuous.
 
 Review comment:
   1. Capitalize the "n" in "now carbondata"; it starts a new sentence.
   2. Remove the double space in "implements  a".
   3. Remove the double space after the full stop in "UDF.  Its".




[GitHub] [carbondata] Zhangshunyu commented on issue #3570: [HOTFIX] Fix FileNotFound error when concurrent loading

2020-01-09 Thread GitBox
Zhangshunyu commented on issue #3570: [HOTFIX] Fix FileNotFound error when 
concurrent loading
URL: https://github.com/apache/carbondata/pull/3570#issuecomment-572466070
 
 
   LGTM




[GitHub] [carbondata] kevinjmh commented on issue #3518: [DOC] add performance-tuning with codegen parameters support

2020-01-09 Thread GitBox
kevinjmh commented on issue #3518: [DOC] add performance-tuning with codegen 
parameters support
URL: https://github.com/apache/carbondata/pull/3518#issuecomment-572464562
 
 
   LGTM




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364625466
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
+
+
+
+## What does carbondata implement spatial index
 
 Review comment:
   This should use a single #; it is a first-level header.




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364625182
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
+
+A spatial index is a data structure that allows for accessing a spatial object 
efficiently. It is a common technique used by spatial databases.  Without 
indexing, any search for a feature would require a "sequential scan" of every 
record in the database, resulting in much longer processing time. In a spatial 
index construction process, the minimum bounding rectangle serves as an object 
approximation. Various types of spatial indices across commercial and 
open-source databases yield measurable performance differences. Spatial 
indexing techniques are playing a central role in time-critical applications 
and the manipulation of spatial big data.
 
 Review comment:
   This whole section is lifted from 
https://gistbok.ucgis.org/topic-keywords/indexing
   Please credit and cite that source to avoid plagiarism.




[GitHub] [carbondata] niuge01 commented on issue #3570: [HOTFIX] Fix FileNotFound error when concurrent loading

2020-01-09 Thread GitBox
niuge01 commented on issue #3570: [HOTFIX] Fix FileNotFound error when 
concurrent loading
URL: https://github.com/apache/carbondata/pull/3570#issuecomment-572462286
 
 
   please test this




[GitHub] [carbondata] niuge01 opened a new pull request #3570: [HOTFIX] Fix FileNotFound error when concurrent loading

2020-01-09 Thread GitBox
niuge01 opened a new pull request #3570: [HOTFIX] Fix FileNotFound error when 
concurrent loading
URL: https://github.com/apache/carbondata/pull/3570
 
 
### Why is this PR needed?
When multiple threads write data through the SDK, they pass the 
CarbonLoadModel parameter through the same Configuration object. A later 
step may then use a CarbonLoadModel that was set by another thread, which 
leads to a series of problems, such as:
   2020-01-09 14:42:47 ERROR CarbonFactDataWriterImplV3:390 - Problem while writing the index file
   org.apache.carbondata.core.datastore.exception.CarbonDataWriterException: Problem while copying file from local store to carbon store
       at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2772)
       at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2721)
       at org.apache.carbondata.processing.store.writer.AbstractFactDataWriter.commitCurrentFile(AbstractFactDataWriter.java:277)
       at org.apache.carbondata.processing.store.writer.v3.CarbonFactDataWriterImplV3.closeWriter(CarbonFactDataWriterImplV3.java:387)
       at org.apache.carbondata.processing.store.CarbonFactDataHandlerColumnar.closeHandler(CarbonFactDataHandlerColumnar.java:508)
       at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.processingComplete(CarbonRowDataWriterProcessorStepImpl.java:233)
       at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.finish(CarbonRowDataWriterProcessorStepImpl.java:211)
       at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.doExecute(CarbonRowDataWriterProcessorStepImpl.java:175)
       at org.apache.carbondata.processing.loading.steps.CarbonRowDataWriterProcessorStepImpl.execute(CarbonRowDataWriterProcessorStepImpl.java:129)
       at org.apache.carbondata.processing.loading.DataLoadExecutor.execute(DataLoadExecutor.java:52)
       at org.apache.carbondata.hadoop.api.CarbonTableOutputFormat$1.run(CarbonTableOutputFormat.java:278)
       at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
       at java.util.concurrent.FutureTask.run(FutureTask.java:266)
       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
       at java.lang.Thread.run(Thread.java:748)
   Caused by: java.io.FileNotFoundException: D:\Workspace\Carbon-Master\integration\flink\target\test-classes\data\temp\8df45d4dc38449c69147083cdbe79e4d\part-0-156_batchno0-0-null-1578552167465.carbondata (The system cannot find the path specified.)
       at java.io.FileOutputStream.open0(Native Method)
       at java.io.FileOutputStream.open(FileOutputStream.java:270)
       at java.io.FileOutputStream.<init>(FileOutputStream.java:213)
       at java.io.FileOutputStream.<init>(FileOutputStream.java:101)
       at org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile.getDataOutputStream(LocalCarbonFile.java:371)
       at org.apache.carbondata.core.datastore.filesystem.LocalCarbonFile.getDataOutputStream(LocalCarbonFile.java:365)
       at org.apache.carbondata.core.datastore.impl.FileFactory.getDataOutputStream(FileFactory.java:231)
       at org.apache.carbondata.core.util.CarbonUtil.copyLocalFileToCarbonStore(CarbonUtil.java:2799)
       at org.apache.carbondata.core.util.CarbonUtil.copyCarbonDataFileToCarbonStorePath(CarbonUtil.java:2756)
       ... 15 more

### What changes were proposed in this PR?
   Each thread uses its own Configuration object.
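
The idea of giving each writer its own configuration can be illustrated with a minimal, self-contained sketch. This is not the code from the PR: `Conf` is a hypothetical stand-in for a mutable configuration object, and the keys `store.path` and `load.model` are made up for illustration. Each task copies the shared base configuration before writing its own value, so no task can observe a value set by another.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class PerThreadConfSketch {

    // Hypothetical stand-in for a mutable configuration object.
    static final class Conf {
        private final Map<String, String> values = new HashMap<>();

        Conf() {}

        // Copy constructor: the new object shares no state with the original.
        Conf(Conf other) {
            values.putAll(other.values);
        }

        void set(String key, String value) {
            values.put(key, value);
        }

        String get(String key) {
            return values.get(key);
        }
    }

    // Runs n writer tasks; each one stores its own model name in a private
    // copy of the base configuration and reads it back.
    static List<String> runWriters(int n) {
        Conf base = new Conf();
        base.set("store.path", "/tmp/store"); // hypothetical shared setting

        ExecutorService pool = Executors.newFixedThreadPool(n);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                final String model = "model-" + i;
                futures.add(pool.submit(() -> {
                    Conf own = new Conf(base);    // private copy per task
                    own.set("load.model", model); // invisible to other tasks
                    return own.get("load.model");
                }));
            }
            List<String> seen = new ArrayList<>();
            for (Future<String> f : futures) {
                seen.add(f.get());
            }
            return seen;
        } catch (Exception e) {
            throw new RuntimeException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

If every task instead called `base.set("load.model", model)` on the shared object, a task could read back a model written by a different thread, which is the kind of cross-thread interference described above.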
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- No
   
   
   




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #3520: [WIP]add spatial-index user guid to doc

2020-01-09 Thread GitBox
VenuReddy2103 commented on a change in pull request #3520: [WIP]add 
spatial-index user guid to doc
URL: https://github.com/apache/carbondata/pull/3520#discussion_r364620267
 
 

 ##
 File path: docs/spatial-index-guide.md
 ##
 @@ -0,0 +1,94 @@
+
+
+## What is spatial index
 
 Review comment:
   This is the first header in the doc, so it should use a single #.




[jira] [Commented] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011544#comment-17011544
 ] 

Venugopal Reddy K commented on CARBONDATA-3548:
---

Updated for:
 # Algorithm description.
 # Use of an IN filter expression with a LIST expression containing all the 
geohashIds to be filtered, instead of a RANGE filter expression, as this 
improves query performance significantly.

> Support for Geospatial indexing
> ---
>
> Key: CARBONDATA-3548
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3548
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Venugopal Reddy K
>Priority: Major
> Attachments: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf, 
> Geospatial Index Design Doc-OpenSource.pdf
>
>  Time Spent: 63h
>  Remaining Estimate: 0h
>
> In general, a database may contain geographical location data. For instance, 
> telecom operators need to perform analytics based on a particular region or 
> on cell tower IDs (within a region), and may include geographical locations 
> for a particular period of time. At present, Carbon does not have native 
> support for storing geographical locations/coordinates or for filter queries 
> based on them. Longitude and latitude can still be treated as independent 
> columns, sorted hierarchically, and stored.
> But when longitude and latitude are treated independently, 2D space 
> is linearized, i.e., points in the two-dimensional domain are ordered by 
> sorting first on longitude and then on latitude. Thus, data is not ordered by 
> geospatial proximity. Hence range queries require a lot of IO operations and 
> query performance is degraded.
> To alleviate this, we can use a z-order curve to store geospatial data 
> points. This ensures that geographically nearby points are stored in the same 
> block/blocklet. This reduces the IO operations for range queries and improves 
> query performance. It can also support polygon queries on geodata. The 
> attached design document describes this in detail.
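
The z-order idea in the description above can be sketched as bit interleaving of two grid coordinates. This is a generic Morton-code illustration, not the algorithm from the attached design document; the class name `ZOrderSketch` and the choice of 16-bit non-negative grid coordinates are assumptions made for the example.

```java
final class ZOrderSketch {

    // Spreads the lower 16 bits of v so that a zero bit sits between
    // every pair of original bits (bit i moves to bit 2*i).
    static int spreadBits(int v) {
        v &= 0x0000FFFF;
        v = (v | (v << 8)) & 0x00FF00FF;
        v = (v | (v << 4)) & 0x0F0F0F0F;
        v = (v | (v << 2)) & 0x33333333;
        v = (v | (v << 1)) & 0x55555555;
        return v;
    }

    // Interleaves two 16-bit grid coordinates into one z-order value:
    // bits of x land on even positions, bits of y on odd positions.
    static long interleave(int x, int y) {
        return (long) spreadBits(x) | ((long) spreadBits(y) << 1);
    }
}
```

With this encoding the four cells of the 2x2 square at grid coordinates (2,2), (3,2), (2,3), (3,3) map to the consecutive values 12, 13, 14, 15. Sorting by the interleaved value therefore tends to keep nearby cells close together in storage, whereas sorting first on one coordinate and then on the other does not.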



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: (was: Geospatial Index Design Doc-OpenSource-Version 
2.0.pdf)



[jira] [Updated] (CARBONDATA-3548) Support for Geospatial indexing

2020-01-09 Thread Venugopal Reddy K (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-3548?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Venugopal Reddy K updated CARBONDATA-3548:
--
Attachment: Geospatial Index Design Doc-OpenSource-Version 2.0.pdf



[GitHub] [carbondata] CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later optimize insert.

2020-01-09 Thread GitBox
CarbonDataQA1 commented on issue #3538: [WIP] Separate Insert and load to later 
optimize insert.
URL: https://github.com/apache/carbondata/pull/3538#issuecomment-572439137
 
 
   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/1556/
   

