[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732722971


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3121/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732721963


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4874/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732691284


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3120/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732686545


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4873/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


Karan980 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732682464


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Indhumathi27 commented on a change in pull request #4000: [CARBONDATA-4020] Fixed drop index when multiple index exists

2020-11-23 Thread GitBox


Indhumathi27 commented on a change in pull request #4000:
URL: https://github.com/apache/carbondata/pull/4000#discussion_r529229059



##
File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala
##
@@ -660,6 +660,22 @@ class BloomCoarseGrainIndexFunctionSuite
      sql(s"SELECT * FROM $normalTable WHERE salary='1040'"))
  }

+  test("test drop index when more than one bloom index exists") {
+    sql(s"CREATE TABLE $bloomSampleTable " +
+      "(id int,name string,salary int)STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id')")
+    sql(s"CREATE index index1 ON TABLE $bloomSampleTable(id) as 'bloomfilter' " +
+      "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')")
+    sql(s"CREATE index index2 ON TABLE $bloomSampleTable (name) as 'bloomfilter' " +
+      "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')")
+    sql(s"insert into $bloomSampleTable values(1,'nihal',20)")
+    sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect()
+    checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1")

Review comment:
   ```suggestion
   checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1", "index2")
   ```
   Remove next line

##
File path: integration/spark/src/test/scala/org/apache/carbondata/index/bloom/BloomCoarseGrainIndexFunctionSuite.scala
##
@@ -660,6 +660,22 @@ class BloomCoarseGrainIndexFunctionSuite
      sql(s"SELECT * FROM $normalTable WHERE salary='1040'"))
  }

+  test("test drop index when more than one bloom index exists") {
+    sql(s"CREATE TABLE $bloomSampleTable " +
+      "(id int,name string,salary int)STORED as carbondata TBLPROPERTIES('SORT_COLUMNS'='id')")
+    sql(s"CREATE index index1 ON TABLE $bloomSampleTable(id) as 'bloomfilter' " +
+      "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')")
+    sql(s"CREATE index index2 ON TABLE $bloomSampleTable (name) as 'bloomfilter' " +
+      "PROPERTIES ('BLOOM_SIZE'='64', 'BLOOM_FPP'='0.1', 'BLOOM_COMPRESS'='true')")
+    sql(s"insert into $bloomSampleTable values(1,'nihal',20)")
+    sql(s"SHOW INDEXES ON TABLE $bloomSampleTable").collect()
+    checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index1")
+    checkExistence(sql(s"SHOW INDEXES ON TABLE $bloomSampleTable"), true, "index2")
+    sql(s"drop index index1 on $bloomSampleTable")
+    sql(s"show indexes on table $bloomSampleTable").show()

Review comment:
   remove this line

##
File path: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/index/DropIndexCommand.scala
##
@@ -184,10 +184,10 @@ private[sql] case class DropIndexCommand(
     parentCarbonTable = getRefreshedParentTable(sparkSession, dbName)
     val indexMetadata = parentCarbonTable.getIndexMetadata
     if (null != indexMetadata && null != indexMetadata.getIndexesMap) {
-      val hasCgFgIndexes =
-        !(indexMetadata.getIndexesMap.size() == 1 &&
-          indexMetadata.getIndexesMap.containsKey(IndexType.SI.getIndexProviderName))
-      if (hasCgFgIndexes) {
+      val hasCgFgIndexes = indexMetadata.getIndexesMap.size() != 0 &&

Review comment:
   Please add a comment describing the case in which we need to set 'indexExists' to false.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Closed] (CARBONDATA-4021) With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment.

2020-11-23 Thread Prasanna Ravichandran (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Prasanna Ravichandran closed CARBONDATA-4021.
-
Resolution: Not A Problem

> With Index server running, Upon executing count* we are getting the below 
> error, after adding the parquet and ORC segment. 
> ---
>
> Key: CARBONDATA-4021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4021
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
>
> We are getting the below issue when the index server is enabled and index server
> fallback disable is configured as true. With count(*) we are getting the below
> error after adding the parquet and ORC segments.
> Queries and error:
> > use rps;
> +-+|
> Result  |
> +-+
> +-+
> No rows selected (0.054 seconds)
> > drop table if exists uniqdata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.229 seconds)
> > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) stored as carbondata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.756 seconds)
> > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> > table uniqdata 
> > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO  : Execution ID: 95
> +-+|
> Result  |
> +-+
> +-+
> No rows selected(2.789 seconds)
>  > use default;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.052 seconds)
>  > drop table if exists uniqdata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.122 seconds)
> > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) stored as carbondata;
> +-+
> |  Result  |
> +-+
> +-+
> No rows selected (0.508 seconds)
> > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> > table uniqdata 
> > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO  : Execution ID: 108
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.316 seconds)
> > drop table if exists uniqdata_parquet;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.668 seconds)
> > CREATE TABLE uniqdata_parquet (cust_id int,cust_name 
> > String,active_emui_version string, dob timestamp, doj timestamp, 
> > bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), 
> > decimal_column2 decimal(36,36),double_column1 double, double_column2 
> > double,integer_column1 int) stored as parquet;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.397 seconds)
> > insert into uniqdata_parquet select * from uniqdata;
> INFO  : Execution ID: 116
> +-+
> |Result|
> +-+
> +-+
> No rows selected (4.805 seconds)
> >  drop table if exists uniqdata_orc;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.553 seconds)
> > CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) using orc;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.396 seconds)
> > insert into uniqdata_orc select * from uniqdata;
> INFO  : Execution ID: 122
> +-+
> |Result|
> +-+
> +-+
> No rows selected (3.403 seconds)
> > use rps;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.06 seconds)
> > Alter table uniqdata add segment options 
> > ('path'='hdfs://hacluster/user/hive/warehouse/uniqdata_parquet','format'='parquet');
> INFO  : Execution ID: 126
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.511 seconds)
> > Alter table uniqdata add segment options 

[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-11-23 Thread GitBox


VenuReddy2103 commented on a change in pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#discussion_r529216859



##
File path: geo/src/main/java/org/apache/carbondata/geo/GeoConstants.java
##
@@ -26,4 +26,31 @@ private GeoConstants() {

   // GeoHash type Spatial Index
   public static final String GEOHASH = "geohash";
+
+  // Regular expression to parse input polygons for IN_POLYGON_LIST
+  // public static final String POLYGON_REG_EXPRESSION = "POLYGON \\(\\(.*?\\)\\)";
+  public static final String POLYGON_REG_EXPRESSION = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))";
+
+  // Regular expression to parse input polylines for IN_POLYLINE_LIST
+  public static final String POLYLINE_REG_EXPRESSION = "LINESTRING \\(.*?\\)";
+
+  // Regular expression to parse input rangelists for IN_POLYGON_RANGE_LIST
+  public static final String RANGELIST_REG_EXPRESSION = "(?<=RANGELIST \\()(.*?)(?=\\))";
+
+  // delimiter of input points or ranges
+  public static final String DEFAULT_DELIMITER = ",";
+
+  // conversion factor of angle to radian
+  public static final double CONVERT_FACTOR = 180.0;
+  // Earth radius
+  public static final double EARTH_RADIUS = 6371004.0;

Review comment:
   Can remove EARTH_RADIUS const def from GeoHashIndex.java now.
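
   As an aside, the lookbehind/lookahead form of POLYGON_REG_EXPRESSION above keeps only the coordinate list between "POLYGON ((" and "))". A minimal, self-contained sketch of applying that pattern (the WKT-style sample input is made up for illustration):

   ```scala
   import java.util.regex.Pattern

   object PolygonRegexSketch {
     // Same pattern string as POLYGON_REG_EXPRESSION in the hunk above.
     private val PolygonRegExpression = "(?<=POLYGON \\(\\()(.*?)(?=(\\)\\)))"

     def main(args: Array[String]): Unit = {
       // Hypothetical input listing two polygons.
       val input = "POLYGON ((1 1, 1 2, 2 2, 1 1)), POLYGON ((3 3, 3 4, 4 4, 3 3))"
       val matcher = Pattern.compile(PolygonRegExpression).matcher(input)
       while (matcher.find()) {
         // Prints "1 1, 1 2, 2 2, 1 1" and then "3 3, 3 4, 4 4, 3 3"
         println(matcher.group(1))
       }
     }
   }
   ```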





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732662834


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4871/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020#issuecomment-732662506


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3118/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


marchpure commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732653163


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4019:
URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732651208


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3117/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732649478


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3116/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4019:
URL: https://github.com/apache/carbondata/pull/4019#issuecomment-732649284


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4870/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732648042


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4869/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] VenuReddy2103 commented on a change in pull request #4010: [CARBONDATA-4050]Avoid redundant RPC calls to get file status when CarbonFile is instantiated with fileStatus construct

2020-11-23 Thread GitBox


VenuReddy2103 commented on a change in pull request #4010:
URL: https://github.com/apache/carbondata/pull/4010#discussion_r529192088



##
File path: core/src/main/java/org/apache/carbondata/core/datastore/filesystem/AbstractDFSCarbonFile.java
##
@@ -541,7 +547,10 @@ public boolean createNewLockFile() throws IOException {
   @Override
   public String[] getLocations() throws IOException {
     BlockLocation[] blkLocations;
-    FileStatus fileStatus = fileSystem.getFileStatus(path);
+    FileStatus fileStatus = this.fileStatus;

Review comment:
   Please refer to the reply in the comment below.
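
   For context, the hunk above (PR #4010) reuses the FileStatus captured when the CarbonFile was constructed instead of issuing another getFileStatus RPC. A minimal sketch of that reuse pattern, assuming a nullable cached field and the standard Hadoop FileSystem API (class and member names here are illustrative, not the actual CarbonData code):

   ```scala
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.{FileStatus, FileSystem, Path}

   // Illustrative wrapper: when a FileStatus was supplied at construction time,
   // reuse it; otherwise fall back to the NameNode RPC.
   class CachedStatusFile(path: Path, cachedStatus: Option[FileStatus] = None) {
     private val fileSystem: FileSystem = path.getFileSystem(new Configuration())

     def locations(): Array[String] = {
       // Reuse the cached status when present; call getFileStatus only as a fallback.
       val status = cachedStatus.getOrElse(fileSystem.getFileStatus(path))
       fileSystem.getFileBlockLocations(status, 0, status.getLen).flatMap(_.getHosts)
     }
   }
   ```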





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Zhangshunyu opened a new pull request #4020: [CARBONDATA-4054] Support data size control for minor compaction

2020-11-23 Thread GitBox


Zhangshunyu opened a new pull request #4020:
URL: https://github.com/apache/carbondata/pull/4020


### Why is this PR needed?
   Currently, minor compaction only considers the number of segments, while major
   compaction only considers the total size of segments. Consider a scenario where the
   user wants to trigger minor compaction by the number of segments but does not want
   to merge segments whose data size exceeds a threshold, for example 2 GB, because
   merging such large segments is unnecessary and time-consuming.

### What changes were proposed in this PR?
   Add a parameter to control the size threshold of segments included in minor
   compaction, so that the user can exclude a segment from minor compaction once its
   data size exceeds the threshold. The parameter can be set at system level and at
   table level; if it is not set, a default value is used.
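
   As a hedged illustration of the intent (the table property name below is an assumption for illustration only, and a Spark session with the Carbon extensions is assumed):

   ```scala
   import org.apache.spark.sql.SparkSession

   object MinorCompactionSizeControlSketch {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder()
         .appName("minor-compaction-size-control-sketch")
         .getOrCreate()

       // Hypothetical table-level property: segments larger than 2048 MB are skipped
       // when minor compaction selects segments by count.
       spark.sql(
         """CREATE TABLE IF NOT EXISTS sales_sketch (id INT, amount DOUBLE)
           |STORED AS carbondata
           |TBLPROPERTIES ('MINOR_COMPACTION_SIZE'='2048')""".stripMargin)

       // Minor compaction is triggered the usual way; only the segment selection
       // is expected to honour the size threshold.
       spark.sql("ALTER TABLE sales_sketch COMPACT 'MINOR'")
       spark.stop()
     }
   }
   ```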
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-4051) Geo spatial index algorithm improvement and UDFs enhancement

2020-11-23 Thread Jiayu Shen (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiayu Shen updated CARBONDATA-4051:
---
Attachment: Genex Cloud Carbon Spatial Index Specification.docx

> Geo spatial index algorithm improvement and UDFs enhancement
> 
>
> Key: CARBONDATA-4051
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4051
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Jiayu Shen
>Priority: Minor
> Attachments: Genex Cloud Carbon Spatial Index 
> Specification.docx
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> The requirement is from SEQ; the related algorithms are provided by the Discovery group.
> 1. Replace the geohash encoding algorithm and reduce the required properties of
> CREATE TABLE. For example,
> {code:java}
> CREATE TABLE geoTable(
>  timevalue BIGINT,
>  longitude LONG,
>  latitude LONG) COMMENT "This is a GeoTable"
>  STORED AS carbondata
>  TBLPROPERTIES ($customProperties 'SPATIAL_INDEX'='mygeohash',
>  'SPATIAL_INDEX.mygeohash.type'='geohash',
>  'SPATIAL_INDEX.mygeohash.sourcecolumns'='longitude, latitude',
>  'SPATIAL_INDEX.mygeohash.originLatitude'='39.832277',
>  'SPATIAL_INDEX.mygeohash.gridSize'='50',
>  'SPATIAL_INDEX.mygeohash.conversionRatio'='100'){code}
> 2. Add geo query UDFs
> query filter UDFs :
>  * _*InPolygonList (List polygonList, OperationType opType)*_
>  * _*InPolylineList (List polylineList, Float bufferInMeter)*_
>  * _*InPolygonRangeList (List RangeList, **OperationType opType**)*_
> *The operation type only supports* (see the usage sketch after this description):
>  * *"OR", meaning the union of two polygons*
>  * *"AND", meaning the intersection of two polygons*
> geo util UDFs :
>  * _*GeoIdToGridXy(Long geoId) :* *Pair*_
>  * _*LatLngToGeoId(**Long* *latitude, Long* *longitude) : Long*_
>  * _*GeoIdToLatLng(Long geoId) : Pair*_
>  * _*ToUpperLayerGeoId(Long geoId) : Long*_
>  * _*ToRangeList (String polygon) : List*_
> 3. Currently GeoID is a column created internally for spatial tables; this PR
> supports customizing the GeoID column during LOAD/INSERT INTO. For example,
> {code:java}
> INSERT INTO geoTable SELECT 0,157542840,116285807,40084087;
> It used to be as below ('855280799612' is generated internally):
> +--------------+-----------+-----------+----------+
> | mygeohash    | timevalue | longitude | latitude |
> +--------------+-----------+-----------+----------+
> | 855280799612 | 157542840 | 116285807 | 40084087 |
> +--------------+-----------+-----------+----------+
> but now it is:
> +--------------+-----------+-----------+----------+
> | mygeohash    | timevalue | longitude | latitude |
> +--------------+-----------+-----------+----------+
> | 0            | 157542840 | 116285807 | 40084087 |
> +--------------+-----------+-----------+----------+{code}
>  
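
A hedged usage sketch of the filter UDFs listed above. The coordinates and the exact argument formatting are illustrative assumptions; a session with the CarbonData geo extensions and the geoTable from the quoted example is assumed, and the spatial index guide should be consulted for the authoritative syntax.

```scala
import org.apache.spark.sql.SparkSession

object GeoUdfUsageSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("geo-udf-usage-sketch").getOrCreate()

    // IN_POLYGON_LIST with the "OR" operation type: rows falling in the union of
    // the two polygons (polygon coordinates are made up for illustration).
    spark.sql(
      """SELECT timevalue, longitude, latitude FROM geoTable
        |WHERE IN_POLYGON_LIST(
        |  'POLYGON ((120.0 30.0, 120.1 30.0, 120.1 30.1, 120.0 30.0)),
        |   POLYGON ((121.0 31.0, 121.1 31.0, 121.1 31.1, 121.0 31.0))',
        |  'OR')""".stripMargin).show()

    spark.stop()
  }
}
```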



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"

2020-11-23 Thread GitBox


ajantha-bhat commented on a change in pull request #4019:
URL: https://github.com/apache/carbondata/pull/4019#discussion_r529175578



##
File path: integration/spark/src/main/scala/org/apache/spark/util/AlterTableUtil.scala
##
@@ -332,12 +332,26 @@ object AlterTableUtil {
       tableProperties: mutable.Map[String, String],
       oldColumnName: String,
       newColumnName: String): Unit = {
+    val columnProperties = Seq("NO_INVERTED_INDEX",
+      "INVERTED_INDEX",
+      "INDEX_COLUMNS",
+      "COLUMN_META_CACHE",
+      "DICTIONARY_INCLUDE",

Review comment:
   Also, INDEX_COLUMNS was changed to SPATIAL_INDEX, I think; refer to docs/spatial-index-guide.md.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"

2020-11-23 Thread GitBox


ajantha-bhat commented on a change in pull request #4019:
URL: https://github.com/apache/carbondata/pull/4019#discussion_r529174796



##
File path: integration/spark/src/test/scala/org/apache/spark/carbondata/restructure/vectorreader/AlterTableColumnRenameTestCase.scala
##
@@ -38,6 +38,15 @@ class AlterTableColumnRenameTestCase extends QueryTest with BeforeAndAfterAll {
     assert(null == carbonTable.getColumnByName("empname"))
   }

+  test("CARBONDATA-4053 test rename column when column name is a") {
+    sql("create table simple_table(a int) stored as carbondata")
+    sql("alter table simple_table change a a1 int")
+    val carbonTable = CarbonMetadata.getInstance().getCarbonTable("default", "simple_table")

Review comment:
   This issue happens when you have a table property, say SORT_COLUMNS = "a,b". When the column name is changed from a to a1, it was setting SORT_COLUMNS = "a1" instead of SORT_COLUMNS = "a1,b". So your current test case cannot reproduce the issue; please add a test case that fails without this change by keeping some table properties.
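
   For illustration, a hedged sketch of such a reproduction, assuming the suite's sql(...) helper as in the hunk above (table and column names are made up):

   ```scala
   // Rename a column that is listed in SORT_COLUMNS together with another column,
   // then check that the property keeps both entries.
   sql("drop table if exists rename_repro")
   sql("create table rename_repro(a int, b string) stored as carbondata " +
     "tblproperties('sort_columns'='a,b')")
   sql("alter table rename_repro change a a1 int")
   // Expected after the fix: SORT_COLUMNS = 'a1,b' (not just 'a1').
   sql("describe formatted rename_repro").show(100, false)
   ```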





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"

2020-11-23 Thread GitBox


ajantha-bhat commented on a change in pull request #4019:
URL: https://github.com/apache/carbondata/pull/4019#discussion_r529174200



##
File path: integration/spark/src/main/scala/org/apache/spark/util/AlterTableUtil.scala
##
@@ -332,12 +332,26 @@ object AlterTableUtil {
       tableProperties: mutable.Map[String, String],
       oldColumnName: String,
       newColumnName: String): Unit = {
+    val columnProperties = Seq("NO_INVERTED_INDEX",
+      "INVERTED_INDEX",
+      "INDEX_COLUMNS",
+      "COLUMN_META_CACHE",
+      "DICTIONARY_INCLUDE",

Review comment:
   We have removed DICTIONARY_INCLUDE and DICTIONARY_EXCLUDE from 2.0. Please check the applicable table properties and keep only those; also add anything new that was introduced in 2.0.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] jack86596 opened a new pull request #4019: [CARBONDATA-4053] Fix alter table rename column failed when column name is "a"

2020-11-23 Thread GitBox


jack86596 opened a new pull request #4019:
URL: https://github.com/apache/carbondata/pull/4019


### Why is this PR needed?
   Alter table rename column failed because the content of tblproperties was replaced
   with the new column name incorrectly, even where that content was not related to the
   column name.

### What changes were proposed in this PR?
   Instead of calling the replace method on the property value directly, first filter
   out the properties that are related to column names, then find the matching old
   column name and replace it with the new name.
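
   A minimal sketch of that filter-then-replace idea (the property list and helper names are illustrative, not the exact code in AlterTableUtil):

   ```scala
   import scala.collection.mutable

   object RenameColumnInPropertiesSketch {
     // Only properties whose values are column-name lists are touched; every other
     // entry in tblproperties is left alone.
     private val columnListProperties =
       Seq("SORT_COLUMNS", "COLUMN_META_CACHE", "NO_INVERTED_INDEX",
         "INVERTED_INDEX", "RANGE_COLUMN", "LONG_STRING_COLUMNS")

     def renameColumnInProperties(
         tableProperties: mutable.Map[String, String],
         oldColumnName: String,
         newColumnName: String): Unit = {
       tableProperties.keys.toSeq
         .filter(key => columnListProperties.contains(key.toUpperCase))
         .foreach { key =>
           // Replace only whole-token matches of the old column name, never substrings.
           val updated = tableProperties(key).split(",").map(_.trim).map { col =>
             if (col.equalsIgnoreCase(oldColumnName)) newColumnName else col
           }.mkString(",")
           tableProperties.put(key, updated)
         }
     }

     def main(args: Array[String]): Unit = {
       val props = mutable.Map("sort_columns" -> "a,b", "comment" -> "a table about a")
       renameColumnInProperties(props, "a", "a1")
       println(props) // sort_columns becomes "a1,b"; the comment entry is untouched
     }
   }
   ```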
   
### Does this PR introduce any user interface change?
- No
   
### Is any new testcase added?
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732553078


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3115/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732552402


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4868/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732364295


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3114/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732363870


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4867/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (CARBONDATA-3896) Throw an exception using an index server query

2020-11-23 Thread Karan (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-3896?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237544#comment-17237544
 ] 

Karan commented on CARBONDATA-3896:
---

Please share the details of the query for which you are getting the above error.

> Throw an exception using an index server query
> --
>
> Key: CARBONDATA-3896
> URL: https://issues.apache.org/jira/browse/CARBONDATA-3896
> Project: CarbonData
>  Issue Type: Bug
>  Components: core
>Affects Versions: 1.6.1
>Reporter: li
>Priority: Major
> Fix For: 1.6.1
>
>
> 2020-07-10 10:49:02 WARN Server:1853 - Unable to read call parameters for 
> client 10.10.151.15on connection protocol Server for rpcKind RPC_WRITABLE
> java.io.EOFException
>  at java.io.DataInputStream.readFully(DataInputStream.java:197)
>  at java.io.DataInputStream.readUTF(DataInputStream.java:609)
>  at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>  at 
> org.apache.carbondata.core.datamap.DistributableDataMapFormat.readFields(DistributableDataMapFormat.java:286)
>  at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
>  at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:161)
>  at 
> org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1851)
>  at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1783)
>  at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1541)
>  at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
>  at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
>  at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
> 2020-07-10 10:49:02 INFO Server:780 - Socket Reader #1 for port 9596: 
> readAndProcess from client 10.10.151.15 threw exception 
> [org.apache.hadoop.ipc.RpcServerException: IPC server unable to read call 
> parameters: null]
> 2020-07-10 10:50:00 WARN Server:1853 - Unable to read call parameters for 
> client 10.10.151.15on connection protocol Server for rpcKind RPC_WRITABLE
> java.io.EOFException
>  at java.io.DataInputStream.readFully(DataInputStream.java:197)
>  at java.io.DataInputStream.readUTF(DataInputStream.java:609)
>  at java.io.DataInputStream.readUTF(DataInputStream.java:564)
>  at 
> org.apache.carbondata.core.datamap.DistributableDataMapFormat.readFields(DistributableDataMapFormat.java:286)
>  at org.apache.hadoop.io.ObjectWritable.readObject(ObjectWritable.java:285)
>  at 
> org.apache.hadoop.ipc.WritableRpcEngine$Invocation.readFields(WritableRpcEngine.java:161)
>  at 
> org.apache.hadoop.ipc.Server$Connection.processRpcRequest(Server.java:1851)
>  at org.apache.hadoop.ipc.Server$Connection.processOneRpc(Server.java:1783)
>  at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1541)
>  at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:771)
>  at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:637)
>  at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:608)
> 2020-07-10 10:50:00 INFO Server:780 - Socket Reader #1 for port 9596: 
> readAndProcess from client 10.10.151.15 threw exception 
> [org.apache.hadoop.ipc.RpcServerException: IPC server unable to read call 
> parameters: null]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (CARBONDATA-4021) With Index server running, Upon executing count* we are getting the below error, after adding the parquet and ORC segment.

2020-11-23 Thread Karan (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4021?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17237543#comment-17237543
 ] 

Karan commented on CARBONDATA-4021:
---

Caching parquet or ORC segments in the Index Server is not supported. Please do not
enable the index server while querying a carbondata table that has parquet or ORC
segments. Even if the index server is ON, please make sure that fallback is not
disabled.
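
For reference, a hedged sketch of the relevant configuration. The fallback property key shown is an assumption; please verify the exact key against the CarbonData configuration documentation.

```scala
import org.apache.carbondata.core.util.CarbonProperties

object IndexServerConfigSketch {
  def main(args: Array[String]): Unit = {
    val props = CarbonProperties.getInstance()
    // Enable the distributed index server.
    props.addProperty("carbon.enable.index.server", "true")
    // Assumed key for the fallback switch: keep fallback available so queries on
    // tables that also contain parquet/ORC segments can still be served without
    // the index server.
    props.addProperty("carbon.index.server.fallback.disable", "false")
  }
}
```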

> With Index server running, Upon executing count* we are getting the below 
> error, after adding the parquet and ORC segment. 
> ---
>
> Key: CARBONDATA-4021
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4021
> Project: CarbonData
>  Issue Type: Bug
>Affects Versions: 2.0.0
>Reporter: Prasanna Ravichandran
>Priority: Major
>
> We are getting the below issue when the index server is enabled and index server
> fallback disable is configured as true. With count(*) we are getting the below
> error after adding the parquet and ORC segments.
> Queries and error:
> > use rps;
> +-+|
> Result  |
> +-+
> +-+
> No rows selected (0.054 seconds)
> > drop table if exists uniqdata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.229 seconds)
> > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) stored as carbondata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.756 seconds)
> > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> > table uniqdata 
> > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO  : Execution ID: 95
> +-+|
> Result  |
> +-+
> +-+
> No rows selected(2.789 seconds)
>  > use default;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.052 seconds)
>  > drop table if exists uniqdata;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.122 seconds)
> > CREATE TABLE uniqdata (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) stored as carbondata;
> +-+
> |  Result  |
> +-+
> +-+
> No rows selected (0.508 seconds)
> > load data inpath 'hdfs://hacluster/user/prasanna/2000_UniqData.csv' into 
> > table uniqdata 
> > options('fileheader'='cust_id,cust_name,active_emui_version,dob,doj,bigint_column1,bigint_column2,decimal_column1,decimal_column2,double_column1,double_column2,integer_column1','bad_records_action'='force');
> INFO  : Execution ID: 108
> +-+
> |Result|
> +-+
> +-+
> No rows selected (1.316 seconds)
> > drop table if exists uniqdata_parquet;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.668 seconds)
> > CREATE TABLE uniqdata_parquet (cust_id int,cust_name 
> > String,active_emui_version string, dob timestamp, doj timestamp, 
> > bigint_column1 bigint,bigint_column2 bigint,decimal_column1 decimal(30,10), 
> > decimal_column2 decimal(36,36),double_column1 double, double_column2 
> > double,integer_column1 int) stored as parquet;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.397 seconds)
> > insert into uniqdata_parquet select * from uniqdata;
> INFO  : Execution ID: 116
> +-+
> |Result|
> +-+
> +-+
> No rows selected (4.805 seconds)
> >  drop table if exists uniqdata_orc;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.553 seconds)
> > CREATE TABLE uniqdata_orc (cust_id int,cust_name String,active_emui_version 
> > string, dob timestamp, doj timestamp, bigint_column1 bigint,bigint_column2 
> > bigint,decimal_column1 decimal(30,10), decimal_column2 
> > decimal(36,36),double_column1 double, double_column2 double,integer_column1 
> > int) using orc;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.396 seconds)
> > insert into uniqdata_orc select * from uniqdata;
> INFO  : Execution ID: 122
> +-+
> |Result|
> +-+
> +-+
> No rows selected (3.403 seconds)
> > use rps;
> +-+
> |Result|
> +-+
> +-+
> No rows selected (0.06 seconds)
> > Alter table uniqdata add segment options 
> > 

[GitHub] [carbondata] ajantha-bhat commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


ajantha-bhat commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732304718


   retest this please



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] ajantha-bhat commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


ajantha-bhat commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528870553



##
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
   ok





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [wip]test

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-732267359


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3113/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732264929


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3112/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4018: [wip]test

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018#issuecomment-732263808


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4866/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732260135


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4865/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 commented on pull request #4013: [WIP] Remove automatic data cleaning function from all features

2020-11-23 Thread GitBox


akashrn5 commented on pull request #4013:
URL: https://github.com/apache/carbondata/pull/4013#issuecomment-732219124


   @QiangCai I think we need to wait before considering these changes, because multiple people are already working on the same area.



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] marchpure commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-11-23 Thread GitBox


marchpure commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732209550


   @MarvinLitt  Please review



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] akashrn5 opened a new pull request #4018: [wip]test

2020-11-23 Thread GitBox


akashrn5 opened a new pull request #4018:
URL: https://github.com/apache/carbondata/pull/4018


### Why is this PR needed?


### What changes were proposed in this PR?
   
   
### Does this PR introduce any user interface change?
- No
- Yes. (please explain the change and update document)
   
### Is any new testcase added?
- No
- Yes
   
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732162102


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3111/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732158583


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4864/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732105514


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4863/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732104984


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3110/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4017:
URL: https://github.com/apache/carbondata/pull/4017#issuecomment-732098794


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3107/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4017:
URL: https://github.com/apache/carbondata/pull/4017#issuecomment-732095318


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4860/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732088276


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3106/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#issuecomment-732084303


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4859/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732048532


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3108/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732046920


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4861/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on a change in pull request #4017: [CARBONDATA-4022] Fix invalid path issue for segment added through alter table add segment query.

2020-11-23 Thread GitBox


Karan980 commented on a change in pull request #4017:
URL: https://github.com/apache/carbondata/pull/4017#discussion_r528569260



##
File path: core/src/main/java/org/apache/carbondata/core/indexstore/ExtendedBlocklet.java
##
@@ -219,7 +220,13 @@ public void deserializeFields(DataInput in, String[] locations, String tablePath
     if (in.readBoolean()) {
       indexUniqueId = in.readUTF();
     }
-    setFilePath(tablePath + getPath());
+    String filePath = getPath();
+    if (filePath.startsWith(CarbonCommonConstants.FILE_SEPARATOR) ||

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732037653


   Build Failed  with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3103/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4004: [WIP] update benchmark

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4004:
URL: https://github.com/apache/carbondata/pull/4004#issuecomment-732032392


   Build Failed  with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4856/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528558754



##
File path: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/addsegment/AddSegmentTestCase.scala
##
@@ -781,6 +781,26 @@ class AddSegmentTestCase extends QueryTest with BeforeAndAfterAll {
     sql(s"drop table $tableName")
   }

+  test("Test add segment by carbon written by sdk having old timestamp") {
+    sql(s"drop table if exists external_primitive")
+    sql(
+      s"""
+         |create table external_primitive (id int, name string, rank smallint, salary double,
+         | active boolean, dob date, doj timestamp, city string, dept string) stored as carbondata
+         |""".stripMargin)
+    val externalSegmentPathWithOldTimestamp = storeLocation + "/" +
+      "external_segment_with_old_timestamp"
+    val externalSegmentPath = storeLocation + "/" + "external_segment"
+    FileFactory.deleteAllFilesOfDir(new File(externalSegmentPath))
+    copy(externalSegmentPathWithOldTimestamp, externalSegmentPath)

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528558123



##
File path: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -572,4 +574,18 @@ public ReadCommittedScope getReadCommitted(JobContext job, AbsoluteTableIdentifi
   public void setReadCommittedScope(ReadCommittedScope readCommittedScope) {
     this.readCommittedScope = readCommittedScope;
   }
+
+  public String getSegmentIdFromFilePath(String filePath) {

Review comment:
   getSegmentId() also returns the segmentId from the segment's LoadMetaDataDetails when it is not null. It is only null when the segmentId is present in the name of the carbondata file of an SDK segment, and there is no existing method to get it from there.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] Karan980 commented on a change in pull request #4009: [CARBONDATA-4029] Fix Old Timestamp issue in alter add segement

2020-11-23 Thread GitBox


Karan980 commented on a change in pull request #4009:
URL: https://github.com/apache/carbondata/pull/4009#discussion_r528556523



##
File path: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
##
@@ -367,8 +368,9 @@ protected FileSplit makeSplit(String segmentId, String 
filePath, long start, lon
   String[] deleteDeltaFilePath = null;
   if (isIUDTable) {
 // In case IUD is not performed in this table avoid searching for
-// invalidated blocks.
-if (CarbonUtil
+// invalidated blocks. No need to check validation for splits written 
by SDK.
+String segmentId = getSegmentIdFromFilePath(inputSplit.getFilePath());

Review comment:
   Done





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732021580


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4854/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4012: [CARBONDATA-4051] Geo spatial index algorithm improvement and UDFs enhancement

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4012:
URL: https://github.com/apache/carbondata/pull/4012#issuecomment-732021180


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3101/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Updated] (CARBONDATA-4054) Size control of minor compaction

2020-11-23 Thread ZHANGSHUNYU (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZHANGSHUNYU updated CARBONDATA-4054:

Description: 
Currently, minor compaction only considers the number of segments, while major
compaction only considers the total size of segments. Consider a scenario where the
user wants to trigger minor compaction by the number of segments but does not want
to merge segments whose data size exceeds a threshold, for example 2 GB, because
merging such large segments is unnecessary and time-consuming.
So we need to add a parameter to control the size threshold of segments included
in minor compaction, so that the user can exclude a segment from minor compaction
once its data size exceeds the threshold; of course there must be a default value.

  was:
h1. Currentlly, minor compaction only consider the num of segments and major
compaction only consider the SUM size of segments, but consider a scenario
that the user want to use minor compaction by the num of segments but he
dont want to merge the segment whose datasize larger the threshold for
example 2GB, as it is no need to merge so much big segment and it is time
costly.
so we need to add a parameter to control the threshold of segment included
in minor compaction, so that the user can specify the segment not included
in minor compaction once the datasize exeed the threshold, of course default
value must be threre.


> Size control of minor compaction
> 
>
> Key: CARBONDATA-4054
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4054
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: ZHANGSHUNYU
>Priority: Major
>
> Currently, minor compaction only considers the number of segments, while major
> compaction only considers the total size of segments. Consider a scenario where the
> user wants to trigger minor compaction by the number of segments but does not want
> to merge segments whose data size exceeds a threshold, for example 2 GB, because
> merging such large segments is unnecessary and time-consuming.
> So we need to add a parameter to control the size threshold of segments included
> in minor compaction, so that the user can exclude a segment from minor compaction
> once its data size exceeds the threshold; of course there must be a default value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (CARBONDATA-4054) Size control of minor compaction

2020-11-23 Thread ZHANGSHUNYU (Jira)
ZHANGSHUNYU created CARBONDATA-4054:
---

 Summary: Size control of minor compaction
 Key: CARBONDATA-4054
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4054
 Project: CarbonData
  Issue Type: Improvement
Reporter: ZHANGSHUNYU


Currently, minor compaction only considers the number of segments, while major
compaction only considers the total size of segments. Consider a scenario where the
user wants to trigger minor compaction by the number of segments but does not want
to merge segments whose data size exceeds a threshold, for example 2 GB, because
merging such large segments is unnecessary and time-consuming.
So we need to add a parameter to control the size threshold of segments included
in minor compaction, so that the user can exclude a segment from minor compaction
once its data size exceeds the threshold; of course there must be a default value.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-732002235


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbon_PR_Builder_2.4.5/3100/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4015: [CARBONDATA-4052] Handled insert overwrite scenario for SI

2020-11-23 Thread GitBox


CarbonDataQA2 commented on pull request #4015:
URL: https://github.com/apache/carbondata/pull/4015#issuecomment-731997987


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12545/job/ApacheCarbonPRBuilder2.3/4853/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org