[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-750728073


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3482/
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-750726829


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5243/
   







[GitHub] [carbondata] QiangCai commented on a change in pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-23 Thread GitBox


QiangCai commented on a change in pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#discussion_r548352547



##
File path: integration/spark/pom.xml
##
@@ -264,6 +269,18 @@
   junit
   test
 
+  

Review comment:
   1. remove the redundant dependency
   2. should the scope be test?
   

##
File path: integration/spark/pom.xml
##
@@ -528,6 +545,22 @@
   
 
   
+  <plugin>
+    <groupId>org.antlr</groupId>
+    <artifactId>antlr4-maven-plugin</artifactId>
+    <executions>
+      <execution>
+        <goals>
+          <goal>antlr4</goal>
+        </goals>
+      </execution>
+    </executions>
+    <configuration>
+      <visitor>true</visitor>
+      <sourceDirectory>../spark/src/main/antlr4</sourceDirectory>

Review comment:
   how about src/main/antlr4?

##
File path: 
integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
##
@@ -0,0 +1,1842 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * This file is an adaptation of Presto's 
presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/CarbonSqlBase.g4 
grammar.

Review comment:
   this comment is incorrect

##
File path: 
integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
##
@@ -0,0 +1,1842 @@
+/*
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ *
+ * This file is an adaptation of Presto's 
presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/CarbonSqlBase.g4 
grammar.
+ */
+
+grammar CarbonSqlBase;
+
+@parser::members {

Review comment:
   better to simplify this file; now we only use the mergeInto part

##
File path: 
integration/spark/src/main/java/org/apache/spark/sql/CarbonAntlrSqlVisitor.java
##
@@ -0,0 +1,353 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.spark.sql.catalyst.expressions.Expression;
+import org.apache.spark.sql.catalyst.parser.ParseException;
+import org.apache.spark.sql.catalyst.parser.ParserInterface;
+import org.apache.spark.sql.execution.command.mutation.merge.DeleteAction;
+import org.apache.spark.sql.execution.command.mutation.merge.InsertAction;
+import org.apache.spark.sql.execution.command.mutation.merge.MergeAction;
+import org.apache.spark.sql.execution.command.mutation.merge.UpdateAction;
+import org.apache.spark.sql.merge.model.CarbonJoinExpression;
+import org.apache.spark.sql.merge.model.CarbonMergeIntoModel;
+import org.apache.spark.sql.merge.model.ColumnModel;
+import org.apache.spark.sql.merge.model.TableModel;
+import org.apache.spark.sql.parser.CarbonSqlBaseBaseVisitor;
+import org.apache.spark.sql.parser.CarbonSqlBaseParser;
+import org.apache.spark.util.SparkUtil;
+
+public class CarbonAntlrSqlVisitor extends CarbonSqlBaseBaseVisitor {
+
+  private final ParserInterface sparkParser;
+
+  public CarbonAntlrSqlVisitor(ParserInterface sparkParser) {
+    this.sparkParser = sparkParser;
+  }
+
+  @Override
+  public String visitTableAlias(CarbonSqlBaseParser.TableAliasContext ctx) {
+    if (null == ctx.children) {
+      return null;
+    }
+    String res = ctx.getChild(1).getText();
+    System.out.println(res);
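The visitor snippet above is cut off by the archive. For readers skimming the thread, the MERGE INTO command this PR adds is essentially an upsert: source rows that match the target on the join condition update it, and unmatched rows are inserted. A minimal plain-Java sketch of that semantics follows; the map-based "tables" and the MergeSketch class are hypothetical illustrations, not CarbonData APIs:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of MERGE INTO semantics: for every source row, update
// the matching target row (WHEN MATCHED) or insert it (WHEN NOT MATCHED).
public class MergeSketch {

  static void mergeInto(Map<Integer, String> target, Map<Integer, String> source) {
    for (Map.Entry<Integer, String> row : source.entrySet()) {
      if (target.containsKey(row.getKey())) {
        // WHEN MATCHED THEN UPDATE SET ...
        target.put(row.getKey(), row.getValue());
      } else {
        // WHEN NOT MATCHED THEN INSERT ...
        target.put(row.getKey(), row.getValue());
      }
    }
  }

  public static void main(String[] args) {
    Map<Integer, String> target = new HashMap<>();
    target.put(1, "old");
    target.put(3, "keep");
    Map<Integer, String> source = new HashMap<>();
    source.put(1, "updated");
    source.put(2, "inserted");
    mergeInto(target, source);
    System.out.println(target); // prints {1=updated, 2=inserted, 3=keep}
  }
}
```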

[GitHub] [carbondata] QiangCai commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command

2020-12-23 Thread GitBox


QiangCai commented on pull request #4032:
URL: https://github.com/apache/carbondata/pull/4032#issuecomment-750706348


   retest this please







[jira] [Commented] (CARBONDATA-4088) Drop metacache didn't clear some cache information which leads to memory leak

2020-12-23 Thread Akash R Nilugal (Jira)


[ 
https://issues.apache.org/jira/browse/CARBONDATA-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254093#comment-17254093
 ] 

Akash R Nilugal commented on CARBONDATA-4088:
-

please handle https://issues.apache.org/jira/browse/CARBONDATA-4098

> Drop metacache didn't clear some cache information which leads to memory leak
> -
>
> Key: CARBONDATA-4088
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4088
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>  Time Spent: 6h
>  Remaining Estimate: 0h
>
> When there are two Spark applications and one drops a table, some cache 
> information for that table stays in the other application and cannot be 
> removed by any method, including the "Drop metacache" command. This leads to a 
> memory leak. Over time the leak accumulates, eventually causing a driver OOM. 
> The leak points are: 1) tableModifiedTimeStore in CarbonFileMetastore; 
> 2) segmentLockMap in BlockletDataMapIndexStore; 3) absoluteTableIdentifierByteMap 
> in SegmentPropertiesAndSchemaHolder; 4) tableInfoMap in CarbonMetadata.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] asfgit closed pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


asfgit closed pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057


   







[GitHub] [carbondata] akashrn5 edited a comment on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


akashrn5 edited a comment on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750317465


   LGTM, please handle https://issues.apache.org/jira/browse/CARBONDATA-4098 soon in another PR







[GitHub] [carbondata] akashrn5 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


akashrn5 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750317465


   LGTM







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750315770


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3481/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750314179


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5242/
   







[jira] [Updated] (CARBONDATA-4089) Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4089:
--
Summary: Create table with location, if the location didn't have scheme, 
the default will be local file system, which is not the file system defined by 
defaultFS  (was: Create table with location, if the location didn't have 
schema, the default will be local file system, which is not the file system 
defined by defaultFS)

> Create table with location, if the location didn't have scheme, the default 
> will be local file system, which is not the file system defined by defaultFS
> 
>
> Key: CARBONDATA-4089
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4089
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Blocker
> Fix For: 2.2.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the location didn't specify a scheme, the file system defined by 
> defaultFS should be used.





[jira] [Updated] (CARBONDATA-4089) Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4089:
--
Description: If the location didn't specify scheme, should use the file 
system defined by defaultFS.  (was: If the location didn't specify schema, 
should use the file system defined by defaultFS.)

> Create table with location, if the location didn't have scheme, the default 
> will be local file system, which is not the file system defined by defaultFS
> 
>
> Key: CARBONDATA-4089
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4089
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Blocker
> Fix For: 2.2.0
>
>  Time Spent: 2.5h
>  Remaining Estimate: 0h
>
> If the location didn't specify a scheme, the file system defined by 
> defaultFS should be used.





[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4098:
--
Priority: Minor  (was: Major)

> NullPointerException will be thrown during query if at the time carbon table 
> cache is being expired
> ---
>
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>
> CARBONDATA-4088 introduce the expiring map to expire the carbon table cache 
> by time. But it will have one problem: during query, if cache is being 
> expired, queries on the table may fail with NullPointerException.
> If users don't want their queries to be failed, they need to choose a proper 
> value for the configuration carbon.metacache.expiration.seconds: at the time 
> of cache expired, the table will never be queried. For example, customer will 
> have a new table every day, and this table will be queried only at that day. 
> So he can choose 1 week as the property value so that this table cache will 
> only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the 
> proper value of carbon.metacache.expiration.seconds. But still we will have 
> one Jira to track this seldom issue and maybe will fix it in future.
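The expire-after-access behavior described in CARBONDATA-4098 above (an entry is dropped carbon.metacache.expiration.seconds after its last access, each access refreshes the timer, and a query can observe a missing entry at the moment of expiry) can be sketched with a plain-Java wrapper. This is a simplified illustration under those assumptions, not the actual CarbonMetadata/CarbonFileMetastore implementation:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Simplified expire-after-access map: each put/get stamps the entry with the
// current time; entries older than ttlMillis are evicted lazily on access.
public class ExpiringMap<K, V> {

  private static final class Timestamped<V> {
    final V value;
    volatile long lastAccess;
    Timestamped(V value, long now) {
      this.value = value;
      this.lastAccess = now;
    }
  }

  private final Map<K, Timestamped<V>> map = new ConcurrentHashMap<>();
  private final long ttlMillis;

  public ExpiringMap(long ttlMillis) {
    this.ttlMillis = ttlMillis;
  }

  public void put(K key, V value) {
    map.put(key, new Timestamped<>(value, System.currentTimeMillis()));
  }

  public V get(K key) {
    Timestamped<V> entry = map.get(key);
    if (entry == null) {
      return null;
    }
    long now = System.currentTimeMillis();
    if (now - entry.lastAccess > ttlMillis) {
      // Expired: this is the window in which a concurrent query can observe
      // a missing entry (the NullPointerException risk discussed above).
      map.remove(key);
      return null;
    }
    entry.lastAccess = now; // recent access refreshes the timer
    return entry.value;
  }
}
```

With ttlMillis set to Long.MAX_VALUE the age check never triggers, matching the documented default of "the cache will not be expired by time".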





[jira] [Closed] (CARBONDATA-4090) After compaction interrupted accidentally, compact again and still fail, third time compaction can succeed but the new segment for SI table data size and index size i

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu closed CARBONDATA-4090.
-
Resolution: Fixed

> After compaction interrupted accidentally, compact again and still fail, 
> third time compaction can succeed but the new segment for SI table data size 
> and index size is zero
> 
>
> Key: CARBONDATA-4090
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4090
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-load
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Minor
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> When compaction is interrupted accidentally, the segment status stays "Insert 
> in Progress" and stale files are left in the new segment folder. The next 
> compaction will read the stale files; if a stale file is incomplete, 
> compaction will always fail.





[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4098:
--
Description: 
CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by 
time. But it will have one problem: during query, if cache is being expired, 
queries on the table may fail with NullPointerException.

If users don't want their queries to be failed, they need to choose a proper 
value for the configuration carbon.metacache.expiration.seconds: at the time of 
cache expired, the table will never be queried. For example, customer will have 
a new table every day, and this table will be queried only at that day. So he 
can choose 1 week as the property value so that this table cache will only leak 
for one week and after one week will be cleared.

So mostly this nullpointerexception will not happen if user chooses the proper 
value of carbon.metacache.expiration.seconds. But still we will have one Jira 
to track this seldom issue and maybe will fix it in future.

  was:
CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by 
time. But it will have one problem: during query, if cache is being expired, 
queries on the table may fail with NullPointerException.

If users don't want their queries to be failed, they need to choose a proper 
value for the configuration carbon.metacache.expiration.seconds: at the time of 
cache expired, the table will never be queried. For example, customer will have 
a new table every day, and this table will be queried only at that day. So he 
can choose 1 week as the property value so that this table cache will only leak 
for one week and after one week will be cleared.

So mostly this nullpointerexception will not happen if user chooses the proper 
value of carbon.metacache.expiration.seconds. But still we will have one Jira 
to track this seldom behavior and maybe will fix it in future.


> NullPointerException will be thrown during query if at the time carbon table 
> cache is being expired
> ---
>
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Major
>
> CARBONDATA-4088 introduce the expiring map to expire the carbon table cache 
> by time. But it will have one problem: during query, if cache is being 
> expired, queries on the table may fail with NullPointerException.
> If users don't want their queries to be failed, they need to choose a proper 
> value for the configuration carbon.metacache.expiration.seconds: at the time 
> of cache expired, the table will never be queried. For example, customer will 
> have a new table every day, and this table will be queried only at that day. 
> So he can choose 1 week as the property value so that this table cache will 
> only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the 
> proper value of carbon.metacache.expiration.seconds. But still we will have 
> one Jira to track this seldom issue and maybe will fix it in future.





[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4098:
--
Description: 
CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by 
time. But it will have one problem: during query, if cache is being expired, 
queries on the table may fail with NullPointerException.

If users don't want their queries to be failed, they need to choose a proper 
value for the configuration carbon.metacache.expiration.seconds: at the time of 
cache expired, the table will never be queried. For example, customer will have 
a new table every day, and this table will be queried only at that day. So he 
can choose 1 week as the property value so that this table cache will only leak 
for one week and after one week will be cleared.

So mostly this nullpointerexception will not happen if user chooses the proper 
value of carbon.metacache.expiration.seconds. But still we will have one Jira 
to track this seldom behavior and maybe will fix it in future.

  was:
CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by 
time. But it will have one problem: during query, if cache is being expired, 
queries on the table may fail with NullPointerException.

If users don't want their queries to be failed, they need to choose a proper 
value for the configuration carbon.metacache.expiration.seconds: at the time of 
cache expired, the table will never be queried. For example, customer will have 
a new table every day, and this table will be queried only at that day. So he 
can choose 1 week as the property value so that this table cache will only leak 
for one week and after one week will be cleared.

So mostly this nullpointerexception will not happen if user chooses the proper 
value of carbon.metacache.expiration.seconds. But still we will have one Jira 
to track this seldom behavior and maybe in future will fix it.


> NullPointerException will be thrown during query if at the time carbon table 
> cache is being expired
> ---
>
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Major
>
> CARBONDATA-4088 introduce the expiring map to expire the carbon table cache 
> by time. But it will have one problem: during query, if cache is being 
> expired, queries on the table may fail with NullPointerException.
> If users don't want their queries to be failed, they need to choose a proper 
> value for the configuration carbon.metacache.expiration.seconds: at the time 
> of cache expired, the table will never be queried. For example, customer will 
> have a new table every day, and this table will be queried only at that day. 
> So he can choose 1 week as the property value so that this table cache will 
> only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the 
> proper value of carbon.metacache.expiration.seconds. But still we will have 
> one Jira to track this seldom behavior and maybe will fix it in future.





[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired

2020-12-23 Thread Yahui Liu (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yahui Liu updated CARBONDATA-4098:
--
Description: 
CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by 
time. But it will have one problem: during query, if cache is being expired, 
queries on the table may fail with NullPointerException.

If users don't want their queries to be failed, they need to choose a proper 
value for the configuration carbon.metacache.expiration.seconds: at the time of 
cache expired, the table will never be queried. For example, customer will have 
a new table every day, and this table will be queried only at that day. So he 
can choose 1 week as the property value so that this table cache will only leak 
for one week and after one week will be cleared.

So mostly this nullpointerexception will not happen if user chooses the proper 
value of carbon.metacache.expiration.seconds. But still we will have one Jira 
to track this seldom behavior and maybe in future will fix it.

  was:CARBONDATA-4088 introduce the expiring map to expire the stale carbon 
table cache. But it will have a problem: during query, if 


> NullPointerException will be thrown during query if at the time carbon table 
> cache is being expired
> ---
>
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
>  Issue Type: Bug
>  Components: sql
>Affects Versions: 2.1.0
>Reporter: Yahui Liu
>Priority: Major
>
> CARBONDATA-4088 introduce the expiring map to expire the carbon table cache 
> by time. But it will have one problem: during query, if cache is being 
> expired, queries on the table may fail with NullPointerException.
> If users don't want their queries to be failed, they need to choose a proper 
> value for the configuration carbon.metacache.expiration.seconds: at the time 
> of cache expired, the table will never be queried. For example, customer will 
> have a new table every day, and this table will be queried only at that day. 
> So he can choose 1 week as the property value so that this table cache will 
> only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the 
> proper value of carbon.metacache.expiration.seconds. But still we will have 
> one Jira to track this seldom behavior and maybe in future will fix it.





[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT.

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-750255128


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5241/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT.

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#issuecomment-750253944


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3480/
   







[GitHub] [carbondata] kunal642 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


kunal642 commented on pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750203466


   LGTM







[GitHub] [carbondata] kunal642 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


kunal642 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547919082



##
File path: docs/configuration-parameters.md
##
@@ -149,6 +149,7 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** 
upto which the SDK pagination reader can cache the blocklet rows. Suggest to 
configure as multiple of blocklet size. Default value of -1 means there is no 
memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** 
upto which driver can cache partition metadata. Beyond this, least recently 
used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown._.column| empty | If order by 
column is in sort column, specify that sort column here to avoid ordering at 
map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in 
seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in 
CarbonFileMetastore, after the time configured since last access to the cache 
entry, tableInfo and tableModifiedTime will be removed from each cache. Recent 
access will refresh the timer. Default value of Long.MAX_VALUE means the cache 
will not be expired by time. **NOTE:** At the time when cache is being expired, 
queries on the table may fail with NullPointerException. |

Review comment:
   please fix the NPE in a separate PR









[GitHub] [carbondata] kunal642 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


kunal642 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547918942



##
File path: docs/configuration-parameters.md
##
@@ -149,6 +149,7 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** 
upto which the SDK pagination reader can cache the blocklet rows. Suggest to 
configure as multiple of blocklet size. Default value of -1 means there is no 
memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** 
upto which driver can cache partition metadata. Beyond this, least recently 
used data will be removed from cache before loading new set of values.
 | carbon.mapOrderPushDown._.column| empty | If order by 
column is in sort column, specify that sort column here to avoid ordering at 
map task . |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in 
seconds)** for tableInfo cache in CarbonMetadata and tableModifiedTime in 
CarbonFileMetastore, after the time configured since last access to the cache 
entry, tableInfo and tableModifiedTime will be removed from each cache. Recent 
access will refresh the timer. Default value of Long.MAX_VALUE means the cache 
will not be expired by time. **NOTE:** At the time when cache is being expired, 
queries on the table may fail with NullPointerException. |

Review comment:
   OK, I'll merge this, as the default behavior is not to remove the cache.









[GitHub] [carbondata] jack86596 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …

2020-12-23 Thread GitBox


jack86596 commented on a change in pull request #4057:
URL: https://github.com/apache/carbondata/pull/4057#discussion_r547918254



##
File path: docs/configuration-parameters.md
##
@@ -149,6 +149,7 @@ This section provides the details of all the configurations 
required for the Car
 | carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** up to which the SDK pagination reader can cache the blocklet rows. It is suggested to configure this as a multiple of the blocklet size. The default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
 | carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** up to which the driver can cache partition metadata. Beyond this, the least recently used data will be removed from the cache before loading a new set of values. |
 | carbon.mapOrderPushDown._.column| empty | If the order by column is a sort column, specify that sort column here to avoid ordering at the map task. |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for the tableInfo cache in CarbonMetadata and the tableModifiedTime cache in CarbonFileMetastore. Once the configured time has passed since the last access to a cache entry, tableInfo and tableModifiedTime will be removed from their respective caches. A recent access refreshes the timer. The default value of Long.MAX_VALUE means the cache will never expire by time. **NOTE:** While a cache entry is being expired, queries on the table may fail with a NullPointerException. |

Review comment:
   @kunal642 https://issues.apache.org/jira/browse/CARBONDATA-4098 raised 
one jira to track the null pointer issue.









[jira] [Created] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired

2020-12-23 Thread Yahui Liu (Jira)
Yahui Liu created CARBONDATA-4098:
-

 Summary: NullPointerException will be thrown during query if at 
the time carbon table cache is being expired
 Key: CARBONDATA-4098
 URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
 Project: CarbonData
  Issue Type: Bug
  Components: sql
Affects Versions: 2.1.0
Reporter: Yahui Liu


CARBONDATA-4088 introduced the expiring map to expire the stale carbon table 
cache. But it has a problem: during a query, if 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[GitHub] [carbondata] nihal0107 commented on a change in pull request #4037: [CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT.

2020-12-23 Thread GitBox


nihal0107 commented on a change in pull request #4037:
URL: https://github.com/apache/carbondata/pull/4037#discussion_r547896201



##
File path: 
integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala
##
@@ -542,100 +542,6 @@ object SecondaryIndexUtil {
 indexToFactColMapping
   }
 
-  /**
-   * Identifies all segments which can be merged for compaction type - CUSTOM.
-   *
-   * @param sparkSession
-   * @param tableName
-   * @param dbName
-   * @param customSegments
-   * @return list of LoadMetadataDetails
-   * @throws UnsupportedOperationException   if customSegments is null or empty
-   */
-  def identifySegmentsToBeMergedCustom(sparkSession: SparkSession,

Review comment:
   done









[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4062: [CARBONDATA-4097] ColumnVectors should not be initialized as ColumnVectorWrapperDirect for alter tables.

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4062:
URL: https://github.com/apache/carbondata/pull/4062#issuecomment-750038993


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3477/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4062: [CARBONDATA-4097] ColumnVectors should not be initialized as ColumnVectorWrapperDirect for alter tables.

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4062:
URL: https://github.com/apache/carbondata/pull/4062#issuecomment-750035107


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5238/
   







[jira] [Resolved] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled

2020-12-23 Thread Akash R Nilugal (Jira)


 [ 
https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash R Nilugal resolved CARBONDATA-4095.
-
Fix Version/s: 2.2.0
 Assignee: Indhumathi Muthu Murugesh
   Resolution: Fixed

> Select Query with SI filter fails, when columnDrift is enabled
> --
>
> Key: CARBONDATA-4095
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4095
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Indhumathi Muthu Murugesh
>Assignee: Indhumathi Muthu Murugesh
>Priority: Major
> Fix For: 2.2.0
>
>  Time Spent: 1.5h
>  Remaining Estimate: 0h
>
> sql("drop table if exists maintable")
>  sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata ")
>  sql("insert into maintable values('k','d',2,3)")
>  sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
>  sql("create index indextable on table maintable(b) AS 'carbondata'")
>  sql("insert into maintable values('k','x',2,4)")
>  sql("select * from maintable where b='x'").show(false)
>  
>  
>  
>  
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 
> (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
>  at 
> org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
>  at 
> org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
>  at 
> org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
>  at 
> org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
>  at 
> org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown
>  Source)
>  at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
>  Source)
>  at 
> org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
>  at 
> org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at 
> org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
>  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
>  at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
>  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
>  at org.apache.spark.scheduler.Task.run(Task.scala:109)
>  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
>  at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>  at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>  at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77)
>  at 
> org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:61)
>  at 
> org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:281)
>  ... 26 more
> 2020-12-22 18:58:37 ERROR TaskSetMan

[GitHub] [carbondata] asfgit closed pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set

2020-12-23 Thread GitBox


asfgit closed pull request #4063:
URL: https://github.com/apache/carbondata/pull/4063


   







[GitHub] [carbondata] akashrn5 commented on pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set

2020-12-23 Thread GitBox


akashrn5 commented on pull request #4063:
URL: https://github.com/apache/carbondata/pull/4063#issuecomment-750008809


   LGTM







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4063:
URL: https://github.com/apache/carbondata/pull/4063#issuecomment-749996377


   Build Success with Spark 2.3.4, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5236/
   







[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set

2020-12-23 Thread GitBox


CarbonDataQA2 commented on pull request #4063:
URL: https://github.com/apache/carbondata/pull/4063#issuecomment-749996165


   Build Success with Spark 2.4.5, Please check CI 
http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3475/
   


