[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
CarbonDataQA2 commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-750728073 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3482/ This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
CarbonDataQA2 commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-750726829 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5243/
[GitHub] [carbondata] QiangCai commented on a change in pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
QiangCai commented on a change in pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#discussion_r548352547

## File path: integration/spark/pom.xml
@@ -264,6 +269,18 @@ (junit test dependency context)
Review comment: 1. remove redundant dependency 2. scope is test?

## File path: integration/spark/pom.xml
@@ -528,6 +545,22 @@
+ org.antlr
+ antlr4-maven-plugin
+ (antlr4 goal)
+ true
+ ../spark/src/main/antlr4
Review comment: how about src/main/antlr4?

## File path: integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
@@ -0,0 +1,1842 @@ (Apache License 2.0 header, ending with: "This file is an adaptation of Presto's presto-parser/src/main/antlr4/com/facebook/presto/sql/parser/CarbonSqlBase.g4 grammar.")
Review comment: this comment is incorrect

## File path: integration/spark/src/main/antlr4/org/apache/spark/sql/parser/CarbonSqlBase.g4
@@ -0,0 +1,1842 @@ (same license header, then:)
+ grammar CarbonSqlBase;
+
+ @parser::members {
Review comment: better to simplify this file, now we only use the mergeInto part

## File path: integration/spark/src/main/java/org/apache/spark/sql/CarbonAntlrSqlVisitor.java
@@ -0,0 +1,353 @@ (ASF license header, then:)
+package org.apache.spark.sql;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.spark.sql.catalyst.expressions.Expression;
+import org.apache.spark.sql.catalyst.parser.ParseException;
+import org.apache.spark.sql.catalyst.parser.ParserInterface;
+import org.apache.spark.sql.execution.command.mutation.merge.DeleteAction;
+import org.apache.spark.sql.execution.command.mutation.merge.InsertAction;
+import org.apache.spark.sql.execution.command.mutation.merge.MergeAction;
+import org.apache.spark.sql.execution.command.mutation.merge.UpdateAction;
+import org.apache.spark.sql.merge.model.CarbonJoinExpression;
+import org.apache.spark.sql.merge.model.CarbonMergeIntoModel;
+import org.apache.spark.sql.merge.model.ColumnModel;
+import org.apache.spark.sql.merge.model.TableModel;
+import org.apache.spark.sql.parser.CarbonSqlBaseBaseVisitor;
+import org.apache.spark.sql.parser.CarbonSqlBaseParser;
+import org.apache.spark.util.SparkUtil;
+
+public class CarbonAntlrSqlVisitor extends CarbonSqlBaseBaseVisitor {
+
+  private final ParserInterface sparkParser;
+
+  public CarbonAntlrSqlVisitor(ParserInterface sparkParser) {
+    this.sparkParser = sparkParser;
+  }
+
+  @Override
+  public String visitTableAlias(CarbonSqlBaseParser.TableAliasContext ctx) {
+    if (null == ctx.children) {
+      return null;
+    }
+    String res = ctx.getChild(1).getText();
+    System.out.println(res);
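The visitor quoted above walks the ANTLR parse tree of a MERGE INTO statement and ultimately emits merge actions (UpdateAction, DeleteAction, InsertAction). A minimal, hypothetical sketch of that clause-to-action mapping, using simplified stand-in classes rather than CarbonData's real ones (here each WHEN clause is a plain string instead of a parse-tree context):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for CarbonData's merge action classes.
// A real visitor walks ANTLR parse-tree contexts; the strings here only serve
// to show the clause-to-action dispatch on its own.
public class MergeClauseSketch {
    enum ActionKind { UPDATE, DELETE, INSERT }

    static final class MergeAction {
        final boolean whenMatched; // WHEN MATCHED vs. WHEN NOT MATCHED
        final ActionKind kind;
        MergeAction(boolean whenMatched, ActionKind kind) {
            this.whenMatched = whenMatched;
            this.kind = kind;
        }
    }

    // One action per WHEN clause, e.g. "MATCHED UPDATE", "NOT MATCHED INSERT".
    static List<MergeAction> actionsFor(String... clauses) {
        List<MergeAction> actions = new ArrayList<>();
        for (String clause : clauses) {
            boolean matched = !clause.startsWith("NOT MATCHED");
            ActionKind kind = clause.endsWith("UPDATE") ? ActionKind.UPDATE
                    : clause.endsWith("DELETE") ? ActionKind.DELETE
                    : ActionKind.INSERT;
            actions.add(new MergeAction(matched, kind));
        }
        return actions;
    }
}
```

The real visitor additionally collects the target and source TableModel and the join expressions into a CarbonMergeIntoModel; only the dispatch step is sketched here.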
[GitHub] [carbondata] QiangCai commented on pull request #4032: [WIP][CARBONDATA-4065] Support MERGE INTO SQL Command
QiangCai commented on pull request #4032: URL: https://github.com/apache/carbondata/pull/4032#issuecomment-750706348 retest this please
[jira] [Commented] (CARBONDATA-4088) Drop metacache didn't clear some cache information which leads to memory leak
[ https://issues.apache.org/jira/browse/CARBONDATA-4088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17254093#comment-17254093 ]

Akash R Nilugal commented on CARBONDATA-4088:
please handle https://issues.apache.org/jira/browse/CARBONDATA-4098

> Drop metacache didn't clear some cache information which leads to memory leak
> Key: CARBONDATA-4088
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4088
> Project: CarbonData
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Minor
> Time Spent: 6h
> Remaining Estimate: 0h
>
> When there are two Spark applications and one drops a table, some cache information for that table stays in the other application and cannot be removed by any method, including the "Drop metacache" command. This leads to a memory leak. Over time the leak accumulates, which finally leads to driver OOM. The leak points are: 1) tableModifiedTimeStore in CarbonFileMetastore; 2) segmentLockMap in BlockletDataMapIndexStore; 3) absoluteTableIdentifierByteMap in SegmentPropertiesAndSchemaHolder; 4) tableInfoMap in CarbonMetadata.
--
This message was sent by Atlassian Jira (v8.3.4#803005)
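The four leak points listed in the issue share one shape: a static, process-wide map keyed by table identity that only grows. A minimal hypothetical sketch of the pattern (illustrative names; tableModifiedTimeStore here is a stand-in, not CarbonData's real field):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration of the leak pattern described in the issue: a
// static, process-wide map written on every table access but never cleared
// when the table is dropped from another application.
public class LeakyMetastoreCache {
    private static final Map<String, Long> tableModifiedTimeStore = new ConcurrentHashMap<>();

    public static void recordAccess(String tableUniqueName) {
        tableModifiedTimeStore.put(tableUniqueName, System.currentTimeMillis());
    }

    // Nothing ever removes entries: a DROP TABLE issued in a different Spark
    // application has no way to reach this JVM's static map, so entries
    // accumulate for every table ever touched.
    public static int size() {
        return tableModifiedTimeStore.size();
    }
}
```

The fix discussed in this thread bounds such maps with a time-based expiration instead of relying on explicit cross-application invalidation.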
[GitHub] [carbondata] asfgit closed pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
asfgit closed pull request #4057: URL: https://github.com/apache/carbondata/pull/4057
[GitHub] [carbondata] akashrn5 edited a comment on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
akashrn5 edited a comment on pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750317465 LGTM, please handle https://issues.apache.org/jira/browse/CARBONDATA-4098 soon in another PR
[GitHub] [carbondata] akashrn5 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
akashrn5 commented on pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750317465 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
CarbonDataQA2 commented on pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750315770 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3481/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
CarbonDataQA2 commented on pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750314179 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5242/
[jira] [Updated] (CARBONDATA-4089) Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS
[ https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yahui Liu updated CARBONDATA-4089:
Summary: Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS (was: Create table with location, if the location didn't have schema, the default will be local file system, which is not the file system defined by defaultFS)

> Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS
> Key: CARBONDATA-4089
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4089
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Blocker
> Fix For: 2.2.0
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> If the location didn't specify schema, should use the file system defined by defaultFS.
[jira] [Updated] (CARBONDATA-4089) Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS
[ https://issues.apache.org/jira/browse/CARBONDATA-4089?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yahui Liu updated CARBONDATA-4089:
Description: If the location didn't specify scheme, should use the file system defined by defaultFS. (was: If the location didn't specify schema, should use the file system defined by defaultFS.)

> Create table with location, if the location didn't have scheme, the default will be local file system, which is not the file system defined by defaultFS
> Key: CARBONDATA-4089
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4089
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Blocker
> Fix For: 2.2.0
> Time Spent: 2.5h
> Remaining Estimate: 0h
>
> If the location didn't specify scheme, should use the file system defined by defaultFS.
[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired
[ https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yahui Liu updated CARBONDATA-4098:
Priority: Minor (was: Major)

> NullPointerException will be thrown during query if at the time carbon table cache is being expired
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Minor
>
> CARBONDATA-4088 introduced the expiring map to expire the carbon table cache by time. But it has one problem: if the cache is being expired during a query, queries on the table may fail with NullPointerException.
> If users don't want their queries to fail, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds, so that by the time the cache expires, the table will no longer be queried. For example, a customer may have a new table every day, and that table will be queried only on that day. He can choose one week as the property value so that this table cache will leak for at most one week and be cleared after that.
> So mostly this NullPointerException will not happen if the user chooses a proper value of carbon.metacache.expiration.seconds. But we still keep one Jira to track this seldom-seen issue and may fix it in the future.
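The trade-off described in this issue (access refreshes the timer; an entry idle past the TTL disappears, so a concurrent reader can see null) can be sketched with a tiny access-refreshing expiring map. This is a simplified, hypothetical stand-in, not CarbonData's implementation; the clock is injected so the expiry is deterministic:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Simplified sketch of an access-refreshing expiring cache, as described for
// carbon.metacache.expiration.seconds. Illustrative names only.
public class ExpiringTableCache<V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccessMillis;
        Entry(V value, long now) { this.value = value; this.lastAccessMillis = now; }
    }

    private final Map<String, Entry<V>> cache = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock; // injectable for deterministic behavior

    public ExpiringTableCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    public void put(String table, V value) {
        cache.put(table, new Entry<>(value, clock.getAsLong()));
    }

    // Returns null once the entry has been idle longer than the TTL; this is
    // the window in which a concurrent query can hit a NullPointerException.
    public V get(String table) {
        Entry<V> e = cache.get(table);
        if (e == null) return null;
        long now = clock.getAsLong();
        if (now - e.lastAccessMillis > ttlMillis) {
            cache.remove(table);
            return null;
        }
        e.lastAccessMillis = now; // recent access refreshes the timer
        return e.value;
    }
}
```

With carbon.metacache.expiration.seconds left at its default of Long.MAX_VALUE the TTL check effectively never fires, matching the documented "cache will not be expired by time" behavior.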
[jira] [Closed] (CARBONDATA-4090) After compaction interrupted accidentally, compact again and still fail, third time compaction can succeed but the new segment for SI table data size and index size is zero
[ https://issues.apache.org/jira/browse/CARBONDATA-4090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yahui Liu closed CARBONDATA-4090.
Resolution: Fixed

> After compaction interrupted accidentally, compact again and still fail, third time compaction can succeed but the new segment for SI table data size and index size is zero
> Key: CARBONDATA-4090
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4090
> Project: CarbonData
> Issue Type: Bug
> Components: data-load
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Minor
> Time Spent: 1h 50m
> Remaining Estimate: 0h
>
> When compaction is interrupted accidentally, the segment status is "Insert in Progress" and stale files are left in the new segment folder. The next compaction will read the stale files; if some stale file is incomplete, compaction will always fail.
[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired
[ https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahui Liu updated CARBONDATA-4098: -- Description: CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException. If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared. So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom issue and maybe will fix it in future. was: CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException. If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared. So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom behavior and maybe will fix it in future. 
> NullPointerException will be thrown during query if at the time carbon table cache is being expired
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Major
>
> CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException.
> If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom issue and maybe will fix it in future.
[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired
[ https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahui Liu updated CARBONDATA-4098: -- Description: CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException. If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared. So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom behavior and maybe will fix it in future. was: CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException. If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared. So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom behavior and maybe in future will fix it. 
> NullPointerException will be thrown during query if at the time carbon table cache is being expired
> Key: CARBONDATA-4098
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4098
> Project: CarbonData
> Issue Type: Bug
> Components: sql
> Affects Versions: 2.1.0
> Reporter: Yahui Liu
> Priority: Major
>
> CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException.
> If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom behavior and maybe will fix it in future.
[jira] [Updated] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired
[ https://issues.apache.org/jira/browse/CARBONDATA-4098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yahui Liu updated CARBONDATA-4098: -- Description: CARBONDATA-4088 introduce the expiring map to expire the carbon table cache by time. But it will have one problem: during query, if cache is being expired, queries on the table may fail with NullPointerException. If users don't want their queries to be failed, they need to choose a proper value for the configuration carbon.metacache.expiration.seconds: at the time of cache expired, the table will never be queried. For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared. So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom behavior and maybe in future will fix it. was:CARBONDATA-4088 introduce the expiring map to expire the stale carbon table cache. But it will have a problem: during query, if > NullPointerException will be thrown during query if at the time carbon table > cache is being expired > --- > > Key: CARBONDATA-4098 > URL: https://issues.apache.org/jira/browse/CARBONDATA-4098 > Project: CarbonData > Issue Type: Bug > Components: sql >Affects Versions: 2.1.0 >Reporter: Yahui Liu >Priority: Major > > CARBONDATA-4088 introduce the expiring map to expire the carbon table cache > by time. But it will have one problem: during query, if cache is being > expired, queries on the table may fail with NullPointerException. > If users don't want their queries to be failed, they need to choose a proper > value for the configuration carbon.metacache.expiration.seconds: at the time > of cache expired, the table will never be queried. 
> For example, customer will have a new table every day, and this table will be queried only at that day. So he can choose 1 week as the property value so that this table cache will only leak for one week and after one week will be cleared.
> So mostly this nullpointerexception will not happen if user chooses the proper value of carbon.metacache.expiration.seconds. But still we will have one Jira to track this seldom behavior and maybe in future will fix it.
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT.
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-750255128 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5241/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4037: [CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT.
CarbonDataQA2 commented on pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#issuecomment-750253944 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3480/
[GitHub] [carbondata] kunal642 commented on pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
kunal642 commented on pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#issuecomment-750203466 LGTM
[GitHub] [carbondata] kunal642 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
kunal642 commented on a change in pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#discussion_r547919082

## File path: docs/configuration-parameters.md
@@ -149,6 +149,7 @@ This section provides the details of all the configurations required for the Car
| carbon.max.pagination.lru.cache.size.in.mb | -1 | Maximum memory **(in MB)** upto which the SDK pagination reader can cache the blocklet rows. Suggest to configure as multiple of blocklet size. Default value of -1 means there is no memory limit for caching. Only integer values greater than 0 are accepted. |
| carbon.partition.max.driver.lru.cache.size | -1 | Maximum memory **(in MB)** upto which driver can cache partition metadata. Beyond this, least recently used data will be removed from cache before loading new set of values. |
| carbon.mapOrderPushDown._.column | empty | If order by column is in sort column, specify that sort column here to avoid ordering at map task. |
+| carbon.metacache.expiration.seconds | Long.MAX_VALUE | Expiration time **(in seconds)** for the tableInfo cache in CarbonMetadata and tableModifiedTime in CarbonFileMetastore; once the configured time has passed since the last access to a cache entry, tableInfo and tableModifiedTime will be removed from each cache. Recent access refreshes the timer. The default value of Long.MAX_VALUE means the cache will not be expired by time. **NOTE:** While a cache entry is being expired, queries on the table may fail with NullPointerException. |
Review comment: please fix the NPE in a separate PR
[GitHub] [carbondata] kunal642 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
kunal642 commented on a change in pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#discussion_r547918942

## File path: docs/configuration-parameters.md (same carbon.metacache.expiration.seconds hunk as above)
Review comment: ok, I'll merge this as the default behavior is not to remove cache.
[GitHub] [carbondata] jack86596 commented on a change in pull request #4057: [CARBONDATA-4088] Drop metacache didn't clear some cache information …
jack86596 commented on a change in pull request #4057: URL: https://github.com/apache/carbondata/pull/4057#discussion_r547918254

## File path: docs/configuration-parameters.md (same carbon.metacache.expiration.seconds hunk as above)
Review comment: @kunal642 one Jira has been raised to track the null pointer issue: https://issues.apache.org/jira/browse/CARBONDATA-4098
[jira] [Created] (CARBONDATA-4098) NullPointerException will be thrown during query if at the time carbon table cache is being expired
Yahui Liu created CARBONDATA-4098: - Summary: NullPointerException will be thrown during query if at the time carbon table cache is being expired Key: CARBONDATA-4098 URL: https://issues.apache.org/jira/browse/CARBONDATA-4098 Project: CarbonData Issue Type: Bug Components: sql Affects Versions: 2.1.0 Reporter: Yahui Liu CARBONDATA-4088 introduced the expiring map to expire the stale carbon table cache. But it will have a problem: during query, if -- This message was sent by Atlassian Jira (v8.3.4#803005)
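The behaviour described above (time-based expiration with refresh-on-access, and a null result that can surface while a query is running) can be sketched as follows. This is a minimal illustration, not CarbonData's actual implementation; `ExpiringMap`, its methods, and the injectable clock are invented for the example.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Minimal sketch of an expiring map with refresh-on-access, the behaviour
// described for carbon.metacache.expiration.seconds. NOT CarbonData's
// actual code; all names here are illustrative only.
class ExpiringMap<K, V> {
    private static final class Entry<V> {
        final V value;
        volatile long lastAccessMs;
        Entry(V value, long now) { this.value = value; this.lastAccessMs = now; }
    }

    private final Map<K, Entry<V>> map = new ConcurrentHashMap<>();
    private final long expirationMs;
    private final LongSupplier clock;   // injectable for deterministic testing

    ExpiringMap(long expirationMs, LongSupplier clock) {
        this.expirationMs = expirationMs;
        this.clock = clock;
    }

    void put(K key, V value) {
        map.put(key, new Entry<>(value, clock.getAsLong()));
    }

    // Returns null once an entry has expired since its last access. A caller
    // that does not expect null here is where the NullPointerException
    // reported in CARBONDATA-4098 can surface mid-query.
    V get(K key) {
        Entry<V> e = map.get(key);
        if (e == null) {
            return null;
        }
        long now = clock.getAsLong();
        if (now - e.lastAccessMs > expirationMs) {
            map.remove(key);          // expired: evict the stale entry
            return null;
        }
        e.lastAccessMs = now;         // recent access refreshes the timer
        return e.value;
    }
}
```

Note how each successful `get` pushes the expiration window forward, matching the "recent access will refresh the timer" wording in the documentation change.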
[GitHub] [carbondata] nihal0107 commented on a change in pull request #4037: [CARBONDATA-4070] [CARBONDATA-4059] Fixed SI issues and improved FT.
nihal0107 commented on a change in pull request #4037: URL: https://github.com/apache/carbondata/pull/4037#discussion_r547896201
## File path: integration/spark/src/main/scala/org/apache/spark/sql/secondaryindex/util/SecondaryIndexUtil.scala ##
@@ -542,100 +542,6 @@ object SecondaryIndexUtil {
     indexToFactColMapping
   }
-  /**
-   * Identifies all segments which can be merged for compaction type - CUSTOM.
-   *
-   * @param sparkSession
-   * @param tableName
-   * @param dbName
-   * @param customSegments
-   * @return list of LoadMetadataDetails
-   * @throws UnsupportedOperationException if customSegments is null or empty
-   */
-  def identifySegmentsToBeMergedCustom(sparkSession: SparkSession,
Review comment: done
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4062: [CARBONDATA-4097] ColumnVectors should not be initialized as ColumnVectorWrapperDirect for alter tables.
CarbonDataQA2 commented on pull request #4062: URL: https://github.com/apache/carbondata/pull/4062#issuecomment-750038993 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3477/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4062: [CARBONDATA-4097] ColumnVectors should not be initialized as ColumnVectorWrapperDirect for alter tables.
CarbonDataQA2 commented on pull request #4062: URL: https://github.com/apache/carbondata/pull/4062#issuecomment-750035107 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5238/
[jira] [Resolved] (CARBONDATA-4095) Select Query with SI filter fails, when columnDrift is enabled
[ https://issues.apache.org/jira/browse/CARBONDATA-4095?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Akash R Nilugal resolved CARBONDATA-4095. - Fix Version/s: 2.2.0 Assignee: Indhumathi Muthu Murugesh Resolution: Fixed
> Select Query with SI filter fails, when columnDrift is enabled
> --------------------------------------------------------------
> Key: CARBONDATA-4095
> URL: https://issues.apache.org/jira/browse/CARBONDATA-4095
> Project: CarbonData
> Issue Type: Improvement
> Reporter: Indhumathi Muthu Murugesh
> Assignee: Indhumathi Muthu Murugesh
> Priority: Major
> Fix For: 2.2.0
>
> Time Spent: 1.5h
> Remaining Estimate: 0h
>
> sql("drop table if exists maintable")
> sql("create table maintable (a string,b string,c int,d int) STORED AS carbondata ")
> sql("insert into maintable values('k','d',2,3)")
> sql("alter table maintable set tblproperties('sort_columns'='c,d','sort_scope'='local_sort')")
> sql("create index indextable on table maintable(b) AS 'carbondata'")
> sql("insert into maintable values('k','x',2,4)")
> sql("select * from maintable where b='x'").show(false)
>
> 2020-12-22 18:58:37 ERROR Executor:91 - Exception in task 0.0 in stage 40.0 (TID 422)
> java.lang.RuntimeException: Error while resolving filter expression
> at org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:283)
> at org.apache.carbondata.core.index.IndexFilter.getResolver(IndexFilter.java:203)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.initQuery(AbstractQueryExecutor.java:152)
> at org.apache.carbondata.core.scan.executor.impl.AbstractQueryExecutor.getBlockExecutionInfos(AbstractQueryExecutor.java:382)
> at org.apache.carbondata.core.scan.executor.impl.VectorDetailQueryExecutor.execute(VectorDetailQueryExecutor.java:43)
> at org.apache.carbondata.spark.vectorreader.VectorizedCarbonRecordReader.initialize(VectorizedCarbonRecordReader.java:141)
> at org.apache.carbondata.spark.rdd.CarbonScanRDD$$anon$1.hasNext(CarbonScanRDD.scala:540)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.scan_nextBatch_0$(Unknown Source)
> at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
> at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
> at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$12$$anon$1.hasNext(WholeStageCodegenExec.scala:631)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:253)
> at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
> at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:836)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:49)
> at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:324)
> at org.apache.spark.rdd.RDD.iterator(RDD.scala:288)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
> at org.apache.spark.scheduler.Task.run(Task.scala:109)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:345)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.lang.NullPointerException
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverBasedOnExpressionType(FilterExpressionProcessor.java:190)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:128)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.createFilterResolverTree(FilterExpressionProcessor.java:121)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolverTree(FilterExpressionProcessor.java:77)
> at org.apache.carbondata.core.scan.filter.FilterExpressionProcessor.getFilterResolver(FilterExpressionProcessor.java:61)
> at org.apache.carbondata.core.index.IndexFilter.resolveFilter(IndexFilter.java:281)
> ... 26 more
> 2020-12-22 18:58:37 ERROR TaskSetMan
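The trace above bottoms out in a NullPointerException inside FilterExpressionProcessor while the filter resolver tree is being built recursively. As a heavily simplified illustration of that failure shape (this is not CarbonData's actual code; every class and method below is invented for the example), a factory that returns null for an expression type it was not taught to handle, combined with a recursive tree builder that dereferences the result, fails exactly this way:

```java
import java.util.List;

// Illustrative only: Expr is a tiny expression-tree node, Resolver a tiny
// resolver node; neither corresponds to a real CarbonData class.
class Expr {
    final String type;            // e.g. "AND", "EQUALS"
    final List<Expr> children;
    Expr(String type, List<Expr> children) { this.type = type; this.children = children; }
}

class Resolver {
    final String kind;
    Resolver left, right;
    Resolver(String kind) { this.kind = kind; }
}

class FilterProcessor {
    // Returns null for any expression type it does not recognise, such as
    // one produced by a rewrite the resolver was never updated for.
    Resolver resolverFor(Expr e) {
        switch (e.type) {
            case "AND":    return new Resolver("LOGICAL");
            case "EQUALS": return new Resolver("ROW_LEVEL");
            default:       return null;
        }
    }

    // Recursive walk over the expression tree. When resolverFor() returned
    // null for the current node, wiring up the first child throws the
    // NullPointerException that then surfaces as "Error while resolving
    // filter expression" higher up.
    Resolver createTree(Expr e) {
        Resolver r = resolverFor(e);
        for (int i = 0; i < e.children.size(); i++) {
            Resolver child = createTree(e.children.get(i));
            if (i == 0) {
                r.left = child;   // NPE here when r is null
            } else {
                r.right = child;
            }
        }
        return r;
    }
}
```

The point of the sketch is that the null does not fail where it is created but one dereference later, which matches a resolver-tree trace that names the factory method as the deepest frame.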
[GitHub] [carbondata] asfgit closed pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set
asfgit closed pull request #4063: URL: https://github.com/apache/carbondata/pull/4063
[GitHub] [carbondata] akashrn5 commented on pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set
akashrn5 commented on pull request #4063: URL: https://github.com/apache/carbondata/pull/4063#issuecomment-750008809 LGTM
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set
CarbonDataQA2 commented on pull request #4063: URL: https://github.com/apache/carbondata/pull/4063#issuecomment-749996377 Build Success with Spark 2.3.4, Please check CI http://121.244.95.60:12444/job/ApacheCarbonPRBuilder2.3/5236/
[GitHub] [carbondata] CarbonDataQA2 commented on pull request #4063: [CARBONDATA-4095] Fix Select Query with SI filter fails, when columnDrift is Set
CarbonDataQA2 commented on pull request #4063: URL: https://github.com/apache/carbondata/pull/4063#issuecomment-749996165 Build Success with Spark 2.4.5, Please check CI http://121.244.95.60:12444/job/ApacheCarbon_PR_Builder_2.4.5/3475/