[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
Github user KanakaKumar commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3059#discussion_r246286475 --- Diff: processing/src/main/java/org/apache/carbondata/processing/util/CarbonLoaderUtil.java --- @@ -1164,4 +1156,35 @@ private static void deleteFiles(List filesToBeDeleted) throws IOExceptio FileFactory.deleteFile(filePath, FileFactory.getFileType(filePath)); } } + + /** + * This method will calculate the average expected size for each node + * + * @param blockInfos blocks + * @param uniqueBlocks unique blocks + * @param numOfNodes if number of nodes has to be decided + * based on block location information + * @param blockAssignmentStrategy strategy used to assign blocks + * @return the average expected size for each node + */ + private static long calcAvgLoadSizePerNode(List blockInfos, --- End diff -- Please separate the code for identifying numberOfBlocksPerNode and dataSizePerNode, and use the required variable based on the BlockAssignmentStrategy. That way, I think this function would also not be required. Using the same name for both purposes is confusing. ---
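The separation the reviewer suggests could look roughly like the following self-contained Java sketch. The helper names and signatures here are hypothetical illustrations, not the actual CarbonLoaderUtil code: the point is simply that block count per node and data size per node are computed by two separate methods, so each strategy reads only the quantity it needs.

```java
import java.util.Arrays;
import java.util.List;

public class LoadSizeSketch {

    // Hypothetical helper: average number of blocks each node is expected to take.
    static long avgBlockCountPerNode(List<Long> blockSizes, int numOfNodes) {
        return (long) Math.ceil((double) blockSizes.size() / numOfNodes);
    }

    // Hypothetical helper: average data size (in bytes) each node is expected to take.
    static long avgDataSizePerNode(List<Long> blockSizes, int numOfNodes) {
        long total = blockSizes.stream().mapToLong(Long::longValue).sum();
        return (long) Math.ceil((double) total / numOfNodes);
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(100L, 200L, 300L, 400L);
        // A size-based strategy would use only the size average,
        // a count-based strategy only the block-count average.
        System.out.println(avgBlockCountPerNode(sizes, 3)); // 2
        System.out.println(avgDataSizePerNode(sizes, 3));   // 334
    }
}
```

With the two quantities split out like this, a single overloaded "average size" method is no longer needed, which is the reviewer's point about the confusing shared name.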
[GitHub] carbondata issue #3058: [CARBONDATA-3238] Solve StackOverflowError using MV ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3058 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10487/ ---
[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3054 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2232/ ---
[GitHub] carbondata issue #3059: [HOTFIX][DataLoad]fix task assignment issue using NO...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3059 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2231/ ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246280098 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala --- @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils /** * configure alluxio: * 1.start alluxio - * 2.upload the jar :"/alluxio_path/core/client/target/ - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar" - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html */ - object AlluxioExample { - def main(args: Array[String]) { -val spark = ExampleUtils.createCarbonSession("AlluxioExample") -exampleBody(spark) -spark.close() + def main (args: Array[String]) { +val carbon = ExampleUtils.createCarbonSession("AlluxioExample", + storePath = "alluxio://localhost:19998/carbondata") +exampleBody(carbon) +carbon.close() } - def exampleBody(spark : SparkSession): Unit = { + def exampleBody (spark: SparkSession): Unit = { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem") --- End diff -- now spark-shell and spark-submit are OK, but CarbonThriftServer and beeline still have some problems. ---
[GitHub] carbondata issue #3057: [Test][CARBONDATA-3238] Solve StackOverflowError usi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3057 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10486/ ---
[GitHub] carbondata issue #3057: [Test][CARBONDATA-3238] Solve StackOverflowError usi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3057 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2448/ ---
[GitHub] carbondata pull request #3059: [HOTFIX][DataLoad]fix task assignment issue u...
GitHub user ndwangsen opened a pull request: https://github.com/apache/carbondata/pull/3059 [HOTFIX][DataLoad]fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment strategy fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment strategy Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? NA - [ ] Any backward compatibility impacted? NA - [ ] Document update required? NA - [ ] Testing done Test OK in local env - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. NA You can merge this pull request into a Git repository by running: $ git pull https://github.com/ndwangsen/incubator-carbondata fix_load_min_size_bug Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3059.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3059 commit 04d6bff55a5c9120ae8d5c4899a82bc63f1e2e37 Author: ndwangsen Date: 2019-01-09T07:10:21Z [HOTFIX][DataLoad]fix task assignment issue using NODE_MIN_SIZE_FIRST block assignment strategy. ---
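For intuition about what a "node minimum size first" assignment strategy does, here is a minimal, hypothetical Java sketch. It is an assumption-laden illustration, not CarbonData's actual NODE_MIN_SIZE_FIRST implementation: blocks are given to the current node until it reaches a minimum expected data size, then assignment moves on to the next node.

```java
import java.util.*;

public class MinSizeFirstSketch {

    // Hypothetical sketch: fill each node with blocks until it reaches a
    // minimum expected data size, then advance to the next node. Remaining
    // blocks stay on the last node once every node has met the minimum.
    static Map<String, List<Long>> assign(List<Long> blockSizes,
                                          List<String> nodes,
                                          long minSizePerNode) {
        Map<String, List<Long>> result = new LinkedHashMap<>();
        for (String n : nodes) result.put(n, new ArrayList<>());
        Iterator<String> nodeIt = nodes.iterator();
        String current = nodeIt.next();
        long filled = 0;
        for (long size : blockSizes) {
            // move to the next node once the current one meets the minimum
            if (filled >= minSizePerNode && nodeIt.hasNext()) {
                current = nodeIt.next();
                filled = 0;
            }
            result.get(current).add(size);
            filled += size;
        }
        return result;
    }

    public static void main(String[] args) {
        // Four 300-unit blocks, two nodes, minimum 500 per node:
        // n1 takes two blocks (600 >= 500), then n2 takes the rest.
        System.out.println(assign(Arrays.asList(300L, 300L, 300L, 300L),
                Arrays.asList("n1", "n2"), 500L));
    }
}
```

A bug in this kind of logic (e.g. never advancing past the first node, or mishandling the last node) would skew the task assignment, which is the class of issue this hotfix addresses.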
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/3001 Is that ok? ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user QiangCai commented on the issue: https://github.com/apache/carbondata/pull/3001 @ravipesala Because we use Spark to generate the streaming segment and Presto can't depend on Spark, I want to use StructuredStreamingExample to create the stream table and commit the table data (streaming carbondata file) to create a test case in the presto module. ---
[GitHub] carbondata pull request #3055: [CARBONDATA-3237] Fix presto carbon issues in...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3055#discussion_r246277431 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java --- @@ -142,5 +144,17 @@ public SliceStreamReader(int batchSize, DataType dataType, @Override public void reset() { builder = type.createBlockBuilder(null, batchSize); +this.isLocalDict = false; + } + + @Override public void putInt(int rowId, int value) { +Object data = DataTypeUtil --- End diff -- Does direct overriding not create a problem for the local dictionary? How do you handle the local dictionary here? ---
[GitHub] carbondata pull request #3055: [CARBONDATA-3237] Fix presto carbon issues in...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3055#discussion_r246277337 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/readers/SliceStreamReader.java --- @@ -95,22 +105,14 @@ public SliceStreamReader(int batchSize, DataType dataType, dictOffsets[dictOffsets.length - 1] = size; dictionaryBlock = new VariableWidthBlock(dictionary.getDictionarySize(), Slices.wrappedBuffer(singleArrayDictValues), dictOffsets, Optional.of(nulls)); -values = (int[]) ((CarbonColumnVectorImpl) getDictionaryVector()).getDataArray(); +this.isLocalDict = true; } - @Override public void setBatchSize(int batchSize) { + --- End diff -- remove empty space ---
[GitHub] carbondata issue #3058: [CARBONDATA-3238] Solve StackOverflowError using MV ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3058 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2230/ ---
[GitHub] carbondata pull request #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user NamanRastogi commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3014#discussion_r246274857 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/management/CarbonLoadDataCommand.scala --- @@ -191,10 +191,17 @@ case class CarbonLoadDataCommand( optionsFinal .put("complex_delimiter_level_4", ComplexDelimitersEnum.COMPLEX_DELIMITERS_LEVEL_4.value()) -optionsFinal.put("sort_scope", tableProperties.asScala.getOrElse("sort_scope", - carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE, -carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE, - CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT +optionsFinal.put( --- End diff -- Done. ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/3014 LGTM ---
[GitHub] carbondata pull request #3058: [CARBONDATA-3238] Solve StackOverflowError us...
GitHub user qiuchenjian opened a pull request: https://github.com/apache/carbondata/pull/3058 [CARBONDATA-3238] Solve StackOverflowError using MV datamap 【Problem】 An exception or error caused a run to abort. (Using MV) java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) at org.apache.carbondata.mv.rewrite.SelectSelectGroupbyChildDelta$$anonfun$13.applyOrElse(DefaultMatchMaker.scala:693) 【Cause】 When a column of the table is lowercase, the corresponding column of the MV is uppercase, and this column is a selected column (see the code for the detailed test case), the tree node of this selected column in the logical plan tree will be an alias that has a child attributeReference. When this code runs in sel_3q.transformExpressions in DefaultMatchMaker.scala, the executor rule causes a loop call in transformDown of the TreeNode class. 【Solution】 This executor rule only needs to be transformed twice or less per expression (select and having), so define a flag to solve it. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/qiuchenjian/carbondata MVStackException Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3058.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3058 commit 3568ba53299b9f019ea41fc483bb5f79652c5352 Author: qiuchenjian <807169000@...> Date: 2019-01-09T03:07:41Z [CARBONDATA-3238] Solve StackOverflowError using MV datamap commit 42cf1c70b7b14f8e79801fb3579e84ada5671735 Author: qiuchenjian <807169000@...> Date: 2019-01-09T06:17:33Z [CARBONDATA-3238] Solve StackOverflowError using MV datamap ---
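The fix idea described in the PR (bound how many times the rewrite rule may fire) can be illustrated with a minimal, self-contained Java sketch. This is not the Catalyst/DefaultMatchMaker code; the names and the fixed-point loop are assumptions made purely to show why an alias-wrapping rule that re-matches its own output diverges, and how a flag bounds it.

```java
import java.util.function.UnaryOperator;

public class GuardedRuleSketch {

    // Repeatedly applies a rewrite rule until the expression stops changing,
    // mimicking a transform pass that re-visits rewritten nodes.
    static String applyToFixedPoint(String expr, UnaryOperator<String> rule, int maxSteps) {
        for (int i = 0; i < maxSteps; i++) {
            String next = rule.apply(expr);
            if (next.equals(expr)) return expr;
            expr = next;
        }
        // stands in for the StackOverflowError the real recursive pass hits
        throw new IllegalStateException("rule did not stabilize: infinite rewrite");
    }

    public static void main(String[] args) {
        // Unguarded rule: wraps the attribute in an alias every time it
        // matches, so its own output matches again and it never terminates.
        UnaryOperator<String> unguarded = e -> "alias(" + e + ")";

        // Guarded rule: a mutable flag/counter (the PR's approach) lets the
        // rule fire at most twice, after which it becomes the identity.
        int[] fired = {0};
        UnaryOperator<String> guarded = e -> fired[0]++ < 2 ? "alias(" + e + ")" : e;

        System.out.println(applyToFixedPoint("x2", guarded, 100)); // alias(alias(x2))
        try {
            applyToFixedPoint("x2", unguarded, 100);
        } catch (IllegalStateException ex) {
            System.out.println("unguarded rule diverged");
        }
    }
}
```

The "at most twice" bound mirrors the PR description, where the rule may legitimately fire once for the select expression and once for the having expression, but no more.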
[GitHub] carbondata pull request #3057: [Test][CARBONDATA-3238] Solve StackOverflowEr...
Github user qiuchenjian closed the pull request at: https://github.com/apache/carbondata/pull/3057 ---
[GitHub] carbondata issue #3057: [Test][CARBONDATA-3238] Solve StackOverflowError usi...
Github user qiuchenjian commented on the issue: https://github.com/apache/carbondata/pull/3057 retest please ---
[GitHub] carbondata pull request #3057: [Test][CARBONDATA-3238] Solve StackOverflowEr...
GitHub user qiuchenjian reopened a pull request: https://github.com/apache/carbondata/pull/3057 [Test][CARBONDATA-3238] Solve StackOverflowError using MV datamap Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/qiuchenjian/carbondata MVStackException Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3057.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3057 commit 3568ba53299b9f019ea41fc483bb5f79652c5352 Author: qiuchenjian <807169000@...> Date: 2019-01-09T03:07:41Z [CARBONDATA-3238] Solve StackOverflowError using MV datamap commit 42cf1c70b7b14f8e79801fb3579e84ada5671735 Author: qiuchenjian <807169000@...> Date: 2019-01-09T06:17:33Z [CARBONDATA-3238] Solve StackOverflowError using MV datamap ---
[GitHub] carbondata pull request #3057: [Test][CARBONDATA-3238] Solve StackOverflowEr...
Github user qiuchenjian closed the pull request at: https://github.com/apache/carbondata/pull/3057 ---
[GitHub] carbondata issue #3057: [Test][CARBONDATA-3238] Solve StackOverflowError usi...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3057 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2229/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user ajantha-bhat commented on the issue: https://github.com/apache/carbondata/pull/3055 @ravipesala : please check ---
[GitHub] carbondata issue #2996: [CARBONDATA-3235] Fix Rename-Fail & Datamap-creation...
Github user kunal642 commented on the issue: https://github.com/apache/carbondata/pull/2996 LGTM ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/3001 @QiangCai Please try to add a test case for it; otherwise it will be easy to break in future commits. ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/3001 LGTM ---
[GitHub] carbondata issue #3057: [CARBONDATA-3238] Solve StackOverflowError using MV ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3057 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2447/ ---
[GitHub] carbondata issue #3057: [CARBONDATA-3238] Solve StackOverflowError using MV ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3057 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10485/ ---
[GitHub] carbondata issue #3057: [CARBONDATA-3238] Solve StackOverflowError using MV ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3057 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2228/ ---
[GitHub] carbondata pull request #3057: [CARBONDATA-3238] Solve StackOverflowError us...
GitHub user qiuchenjian opened a pull request: https://github.com/apache/carbondata/pull/3057 [CARBONDATA-3238] Solve StackOverflowError using MV datamap 【Problem】 An exception or error caused a run to abort. (Using MV) java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) at org.apache.carbondata.mv.rewrite.SelectSelectGroupbyChildDelta$$anonfun$13.applyOrElse(DefaultMatchMaker.scala:693) 【Cause】 When a column of the table is lowercase, the corresponding column of the MV is uppercase, and this column is a selected column (see the code for the detailed test case), the tree node of this selected column in the logical plan tree will be an alias that has a child attributeReference. When this code runs in sel_3q.transformExpressions in DefaultMatchMaker.scala, the executor rule causes a loop call. 【Solution】 This executor rule only needs to be transformed once per expression, so define a flag to solve it. Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? - [ ] Any backward compatibility impacted? - [ ] Document update required? - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/qiuchenjian/carbondata MVStackException Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3057.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3057 commit 3568ba53299b9f019ea41fc483bb5f79652c5352 Author: qiuchenjian <807169000@...> Date: 2019-01-09T03:07:41Z [CARBONDATA-3238] Solve StackOverflowError using MV datamap ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246249496 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala --- @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils /** * configure alluxio: * 1.start alluxio - * 2.upload the jar :"/alluxio_path/core/client/target/ - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar" - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html */ - object AlluxioExample { - def main(args: Array[String]) { -val spark = ExampleUtils.createCarbonSession("AlluxioExample") -exampleBody(spark) -spark.close() + def main (args: Array[String]) { +val carbon = ExampleUtils.createCarbonSession("AlluxioExample", + storePath = "alluxio://localhost:19998/carbondata") +exampleBody(carbon) +carbon.close() } - def exampleBody(spark : SparkSession): Unit = { + def exampleBody (spark: SparkSession): Unit = { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem") FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem") // Specify date format based on raw data CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") -spark.sql("DROP TABLE IF EXISTS alluxio_table") +val time = new SimpleDateFormat("MMddHHmmssSSS").format(new Date()) + +val mFsShell = new FileSystemShell() +val localFile = rootPath + "/hadoop/src/test/resources/data.csv" +val remotePath = "/carbon_alluxio" + time + ".csv" +val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + ".csv" --- End diff -- ok ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246249301 --- Diff: docs/alluxio-guide.md --- @@ -0,0 +1,42 @@ + + + +# Presto guide --- End diff -- changed ---
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user xubo245 commented on the issue: https://github.com/apache/carbondata/pull/3037 @KanakaKumar @ravipesala @jackylk @QiangCai @kunal642 Please review it. ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3001 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2446/ ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3001 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10484/ ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3001 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2227/ ---
[GitHub] carbondata issue #3001: [CARBONDATA-3220] Support presto to read stream segm...
Github user brijoobopanna commented on the issue: https://github.com/apache/carbondata/pull/3001 retest this please ---
[jira] [Updated] (CARBONDATA-3238) Throw StackOverflowError exception using MV datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenjian Qiu updated CARBONDATA-3238: - Description: Exception: java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) TestCase: sql("drop datamap if exists all_table_mv") sql("drop table if exists all_table") sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 bigint,x18 bigint,x19 bigint) stored by 'carbondata'") sql("insert into all_table select 1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") sql("create datamap all_table_mv on table all_table using 'mv' " + "as select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") sql("rebuild datamap all_table_mv") sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") was: Exception: java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) TestCase: test("select mv stack exception") { sql("drop datamap if exists all_table_mv") sql("drop table if exists all_table") 
sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 bigint,x18 bigint,x19 bigint) stored by 'carbondata'") sql("insert into all_table select 1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") sql("create datamap all_table_mv on table all_table using 'mv' " + "as select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") sql("rebuild datamap all_table_mv") sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2").collect().foreach(println) } > Throw StackOverflowError exception using MV datamap > --- > > Key: CARBONDATA-3238 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3238 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.5.1 >Reporter: Chenjian Qiu >Priority: Blocker > > Exception: > java.lang.StackOverflowError > at > org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) > at > org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) > at > org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) > TestCase: > sql("drop datamap if exists all_table_mv") > sql("drop table if exists all_table") > sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 > bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + > "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 > bigint,x18 bigint,x19 bigint) stored by 'carbondata'") > sql("insert into all_table select > 1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") > sql("create datamap all_table_mv on table all_table using 'mv' " + > "as select sum(x12) 
as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as > y4,X8,x9,x2 from all_table group by X8,x9,x2") > sql("rebuild datamap all_table_mv") > sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as > y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (CARBONDATA-3238) Throw StackOverflowError exception using MV datamap
[ https://issues.apache.org/jira/browse/CARBONDATA-3238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chenjian Qiu updated CARBONDATA-3238: - Description: Exception: java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) TestCase: test("select mv stack exception") { sql("drop datamap if exists all_table_mv") sql("drop table if exists all_table") sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 bigint,x18 bigint,x19 bigint) stored by 'carbondata'") sql("insert into all_table select 1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") sql("create datamap all_table_mv on table all_table using 'mv' " + "as select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") sql("rebuild datamap all_table_mv") sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2").collect().foreach(println) } was: Exception: java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) TestCase: test("select mv stack exception") { sql("drop datamap if 
exists all_table_mv") sql("drop table if exists all_table") sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 bigint,x18 bigint,x19 bigint) stored by 'carbondata'") sql("insert into all_table select 1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") sql("create datamap all_table_mv on table all_table using 'mv' " + "as select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") sql("rebuild datamap all_table_mv") sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2").collect().foreach(println) } > Throw StackOverflowError exception using MV datamap > --- > > Key: CARBONDATA-3238 > URL: https://issues.apache.org/jira/browse/CARBONDATA-3238 > Project: CarbonData > Issue Type: Bug > Components: data-query >Affects Versions: 1.5.1 >Reporter: Chenjian Qiu >Priority: Blocker > > Exception: > java.lang.StackOverflowError > at > org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) > at > org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) > at scala.Option.map(Option.scala:146) > at > org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) > at > org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) > TestCase: > test("select mv stack exception") { > sql("drop datamap if exists all_table_mv") > sql("drop table if exists all_table") > sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 > bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + > "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 > bigint,x18 bigint,x19 bigint) stored by 'carbondata'") > sql("insert into all_table select > 
1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") > sql("create datamap all_table_mv on table all_table using 'mv' " + > "as select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as > y4,X8,x9,x2 from all_table group by X8,x9,x2") > sql("rebuild datamap all_table_mv") > sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as > y3,sum(x15) as y4,X8,x9,x2 from all_table group by > X8,x9,x2").collect().foreach(println) > } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (CARBONDATA-3238) Throw StackOverflowError exception using MV datamap
Chenjian Qiu created CARBONDATA-3238: Summary: Throw StackOverflowError exception using MV datamap Key: CARBONDATA-3238 URL: https://issues.apache.org/jira/browse/CARBONDATA-3238 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 1.5.1 Reporter: Chenjian Qiu Exception: java.lang.StackOverflowError at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap$$anonfun$get$1.apply(AttributeMap.scala:34) at scala.Option.map(Option.scala:146) at org.apache.spark.sql.catalyst.expressions.AttributeMap.get(AttributeMap.scala:34) at org.apache.spark.sql.catalyst.expressions.AttributeMap.contains(AttributeMap.scala:36) TestCase: test("select mv stack exception") { sql("drop datamap if exists all_table_mv") sql("drop table if exists all_table") sql("create table all_table(x1 bigint,x2 bigint,x3 string,x4 bigint,x5 bigint,x6 int,x7 string,x8 int, x9 int,x10 bigint," + "x11 bigint, x12 bigint,x13 bigint,x14 bigint,x15 bigint,x16 bigint,x17 bigint,x18 bigint,x19 bigint) stored by 'carbondata'") sql("insert into all_table select 1,1,null,1,1,1,null,1,1,1,1,1,1,1,1,1,1,1,1") sql("create datamap all_table_mv on table all_table using 'mv' " + "as select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2") sql("rebuild datamap all_table_mv") sql("explain select sum(x12) as y1, sum(x13) as y2, sum(x14) as y3,sum(x15) as y4,X8,x9,x2 from all_table group by X8,x9,x2").collect().foreach(println) } -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2445/ ---
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10483/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3055 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10481/ ---
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2226/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3055 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2443/ ---
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2225/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3055 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2224/ ---
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10480/ ---
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2442/ ---
[GitHub] carbondata issue #3046: [CARBONDATA-3231] Fix OOM exception when dictionary ...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/3046 > This PR is just to add the size based limitation so that the map size can be controlled. @kunal642 Yeah, I noticed that. So my proposal is: please structure this change so that only minimal modifications will be needed when we implement that feature (automatic size detection and fallback) later. ---
[GitHub] carbondata pull request #3046: [CARBONDATA-3231] Fix OOM exception when dict...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3046#discussion_r246056127 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -2076,4 +2076,15 @@ private CarbonCommonConstants() { */ public static final String CARBON_QUERY_DATAMAP_BLOOM_CACHE_SIZE_DEFAULT_VAL = "512"; + public static final String CARBON_LOCAL_DICTIONARY_MAX_THRESHOLD = --- End diff -- It still does not make clear what kind of size this is supposed to be, since we have both a storage size and a counting size. ---
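The distinction the reviewer draws between a "counting size" (cardinality) and a "storage size" (bytes) can be made concrete. The following is a minimal, hypothetical sketch (not CarbonData's actual local-dictionary implementation; all names are illustrative) of a local dictionary that tracks both limits and falls back to plain encoding when either is exceeded:

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a local dictionary bounded by BOTH a cardinality
// ("counting") threshold and a byte ("storage") threshold, which is the
// ambiguity the review comment asks the constant's name to resolve.
class LocalDictionarySketch {
    private final int maxCardinality;
    private final long maxStorageBytes;
    private final Map<String, Integer> dict = new HashMap<>();
    private long storageBytes = 0;
    private boolean fellBack = false;

    LocalDictionarySketch(int maxCardinality, long maxStorageBytes) {
        this.maxCardinality = maxCardinality;
        this.maxStorageBytes = maxStorageBytes;
    }

    /** Returns the surrogate key, or -1 once the dictionary has fallen back. */
    int encode(String value) {
        if (fellBack) return -1;
        Integer key = dict.get(value);
        if (key != null) return key;
        long newBytes = storageBytes + value.getBytes(StandardCharsets.UTF_8).length;
        if (dict.size() + 1 > maxCardinality || newBytes > maxStorageBytes) {
            fellBack = true;   // either limit exceeded: stop building the dictionary
            dict.clear();
            return -1;
        }
        storageBytes = newBytes;
        key = dict.size();
        dict.put(value, key);
        return key;
    }

    boolean hasFallenBack() { return fellBack; }
}
```

Naming the constant after one specific limit (e.g. `..._MAX_CARDINALITY` vs `..._MAX_SIZE_IN_BYTES`) would remove the ambiguity the comment points out.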
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/3053 Does this PR fix two problems? If so, it would be better to separate it into two. As for the first problem, I'm also concerned about the performance decrease. rawCompress can save some memory-copy operations, which is why we added a check there and try to use that feature if the compressor supports it. It may need more observation of the performance decrease, OR we can just add a switch to control the behavior, which would be helpful for comparison. ---
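The "add a switch" suggestion can be sketched as a small dispatch layer. This is a hypothetical illustration, not CarbonData's actual compressor interface; the method names and signatures here are assumptions made for the example:

```java
// Hypothetical sketch of the reviewer's suggestion: keep the zero-copy
// rawCompress path, but guard it behind a configuration switch so the two
// code paths can be compared for stability and performance.
interface CompressorSketch {
    boolean supportsUnsafe();                 // can compress directly off-heap
    byte[] rawCompress(long addr, int size);  // zero-copy path (assumed signature)
    byte[] compress(byte[] data);             // safe path with an extra on-heap copy
}

class CompressionDispatcher {
    static byte[] compressPage(CompressorSketch c, long addr, byte[] onHeapCopy,
                               boolean enableRawCompress) {
        // Use the zero-copy path only when the compressor supports it AND the
        // switch is on; otherwise fall back to the defensive on-heap copy.
        if (enableRawCompress && c.supportsUnsafe()) {
            return c.rawCompress(addr, onHeapCopy.length);
        }
        return c.compress(onHeapCopy);
    }
}
```

With such a switch, the same workload can be run with the flag on and off to quantify the memory-copy savings against any stability risk.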
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246047276 --- Diff: docs/documentation.md --- @@ -29,15 +29,15 @@ Apache CarbonData is a new big data file format for faster interactive query usi **Quick Start:** [Run an example program](./quick-start-guide.md#installing-and-configuring-carbondata-to-run-locally-with-spark-shell) on your local machine or [study some examples](https://github.com/apache/carbondata/tree/master/examples/spark2/src/main/scala/org/apache/carbondata/examples). -**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL language and adds several [DDL](./ddl-of-carbondata.md) and [DML](./dml-of-carbondata.md) statements to support operations on it.Refer to the [Reference Manual](./language-manual.md) to understand the supported features and functions. +**CarbonData SQL Language Reference:** CarbonData extends the Spark SQL language and adds several [DDL](./ddl-of-carbondata.md) and [DML](./dml-of-carbondata.md) statements to support operations on it. Refer to the [Reference Manual](./language-manual.md) to understand the supported features and functions. **Programming Guides:** You can read our guides about [Java APIs supported](./sdk-guide.md) or [C++ APIs supported](./csdk-guide.md) to learn how to integrate CarbonData with your applications. ## Integration -CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive).Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData. +CarbonData can be integrated with popular Execution engines like [Spark](./quick-start-guide.md#spark) , [Presto](./quick-start-guide.md#presto) and [Hive](./quick-start-guide.md#hive). CarbonData also supports read and write with [Alluxio](./quick-start-guide.md#alluxio). 
Refer to the [Installation and Configuration](./quick-start-guide.md#integration) section to understand all modes of Integrating CarbonData. --- End diff -- I think it's not proper to mention Alluxio after execution (*not* Execution) engines like SparkSQL/Presto/Hive. Meanwhile, we can add another paragraph mentioning that CarbonData can also integrate with storage engines such as HDFS, S3, OBS, and Alluxio. @chenliang613 What do you think? ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246049322 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala --- @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils /** * configure alluxio: * 1.start alluxio - * 2.upload the jar :"/alluxio_path/core/client/target/ - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar" - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html */ - object AlluxioExample { - def main(args: Array[String]) { -val spark = ExampleUtils.createCarbonSession("AlluxioExample") -exampleBody(spark) -spark.close() + def main (args: Array[String]) { +val carbon = ExampleUtils.createCarbonSession("AlluxioExample", + storePath = "alluxio://localhost:19998/carbondata") +exampleBody(carbon) +carbon.close() } - def exampleBody(spark : SparkSession): Unit = { + def exampleBody (spark: SparkSession): Unit = { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem") FileFactory.getConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem") // Specify date format based on raw data CarbonProperties.getInstance() .addProperty(CarbonCommonConstants.CARBON_DATE_FORMAT, "/MM/dd") -spark.sql("DROP TABLE IF EXISTS alluxio_table") +val time = new SimpleDateFormat("MMddHHmmssSSS").format(new Date()) + +val mFsShell = new FileSystemShell() +val localFile = rootPath + "/hadoop/src/test/resources/data.csv" +val remotePath = "/carbon_alluxio" + time + ".csv" +val remoteFile = "alluxio://localhost:19998/carbon_alluxio" + time + ".csv" --- End diff -- use 'prefix + remotePath' instead of concating the path by hand ---
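The "use 'prefix + remotePath' instead of concatenating the path by hand" suggestion amounts to keeping a single source of truth for the Alluxio URI prefix. A minimal sketch (names are illustrative, in Java rather than the example's Scala):

```java
// Sketch of the suggested refactor: derive every full Alluxio URI from one
// shared prefix plus the relative path, instead of repeating the
// scheme/host/port in each concatenated string.
class AlluxioPaths {
    static final String PREFIX = "alluxio://localhost:19998";

    static String remoteUri(String remotePath) {
        return PREFIX + remotePath;   // one place to change the host or port
    }
}
```

With this, `remoteFile` in the example would become `AlluxioPaths.remoteUri(remotePath)` rather than a second hand-built string.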
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246050916 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/AlluxioExample.scala --- @@ -28,46 +33,86 @@ import org.apache.carbondata.examples.util.ExampleUtils /** * configure alluxio: * 1.start alluxio - * 2.upload the jar :"/alluxio_path/core/client/target/ - * alluxio-core-client-YOUR-VERSION-jar-with-dependencies.jar" - * 3.Get more detail at:http://www.alluxio.org/docs/master/en/Running-Spark-on-Alluxio.html + * 2.Get more detail at: https://www.alluxio.org/docs/1.8/en/compute/Spark.html */ - object AlluxioExample { - def main(args: Array[String]) { -val spark = ExampleUtils.createCarbonSession("AlluxioExample") -exampleBody(spark) -spark.close() + def main (args: Array[String]) { +val carbon = ExampleUtils.createCarbonSession("AlluxioExample", + storePath = "alluxio://localhost:19998/carbondata") +exampleBody(carbon) +carbon.close() } - def exampleBody(spark : SparkSession): Unit = { + def exampleBody (spark: SparkSession): Unit = { +val rootPath = new File(this.getClass.getResource("/").getPath + + "../../../..").getCanonicalPath spark.sparkContext.hadoopConfiguration.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem") --- End diff -- Only providing an example for dataframe is not enough. Seems we should add some configurations in carbon property file and spark properties to make it work through beeline. So we can make it clear in case the user want to try it from beeline. ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246044066 --- Diff: docs/alluxio-guide.md --- @@ -0,0 +1,42 @@ + + + +# Presto guide --- End diff -- presto? ---
[GitHub] carbondata pull request #3054: [CARBONDATA-3232] Add example and doc for all...
Github user xuchuanyin commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3054#discussion_r246047576 --- Diff: docs/quick-start-guide.md --- @@ -54,7 +54,8 @@ CarbonData can be integrated with Spark,Presto and Hive Execution Engines. The b ### Hive [Installing and Configuring CarbonData on Hive](https://github.com/apache/carbondata/blob/master/docs/hive-guide.md) - +### Alluxio --- End diff -- As mentioned above, we may need to adjust the location for this section. ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user xuchuanyin commented on the issue: https://github.com/apache/carbondata/pull/3056 > because both the query and load flow were assigned the same taskId and once query finished it freed the unsafe memory while the insert still in progress. How do you handle this scenario for a stored-by-carbondata table? In that scenario, both the query flow and the load flow use off-heap memory and hit the same problem you described above. But I remember we handled that differently from the current PR, and I think the modifications could be similar. Please check this again. ---
[GitHub] carbondata issue #3032: [CARBONDATA-3210] Merge common method into CarbonSpa...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3032 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2223/ ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10479/ ---
[GitHub] carbondata pull request #3032: [CARBONDATA-3210] Merge common method into Ca...
Github user xiaohui0318 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3032#discussion_r246029848 --- Diff: integration/spark2/src/main/scala/org/apache/carbondata/spark/thriftserver/CarbonThriftServer.scala --- @@ -28,12 +28,13 @@ import org.slf4j.{Logger, LoggerFactory} import org.apache.carbondata.common.logging.LogServiceFactory import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.spark.util.CarbonSparkUtil -/** - * CarbonThriftServer support different modes: - * 1. read/write data from/to HDFS or local,it only needs configurate storePath - * 2. read/write data from/to S3, it needs provide access-key, secret-key, s3-endpoint - */ + /** + * CarbonThriftServer support different modes: + * 1. read/write data from/to HDFS or local,it only needs configurate storePath + * 2. read/write data from/to S3, it needs provide access-key, secret-key, s3-endpoint + */ object CarbonThriftServer { --- End diff -- done ---
[GitHub] carbondata pull request #3032: [CARBONDATA-3210] Merge common method into Ca...
Github user xiaohui0318 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3032#discussion_r246029862 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3UsingSDkExample.scala --- @@ -16,28 +16,26 @@ */ package org.apache.carbondata.examples -import org.apache.hadoop.conf.Configuration -import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY} import org.apache.spark.sql.SparkSession import org.slf4j.{Logger, LoggerFactory} -import org.apache.carbondata.core.constants.CarbonCommonConstants import org.apache.carbondata.core.metadata.datatype.DataTypes import org.apache.carbondata.sdk.file.{CarbonWriter, Field, Schema} +import org.apache.carbondata.spark.util.CarbonSparkUtil /** * Generate data and write data to S3 * User can generate different numbers of data by specifying the number-of-rows in parameters */ -object S3UsingSDKExample { +object S3UsingSdkExample { --- End diff -- done ---
[GitHub] carbondata pull request #3032: [CARBONDATA-3210] Merge common method into Ca...
Github user xiaohui0318 commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3032#discussion_r246029819 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/S3Example.scala --- @@ -18,11 +18,10 @@ package org.apache.carbondata.examples import java.io.File -import org.apache.hadoop.fs.s3a.Constants.{ACCESS_KEY, ENDPOINT, SECRET_KEY} import org.apache.spark.sql.{Row, SparkSession} import org.slf4j.{Logger, LoggerFactory} -import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.spark.util.CarbonSparkUtil object S3Example { --- End diff -- done ---
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3037 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10478/ ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user chenliang613 commented on the issue: https://github.com/apache/carbondata/pull/3056 Reviewed, LGTM ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1// ---
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3037 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2221/ ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3053 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2220/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3055 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2219/ ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3056 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10475/ ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3053 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10477/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3055 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10476/ ---
[GitHub] carbondata issue #3055: [CARBONDATA-3237] Fix presto carbon issues in dictio...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3055 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2439/ ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3053 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2440/ ---
[GitHub] carbondata pull request #3053: [CARBONDATA-3233]Fix JVM crash issue in snapp...
Github user qiuchenjian commented on a diff in the pull request: https://github.com/apache/carbondata/pull/3053#discussion_r245982727 --- Diff: core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java --- @@ -369,7 +367,7 @@ public BigDecimal getDecimal(int rowId) { @Override public double[] getDoublePage() { -double[] data = new double[getPageSize()]; +double[] data = new double[getEndLoop()]; --- End diff -- get it, thank you ---
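The one-line diff above (sizing the result by `getEndLoop()` instead of `getPageSize()`) can be illustrated with a simplified, hypothetical model of a fixed-length column page; the class below is not the real `UnsafeFixLengthColumnPage`, just a sketch of the bug pattern:

```java
// Hypothetical sketch of the bug the diff fixes: a fixed-length page
// allocates capacity for pageSize rows but may hold fewer. Copying pageSize
// doubles out of a region that only rowCount rows were written to reads
// uninitialized data -- and with off-heap memory can touch unowned bytes.
class FixLengthPageSketch {
    private final double[] backing;   // stands in for the unsafe memory block
    private int rowCount = 0;         // rows actually written ("end loop")

    FixLengthPageSketch(int pageSize) { backing = new double[pageSize]; }

    void putDouble(double v) { backing[rowCount++] = v; }

    /** Correct: size the result by the rows actually written. */
    double[] getDoublePage() {
        double[] out = new double[rowCount];   // was: new double[pageSize]
        System.arraycopy(backing, 0, out, 0, rowCount);
        return out;
    }
}
```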
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3053 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2218/ ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3056 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2217/ ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10474/ ---
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3037 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2438/ ---
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3053 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10472/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10473/ ---
[GitHub] carbondata issue #3056: [CARBONDATA-3236] Fix for JVM Crash for insert into ...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3056 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2437/ ---
[jira] [Created] (CARBONDATA-3237) optimize presto query time for dictionary include string column
Ajantha Bhat created CARBONDATA-3237: Summary: optimize presto query time for dictionary include string column Key: CARBONDATA-3237 URL: https://issues.apache.org/jira/browse/CARBONDATA-3237 Project: CarbonData Issue Type: Bug Reporter: Ajantha Bhat Assignee: Ajantha Bhat Problem: currently, presto carbon creates a dictionary block for string columns on every query. If the cardinality is high, building it takes significant time, and it is not required: we can look up values using a normal dictionary lookup instead. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
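The cost difference the issue describes is between materializing a full dictionary block per query (O(cardinality) work each time) and decoding surrogate keys against the already-loaded dictionary (O(1) per key). A minimal, hypothetical sketch of the direct-lookup approach (not the actual Presto integration code):

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch: the dictionary values are loaded once per column, and
// each query decodes surrogate keys by direct lookup, instead of rebuilding a
// dictionary block for the column on every query.
class DictionaryLookupSketch {
    private final List<String> dictionaryValues;  // loaded once, reused across queries

    DictionaryLookupSketch(List<String> dictionaryValues) {
        this.dictionaryValues = dictionaryValues;
    }

    /** Direct lookup: no per-query block materialization needed. */
    String decode(int surrogateKey) {
        return dictionaryValues.get(surrogateKey);
    }
}
```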
[GitHub] carbondata issue #3053: [CARBONDATA-3233]Fix JVM crash issue in snappy compr...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3053 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2436/ ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2216/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2215/ ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3033 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2214/ ---
[GitHub] carbondata pull request #3056: [CARBONDATA-3236] Fix for JVM Crash for inser...
GitHub user manishnalla1994 opened a pull request: https://github.com/apache/carbondata/pull/3056 [CARBONDATA-3236] Fix for JVM Crash for insert into new table from old table
Problem: Insert into a new table from an old table fails with a JVM crash. This happened because both the query and the load flow were assigned the same taskId, and once the query finished it freed the unsafe memory while the insert was still in progress.
Solution: As the flow for file format is a direct flow and uses on-heap (safe) memory, there is no need to free the unsafe memory.
Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily:
- [ ] Any interfaces changed?
- [ ] Any backward compatibility impacted?
- [ ] Document update required?
- [x] Testing done. Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change.
- [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.
You can merge this pull request into a Git repository by running: $ git pull https://github.com/manishnalla1994/carbondata JVMCrashForLoadAndQuery Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/3056.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3056 commit 150c710218ff3c09ccfccc9b2df970006964ef6d Author: manishnalla1994 Date: 2019-01-08T10:42:55Z Fix for JVM Crash for file format ---
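The crash scenario in the PR description (two flows sharing one taskId, so freeing one flow's unsafe memory also releases the other's) can be modeled with a tiny sketch. This is an illustration of the hazard, not CarbonData's actual `UnsafeMemoryManager`:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch of the described crash: memory blocks are tracked per
// taskId, so when the query flow and the load flow register under the SAME
// taskId, freeing "the query's" memory on query completion also releases
// blocks the still-running insert references.
class UnsafeMemoryManagerSketch {
    private final Map<Long, Integer> blocksPerTask = new HashMap<>();

    void allocate(long taskId) {
        blocksPerTask.merge(taskId, 1, Integer::sum);
    }

    /** Frees everything registered under taskId -- including the other flow's blocks. */
    void freeAllForTask(long taskId) {
        blocksPerTask.remove(taskId);
    }

    boolean hasMemory(long taskId) {
        return blocksPerTask.containsKey(taskId);
    }
}
```

The PR sidesteps this for the file-format flow by keeping it on-heap, so there is no per-task unsafe memory to be freed out from under the insert.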
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3037 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2213/ ---
[jira] [Created] (CARBONDATA-3236) JVM Crash for insert into new table from old table
MANISH NALLA created CARBONDATA-3236: Summary: JVM Crash for insert into new table from old table Key: CARBONDATA-3236 URL: https://issues.apache.org/jira/browse/CARBONDATA-3236 Project: CarbonData Issue Type: Bug Reporter: MANISH NALLA Assignee: MANISH NALLA -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10470/ ---
[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3054 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder2.1/2212/ ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3033 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10469/ ---
[GitHub] carbondata issue #3029: [CARBONDATA-3200] No-Sort compaction
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3029 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2435/ ---
[GitHub] carbondata issue #3014: [CARBONDATA-3201] Added load level SORT_SCOPE
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3014 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2434/ ---
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3037 Build Failed with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10468/ ---
[GitHub] carbondata issue #3037: [CARBONDATA-3190] Open example module code style che...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3037 Build Failed with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2430/ ---
[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3054 Build Success with Spark 2.3.2, Please check CI http://136.243.101.176:8080/job/carbondataprbuilder2.3/10467/ ---
[GitHub] carbondata pull request #2996: [CARBONDATA-3235] Fix Rename-Fail & Datamap-c...
Github user NamanRastogi commented on a diff in the pull request: https://github.com/apache/carbondata/pull/2996#discussion_r245925395 --- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/schema/CarbonAlterTableRenameCommand.scala --- @@ -165,15 +167,22 @@ private[sql] case class CarbonAlterTableRenameCommand( case e: ConcurrentOperationException => throw e case e: Exception => +if (hiveRenameSuccess) { + sparkSession.sessionState.catalog.asInstanceOf[CarbonSessionCatalog].alterTableRename( --- End diff -- @qiuchenjian @kevinjmh Is Correct. ---
[GitHub] carbondata issue #3033: [CARBONDATA-3215] Optimize the documentation
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3033 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2432/ ---
[GitHub] carbondata issue #3054: [CARBONDATA-3232] Add example and doc for alluxio in...
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/3054 Build Success with Spark 2.2.1, Please check CI http://95.216.28.178:8080/job/ApacheCarbonPRBuilder1/2431/ ---