[GitHub] carbondata pull request #2029: [CARBONDATA-2222] Update the FAQ doc for some...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/2029 [CARBONDATA-2222] Update the FAQ doc for some mistakes Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [No] Any interfaces changed? - [No] Any backward compatibility impacted? - [Yes] Document update required? - [NA] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [NA] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata updatedoc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/2029.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2029 commit e10ed2c0d96070112bb5701ba85b896fb6fe1f18 Author: chenerlu <chenerlu@...> Date: 2018-03-04T15:39:40Z update the FAQ doc for some mistakes ---
[jira] [Created] (CARBONDATA-2222) Update the FAQ doc for some mistakes
chenerlu created CARBONDATA-2222: Summary: Update the FAQ doc for some mistakes Key: CARBONDATA-2222 URL: https://issues.apache.org/jira/browse/CARBONDATA-2222 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] carbondata issue #1894: [CARBONDATA-2107]Fixed query failure in case if aver...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1894 retest this please ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata pull request #1103: [WIP] Implement range interval partition
Github user chenerlu closed the pull request at: https://github.com/apache/carbondata/pull/1103 ---
[GitHub] carbondata pull request #880: [CARBONDATA-1021] Update compact for code styl...
Github user chenerlu closed the pull request at: https://github.com/apache/carbondata/pull/880 ---
[GitHub] carbondata pull request #1105: [WIP] Implement range interval partition
Github user chenerlu closed the pull request at: https://github.com/apache/carbondata/pull/1105 ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata issue #1657: [CARBONDATA-1895] Fix issue of create table if not e...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1657 retest this please ---
[GitHub] carbondata pull request #1657: [CARBONDATA-1895] Fix issue of create table i...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1657 [CARBONDATA-1895] Fix issue of create table if not exists Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done: Already added test cases in the project. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata pr-1212 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1657.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1657 ---
[jira] [Created] (CARBONDATA-1895) Fix issue of create table if not exists
chenerlu created CARBONDATA-1895: Summary: Fix issue of create table if not exists Key: CARBONDATA-1895 URL: https://issues.apache.org/jira/browse/CARBONDATA-1895 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1587: [CARBONDATA-1835] Fix null exception when get table ...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1587 retest this please ---
[GitHub] carbondata pull request #1587: [CARBONDATA-1835] Fix null exception when get...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1587 [CARBONDATA-1835] Fix null exception when get table detail Be sure to do all of the following checklist to help us incorporate your contribution quickly and easily: - [ ] Any interfaces changed? No - [ ] Any backward compatibility impacted? No - [ ] Document update required? No - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - How it is tested? Please attach test report. - Is it a performance related change? Please attach the performance test report. - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata fixnep Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1587.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1587 commit e37dea76036242008ce20938b4c21ba0027ab4a2 Author: chenerlu <chene...@huawei.com> Date: 2017-11-29T07:51:53Z [CARBONDATA-1835] Fix null exception when get table detail ---
[jira] [Created] (CARBONDATA-1835) Fix null exception when get table details
chenerlu created CARBONDATA-1835: Summary: Fix null exception when get table details Key: CARBONDATA-1835 URL: https://issues.apache.org/jira/browse/CARBONDATA-1835 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1554: [CARBONDATA-1717] Fix issue of no sort when data in ...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1554 retest this please ---
[GitHub] carbondata pull request #1554: [CARBONDATA-1717] Fix issue of no sort when d...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1554 [CARBONDATA-1717] Fix issue of no sort when data in carbon table is all numeric Modification reason: Fix issue of no sort when data in carbon table is all numeric. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata 1122 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1554.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1554 commit ac84728dbc7ca064149c5b61a10ae4686c729d2d Author: chenerlu <chene...@huawei.com> Date: 2017-11-22T15:04:13Z [CARBONDATA-1717] Fix issue of no sort when data in carbon table is all numeric ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata issue #1537: [CARBONDATA-1778] Support clean data for all
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1537 retest this please ---
[GitHub] carbondata pull request #1537: [CARBONDATA-1778] Support clean data for all
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1537 [CARBONDATA-1778] Support clean data for all Modification reasons: Currently Carbon only supports cleaning garbage segments for a specified table. Carbon should provide the ability to clean all garbage segments without specifying the database name and table name. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata cleanfile Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1537.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1537 commit d5e9b19809b75f3cb8af27ff059c24b25e552309 Author: chenerlu <chene...@huawei.com> Date: 2017-11-20T09:01:42Z Support clean data for all ---
[jira] [Commented] (CARBONDATA-1778) Support clean garbage segments for all
[ https://issues.apache.org/jira/browse/CARBONDATA-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16258972#comment-16258972 ] chenerlu commented on CARBONDATA-1778: -- Currently Carbon only supports cleaning garbage segments for a specified table. Carbon should provide the ability to clean all garbage segments without specifying the database name and table name. > Support clean garbage segments for all > -- > > Key: CARBONDATA-1778 > URL: https://issues.apache.org/jira/browse/CARBONDATA-1778 > Project: CarbonData > Issue Type: Improvement > Reporter: chenerlu > Assignee: chenerlu > Priority: Minor > -- This message was sent by Atlassian JIRA (v6.4.14#64029)
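The improvement above replaces a per-table clean command with a catalog-wide sweep over every database and table. A minimal Python sketch of that control flow, assuming hypothetical `list_databases` / `list_tables` / `clean_garbage_segments` helpers (none of these are real CarbonData APIs):

```python
# Illustrative sketch of "clean garbage segments for all": instead of
# cleaning one (database, table) pair, walk the whole catalog and clean
# every table. All helper names below are hypothetical stand-ins.

def clean_all(catalog):
    """Clean garbage segments for every table in every database."""
    cleaned = []
    for db in catalog.list_databases():
        for table in catalog.list_tables(db):
            catalog.clean_garbage_segments(db, table)
            cleaned.append((db, table))
    return cleaned

class FakeCatalog:
    """Minimal in-memory stand-in used only to exercise the sketch."""
    def __init__(self, layout):
        self.layout = layout          # {database: [table, ...]}
        self.cleaned = set()
    def list_databases(self):
        return sorted(self.layout)
    def list_tables(self, db):
        return sorted(self.layout[db])
    def clean_garbage_segments(self, db, table):
        self.cleaned.add((db, table))

catalog = FakeCatalog({"default": ["t1", "t2"], "sales": ["orders"]})
print(clean_all(catalog))
```

The point of the sketch is only the shape of the change: the existing per-table clean is reused unchanged, and the "for all" variant is an outer loop that supplies the database and table names the user no longer has to pass.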
[jira] [Created] (CARBONDATA-1778) Support clean garbage segments for all
chenerlu created CARBONDATA-1778: Summary: Support clean garbage segments for all Key: CARBONDATA-1778 URL: https://issues.apache.org/jira/browse/CARBONDATA-1778 Project: CarbonData Issue Type: Improvement Reporter: chenerlu Assignee: chenerlu Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1525: [CARBONDATA-1751] Make the type of exception ...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1525#discussion_r151911286 --- Diff: integration/spark2/src/test/scala/org/apache/spark/carbondata/CarbonDataSourceSuite.scala --- @@ -18,12 +18,10 @@ package org.apache.spark.carbondata import scala.collection.mutable - --- End diff -- Suggest keeping this blank line ---
[GitHub] carbondata pull request #1525: [CARBONDATA-1751] Make the type of exception ...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1525#discussion_r151911377 --- Diff: integration/spark2/src/test/scala/org/apache/spark/carbondata/CarbonDataSourceSuite.scala --- @@ -18,12 +18,10 @@ package org.apache.spark.carbondata import scala.collection.mutable - import org.apache.spark.sql.common.util.Spark2QueryTest import org.apache.spark.sql.types._ -import org.apache.spark.sql.{Row, SaveMode} +import org.apache.spark.sql.{AnalysisException, Row, SaveMode} import org.scalatest.BeforeAndAfterAll - --- End diff -- Suggest keeping this blank line ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1437 retest this please ---
[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1437#discussion_r147736082

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableWithTableComment.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * test functionality for create table with table comment
+ */
+class TestCreateTableWithTableComment extends QueryTest with BeforeAndAfterAll {
+
+  override def beforeAll {
+    sql("use default")
+    sql("drop table if exists withTableComment")
+    sql("drop table if exists withoutTableComment")
+  }
+
+  test("test create table with table comment") {
+    sql(
+      s"""
+         | create table withTableComment(
+         | id int,
+         | name string
+         | )
+         | comment "This table has table comment"
+         | STORED BY 'carbondata'
+       """.stripMargin
+    )
+
+    val result = sql("describe formatted withTableComment")
+
+    checkExistence(result, true, "Comment:")
+    checkExistence(result, true, "This table has table comment")
+  }
+
+  test("test create table without table comment") {
+    sql(
+      s"""
+         | create table withoutTableComment(
+         | id int,
+         | name string
+         | )
+         | STORED BY 'carbondata'
+       """.stripMargin
+    )
+
+    val result = sql("describe formatted withoutTableComment")
+
+    checkExistence(result, true, "Comment:")
--- End diff --

Done

---
[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1437#discussion_r147437929 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -247,7 +247,8 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { , tableName: String, fields: Seq[Field], partitionCols: Seq[PartitionerField], tableProperties: mutable.Map[String, String], - bucketFields: Option[BucketFields], isAlterFlow: Boolean = false): TableModel = { + bucketFields: Option[BucketFields], isAlterFlow: Boolean = false, + comment: Option[String] = None): TableModel = { --- End diff -- Have renamed. ---
[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1437#discussion_r147436112

--- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/createTable/TestCreateTableWithTableComment.scala ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.createTable
+
+import org.apache.spark.sql.test.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+
+/**
+ * test functionality for create table with table comment
+ */
+class TestCreateTableWithTableComment extends QueryTest with BeforeAndAfterAll {
+
+  override def beforeAll {
+    sql("use default")
+    sql("drop table if exists withTableComment")
+    sql("drop table if exists withoutTableComment")
+  }
+
+  test("test create table with table comment") {
+    sql(
+      s"""
+         | create table withTableComment(
+         | id int,
+         | name string
+         | )
+         | comment "This table has table comment"
+         | STORED BY 'carbondata'
+       """.stripMargin
+    )
+
+    val result = sql("describe formatted withTableComment")
+
+    checkExistence(result, true, "Comment:")
+    checkExistence(result, true, "This table has table comment")
+  }
+
+  test("test create table without table comment") {
+    sql(
+      s"""
+         | create table withoutTableComment(
+         | id int,
+         | name string
+         | )
+         | STORED BY 'carbondata'
+       """.stripMargin
+    )
+
--- End diff --

This PR does not contain this function.

---
[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1437#discussion_r147435775 --- Diff: integration/spark-common/src/main/scala/org/apache/spark/sql/catalyst/CarbonDDLSqlParser.scala --- @@ -247,7 +247,8 @@ abstract class CarbonDDLSqlParser extends AbstractCarbonSparkSQLParser { , tableName: String, fields: Seq[Field], partitionCols: Seq[PartitionerField], tableProperties: mutable.Map[String, String], - bucketFields: Option[BucketFields], isAlterFlow: Boolean = false): TableModel = { + bucketFields: Option[BucketFields], isAlterFlow: Boolean = false, + comment: Option[String] = None): TableModel = { --- End diff -- Carbon already supports column comments. ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1437 retest this please ---
[GitHub] carbondata issue #1437: [CARBONDATA-1618] Fix issue of not support table com...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1437 retest this please ---
[GitHub] carbondata pull request #1437: [CARBONDATA-1618] Fix issue of not support ta...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1437 [CARBONDATA-1618] Fix issue of not support table comment Background: Currently Carbon does not support a table comment when creating a table. This PR adds support for table comments. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata tablecomment Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1437.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1437 ---
[jira] [Created] (CARBONDATA-1618) Fix issue of not supporting table comment
chenerlu created CARBONDATA-1618: Summary: Fix issue of not supporting table comment Key: CARBONDATA-1618 URL: https://issues.apache.org/jira/browse/CARBONDATA-1618 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 retest this please ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r139301875 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -433,10 +442,23 @@ case class LoadTable( val dateFormat = options.getOrElse("dateformat", null) ValidateUtil.validateDateFormat(dateFormat, table, tableName) val maxColumns = options.getOrElse("maxcolumns", null) - val sortScope = options.getOrElse("sort_scope", null) + + val tableProperties = table.getTableInfo.getFactTable.getTableProperties + val sortScope = if (null == tableProperties) { +CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT + } else { +tableProperties.getOrDefault("sort_scope", + CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT) + } + --- End diff -- OK, already updated. ---
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 retest this please ---
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 retest this please ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138853538 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -432,7 +440,9 @@ case class LoadTable( val dateFormat = options.getOrElse("dateformat", null) ValidateUtil.validateDateFormat(dateFormat, table, tableName) val maxColumns = options.getOrElse("maxcolumns", null) - val sortScope = options.getOrElse("sort_scope", null) + + val tableProperties = table.getTableInfo.getFactTable.getTableProperties + val sortScope = if (null == tableProperties) null else tableProperties.get("sort_scope") --- End diff -- I use CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT as its default value. ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r138852303 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -172,6 +173,13 @@ case class CreateTable(cm: TableModel) extends RunnableCommand { val tableInfo: TableInfo = TableNewProcessor(cm) +// Add validation for sort scope when create table +val sortScope = tableInfo.getFactTable.getTableProperties.get("sort_scope") +if (null != sortScope && !CarbonUtil.isValidSortOption(sortScope)) { + throw new InvalidConfigurationException("The sort scope " + sortScope --- End diff -- For this, I just kept it the same as the error message that already exists. ---
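The diff above adds a create-time check: if SORT_SCOPE is set but not a recognised option, CreateTable throws an InvalidConfigurationException. A small Python sketch of that validation; the set of valid scopes and the wording of the message are assumptions pieced together from this discussion (batch sort is mentioned in later comments), not CarbonData source:

```python
# Sketch of create-time SORT_SCOPE validation, mirroring the diff above.
# VALID_SORT_SCOPES and the message text are assumptions, not Carbon code.

VALID_SORT_SCOPES = {"NO_SORT", "BATCH_SORT", "LOCAL_SORT", "GLOBAL_SORT"}

class InvalidConfigurationException(Exception):
    pass

def validate_sort_scope(sort_scope):
    """Raise if sort_scope is set but not a known option (case-insensitive,
    matching the 'key and value is case-insensitive' note in the design)."""
    if sort_scope is not None and sort_scope.upper() not in VALID_SORT_SCOPES:
        raise InvalidConfigurationException(
            "The sort scope " + sort_scope + " is invalid")
    return sort_scope

validate_sort_scope("global_sort")   # accepted: values are case-insensitive
```

As in the diff, an unset scope passes through untouched; only an explicitly wrong value fails the create statement.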
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 retest this please ---
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 please retest this ---
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 please retest this ---
[GitHub] carbondata issue #1321: [CARBONDATA-1438] Unify the sort column and sort sco...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1321 please retest it ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r137442307 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestGlobalSortDataLoad.scala --- @@ -318,12 +329,12 @@ class TestGlobalSortDataLoad extends QueryTest with BeforeAndAfterEach with Befo | charField CHAR(5), | floatField FLOAT | ) - | STORED BY 'org.apache.carbondata.format' + | STORED BY 'org.apache.carbondata.format' TBLPROPERTIES('SORT_SCOPE'='GLOBAL_SORT') """.stripMargin) sql( s""" | LOAD DATA LOCAL INPATH '$path' INTO TABLE carbon_globalsort_difftypes - | OPTIONS('SORT_SCOPE'='GLOBAL_SORT', + | OPTIONS( | 'FILEHEADER'='shortField,intField,bigintField,doubleField,stringField,timestampField,decimalField,dateField,charField,floatField') """.stripMargin) --- End diff -- ok ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r137441759

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -639,6 +639,23 @@ case class LoadTable(
     val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
     carbonProperty.addProperty("zookeeper.enable.lock", "false")
     val optionsFinal = getFinalOptions(carbonProperty)
+    val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+      .getFactTable.getTableProperties
+
+    optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+          CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))))
+
+    optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb",
--- End diff --

Yes, this is only needed for batch sort, but I think if users specify this parameter in global sort, it is better to ignore it.

---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1321#discussion_r137441815

--- Diff: integration/spark2/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
@@ -639,6 +639,23 @@ case class LoadTable(
     val carbonProperty: CarbonProperties = CarbonProperties.getInstance()
     carbonProperty.addProperty("zookeeper.enable.lock", "false")
     val optionsFinal = getFinalOptions(carbonProperty)
+    val tableProperties = relation.tableMeta.carbonTable.getTableInfo
+      .getFactTable.getTableProperties
+
+    optionsFinal.put("sort_scope", tableProperties.getOrDefault("sort_scope",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_SORT_SCOPE,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_SORT_SCOPE,
+          CarbonCommonConstants.LOAD_SORT_SCOPE_DEFAULT))))
+
+    optionsFinal.put("batch_sort_size_inmb", tableProperties.getOrDefault("batch_sort_size_inmb",
+      carbonProperty.getProperty(CarbonLoadOptionConstants.CARBON_OPTIONS_BATCH_SORT_SIZE_INMB,
+        carbonProperty.getProperty(CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB,
+          CarbonCommonConstants.LOAD_BATCH_SORT_SIZE_INMB_DEFAULT))))
+
+    optionsFinal.put("global_sort_partitions", tableProperties.getOrDefault("global_sort_partitions",
--- End diff --

Same as batch sort size I think.

---
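The diffs in this review thread all follow one pattern: each load option is resolved from the highest-priority source that defines it, checking the table property first, then the session-level carbon option, then the system-level property, then a hard default. A small Python sketch of that fallback chain; the property names mirror the diff but the helper itself is illustrative, not CarbonData code:

```python
# Sketch of the lookup-precedence pattern in the diffs above: the table
# property wins over the session-level carbon option, which wins over the
# system-level property, which wins over the hard default.

def resolve(chain, default):
    """Return the first value found, checking sources in priority order.
    `chain` is a list of (properties_dict, key) pairs."""
    for props, key in chain:
        if key in props:
            return props[key]
    return default

table_props = {"sort_scope": "GLOBAL_SORT"}            # TBLPROPERTIES
session_props = {}                                     # CARBON_OPTIONS_* level
system_props = {"carbon.load.sort.scope": "NO_SORT"}   # LOAD_SORT_SCOPE level

sort_scope = resolve([(table_props, "sort_scope"),
                      (session_props, "carbon.options.sort.scope"),
                      (system_props, "carbon.load.sort.scope")],
                     "LOCAL_SORT")
print(sort_scope)  # the table property wins over the system-level NO_SORT
```

This is exactly why the nested `getProperty`/`getOrDefault` calls in the Scala diff are stacked four deep: each level is the fallback for the one above it.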
[GitHub] carbondata pull request #1283: [WIP] Add carbon encoding example
Github user chenerlu closed the pull request at: https://github.com/apache/carbondata/pull/1283 ---
[GitHub] carbondata pull request #1321: [CARBONDATA-1438] Unify the sort column and s...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1321 [CARBONDATA-1438] Unify the sort column and sort scope in create table command Background: In order to improve the ease of usage for users, unify the sort column and sort scope in create table command. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata pr-1438 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1321.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1321 commit c20caf973a70e6c6dc94d3cb26bf22404646e8eb Author: chenerlu <chene...@huawei.com> Date: 2017-09-04T12:54:55Z Unify the sort column and sort scope in create table command ---
[jira] [Updated] (CARBONDATA-1438) Unify the sort column and sort scope in create table command
[ https://issues.apache.org/jira/browse/CARBONDATA-1438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenerlu updated CARBONDATA-1438:
- Description:

1 Requirement

Currently, users can specify sort columns in the table properties when creating a table, and can also specify the sort scope in the load options when loading data. To improve ease of use, it is better to specify all sort-related parameters in the CREATE TABLE command. Once the sort scope is specified in CREATE TABLE, it is used during data loading even if users also specify one in the load options.

2 Detailed design

2.1 Task-01

Requirement: CREATE TABLE supports specifying the sort scope.

Implementation: using the table properties (Map<String, String>), the sort scope is stored as a key/value pair, and the existing interface is called to write this pair into the metastore. Global Sort, Local Sort and No Sort are supported, and the scope can be specified in the SQL command:

CREATE TABLE tableWithGlobalSort (
  shortField SHORT,
  intField INT,
  bigintField LONG,
  doubleField DOUBLE,
  stringField STRING,
  timestampField TIMESTAMP,
  decimalField DECIMAL(18,2),
  dateField DATE,
  charField CHAR(5)
)
STORED BY 'carbondata'
TBLPROPERTIES('SORT_COLUMNS'='stringField', 'SORT_SCOPE'='GLOBAL_SORT')

Tips: if the sort scope is Global Sort, users should specify GLOBAL_SORT_PARTITIONS; if it is not specified, the number of map tasks is used. GLOBAL_SORT_PARTITIONS must be an Integer in the range [1, Integer.MaxValue] and is only used when the sort scope is Global Sort.

Global Sort: uses the orderBy operator in Spark; data is ordered at segment level.
Local Sort: ordered per node; a carbondata file is ordered if it is written by one task.
No Sort: no sorting.

Tips: key and value are case-insensitive.

2.2 Task-02

Requirement: data loading supports Local Sort, No Sort and Global Sort; the sort scope specified in the load options is ignored and the parameter specified in CREATE TABLE is used.

Currently, users can specify the sort scope and global sort partitions in the load options. After this change, the sort scope in the load options is ignored and the sort scope is read from the table properties.

Current logic (sort scope comes from the load options):
1. isSortTable is true && sort scope is Global Sort -> Global Sort (checked first)
2. isSortTable is false -> No Sort
3. isSortTable is true -> Local Sort

Tips: isSortTable is true when the table contains a sort column, or when it contains dimensions (except complex types) such as string columns. For example:
Create table xxx1 (col1 string, col2 int) stored by 'carbondata' --- sort table
Create table xx1 (col1 int, col2 int) stored by 'carbondata' --- not a sort table
Create table xx (col1 int, col2 string) stored by 'carbondata' tblproperties ('sort_column'='col1') --- sort table

New logic (sort scope comes from CREATE TABLE):
1. isSortTable is true && sort scope is Global Sort -> Global Sort (checked first)
2. isSortTable is false || sort scope is No Sort -> No Sort
3. isSortTable is true && sort scope is Local Sort -> Local Sort
4. isSortTable is true, no sort scope specified -> Local Sort (keeps current behavior)

3 Acceptance standard

1. Users can specify the sort scope (global, local, no sort) when creating a carbon table in SQL.
2. Data loading ignores the sort scope specified in the load options and uses the parameter specified in the CREATE TABLE command. If users still specify a sort scope in the load options, a warning informs them that the sort scope specified in CREATE TABLE will be used.

4 Feature restrictions
NA

5 Dependencies
NA

6 Technical risk
NA
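The new-logic decision table in Task-02 can be sketched as a small helper. This is an illustrative sketch only: the method name `resolve` and the scope strings are assumptions made for the example, not CarbonData's actual internal API.

```java
// Hypothetical sketch of the "new logic" decision table: the effective sort
// scope is resolved from the table properties at load time, not from load options.
public class SortScopeResolver {

    // isSortTable: the table has sort columns (or non-complex dimensions).
    // tableSortScope: SORT_SCOPE from TBLPROPERTIES, or null if unspecified.
    public static String resolve(boolean isSortTable, String tableSortScope) {
        if (isSortTable && "GLOBAL_SORT".equalsIgnoreCase(tableSortScope)) {
            return "GLOBAL_SORT";                        // row 1: checked first
        }
        if (!isSortTable || "NO_SORT".equalsIgnoreCase(tableSortScope)) {
            return "NO_SORT";                            // row 2
        }
        // rows 3 and 4: a sort table with LOCAL_SORT, or with no scope specified,
        // keeps the current behavior (Local Sort)
        return "LOCAL_SORT";
    }
}
```

Note that keys and values are case-insensitive per the design, which is why `equalsIgnoreCase` is used here.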
[jira] [Created] (CARBONDATA-1438) Unify the sort column and sort scope in create table command
chenerlu created CARBONDATA-1438: Summary: Unify the sort column and sort scope in create table command Key: CARBONDATA-1438 URL: https://issues.apache.org/jira/browse/CARBONDATA-1438 Project: CarbonData Issue Type: Bug Reporter: chenerlu -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata issue #1285: [CARBONDATA-1403] Compaction log is not correct
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1285 retest this please --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata pull request #1285: [CARBONDATA-1403] Compaction log is not corre...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1285 [CARBONDATA-1403] Compaction log is not correct Modify reason: The compaction log is not correct, so fix it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata fixcompactlog Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1285.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1285 commit b24c5411d1f4621a1600032646d1e39e26db75f9 Author: chenerlu <chene...@huawei.com> Date: 2017-08-23T09:23:38Z fix CARBONDATA-1403 compaction log is not correct ---
[jira] [Created] (CARBONDATA-1403) Compaction log is not correct
chenerlu created CARBONDATA-1403: Summary: Compaction log is not correct Key: CARBONDATA-1403 URL: https://issues.apache.org/jira/browse/CARBONDATA-1403 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1275: [CARBONDATA-1376] Fix warn message when setti...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1275 [CARBONDATA-1376] Fix warn message when setting LOCK_TYPE to HDFSLOCK Modify reason: The WARN message below is not correct and may confuse users. The default value "LOCALLOCK" is now set before the lock is validated and configured, so the value can no longer be null; this change rewords the warning for better understanding. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata 1376 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1275.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1275 commit 90c7ac53c674362a6ba1a5e7458c53c6b4bb233e Author: chenerlu <chene...@huawei.com> Date: 2017-08-21T02:42:30Z fix JIRA-1376 ---
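The fix described in this PR, setting the default "LOCALLOCK" before validating the lock type so the value can never be null, follows a common configure-with-defaults pattern. The sketch below is hypothetical: the property key, the helper name, and the set of valid lock types are assumptions for illustration, not CarbonData's real configuration code.

```java
import java.util.Properties;

// Illustrative configure-with-defaults pattern: apply the default ("LOCALLOCK")
// before validating, so the validator never sees a null value and never logs a
// misleading warning about a missing value.
public class LockTypeConfig {
    static final String LOCK_TYPE = "carbon.lock.type";   // assumed key name
    static final String DEFAULT_LOCK_TYPE = "LOCALLOCK";

    public static String validateAndGetLockType(Properties props) {
        // Default first: after this line the value can no longer be null.
        String lockType = props.getProperty(LOCK_TYPE, DEFAULT_LOCK_TYPE);
        if (!lockType.equals("LOCALLOCK") && !lockType.equals("HDFSLOCK")
                && !lockType.equals("ZOOKEEPERLOCK")) {
            // Warn about an *invalid* value, not a null one.
            System.err.println("Invalid " + LOCK_TYPE + " '" + lockType
                + "', falling back to " + DEFAULT_LOCK_TYPE);
            return DEFAULT_LOCK_TYPE;
        }
        return lockType;
    }
}
```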
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129598686 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -440,9 +510,16 @@ protected Expression getFilterPredicates(Configuration configuration) { for (Map.Entry<SegmentTaskIndexStore.TaskBucketHolder, AbstractIndex> entry : segmentIndexMap.entrySet()) { SegmentTaskIndexStore.TaskBucketHolder taskHolder = entry.getKey(); - int taskId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + int partitionId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + //oldPartitionIdList is only used in alter table partition command because it change + //partition info first and then read data. + //for other normal query should use newest partitionIdList --- End diff -- Use /** */ for multi-line comments. ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129598531 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java --- @@ -440,9 +510,16 @@ protected Expression getFilterPredicates(Configuration configuration) { for (Map.Entry<SegmentTaskIndexStore.TaskBucketHolder, AbstractIndex> entry : segmentIndexMap.entrySet()) { SegmentTaskIndexStore.TaskBucketHolder taskHolder = entry.getKey(); - int taskId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + int partitionId = CarbonTablePath.DataFileUtil.getTaskIdFromTaskNo(taskHolder.taskNo); + //oldPartitionIdList is only used in alter table partition command because it change + //partition info first and then read data. + //for other normal query should use newest partitionIdList --- End diff -- use /** */ instead ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129584623 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -101,17 +126,40 @@ object CarbonPartitionExample { spark.sql(""" | CREATE TABLE IF NOT EXISTS t5 | ( + | id Int, | vin String, | logdate Timestamp, | phonenumber Long, - | area String + | area String, + | salary Int |) | PARTITIONED BY (country String) | STORED BY 'carbondata' | TBLPROPERTIES('PARTITION_TYPE'='LIST', - | 'LIST_INFO'='(China,United States),UK ,japan,(Canada,Russia), South Korea ') + | 'LIST_INFO'='(China, US),UK ,Japan,(Canada,Russia, Good, NotGood), Korea ') --- End diff -- Add a space before the comma. ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129583765 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator<Object[]> { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(PartitionSpliterRawResultIterator.class.getName()); + + public PartitionSpliterRawResultIterator(CarbonIterator iterator) { +this.iterator = iterator; + } + + + @Override public boolean hasNext() { +if (null == batch || checkBatchEnd(batch)) { + if (iterator.hasNext()) { +batch = iterator.next(); +counter = 0; + } else { +return false; + } +} + +if (!checkBatchEnd(batch)) { + return true; +} else { + return false; +} + } + + @Override public Object[] next() { +if (batch == null) { + batch = iterator.next(); +} +if (!checkBatchEnd(batch)) { + try { +return batch.getRawRow(counter++); + } catch (Exception e) { +LOGGER.error(e.getMessage()); +return null; + } +} else { + batch = iterator.next(); + counter = 0; +} +try { + return batch.getRawRow(counter++); +} catch (Exception e) { + LOGGER.error(e.getMessage()); + return null; --- End diff -- This logical can be optimized. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129583246 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator<Object[]> { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(PartitionSpliterRawResultIterator.class.getName()); + + public PartitionSpliterRawResultIterator(CarbonIterator iterator) { +this.iterator = iterator; + } + + + @Override public boolean hasNext() { +if (null == batch || checkBatchEnd(batch)) { + if (iterator.hasNext()) { +batch = iterator.next(); +counter = 0; + } else { +return false; + } +} + +if (!checkBatchEnd(batch)) { + return true; +} else { + return false; +} --- End diff -- use return !checkBatchEnd(batch) instead. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
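The two review suggestions on this class (return the boolean expression directly from hasNext(), and avoid the duplicated try/catch in next()) might look like the following simplified sketch. The types here are stand-ins: a plain java.util.Iterator of row batches replaces CarbonIterator<BatchResult>, and batchEnd() plays the role of checkBatchEnd().

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Simplified stand-in showing the reviewed control flow: hasNext() advances to
// the next non-empty batch and then returns the boolean expression directly,
// instead of an if/else that returns true or false.
public class BatchRowIterator implements Iterator<Object[]> {
    private final Iterator<List<Object[]>> batches; // stand-in for CarbonIterator<BatchResult>
    private List<Object[]> batch;
    private int counter;

    public BatchRowIterator(Iterator<List<Object[]>> batches) {
        this.batches = batches;
    }

    private boolean batchEnd() {
        return batch == null || counter >= batch.size();
    }

    @Override public boolean hasNext() {
        while (batchEnd() && batches.hasNext()) {
            batch = batches.next();
            counter = 0;
        }
        return !batchEnd();   // the suggested one-line form
    }

    @Override public Object[] next() {
        // hasNext() already positions us on a non-empty batch, so no duplicated
        // try/catch blocks are needed here.
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        return batch.get(counter++);
    }
}
```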
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129582365 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + --- End diff -- Delete the blank line. ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129582258 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator<Object[]> { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ --- End diff -- I think this is not necessary. ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129582111 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/result/iterator/PartitionSpliterRawResultIterator.java --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.carbondata.core.scan.result.iterator; + +import org.apache.carbondata.common.CarbonIterator; +import org.apache.carbondata.common.logging.LogService; +import org.apache.carbondata.common.logging.LogServiceFactory; +import org.apache.carbondata.core.scan.result.BatchResult; + + +public class PartitionSpliterRawResultIterator extends CarbonIterator<Object[]> { + + private CarbonIterator iterator; + private BatchResult batch; + private int counter; + + /** + * LOGGER + */ + private static final LogService LOGGER = + LogServiceFactory.getLogService(PartitionSpliterRawResultIterator.class.getName()); + + public PartitionSpliterRawResultIterator(CarbonIterator iterator) { +this.iterator = iterator; + } + + --- End diff -- delete useless space line --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129534273 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/PartitionInfo.java --- @@ -65,6 +65,31 @@ public PartitionInfo(List<ColumnSchema> columnSchemaList, PartitionType partitionType) { this.partitionIds = new ArrayList<>(); } + /** + * add partition means split default partition, add in last directly --- End diff -- The default partition is 0, so why split the partition? ---
[GitHub] carbondata pull request #1192: [CARBONDATA-940] alter table add/split partit...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1192#discussion_r129533186 --- Diff: core/src/main/java/org/apache/carbondata/core/constants/CarbonCommonConstants.java --- @@ -308,6 +308,10 @@ @CarbonProperty public static final String NUM_CORES_COMPACTING = "carbon.number.of.cores.while.compacting"; /** + * Number of cores to be used while alter partition + */ + public static final String NUM_CORES_ALT_PARTITION = "carbon.number.of.cores.while.altPartition"; + /** --- End diff -- Add a blank line. ---
[GitHub] carbondata issue #1173: [CARBONDATA-1209] add partitionId in show partition ...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1173 LGTM ---
[GitHub] carbondata pull request #1173: [CARBONDATA-1209] add partitionId in show par...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1173#discussion_r128303568 --- Diff: integration/spark-common/src/main/scala/org/apache/carbondata/spark/util/CommonUtil.scala --- @@ -585,38 +585,44 @@ object CommonUtil { var result = Seq.newBuilder[Row] partitionType match { case PartitionType.RANGE => -result.+=(RowFactory.create(columnName + "=default")) -var rangeInfo = partitionInfo.getRangeInfo -var size = rangeInfo.size() - 1 +result.+=(RowFactory.create("0" + ", " + columnName + " = DEFAULT")) +val rangeInfo = partitionInfo.getRangeInfo +val size = rangeInfo.size() - 1 for (index <- 0 to size) { if (index == 0) { -result.+=(RowFactory.create(columnName + "<" + rangeInfo.get(index))) +val id = partitionInfo.getPartitionId(index + 1).toString +val desc = columnName + " < " + rangeInfo.get(index) +result.+=(RowFactory.create(id + ", " + desc)) --- End diff -- Make sure this logic is correct. ---
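The range-partition rows built in the diff above (a partition id plus a human-readable bound description) can be checked against the expected SHOW PARTITIONS output with a small sketch. The class and method names here are illustrative, not CarbonData's `CommonUtil` API; the id = index + 1 assumption mirrors the `getPartitionId(index + 1)` call in the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative reconstruction of the reviewed logic: for N range bounds, row 0
// is the DEFAULT partition, row 1 covers values below the first bound, and each
// later row i covers [bound(i-1), bound(i)).
public class RangePartitionRows {
    public static List<String> describe(String col, List<String> rangeInfo) {
        List<String> rows = new ArrayList<>();
        rows.add("0, " + col + " = DEFAULT");
        for (int i = 0; i < rangeInfo.size(); i++) {
            String desc = (i == 0)
                ? col + " < " + rangeInfo.get(0)
                : rangeInfo.get(i - 1) + " <= " + col + " < " + rangeInfo.get(i);
            // partition ids are assumed to be index + 1, matching the diff
            rows.add((i + 1) + ", " + desc);
        }
        return rows;
    }
}
```

For bounds 01-01-2010 and 01-01-2015 on column doj this produces the three rows expected by the updated TestShowPartitions test.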
[GitHub] carbondata pull request #1173: [CARBONDATA-1209] add partitionId in show par...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1173#discussion_r128302516 --- Diff: integration/spark-common-test/src/test/scala/org/apache/carbondata/spark/testsuite/partition/TestShowPartitions.scala --- @@ -146,31 +146,31 @@ class TestShowPartition extends QueryTest with BeforeAndAfterAll { test("show partition table: hash table") { // EqualTo -checkAnswer(sql("show partitions hashTable"), Seq(Row("empno=HASH_NUMBER(3)"))) +checkAnswer(sql("show partitions hashTable"), Seq(Row("empno = HASH_NUMBER(3)"))) } test("show partition table: range partition") { // EqualTo -checkAnswer(sql("show partitions rangeTable"), Seq(Row("doj=default"), - Row("doj<01-01-2010"), Row("01-01-2010<=doj<01-01-2015"))) +checkAnswer(sql("show partitions rangeTable"), Seq(Row("0, doj = DEFAULT"), + Row("1, doj < 01-01-2010"), Row("2, 01-01-2010 <= doj < 01-01-2015"))) } test("show partition table: list partition") { // EqualTo -checkAnswer(sql("show partitions listTable"), Seq(Row("workgroupcategory=default"), - Row("workgroupcategory=0"), Row("workgroupcategory=1"), Row("workgroupcategory=2, 3"))) +checkAnswer(sql("show partitions listTable"), Seq(Row("0, workgroupcategory = DEFAULT"), --- End diff -- Maybe keeping "default" would be better. ---
[GitHub] carbondata pull request #1173: [CARBONDATA-1209] add partitionId in show par...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1173#discussion_r128302086 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -144,15 +144,15 @@ object CarbonPartitionExample { // show partitions try { - spark.sql("""SHOW PARTITIONS t0""").show() + spark.sql("""SHOW PARTITIONS t0""").show(100, false) } catch { - case ex: AnalysisException => print(ex.getMessage()) + case ex: AnalysisException => print(ex.getMessage() + "\n") --- End diff -- Same problem. ---
[GitHub] carbondata pull request #1173: [CARBONDATA-1209] add partitionId in show par...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1173#discussion_r128301910 --- Diff: examples/spark2/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -123,7 +123,7 @@ object CarbonPartitionExample { try { spark.sql(s"DROP TABLE IF EXISTS partitionDB.t9") } catch { - case ex: NoSuchDatabaseException => print(ex.getMessage()) + case ex: NoSuchDatabaseException => print(ex.getMessage() + "\n") --- End diff -- Same problem. ---
[GitHub] carbondata pull request #1173: [CARBONDATA-1209] add partitionId in show par...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1173#discussion_r128301570 --- Diff: examples/spark/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -121,18 +121,18 @@ object CarbonPartitionExample { cc.sql("alter table hiveDB.t7 add partition (city = 'Shanghai')") // show partitions try { - cc.sql("SHOW PARTITIONS t0").show() + cc.sql("SHOW PARTITIONS t0").show(100, false) --- End diff -- Why use false? ---
[GitHub] carbondata pull request #1173: [CARBONDATA-1209] add partitionId in show par...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1173#discussion_r128301440 --- Diff: examples/spark/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -121,18 +121,18 @@ object CarbonPartitionExample { cc.sql("alter table hiveDB.t7 add partition (city = 'Shanghai')") // show partitions try { - cc.sql("SHOW PARTITIONS t0").show() + cc.sql("SHOW PARTITIONS t0").show(100, false) } catch { - case ex: AnalysisException => print(ex.getMessage()) + case ex: AnalysisException => print(ex.getMessage() + "\n") --- End diff -- Use println instead. ---
[GitHub] carbondata pull request #1174: [WIP] Update installation-guide.md
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1174 [WIP] Update installation-guide.md Update installation guide. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata patch-8 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1174.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1174 commit b97eef300ffb3ae0a348acff2d85bff98ad05fdd Author: chenerlu <chene...@huawei.com> Date: 2017-07-14T16:34:34Z Update installation-guide.md ---
[GitHub] carbondata pull request #1164: [CARBONDATA-1303] Update CarbonContext.scala
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1164#discussion_r126979658 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonContext.scala --- @@ -37,6 +37,7 @@ import org.apache.carbondata.core.stats.{QueryStatistic, QueryStatisticsConstant import org.apache.carbondata.core.util.{CarbonProperties, CarbonTimeStatisticsFactory} class CarbonContext( +@transient --- End diff -- Why did you add this annotation? I think it is unnecessary. ---
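For context on the review question: in Scala, @transient marks a field to be skipped during Java serialization, which is why it often appears on SparkContext-like members of classes shipped to executors. A hedged Python analogy (hypothetical class names, not the actual CarbonContext code) of excluding one field from serialization:

```python
import pickle

class Context:
    """Illustrative analogue of a 'transient' member: the live
    connection is dropped when the object is serialized."""
    def __init__(self, conn):
        self.conn = conn   # stand-in for a non-serializable resource
        self.name = "ctx"

    def __getstate__(self):
        state = self.__dict__.copy()
        state.pop("conn")  # excluded, like a @transient field
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self.conn = None   # comes back unset after deserialization

# Round-trip through pickle: 'name' survives, 'conn' does not.
restored = pickle.loads(pickle.dumps(Context(conn=object())))
```

So whether the annotation is "unnecessary" depends on whether instances of the class are ever serialized while holding that field.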
[GitHub] carbondata pull request #1165: Update supported-data-types-in-carbondata.md
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1165 Update supported-data-types-in-carbondata.md Just Update supported-data-types-in-carbondata You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata patch-5 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1165.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1165 commit 3e69af0fe1d0bcee7fed9e0776225ecba2de3220 Author: chenerlu <chene...@huawei.com> Date: 2017-07-12T10:50:02Z Update supported-data-types-in-carbondata.md ---
[GitHub] carbondata pull request #1103: [WIP] Implement range interval partition
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1103#discussion_r126324104 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/PartitionInfo.java --- @@ -47,6 +47,12 @@ */ private int numPartitions; + /** + * range interval information defined for range interval partition table + */ + private List rangeIntervalInfo; + + --- End diff -- Ok~ ---
[GitHub] carbondata issue #1135: [CARBONDATA-1265] Fix AllDictionary because it is on...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1135 Have merged. ---
[GitHub] carbondata pull request #1135: [CARBONDATA-1265] Fix AllDictionary because i...
Github user chenerlu closed the pull request at: https://github.com/apache/carbondata/pull/1135 ---
[GitHub] carbondata pull request #1135: [CARBONDATA-1265] Fix AllDictionary because i...
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1135 [CARBONDATA-1265] Fix AllDictionary because it is only supported when single_pass is true Fix AllDictionary because it is only supported when single_pass is true You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata branch-1.1-release Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1135.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1135 ---
[jira] [Created] (CARBONDATA-1265) Fix AllDictionaryExample because it is only supported when single_pass is true
chenerlu created CARBONDATA-1265: Summary: Fix AllDictionaryExample because it is only supported when single_pass is true Key: CARBONDATA-1265 URL: https://issues.apache.org/jira/browse/CARBONDATA-1265 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1125: [CarbonData-1250] Change default partition id...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1125#discussion_r125428800 --- Diff: format/src/main/thrift/schema.thrift --- @@ -135,6 +135,9 @@ struct PartitionInfo{ 3: optional i32 num_partitions; // number of partitions defined in hash partition table --- End diff -- Same as Hash_num_partitions ---
[GitHub] carbondata pull request #1125: [CarbonData-1250] Change default partition id...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1125#discussion_r125427623 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/schema/table/CarbonTable.java --- @@ -99,10 +98,6 @@ private Map<String, PartitionInfo> tablePartitionMap; /** - * statistic information of partition table - */ - private PartitionStatistic partitionStatistic; - /** --- End diff -- Should keep this line. ---
[GitHub] carbondata pull request #1125: [CarbonData-1250] Change default partition id...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1125#discussion_r125427240 --- Diff: core/src/main/java/org/apache/carbondata/core/metadata/converter/ThriftWrapperSchemaConverterImpl.java --- @@ -219,6 +219,10 @@ externalPartitionInfo.setList_info(wrapperPartitionInfo.getListInfo()); externalPartitionInfo.setRange_info(wrapperPartitionInfo.getRangeInfo()); externalPartitionInfo.setNum_partitions(wrapperPartitionInfo.getNumPartitions()); + externalPartitionInfo.setNumOfPartitions(wrapperPartitionInfo.getNumberOfPartitions()); --- End diff -- I think it may be better to use Hash_numPartition; otherwise users may be confused about these two partition counts. ---
[GitHub] carbondata pull request #1125: [CarbonData-1250] change default partition id...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1125#discussion_r125319466 --- Diff: core/src/main/java/org/apache/carbondata/core/scan/partition/RangePartitioner.java --- @@ -17,16 +17,17 @@ package org.apache.carbondata.core.scan.partition; -import java.io.Serializable; -import java.math.BigDecimal; import java.text.SimpleDateFormat; import java.util.List; import org.apache.carbondata.core.constants.CarbonCommonConstants; import org.apache.carbondata.core.metadata.datatype.DataType; import org.apache.carbondata.core.metadata.schema.PartitionInfo; import org.apache.carbondata.core.util.ByteUtil; + --- End diff -- DELETE ---
[jira] [Commented] (CARBONDATA-995) Incorrect result displays while using variance aggregate function in presto integration
[ https://issues.apache.org/jira/browse/CARBONDATA-995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16068075#comment-16068075 ] chenerlu commented on CARBONDATA-995: - Hi, what is the behavior of the same operation in Hive? > Incorrect result displays while using variance aggregate function in presto > integration > --- > > Key: CARBONDATA-995 > URL: https://issues.apache.org/jira/browse/CARBONDATA-995 > Project: CarbonData > Issue Type: Bug > Components: data-query, presto-integration >Affects Versions: 1.1.0 > Environment: spark 2.1 , presto 0.166 >Reporter: Vandana Yadav >Priority: Minor > Attachments: 2000_UniqData.csv > > > Incorrect result displays while using variance aggregate function in presto > integration > Steps to reproduce : > 1. In CarbonData: > a) Create table: > CREATE TABLE uniqdata (CUST_ID int,CUST_NAME String,ACTIVE_EMUI_VERSION > string, DOB timestamp, DOJ timestamp, BIGINT_COLUMN1 bigint,BIGINT_COLUMN2 > bigint,DECIMAL_COLUMN1 decimal(30,10), DECIMAL_COLUMN2 > decimal(36,10),Double_COLUMN1 double, Double_COLUMN2 double,INTEGER_COLUMN1 > int) STORED BY 'org.apache.carbondata.format' TBLPROPERTIES > ("TABLE_BLOCKSIZE"= "256 MB"); > b) Load data : > LOAD DATA INPATH 'hdfs://localhost:54310/2000_UniqData.csv' into table > uniqdata OPTIONS('DELIMITER'=',' , > 'QUOTECHAR'='"','BAD_RECORDS_ACTION'='FORCE','FILEHEADER'='CUST_ID,CUST_NAME,ACTIVE_EMUI_VERSION,DOB,DOJ,BIGINT_COLUMN1,BIGINT_COLUMN2,DECIMAL_COLUMN1,DECIMAL_COLUMN2,Double_COLUMN1,Double_COLUMN2,INTEGER_COLUMN1'); > 2. 
In presto > a) Execute the query: > select variance(DECIMAL_COLUMN1) as a from (select DECIMAL_COLUMN1 from > UNIQDATA order by DECIMAL_COLUMN1) t > Actual result : > In CarbonData : > "++--+ > | a | > ++--+ > | 333832.4983039884 | > ++--+ > 1 row selected (0.695 seconds) > " > in presto: > " a > --- > 333832.3010442859 > (1 row) > Query 20170420_082837_00062_hd7jy, FINISHED, 1 node > Splits: 35 total, 35 done (100.00%) > 0:00 [2.01K rows, 1.97KB] [8.09K rows/s, 7.91KB/s]" > Expected result: it should display the same result as showing in CarbonData. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
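The small numeric gap reported above (333832.498... in CarbonData vs 333832.301... in Presto) is consistent with engines differing in estimator definition or in floating-point accumulation order; whether either is the actual cause here is not established in the thread. As a hedged, self-contained Python illustration of the estimator difference alone:

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Two common definitions of "variance" over the same data:
pop_var = statistics.pvariance(data)   # population variance, divides by n
samp_var = statistics.variance(data)   # sample variance, divides by n - 1

# pop_var == 4.0 while samp_var == 32/7: same rows, different results,
# so two engines can legitimately disagree unless both pin down the estimator.
```

Comparing against Hive (as asked above) would show which definition each engine uses.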
[GitHub] carbondata pull request #1112: [CARBONDATA-1244] Rewrite README.md of presto...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1112#discussion_r124708591 --- Diff: integration/presto/src/main/java/org/apache/carbondata/presto/impl/CarbonTableReader.java --- @@ -72,25 +72,54 @@ * 2:FileFactory, (physic table file) * 3:CarbonCommonFactory, (offer some ) * 4:DictionaryFactory, (parse dictionary util) + * + * Currently, it is mainly used to parse metadata of tables under + * the configured carbondata-store path and filter the relevant + * input splits with given query predicates. */ public class CarbonTableReader { private CarbonTableConfig config; + + /** + * The names of the tables under the schema (this.carbonFileList). + */ private List tableList; + + /** + * carbonFileList represents the store path of the schema, which is configured as carbondata-store + * in the CarbonData catalog file ($PRESTO_HOME$/etc/catalog/carbondata.properties). + * Under the schema store path, there should be a directory named as the schema name. + * And under each schema directory, there are directories named as the table names. + * For example, the schema is named 'default' and there is two table named 'foo' and 'bar' in it, then the --- End diff -- Comments like this one are, I think, unnecessary. We can discuss. ---
[GitHub] carbondata pull request #1112: [CARBONDATA-1244] Rewrite README.md of presto...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1112#discussion_r124708068 --- Diff: integration/presto/README.md --- @@ -59,28 +55,50 @@ Please follow the below steps to query carbondata in presto ``` * config carbondata-connector for presto - First:compile carbondata-presto integration module + Firstly: Compile carbondata, including carbondata-presto integration module ``` $ git clone https://github.com/apache/carbondata - $ cd carbondata/integration/presto - $ mvn clean package + $ cd carbondata + $ mvn -DskipTests -P{spark-version} -Dspark.version={spark-version-number} -Dhadoop.version={hadoop-version-number} clean package + ``` + Replace the spark and hadoop version with you the version you used in your cluster. + For example, if you use Spark2.1.0 and Hadoop 2.7.3, you would like to compile using: + ``` + mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Dhadoop.version=2.7.3 clean package + ``` + + Secondly: Create a folder named 'carbondata' under $PRESTO_HOME$/plugin and + copy all jar from carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT +to $PRESTO_HOME$/plugin/carbondata + + Thirdly: Create a carbondata.properties file under $PRESTO_HOME$/etc/catalog/ containing the following contents: ``` - Second:create one folder "carbondata" under ./presto-server-0.166/plugin - Third:copy all jar from ./carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT -to ./presto-server-0.166/plugin/carbondata + connector.name=carbondata + carbondata-store={schema-store-path} + ``` + Replace the schema-store-path with the absolute path the directory which is the parent of the schema. 
+ For example, if you have a schema named 'default' stored under hdfs://namenode:9000/test/carbondata/, + Then set carbondata-store=hdfs://namenode:9000/test/carbondata + + If you changed the jar balls or configuration files, make sure you have dispatch the new jar balls + and configuration file to all the presto nodes and restart the nodes in the cluster. A modification of the + carbondata connector will not take an effect automatically. ### Generate CarbonData file -Please refer to quick start : https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md +Please refer to quick start: https://github.com/apache/carbondata/blob/master/docs/quick-start-guide.md +Load data statement in Spark can be used to create carbondata tables. And you can easily find the creaed --- End diff -- creaed -> created ---
[GitHub] carbondata pull request #1112: [CARBONDATA-1244] Rewrite README.md of presto...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1112#discussion_r124707906 --- Diff: integration/presto/README.md --- @@ -59,28 +55,50 @@ Please follow the below steps to query carbondata in presto ``` * config carbondata-connector for presto - First:compile carbondata-presto integration module + Firstly: Compile carbondata, including carbondata-presto integration module ``` $ git clone https://github.com/apache/carbondata - $ cd carbondata/integration/presto - $ mvn clean package + $ cd carbondata + $ mvn -DskipTests -P{spark-version} -Dspark.version={spark-version-number} -Dhadoop.version={hadoop-version-number} clean package + ``` + Replace the spark and hadoop version with you the version you used in your cluster. + For example, if you use Spark2.1.0 and Hadoop 2.7.3, you would like to compile using: + ``` + mvn -DskipTests -Pspark-2.1 -Dspark.version=2.1.0 -Dhadoop.version=2.7.3 clean package + ``` + + Secondly: Create a folder named 'carbondata' under $PRESTO_HOME$/plugin and + copy all jar from carbondata/integration/presto/target/carbondata-presto-x.x.x-SNAPSHOT --- End diff -- jar -> jars --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] carbondata pull request #1112: [CARBONDATA-1244] Rewrite README.md of presto...
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1112#discussion_r124707828 --- Diff: integration/presto/README.md --- @@ -59,28 +55,50 @@ Please follow the below steps to query carbondata in presto ``` * config carbondata-connector for presto - First:compile carbondata-presto integration module + Firstly: Compile carbondata, including carbondata-presto integration module ``` $ git clone https://github.com/apache/carbondata - $ cd carbondata/integration/presto - $ mvn clean package + $ cd carbondata + $ mvn -DskipTests -P{spark-version} -Dspark.version={spark-version-number} -Dhadoop.version={hadoop-version-number} clean package + ``` + Replace the spark and hadoop version with you the version you used in your cluster. --- End diff -- Maybe it will be better to delete these two "you". ---
[GitHub] carbondata pull request #1105: [WIP] Implement range interval partition
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1105 [WIP] Implement range interval partition This PR tries to implement range interval partition and is a work in progress. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata RangeInterval3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1105.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1105 commit 016377b614a29b65e67a5a965ac2ebaedb86dfe6 Author: chenerlu <chene...@huawei.com> Date: 2017-06-22T09:13:05Z Step 1 of implement range interval partition type commit a21c006bbe20df50d642bc80df351bb83dcab0f2 Author: chenerlu <chene...@huawei.com> Date: 2017-06-27T13:03:44Z Implement range interval partition commit c4105290a4ac791f48525d2dd293909856600920 Author: chenerlu <chene...@huawei.com> Date: 2017-06-28T04:10:31Z just add some test cases commit 90bbd1a58e79bf3548e1e5cff29714ea6c928df6 Author: chenerlu <chene...@huawei.com> Date: 2017-06-28T04:17:05Z Merge branch 'master' into RangeInterval2 ---
[GitHub] carbondata pull request #1095: [CARBONDATA-1227] Remove useless TableCreator
GitHub user chenerlu opened a pull request: https://github.com/apache/carbondata/pull/1095 [CARBONDATA-1227] Remove useless TableCreator Just remove useless TableCreator because nobody will call it. You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenerlu/incubator-carbondata RemoveTableCreator Alternatively you can review and apply these changes as the patch at: https://github.com/apache/carbondata/pull/1095.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1095 commit beee6decea29ff5052956eda0580b8a4b77480fc Author: chenerlu <chene...@huawei.com> Date: 2017-06-26T03:33:50Z just remove table TableCreator because nobody will call it ---
[jira] [Created] (CARBONDATA-1227) Remove useless TableCreator
chenerlu created CARBONDATA-1227: Summary: Remove useless TableCreator Key: CARBONDATA-1227 URL: https://issues.apache.org/jira/browse/CARBONDATA-1227 Project: CarbonData Issue Type: Bug Reporter: chenerlu Assignee: chenerlu Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[GitHub] carbondata pull request #1094: [CARBONDATA-1181] Show partitions
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1094#discussion_r123893089 --- Diff: examples/spark/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.examples + +import scala.collection.mutable.LinkedHashMap + +import org.apache.spark.sql.AnalysisException + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + +object CarbonPartitionExample { + def main(args: Array[String]) { +CarbonPartitionExample.extracted("t3", args) + } + def extracted(tableName: String, args: Array[String]): Unit = { +val cc = ExampleUtils.createCarbonContext("CarbonPartitionExample") +val testData = ExampleUtils.currentPath + "/src/main/resources/data.csv" +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") + +// none partition table +cc.sql("DROP TABLE IF EXISTS t0") +cc.sql(""" +| CREATE TABLE IF NOT EXISTS t0 +| ( +| vin String, +| logdate Timestamp, +| phonenumber Int, +| country String, +| area String +| ) +| STORED BY 'carbondata' + """.stripMargin) +try { + cc.sql("""SHOW PARTITIONS t0""").show() +} catch { + case ex: AnalysisException => print(ex.getMessage()) +} + +// range partition +cc.sql("DROP TABLE IF EXISTS t1") +cc.sql(""" +| CREATE TABLE IF NOT EXISTS t1( +| vin STRING, +| phonenumber INT, +| country STRING, +| area STRING +| ) +| PARTITIONED BY (logdate TIMESTAMP) +| STORED BY 'carbondata' +| TBLPROPERTIES('PARTITION_TYPE'='RANGE', +| 'RANGE_INFO'='2014/01/01,2015/01/01,2016/01/01') + """.stripMargin) +cc.sql("""SHOW PARTITIONS t1""").show() + +// hash partition +cc.sql(""" +| CREATE TABLE IF NOT EXISTS t3( +| logdate Timestamp, +| phonenumber Int, +| country String, +| area String +| ) +| PARTITIONED BY (vin String) +| STORED BY 'carbondata' +| TBLPROPERTIES('PARTITION_TYPE'='HASH','NUM_PARTITIONS'='5') +""".stripMargin) +cc.sql("""SHOW PARTITIONS t3""").show() + +// list partition +cc.sql("DROP TABLE IF EXISTS t5") +cc.sql(""" + | CREATE TABLE IF NOT EXISTS t5( + | vin String, + | logdate Timestamp, + | 
phonenumber Int, + | area String + | ) + | PARTITIONED BY (country string) + | STORED BY 'carbondata' + | TBLPROPERTIES('PARTITION_TYPE'='LIST', + | 'LIST_INFO'='(China,United States),UK ,japan,(Canada,Russia), South Korea ') + """.stripMargin) +cc.sql("""SHOW PARTITIONS t5""").show() + +cc.sql(s"DROP TABLE IF EXISTS partitionDB.$tableName") --- End diff -- I think using $tableName here is not appropriate. ---
[GitHub] carbondata pull request #1094: [CARBONDATA-1181] Show partitions
Github user chenerlu commented on a diff in the pull request: https://github.com/apache/carbondata/pull/1094#discussion_r123893046 --- Diff: examples/spark/src/main/scala/org/apache/carbondata/examples/CarbonPartitionExample.scala --- @@ -0,0 +1,128 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.carbondata.examples + +import scala.collection.mutable.LinkedHashMap + +import org.apache.spark.sql.AnalysisException + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.examples.util.ExampleUtils + +object CarbonPartitionExample { + def main(args: Array[String]) { +CarbonPartitionExample.extracted("t3", args) + } + def extracted(tableName: String, args: Array[String]): Unit = { +val cc = ExampleUtils.createCarbonContext("CarbonPartitionExample") +val testData = ExampleUtils.currentPath + "/src/main/resources/data.csv" +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") + +// none partition table +cc.sql("DROP TABLE IF EXISTS t0") +cc.sql(""" +| CREATE TABLE IF NOT EXISTS t0 +| ( +| vin String, +| logdate Timestamp, +| phonenumber Int, +| country String, +| area String +| ) +| STORED BY 'carbondata' + """.stripMargin) +try { + cc.sql("""SHOW PARTITIONS t0""").show() +} catch { + case ex: AnalysisException => print(ex.getMessage()) +} --- End diff -- Why use try-catch? ---
[GitHub] carbondata issue #1092: [CARBONDATA-1225] Create Table Failed for partition ...
Github user chenerlu commented on the issue: https://github.com/apache/carbondata/pull/1092 LGTM ---