[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480562 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
    + */
    +public class IndexedSegment extends Segment {
    +
    +  private IndexLoader loader;
    +
    +  public IndexedSegment(String name, String path, IndexLoader loader) {
    +    super(name, path);
    +    this.loader = loader;
    +  }
    +
    +  @Override
    +  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
    +      throws IOException {
    +    // do as following
    +    // 1. create the index or get from cache by the filter name in the configuration
    +    // 2. filter by index to get the filtered block
    +    // 3. create input split from filtered block
    +
    +    List output = new LinkedList<>();
    +    Index index = loader.load(job.getConfiguration());
    +    List blocks = index.filter(job, filterResolver);
    --- End diff --

    You are right, I will modify

---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
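The three steps listed in the getSplits comment above can be sketched roughly as follows. This is only an illustration, not the code from the pull request: BlockSketch, IndexSketch and IndexedSegmentSketch are hypothetical stand-ins for Block, Index and IndexedSegment, the min/max range check is an assumed pruning rule, and a split is represented simply by its file path.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical stand-in for an indexed block: a file plus the min/max of one column. */
final class BlockSketch {
  final String path;
  final long min;
  final long max;

  BlockSketch(String path, long min, long max) {
    this.path = path;
    this.min = min;
    this.max = max;
  }
}

/** Hypothetical stand-in for Index: keeps only the blocks that may satisfy the filter. */
final class IndexSketch {
  private final List<BlockSketch> blocks;

  IndexSketch(List<BlockSketch> blocks) {
    this.blocks = blocks;
  }

  List<BlockSketch> filter(Predicate<BlockSketch> mayMatch) {
    List<BlockSketch> kept = new ArrayList<>();
    for (BlockSketch block : blocks) {
      if (mayMatch.test(block)) {
        kept.add(block);
      }
    }
    return kept;
  }
}

/** Hypothetical stand-in for a segment that prunes files through its index. */
final class IndexedSegmentSketch {
  // Step 1: the index is assumed to be already loaded (or fetched from a cache).
  private final IndexSketch index;

  IndexedSegmentSketch(IndexSketch index) {
    this.index = index;
  }

  /** Returns one split (here just the file path) per block that survives pruning. */
  List<String> getSplits(long wantedValue) {
    // Step 2: drop blocks whose [min, max] range cannot contain the wanted value.
    List<BlockSketch> blocks =
        index.filter(b -> b.min <= wantedValue && wantedValue <= b.max);
    // Step 3: create one split per remaining block.
    List<String> splits = new ArrayList<>();
    for (BlockSketch block : blocks) {
      splits.add(block.path);
    }
    return splits;
  }
}
```

With two blocks covering the ranges 0 to 9 and 10 to 19, a lookup for the value 12 yields a split for the second block only, so the first file is never opened.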
List the supported datatypes in carbondata
Hi,

I would like to know which datatypes CarbonData supports. I get the error *"MalformedCarbonCommandException: Unsupported data type"* when running queries that use any of these datatypes: *"tiny Int, smallInt, float, date, varchar, char, boolean, binary, Map, Union"*. The documentation also has no information about the supported datatypes.

--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/List-the-supported-datatypes-in-carbondata-tp2419.html
Sent from the Apache CarbonData Mailing List archive at Nabble.com.
[GitHub] incubator-carbondata pull request #267: implement test cases for core.load m...
GitHub user anuragknoldus opened a pull request:

    https://github.com/apache/incubator-carbondata/pull/267

    implement test cases for core.load module

    Be sure to do all of the following to help us incorporate your contribution
    quickly and easily:

     - [ ] Make sure the PR title is formatted like:
       `[CARBONDATA-] Description of pull request`
     - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
       Travis-CI on your fork and ensure the whole test matrix passes).
     - [ ] Replace `` in the title with the actual Jira issue number, if there is one.
     - [ ] If this contribution is large, please file an Apache
       [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
     - [ ] Testing done

            Please provide details on
            - Whether new unit test cases have been added or why no new tests are required?
            - What manual testing you have done?
            - Any additional information to help reviewers in testing this change.

     - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA.

    ---

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/anuragknoldus/incubator-carbondata CARBONDATA-340

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-carbondata/pull/267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #267

---
commit c3d66fc38d3b04983697ca28fba9d8ccbeeebf6a
Author: Anurag
Date:   2016-10-28T05:04:51Z

    implement test cases for core.load module

---
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470691 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
    + */
    +
    +package org.apache.carbondata.hadoop.api;
    +
    +import java.io.IOException;
    +import java.util.LinkedList;
    +import java.util.List;
    +
    +import org.apache.carbondata.hadoop.CarbonProjection;
    +import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
    +import org.apache.carbondata.hadoop.internal.segment.Segment;
    +import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
    +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
    +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
    +import org.apache.carbondata.scan.expression.Expression;
    +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
    +import org.apache.hadoop.conf.Configuration;
    +import org.apache.hadoop.mapreduce.InputSplit;
    +import org.apache.hadoop.mapreduce.JobContext;
    +import org.apache.hadoop.mapreduce.RecordReader;
    +import org.apache.hadoop.mapreduce.TaskAttemptContext;
    +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    +
    +/**
    + * Input format of CarbonData file.
    + * @param
    + */
    +public class CarbonTableInputFormat extends FileInputFormat {
    +
    +  private static final String FILTER_PREDICATE =
    +      "mapreduce.input.carboninputformat.filter.predicate";
    +
    +  private SegmentManager segmentManager;
    +
    +  public CarbonTableInputFormat(SegmentManager segmentManager) {
    +    this.segmentManager = segmentManager;
    +  }
    +
    +  @Override
    +  public RecordReader createRecordReader(InputSplit split,
    +      TaskAttemptContext context) throws IOException, InterruptedException {
    +    switch (((CarbonInputSplit)split).formatType()) {
    --- End diff --

    ok
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470293 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
    + */
    +public class IndexedSegment extends Segment {
    +
    +  private IndexLoader loader;
    +
    +  public IndexedSegment(String name, String path, IndexLoader loader) {
    +    super(name, path);
    +    this.loader = loader;
    +  }
    +
    +  @Override
    +  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
    +      throws IOException {
    +    // do as following
    +    // 1. create the index or get from cache by the filter name in the configuration
    +    // 2. filter by index to get the filtered block
    +    // 3. create input split from filtered block
    +
    +    List output = new LinkedList<>();
    +    Index index = loader.load(job.getConfiguration());
    --- End diff --

    If the loader implements caching internally, then we can keep it as `IndexLoader` only.
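One way to read the comment above is that callers keep calling a plain load method while the loader decides whether to build or reuse the index. A minimal sketch of that idea follows, under stated assumptions: CachingLoaderSketch is a hypothetical class, not the IndexLoader from the pull request, and the cache key and the build function are placeholders chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical loader that hides a cache behind a plain load() call. */
final class CachingLoaderSketch<K, V> {
  private final Map<K, V> cache = new ConcurrentHashMap<>();
  private final Function<K, V> buildIndex;

  CachingLoaderSketch(Function<K, V> buildIndex) {
    this.buildIndex = buildIndex;
  }

  /** Builds the value on the first request for a key and reuses it afterwards. */
  V load(K key) {
    return cache.computeIfAbsent(key, buildIndex);
  }
}
```

Because the cache lives inside the loader, the calling code does not need a separate cache type or a second interface; it sees the same load call whether the index was just built or was already held in memory.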
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85466134 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala --- @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.spark.testsuite.dataload + +import org.apache.spark.sql.common.util.CarbonHiveContext._ +import org.apache.spark.sql.common.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import java.sql.Timestamp + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.spark.sql.Row + +class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterAll { + override def beforeAll { +sql("DROP TABLE IF EXISTS t3") +sql(""" + CREATE TABLE IF NOT EXISTS t3 + (ID Int, date Timestamp, starttime Timestamp, country String, + name String, phonetype String, serialname String, salary Int) + STORED BY 'carbondata' +""") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") + } + + test("test load data with different timestamp format") { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'starttime:-MM-dd HH:mm:ss') + """) + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData2.csv' into table t3 + OPTIONS('dateformat' = 'date:-MM-dd,starttime:/MM/dd HH:mm:ss') + """) + checkAnswer( +sql("SELECT date FROM t3 WHERE ID = 1"), +Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0"))) + ) + checkAnswer( +sql("SELECT starttime FROM t3 WHERE ID = 1"), +Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0"))) + ) + checkAnswer( +sql("SELECT date FROM t3 WHERE ID = 18"), +Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0"))) + ) + checkAnswer( +sql("SELECT starttime FROM t3 WHERE ID = 18"), +Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0"))) + ) + } --- End diff -- ok --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85466025

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -1129,6 +1130,9 @@ case class LoadTable(
           carbonLoadModel.setEscapeChar(escapeChar)
           carbonLoadModel.setQuoteChar(quoteChar)
           carbonLoadModel.setCommentChar(commentchar)
    +      carbonLoadModel.setDateFormat(dateFormat)
    --- End diff --

    ok
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85465773 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.spark.testsuite.dataload + +import org.apache.spark.sql.common.util.CarbonHiveContext._ +import org.apache.spark.sql.common.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import java.sql.Timestamp + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.spark.exception.MalformedCarbonCommandException +import org.apache.spark.sql.Row + +class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterAll { + override def beforeAll { +sql("DROP TABLE IF EXISTS t3") +sql(""" + CREATE TABLE IF NOT EXISTS t3 + (ID Int, date Timestamp, starttime Timestamp, country String, + name String, phonetype String, serialname String, salary Int) + STORED BY 'carbondata' +""") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") + } + + test("test load data with different timestamp format") { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'starttime:-MM-dd HH:mm:ss') + """) + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData2.csv' into table t3 + OPTIONS('dateformat' = 'date:-MM-dd,starttime:/MM/dd HH:mm:ss') + """) + checkAnswer( +sql("SELECT date FROM t3 WHERE ID = 1"), +Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0"))) + ) + checkAnswer( +sql("SELECT starttime FROM t3 WHERE ID = 1"), +Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0"))) + ) + checkAnswer( +sql("SELECT date FROM t3 WHERE ID = 18"), +Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0"))) + ) + checkAnswer( +sql("SELECT starttime FROM t3 WHERE ID = 18"), +Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0"))) + ) + } + + test("test load data with different timestamp format with being set an empty string") { +try { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 
+ OPTIONS('dateformat' = '') + """) + assert(false) +} catch { + case ex: MalformedCarbonCommandException => +assertResult(ex.getMessage)("Error: Option DateFormat is set an empty string.") + case _ => assert(false) +} + } + + test("test load data with different timestamp format with a wrong column name") { +try { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'fasfdas:/MM/dd') + """) + assert(false) +} catch { + case ex: MalformedCarbonCommandException => +assertResult(ex.getMessage)("Error: Wrong Column Name fasfdas is provided in Option DateFormat.") + case _ => assert(false) +} + } + + test("test load data with different timestamp format with a timestamp column is set an empty string") { +try { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'fasfdas:') + """) + assert(false) +} catch { + case e
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85464806

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle(
         Seq.empty
       }

    +  private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]):
    +  Unit = {
    +    if (dateFormat == "") {
    +      throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.")
    +    } else {
    +      var dateFormats: Array[String] = dateFormat.split(",")
    +      for (singleDateFormat <- dateFormats) {
    +        var dateFormatSplits: Array[String] = singleDateFormat.split(":", 2)
    +        if (!dateDimensionsName.contains(dateFormatSplits(0))) {
    --- End diff --

    ok
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85463723 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/Segment.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * Within a carbon table, each data load becomes one Segment, which stores all data files belong to this load in + * the segment folder. 
    + */
    +public abstract class Segment {
    +
    +  protected String id;
    +
    +  /**
    +   * Path of the segment folder
    +   */
    +  private String path;
    +
    +  public Segment(String id, String path) {
    +    this.id = id;
    +    this.path = path;
    +  }
    +
    +  public String getId() {
    +    return id;
    +  }
    +
    +  public String getPath() {
    +    return path;
    +  }
    +
    +  /**
    +   * return all InputSplit of this segment, each file is a InputSplit
    +   * @param job job context
    +   * @return all InputSplit
    +   * @throws IOException
    +   */
    +  public List getAllSplits(JobContext job) throws IOException {
    --- End diff --

    I suggest to return List
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; --- End diff -- please use internal.CarbonInputSplit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[jira] [Created] (CARBONDATA-343) Optimize the duplicated definition code in GlobalDictionaryUtil.scala
Liang Chen created CARBONDATA-343:
-------------------------------------

             Summary: Optimize the duplicated definition code in GlobalDictionaryUtil.scala
                 Key: CARBONDATA-343
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-343
             Project: CarbonData
          Issue Type: Improvement
            Reporter: Liang Chen
            Priority: Trivial

The following two lines of code contain a duplicated definition:

val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85461092

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle(
         Seq.empty
       }

    +  private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]):
    +  Unit = {
    +    if (dateFormat == "") {
    +      throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.")
    +    } else {
    +      var dateFormats: Array[String] = dateFormat.split(",")
    --- End diff --

    Please use CarbonCommonConstant.COMMA here.
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460088

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle(
         Seq.empty
       }

    +  private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]):
    +  Unit = {
    +    if (dateFormat == "") {
    +      throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.")
    +    } else {
    +      var dateFormats: Array[String] = dateFormat.split(",")
    +      for (singleDateFormat <- dateFormats) {
    +        var dateFormatSplits: Array[String] = singleDateFormat.split(":", 2)
    +        if (!dateDimensionsName.contains(dateFormatSplits(0))) {
    --- End diff --

    Please take care that the column name comparison is case-insensitive.
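To make the case-insensitivity point concrete, here is a rough Java sketch of the same validation (the code under review is Scala). It is an illustration under assumptions rather than the patch itself: DateFormatOptionSketch is a hypothetical class, IllegalArgumentException stands in for MalformedCarbonCommandException so that the sketch is self-contained, and the message for a missing pattern is invented because the corresponding test is cut off in the thread.

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical validator for an option such as "date:yyyy-MM-dd,starttime:yyyy/MM/dd HH:mm:ss". */
final class DateFormatOptionSketch {

  /** Returns lower-cased column name to pattern, or throws if the option is malformed. */
  static Map<String, String> parse(String option, List<String> timestampColumns) {
    if (option == null || option.trim().isEmpty()) {
      throw new IllegalArgumentException("Error: Option DateFormat is set an empty string.");
    }
    // Normalise the known column names once so that lookups ignore case.
    Set<String> known = new HashSet<>();
    for (String column : timestampColumns) {
      known.add(column.toLowerCase(Locale.ROOT));
    }
    Map<String, String> patterns = new LinkedHashMap<>();
    for (String entry : option.split(",")) {
      // Limit 2 keeps the colons inside patterns such as HH:mm:ss intact.
      String[] parts = entry.split(":", 2);
      String column = parts[0].trim().toLowerCase(Locale.ROOT);
      if (!known.contains(column)) {
        throw new IllegalArgumentException(
            "Error: Wrong Column Name " + parts[0] + " is provided in Option DateFormat.");
      }
      if (parts.length < 2 || parts[1].trim().isEmpty()) {
        // Assumed wording; the real message is not visible in this thread.
        throw new IllegalArgumentException("Error: No format is given for column " + parts[0] + ".");
      }
      patterns.put(column, parts[1].trim());
    }
    return patterns;
  }
}
```

Lower-casing both sides with Locale.ROOT avoids locale-specific surprises, and an option written as STARTTIME:... then matches a column declared as starttime.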
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459286

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle(
           val allDictionaryPath = options.getOrElse("all_dictionary_path", "")
           val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$")
           val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:")
    +      val timeFormat = options.getOrElse("timeformat", null)
    --- End diff --

    "timeFormat" is not needed.
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460589

    --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
    @@ -343,7 +345,8 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K
             }

             data.setGenerator(
    -            KeyGeneratorFactory.getKeyGenerator(getUpdatedLens(meta.dimLens, meta.dimPresent)));
    +            KeyGeneratorFactory.getKeyGenerator(
    +                getUpdatedLens(meta.dimLens, meta.dimPresent)));
    --- End diff --

    Please keep the existing code style.
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request:

    https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459810

    --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala ---
    @@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle(
           val allDictionaryPath = options.getOrElse("all_dictionary_path", "")
           val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$")
           val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:")
    +      val timeFormat = options.getOrElse("timeformat", null)
    +      val dateFormat = options.getOrElse("dateformat", null)
    +      val tableDimensions: util.List[CarbonDimension] = table.getDimensionByTableName(tableName)
    +      val dateDimensionsName = new ArrayBuffer[String]
    +      tableDimensions.toArray.foreach {
    +        dimension => {
    +          val columnSchema: ColumnSchema = dimension.asInstanceOf[CarbonDimension].getColumnSchema
    +          if (columnSchema.getDataType.name == "TIMESTAMP") {
    +            dateDimensionsName += columnSchema.getColumnName
    +          }
    +        }
    +      }
    +      if (dateFormat != null) {
    +        validateDateFormat(dateFormat, dateDimensionsName)
    +      }
    --- End diff --

    Please move this code into the method validateDateFormat.
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460571 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -1176,11 +1215,8 @@ else if(isComplexTypeColumn[j]) { } ColumnSchemaDetails details = columnSchemaDetailsWrapper.get(dimensionColumnIds[m]); if (details.isDirectDictionary()) { - DirectDictionaryGenerator directDictionaryGenerator1 = - DirectDictionaryKeyGeneratorFactory - .getDirectDictionaryGenerator(details.getColumnType()); surrogateKeyForHrrchy[0] = - directDictionaryGenerator1.generateDirectSurrogateKey(tuple); + directDictionaryGenerators[m].generateDirectSurrogateKey(tuple); --- End diff -- Please take care of the code style.
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459633 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1156,6 +1169,9 @@ case class LoadTableUsingKettle( carbonLoadModel.setEscapeChar(escapeChar) carbonLoadModel.setQuoteChar(quoteChar) carbonLoadModel.setCommentChar(commentchar) + carbonLoadModel.setDateFormat(dateFormat) + carbonLoadModel.setSerializationNullFormat("serialization_null_format" + "," + +serializationNullFormat) --- End diff -- this code is useless
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460491 --- Diff: integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala --- @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.spark.testsuite.dataload + +import org.apache.spark.sql.common.util.CarbonHiveContext._ +import org.apache.spark.sql.common.util.QueryTest +import org.scalatest.BeforeAndAfterAll +import java.sql.Timestamp + +import org.apache.carbondata.core.constants.CarbonCommonConstants +import org.apache.carbondata.core.util.CarbonProperties +import org.apache.carbondata.spark.exception.MalformedCarbonCommandException +import org.apache.spark.sql.Row + +class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterAll { + override def beforeAll { +sql("DROP TABLE IF EXISTS t3") +sql(""" + CREATE TABLE IF NOT EXISTS t3 + (ID Int, date Timestamp, starttime Timestamp, country String, + name String, phonetype String, serialname String, salary Int) + STORED BY 'carbondata' +""") +CarbonProperties.getInstance() + .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd") + } + + test("test load data with different timestamp format") { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'starttime:-MM-dd HH:mm:ss') + """) + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData2.csv' into table t3 + OPTIONS('dateformat' = 'date:-MM-dd,starttime:/MM/dd HH:mm:ss') + """) + checkAnswer( +sql("SELECT date FROM t3 WHERE ID = 1"), +Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0"))) + ) + checkAnswer( +sql("SELECT starttime FROM t3 WHERE ID = 1"), +Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0"))) + ) + checkAnswer( +sql("SELECT date FROM t3 WHERE ID = 18"), +Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0"))) + ) + checkAnswer( +sql("SELECT starttime FROM t3 WHERE ID = 18"), +Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0"))) + ) + } + + test("test load data with different timestamp format with being set an empty string") { +try { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 
+ OPTIONS('dateformat' = '') + """) + assert(false) +} catch { + case ex: MalformedCarbonCommandException => +assertResult(ex.getMessage)("Error: Option DateFormat is set an empty string.") + case _ => assert(false) +} + } + + test("test load data with different timestamp format with a wrong column name") { +try { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'fasfdas:/MM/dd') + """) + assert(false) +} catch { + case ex: MalformedCarbonCommandException => +assertResult(ex.getMessage)("Error: Wrong Column Name fasfdas is provided in Option DateFormat.") + case _ => assert(false) +} + } + + test("test load data with different timestamp format with a timestamp column is set an empty string") { +try { + sql(s""" + LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3 + OPTIONS('dateformat' = 'fasfdas:') + """) + assert(false) +} catch { + case
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user lion-x commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460233 --- Diff: processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java --- @@ -37,7 +37,7 @@ private int surrogateKey = -1; @Before public void setUp() throws Exception { -TimeStampDirectDictionaryGenerator generator = TimeStampDirectDictionaryGenerator.instance; +TimeStampDirectDictionaryGenerator generator = new TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- ok
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457738 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
+ */ +public class IndexedSegment extends Segment { + + private IndexLoader loader; + + public IndexedSegment(String name, String path, IndexLoader loader) { +super(name, path); +this.loader = loader; + } + + @Override + public List getSplits(JobContext job, FilterResolverIntf filterResolver) + throws IOException { +// do as following +// 1. create the index or get from cache by the filter name in the configuration +// 2. filter by index to get the filtered block +// 3. create input split from filtered block + +List output = new LinkedList<>(); +Index index = loader.load(job.getConfiguration()); --- End diff -- Yes, the IndexLoader is actually the factory: its `load` function has the same functionality as `getInstance` in a factory. So shall I keep the IndexLoader name?
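The point made in this reply, that a loader can play the role of a factory whose `load` behaves like `getInstance`, can be sketched as follows. This is only an illustration under stated assumptions: `IndexLoaderSketch`, `CachingIndexLoader`, the nested `Index` interface and the string cache key are hypothetical stand-ins, not the types from the pull request, and the caching behaviour is inferred from the "create the index or get from cache" comment in the diff.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

public class IndexLoaderSketch {

  /** Hypothetical stand-in for the index abstraction discussed in the review. */
  public interface Index {
    String segmentName();
  }

  /** A loader acting as a factory: load() either builds the index or returns the cached one. */
  public static final class CachingIndexLoader {
    private final Map<String, Index> cache = new ConcurrentHashMap<>();
    private final AtomicInteger builds = new AtomicInteger();

    public Index load(String segmentName) {
      // computeIfAbsent runs the builder at most once per key.
      return cache.computeIfAbsent(segmentName, name -> {
        builds.incrementAndGet();
        return () -> name; // trivial Index implementation for the sketch
      });
    }

    /** Number of times an index was actually built (for demonstration). */
    public int buildCount() {
      return builds.get();
    }
  }
}
```

Under this reading, calling `load` on every `getSplits` invocation is cheap after the first call, which addresses the concern about loading the index every time.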
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457078 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.api; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.internal.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.segment.SegmentManager; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.scan.expression.Expression; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; + +/** + * Input format of CarbonData file. + * @param + */ +public class CarbonTableInputFormat extends FileInputFormat { + + private static final String FILTER_PREDICATE = + "mapreduce.input.carboninputformat.filter.predicate"; + + private SegmentManager segmentManager; + + public CarbonTableInputFormat(SegmentManager segmentManager) { --- End diff -- accept
[GitHub] incubator-carbondata pull request #266: Add one FAQ in the READEME
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/266
[GitHub] incubator-carbondata pull request #261: [carbondata-339] Align storePath nam...
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/261
[GitHub] incubator-carbondata pull request #266: Add one FAQ in the READEME
GitHub user bill1208 opened a pull request: https://github.com/apache/incubator-carbondata/pull/266 Add one FAQ in the READEME Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - What manual testing you have done? - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- You can merge this pull request into a Git repository by running: $ git pull https://github.com/bill1208/incubator-carbondata master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/266.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #266 commit c9f92e8bcd5c1fd8e14f97b2b2f62892f675b8d8 Author: bill1208 Date: 2016-10-27T18:48:18Z Add one FAQ in the READEME
[GitHub] incubator-carbondata pull request #260: Add one FQA in Readme
Github user bill1208 closed the pull request at: https://github.com/apache/incubator-carbondata/pull/260
[GitHub] incubator-carbondata pull request #265: [WIP]Improve first time query perfor...
GitHub user kumarvishal09 opened a pull request: https://github.com/apache/incubator-carbondata/pull/265 [WIP]Improve first time query performance Be sure to do all of the following to help us incorporate your contribution quickly and easily: - [ ] Make sure the PR title is formatted like: `[CARBONDATA-] Description of pull request` - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable Travis-CI on your fork and ensure the whole test matrix passes). - [ ] Replace `` in the title with the actual Jira issue number, if there is one. - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt). - [ ] Testing done Please provide details on - Whether new unit test cases have been added or why no new tests are required? - What manual testing you have done? - Any additional information to help reviewers in testing this change. - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. --- Improve first time query performance You can merge this pull request into a Git repository by running: $ git pull https://github.com/kumarvishal09/incubator-carbondata FirstTimeQueryPerformanceImprovement Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/265.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #265 commit 6af76c5d2d9022740624429a31d657b81db83126 Author: kumarvishal Date: 2016-10-27T17:24:49Z Improve first time query performance
[GitHub] incubator-carbondata pull request #264: [CARBONDATA-341] CarbonTableIdentifi...
GitHub user mohammadshahidkhan opened a pull request: https://github.com/apache/incubator-carbondata/pull/264 [CARBONDATA-341] CarbonTableIdentifier being passed to the query flow… # Why The CarbonTableIdentifier being passed to the query flow has the wrong tableId. When a table is created, the CarbonData system assigns a unique ID to it, and every CarbonTableIdentifier for that table should carry the same tableId. Instead, the CarbonTableIdentifier carries the current timestamp as its tableId, which is not correct. # Solution Pass the AbsoluteTableIdentifier and CarbonTableIdentifier loaded from the CarbonData schema file. You can merge this pull request into a Git repository by running: $ git pull https://github.com/mohammadshahidkhan/incubator-carbondata TableID_Fix Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/264.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #264 commit faa88070eeabe37be5db874c4b7908fc2a721da2 Author: mohammadshahidkhan Date: 2016-10-27T17:13:07Z [CARBONDATA-341] CarbonTableIdentifier being passed to the query flow has wrong tableid
[jira] [Created] (CARBONDATA-342) Select query with 'in' has issue with where clause for int, bigint and decimal data types.
Chetan Bhat created CARBONDATA-342:
--

Summary: Select query with 'in' has issue with where clause for int, bigint and decimal data types.
Key: CARBONDATA-342
URL: https://issues.apache.org/jira/browse/CARBONDATA-342
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 0.1.0-incubating
Environment: 3 node cluster. Spark 1.6.2 built for Hadoop 2.6.0 Hadoop 2.7.2
Reporter: Chetan Bhat
Priority: Minor
Fix For: 0.2.0-incubating

Select query with 'in' has issue with where clause for int, bigint and decimal data types.

Actual output as shown below - select queries with 'in' does not return any records in resultset.

0: jdbc:hive2://10.18.102.236:1> create table Test_Boundary (c1_int int,c2_Bigint Bigint,c3_Decimal Decimal(38,38),c4_double double,c5_string string,c6_Timestamp Timestamp,c7_Datatype_Desc string) STORED BY 'org.apache.carbondata.format';
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.078 seconds)

0: jdbc:hive2://10.18.102.236:1> LOAD DATA INPATH 'hdfs://10.18.102.236:54310/chetan/Test_Data1.csv' INTO table Test_Boundary OPTIONS('DELIMITER'=',','QUOTECHAR'='"','FILEHEADER'='');
+---------+--+
| Result  |
+---------+--+
+---------+--+
No rows selected (0.501 seconds)

0: jdbc:hive2://10.18.102.236:1> select c1_int from test_boundary where c1_int in (2.147483647E9,2345.0,1234.0);
+---------+--+
| c1_int  |
+---------+--+
+---------+--+
No rows selected (0.069 seconds)

0: jdbc:hive2://10.18.102.236:1> select c1_int from test_boundary where c1_int in (-2.147483647E9,2345.0,-1234.0);
+---------+--+
| c1_int  |
+---------+--+
+---------+--+
No rows selected (0.071 seconds)

0: jdbc:hive2://10.18.102.236:1> select c1_int from test_boundary where c1_int in (0,-1234.0);
+---------+--+
| c1_int  |
+---------+--+
+---------+--+
No rows selected (0.076 seconds)

0: jdbc:hive2://10.18.102.236:1> select c2_bigint from test_boundary where c2_bigint in (9223372036854775807,2345.0,1234.0);
+------------+--+
| c2_bigint  |
+------------+--+
+------------+--+
No rows selected (0.059 seconds)

0: jdbc:hive2://10.18.102.236:1> select c2_bigint from test_boundary where c2_bigint in (-9223372036854775808,2345.0,-1234.0);
+------------+--+
| c2_bigint  |
+------------+--+
+------------+--+
No rows selected (0.077 seconds)

0: jdbc:hive2://10.18.102.236:1> select c2_bigint from test_boundary where c2_bigint in (0,-1234.0);
+------------+--+
| c2_bigint  |
+------------+--+
+------------+--+
No rows selected (0.062 seconds)

0: jdbc:hive2://10.18.102.236:1> select c3_decimal from test_boundary where c3_decimal in (0,-1234.0);
+-------------+--+
| c3_decimal  |
+-------------+--+
+-------------+--+
No rows selected (0.072 seconds)

Expected Output should be as shown below :-

0: jdbc:hive2://ha-cluster/default> select c1_int from test_boundary where c1_int in (2.147483647E9,2345.0,1234.0);
+-------------+--+
| c1_int      |
+-------------+--+
| 2147483647  |
| 2147483647  |
| 2345        |
| 1234        |
+-------------+--+
4 rows selected (0.388 seconds)

0: jdbc:hive2://ha-cluster/default> select c1_int from test_boundary where c1_int in (-2.147483647E9,2345.0,-1234.0);
+--------------+--+
| c1_int       |
+--------------+--+
| -2147483647  |
| 2345         |
+--------------+--+
2 rows selected (0.258 seconds)

0: jdbc:hive2://ha-cluster/default> select c1_int from test_boundary where c1_int in (0,-1234.0);
+---------+--+
| c1_int  |
+---------+--+
| 0       |
+---------+--+
1 row selected (0.255 seconds)

0: jdbc:hive2://ha-cluster/default> select c2_bigint from test_boundary where c2_bigint in (9223372036854775807,2345.0,1234.0);
+----------------------+--+
| c2_bigint            |
+----------------------+--+
| 9223372036854775807  |
| 9223372036854775807  |
| 9223372036854775807  |
| 9223372036854775807  |
| 2345                 |
| 1234                 |
+----------------------+--+
6 rows selected (0.331 seconds)

0: jdbc:hive2://ha-cluster/default> select c2_bigint from test_boundary where c2_bigint in (-9223372036854775808,2345.0,-1234.0);
+-----------------------+--+
| c2_bigint             |
+-----------------------+--+
| -9223372036854775808  |
| 2345                  |
+-----------------------+--+
2 rows selected (0.299 seconds)

0: jdbc:hive2://ha-cluster/default> select c2_bigint from test_boundary where c2_bigint in (0,-1234.0);
+------------+--+
| c2_bigint  |
+------------+--+
| 0          |
+------------+--+
1 row selected (0.263 seconds)

0: jdbc:hive2://ha-cluster/default> select c3_decimal from test_boundary where c3_decimal in (0,-1234.0);
+-------------+--+
| c3_decimal  |
+-------------+--+
| 0E-38       |
| 0E-38       |
+-------------+--+
2 rows selected (0.273 seconds)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
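The report does not state the root cause, so the following Java sketch only illustrates the semantics the expected output implies: an `in` list whose literals are doubles should still match int and bigint column values that are numerically equal. One way to get that behaviour is to normalise both sides to `BigDecimal` and compare with `compareTo`. The class `NumericInFilter` and its methods are hypothetical and are not part of CarbonData.

```java
import java.math.BigDecimal;
import java.math.BigInteger;
import java.util.List;

public class NumericInFilter {

  /** Returns true if columnValue is numerically equal to any literal in the IN list. */
  public static boolean matchesIn(Number columnValue, List<? extends Number> literals) {
    BigDecimal value = toDecimal(columnValue);
    for (Number literal : literals) {
      // compareTo ignores scale, so 2345 and 2345.0 compare as equal.
      if (value.compareTo(toDecimal(literal)) == 0) {
        return true;
      }
    }
    return false;
  }

  /** Widens the common numeric wrappers to BigDecimal without losing integer precision. */
  public static BigDecimal toDecimal(Number n) {
    if (n instanceof BigDecimal) {
      return (BigDecimal) n;
    }
    if (n instanceof BigInteger) {
      return new BigDecimal((BigInteger) n);
    }
    if (n instanceof Double || n instanceof Float) {
      return BigDecimal.valueOf(n.doubleValue());
    }
    return BigDecimal.valueOf(n.longValue()); // Byte, Short, Integer, Long
  }
}
```

By contrast, a comparison based on `Object.equals` between an `Integer` column value and a `Double` literal is always false in Java, which would produce an empty result set like the one reported. Whether that is what happens inside the filter code is an assumption, not something the issue confirms.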
[jira] [Created] (CARBONDATA-341) CarbonTableIdentifier being passed to the query flow has wrong tableid
Mohammad Shahid Khan created CARBONDATA-341: --- Summary: CarbonTableIdentifier being passed to the query flow has wrong tableid Key: CARBONDATA-341 URL: https://issues.apache.org/jira/browse/CARBONDATA-341 Project: CarbonData Issue Type: Bug Components: data-query, hadoop-integration, spark-integration Affects Versions: 0.1.0-incubating Reporter: Mohammad Shahid Khan Assignee: Mohammad Shahid Khan The CarbonTableIdentifier being passed to the query flow has the wrong tableId. When a table is created, the CarbonData system assigns a unique ID to it, and every CarbonTableIdentifier for that table should carry the same tableId. Instead, the CarbonTableIdentifier carries the current timestamp as its tableId, which is not correct.
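Why a timestamp-derived tableId is a problem can be shown with a small Java sketch. It assumes, for illustration only, that identifiers are compared or used as lookup keys by database name, table name and tableId together. `TableIdSketch` and its nested `TableIdentifier` are hypothetical stand-ins, not the real CarbonTableIdentifier, and the timestamp is passed in as a parameter so the example is deterministic.

```java
import java.util.Objects;

public class TableIdSketch {

  /** Hypothetical identifier: equality covers database, table name and table id. */
  public static final class TableIdentifier {
    private final String database;
    private final String table;
    private final String tableId;

    public TableIdentifier(String database, String table, String tableId) {
      this.database = database;
      this.table = table;
      this.tableId = tableId;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof TableIdentifier)) {
        return false;
      }
      TableIdentifier other = (TableIdentifier) o;
      return database.equals(other.database)
          && table.equals(other.table)
          && tableId.equals(other.tableId);
    }

    @Override
    public int hashCode() {
      return Objects.hash(database, table, tableId);
    }
  }

  /** Problematic pattern: the id depends on when the identifier happens to be built. */
  public static TableIdentifier fromTimestamp(String db, String table, long nowMillis) {
    return new TableIdentifier(db, table, String.valueOf(nowMillis));
  }

  /** Pattern described in the fix: reuse the id stored with the table's schema. */
  public static TableIdentifier fromSchema(String db, String table, String storedTableId) {
    return new TableIdentifier(db, table, storedTableId);
  }
}
```

Two identifiers built with `fromTimestamp` at different moments are unequal even though they name the same table, so any cache or map keyed by the identifier misses. Identifiers built with `fromSchema` from the same stored id are always equal.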
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85346673 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Block; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.index.IndexLoader; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * This segment is backed by index, thus getSplits can use the index to do file pruning. 
+ */ +public class IndexedSegment extends Segment { + + private IndexLoader loader; + + public IndexedSegment(String name, String path, IndexLoader loader) { +super(name, path); +this.loader = loader; + } + + @Override + public List getSplits(JobContext job, FilterResolverIntf filterResolver) + throws IOException { +// do as following +// 1. create the index or get from cache by the filter name in the configuration +// 2. filter by index to get the filtered block +// 3. create input split from filtered block + +List output = new LinkedList<>(); +Index index = loader.load(job.getConfiguration()); --- End diff -- Is it required to load the index every time? I guess we are just creating the index instance here, so why not use a factory?
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85343636 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java --- @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.impl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { + + private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class); + private Segment segment; + + InMemoryBTreeIndex(Segment segment) { --- End diff -- I guess we are supposed to pass the list of valid segments here.
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85340545 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java --- @@ -0,0 +1,215 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.impl; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { + + private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class); + private Segment segment; + + InMemoryBTreeIndex(Segment segment) { +this.segment = segment; + } + + @Override + public String getName() { +return null; + } + + @Override + public List filter(JobContext job, FilterResolverIntf filter) --- End diff -- It seems the method's return type is incompatible.
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85339106 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/CarbonFormat.java --- @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal; + +public enum CarbonFormat { + COLUMNR --- End diff -- Typo: this should be COLUMNAR.
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85337928 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java --- @@ -0,0 +1,169 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.api; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonProjection; +import org.apache.carbondata.hadoop.internal.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.internal.segment.SegmentManager; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.hadoop.util.ObjectSerializationUtil; +import org.apache.carbondata.scan.expression.Expression; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; + +/** + * Input format of CarbonData file. + * @param + */ +public class CarbonTableInputFormat extends FileInputFormat { + + private static final String FILTER_PREDICATE = + "mapreduce.input.carboninputformat.filter.predicate"; + + private SegmentManager segmentManager; + + public CarbonTableInputFormat(SegmentManager segmentManager) { +this.segmentManager = segmentManager; + } + + @Override + public RecordReader createRecordReader(InputSplit split, + TaskAttemptContext context) throws IOException, InterruptedException { +switch (((CarbonInputSplit)split).formatType()) { --- End diff -- Why not take the formatType from the job conf? It is better not to touch InputSplit, as it comes from outside.
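The suggestion above (resolve the format type from the job configuration rather than from the split) could look roughly like the sketch below. It is only an illustration under stated assumptions: the key name `carbon.input.format.type`, the class `FormatFromConf`, and the helper `formatType` are hypothetical and do not come from the pull request, and a plain `Map` stands in for the Hadoop `Configuration` so the example runs without Hadoop on the classpath. The enum uses the corrected spelling COLUMNAR requested in the earlier comment.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class FormatFromConf {

  // Mirrors the enum under review, with the spelling the reviewer asked for.
  enum CarbonFormat { COLUMNAR }

  // Hypothetical configuration key; NOT taken from the pull request.
  static final String FORMAT_KEY = "carbon.input.format.type";

  // The Map is a stand-in for the job's configuration (assumption for this sketch).
  // Falls back to COLUMNAR when the key is absent.
  static CarbonFormat formatType(Map<String, String> conf) {
    String name = conf.getOrDefault(FORMAT_KEY, CarbonFormat.COLUMNAR.name());
    // valueOf throws IllegalArgumentException for an unknown format name.
    return CarbonFormat.valueOf(name.trim().toUpperCase(Locale.ROOT));
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    conf.put(FORMAT_KEY, "columnar");
    // The record reader would switch on this value instead of casting the split.
    switch (formatType(conf)) {
      case COLUMNAR:
        System.out.println("use the columnar record reader");
        break;
      default:
        throw new IllegalStateException("unsupported format");
    }
  }
}
```

With this shape the InputSplit is left untouched, and an unrecognized format name fails fast when the reader is created rather than later in the read path.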
[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2][WIP] Data load integr...
GitHub user ravipesala opened a pull request: https://github.com/apache/incubator-carbondata/pull/263 [CARBONDATA-2][WIP] Data load integration of all steps for removing kettle This PR integrates all data load steps into the main flow. DataWriterStep still needs to be integrated, and testing is pending. You can merge this pull request into a Git repository by running: $ git pull https://github.com/ravipesala/incubator-carbondata data-load-integration Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/263.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #263 commit c4bd3a14e3e1f437d365c9f9e4dc21b2d69f56ec Author: ravipesala Date: 2016-10-27T03:44:32Z WIP Integrating new dataloading flow commit 6aa1e738c02e2906b43b372bcad0ed8096962ddf Author: ravipesala Date: 2016-10-27T12:41:11Z Integrated data processor steps to new flow.
[jira] [Created] (CARBONDATA-340) Implement test cases for load package in core module
Anurag Srivastava created CARBONDATA-340: Summary: Implement test cases for load package in core module Key: CARBONDATA-340 URL: https://issues.apache.org/jira/browse/CARBONDATA-340 Project: CarbonData Issue Type: Test Reporter: Anurag Srivastava Priority: Trivial -- This message was sent by Atlassian JIRA (v6.3.4#6332)