date:20161027

[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread jackylk

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480562
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
+    List blocks = index.filter(job, filterResolver);
--- End diff --

You are right, I will modify it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

List the supported datatypes in carbondata

2016-10-27 Thread Swati

Hi,

I would like to know which data types CarbonData supports. I am getting the
error *"MalformedCarbonCommandException: Unsupported data type"* when running
queries against these data types: *"tiny Int, smallInt, float, date, varchar,
char, boolean, binary, Map, Union"*. Also, the documentation does not list the
supported data types.



--
View this message in context: 
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/List-the-supported-datatypes-in-carbondata-tp2419.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at 
Nabble.com.

[GitHub] incubator-carbondata pull request #267: implement test cases for core.load m...

2016-10-27 Thread anuragknoldus

GitHub user anuragknoldus opened a pull request:

https://github.com/apache/incubator-carbondata/pull/267

implement test cases for core.load module

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under an umbrella JIRA. 
 
---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/anuragknoldus/incubator-carbondata CARBONDATA-340

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #267


commit c3d66fc38d3b04983697ca28fba9d8ccbeeebf6a
Author: Anurag 
Date:   2016-10-28T05:04:51Z

implement test cases for core.load module





[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470691
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param 
+ */
+public class CarbonTableInputFormat extends FileInputFormat {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
+    this.segmentManager = segmentManager;
+  }
+
+  @Override
+  public RecordReader createRecordReader(InputSplit split,
+      TaskAttemptContext context) throws IOException, InterruptedException {
+    switch (((CarbonInputSplit)split).formatType()) {
--- End diff --

ok



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470293
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+    super(name, path);
+    this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf filterResolver)
+      throws IOException {
+    // do as following
+    // 1. create the index or get from cache by the filter name in the configuration
+    // 2. filter by index to get the filtered block
+    // 3. create input split from filtered block
+
+    List output = new LinkedList<>();
+    Index index = loader.load(job.getConfiguration());
--- End diff --

If the loader implements caching internally, then we can keep it as `IndexLoader` only.
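
One way to read the suggestion above is that the cache lives inside the loader, so the segment only ever sees the loader interface. The following is a minimal sketch of that idea, not the code from the pull request: `Index`, `IndexLoader` and `CachingIndexLoader` are hypothetical stand-ins defined here for illustration, and the loader is keyed by a plain segment path instead of the Hadoop `Configuration` used in the diff.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

/** Hypothetical stand-in for the Index type in the diff; not the CarbonData class. */
interface Index {
  String name();
}

/** Hypothetical loader contract: build or fetch the index for a segment path. */
interface IndexLoader {
  Index load(String segmentPath);
}

/** Loader that caches internally, so callers depend only on IndexLoader. */
class CachingIndexLoader implements IndexLoader {
  private final Map<String, Index> cache = new ConcurrentHashMap<>();
  private final Function<String, Index> builder;
  private final AtomicInteger buildCount = new AtomicInteger();

  CachingIndexLoader(Function<String, Index> builder) {
    this.builder = builder;
  }

  @Override
  public Index load(String segmentPath) {
    // computeIfAbsent runs the builder at most once per key.
    return cache.computeIfAbsent(segmentPath, path -> {
      buildCount.incrementAndGet();
      return builder.apply(path);
    });
  }

  /** Number of times an index was actually built (cache misses). */
  int buildCount() {
    return buildCount.get();
  }
}
```

With this shape, a second `load` call for the same path returns the cached instance without invoking the builder again, and the caller never needs a separate cache type.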



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread lion-x

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85466134
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
 ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataload
+
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import java.sql.Timestamp
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.Row
+
+class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterAll {
+  override def beforeAll {
+sql("DROP TABLE IF EXISTS t3")
+sql("""
+   CREATE TABLE IF NOT EXISTS t3
+   (ID Int, date Timestamp, starttime Timestamp, country String,
+   name String, phonetype String, serialname String, salary Int)
+   STORED BY 'carbondata'
+""")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd")
+  }
+
+  test("test load data with different timestamp format") {
+  sql(s"""
+   LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'starttime:-MM-dd HH:mm:ss')
+   """)
+  sql(s"""
+   LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData2.csv' into table t3
+   OPTIONS('dateformat' = 'date:-MM-dd,starttime:/MM/dd HH:mm:ss')
+   """)
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0")))
+  )
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0")))
+  )
+  }
--- End diff --

ok



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread lion-x

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85466025
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1129,6 +1130,9 @@ case class LoadTable(
   carbonLoadModel.setEscapeChar(escapeChar)
   carbonLoadModel.setQuoteChar(quoteChar)
   carbonLoadModel.setCommentChar(commentchar)
+  carbonLoadModel.setDateFormat(dateFormat)
--- End diff --

ok



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread lion-x

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85465773
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
 ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataload
+
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import java.sql.Timestamp
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.carbondata.spark.exception.MalformedCarbonCommandException
+import org.apache.spark.sql.Row
+
+class TestLoadDataWithDiffTimestampFormat extends QueryTest with BeforeAndAfterAll {
+  override def beforeAll {
+sql("DROP TABLE IF EXISTS t3")
+sql("""
+   CREATE TABLE IF NOT EXISTS t3
+   (ID Int, date Timestamp, starttime Timestamp, country String,
+   name String, phonetype String, serialname String, salary Int)
+   STORED BY 'carbondata'
+""")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, "/MM/dd")
+  }
+
+  test("test load data with different timestamp format") {
+  sql(s"""
+   LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'starttime:-MM-dd HH:mm:ss')
+   """)
+  sql(s"""
+   LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData2.csv' into table t3
+   OPTIONS('dateformat' = 'date:-MM-dd,starttime:/MM/dd HH:mm:ss')
+   """)
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0")))
+  )
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0")))
+  )
+  }
+
+  test("test load data with different timestamp format with being set an empty string") {
+    try {
+      sql(s"""
+       LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3
+       OPTIONS('dateformat' = '')
+       """)
+      assert(false)
+    } catch {
+      case ex: MalformedCarbonCommandException =>
+        assertResult(ex.getMessage)("Error: Option DateFormat is set an empty string.")
+      case _ => assert(false)
+    }
+  }
+
+  test("test load data with different timestamp format with a wrong column name") {
+    try {
+      sql(s"""
+       LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3
+       OPTIONS('dateformat' = 'fasfdas:/MM/dd')
+       """)
+      assert(false)
+    } catch {
+      case ex: MalformedCarbonCommandException =>
+        assertResult(ex.getMessage)("Error: Wrong Column Name fasfdas is provided in Option DateFormat.")
+      case _ => assert(false)
+    }
+  }
+
+  test("test load data with different timestamp format with a timestamp column is set an empty string") {
+    try {
+      sql(s"""
+       LOAD DATA LOCAL INPATH './src/test/resources/timeStampFormatData1.csv' into table t3
+       OPTIONS('dateformat' = 'fasfdas:')
+       """)
+      assert(false)
+    } catch {
+      case e

[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread lion-x

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85464806
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle(
 Seq.empty
   }
 
+  private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]):
+  Unit = {
+    if (dateFormat == "") {
+      throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.")
+    } else {
+      var dateFormats: Array[String] = dateFormat.split(",")
+      for (singleDateFormat <- dateFormats) {
+        var dateFormatSplits: Array[String] = singleDateFormat.split(":", 2)
+        if (!dateDimensionsName.contains(dateFormatSplits(0))) {
--- End diff --

ok



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85463723
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/Segment.java 
---
@@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * Within a carbon table, each data load becomes one Segment, which stores all data files belong to this load in
+ * the segment folder.
+ */
+public abstract class Segment {
+
+  protected String id;
+
+  /**
+   * Path of the segment folder
+   */
+  private String path;
+
+  public Segment(String id, String path) {
+    this.id = id;
+    this.path = path;
+  }
+
+  public String getId() {
+    return id;
+  }
+
+  public String getPath() {
+    return path;
+  }
+
+  /**
+   * return all InputSplit of this segment, each file is a InputSplit
+   * @param job job context
+   * @return all InputSplit
+   * @throws IOException
+   */
+  public List getAllSplits(JobContext job) throws IOException {
--- End diff --

I suggest returning List



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

Please use internal.CarbonInputSplit.



[jira] [Created] (CARBONDATA-343) Optimize the duplicated definition code in GlobalDictionaryUtil.scala

2016-10-27 Thread Liang Chen (JIRA)

Liang Chen created CARBONDATA-343:
-

 Summary: Optimize the duplicated definition code in GlobalDictionaryUtil.scala
 Key: CARBONDATA-343
 URL: https://issues.apache.org/jira/browse/CARBONDATA-343
 Project: CarbonData
  Issue Type: Improvement
Reporter: Liang Chen
Priority: Trivial


The following two lines of code contain a duplicated definition:
-
val table = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier

val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85461092
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle(
 Seq.empty
   }
 
+  private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]):
+  Unit = {
+    if (dateFormat == "") {
+      throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.")
+    } else {
+      var dateFormats: Array[String] = dateFormat.split(",")
--- End diff --

Please use CarbonCommonConstants.COMMA here instead of the literal ",".



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460088
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle(
 Seq.empty
   }
 
+  private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]):
+  Unit = {
+    if (dateFormat == "") {
+      throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.")
+    } else {
+      var dateFormats: Array[String] = dateFormat.split(",")
+      for (singleDateFormat <- dateFormats) {
+        var dateFormatSplits: Array[String] = singleDateFormat.split(":", 2)
+        if (!dateDimensionsName.contains(dateFormatSplits(0))) {
--- End diff --

Take care that the column-name comparison is case-insensitive.
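
The `contains` call in the diff matches names exactly, so a column written as `StartTime` in the option would not match `starttime`. A minimal sketch of a case-insensitive check follows; `ColumnNameMatcher` and `containsIgnoreCase` are hypothetical names used for illustration only, and this is not the change that was eventually made in the pull request.

```java
import java.util.List;

/** Hypothetical helper, for illustration only. */
class ColumnNameMatcher {

  /** Returns true if candidate equals any known column name, ignoring case. */
  static boolean containsIgnoreCase(List<String> columnNames, String candidate) {
    for (String name : columnNames) {
      if (name.equalsIgnoreCase(candidate)) {
        return true;
      }
    }
    return false;
  }
}
```

An alternative with the same effect would be to lower-case both the column names and the option key once before comparing.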


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459286
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle(
   val allDictionaryPath = options.getOrElse("all_dictionary_path", "")
   val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$")
   val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:")
+  val timeFormat = options.getOrElse("timeformat", null)
--- End diff --

"timeFormat" is useless



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460589
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -343,7 +345,8 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K
   }
 
   data.setGenerator(
-  KeyGeneratorFactory.getKeyGenerator(getUpdatedLens(meta.dimLens, meta.dimPresent)));
+  KeyGeneratorFactory.getKeyGenerator(
+  getUpdatedLens(meta.dimLens, meta.dimPresent)));
--- End diff --

Please keep the existing code style.



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459810
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle(
   val allDictionaryPath = options.getOrElse("all_dictionary_path", "")
   val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$")
   val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:")
+  val timeFormat = options.getOrElse("timeformat", null)
+  val dateFormat = options.getOrElse("dateformat", null)
+  val tableDimensions: util.List[CarbonDimension] = table.getDimensionByTableName(tableName)
+  val dateDimensionsName = new ArrayBuffer[String]
+  tableDimensions.toArray.foreach {
+    dimension => {
+      val columnSchema: ColumnSchema = dimension.asInstanceOf[CarbonDimension].getColumnSchema
+      if (columnSchema.getDataType.name == "TIMESTAMP") {
+        dateDimensionsName += columnSchema.getColumnName
+      }
+    }
+  }
+  if (dateFormat != null) {
+    validateDateFormat(dateFormat, dateDimensionsName)
+  }
--- End diff --

Please move this code into the validateDateFormat method.
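
The validation the comment asks to consolidate could look roughly like the sketch below. It is only an illustration, not the code from the pull request: the class name `DateFormatOptionSketch`, the use of `IllegalArgumentException` (the pull request uses `MalformedCarbonCommandException`), the case-insensitive column match, and the third error message are all assumptions. The first two messages are the ones asserted in the quoted test suite.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper; not the class used in the pull request. */
final class DateFormatOptionSketch {

  /**
   * Parses an option value of the form "col1:format1,col2:format2" and checks
   * every column name against the table's timestamp columns.
   */
  static Map<String, String> validateDateFormat(String dateFormat,
      List<String> timestampColumns) {
    if (dateFormat == null || dateFormat.trim().isEmpty()) {
      throw new IllegalArgumentException(
          "Error: Option DateFormat is set an empty string.");
    }
    Map<String, String> formats = new LinkedHashMap<>();
    for (String entry : dateFormat.split(",")) {
      // Split on the first ':' only, because formats such as HH:mm:ss contain colons.
      String[] parts = entry.split(":", 2);
      String column = parts[0].trim();
      boolean known = false;
      for (String candidate : timestampColumns) {
        // Assumption: column names are matched case-insensitively.
        if (candidate.equalsIgnoreCase(column)) {
          known = true;
          break;
        }
      }
      if (!known) {
        throw new IllegalArgumentException(
            "Error: Wrong Column Name " + column + " is provided in Option DateFormat.");
      }
      if (parts.length < 2 || parts[1].trim().isEmpty()) {
        // Hypothetical message; the real wording is not visible in this thread.
        throw new IllegalArgumentException(
            "Error: No format is given for column " + column + " in Option DateFormat.");
      }
      formats.put(column.toLowerCase(), parts[1].trim());
    }
    return formats;
  }
}
```

With a single method like this, the caller in `LoadTableUsingKettle` would only need to collect the timestamp column names and pass them in together with the raw option value.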



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460571
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -1176,11 +1215,8 @@ else if(isComplexTypeColumn[j]) {
 }
 ColumnSchemaDetails details = 
columnSchemaDetailsWrapper.get(dimensionColumnIds[m]);
 if (details.isDirectDictionary()) {
-  DirectDictionaryGenerator directDictionaryGenerator1 =
-  DirectDictionaryKeyGeneratorFactory
-  
.getDirectDictionaryGenerator(details.getColumnType());
   surrogateKeyForHrrchy[0] =
-  
directDictionaryGenerator1.generateDirectSurrogateKey(tuple);
+  
directDictionaryGenerators[m].generateDirectSurrogateKey(tuple);
--- End diff --

Take care with the code style.



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459633
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1156,6 +1169,9 @@ case class LoadTableUsingKettle(
   carbonLoadModel.setEscapeChar(escapeChar)
   carbonLoadModel.setQuoteChar(quoteChar)
   carbonLoadModel.setCommentChar(commentchar)
+  carbonLoadModel.setDateFormat(dateFormat)
+  
carbonLoadModel.setSerializationNullFormat("serialization_null_format" + "," +
+serializationNullFormat)
--- End diff --

this code is useless



[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread QiangCai

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460491
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
 ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataload
+
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import java.sql.Timestamp
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import 
org.apache.carbondata.spark.exception.MalformedCarbonCommandException
+import org.apache.spark.sql.Row
+
+class TestLoadDataWithDiffTimestampFormat extends QueryTest with 
BeforeAndAfterAll {
+  override def beforeAll {
+sql("DROP TABLE IF EXISTS t3")
+sql("""
+   CREATE TABLE IF NOT EXISTS t3
+   (ID Int, date Timestamp, starttime Timestamp, country String,
+   name String, phonetype String, serialname String, salary Int)
+   STORED BY 'carbondata'
+""")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, 
"/MM/dd")
+  }
+
+  test("test load data with different timestamp format") {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'starttime:-MM-dd HH:mm:ss')
+   """)
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData2.csv' into table t3
+   OPTIONS('dateformat' = 'date:-MM-dd,starttime:/MM/dd 
HH:mm:ss')
+   """)
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0")))
+  )
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0")))
+  )
+  }
+
+  test("test load data with different timestamp format with being set an 
empty string") {
+try {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = '')
+   """)
+  assert(false)
+} catch {
+  case ex: MalformedCarbonCommandException =>
+assertResult(ex.getMessage)("Error: Option DateFormat is set an 
empty string.")
+  case _ => assert(false)
+}
+  }
+
+  test("test load data with different timestamp format with a wrong column 
name") {
+try {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'fasfdas:/MM/dd')
+   """)
+  assert(false)
+} catch {
+  case ex: MalformedCarbonCommandException =>
+assertResult(ex.getMessage)("Error: Wrong Column Name fasfdas is 
provided in Option DateFormat.")
+  case _ => assert(false)
+}
+  }
+
+  test("test load data with different timestamp format with a timestamp 
column is set an empty string") {
+try {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'fasfdas:')
+   """)
+  assert(false)
+} catch {
+  case

[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...

2016-10-27 Thread lion-x

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460233
  
--- Diff: 
processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java
 ---
@@ -37,7 +37,7 @@
   private int surrogateKey = -1;
 
   @Before public void setUp() throws Exception {
-TimeStampDirectDictionaryGenerator generator = 
TimeStampDirectDictionaryGenerator.instance;
+TimeStampDirectDictionaryGenerator generator = new 
TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
--- End diff --

ok



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread jackylk

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457738
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do 
file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+super(name, path);
+this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf 
filterResolver)
+  throws IOException {
+// do as following
+// 1. create the index or get from cache by the filter name in the 
configuration
+// 2. filter by index to get the filtered block
+// 3. create input split from filtered block
+
+List output = new LinkedList<>();
+Index index = loader.load(job.getConfiguration());
--- End diff --

Yes, the IndexLoader is actually the factory: its `load` function has the same 
functionality as `getInstance` in a factory. So shall I keep the IndexLoader 
name?



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread jackylk

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457078
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param 
+ */
+public class CarbonTableInputFormat extends FileInputFormat {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
--- End diff --

accept



[GitHub] incubator-carbondata pull request #266: Add one FAQ in the READEME

2016-10-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/266



[GitHub] incubator-carbondata pull request #261: [carbondata-339] Align storePath nam...

2016-10-27 Thread asfgit

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/261



[GitHub] incubator-carbondata pull request #266: Add one FAQ in the READEME

2016-10-27 Thread bill1208

GitHub user bill1208 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/266

Add one FAQ in the READEME

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/bill1208/incubator-carbondata master

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #266


commit c9f92e8bcd5c1fd8e14f97b2b2f62892f675b8d8
Author: bill1208 
Date:   2016-10-27T18:48:18Z

Add one FAQ in the READEME





[GitHub] incubator-carbondata pull request #260: Add one FQA in Readme

2016-10-27 Thread bill1208

Github user bill1208 closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/260



[GitHub] incubator-carbondata pull request #265: [WIP]Improve first time query perfor...

2016-10-27 Thread kumarvishal09

GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/265

[WIP]Improve first time query performance

Improve first time query performance


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
FirstTimeQueryPerformanceImprovement

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/265.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #265


commit 6af76c5d2d9022740624429a31d657b81db83126
Author: kumarvishal 
Date:   2016-10-27T17:24:49Z

Improve first time query performance





[GitHub] incubator-carbondata pull request #264: [CARBONDATA-341] CarbonTableIdentifi...

2016-10-27 Thread mohammadshahidkhan

GitHub user mohammadshahidkhan opened a pull request:

https://github.com/apache/incubator-carbondata/pull/264

[CARBONDATA-341] CarbonTableIdentifier being passed to the query flow…

# Why
The CarbonTableIdentifier passed to the query flow has the wrong tableId.
When a table is created, the CarbonData system assigns a unique ID to the 
table. Every CarbonTableIdentifier for that table should carry the same tableId.
Instead, the CarbonTableIdentifier uses the current timestamp as its tableId, 
which is not correct.
# Solution
Pass the AbsoluteTableIdentifier and CarbonTableIdentifier loaded from the 
CarbonData schema file.
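
The problem described above can be illustrated with a small sketch. It is not the CarbonData code: `TableIdSketch`, `Identifier`, `fromTimestamp` and `fromSchema` are hypothetical names, and the identifier is reduced to three string fields. The point is only that an identifier whose tableId is derived from the clock differs on every construction, while one built from the stored schema value stays equal.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for a table identifier. */
final class TableIdSketch {

  static final class Identifier {
    final String database;
    final String table;
    final String tableId;

    Identifier(String database, String table, String tableId) {
      this.database = database;
      this.table = table;
      this.tableId = tableId;
    }

    @Override public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Identifier)) {
        return false;
      }
      Identifier other = (Identifier) o;
      return database.equals(other.database)
          && table.equals(other.table)
          && tableId.equals(other.tableId);
    }

    @Override public int hashCode() {
      return Objects.hash(database, table, tableId);
    }
  }

  /** Problematic pattern: the id depends on when the identifier is built. */
  static Identifier fromTimestamp(String database, String table, long nowMillis) {
    return new Identifier(database, table, String.valueOf(nowMillis));
  }

  /** Pattern proposed in the description: reuse the id stored in the schema. */
  static Identifier fromSchema(String database, String table, String storedTableId) {
    return new Identifier(database, table, storedTableId);
  }
}
```

Anything keyed on the identifier, such as a cache or a map lookup, would miss when the timestamp-based variant is used, because two identifiers for the same table never compare equal.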

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/mohammadshahidkhan/incubator-carbondata 
TableID_Fix

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/264.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #264


commit faa88070eeabe37be5db874c4b7908fc2a721da2
Author: mohammadshahidkhan 
Date:   2016-10-27T17:13:07Z

[CARBONDATA-341] CarbonTableIdentifier being passed to the query flow has 
wrong tableid





[jira] [Created] (CARBONDATA-342) Select query with 'in' has issue with where clause for int, bigint and decimal data types.

2016-10-27 Thread Chetan Bhat (JIRA)

Chetan Bhat created CARBONDATA-342:
--

 Summary: Select query with 'in' has issue with where clause for 
int, bigint and decimal data types.
 Key: CARBONDATA-342
 URL: https://issues.apache.org/jira/browse/CARBONDATA-342
 Project: CarbonData
  Issue Type: Bug
  Components: data-query
Affects Versions: 0.1.0-incubating
 Environment: 3 node cluster.
Spark 1.6.2 built for Hadoop 2.6.0
Hadoop 2.7.2
Reporter: Chetan Bhat
Priority: Minor
 Fix For: 0.2.0-incubating


A select query with 'in' in the where clause returns wrong results for int, 
bigint and decimal data types.

Actual output is shown below: select queries with 'in' do not return any 
records in the result set.

0: jdbc:hive2://10.18.102.236:1> create table Test_Boundary (c1_int 
int,c2_Bigint Bigint,c3_Decimal Decimal(38,38),c4_double double,c5_string 
string,c6_Timestamp Timestamp,c7_Datatype_Desc string) STORED BY 
'org.apache.carbondata.format';
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.078 seconds)
0: jdbc:hive2://10.18.102.236:1> LOAD DATA INPATH 
'hdfs://10.18.102.236:54310/chetan/Test_Data1.csv' INTO table Test_Boundary 
OPTIONS('DELIMITER'=',','QUOTECHAR'='"','FILEHEADER'='');
+-+--+
| Result  |
+-+--+
+-+--+
No rows selected (0.501 seconds)
0: jdbc:hive2://10.18.102.236:1> select c1_int from test_boundary where 
c1_int in (2.147483647E9,2345.0,1234.0);
+-+--+
| c1_int  |
+-+--+
+-+--+
No rows selected (0.069 seconds)
0: jdbc:hive2://10.18.102.236:1> select c1_int from test_boundary where 
c1_int in (-2.147483647E9,2345.0,-1234.0);
+-+--+
| c1_int  |
+-+--+
+-+--+
No rows selected (0.071 seconds)
0: jdbc:hive2://10.18.102.236:1> select c1_int from test_boundary where 
c1_int in (0,-1234.0);
+-+--+
| c1_int  |
+-+--+
+-+--+
No rows selected (0.076 seconds)
0: jdbc:hive2://10.18.102.236:1> select c2_bigint from test_boundary where 
c2_bigint in (9223372036854775807,2345.0,1234.0);
++--+
| c2_bigint  |
++--+
++--+
No rows selected (0.059 seconds)
0: jdbc:hive2://10.18.102.236:1> select c2_bigint from test_boundary where 
c2_bigint in (-9223372036854775808,2345.0,-1234.0);
++--+
| c2_bigint  |
++--+
++--+
No rows selected (0.077 seconds)
0: jdbc:hive2://10.18.102.236:1> select c2_bigint from test_boundary where 
c2_bigint in (0,-1234.0);
++--+
| c2_bigint  |
++--+
++--+
No rows selected (0.062 seconds)

0: jdbc:hive2://10.18.102.236:1> select c3_decimal from test_boundary where 
c3_decimal in (0,-1234.0);
+-+--+
| c3_decimal  |
+-+--+
+-+--+
No rows selected (0.072 seconds)



Expected output is shown below:


0: jdbc:hive2://ha-cluster/default> select c1_int from test_boundary where 
c1_int in (2.147483647E9,2345.0,1234.0);
+-+--+
|   c1_int|
+-+--+
| 2147483647  |
| 2147483647  |
| 2345|
| 1234|
+-+--+
4 rows selected (0.388 seconds)
0: jdbc:hive2://ha-cluster/default> select c1_int from test_boundary where 
c1_int in (-2.147483647E9,2345.0,-1234.0);
+--+--+
|c1_int|
+--+--+
| -2147483647  |
| 2345 |
+--+--+
2 rows selected (0.258 seconds)
0: jdbc:hive2://ha-cluster/default> select c1_int from test_boundary where 
c1_int in (0,-1234.0);
+-+--+
| c1_int  |
+-+--+
| 0   |
+-+--+
1 row selected (0.255 seconds)
0: jdbc:hive2://ha-cluster/default> select c2_bigint from test_boundary where 
c2_bigint in (9223372036854775807,2345.0,1234.0);
+--+--+
|  c2_bigint   |
+--+--+
| 9223372036854775807  |
| 9223372036854775807  |
| 9223372036854775807  |
| 9223372036854775807  |
| 2345 |
| 1234 |
+--+--+
6 rows selected (0.331 seconds)
0: jdbc:hive2://ha-cluster/default> select c2_bigint from test_boundary where 
c2_bigint in (-9223372036854775808,2345.0,-1234.0);
+---+--+
|   c2_bigint   |
+---+--+
| -9223372036854775808  |
| 2345  |
+---+--+
2 rows selected (0.299 seconds)
0: jdbc:hive2://ha-cluster/default> select c2_bigint from test_boundary where 
c2_bigint in (0,-1234.0);
++--+
| c2_bigint  |
++--+
| 0  |
++--+
1 row selected (0.263 seconds)

0: jdbc:hive2://ha-cluster/default> select c3_decimal from test_boundary where 
c3_decimal in (0,-1234.0);
+-+--+
| c3_decimal  |
+-+--+
| 0E-38   |
| 0E-38   |
+-+--+
2 rows selected (0.273 seconds)
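
The expected behaviour above amounts to comparing the column value and each literal by numeric value, so that the int 2147483647 matches the literal 2.147483647E9. The sketch below illustrates that semantics only. It says nothing about where the actual defect lies, and `InFilterSketch`, `filterIn` and the sample values are hypothetical.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of numeric 'in' matching; not CarbonData code. */
final class InFilterSketch {

  /** Keeps the column values that are numerically equal to any literal. */
  static List<Long> filterIn(List<Long> columnValues, List<String> literals) {
    List<BigDecimal> parsed = new ArrayList<>();
    for (String literal : literals) {
      // BigDecimal accepts both plain and exponent notation, e.g. "2.147483647E9".
      parsed.add(new BigDecimal(literal));
    }
    List<Long> matches = new ArrayList<>();
    for (Long value : columnValues) {
      BigDecimal candidate = BigDecimal.valueOf(value);
      for (BigDecimal literal : parsed) {
        // compareTo ignores scale, so 2345 and 2345.0 compare as equal.
        if (candidate.compareTo(literal) == 0) {
          matches.add(value);
          break;
        }
      }
    }
    return matches;
  }
}
```

Using `compareTo` rather than `equals` matters here, because `BigDecimal.equals` also compares scale and would treat 2345 and 2345.0 as different.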



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

[jira] [Created] (CARBONDATA-341) CarbonTableIdentifier being passed to the query flow has wrong tableid

2016-10-27 Thread Mohammad Shahid Khan (JIRA)

Mohammad Shahid Khan created CARBONDATA-341:
---

 Summary: CarbonTableIdentifier being passed to the query flow has 
wrong tableid
 Key: CARBONDATA-341
 URL: https://issues.apache.org/jira/browse/CARBONDATA-341
 Project: CarbonData
  Issue Type: Bug
  Components: data-query, hadoop-integration, spark-integration
Affects Versions: 0.1.0-incubating
Reporter: Mohammad Shahid Khan
Assignee: Mohammad Shahid Khan


CarbonTableIdentifier being passed to the query flow has wrong tableid.
While creating the table the CarbonData system assign a uniqueID to the table. 
In all the places CarbonTableIdentifier  should have the same tableId.
But CarbonTableIdentifier  is having the currentTimeStamp as tableId which is 
not correct. 





[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85346673
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do 
file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+super(name, path);
+this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf 
filterResolver)
+  throws IOException {
+// do as following
+// 1. create the index or get from cache by the filter name in the 
configuration
+// 2. filter by index to get the filtered block
+// 3. create input split from filtered block
+
+List output = new LinkedList<>();
+Index index = loader.load(job.getConfiguration());
--- End diff --

Is it required to load the index every time?
I guess we are just creating an instance of the index here, so why not 
use a factory?
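
As a rough sketch of the factory idea raised here, an index could be built once per segment and reused on later calls. None of the names below come from the pull request: `IndexFactorySketch`, `CachingIndexFactory`, `getInstance` and the one-field `Index` class are hypothetical, and keying the cache by segment name is an assumption.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical sketch of a caching index factory; not the pull request's API. */
final class IndexFactorySketch {

  /** Minimal stand-in for an index built for one segment. */
  static final class Index {
    final String segmentName;

    Index(String segmentName) {
      this.segmentName = segmentName;
    }
  }

  static final class CachingIndexFactory {
    private final Map<String, Index> cache = new ConcurrentHashMap<>();
    private final Function<String, Index> builder;

    CachingIndexFactory(Function<String, Index> builder) {
      this.builder = builder;
    }

    /** Builds the index on first use and returns the cached instance afterwards. */
    Index getInstance(String segmentName) {
      return cache.computeIfAbsent(segmentName, builder);
    }
  }
}
```

Whether a cache like this is appropriate depends on how indexes are invalidated when segments change, which this thread does not cover.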



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85343636
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.index.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
--- End diff --

I guess we are supposed to pass the list of valid segments here.
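
As a rough sketch of what that could look like, the constructor might accept a list and keep a defensive copy. The names ValidSegmentSketch and MultiSegmentIndexSketch are hypothetical and only stand in for the PR's Segment and InMemoryBTreeIndex; this is an assumption about the shape of the change, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the PR's Segment type.
class ValidSegmentSketch {
  final String id;

  ValidSegmentSketch(String id) {
    this.id = id;
  }
}

// Hypothetical index that is built over several valid segments
// instead of a single one.
class MultiSegmentIndexSketch {
  private final List<ValidSegmentSketch> segments;

  MultiSegmentIndexSketch(List<ValidSegmentSketch> validSegments) {
    // Copy the list so later changes by the caller do not affect the index.
    this.segments = Collections.unmodifiableList(new ArrayList<>(validSegments));
  }

  // Returns the ids of the segments this index covers, in insertion order.
  List<String> segmentIds() {
    List<String> ids = new ArrayList<>();
    for (ValidSegmentSketch segment : segments) {
      ids.add(segment.id);
    }
    return ids;
  }
}
```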



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85340545
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.index.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+    this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+    return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
--- End diff --

It seems the method's return type is incompatible.



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85339106
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/CarbonFormat.java ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal;
+
+public enum CarbonFormat {
+  COLUMNR
--- End diff --

Typo: this should be COLUMNAR.



[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...

2016-10-27 Thread ravipesala

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85337928
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param 
+ */
+public class CarbonTableInputFormat extends FileInputFormat {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
+    this.segmentManager = segmentManager;
+  }
+
+  @Override
+  public RecordReader createRecordReader(InputSplit split,
+  TaskAttemptContext context) throws IOException, InterruptedException {
+    switch (((CarbonInputSplit)split).formatType()) {
--- End diff --

Why don't you take the formatType from the job conf? It is better not to touch
the InputSplit, as it comes from outside.
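
A minimal sketch of the idea, assuming the format type is stored under a configuration key when the job is set up and read back when the record reader is created. A plain Map stands in for the Hadoop job configuration so that the example is self-contained; the key name, the SketchFormat enum (including its ROW value) and the SketchFormatConf helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the PR's CarbonFormat enum.
enum SketchFormat {
  COLUMNAR,
  ROW
}

// Helper that stores and reads the format type in a job-level configuration.
// A Map<String, String> is used here in place of Hadoop's Configuration.
class SketchFormatConf {
  // Hypothetical key name, not a real CarbonData property.
  static final String FORMAT_TYPE_KEY = "sketch.carbon.input.format.type";

  static void setFormat(Map<String, String> conf, SketchFormat format) {
    conf.put(FORMAT_TYPE_KEY, format.name());
  }

  // Falls back to COLUMNAR when the key was never set.
  static SketchFormat getFormat(Map<String, String> conf) {
    String value = conf.get(FORMAT_TYPE_KEY);
    return value == null ? SketchFormat.COLUMNAR : SketchFormat.valueOf(value);
  }
}
```

The record reader could then switch on SketchFormatConf.getFormat(conf) and would not need to cast the incoming InputSplit.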



[GitHub] incubator-carbondata pull request #263: [CARBONDATA-2][WIP] Data load integr...

2016-10-27 Thread ravipesala

GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/263

[CARBONDATA-2][WIP] Data load integration of all steps for removing kettle

This PR integrates all data load steps into the main flow.
DataWriterStep still needs to be integrated, and testing is pending.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata data-load-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #263


commit c4bd3a14e3e1f437d365c9f9e4dc21b2d69f56ec
Author: ravipesala 
Date:   2016-10-27T03:44:32Z

WIP Integrating new dataloading flow

commit 6aa1e738c02e2906b43b372bcad0ed8096962ddf
Author: ravipesala 
Date:   2016-10-27T12:41:11Z

Integrated data processor steps to new flow.





[jira] [Created] (CARBONDATA-340) Implement test cases for load package in core module

2016-10-27 Thread Anurag Srivastava (JIRA)

Anurag Srivastava created CARBONDATA-340:


 Summary: Implement test cases for load package in core module
 Key: CARBONDATA-340
 URL: https://issues.apache.org/jira/browse/CARBONDATA-340
 Project: CarbonData
  Issue Type: Test
Reporter: Anurag Srivastava
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
