[jira] [Commented] (CARBONDATA-385) Select query is giving cast exception

2016-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644428#comment-15644428
 ] 

ASF GitHub Bot commented on CARBONDATA-385:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/301


> Select query is giving cast exception
> -
>
> Key: CARBONDATA-385
> URL: https://issues.apache.org/jira/browse/CARBONDATA-385
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ashok Kumar
>Priority: Minor
>
> In the scenario below, the select query fails with the following error:
> Error: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
> 1. employee table
> create table employee(name string, empid string, mgrid string, mobileno 
> bigint) stored by 'carbondata'
> 2. Load the data below into the employee table
>  tom,t23717,h2399,99780207526
> 3. manager table
> create table manager(name string, empid string, mgrid string, mobileno 
> bigint) stored by 'carbondata'
> 4. Load the data below into the manager table
>  harry,h2399,v788232,99823230205
> 5. Run the query below
> select e.empid from employee e inner join manager m on e.mgrid=m.empid#select 
> empid,mgrid from employee
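
A minimal Scala sketch of the reproduction above, assuming a Spark SQLContext/CarbonContext named `cc` with the CarbonData source available; the CSV paths are placeholders:

```scala
// Hypothetical reproduction of the reported scenario via Spark SQL; `cc` and the
// CSV paths are assumptions, the table DDL and the join query come from the report.
cc.sql("create table employee(name string, empid string, mgrid string, mobileno bigint) stored by 'carbondata'")
cc.sql("load data inpath '/tmp/employee.csv' into table employee")

cc.sql("create table manager(name string, empid string, mgrid string, mobileno bigint) stored by 'carbondata'")
cc.sql("load data inpath '/tmp/manager.csv' into table manager")

// Reportedly fails with:
//   java.lang.ClassCastException: java.lang.Integer cannot be cast to java.lang.String
cc.sql("select e.empid from employee e inner join manager m on e.mgrid = m.empid").show()
```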



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute

2016-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644342#comment-15644342
 ] 

ASF GitHub Bot commented on CARBONDATA-308:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86786637
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/carbon/datastore/block/Distributable.java
 ---
@@ -16,10 +16,12 @@
  */
 package org.apache.carbondata.core.carbon.datastore.block;
 
+import java.io.IOException;
+
 /**
- * Abstract class which is maintains the locations of node.
+ * interface to get the locations of node. Used for making task 
distribution based on locality
  */
-public abstract class Distributable implements Comparable {
+public interface Distributable extends Comparable {
 
-  public abstract String[] getLocations();
+  String[] getLocations() throws IOException;
--- End diff --

Because CarbonInputSplit needs to implement Distributable, and InputSplit 
has a getLocations function that throws IOException.
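
A small Scala sketch of the constraint described here, using only the standard Hadoop mapreduce API; the trait and class names are illustrative, not the actual CarbonInputSplit:

```scala
import java.io.IOException

import org.apache.hadoop.mapreduce.InputSplit

// Illustrative only: a Distributable-like trait must allow getLocations to throw
// IOException so that a class extending Hadoop's InputSplit (whose getLocations is
// declared with "throws IOException") can satisfy both contracts with one method.
trait LocalityAware {
  @throws[IOException]
  def getLocations(): Array[String]
}

class ExampleSplit(locations: Array[String], length: Long)
  extends InputSplit with LocalityAware {

  @throws[IOException]
  override def getLocations(): Array[String] = locations

  override def getLength(): Long = length
}
```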


> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD and modify it as follows:
> 1. On the driver side, only getSplit is required, so only the filter condition is 
> needed; there is no need to create the full QueryModel object, so we can move the 
> creation of QueryModel from the driver side to the executor side.
> 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of using QueryExecutor directly.
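
A rough Scala sketch of point 2, assuming CarbonInputFormat follows the standard Hadoop mapreduce InputFormat contract; the partition type and helper names below are placeholders, not the project's actual signatures:

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.mapreduce.{InputFormat, InputSplit, RecordReader, TaskAttemptID}
import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
import org.apache.spark.{Partition, TaskContext}

// Placeholder partition type carrying a Hadoop InputSplit.
case class HadoopSplitPartition(index: Int, split: InputSplit) extends Partition

// Sketch of an RDD compute() body that drives a Hadoop RecordReader instead of
// calling a query executor directly; `format` stands in for CarbonInputFormat.
def computeWithRecordReader[K, V](
    part: HadoopSplitPartition,
    context: TaskContext,
    format: InputFormat[K, V],
    conf: Configuration): Iterator[(K, V)] = {
  val attemptContext = new TaskAttemptContextImpl(conf, new TaskAttemptID())
  val reader: RecordReader[K, V] = format.createRecordReader(part.split, attemptContext)
  reader.initialize(part.split, attemptContext)

  new Iterator[(K, V)] {
    private var havePair = false
    private var finished = false

    override def hasNext: Boolean = {
      if (!finished && !havePair) {
        finished = !reader.nextKeyValue()
        havePair = !finished
        if (finished) reader.close()
      }
      !finished
    }

    override def next(): (K, V) = {
      if (!hasNext) throw new NoSuchElementException("End of stream")
      havePair = false
      (reader.getCurrentKey, reader.getCurrentValue)
    }
  }
}
```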



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-359) is null & null functions are not working when data fetching from sub query

2016-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15644202#comment-15644202
 ] 

ASF GitHub Bot commented on CARBONDATA-359:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/300


> is null & null functions are not working when data fetching from sub query
> --
>
> Key: CARBONDATA-359
> URL: https://issues.apache.org/jira/browse/CARBONDATA-359
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 0.2.0-incubating
>Reporter: Krishna Reddy
> Fix For: 0.2.0-incubating
>
>
> Is null & Not null functions are not working when fetching data from a 
> sub-query.
> select * from (select if( Latest_areaId=3,1,Latest_areaId) test from 
> Carbon_automation) qq where test is  null; 
> select * from (select if( Latest_areaId=3,1,Latest_areaId) test from 
> Carbon_automation) qq where test is  not null; 
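
A small Scala sketch of how the two queries could be checked together, assuming a SQLContext `cc` over an already-loaded Carbon_automation table:

```scala
// Once IS NULL / IS NOT NULL work on the sub-query, every row of the sub-query
// falls into exactly one of the two result sets.
val subQuery = "select if(Latest_areaId=3,1,Latest_areaId) test from Carbon_automation"

val nullCount    = cc.sql(s"select * from ($subQuery) qq where test is null").count()
val notNullCount = cc.sql(s"select * from ($subQuery) qq where test is not null").count()
val totalCount   = cc.sql("select * from Carbon_automation").count()

assert(nullCount + notNullCount == totalCount)
```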



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-386) Write unit test for Util Module

2016-11-07 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643729#comment-15643729
 ] 

ASF GitHub Bot commented on CARBONDATA-386:
---

GitHub user abhisheknoldus opened a pull request:

https://github.com/apache/incubator-carbondata/pull/303

[CARBONDATA-386] Unit test case for 
CarbonMetadataUtil,CarbonMergerUtil,DataFileFooterConverter,DataTypeUtil

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [x] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/abhisheknoldus/incubator-carbondata 
util_unit_test

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/303.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #303


commit 5b52a5cd67ee79a052bb0030f7be02fae06976a8
Author: abhishek 
Date:   2016-10-28T17:42:39Z

CarbonMergerUtil test

commit 0d5b61dc150ab98322faa7943cd9874d056bc926
Author: abhishek 
Date:   2016-10-28T17:45:39Z

CarbonMetadataUtil test

commit e789faaece7e1c33237cb4c398a8ac84b24c5c2a
Author: abhishek 
Date:   2016-10-28T17:50:11Z

DataTypeUtil test

commit 3a8e9f03076b317265065a02351d553d921ab039
Author: abhishek 
Date:   2016-10-28T17:42:39Z

CarbonMergerUtil test

commit 5da31de3a61a4651d484a9ec81996560e10bc957
Author: abhishek 
Date:   2016-10-28T17:45:39Z

CarbonMetadataUtil test

commit e2097db6cfee17d725853888c11be5fe2995096e
Author: abhishek 
Date:   2016-10-28T17:50:11Z

DataTypeUtil test

commit 3eb56d2d4ccdc643c127e3d5b788c1a88ea82b92
Author: abhishek 
Date:   2016-10-30T11:32:07Z

datatypeutil test

commit 1c425fcb11a5aded3ffae838fee00acdd77e592a
Author: abhishek 
Date:   2016-11-01T12:51:56Z

DataTypeUtil test

commit b67ef41c2068b605b808af4ba64331a48c21e722
Author: abhishek 
Date:   2016-11-02T06:28:01Z

Apache License added

commit 1399d4a4ec00ffefeab2a1ab3c610c3d8e7b3208
Author: abhishek 
Date:   2016-11-07T07:37:26Z

carbon merger util test

commit 9d77d137b74923bfbc360c9059fa3c8d213be34d
Author: abhishek 
Date:   2016-11-07T07:38:10Z

data type util test

commit b16b68ed872fad1d31afd6a3f3b97f4232ca99dc
Author: abhishek 
Date:   2016-11-07T07:38:43Z

data file footer converter test

commit b9a7ffb3298e7547a124f72a48a263e965890937
Author: abhishek 
Date:   2016-11-07T07:42:47Z

removed println

commit 58bae0ed20bd440373982f260b7e0ea795efe30d
Author: abhishek 
Date:   2016-11-07T08:43:50Z

resolved conflict




> Write unit test for Util Module
> ---
>
> Key: CARBONDATA-386
> URL: https://issues.apache.org/jira/browse/CARBONDATA-386
> Project: CarbonData
>  Issue Type: Test
>Reporter: Prabhat Kashyap
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-385) Select query is giving cast exception

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643406#comment-15643406
 ] 

ASF GitHub Bot commented on CARBONDATA-385:
---

GitHub user ashokblend opened a pull request:

https://github.com/apache/incubator-carbondata/pull/301

[WIP][CARBONDATA-385]During join operation, same column is available in two 
table. User do…

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---

… select on the first table's column and adds a filter on the other table's column.

Because of this, the top-level decoder finds that the column is already decoded in 
the bottom layer (because it compares by name) and hence does not decode it.
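
A toy Scala illustration of the collision being described; the names are hypothetical and only show why keying decoded columns by bare column name breaks down across tables:

```scala
// Both tables have a column named "empid".
case class ColRef(table: String, column: String)

// Keyed by bare column name: manager.empid collides with employee.empid, so the
// top-level decoder wrongly concludes the column is already decoded and skips it.
val decodedByName  = Set("empid")
val pending        = ColRef("manager", "empid")
val wronglySkipped = decodedByName.contains(pending.column)   // true

// Keyed by (table, column): no collision, so manager.empid still gets decoded.
val decodedByRef = Set(ColRef("employee", "empid"))
val stillDecoded = !decodedByRef.contains(pending)            // true
```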

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ashokblend/incubator-carbondata carbondata-385

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/301.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #301


commit 730a2cccd2ea88260030c2efe8cdb7bc060c20e8
Author: ashok.blend 
Date:   2016-11-07T08:07:28Z

During a join operation, the same column is available in two tables. The user does a 
select on the first table's column and adds a filter on the other table's column.
Because of this, the top-level decoder finds that the column is already decoded in 
the bottom layer (because it compares by name) and hence does not decode it.




> Select query is giving cast exception
> -
>
> Key: CARBONDATA-385
> URL: https://issues.apache.org/jira/browse/CARBONDATA-385
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ashok Kumar
>Priority: Minor
>
> In the scenario below, the select query fails with the following error:
> Error: java.lang.ClassCastException: java.lang.Integer cannot be cast to 
> java.lang.String
> 1. employee table
> create table employee(name string, empid string, mgrid string, mobileno 
> bigint) stored by 'carbondata'
> 2. Load the data below into the employee table
>  tom,t23717,h2399,99780207526
> 3. manager table
> create table manager(name string, empid string, mgrid string, mobileno 
> bigint) stored by 'carbondata'
> 4. Load the data below into the manager table
>  harry,h2399,v788232,99823230205
> 5. Run the query below
> select e.empid from employee e inner join manager m on e.mgrid=m.empid#select 
> empid,mgrid from employee



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-359) is null & null functions are not working when data fetching from sub query

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643048#comment-15643048
 ] 

ASF GitHub Bot commented on CARBONDATA-359:
---

GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/300

[CARBONDATA-359]is null & not null functions are not working when data 
fetching from sub query

https://issues.apache.org/jira/browse/CARBONDATA-359


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
null_notnull_bug

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/300.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #300


commit c659e01ab43b259fa7625a81a67f3e080b386516
Author: ravipesala 
Date:   2016-11-07T04:11:10Z

Fixed null and not null in sub queries




> is null & null functions are not working when data fetching from sub query
> --
>
> Key: CARBONDATA-359
> URL: https://issues.apache.org/jira/browse/CARBONDATA-359
> Project: CarbonData
>  Issue Type: Bug
>  Components: data-query
>Affects Versions: 0.2.0-incubating
>Reporter: Krishna Reddy
> Fix For: 0.2.0-incubating
>
>
> Is null & Not null functions are not working when fetching data from a 
> sub-query.
> select * from (select if( Latest_areaId=3,1,Latest_areaId) test from 
> Carbon_automation) qq where test is  null; 
> select * from (select if( Latest_areaId=3,1,Latest_areaId) test from 
> Carbon_automation) qq where test is  not null; 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-384) Add Table Properties Options Validation

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-384?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15643034#comment-15643034
 ] 

ASF GitHub Bot commented on CARBONDATA-384:
---

GitHub user lion-x opened a pull request:

https://github.com/apache/incubator-carbondata/pull/299

[CARBONDATA-384]Add Table Properties Options Validation

# Why raise this PR?
Currently, Carbon does not validate the table properties options. This causes 
the problem below.
For example,
Create table carbontable (...)
TBLPROPERTIES ('DICTIONARY_EXELUDE'='colname');
The user wants to use the DICTIONARY_EXCLUDE property but typed the option 
name wrong, and the setting is silently ignored with no error thrown.

# How to test?
Pass all test cases.
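
A minimal Scala sketch of the kind of validation being added; the supported-option list and the error type are illustrative, not the PR's actual implementation:

```scala
// Unknown TBLPROPERTIES keys such as the misspelled 'DICTIONARY_EXELUDE' should
// fail fast instead of being silently ignored.
val supportedTableProperties = Set("DICTIONARY_INCLUDE", "DICTIONARY_EXCLUDE")

def validateTableProperties(tblProperties: Map[String, String]): Unit = {
  val unknown = tblProperties.keys.filterNot(k => supportedTableProperties.contains(k.toUpperCase))
  if (unknown.nonEmpty) {
    throw new IllegalArgumentException(
      s"Invalid table properties: ${unknown.mkString(", ")}. " +
      s"Supported properties: ${supportedTableProperties.mkString(", ")}")
  }
}

// The misspelled key from the example above now raises an error:
// validateTableProperties(Map("DICTIONARY_EXELUDE" -> "colname"))
```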



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lion-x/incubator-carbondata validateTblOptions

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/299.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #299


commit f19122e5441835a9a4298fb327eb80c373810e6e
Author: lion-x 
Date:   2016-11-07T04:00:19Z

validateTblOptions




> Add Table Properties Options Validation
> ---
>
> Key: CARBONDATA-384
> URL: https://issues.apache.org/jira/browse/CARBONDATA-384
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Currently, Carbon does not validate the table properties options. This causes 
> the problem below.
> For example,
> Create table carbontable (...)
> TBLPROPERTIES ('DICTIONARY_EXELUDE'='colname');
> The user wants to use the DICTIONARY_EXCLUDE property but typed the option 
> name wrong, and the setting is silently ignored with no error thrown.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-364) Drop table is behaving inconsistently

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642875#comment-15642875
 ] 

ASF GitHub Bot commented on CARBONDATA-364:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/282


> Drop table is behaving inconsistently
> -
>
> Key: CARBONDATA-364
> URL: https://issues.apache.org/jira/browse/CARBONDATA-364
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Ashok Kumar
>Priority: Minor
>
> Scenario
> Run the load command on a table and then run the drop table command for the same table.
> Drop table will give a message that the table is locked for update, but it actually 
> deletes all table-related files from the store.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-367) Add support alluxio(tachyon) file system(enhance ecosystem integration)

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642864#comment-15642864
 ] 

ASF GitHub Bot commented on CARBONDATA-367:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/287


> Add support alluxio(tachyon) file system(enhance ecosystem integration)
> ---
>
> Key: CARBONDATA-367
> URL: https://issues.apache.org/jira/browse/CARBONDATA-367
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Assignee: Liang Chen
>Priority: Minor
> Fix For: 0.3.0-incubating
>
>
> To support alluxio users in using a higher-performance file 
> format (CarbonData), and to enhance Apache CarbonData ecosystem integration.
> An alluxio file, for example "alluxio://localhost:19998/data.csv", can be loaded 
> into CarbonData.
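
A small Scala sketch of the intended usage, assuming a SQLContext `cc` with CarbonData support and an Alluxio client configured for Hadoop; the table name and path are placeholders:

```scala
// Hypothetical usage: load a CSV that lives in Alluxio into a CarbonData table.
// Assumes the alluxio client jar is on the classpath and the alluxio:// scheme is
// registered in the Hadoop configuration.
cc.sql("create table alluxio_demo(name string, id string, amount bigint) stored by 'carbondata'")
cc.sql("load data inpath 'alluxio://localhost:19998/data.csv' into table alluxio_demo")
cc.sql("select count(*) from alluxio_demo").show()
```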



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-374) Short data type is not working.

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15642751#comment-15642751
 ] 

ASF GitHub Bot commented on CARBONDATA-374:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/293#discussion_r86702631
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithHiveSyntax.scala
 ---
@@ -76,6 +77,35 @@ class TestLoadDataWithHiveSyntax extends QueryTest with 
BeforeAndAfterAll {
 
   }
 
+  test("create table with smallint type and query smallint table")({
--- End diff --

Why do you create a hive table instead of a carbon table?


> Short data type is not working.
> ---
>
> Key: CARBONDATA-374
> URL: https://issues.apache.org/jira/browse/CARBONDATA-374
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SWATI RAO
>Assignee: cen yuhai
>
> The Short datatype is not working, although you have mentioned it is a supported 
> datatype in the link below:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/List-the-supported-datatypes-in-carbondata-td2419.html
> e.g:
> create table testTable(id Short, name String) stored by 'carbondata' ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> 'Short' ',' 'name' in column type; line 1 pos 26 (state=,code=0)
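
A short Scala sketch of the scenario, assuming a SQLContext `cc`; the failing statement is the one from the report, and the `smallint` spelling is the one exercised by the test in the linked pull request:

```scala
// Fails with the reported AnalysisException: 'Short' is not a recognized column type.
// cc.sql("create table testTable(id Short, name String) stored by 'carbondata'")

// With the fix under review, the equivalent table is created with the SQL spelling:
cc.sql("create table testTable(id smallint, name string) stored by 'carbondata'")
cc.sql("describe testTable").show()
```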



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641625#comment-15641625
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r86684118
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
 ---
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.rdd
+
+import java.lang.Long
+import java.text.SimpleDateFormat
+import java.util
+import java.util.{Date, UUID}
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.io.NullWritable
+import org.apache.hadoop.mapreduce.RecordReader
+import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
+import org.apache.spark.mapred.{CarbonHadoopMapReduceUtil, 
CarbonSerializableConfiguration}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.execution.command.Partitioner
+import org.apache.spark.util.SerializableConfiguration
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.common.logging.impl.StandardLogService
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.load.{BlockDetails, LoadMetadataDetails}
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory
+import org.apache.carbondata.hadoop.csv.CSVInputFormat
+import org.apache.carbondata.hadoop.io.StringArrayWritable
+import org.apache.carbondata.processing.graphgenerator.GraphGenerator
+import org.apache.carbondata.spark.DataLoadResult
+import org.apache.carbondata.spark.load._
+import org.apache.carbondata.spark.splits.TableSplit
+import org.apache.carbondata.spark.util.CarbonQueryUtil
+
+/**
+ * It loads the data to carbon using @AbstractDataLoadProcessorStep
+ */
+class NewCarbonDataLoadRDD[K, V](
+sc: SparkContext,
+result: DataLoadResult[K, V],
+carbonLoadModel: CarbonLoadModel,
+var storeLocation: String,
+hdfsStoreLocation: String,
+kettleHomePath: String,
+partitioner: Partitioner,
+columinar: Boolean,
+loadCount: Integer,
+tableCreationTime: Long,
+schemaLastUpdatedTime: Long,
+blocksGroupBy: Array[(String, Array[BlockDetails])],
+isTableSplitPartition: Boolean)
+  extends RDD[(K, V)](sc, Nil) with CarbonHadoopMapReduceUtil with Logging 
{
+
+  sc.setLocalProperty("spark.scheduler.pool", "DDL")
+
+  private val jobTrackerId: String = {
+val formatter = new SimpleDateFormat("MMddHHmm")
+formatter.format(new Date())
+  }
+
+  // A Hadoop Configuration can be about 10 KB, which is pretty big, so 
broadcast it
+  private val confBroadcast =
+sc.broadcast(new 
CarbonSerializableConfiguration(sc.hadoopConfiguration))
+
+  override def getPartitions: Array[Partition] = {
+if (isTableSplitPartition) {
+  // for table split partition
+  var splits = Array[TableSplit]()
+
+  if (carbonLoadModel.isDirectLoad) {
+splits = 
CarbonQueryUtil.getTableSplitsForDirectLoad(carbonLoadModel.getFactFilePath,
+  partitioner.nodeList, partitioner.partitionCount)
+  }
+  else {
+splits = 
CarbonQueryUtil.getTableSplits(carbonLoadModel.getDatabaseName,
+  carbonLoadModel.getTableName, null, partitioner)
+  }
+
+  splits.zipWithIndex.map { s =>
+// filter the same partition unique id, because only one will 
match, so get 0 element
+val blocksDetails: Array[BlockDetails] = blocksGroupBy.filter(p =>
+  p._1 == s._1.getPartition.getUniqueID)(0)._2
+new CarbonTableSplitPartition(id, s._2, 

[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-11-06 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15641585#comment-15641585
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r86683747
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
 ---
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.rdd
+
+import java.lang.Long
+import java.text.SimpleDateFormat
+import java.util
+import java.util.{Date, UUID}
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.io.NullWritable
+import org.apache.hadoop.mapreduce.RecordReader
+import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
+import org.apache.spark.mapred.{CarbonHadoopMapReduceUtil, 
CarbonSerializableConfiguration}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.execution.command.Partitioner
+import org.apache.spark.util.SerializableConfiguration
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.common.logging.impl.StandardLogService
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.load.{BlockDetails, LoadMetadataDetails}
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory
+import org.apache.carbondata.hadoop.csv.CSVInputFormat
+import org.apache.carbondata.hadoop.io.StringArrayWritable
+import org.apache.carbondata.processing.graphgenerator.GraphGenerator
+import org.apache.carbondata.spark.DataLoadResult
+import org.apache.carbondata.spark.load._
+import org.apache.carbondata.spark.splits.TableSplit
+import org.apache.carbondata.spark.util.CarbonQueryUtil
+
+/**
+ * It loads the data to carbon using @AbstractDataLoadProcessorStep
+ */
+class NewCarbonDataLoadRDD[K, V](
+sc: SparkContext,
+result: DataLoadResult[K, V],
+carbonLoadModel: CarbonLoadModel,
+var storeLocation: String,
+hdfsStoreLocation: String,
+kettleHomePath: String,
+partitioner: Partitioner,
+columinar: Boolean,
+loadCount: Integer,
+tableCreationTime: Long,
+schemaLastUpdatedTime: Long,
+blocksGroupBy: Array[(String, Array[BlockDetails])],
+isTableSplitPartition: Boolean)
+  extends RDD[(K, V)](sc, Nil) with CarbonHadoopMapReduceUtil with Logging 
{
+
+  sc.setLocalProperty("spark.scheduler.pool", "DDL")
+
+  private val jobTrackerId: String = {
+val formatter = new SimpleDateFormat("MMddHHmm")
+formatter.format(new Date())
+  }
+
+  // A Hadoop Configuration can be about 10 KB, which is pretty big, so 
broadcast it
+  private val confBroadcast =
+sc.broadcast(new 
CarbonSerializableConfiguration(sc.hadoopConfiguration))
+
+  override def getPartitions: Array[Partition] = {
+if (isTableSplitPartition) {
+  // for table split partition
+  var splits = Array[TableSplit]()
--- End diff --

ok


> Remove kettle for loading data
> --
>
> Key: CARBONDATA-2
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Reporter: Liang Chen
>Priority: Critical
>  Labels: features
> Fix For: 0.3.0-incubating
>
> Attachments: CarbonDataLoadingdesign.pdf
>
>
> Remove kettle for loading data module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15639794#comment-15639794
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r8739
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1198,10 +1172,16 @@ case class LoadTableUsingKettle(
 GlobalDictionaryUtil
   .generateGlobalDictionary(sqlContext, carbonLoadModel, 
relation.tableMeta.storePath,
 dataFrame)
-CarbonDataRDDFactory
-  .loadCarbonData(sqlContext, carbonLoadModel, storeLocation, 
relation.tableMeta.storePath,
+CarbonDataRDDFactory.loadCarbonData(sqlContext,
+carbonLoadModel,
+storeLocation,
+relation.tableMeta.storePath,
--- End diff --

why there are two storeLocation?


> Remove kettle for loading data
> --
>
> Key: CARBONDATA-2
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Reporter: Liang Chen
>Priority: Critical
>  Labels: features
> Fix For: 0.3.0-incubating
>
> Attachments: CarbonDataLoadingdesign.pdf
>
>
> Remove kettle for loading data module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15639710#comment-15639710
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r86665773
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
 ---
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.rdd
+
+import java.lang.Long
+import java.text.SimpleDateFormat
+import java.util
+import java.util.{Date, UUID}
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.io.NullWritable
+import org.apache.hadoop.mapreduce.RecordReader
+import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
+import org.apache.spark.mapred.{CarbonHadoopMapReduceUtil, 
CarbonSerializableConfiguration}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.execution.command.Partitioner
+import org.apache.spark.util.SerializableConfiguration
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.common.logging.impl.StandardLogService
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.load.{BlockDetails, LoadMetadataDetails}
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory
+import org.apache.carbondata.hadoop.csv.CSVInputFormat
+import org.apache.carbondata.hadoop.io.StringArrayWritable
+import org.apache.carbondata.processing.graphgenerator.GraphGenerator
+import org.apache.carbondata.spark.DataLoadResult
+import org.apache.carbondata.spark.load._
+import org.apache.carbondata.spark.splits.TableSplit
+import org.apache.carbondata.spark.util.CarbonQueryUtil
+
+/**
+ * It loads the data to carbon using @AbstractDataLoadProcessorStep
+ */
+class NewCarbonDataLoadRDD[K, V](
+sc: SparkContext,
+result: DataLoadResult[K, V],
+carbonLoadModel: CarbonLoadModel,
+var storeLocation: String,
+hdfsStoreLocation: String,
+kettleHomePath: String,
+partitioner: Partitioner,
+columinar: Boolean,
+loadCount: Integer,
+tableCreationTime: Long,
+schemaLastUpdatedTime: Long,
+blocksGroupBy: Array[(String, Array[BlockDetails])],
+isTableSplitPartition: Boolean)
+  extends RDD[(K, V)](sc, Nil) with CarbonHadoopMapReduceUtil with Logging 
{
+
+  sc.setLocalProperty("spark.scheduler.pool", "DDL")
+
+  private val jobTrackerId: String = {
+val formatter = new SimpleDateFormat("MMddHHmm")
+formatter.format(new Date())
+  }
+
+  // A Hadoop Configuration can be about 10 KB, which is pretty big, so 
broadcast it
+  private val confBroadcast =
+sc.broadcast(new 
CarbonSerializableConfiguration(sc.hadoopConfiguration))
+
+  override def getPartitions: Array[Partition] = {
+if (isTableSplitPartition) {
+  // for table split partition
+  var splits = Array[TableSplit]()
+
+  if (carbonLoadModel.isDirectLoad) {
+splits = 
CarbonQueryUtil.getTableSplitsForDirectLoad(carbonLoadModel.getFactFilePath,
+  partitioner.nodeList, partitioner.partitionCount)
+  }
+  else {
+splits = 
CarbonQueryUtil.getTableSplits(carbonLoadModel.getDatabaseName,
+  carbonLoadModel.getTableName, null, partitioner)
+  }
+
+  splits.zipWithIndex.map { s =>
+// filter the same partition unique id, because only one will 
match, so get 0 element
+val blocksDetails: Array[BlockDetails] = blocksGroupBy.filter(p =>
+  p._1 == s._1.getPartition.getUniqueID)(0)._2
+new CarbonTableSplitPartition(id, s._2, 

[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-11-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15639584#comment-15639584
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r86664148
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/NewCarbonDataLoadRDD.scala
 ---
@@ -0,0 +1,281 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.carbondata.spark.rdd
+
+import java.lang.Long
+import java.text.SimpleDateFormat
+import java.util
+import java.util.{Date, UUID}
+
+import scala.collection.JavaConverters._
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.io.NullWritable
+import org.apache.hadoop.mapreduce.RecordReader
+import org.apache.spark.{Logging, Partition, SparkContext, TaskContext}
+import org.apache.spark.mapred.{CarbonHadoopMapReduceUtil, 
CarbonSerializableConfiguration}
+import org.apache.spark.rdd.RDD
+import org.apache.spark.sql.execution.command.Partitioner
+import org.apache.spark.util.SerializableConfiguration
+
+import org.apache.carbondata.common.logging.LogServiceFactory
+import org.apache.carbondata.common.logging.impl.StandardLogService
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.load.{BlockDetails, LoadMetadataDetails}
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory
+import org.apache.carbondata.hadoop.csv.CSVInputFormat
+import org.apache.carbondata.hadoop.io.StringArrayWritable
+import org.apache.carbondata.processing.graphgenerator.GraphGenerator
+import org.apache.carbondata.spark.DataLoadResult
+import org.apache.carbondata.spark.load._
+import org.apache.carbondata.spark.splits.TableSplit
+import org.apache.carbondata.spark.util.CarbonQueryUtil
+
+/**
+ * It loads the data to carbon using @AbstractDataLoadProcessorStep
+ */
+class NewCarbonDataLoadRDD[K, V](
+sc: SparkContext,
+result: DataLoadResult[K, V],
+carbonLoadModel: CarbonLoadModel,
+var storeLocation: String,
+hdfsStoreLocation: String,
+kettleHomePath: String,
+partitioner: Partitioner,
+columinar: Boolean,
+loadCount: Integer,
+tableCreationTime: Long,
+schemaLastUpdatedTime: Long,
+blocksGroupBy: Array[(String, Array[BlockDetails])],
+isTableSplitPartition: Boolean)
+  extends RDD[(K, V)](sc, Nil) with CarbonHadoopMapReduceUtil with Logging 
{
+
+  sc.setLocalProperty("spark.scheduler.pool", "DDL")
+
+  private val jobTrackerId: String = {
+val formatter = new SimpleDateFormat("MMddHHmm")
+formatter.format(new Date())
+  }
+
+  // A Hadoop Configuration can be about 10 KB, which is pretty big, so 
broadcast it
+  private val confBroadcast =
+sc.broadcast(new 
CarbonSerializableConfiguration(sc.hadoopConfiguration))
+
+  override def getPartitions: Array[Partition] = {
+if (isTableSplitPartition) {
+  // for table split partition
+  var splits = Array[TableSplit]()
--- End diff --

unnecessary initialization


> Remove kettle for loading data
> --
>
> Key: CARBONDATA-2
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2
> Project: CarbonData
>  Issue Type: Improvement
>  Components: data-load
>Reporter: Liang Chen
>Priority: Critical
>  Labels: features
> Fix For: 0.3.0-incubating
>
> Attachments: CarbonDataLoadingdesign.pdf
>
>
> Remove kettle for loading data module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15638192#comment-15638192
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/297


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve the 
> following goals:
> Goal 1: Users can choose where to store index data; it can be stored in the
> processing framework's memory space (like Spark driver memory) or in
> another service outside of the processing framework (like an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indices of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technologies to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed in maillist: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html
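
A purely hypothetical Scala sketch of what a pluggable index abstraction along these lines could look like; none of these names are the project's actual API:

```scala
// Goal 1: the store decides where index data lives (driver memory, external service, ...).
trait IndexStore[V] {
  def put(segmentId: String, index: V): Unit
  def get(segmentId: String): Option[V]
}

// Goal 2: any index technology can be plugged in beside the default B+ tree,
// as long as it can prune a segment's blocks for a given filter.
trait SegmentIndex[Filter, Block] {
  def prune(filter: Filter): Seq[Block]
}

trait IndexBuilder[Filter, Block] {
  def build(segmentPath: String): SegmentIndex[Filter, Block]
}
```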



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15637276#comment-15637276
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/208


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve the 
> following goals:
> Goal 1: Users can choose where to store index data; it can be stored in the
> processing framework's memory space (like Spark driver memory) or in
> another service outside of the processing framework (like an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indices of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technologies to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed in maillist: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-375) Dictionary cache not getting cleared after task completion in dictionary decoder

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15636189#comment-15636189
 ] 

ASF GitHub Bot commented on CARBONDATA-375:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/292


> Dictionary cache not getting cleared after task completion in dictionary 
> decoder
> 
>
> Key: CARBONDATA-375
> URL: https://issues.apache.org/jira/browse/CARBONDATA-375
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Currently the LRU cache eviction policy is based on the dictionary access count. For 
> the cache to remove an entry, its access count must be 0. In the dictionary decoder, 
> after conversion of a surrogate key to its actual value, the access count for the 
> dictionary columns in the query is not decremented, due to which the entry will 
> never be cleared from memory when an LRU cache size is configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635994#comment-15635994
 ] 

ASF GitHub Bot commented on CARBONDATA-283:
---

Github user ManoharVanam commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/207#discussion_r86523287
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.load.LoadMetadataDetails;
+
+/**
+ * This class contains all table status file utilities
+ */
+public final class CarbonTableStatusUtil {
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName());
+
+  private CarbonTableStatusUtil() {
+
+  }
+
+  /**
+   * updates table status details using latest metadata
+   *
+   * @param oldMetadata
+   * @param newMetadata
+   * @return
+   */
+
+  public static List updateLatestTableStatusDetails(
--- End diff --

ok


> Improve the test cases for concurrent scenarios
> ---
>
> Key: CARBONDATA-283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-283
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>Priority: Minor
>
> Improve test cases for data retention concurrent scenarios



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635992#comment-15635992
 ] 

ASF GitHub Bot commented on CARBONDATA-283:
---

Github user ManoharVanam commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/207#discussion_r86523222
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.load.LoadMetadataDetails;
+
+/**
+ * This class contains all table status file utilities
+ */
+public final class CarbonTableStatusUtil {
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName());
+
+  private CarbonTableStatusUtil() {
+
+  }
+
+  /**
+   * updates table status details using latest metadata
+   *
+   * @param oldMetadata
+   * @param newMetadata
+   * @return
+   */
+
+  public static List updateLatestTableStatusDetails(
+  LoadMetadataDetails[] oldMetadata, LoadMetadataDetails[] 
newMetadata) {
+
+List newListMetadata =
+new ArrayList(Arrays.asList(newMetadata));
+for (LoadMetadataDetails oldSegment : oldMetadata) {
+  if 
(CarbonCommonConstants.MARKED_FOR_DELETE.equalsIgnoreCase(oldSegment.getLoadStatus()))
 {
+
updateSegmentMetadataDetails(newListMetadata.get(newListMetadata.indexOf(oldSegment)));
+  }
+}
+return newListMetadata;
+  }
+
+  /**
+   * returns current time
+   *
+   * @return
+   */
+  private static String readCurrentTime() {
+SimpleDateFormat sdf = new 
SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP);
+String date = null;
+
+date = sdf.format(new Date());
+
+return date;
+  }
+
+  /**
+   * updates segment status and modificaton time details
+   *
+   * @param loadMetadata
+   */
+  public static void updateSegmentMetadataDetails(LoadMetadataDetails 
loadMetadata) {
--- End diff --

ok


> Improve the test cases for concurrent scenarios
> ---
>
> Key: CARBONDATA-283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-283
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>Priority: Minor
>
> Improve test cases for data retention concurrent scenarios



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-381) Unnecessary catalog metadata refresh and array index of bound exception in drop table

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635803#comment-15635803
 ] 

ASF GitHub Bot commented on CARBONDATA-381:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/294


> Unnecessary catalog metadata refresh and array index of bound exception in 
> drop table
> -
>
> Key: CARBONDATA-381
> URL: https://issues.apache.org/jira/browse/CARBONDATA-381
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Problem:
> 1. Whenever the catalog metadata is refreshed, it modifies the timestamp of the 
> modifiedTime.mdt file, which leads to unnecessarily refreshing the complete 
> catalog metadata.
> 2. An array index out of bounds exception is thrown on failure of table creation.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-374) Short data type is not working.

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635547#comment-15635547
 ] 

ASF GitHub Bot commented on CARBONDATA-374:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/293#discussion_r86498453
  
--- Diff: 
integration/spark/src/test/scala/org/apache/spark/sql/TestCreateTable.scala ---
@@ -0,0 +1,30 @@
+/*
--- End diff --

Please also include a data load and an SQL query to retrieve the data.


> Short data type is not working.
> ---
>
> Key: CARBONDATA-374
> URL: https://issues.apache.org/jira/browse/CARBONDATA-374
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SWATI RAO
>Assignee: cen yuhai
>
> The Short datatype is not working, although you have mentioned it is a supported 
> datatype in the link below:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/List-the-supported-datatypes-in-carbondata-td2419.html
> e.g:
> create table testTable(id Short, name String) stored by 'carbondata' ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> 'Short' ',' 'name' in column type; line 1 pos 26 (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-374) Short data type is not working.

2016-11-04 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635418#comment-15635418
 ] 

ASF GitHub Bot commented on CARBONDATA-374:
---

Github user cenyuhai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/293#discussion_r86494087
  
--- Diff: 
integration/spark/src/test/scala/org/apache/spark/sql/TestCreateTable.scala ---
@@ -0,0 +1,30 @@
+/*
--- End diff --

ok


> Short data type is not working.
> ---
>
> Key: CARBONDATA-374
> URL: https://issues.apache.org/jira/browse/CARBONDATA-374
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SWATI RAO
>Assignee: cen yuhai
>
> The Short datatype is not working, although you have mentioned it is a supported 
> datatype in the link below:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/List-the-supported-datatypes-in-carbondata-td2419.html
> e.g:
> create table testTable(id Short, name String) stored by 'carbondata' ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> 'Short' ',' 'name' in column type; line 1 pos 26 (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-351) name of thrift file is not unified

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635178#comment-15635178
 ] 

ASF GitHub Bot commented on CARBONDATA-351:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/291


> name of thrift file is not unified
> --
>
> Key: CARBONDATA-351
> URL: https://issues.apache.org/jira/browse/CARBONDATA-351
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Jay
>Assignee: cen yuhai
>Priority: Trivial
>
> In the carbon-format module, some file names are not unified.
> For example, carbondataindex.thrift could be changed to 
> carbondata_index.thrift, dictionary_meta.thrift could be changed to 
> dictionary_metadata.thrift, and so on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-374) Short data type is not working.

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15635131#comment-15635131
 ] 

ASF GitHub Bot commented on CARBONDATA-374:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/293#discussion_r86485669
  
--- Diff: 
integration/spark/src/test/scala/org/apache/spark/sql/TestCreateTable.scala ---
@@ -0,0 +1,30 @@
+/*
--- End diff --

Please move the contents into 
incubator-carbondata\integration\spark\src\test\scala\org\apache\carbondata\spark\testsuite\createtable\TestCreateTableSyntax.scala


> Short data type is not working.
> ---
>
> Key: CARBONDATA-374
> URL: https://issues.apache.org/jira/browse/CARBONDATA-374
> Project: CarbonData
>  Issue Type: Bug
>Reporter: SWATI RAO
>Assignee: cen yuhai
>
> The Short datatype is not working, although you have mentioned it is a supported 
> datatype in the link below:
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/List-the-supported-datatypes-in-carbondata-td2419.html
> e.g:
> create table testTable(id Short, name String) stored by 'carbondata' ;
> Error: org.apache.spark.sql.AnalysisException: cannot recognize input near 
> 'Short' ',' 'name' in column type; line 1 pos 26 (state=,code=0)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633470#comment-15633470
 ] 

ASF GitHub Bot commented on CARBONDATA-308:
---

Github user kumarvishal09 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86393676
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputFormat.java ---
@@ -311,80 +278,6 @@ private void addSegmentsIfEmpty(JobContext job, 
AbsoluteTableIdentifier absolute
 return result;
   }
 
-  /**
-   * get total number of rows. Same as count(*)
-   *
-   * @throws IOException
-   * @throws IndexBuilderException
-   */
-  public long getRowCount(JobContext job) throws IOException, 
IndexBuilderException {
--- End diff --

This method is useful for count(*) queries because we can return the number of rows 
from the driver itself; currently we push this down to the executor. It is better to keep 
this method, it will be useful.


> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD and modify it as follows:
> 1. On the driver side, only getSplit is required, so only the filter condition is 
> needed; there is no need to create the full QueryModel object, so we can move the 
> creation of QueryModel from the driver side to the executor side.
> 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of using QueryExecutor directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-375) Dictionary cache not getting cleared after task completion in dictionary decoder

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633257#comment-15633257
 ] 

ASF GitHub Bot commented on CARBONDATA-375:
---

GitHub user manishgupta88 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/292

[CARBONDATA-375] Dictionary cache not getting cleared after task completion 
in dictionary decoder

Problem: Dictionary cache not getting cleared after task completion in the
dictionary decoder.

Analysis: Currently the LRU cache eviction policy is based on the dictionary
access count. For the cache to remove an entry, its access count must be 0. In
the dictionary decoder, after conversion of a surrogate key to its actual value,
the access count for the dictionary columns in the query is not decremented, so
the entry is never cleared from memory when an LRU cache size is configured.

Fix: Add a task completion listener which takes care of clearing the dictionary
in case of both success and failure.

Impact area: LRU cache eviction policy, which can lead to query and data load
failures.
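As a rough sketch of the listener-based cleanup (plain Java against Spark's
TaskContext API; the `CachedDictionary` interface below is a hypothetical
stand-in for the real dictionary cache entries, not a CarbonData class):

```java
import java.util.List;

import org.apache.spark.TaskContext;
import org.apache.spark.util.TaskCompletionListener;

public final class DictionaryCleanupSketch {

  /** Hypothetical stand-in for a cached dictionary whose access count must drop to 0. */
  public interface CachedDictionary {
    void clear();
  }

  /** Registers cleanup that runs on both successful and failed task completion. */
  public static void registerCleanup(final List<CachedDictionary> dictionariesUsedByTask) {
    TaskContext context = TaskContext.get();
    if (context == null) {
      return; // not running inside a Spark task (e.g. a local unit test)
    }
    context.addTaskCompletionListener(new TaskCompletionListener() {
      @Override public void onTaskCompletion(TaskContext ctx) {
        for (CachedDictionary dictionary : dictionariesUsedByTask) {
          dictionary.clear(); // decrement the access count / release the cache entry
        }
      }
    });
  }
}
```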

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/manishgupta88/incubator-carbondata 
dictionary_decoder_clear_dictionary

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/292.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #292


commit b305f34e1014267b3706c287cef7070189fc3c28
Author: manishgupta88 
Date:   2016-11-03T15:48:03Z

Problem: Dictionary cache not getting cleared after task completion in 
dictionary decoder

Analysis: Currently LRU cache eviction policy is based on dictionary access 
count. For cache to remove a entry its access count must be 0. In dictionary 
decoder after conversion of surrogate key to actual value the access count for 
dictionary columns in query is not getting decremented due to which it will 
never be cleared from memory when LRU cache size is configured.

Fix: Add a task completion listener which will take care of clearing the 
dictionary in case of both success and failure

Impact area: LRU cache eviction policy which can lead to query and data 
load failure




> Dictionary cache not getting cleared after task completion in dictionary 
> decoder
> 
>
> Key: CARBONDATA-375
> URL: https://issues.apache.org/jira/browse/CARBONDATA-375
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manish Gupta
>Assignee: Manish Gupta
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Currently LRU cache eviction policy is based on dictionary access count. For 
> cache to remove a entry its access count must be 0. In dictionary decoder 
> after conversion of surrogate key to actual value the access count for 
> dictionary columns in query is not getting decremented due to which it will 
> never be cleared from memory when LRU cache size is configured.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-283) Improve the test cases for concurrent scenarios

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15633010#comment-15633010
 ] 

ASF GitHub Bot commented on CARBONDATA-283:
---

Github user gvramana commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/207#discussion_r86347851
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonTableStatusUtil.java
 ---
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.util;
+
+import java.text.SimpleDateFormat;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.load.LoadMetadataDetails;
+
+/**
+ * This class contains all table status file utilities
+ */
+public final class CarbonTableStatusUtil {
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(CarbonTableStatusUtil.class.getName());
+
+  private CarbonTableStatusUtil() {
+
+  }
+
+  /**
+   * updates table status details using latest metadata
+   *
+   * @param oldMetadata
+   * @param newMetadata
+   * @return
+   */
+
+  public static List<LoadMetadataDetails> updateLatestTableStatusDetails(
+      LoadMetadataDetails[] oldMetadata, LoadMetadataDetails[] newMetadata) {
+
+    List<LoadMetadataDetails> newListMetadata =
+        new ArrayList<LoadMetadataDetails>(Arrays.asList(newMetadata));
+for (LoadMetadataDetails oldSegment : oldMetadata) {
+  if 
(CarbonCommonConstants.MARKED_FOR_DELETE.equalsIgnoreCase(oldSegment.getLoadStatus()))
 {
+
updateSegmentMetadataDetails(newListMetadata.get(newListMetadata.indexOf(oldSegment)));
+  }
+}
+return newListMetadata;
+  }
+
+  /**
+   * returns current time
+   *
+   * @return
+   */
+  private static String readCurrentTime() {
+SimpleDateFormat sdf = new 
SimpleDateFormat(CarbonCommonConstants.CARBON_TIMESTAMP);
+String date = null;
+
+date = sdf.format(new Date());
+
+return date;
+  }
+
+  /**
+   * updates segment status and modification time details
+   *
+   * @param loadMetadata
+   */
+  public static void updateSegmentMetadataDetails(LoadMetadataDetails 
loadMetadata) {
--- End diff --

Move these functions to the status manager.


> Improve the test cases for concurrent scenarios
> ---
>
> Key: CARBONDATA-283
> URL: https://issues.apache.org/jira/browse/CARBONDATA-283
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Manohar Vanam
>Assignee: Manohar Vanam
>Priority: Minor
>
> Improve test cases for data retention concurrent scenarios



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-371) Write unit test for ColumnDictionaryInfo

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632648#comment-15632648
 ] 

ASF GitHub Bot commented on CARBONDATA-371:
---

GitHub user harmeetsingh0013 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/290

[CARBONDATA-371] Write unit test for ColumnDictionaryInfo

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [x] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira issue #>] Description of pull request`
 - [x] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [x] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [x] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/harmeetsingh0013/incubator-carbondata 
CARBONDATA-371

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/290.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #290


commit f007240376a11a9f2e1e172cc2bffd4b1ad4340a
Author: harmeetsingh0013 
Date:   2016-11-03T12:09:37Z

Write unit test cases for ColumnDictionaryInfo

commit 3802bbf528cb3deceef15cdd1bb4e48073a4570f
Author: harmeetsingh0013 
Date:   2016-11-03T12:38:13Z

Add apache license in javadocs




> Write unit test for ColumnDictionaryInfo
> 
>
> Key: CARBONDATA-371
> URL: https://issues.apache.org/jira/browse/CARBONDATA-371
> Project: CarbonData
>  Issue Type: Test
>Reporter: Prabhat Kashyap
>Priority: Trivial
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-367) Add support alluxio(tachyon) file system(enhance ecosystem integration)

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632063#comment-15632063
 ] 

ASF GitHub Bot commented on CARBONDATA-367:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/287#discussion_r86300020
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/impl/FileFactory.java
 ---
@@ -57,14 +58,18 @@
 if (property != null) {
   if (property.startsWith(CarbonUtil.HDFS_PREFIX)) {
 storeDefaultFileType = FileType.HDFS;
-  } else if (property.startsWith(CarbonUtil.VIEWFS_PREFIX)) {
+  }
+  else if (property.startsWith(CarbonUtil.ALLUXIO_PREFIX)) {
+storeDefaultFileType = FileType.ALLUXIO;
+  }
+  else if (property.startsWith(CarbonUtil.VIEWFS_PREFIX)) {
 storeDefaultFileType = FileType.VIEWFS;
   }
 }
 
 configuration = new Configuration();
 configuration.addResource(new Path("../core-default.xml"));
-  }
+}
--- End diff --

incorrect indentation


> Add support alluxio(tachyon) file system(enhance ecosystem integration)
> ---
>
> Key: CARBONDATA-367
> URL: https://issues.apache.org/jira/browse/CARBONDATA-367
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Assignee: Liang Chen
>Priority: Minor
>
> For supporting alluxio users to use higher performance file 
> format(CarbonData), and enhance Apache CarbonData ecosystem integration.
> Can load alluxio file for example "alluxio://localhost:19998/data.csv" to 
> Carbon Data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-367) Add support alluxio(tachyon) file system(enhance ecosystem integration)

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632058#comment-15632058
 ] 

ASF GitHub Bot commented on CARBONDATA-367:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/287#discussion_r86299810
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/filesystem/ALLUXIOCarbonFile.java
 ---
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.core.datastorage.store.filesystem;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.datastorage.store.impl.FileFactory;
+import org.apache.hadoop.fs.FileStatus;
+import org.apache.hadoop.fs.FileSystem;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.hdfs.DistributedFileSystem;
+
+
+
+public class ALLUXIOCarbonFile extends AbstractDFSCarbonFile {
--- End diff --

Please use `Alluxio` instead of `ALLUXIO`


> Add support alluxio(tachyon) file system(enhance ecosystem integration)
> ---
>
> Key: CARBONDATA-367
> URL: https://issues.apache.org/jira/browse/CARBONDATA-367
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Assignee: Liang Chen
>Priority: Minor
>
> For supporting alluxio users to use higher performance file 
> format(CarbonData), and enhance Apache CarbonData ecosystem integration.
> Can load alluxio file for example "alluxio://localhost:19998/data.csv" to 
> Carbon Data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-367) Add support alluxio(tachyon) file system(enhance ecosystem integration)

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15632060#comment-15632060
 ] 

ASF GitHub Bot commented on CARBONDATA-367:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/287#discussion_r86299867
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/datastorage/store/impl/FileFactory.java
 ---
@@ -57,14 +58,18 @@
 if (property != null) {
   if (property.startsWith(CarbonUtil.HDFS_PREFIX)) {
 storeDefaultFileType = FileType.HDFS;
-  } else if (property.startsWith(CarbonUtil.VIEWFS_PREFIX)) {
+  }
+  else if (property.startsWith(CarbonUtil.ALLUXIO_PREFIX)) {
+storeDefaultFileType = FileType.ALLUXIO;
+  }
+  else if (property.startsWith(CarbonUtil.VIEWFS_PREFIX)) {
--- End diff --

move to previous line
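The prefix-based file type resolution in the diff above boils down to the
following standalone sketch (the `FileType` enum and `*_PREFIX` constants here
are local stand-ins for the CarbonUtil/FileFactory ones), written with the style
fix the review asks for:

```java
public final class FileTypeResolverSketch {

  // Local stand-ins for the constants and enum referenced in the diff.
  enum FileType { LOCAL, HDFS, ALLUXIO, VIEWFS }
  static final String HDFS_PREFIX = "hdfs://";
  static final String ALLUXIO_PREFIX = "alluxio://";
  static final String VIEWFS_PREFIX = "viewfs://";

  /** Chained prefix checks with "} else if" kept on one line. */
  static FileType resolve(String storePath) {
    if (storePath.startsWith(HDFS_PREFIX)) {
      return FileType.HDFS;
    } else if (storePath.startsWith(ALLUXIO_PREFIX)) {
      return FileType.ALLUXIO;
    } else if (storePath.startsWith(VIEWFS_PREFIX)) {
      return FileType.VIEWFS;
    }
    return FileType.LOCAL;
  }

  public static void main(String[] args) {
    System.out.println(resolve("alluxio://localhost:19998/data.csv")); // ALLUXIO
  }
}
```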


> Add support alluxio(tachyon) file system(enhance ecosystem integration)
> ---
>
> Key: CARBONDATA-367
> URL: https://issues.apache.org/jira/browse/CARBONDATA-367
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Assignee: Liang Chen
>Priority: Minor
>
> For supporting alluxio users to use higher performance file 
> format(CarbonData), and enhance Apache CarbonData ecosystem integration.
> Can load alluxio file for example "alluxio://localhost:19998/data.csv" to 
> Carbon Data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-367) Add support alluxio(tachyon) file system(enhance ecosystem integration)

2016-11-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631970#comment-15631970
 ] 

ASF GitHub Bot commented on CARBONDATA-367:
---

GitHub user chenliang613 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/287

[CARBONDATA-367]Add support alluxio(tachyon) file system

To let Alluxio users use a higher-performance file format (CarbonData), and to
enhance the Apache CarbonData ecosystem integration.

An Alluxio file, for example "alluxio://localhost:19998/data.csv", can now be
loaded into CarbonData.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenliang613/incubator-carbondata 
alluxio_integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/287.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #287


commit 08371cb52b3024a12986bd974c0964a8d5be4018
Author: chenliang613 
Date:   2016-11-03T07:50:35Z

CARBONDATA-367 Add support alluxio(tachyon) file system




> Add support alluxio(tachyon) file system(enhance ecosystem integration)
> ---
>
> Key: CARBONDATA-367
> URL: https://issues.apache.org/jira/browse/CARBONDATA-367
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Assignee: Liang Chen
>Priority: Minor
>
> For supporting alluxio users to use higher performance file 
> format(CarbonData), and enhance Apache CarbonData ecosystem integration.
> Can load alluxio file for example "alluxio://localhost:19998/data.csv" to 
> Carbon Data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-362) Optimize the parameters' name in CarbonDataRDDFactory.scala

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631315#comment-15631315
 ] 

ASF GitHub Bot commented on CARBONDATA-362:
---

Github user Hexiaoqiao commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/281#discussion_r86279989
  
--- Diff: 
integration/spark/src/main/scala/org/apache/carbondata/spark/rdd/CarbonDataRDDFactory.scala
 ---
@@ -65,7 +65,7 @@ object CarbonDataRDDFactory extends Logging {
   sqlContext: SQLContext,
   carbonLoadModel: CarbonLoadModel,
   storeLocation: String,
-  hdfsStoreLocation: String,
+  StoreLocation: String,
--- End diff --

Please follow the code style and use a lowercase first character for the variable name.


> Optimize the parameters' name in CarbonDataRDDFactory.scala
> ---
>
> Key: CARBONDATA-362
> URL: https://issues.apache.org/jira/browse/CARBONDATA-362
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Liang Chen
>Assignee: He Xiaoqiao
>Priority: Trivial
>
> Optimize the parameters' name in CarbonDataRDDFactory.scala:
> changes the name of "hdfsStoreLocation"  to "storePath", because not only 
> support hdfs path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-362) Optimize the parameters' name in CarbonDataRDDFactory.scala

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15631110#comment-15631110
 ] 

ASF GitHub Bot commented on CARBONDATA-362:
---

GitHub user lion-x opened a pull request:

https://github.com/apache/incubator-carbondata/pull/281

[CARBONDATA-362]Optimize the Parameters Name in CarbonDataRDDFactory.scala

# Why raise this PR?
Changes the name of "hdfsStoreLocation" to "storePath", because it does not
only support HDFS paths.

# How to test?
Pass all test cases.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lion-x/incubator-carbondata 
optimizeParametersName

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/281.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #281


commit ef50982ee53929d67b3e58d57d7a05a46a8a2ba8
Author: lion-x 
Date:   2016-11-03T01:02:31Z

optimizeParametersName




> Optimize the parameters' name in CarbonDataRDDFactory.scala
> ---
>
> Key: CARBONDATA-362
> URL: https://issues.apache.org/jira/browse/CARBONDATA-362
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Liang Chen
>Priority: Trivial
>
> Optimize the parameters' name in CarbonDataRDDFactory.scala:
> changes the name of "hdfsStoreLocation"  to "storePath", because not only 
> support hdfs path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15630016#comment-15630016
 ] 

ASF GitHub Bot commented on CARBONDATA-308:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86214621
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java
 ---
@@ -953,66 +959,6 @@ public static void 
checkAndCreateCarbonDataLocation(String carbonStorePath, Stri
   }
 
   /**
-   * method to distribute the blocklets of a block in multiple blocks
--- End diff --

Maybe we should take a call on removing blocklet distribution. For filter
queries with a small number of blocks to scan, it is very helpful for faster
processing.


> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD, modify as following:
> 1. In driver side, only getSplit is required, so only filter condition is 
> required, no need to create full QueryModel object, so we can move creation 
> of QueryModel from driver side to executor side.
> 2. use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of use QueryExecutor directly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629964#comment-15629964
 ] 

ASF GitHub Bot commented on CARBONDATA-308:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86211496
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/filter/FilterExpressionProcessor.java
 ---
@@ -352,4 +354,18 @@ private FilterResolverIntf 
getFilterResolverBasedOnExpressionType(
 return new RowLevelFilterResolverImpl(expression, false, false, 
tableIdentifier);
   }
 
+  public static FilterResolverIntf 
getResolvedFilter(AbsoluteTableIdentifier identifier,
--- End diff --

Why was this added?


> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
>Assignee: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD, modify as following:
> 1. In driver side, only getSplit is required, so only filter condition is 
> required, no need to create full QueryModel object, so we can move creation 
> of QueryModel from driver side to executor side.
> 2. use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of use QueryExecutor directly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-363) Block loading issue in case of blocklet distribution

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629677#comment-15629677
 ] 

ASF GitHub Bot commented on CARBONDATA-363:
---

GitHub user kumarvishal09 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/280

[CARBONDATA-363]fixed block loading issue in case of blocklet distribution

**Problem**: In case of blocklet distribution, the same block is getting loaded
multiple times. When blocklet distribution is enabled, the same block is divided
within a task, and block loading is done in different threads without
synchronisation, so the same block is read from the carbon data file footer
multiple times, which hurts first-time query performance.
**Solution**: Add locking for the above issue: if one thread is loading a
particular block, the other threads need to wait, and once the block is loaded,
the other threads need to use the same reference of the block.
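A minimal sketch of the share-one-loaded-instance idea (not the actual
CarbonData fix, which adds explicit locking; this version leans on Java 8's
`ConcurrentHashMap.computeIfAbsent`, and the `LoadedBlock` class is a
hypothetical stand-in for the footer metadata):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public final class BlockCacheSketch {

  /** Hypothetical stand-in for the metadata read from a carbondata file footer. */
  public static final class LoadedBlock {
    final String blockPath;
    LoadedBlock(String blockPath) { this.blockPath = blockPath; }
  }

  private final ConcurrentMap<String, LoadedBlock> loadedBlocks = new ConcurrentHashMap<>();

  /**
   * Returns the already-loaded block if another thread loaded it first; otherwise
   * reads the footer exactly once and every caller shares the same reference.
   */
  public LoadedBlock getOrLoad(String blockPath) {
    return loadedBlocks.computeIfAbsent(blockPath, this::readFooter);
  }

  private LoadedBlock readFooter(String path) {
    return new LoadedBlock(path); // placeholder for the expensive footer read
  }
}
```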


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kumarvishal09/incubator-carbondata 
blockloadingissue

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/280.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #280


commit a6d8a18a73e3ed8e39de444b2ddfded45813493f
Author: kumarvishal 
Date:   2016-11-02T17:02:42Z

fixed block loading issue in case of blocklet distribution




> Block loading issue in case of blocklet distribution
> 
>
> Key: CARBONDATA-363
> URL: https://issues.apache.org/jira/browse/CARBONDATA-363
> Project: CarbonData
>  Issue Type: Bug
>Reporter: kumar vishal
>Assignee: kumar vishal
>
> Problem: In case of blocklet distribution, the same block is getting loaded
> multiple times. When blocklet distribution is enabled, the same block is
> divided within a task, and block loading is done in different threads without
> synchronisation, so the same block is read from the carbon data file footer
> multiple times, which hurts first-time query performance.
> Solution: Add locking for the above issue: if one thread is loading a
> particular block, the other threads need to wait, and once the block is
> loaded, the other threads need to use the same reference of the block.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-355) Remove unnecessary method argument columnIdentifier of PathService.getCarbonTablePath

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-355?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15629316#comment-15629316
 ] 

ASF GitHub Bot commented on CARBONDATA-355:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/274


> Remove unnecessary method argument columnIdentifier of 
> PathService.getCarbonTablePath
> -
>
> Key: CARBONDATA-355
> URL: https://issues.apache.org/jira/browse/CARBONDATA-355
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.2.0-incubating
>Reporter: He Xiaoqiao
>Assignee: He Xiaoqiao
>Priority: Minor
>
> Remove one of the method arguments of PathService#getCarbonTablePath since it
> is not necessary to pass columnIdentifier when getting the table path.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-349) Support load local file

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628614#comment-15628614
 ] 

ASF GitHub Bot commented on CARBONDATA-349:
---

Github user Jay357089 closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/275


> Support load local file
> ---
>
> Key: CARBONDATA-349
> URL: https://issues.apache.org/jira/browse/CARBONDATA-349
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Lionx
>Priority: Minor
>
> In carbonExample we can run the load-local-file command, while in a cluster
> (with HDFS) loading data from a local file will throw an exception like "file
> not found". I am afraid that in cluster mode Carbon cannot parse the local URI
> 'file://', or 'file://' is removed in the code even when the user has added it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-356) Remove Two Useless Files ConvertedType.java and QuerySchemaInfo.java

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628590#comment-15628590
 ] 

ASF GitHub Bot commented on CARBONDATA-356:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/276


> Remove Two Useless Files ConvertedType.java and QuerySchemaInfo.java
> 
>
> Key: CARBONDATA-356
> URL: https://issues.apache.org/jira/browse/CARBONDATA-356
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
> Fix For: 0.3.0-incubating
>
>
>  ConvertedType.java and QuerySchemaInfo.java are useless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-328) Improve Code and Fix Warnings

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628430#comment-15628430
 ] 

ASF GitHub Bot commented on CARBONDATA-328:
---

GitHub user PKOfficial opened a pull request:

https://github.com/apache/incubator-carbondata/pull/279

[CARBONDATA-328] [Spark] Improve Code and Fix Warnings [Squashed]

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
- [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
**Please provide details on**
- **Whether new unit test cases have been added or why no new tests are 
required?**
Not required because no change in functionality.
- **What manual testing you have done?**
Run basic Commands in Beeline.
- **Any additional information to help reviewers in testing this change.**
No
 
---
Improved spark module code.
* Removed some compilation warnings.
* Replaced pattern matching on booleans with if-else.
* Improved code according to Scala standards.
* Removed unnecessary new lines.
* Added string interpolation instead of string concatenation.
* Removed unnecessary semicolons.
* Fixed indentation.
* Added useKettle option for loading.
* Fixed indentation.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PKOfficial/incubator-carbondata improved-code

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/279.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #279


commit 044260fc7265251be097d6544dade1bc4db1e3a0
Author: X-Lion 
Date:   2016-09-29T14:33:18Z

Lionx0929

mend

commit 2fd3cd566eec870d7e2601b3bdc66496fa509377
Author: ravipesala 
Date:   2016-10-20T02:41:46Z

Added Writer processor step for dataloading.

Rebased Fixed comments.

Added factory for fact data handler

Fixed review comments.

Fixed compilation issue after rebase

commit a982583720c83bbf558f6fc1404078b5bbffaa15
Author: hexiaoqiao 
Date:   2016-10-28T14:44:57Z

CARBONDATA-343 delete some duplicated definition code

CARBONDATA-343 delete some duplicated definition code

commit 1be29a884a5ab00b970fea7c3901f8acf0a13465
Author: Prabhat Kashyap 
Date:   2016-10-19T16:54:47Z

Improved spark module code.
* Removed some compliation warnings.
* Replace pattern matching for boolean to IF-ELSE.
* Improved code according to scala standards.
* Removed unnecessary new lines.
* Added string interpolation instead of string concatenation.
* Removed unnecessary semi-colons.
* Fixed indentation.
* add useKettle option for loading
* Fixed indentation.




> Improve Code and Fix Warnings
> -
>
> Key: CARBONDATA-328
> URL: https://issues.apache.org/jira/browse/CARBONDATA-328
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Prabhat Kashyap
>Priority: Trivial
>
> Remove compiler warning and improve the existing code according to the 
> standards. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-356) Remove Two Useless Files ConvertedType.java and QuerySchemaInfo.java

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628408#comment-15628408
 ] 

ASF GitHub Bot commented on CARBONDATA-356:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/276#discussion_r86104485
  
--- Diff: 
core/src/main/java/org/apache/carbondata/scan/expression/ExpressionResult.java 
---
@@ -33,8 +33,7 @@
 import 
org.apache.carbondata.scan.expression.exception.FilterIllegalMemberException;
 
 public class ExpressionResult implements Comparable<ExpressionResult> {
-
-  private static final long serialVersionUID = 1L;
+  
--- End diff --

I checked in my env. mvn clean passed.


> Remove Two Useless Files ConvertedType.java and QuerySchemaInfo.java
> 
>
> Key: CARBONDATA-356
> URL: https://issues.apache.org/jira/browse/CARBONDATA-356
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
>  ConvertedType.java and QuerySchemaInfo.java are useless.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-353) Update doc for dateformat option

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628368#comment-15628368
 ] 

ASF GitHub Bot commented on CARBONDATA-353:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/272


> Update doc for dateformat option
> 
>
> Key: CARBONDATA-353
> URL: https://issues.apache.org/jira/browse/CARBONDATA-353
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Update doc for dateformat option



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-357) Write unit test for ValueCompressionUtil

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628249#comment-15628249
 ] 

ASF GitHub Bot commented on CARBONDATA-357:
---

GitHub user kunal642 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/277

[CARBONDATA-357] Added ValueCompressionUtilTest

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kunal642/incubator-carbondata CARBONDATA-357

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/277.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #277


commit e20a094e4f2d2436f31864d0b653a305b6a8ddc1
Author: kunal642 
Date:   2016-11-02T08:49:24Z

Added ValueCompressionUtilTest




> Write unit test for ValueCompressionUtil
> 
>
> Key: CARBONDATA-357
> URL: https://issues.apache.org/jira/browse/CARBONDATA-357
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Prabhat Kashyap
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-349) Support load local file

2016-11-02 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15628004#comment-15628004
 ] 

ASF GitHub Bot commented on CARBONDATA-349:
---

GitHub user Jay357089 opened a pull request:

https://github.com/apache/incubator-carbondata/pull/275

[CARBONDATA-349][WIP] Support load local file into carbon table

https://issues.apache.org/jira/browse/CARBONDATA-349

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Jay357089/incubator-carbondata supportLocal

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/275.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #275


commit ec034ae38e14094193b6ee07f1da6bb27a393537
Author: Jay357089 
Date:   2016-11-02T06:59:52Z

support load local file




> Support load local file
> ---
>
> Key: CARBONDATA-349
> URL: https://issues.apache.org/jira/browse/CARBONDATA-349
> Project: CarbonData
>  Issue Type: New Feature
>Reporter: Lionx
>Priority: Minor
>
> In carbonExample we can run the load-local-file command, while in a cluster
> (with HDFS) loading data from a local file will throw an exception like "file
> not found". I am afraid that in cluster mode Carbon cannot parse the local URI
> 'file://', or 'file://' is removed in the code even when the user has added it.
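To illustrate the suspicion, a small sketch against the Hadoop client API (it
assumes the Hadoop and HDFS client jars are on the classpath, and the paths are
made up): a scheme-less path is resolved against fs.defaultFS, while an explicit
file:// scheme stays on the local file system.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public final class LocalUriSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // In a cluster, fs.defaultFS points at HDFS.
    conf.set("fs.defaultFS", "hdfs://localhost:54310");

    // A scheme-less path inherits the default file system (HDFS here), which is
    // why a local CSV looks like "file not found" once file:// is dropped...
    System.out.println(new Path("/home/user/data.csv").getFileSystem(conf).getUri());
    // ...while an explicit file:// scheme keeps it on the local file system.
    System.out.println(new Path("file:///home/user/data.csv").getFileSystem(conf).getUri());
  }
}
```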



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627358#comment-15627358
 ] 

ASF GitHub Bot commented on CARBONDATA-308:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058188
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java ---
@@ -22,28 +22,44 @@
 import java.io.DataOutput;
 import java.io.IOException;
 import java.io.Serializable;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.Distributable;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
 
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.Writable;
 import org.apache.hadoop.mapreduce.lib.input.FileSplit;
 
+
 /**
  * Carbon input split to allow distributed read of CarbonInputFormat.
  */
-public class CarbonInputSplit extends FileSplit implements Serializable, 
Writable {
+public class CarbonInputSplit extends FileSplit implements Distributable, 
Serializable, Writable {
 
   private static final long serialVersionUID = 3520344046772190207L;
   private String segmentId;
-  /**
+  public String taskId = "0";
+
+  /*
* Number of BlockLets in a block
*/
   private int numberOfBlocklets = 0;
 
-  public CarbonInputSplit() {
-super(null, 0, 0, new String[0]);
+  public  CarbonInputSplit() {
   }
 
-  public CarbonInputSplit(String segmentId, Path path, long start, long 
length,
+  private void parserPath(Path path) {
+String[] nameParts = path.getName().split("-");
+if (nameParts != null && nameParts.length >= 3) {
+  this.taskId = nameParts[2];
+}
+  }
+
+  private CarbonInputSplit(String segmentId, Path path, long start, long 
length,
--- End diff --

please initialize taskId


> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD, modify as following:
> 1. In driver side, only getSplit is required, so only filter condition is 
> required, no need to create full QueryModel object, so we can move creation 
> of QueryModel from driver side to executor side.
> 2. use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of use QueryExecutor directly



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-353) Update doc for dateformat option

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15627280#comment-15627280
 ] 

ASF GitHub Bot commented on CARBONDATA-353:
---

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/272#discussion_r86058866
  
--- Diff: docs/DML-Operations-on-Carbon.md ---
@@ -91,12 +91,17 @@ Following are the options that can be used in load data:
 ```ruby
 OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
 ```
-- **COLUMNDICT:** dictionary file path for single column.
+- **COLUMNDICT:** Dictionary file path for each column.
 
 ```ruby
 OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, 
column2:dictionaryFilePath2')
 ```
 Note: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
+- **DATEFORMAT:** Date format for each column.
+
+```ruby
+OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
--- End diff --

I added a note referring to the Java SimpleDateFormat class documentation. It
provides more details.
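For illustration, the patterns the DATEFORMAT option refers to are standard
java.text.SimpleDateFormat patterns; the column names in the comment below are
made up:

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;

public final class DateFormatPatternExample {
  public static void main(String[] args) throws ParseException {
    // e.g. OPTIONS('DATEFORMAT'='dateofjoin:yyyy-MM-dd, logintime:yyyy/MM/dd HH:mm:ss')
    // (column names above are hypothetical)
    SimpleDateFormat joinFormat = new SimpleDateFormat("yyyy-MM-dd");
    SimpleDateFormat loginFormat = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");

    Date join = joinFormat.parse("2016-11-02");
    Date login = loginFormat.parse("2016/11/02 08:49:24");
    System.out.println(join + " / " + login);
  }
}
```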


> Update doc for dateformat option
> 
>
> Key: CARBONDATA-353
> URL: https://issues.apache.org/jira/browse/CARBONDATA-353
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Update doc for dateformat option



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-353) Update doc for dateformat option

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625795#comment-15625795
 ] 

ASF GitHub Bot commented on CARBONDATA-353:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/272#discussion_r85959032
  
--- Diff: docs/DML-Operations-on-Carbon.md ---
@@ -91,12 +91,17 @@ Following are the options that can be used in load data:
 ```ruby
 OPTIONS('ALL_DICTIONARY_PATH'='/opt/alldictionary/data.dictionary')
 ```
-- **COLUMNDICT:** dictionary file path for single column.
+- **COLUMNDICT:** Dictionary file path for each column.
 
 ```ruby
 OPTIONS('COLUMNDICT'='column1:dictionaryFilePath1, 
column2:dictionaryFilePath2')
 ```
 Note: ALL_DICTIONARY_PATH and COLUMNDICT can't be used together.
+- **DATEFORMAT:** Date format for each column.
+
+```ruby
+OPTIONS('DATEFORMAT'='column1:dateFormat1, column2:dateFormat2')
--- End diff --

give an example of the data format


> Update doc for dateformat option
> 
>
> Key: CARBONDATA-353
> URL: https://issues.apache.org/jira/browse/CARBONDATA-353
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Update doc for dateformat option



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-276) Add trim option

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625774#comment-15625774
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957803
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -472,6 +475,7 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
   }
 }
+<<< HEAD
--- End diff --

Is this file having a merge conflict?


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add a trim option.
> Bug: When a string contains a LeadingWhiteSpace or TrailingWhiteSpace, the
> query result is null. This is because the dictionary ignores the
> LeadingWhiteSpace and TrailingWhiteSpace and the csvInput does not.
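A tiny self-contained illustration of why the mismatch makes the query result
null (plain Java; the HashMap here is only standing in for the real dictionary
lookup):

```java
import java.util.HashMap;
import java.util.Map;

public final class TrimMismatchExample {
  public static void main(String[] args) {
    // Dictionary built from trimmed values...
    Map<String, Integer> dictionary = new HashMap<>();
    dictionary.put("james", 1);

    // ...but the raw CSV field keeps its trailing space, so the lookup misses.
    String csvField = "james ";
    System.out.println(dictionary.get(csvField));        // null -> query result is null
    System.out.println(dictionary.get(csvField.trim())); // 1    -> trimming fixes the lookup
  }
}
```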



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-276) Add trim option

2016-11-01 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15625769#comment-15625769
 ] 

ASF GitHub Bot commented on CARBONDATA-276:
---

Github user sujith71955 commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/200#discussion_r85957411
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java
 ---
@@ -1694,5 +1699,19 @@ public void setTableOption(String tableOption) {
   public TableOptionWrapper getTableOptionWrapper() {
 return tableOptionWrapper;
   }
+
+  public String getIsUseTrim() {
+return isUseTrim;
+  }
+
+  public void setIsUseTrim(Boolean[] isUseTrim) {
+for (Boolean flag: isUseTrim) {
+  if (flag) {
+this.isUseTrim += "T";
--- End diff --

Use  TRUE/FALSE for better readability


> Add trim option
> ---
>
> Key: CARBONDATA-276
> URL: https://issues.apache.org/jira/browse/CARBONDATA-276
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Fix a bug and add a trim option.
> Bug: When a string contains a LeadingWhiteSpace or TrailingWhiteSpace, the
> query result is null. This is because the dictionary ignores the
> LeadingWhiteSpace and TrailingWhiteSpace and the csvInput does not.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15622411#comment-15622411
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/263#discussion_r85757712
  
--- Diff: 
integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoaderUtil.java
 ---
@@ -213,6 +224,64 @@ public static void executeGraph(CarbonLoadModel 
loadModel, String storeLocation,
 info, loadModel.getPartitionId(), 
loadModel.getCarbonDataLoadSchema());
   }
 
+  public static void executeNewDataLoad(CarbonLoadModel loadModel, String 
storeLocation,
+  String hdfsStoreLocation, RecordReader[] recordReaders)
+  throws Exception {
+if (!new File(storeLocation).mkdirs()) {
+  LOGGER.error("Error while creating the temp store path: " + 
storeLocation);
+}
+CarbonDataLoadConfiguration configuration = new 
CarbonDataLoadConfiguration();
+String databaseName = loadModel.getDatabaseName();
+String tableName = loadModel.getTableName();
+String tempLocationKey = databaseName + 
CarbonCommonConstants.UNDERSCORE + tableName
++ CarbonCommonConstants.UNDERSCORE + loadModel.getTaskNo();
+CarbonProperties.getInstance().addProperty(tempLocationKey, 
storeLocation);
+CarbonProperties.getInstance()
+.addProperty(CarbonCommonConstants.STORE_LOCATION_HDFS, 
hdfsStoreLocation);
+// CarbonProperties.getInstance().addProperty("store_output_location", 
outPutLoc);
+CarbonProperties.getInstance().addProperty("send.signal.load", 
"false");
+
+CarbonTable carbonTable = 
loadModel.getCarbonDataLoadSchema().getCarbonTable();
+AbsoluteTableIdentifier identifier =
+carbonTable.getAbsoluteTableIdentifier();
+configuration.setTableIdentifier(identifier);
+String csvHeader = loadModel.getCsvHeader();
+if (csvHeader != null && !csvHeader.isEmpty()) {
+  
configuration.setHeader(CarbonDataProcessorUtil.getColumnFields(csvHeader, 
","));
+} else {
+  CarbonFile csvFile =
+  
CarbonDataProcessorUtil.getCsvFileToRead(loadModel.getFactFilesToProcess().get(0));
+  configuration
+  .setHeader(CarbonDataProcessorUtil.getFileHeader(csvFile, 
loadModel.getCsvDelimiter()));
+}
+
+configuration.setPartitionId(loadModel.getPartitionId());
+configuration.setSegmentId(loadModel.getSegmentId());
+configuration.setTaskNo(loadModel.getTaskNo());
+
configuration.setDataLoadProperty(DataLoadProcessorConstants.COMPLEX_DELIMITERS,
+new String[] { loadModel.getComplexDelimiterLevel1(),
+loadModel.getComplexDelimiterLevel2() });
+List dimensions =
+
carbonTable.getDimensionByTableName(carbonTable.getFactTableName());
+List measures =
+carbonTable.getMeasureByTableName(carbonTable.getFactTableName());
+DataField[] dataFields = new DataField[dimensions.size() + 
measures.size()];
+
+int i = 0;
+for (CarbonColumn column : dimensions) {
+  dataFields[i++] = new DataField(column);
+}
+for (CarbonColumn column : measures) {
+  dataFields[i++] = new DataField(column);
+}
+Iterator[] iterators = new RecordReaderIterator[recordReaders.length];
+configuration.setDataFields(dataFields);
+for (int j = 0; j < recordReaders.length; j++) {
+  iterators[j] = new RecordReaderIterator(recordReaders[j]);
+}
+new DataLoadProcessExecutor().execute(configuration, iterators);
--- End diff --

should have a CarbonTableOutputFormat and use it here, right?


> Remove kettle for loading data
> --
>
> Key: CARBONDATA-2
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Priority: Critical
> Fix For: 0.3.0-incubating
>
> Attachments: CarbonDataLoadingdesign.pdf
>
>
> Remove kettle for loading data module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-332) Create successfully Database, tables and columns using carbon reserve keywords

2016-10-31 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15621965#comment-15621965
 ] 

ASF GitHub Bot commented on CARBONDATA-332:
---

GitHub user lion-x opened a pull request:

https://github.com/apache/incubator-carbondata/pull/273

[CARBONDATA-332] Prohibit to use reserved words in database\table\col name

# Why raise this PR?
Carbon should prohibit the use of reserved words in database/table/column names.
Using reserved words will cause many unnecessary troubles.

# How to solve?
Add reserved-word validation for databases, tables and columns when they are
created.

# How to test?
Pass all test cases. 
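A minimal sketch of what such a check looks like (plain Java; the reserved-word
list below is an illustrative subset, not the actual list used by the Carbon
parser):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

public final class ReservedKeywordCheckSketch {

  // Illustrative subset only; the real list lives in the Carbon DDL parser.
  private static final Set<String> RESERVED =
      new HashSet<>(Arrays.asList("string", "int", "double", "decimal", "timestamp"));

  /** Rejects database/table/column names that are reserved words (case-insensitive). */
  public static void validateName(String name) {
    if (RESERVED.contains(name.toLowerCase(Locale.ROOT))) {
      throw new IllegalArgumentException(
          "Name '" + name + "' is a reserved keyword and cannot be used");
    }
  }

  public static void main(String[] args) {
    validateName("employee"); // ok
    validateName("decimal");  // throws, matching the scenario in the bug report
  }
}
```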



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lion-x/incubator-carbondata prohibitReserved

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/273.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #273


commit fc7a7bb7e7a8df5f3abc5e8b1b3b9c0804226869
Author: lion-x 
Date:   2016-10-31T11:46:07Z

validateCarbonReservedKeywords




> Create successfully Database, tables and columns using carbon reserve keywords
> --
>
> Key: CARBONDATA-332
> URL: https://issues.apache.org/jira/browse/CARBONDATA-332
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Harmeet Singh
>Assignee: Lionx
>
> Hey team, I am trying to create databases, tables and columns named with
> Carbon reserved keywords, and Carbon allows creating them. I am expecting an
> error; in Hive we do get an error. Following are the steps:
> Step1: 
> 0: jdbc:hive2://127.0.0.1:1> create database double;
> +-+--+
> | result  |
> +-+--+
> +-+--+
> No rows selected (6.225 seconds)
> Step 2: 
> 0: jdbc:hive2://127.0.0.1:1> use double;
> +-+--+
> | result  |
> +-+--+
> +-+--+
> No rows selected (0.104 seconds)
> Step 3:
> 0: jdbc:hive2://127.0.0.1:1> create table decimal(int int, string string) 
> stored by 'carbondata';
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (2.372 seconds)
> Step 4:
> 0: jdbc:hive2://127.0.0.1:1> show tables;
> ++--+--+
> | tableName  | isTemporary  |
> ++--+--+
> | decimal| false|
> ++--+--+
> 1 row selected (0.071 seconds)
> Step 5:
> 0: jdbc:hive2://127.0.0.1:1> desc decimal;
> +---++--+--+
> | col_name  | data_type  | comment  |
> +---++--+--+
> | string| string |  |
> | int   | bigint |  |
> +---++--+--+
> 2 rows selected (0.556 seconds)
> Step 6:
> 0: jdbc:hive2://127.0.0.1:1> load data inpath 
> 'hdfs://localhost:54310/home/harmeet/reservewords.csv' into table decimal;
> +-+--+
> | Result  |
> +-+--+
> +-+--+
> No rows selected (0.863 seconds)
> Step 7:
> 0: jdbc:hive2://127.0.0.1:1> select * from decimal;
> +-+--+--+
> | string  | int  |
> +-+--+--+
> |  james  | 10   |
> +-+--+--+
> 1 row selected (0.413 seconds)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15617340#comment-15617340
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85633017
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -470,6 +472,34 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   break;
   }
 }
+    HashMap<String, String> dateformatsHashMap = new HashMap<String, String>();
+if (meta.dateFormat != null) {
+  String[] dateformats = 
meta.dateFormat.split(CarbonCommonConstants.COMMA);
+  for (String dateFormat:dateformats) {
+String[] dateFormatSplits = dateFormat.split(":", 2);
+
dateformatsHashMap.put(dateFormatSplits[0].toLowerCase().trim(),
+dateFormatSplits[1].trim());
+  }
+}
+String[] DimensionColumnIds = meta.getDimensionColumnIds();
+directDictionaryGenerators =
+new DirectDictionaryGenerator[DimensionColumnIds.length];
+for (int i = 0; i < DimensionColumnIds.length; i++) {
+  ColumnSchemaDetails columnSchemaDetails = 
columnSchemaDetailsWrapper.get(
+  DimensionColumnIds[i]);
+  if (columnSchemaDetails.isDirectDictionary()) {
+String columnName = columnSchemaDetails.getColumnName();
+DataType columnType = columnSchemaDetails.getColumnType();
+if (dateformatsHashMap.containsKey(columnName)) {
--- End diff --

ok
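The split logic in the diff above amounts to the following standalone sketch
(map and trimming behaviour assumed from the surrounding code; this is not the
actual CarbonCSVBasedSeqGenStep method):

```java
import java.util.HashMap;
import java.util.Map;

public final class DateFormatOptionParserSketch {

  /** Parses "col1:format1, col2:format2" into a column-name -> pattern map. */
  static Map<String, String> parse(String dateFormatOption) {
    Map<String, String> formats = new HashMap<>();
    if (dateFormatOption == null || dateFormatOption.trim().isEmpty()) {
      return formats;
    }
    for (String entry : dateFormatOption.split(",")) {
      // Split on the first ':' only, so patterns such as "HH:mm:ss" stay intact.
      String[] parts = entry.split(":", 2);
      if (parts.length == 2) {
        formats.put(parts[0].toLowerCase().trim(), parts[1].trim());
      }
    }
    return formats;
  }

  public static void main(String[] args) {
    System.out.println(parse("dateofjoin:yyyy-MM-dd, logintime:yyyy/MM/dd HH:mm:ss"));
  }
}
```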



> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it is
> in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios
> where different tables, or different Timestamp columns in the same table, need
> different formats.
> Suggest providing an option in the create table DDL itself to define the format
> for each Timestamp column. Also provide defaults so that users can create
> tables with Timestamp columns without always having to define the Date/Time
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15617327#comment-15617327
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85632919
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1244,6 +1246,29 @@ case class LoadTableUsingKettle(
 Seq.empty
   }
 
+  private  def validateDateFormat(dateFormat: String, table: CarbonTable): 
Unit = {
+val dimensions = table.getDimensionByTableName(tableName).asScala
+if (dateFormat != null) {
+  if (dateFormat.trim == "") {
+throw new MalformedCarbonCommandException("Error: Option 
DateFormat is set an empty " +
--- End diff --

better to remove "Error: " for all exception message


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it 
> is present in carbon.properties and hence is global for all tables.
> This global timestamp-format configuration cannot handle scenarios where 
> different tables, or different Timestamp columns in the same table, need 
> different formats.
> Suggest providing an option in the CREATE TABLE DDL itself to define the 
> format for each Timestamp column, and also providing defaults so that users 
> can create tables with Timestamp columns without always having to define the 
> Date/Time format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-302) 7. Add DataWriterProcessorStep which reads the data from sort temp files and creates carbondata files.

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615928#comment-15615928
 ] 

ASF GitHub Bot commented on CARBONDATA-302:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/251


> 7. Add DataWriterProcessorStep which reads the data from sort temp files and 
> creates carbondata files.
> --
>
> Key: CARBONDATA-302
> URL: https://issues.apache.org/jira/browse/CARBONDATA-302
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.3.0-incubating
>
>
> Add DataWriterProcessorStep, which reads the data from sort temp files, 
> merge sorts it, applies the MDK generator on the key, and creates carbondata files.
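
The heart of this step is a k-way merge over the already-sorted temp files. Below is a self-contained Scala sketch of just that merge, with the temp-file readers reduced to plain iterators and the MDK generation plus file writing reduced to a callback, so every name here is illustrative rather than the actual CarbonData API:

    import scala.collection.mutable

    // Merge k sorted iterators (one per sort temp file) into one sorted stream,
    // handing each row to a writer callback in order.
    def mergeSorted[T](sources: Seq[Iterator[T]])(write: T => Unit)(implicit ord: Ordering[T]): Unit = {
      // min-heap of (next row, index of the source it came from)
      val heap = mutable.PriorityQueue.empty[(T, Int)](Ordering.by[(T, Int), T](_._1).reverse)
      val its = sources.toIndexedSeq
      for ((it, i) <- its.zipWithIndex if it.hasNext) heap.enqueue((it.next(), i))
      while (heap.nonEmpty) {
        val (row, i) = heap.dequeue()
        write(row)                        // stand-in for: generate the MDK key, append to the carbondata file
        if (its(i).hasNext) heap.enqueue((its(i).next(), i))
      }
    }

    // mergeSorted(Seq(Iterator(1, 4), Iterator(2, 3)))(println)   // prints 1 2 3 4

Each source iterator stands for one sort temp file; the priority queue always yields the globally smallest pending row, which is what lets the writer produce a single sorted output.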



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-343) Optimize the duplicated definition code in GlobalDictionaryUtil.scala

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15615658#comment-15615658
 ] 

ASF GitHub Bot commented on CARBONDATA-343:
---

GitHub user Hexiaoqiao opened a pull request:

https://github.com/apache/incubator-carbondata/pull/271

[CARBONDATA-343] delete some duplicated definition code

Deletes some duplicated definition code in GlobalDictionaryUtil.scala, as 
described in 
[CARBONDATA-343](https://issues.apache.org/jira/browse/CARBONDATA-343) (and, 
by the way, written while learning the new [Contributing to 
CarbonData](https://cwiki.apache.org/confluence/display/CARBONDATA/Contributing+to+CarbonData)
 guide).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Hexiaoqiao/incubator-carbondata carbon-dev

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/271.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #271


commit af810f8964772ee95dc6525375cae13010521604
Author: hexiaoqiao 
Date:   2016-10-28T14:44:57Z

CARBONDATA-343 delete some duplicated definition code

commit 5c9775d0f51baafb04423330591c8d90df9ebdff
Author: hexiaoqiao 
Date:   2016-10-28T14:52:51Z

CARBONDATA-343 delete some duplicated definition code




> Optimize the duplicated definition code in GlobalDictionaryUtil.scala 
> --
>
> Key: CARBONDATA-343
> URL: https://issues.apache.org/jira/browse/CARBONDATA-343
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Assignee: He Xiaoqiao
>Priority: Trivial
>
> The following two lines of code contain duplicated definitions:
> -
> val table = 
> carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier
> val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
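
For clarity, the de-duplicated version suggested by this issue only needs the accessors already visible in the two quoted lines: fetch the CarbonTable once and derive the table identifier from it. A minimal Scala sketch:

    // Reuse the CarbonTable reference instead of walking the same object graph twice.
    val carbonTable = carbonLoadModel.getCarbonDataLoadSchema.getCarbonTable
    val table = carbonTable.getAbsoluteTableIdentifier.getCarbonTableIdentifier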



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614544#comment-15614544
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480694
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

ok


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html
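
To make Goal 2 concrete: a pluggable index reduces to a small contract that every index implementation must satisfy, namely "given a filter, return the blocks that may contain matching rows". A Scala sketch with entirely hypothetical names (the actual interfaces in the related PR live under org.apache.carbondata.hadoop.internal.index):

    // Hypothetical pluggable-index contract: anything that can prune blocks for a filter.
    trait TableIndex[Filter, Block] {
      def name: String
      def filter(condition: Option[Filter], allBlocks: Seq[Block]): Seq[Block]
    }

    // Trivial implementation that prunes nothing; a B+ tree, bitmap or min/max
    // index would plug in here by returning a smaller subset of blocks.
    class FullScanIndex[F, B] extends TableIndex[F, B] {
      def name: String = "full-scan"
      def filter(condition: Option[F], allBlocks: Seq[B]): Seq[B] = allBlocks
    }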



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-28 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614540#comment-15614540
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85480562
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do 
file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+super(name, path);
+this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf 
filterResolver)
+  throws IOException {
+// do as following
+// 1. create the index or get from cache by the filter name in the 
configuration
+// 2. filter by index to get the filtered block
+// 3. create input split from filtered block
+
+List output = new LinkedList<>();
+Index index = loader.load(job.getConfiguration());
+List blocks = index.filter(job, filterResolver);
--- End diff --

You are right, I will modify it.


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html
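
Goal 1 essentially asks for the index to sit behind a small store abstraction, so the same lookup code works whether the index lives in the driver's memory or in an external shared service. A Scala sketch, with all names hypothetical:

    // Hypothetical abstraction over where index data is kept.
    trait IndexStore[K, V] {
      def get(key: K): Option[V]
      def put(key: K, value: V): Unit
    }

    // In-process variant, e.g. held in Spark driver memory.
    class InMemoryIndexStore[K, V] extends IndexStore[K, V] {
      private val data = scala.collection.mutable.Map.empty[K, V]
      def get(key: K): Option[V] = data.get(key)
      def put(key: K, value: V): Unit = data.update(key, value)
    }
    // An external-service variant would implement the same trait on top of a shared
    // database or cache, so several clients can reuse one copy of the index.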



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614252#comment-15614252
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85470293
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do 
file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+super(name, path);
+this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf 
filterResolver)
+  throws IOException {
+// do as following
+// 1. create the index or get from cache by the filter name in the 
configuration
+// 2. filter by index to get the filtered block
+// 3. create input split from filtered block
+
+List output = new LinkedList<>();
+Index index = loader.load(job.getConfiguration());
--- End diff --

if the loader internally implements a cache then we can keep it as `IndexLoader` only.
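
One way to read "the loader internally implements a cache": wrap any loader so that repeated getSplits calls reuse the index already built for a segment. A Scala sketch in which the Index and IndexLoader types are stand-ins for the real interfaces; only the load(Configuration) call mirrors the code above, and the choice of the segment path as cache key is an assumption:

    import org.apache.hadoop.conf.Configuration
    import scala.collection.concurrent.TrieMap

    trait Index
    trait IndexLoader { def load(conf: Configuration): Index }

    // Builds the index at most once per segment path and reuses it afterwards.
    class CachingIndexLoader(underlying: IndexLoader, segmentPath: String) extends IndexLoader {
      def load(conf: Configuration): Index =
        CachingIndexLoader.cache.getOrElseUpdate(segmentPath, underlying.load(conf))
    }

    object CachingIndexLoader {
      private val cache = TrieMap.empty[String, Index]
    }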


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614128#comment-15614128
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85466134
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
 ---
@@ -0,0 +1,74 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataload
+
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import java.sql.Timestamp
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import org.apache.spark.sql.Row
+
+class TestLoadDataWithDiffTimestampFormat extends QueryTest with 
BeforeAndAfterAll {
+  override def beforeAll {
+sql("DROP TABLE IF EXISTS t3")
+sql("""
+   CREATE TABLE IF NOT EXISTS t3
+   (ID Int, date Timestamp, starttime Timestamp, country String,
+   name String, phonetype String, serialname String, salary Int)
+   STORED BY 'carbondata'
+""")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, 
"/MM/dd")
+  }
+
+  test("test load data with different timestamp format") {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'starttime:yyyy-MM-dd HH:mm:ss')
+   """)
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData2.csv' into table t3
+   OPTIONS('dateformat' = 'date:yyyy-MM-dd,starttime:yyyy/MM/dd 
HH:mm:ss')
+   """)
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0")))
+  )
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0")))
+  )
+  }
--- End diff --

ok


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it 
> is present in carbon.properties and hence is global for all tables.
> This global timestamp-format configuration cannot handle scenarios where 
> different tables, or different Timestamp columns in the same table, need 
> different formats.
> Suggest providing an option in the CREATE TABLE DDL itself to define the 
> format for each Timestamp column, and also providing defaults so that users 
> can create tables with Timestamp columns without always having to define the 
> Date/Time format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614123#comment-15614123
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85466025
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1129,6 +1130,9 @@ case class LoadTable(
   carbonLoadModel.setEscapeChar(escapeChar)
   carbonLoadModel.setQuoteChar(quoteChar)
   carbonLoadModel.setCommentChar(commentchar)
+  carbonLoadModel.setDateFormat(dateFormat)
--- End diff --

ok


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it 
> is present in carbon.properties and hence is global for all tables.
> This global timestamp-format configuration cannot handle scenarios where 
> different tables, or different Timestamp columns in the same table, need 
> different formats.
> Suggest providing an option in the CREATE TABLE DDL itself to define the 
> format for each Timestamp column, and also providing defaults so that users 
> can create tables with Timestamp columns without always having to define the 
> Date/Time format.
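
As a usage illustration of the proposal: the syntax below follows the dateformat load option exercised by the pull request under review in this thread, with sql being the helper used in the PR's test suite; the table, column names and file path are made up.

    // Two Timestamp columns in the same table loaded with different formats,
    // without touching the global setting in carbon.properties.
    sql("""
      LOAD DATA LOCAL INPATH './data/sample.csv' INTO TABLE t3
      OPTIONS('dateformat' = 'date:yyyy-MM-dd,starttime:yyyy/MM/dd HH:mm:ss')
    """)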



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614117#comment-15614117
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85465773
  
--- Diff: 
integration/spark/src/test/scala/org/apache/carbondata/spark/testsuite/dataload/TestLoadDataWithDiffTimestampFormat.scala
 ---
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.spark.testsuite.dataload
+
+import org.apache.spark.sql.common.util.CarbonHiveContext._
+import org.apache.spark.sql.common.util.QueryTest
+import org.scalatest.BeforeAndAfterAll
+import java.sql.Timestamp
+
+import org.apache.carbondata.core.constants.CarbonCommonConstants
+import org.apache.carbondata.core.util.CarbonProperties
+import 
org.apache.carbondata.spark.exception.MalformedCarbonCommandException
+import org.apache.spark.sql.Row
+
+class TestLoadDataWithDiffTimestampFormat extends QueryTest with 
BeforeAndAfterAll {
+  override def beforeAll {
+sql("DROP TABLE IF EXISTS t3")
+sql("""
+   CREATE TABLE IF NOT EXISTS t3
+   (ID Int, date Timestamp, starttime Timestamp, country String,
+   name String, phonetype String, serialname String, salary Int)
+   STORED BY 'carbondata'
+""")
+CarbonProperties.getInstance()
+  .addProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, 
"/MM/dd")
+  }
+
+  test("test load data with different timestamp format") {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'starttime:yyyy-MM-dd HH:mm:ss')
+   """)
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData2.csv' into table t3
+   OPTIONS('dateformat' = 'date:yyyy-MM-dd,starttime:yyyy/MM/dd 
HH:mm:ss')
+   """)
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2015-07-23 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 1"),
+Seq(Row(Timestamp.valueOf("2016-07-23 01:01:30.0")))
+  )
+  checkAnswer(
+sql("SELECT date FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2015-07-25 00:00:00.0")))
+  )
+  checkAnswer(
+sql("SELECT starttime FROM t3 WHERE ID = 18"),
+Seq(Row(Timestamp.valueOf("2016-07-25 02:32:02.0")))
+  )
+  }
+
+  test("test load data with different timestamp format with being set an 
empty string") {
+try {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = '')
+   """)
+  assert(false)
+} catch {
+  case ex: MalformedCarbonCommandException =>
+assertResult(ex.getMessage)("Error: Option DateFormat is set an 
empty string.")
+  case _ => assert(false)
+}
+  }
+
+  test("test load data with different timestamp format with a wrong column 
name") {
+try {
+  sql(s"""
+   LOAD DATA LOCAL INPATH 
'./src/test/resources/timeStampFormatData1.csv' into table t3
+   OPTIONS('dateformat' = 'fasfdas:yyyy/MM/dd')
+   """)
+  assert(false)
+} catch {
+  case ex: MalformedCarbonCommandException =>
+assertResult(ex.getMessage)("Error: Wrong Column Name fasfdas is 
provided in Option DateFormat.")
+  case _ => assert(false)
+}
+  }
+
+  test("test load data with different timestamp format with a timestamp 
column is set an empty string") {
+try {
+  

[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614074#comment-15614074
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
--- End diff --

please use internal.CarbonInputSplit


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614003#comment-15614003
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459633
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1156,6 +1169,9 @@ case class LoadTableUsingKettle(
   carbonLoadModel.setEscapeChar(escapeChar)
   carbonLoadModel.setQuoteChar(quoteChar)
   carbonLoadModel.setCommentChar(commentchar)
+  carbonLoadModel.setDateFormat(dateFormat)
+  
carbonLoadModel.setSerializationNullFormat("serialization_null_format" + "," +
+serializationNullFormat)
--- End diff --

this code is useless


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it 
> is present in carbon.properties and hence is global for all tables.
> This global timestamp-format configuration cannot handle scenarios where 
> different tables, or different Timestamp columns in the same table, need 
> different formats.
> Suggest providing an option in the CREATE TABLE DDL itself to define the 
> format for each Timestamp column, and also providing defaults so that users 
> can create tables with Timestamp columns without always having to define the 
> Date/Time format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15614004#comment-15614004
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460589
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java
 ---
@@ -343,7 +345,8 @@ public boolean processRow(StepMetaInterface smi, 
StepDataInterface sdi) throws K
   }
 
   data.setGenerator(
-  
KeyGeneratorFactory.getKeyGenerator(getUpdatedLens(meta.dimLens, 
meta.dimPresent)));
+  KeyGeneratorFactory.getKeyGenerator(
+  getUpdatedLens(meta.dimLens, meta.dimPresent)));
--- End diff --

keep code style


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it 
> is present in carbon.properties and hence is global for all tables.
> This global timestamp-format configuration cannot handle scenarios where 
> different tables, or different Timestamp columns in the same table, need 
> different formats.
> Suggest providing an option in the CREATE TABLE DDL itself to define the 
> format for each Timestamp column, and also providing defaults so that users 
> can create tables with Timestamp columns without always having to define the 
> Date/Time format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613998#comment-15613998
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459810
  
--- Diff: 
integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala
 ---
@@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle(
   val allDictionaryPath = options.getOrElse("all_dictionary_path", "")
   val complex_delimiter_level_1 = 
options.getOrElse("complex_delimiter_level_1", "\\$")
   val complex_delimiter_level_2 = 
options.getOrElse("complex_delimiter_level_2", "\\:")
+  val timeFormat = options.getOrElse("timeformat", null)
+  val dateFormat = options.getOrElse("dateformat", null)
+  val tableDimensions: util.List[CarbonDimension] = 
table.getDimensionByTableName(tableName)
+  val dateDimensionsName = new ArrayBuffer[String]
+  tableDimensions.toArray.foreach {
+dimension => {
+  val columnSchema: ColumnSchema = 
dimension.asInstanceOf[CarbonDimension].getColumnSchema
+  if (columnSchema.getDataType.name == "TIMESTAMP") {
+dateDimensionsName += columnSchema.getColumnName
+  }
+}
+  }
+  if (dateFormat != null) {
+validateDateFormat(dateFormat, dateDimensionsName)
+  }
--- End diff --

please move this code into the validateDateFormat method
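
A sketch of what the consolidated method could look like, folding the timestamp-dimension collection from the diff above into validateDateFormat itself. It reuses only accessors visible in this diff and assumes the imports already present in carbonTableSchema.scala; the signature change and the exact error wording are illustrative, and the "Error: " prefix is dropped as suggested elsewhere in this review.

    private def validateDateFormat(dateFormat: String, table: CarbonTable, tableName: String): Unit = {
      if (dateFormat != null) {
        if (dateFormat.trim.isEmpty) {
          throw new MalformedCarbonCommandException("Option DateFormat is set to an empty string.")
        }
        // Columns that may legally appear in the option: the table's Timestamp dimensions.
        val timestampColumns = table.getDimensionByTableName(tableName).asScala
          .map(_.getColumnSchema)
          .filter(_.getDataType.name == "TIMESTAMP")
          .map(_.getColumnName.toLowerCase)
          .toSet
        dateFormat.split(",").foreach { entry =>
          val columnName = entry.split(":", 2)(0).trim.toLowerCase
          if (!timestampColumns.contains(columnName)) {
            throw new MalformedCarbonCommandException(
              s"Wrong Column Name $columnName is provided in Option DateFormat.")
          }
        }
      }
    }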


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it 
> is present in carbon.properties and hence is global for all tables.
> This global timestamp-format configuration cannot handle scenarios where 
> different tables, or different Timestamp columns in the same table, need 
> different formats.
> Suggest providing an option in the CREATE TABLE DDL itself to define the 
> format for each Timestamp column, and also providing defaults so that users 
> can create tables with Timestamp columns without always having to define the 
> Date/Time format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15613866#comment-15613866
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85457078
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param 
+ */
+public class CarbonTableInputFormat extends FileInputFormat {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
--- End diff --

accept


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15612029#comment-15612029
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85346673
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java
 ---
@@ -0,0 +1,73 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.segment.impl;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Block;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.index.IndexLoader;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+/**
+ * This segment is backed by index, thus getSplits can use the index to do 
file pruning.
+ */
+public class IndexedSegment extends Segment {
+
+  private IndexLoader loader;
+
+  public IndexedSegment(String name, String path, IndexLoader loader) {
+super(name, path);
+this.loader = loader;
+  }
+
+  @Override
+  public List getSplits(JobContext job, FilterResolverIntf 
filterResolver)
+  throws IOException {
+// do as following
+// 1. create the index or get from cache by the filter name in the 
configuration
+// 2. filter by index to get the filtered block
+// 3. create input split from filtered block
+
+List output = new LinkedList<>();
+Index index = loader.load(job.getConfiguration());
--- End diff --

Is it required to load the index every time?
I guess we are just creating the instance of the index here, so why don't you 
use a factory here?


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611953#comment-15611953
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85340545
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.index.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import 
org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import 
org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import 
org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import 
org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import 
org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import 
org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = 
LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
--- End diff --

It seems the method return type is incompatible.


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the 

[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611994#comment-15611994
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85343636
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/impl/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.index.impl;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import 
org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import 
org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import 
org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import 
org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import 
org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import 
org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = 
LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
--- End diff --

I guess we are supposed to pass a list of valid segments here.


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional 

[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611939#comment-15611939
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85339106
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/CarbonFormat.java ---
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal;
+
+public enum CarbonFormat {
+  COLUMNR
--- End diff --

typo : COLUMNAR


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve 
> the following goals:
> Goal 1: The user can choose where to store index data; it can be stored in
> the processing framework's memory space (for example in Spark driver memory) or in
> another service outside of the processing framework (for example an
> independent database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key which CarbonData currently supports,
> developers are free to add other indexing technology to make certain
> workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed on the mailing list: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611920#comment-15611920
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85337928
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/api/CarbonTableInputFormat.java
 ---
@@ -0,0 +1,169 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.api;
+
+import java.io.IOException;
+import java.util.LinkedList;
+import java.util.List;
+
+import org.apache.carbondata.hadoop.CarbonProjection;
+import org.apache.carbondata.hadoop.internal.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.internal.segment.SegmentManager;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import org.apache.carbondata.hadoop.util.ObjectSerializationUtil;
+import org.apache.carbondata.scan.expression.Expression;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+import org.apache.hadoop.mapreduce.RecordReader;
+import org.apache.hadoop.mapreduce.TaskAttemptContext;
+import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
+
+/**
+ * Input format of CarbonData file.
+ * @param <T>
+ */
+public class CarbonTableInputFormat<T> extends FileInputFormat<Void, T> {
+
+  private static final String FILTER_PREDICATE =
+  "mapreduce.input.carboninputformat.filter.predicate";
+
+  private SegmentManager segmentManager;
+
+  public CarbonTableInputFormat(SegmentManager segmentManager) {
+this.segmentManager = segmentManager;
+  }
+
+  @Override
+  public RecordReader createRecordReader(InputSplit split,
+  TaskAttemptContext context) throws IOException, InterruptedException 
{
+switch (((CarbonInputSplit)split).formatType()) {
--- End diff --

Why don't you take the formatType from the job conf? It is better not to touch 
the InputSplit, as it comes from outside. 
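
A sketch of what reading the format from the job configuration could look like; the
property key and the reader factory method are assumptions for illustration, not
confirmed CarbonData APIs:

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

    // Sketch only: the format type comes from the job conf, not from the InputSplit.
    public abstract class FormatAwareInputFormat<T> extends FileInputFormat<Void, T> {

      // Hypothetical property key.
      private static final String FORMAT_TYPE = "mapreduce.input.carboninputformat.format.type";

      public static void setFormatType(Configuration conf, String type) {
        conf.set(FORMAT_TYPE, type);
      }

      @Override
      public RecordReader<Void, T> createRecordReader(InputSplit split, TaskAttemptContext context)
          throws IOException, InterruptedException {
        String formatType = context.getConfiguration().get(FORMAT_TYPE, "columnar");
        if (!"columnar".equals(formatType)) {
          throw new IOException("Unsupported format type: " + formatType);
        }
        return newColumnarReader(split, context);
      }

      // Left abstract in this sketch; real code would construct the concrete columnar reader.
      protected abstract RecordReader<Void, T> newColumnarReader(InputSplit split,
          TaskAttemptContext context) throws IOException;
    }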


> Abstracting Index and Segment interface
> ---
>
> Key: CARBONDATA-284
> URL: https://issues.apache.org/jira/browse/CARBONDATA-284
> Project: CarbonData
>  Issue Type: Improvement
>  Components: hadoop-integration
>Affects Versions: 0.1.0-incubating
>Reporter: Jacky Li
> Fix For: 0.3.0-incubating
>
>
> This issue is intended to abstract the developer API and user API to achieve
> the following goals:
> Goal 1: Users can choose where to store index data; it can be stored in the
> processing framework's memory space (like the Spark driver memory) or in
> another service outside of the processing framework (like an independent
> database service, which can be shared across clients).
> Goal 2: Developers can add more indexes of their choice to CarbonData files.
> Besides the B+ tree on the multi-dimensional key that CarbonData currently
> supports, developers are free to add other indexing technologies to make
> certain workloads faster. These new indices should be added in a pluggable way.
> This Jira has been discussed in maillist: 
> http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Abstracting-CarbonData-s-Index-Interface-td1587.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-2) Remove kettle for loading data

2016-10-27 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-2?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15611774#comment-15611774
 ] 

ASF GitHub Bot commented on CARBONDATA-2:
-

GitHub user ravipesala opened a pull request:

https://github.com/apache/incubator-carbondata/pull/263

[CARBONDATA-2][WIP] Data load integration of all steps for removing kettle

This PR integrates all data load steps into the main flow. 
The DataWriterStep still needs to be integrated, and testing is pending.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ravipesala/incubator-carbondata 
data-load-integration

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/263.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #263


commit c4bd3a14e3e1f437d365c9f9e4dc21b2d69f56ec
Author: ravipesala 
Date:   2016-10-27T03:44:32Z

WIP Integrating new dataloading flow

commit 6aa1e738c02e2906b43b372bcad0ed8096962ddf
Author: ravipesala 
Date:   2016-10-27T12:41:11Z

Integrated data processor steps to new flow.




> Remove kettle for loading data
> --
>
> Key: CARBONDATA-2
> URL: https://issues.apache.org/jira/browse/CARBONDATA-2
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Liang Chen
>Priority: Critical
> Fix For: 0.3.0-incubating
>
> Attachments: CarbonDataLoadingdesign.pdf
>
>
> Remove kettle for loading data module



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-302) 7. Add DataWriterProcessorStep which reads the data from sort temp files and creates carbondata files.

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610793#comment-15610793
 ] 

ASF GitHub Bot commented on CARBONDATA-302:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/251#discussion_r85270229
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/util/CarbonDataProcessorUtil.java
 ---
@@ -304,4 +311,92 @@ public static String getLocalDataFolderLocation(String 
databaseName, String tabl
 return ArrayUtils
 .toPrimitive(noDictionaryMapping.toArray(new 
Boolean[noDictionaryMapping.size()]));
   }
+
+  /**
+   * Preparing the boolean [] to map whether the dimension use inverted 
index or not.
+   */
+  public static boolean[] getIsUseInvertedIndex(DataField[] fields) {
+List isUseInvertedIndexList = new ArrayList();
+for (DataField field : fields) {
+  if (field.getColumn().isUseInvertedIndnex() && 
field.getColumn().isDimesion()) {
+isUseInvertedIndexList.add(true);
+  } else if(field.getColumn().isDimesion()){
+isUseInvertedIndexList.add(false);
+  }
+}
+return ArrayUtils
+.toPrimitive(isUseInvertedIndexList.toArray(new 
Boolean[isUseInvertedIndexList.size()]));
+  }
+
+  private static String getComplexTypeString(DataField[] dataFields) {
+StringBuilder dimString = new StringBuilder();
+for (int i = 0; i < dataFields.length; i++) {
+  DataField dataField = dataFields[i];
+  if (dataField.getColumn().getDataType().equals(DataType.ARRAY) || 
dataField.getColumn()
+  .getDataType().equals(DataType.STRUCT)) {
+addAllComplexTypeChildren((CarbonDimension) dataField.getColumn(), 
dimString, "");
+dimString.append(CarbonCommonConstants.SEMICOLON_SPC_CHARACTER);
+  }
+}
+return dimString.toString();
+  }
+
+  /**
+   * This method will return all the child dimensions under complex 
dimension
+   *
+   */
+  private static void addAllComplexTypeChildren(CarbonDimension dimension, 
StringBuilder dimString,
+  String parent) {
+dimString.append(
+dimension.getColName() + CarbonCommonConstants.COLON_SPC_CHARACTER 
+ dimension.getDataType()
--- End diff --

ok


> 7. Add DataWriterProcessorStep which reads the data from sort temp files and 
> creates carbondata files.
> --
>
> Key: CARBONDATA-302
> URL: https://issues.apache.org/jira/browse/CARBONDATA-302
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.3.0-incubating
>
>
> Add DataWriterProcessorStep, which reads the data from sort temp files, 
> merge sorts it, applies the mdk generator on the key, and creates carbondata files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-302) 7. Add DataWriterProcessorStep which reads the data from sort temp files and creates carbondata files.

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610795#comment-15610795
 ] 

ASF GitHub Bot commented on CARBONDATA-302:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/251#discussion_r85270264
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java
 ---
@@ -0,0 +1,360 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps.writer;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.carbon.CarbonTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.metadata.CarbonMetadata;
+import org.apache.carbondata.core.carbon.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.carbon.path.CarbonStorePath;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.constants.IgnoreDictionary;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+import org.apache.carbondata.processing.store.CarbonFactHandler;
+import org.apache.carbondata.processing.store.CarbonFactHandlerFactory;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * It reads data from sorted files which are generated in previous sort 
step.
+ * And it writes data to carbondata file. It also generates mdk key while 
writing to carbondata file
+ */
+public class DataWriterProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName());
+
+  private String storeLocation;
+
+  private boolean[] isUseInvertedIndex;
+
+  private int[] dimLens;
+
+  private int dimensionCount;
+
+  private List wrapperColumnSchema;
+
+  private int[] colCardinality;
+
+  private SegmentProperties segmentProperties;
+
+  private KeyGenerator keyGenerator;
+
+  private CarbonFactHandler dataHandler;
+
+  private Map complexIndexMap;
+
+  private int noDictionaryCount;
+
+  private int complexDimensionCount;
+
+  private int measureCount;
+
+  private long readCounter;
+
+  private long 

[jira] [Commented] (CARBONDATA-302) 7. Add DataWriterProcessorStep which reads the data from sort temp files and creates carbondata files.

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610712#comment-15610712
 ] 

ASF GitHub Bot commented on CARBONDATA-302:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/251#discussion_r85267495
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java
 ---
@@ -0,0 +1,360 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps.writer;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.carbon.CarbonTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.metadata.CarbonMetadata;
+import org.apache.carbondata.core.carbon.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.carbon.path.CarbonStorePath;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.constants.IgnoreDictionary;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+import org.apache.carbondata.processing.store.CarbonFactHandler;
+import org.apache.carbondata.processing.store.CarbonFactHandlerFactory;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * It reads data from sorted files which are generated in previous sort 
step.
+ * And it writes data to carbondata file. It also generates mdk key while 
writing to carbondata file
+ */
+public class DataWriterProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName());
+
+  private String storeLocation;
+
+  private boolean[] isUseInvertedIndex;
+
+  private int[] dimLens;
+
+  private int dimensionCount;
+
+  private List wrapperColumnSchema;
+
+  private int[] colCardinality;
+
+  private SegmentProperties segmentProperties;
+
+  private KeyGenerator keyGenerator;
+
+  private CarbonFactHandler dataHandler;
+
+  private Map complexIndexMap;
+
+  private int noDictionaryCount;
+
+  private int complexDimensionCount;
+
+  private int measureCount;
+
+  private long readCounter;
+
+  private long 

[jira] [Commented] (CARBONDATA-302) 7. Add DataWriterProcessorStep which reads the data from sort temp files and creates carbondata files.

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610709#comment-15610709
 ] 

ASF GitHub Bot commented on CARBONDATA-302:
---

Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/251#discussion_r85267443
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/store/CarbonFactHandlerFactory.java
 ---
@@ -0,0 +1,48 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.processing.store;
+
+/**
+ * Factory class for CarbonFactHandler.
+ */
+public final class CarbonFactHandlerFactory {
+
+  /**
+   * Creating fact handler to write data.
+   * @param model
+   * @param handlerType
+   * @return
+   */
+  public static CarbonFactHandler 
createCarbonFactHandler(CarbonFactDataHandlerModel model,
--- End diff --

Yes, I don't see the advantage of using a semaphore here because we are 
already using a fixed thread pool to control the threads. I will discuss with 
the team and confirm whether it is needed. 
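
For reference, a minimal sketch of the point: a fixed-size pool already bounds how many
writer tasks run at once, so an extra semaphore would only add a second limit. The class
below is purely illustrative and not a CarbonData API:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    // Illustration only: the pool size itself caps concurrency.
    public final class BoundedWriterPool {
      public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(4); // at most 4 tasks run concurrently
        for (int i = 0; i < 16; i++) {
          final int task = i;
          pool.submit(() -> System.out.println("writing block " + task
              + " on " + Thread.currentThread().getName()));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
      }
    }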


> 7. Add DataWriterProcessorStep which reads the data from sort temp files and 
> creates carbondata files.
> --
>
> Key: CARBONDATA-302
> URL: https://issues.apache.org/jira/browse/CARBONDATA-302
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Ravindra Pesala
> Fix For: 0.3.0-incubating
>
>
> Add DataWriterProcessorStep, which reads the data from sort temp files, 
> merge sorts it, applies the mdk generator on the key, and creates carbondata files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-308) Use CarbonInputFormat in CarbonScanRDD compute

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610586#comment-15610586
 ] 

ASF GitHub Bot commented on CARBONDATA-308:
---

GitHub user jackylk opened a pull request:

https://github.com/apache/incubator-carbondata/pull/262

[CARBONDATA-308] [WIP] Use CarbonInputFormat in CarbonScanRDD compute

Use CarbonInputFormat in CarbonScanRDD compute function

1. On the driver side, only getSplit is required, so only the filter condition is 
needed; there is no need to create a full QueryModel object, so creation of the 
QueryModel is moved from the driver side to the executor side.
2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead of 
using QueryExecutor directly (a conceptual sketch follows below).
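
Conceptually, the executor side then just drives a Hadoop RecordReader obtained from the
input format. A rough sketch in plain MapReduce terms (the class name and the Object value
type are placeholders, not the actual CarbonScanRDD code):

    import java.io.IOException;
    import org.apache.hadoop.mapreduce.InputFormat;
    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hadoop.mapreduce.RecordReader;
    import org.apache.hadoop.mapreduce.TaskAttemptContext;

    // Illustration of the executor-side pattern: create, initialize, iterate, close.
    public final class RecordReaderDriver {
      public static long countRows(InputFormat<Void, Object> format, InputSplit split,
          TaskAttemptContext context) throws IOException, InterruptedException {
        long rows = 0;
        RecordReader<Void, Object> reader = format.createRecordReader(split, context);
        try {
          reader.initialize(split, context); // QueryModel-style setup happens here, executor-side
          while (reader.nextKeyValue()) {
            reader.getCurrentValue();        // hand the row to the consumer (e.g. an iterator)
            rows++;
          }
        } finally {
          reader.close();
        }
        return rows;
      }
    }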

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jackylk/incubator-carbondata scanrdd

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/262.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #262


commit ef4a889db9b86653c273794c9a810a9cd9683437
Author: jackylk 
Date:   2016-10-22T18:43:53Z

use CarbonInputFormat in executor

commit a5c17f523c7127b538cc2d384cbff4fa454a007a
Author: jackylk 
Date:   2016-10-27T04:01:36Z

modify getPartition




> Use CarbonInputFormat in CarbonScanRDD compute
> --
>
> Key: CARBONDATA-308
> URL: https://issues.apache.org/jira/browse/CARBONDATA-308
> Project: CarbonData
>  Issue Type: Sub-task
>  Components: spark-integration
>Reporter: Jacky Li
> Fix For: 0.2.0-incubating
>
>
> Take CarbonScanRDD as the target RDD and modify it as follows:
> 1. On the driver side, only getSplit is required, so only the filter condition is 
> needed; there is no need to create a full QueryModel object, so we can move 
> creation of the QueryModel from the driver side to the executor side.
> 2. Use CarbonInputFormat.createRecordReader in CarbonScanRDD.compute instead 
> of using QueryExecutor directly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-339) Align storePath name in generateGlobalDictionary() of GlobalDictionaryUtil.scala

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610471#comment-15610471
 ] 

ASF GitHub Bot commented on CARBONDATA-339:
---

GitHub user hseagle opened a pull request:

https://github.com/apache/incubator-carbondata/pull/261

fix issue carbondata-339

Fix JIRA issue CARBONDATA-339: replace hdfsLocation with storePath in the 
function generateGlobalDictionary.


https://issues.apache.org/jira/browse/CARBONDATA-339

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/hseagle/incubator-carbondata carbondata-339

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/261.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #261


commit 64d4d6daaf6e8adede6cfffe94221d20f365631c
Author: hseagle 
Date:   2016-10-27T02:55:53Z

fix issue carbondata-339




> Align storePath name in generateGlobalDictionary() of 
> GlobalDictionaryUtil.scala
> 
>
> Key: CARBONDATA-339
> URL: https://issues.apache.org/jira/browse/CARBONDATA-339
> Project: CarbonData
>  Issue Type: Bug
>Reporter: Liang Chen
>Assignee: pengxu
>Priority: Trivial
> Fix For: 0.2.0-incubating
>
>
> Align the storePath name in generateGlobalDictionary() of 
> GlobalDictionaryUtil.scala: change all "hdfsLocation" to "storePath".
> The store path can be any path, not only an HDFS path, so the name needs to change.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610394#comment-15610394
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85256460
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java
 ---
@@ -111,7 +110,7 @@
   /**
* timeFormat
*/
-  protected SimpleDateFormat timeFormat;
+  protected String dateFormat;
--- End diff --

ok



> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610231#comment-15610231
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85250472
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java
 ---
@@ -651,6 +654,7 @@ public void setDefault() {
 columnSchemaDetails = "";
 columnsDataTypeString="";
 tableOption = "";
+dateFormat = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT;
--- End diff --

ok


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610210#comment-15610210
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85249559
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java
 ---
@@ -39,37 +39,32 @@
  */
 public class TimeStampDirectDictionaryGenerator implements 
DirectDictionaryGenerator {
 
-  private TimeStampDirectDictionaryGenerator() {
+  private ThreadLocal threadLocal = new ThreadLocal<>();
 
-  }
-
-  public static TimeStampDirectDictionaryGenerator instance =
-  new TimeStampDirectDictionaryGenerator();
+  private String dateFormat;
 
   /**
* The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis.
*/
-  public static final long granularityFactor;
+  public  long granularityFactor;
   /**
* The date timestamp to be considered as start date for calculating the 
timestamp
* java counts the number of milliseconds from  start of "January 1, 
1970", this property is
* customized the start of position. for example "January 1, 2000"
*/
-  public static final long cutOffTimeStamp;
+  public  long cutOffTimeStamp;
   /**
* Logger instance
*/
+
   private static final LogService LOGGER =
-  
LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName());
+  
LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName());
--- End diff --

done


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15610197#comment-15610197
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user lion-x commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85248921
  
--- Diff: 
processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java
 ---
@@ -37,7 +37,7 @@
   private int surrogateKey = -1;
 
   @Before public void setUp() throws Exception {
-TimeStampDirectDictionaryGenerator generator = 
TimeStampDirectDictionaryGenerator.instance;
+TimeStampDirectDictionaryGenerator generator = new 
TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
--- End diff --

This is a test file; I think the TimeStampDirectDictionaryGenerator 
should be constructed with 'CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT' for 
testing. Please check again.
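
A minimal sketch of such a test setup, assuming the constructor shown in the diff; the
generateDirectSurrogateKey call and the default pattern string are assumptions used only
for illustration:

    import org.apache.carbondata.core.keygenerator.directdictionary.timestamp.TimeStampDirectDictionaryGenerator;
    import org.junit.Before;
    import org.junit.Test;
    import static org.junit.Assert.assertTrue;

    public class TimeStampDirectDictionaryGeneratorSketchTest {

      // Assumed to match CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT.
      private static final String DEFAULT_FORMAT = "yyyy-MM-dd HH:mm:ss";

      private TimeStampDirectDictionaryGenerator generator;

      @Before public void setUp() {
        generator = new TimeStampDirectDictionaryGenerator(DEFAULT_FORMAT);
      }

      @Test public void generatesSurrogateForValidTimestamp() {
        // Method name assumed from the DirectDictionaryGenerator interface.
        int key = generator.generateDirectSurrogateKey("2016-10-26 12:00:00");
        assertTrue(key > 0);
      }
    }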


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-302) 7. Add DataWriterProcessorStep which reads the data from sort temp files and creates carbondata files.

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608849#comment-15608849
 ] 

ASF GitHub Bot commented on CARBONDATA-302:
---

Github user jackylk commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/251#discussion_r85157225
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/newflow/steps/writer/DataWriterProcessorStepImpl.java
 ---
@@ -0,0 +1,360 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.carbondata.processing.newflow.steps.writer;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.Iterator;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.common.logging.LogService;
+import org.apache.carbondata.common.logging.LogServiceFactory;
+import org.apache.carbondata.core.carbon.CarbonTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.metadata.CarbonMetadata;
+import org.apache.carbondata.core.carbon.metadata.schema.table.CarbonTable;
+import 
org.apache.carbondata.core.carbon.metadata.schema.table.column.ColumnSchema;
+import org.apache.carbondata.core.carbon.path.CarbonStorePath;
+import org.apache.carbondata.core.carbon.path.CarbonTablePath;
+import org.apache.carbondata.core.constants.CarbonCommonConstants;
+import org.apache.carbondata.core.constants.IgnoreDictionary;
+import org.apache.carbondata.core.keygenerator.KeyGenerator;
+import org.apache.carbondata.core.util.CarbonProperties;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.core.util.CarbonUtil;
+import org.apache.carbondata.processing.datatypes.GenericDataType;
+import 
org.apache.carbondata.processing.newflow.AbstractDataLoadProcessorStep;
+import 
org.apache.carbondata.processing.newflow.CarbonDataLoadConfiguration;
+import org.apache.carbondata.processing.newflow.DataField;
+import 
org.apache.carbondata.processing.newflow.constants.DataLoadProcessorConstants;
+import 
org.apache.carbondata.processing.newflow.exception.CarbonDataLoadingException;
+import org.apache.carbondata.processing.newflow.row.CarbonRow;
+import org.apache.carbondata.processing.newflow.row.CarbonRowBatch;
+import org.apache.carbondata.processing.store.CarbonDataFileAttributes;
+import org.apache.carbondata.processing.store.CarbonFactDataHandlerModel;
+import org.apache.carbondata.processing.store.CarbonFactHandler;
+import org.apache.carbondata.processing.store.CarbonFactHandlerFactory;
+import 
org.apache.carbondata.processing.store.writer.exception.CarbonDataWriterException;
+import org.apache.carbondata.processing.util.CarbonDataProcessorUtil;
+
+/**
+ * It reads data from sorted files which are generated in previous sort 
step.
+ * And it writes data to carbondata file. It also generates mdk key while 
writing to carbondata file
+ */
+public class DataWriterProcessorStepImpl extends 
AbstractDataLoadProcessorStep {
+
+  private static final LogService LOGGER =
+  
LogServiceFactory.getLogService(DataWriterProcessorStepImpl.class.getName());
+
+  private String storeLocation;
+
+  private boolean[] isUseInvertedIndex;
+
+  private int[] dimLens;
+
+  private int dimensionCount;
+
+  private List wrapperColumnSchema;
+
+  private int[] colCardinality;
+
+  private SegmentProperties segmentProperties;
+
+  private KeyGenerator keyGenerator;
+
+  private CarbonFactHandler dataHandler;
+
+  private Map complexIndexMap;
+
+  private int noDictionaryCount;
+
+  private int complexDimensionCount;
+
+  private int measureCount;
+
+  private long readCounter;
+
+  private long 

[jira] [Commented] (CARBONDATA-337) Correct Inverted Index spelling mistakes

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608099#comment-15608099
 ] 

ASF GitHub Bot commented on CARBONDATA-337:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/257


> Correct Inverted Index spelling mistakes
> 
>
> Key: CARBONDATA-337
> URL: https://issues.apache.org/jira/browse/CARBONDATA-337
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Lionx
>Assignee: Lionx
>Priority: Minor
>
> Correct Inverted Index spelling mistakes in three files.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-338) Remove the method arguments as they are never used inside the method

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608090#comment-15608090
 ] 

ASF GitHub Bot commented on CARBONDATA-338:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/258


> Remove the method arguments as they are never used inside the method
> 
>
> Key: CARBONDATA-338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-338
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Reporter: Shivansh
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-330) Fix compiler warnings - Java related

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15608076#comment-15608076
 ] 

ASF GitHub Bot commented on CARBONDATA-330:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/250


> Fix compiler warnings - Java related
> 
>
> Key: CARBONDATA-330
> URL: https://issues.apache.org/jira/browse/CARBONDATA-330
> Project: CarbonData
>  Issue Type: Improvement
>  Components: build, core
>Affects Versions: 0.2.0-incubating
>Reporter: Aniket Adnaik
>Priority: Trivial
> Fix For: 0.2.0-incubating
>
>
> Fix java compiler warnings and code cleanup.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-338) Remove the method arguments as they are never used inside the method

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-338?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607808#comment-15607808
 ] 

ASF GitHub Bot commented on CARBONDATA-338:
---

GitHub user shiv4nsh opened a pull request:

https://github.com/apache/incubator-carbondata/pull/258

[CARBONDATA-338] Removed the unused value inside the method

Be sure to do all of the following to help us incorporate your contribution
quickly and easily:

 - [ ] Make sure the PR title is formatted like:
   `[CARBONDATA-<Jira issue #>] Description of pull request`
 - [ ] Make sure tests pass via `mvn clean verify`. (Even better, enable
   Travis-CI on your fork and ensure the whole test matrix passes).
 - [ ] Replace `<Jira issue #>` in the title with the actual Jira issue
   number, if there is one.
 - [ ] If this contribution is large, please file an Apache
   [Individual Contributor License 
Agreement](https://www.apache.org/licenses/icla.txt).
 - [ ] Testing done
 
Please provide details on 
- Whether new unit test cases have been added or why no new tests 
are required?
- What manual testing you have done?
- Any additional information to help reviewers in testing this 
change.
 
 - [ ] For large changes, please consider breaking it into sub-tasks under 
an umbrella JIRA. 
 
---



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shiv4nsh/incubator-carbondata 
improvement/CARBONDATA-338

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/incubator-carbondata/pull/258.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #258


commit 97cdfdc6bd4fc112253437628683d8fbdaab8c6f
Author: Knoldus 
Date:   2016-10-26T08:01:35Z

Removed the unused value inside the method




> Remove the method arguments as they are never used inside the method
> 
>
> Key: CARBONDATA-338
> URL: https://issues.apache.org/jira/browse/CARBONDATA-338
> Project: CarbonData
>  Issue Type: Improvement
>  Components: core
>Reporter: Shivansh
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-284) Abstracting Index and Segment interface

2016-10-26 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-284?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607805#comment-15607805
 ] 

ASF GitHub Bot commented on CARBONDATA-284:
---

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061184
  
--- Diff: 
hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java
 ---
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.carbondata.hadoop.internal.index.memory;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+
+import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier;
+import org.apache.carbondata.core.carbon.datastore.DataRefNode;
+import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder;
+import org.apache.carbondata.core.carbon.datastore.IndexKey;
+import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore;
+import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex;
+import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos;
+import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties;
+import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo;
+import 
org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException;
+import 
org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder;
+import 
org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode;
+import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic;
+import 
org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants;
+import 
org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder;
+import org.apache.carbondata.core.keygenerator.KeyGenException;
+import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory;
+import org.apache.carbondata.hadoop.CarbonInputSplit;
+import org.apache.carbondata.hadoop.internal.index.Index;
+import org.apache.carbondata.hadoop.internal.segment.Segment;
+import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil;
+import 
org.apache.carbondata.scan.executor.exception.QueryExecutionException;
+import org.apache.carbondata.scan.filter.FilterExpressionProcessor;
+import org.apache.carbondata.scan.filter.FilterUtil;
+import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf;
+import org.apache.commons.logging.Log;
+import org.apache.commons.logging.LogFactory;
+import org.apache.hadoop.fs.Path;
+import org.apache.hadoop.mapreduce.InputSplit;
+import org.apache.hadoop.mapreduce.JobContext;
+
+class InMemoryBTreeIndex implements Index {
+
+  private static final Log LOG = 
LogFactory.getLog(InMemoryBTreeIndex.class);
+  private Segment segment;
+
+  InMemoryBTreeIndex(Segment segment) {
+this.segment = segment;
+  }
+
+  @Override
+  public String getName() {
+return null;
+  }
+
+  @Override
+  public List filter(JobContext job, FilterResolverIntf filter)
+  throws IOException {
+
+List result = new LinkedList();
+
+FilterExpressionProcessor filterExpressionProcessor = new 
FilterExpressionProcessor();
+
+AbsoluteTableIdentifier absoluteTableIdentifier = null;
+
//CarbonInputFormatUtil.getAbsoluteTableIdentifier(job.getConfiguration());
+
+//for this segment fetch blocks matching filter in BTree
+List dataRefNodes = null;
+try {
+  dataRefNodes = getDataBlocksOfSegment(job, 
filterExpressionProcessor, absoluteTableIdentifier,
+  filter, segment.getId());
+} catch (IndexBuilderException e) {
+  throw new IOException(e.getMessage());
+}
 

[jira] [Commented] (CARBONDATA-305) Switching between kettle flow and new data loading flow make configurable

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607251#comment-15607251
 ] 

ASF GitHub Bot commented on CARBONDATA-305:
---

Github user asfgit closed the pull request at:

https://github.com/apache/incubator-carbondata/pull/239


> Switching between kettle flow and new data loading flow make configurable
> -
>
> Key: CARBONDATA-305
> URL: https://issues.apache.org/jira/browse/CARBONDATA-305
> Project: CarbonData
>  Issue Type: Sub-task
>Reporter: Ravindra Pesala
>Assignee: Jacky Li
>Priority: Minor
> Fix For: 0.2.0-incubating
>
>
> Switching between kettle flow and new data loading flow make configurable. 
> This configuration should switch it dynamically while loading the data.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607196#comment-15607196
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040305
  
--- Diff: 
processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java
 ---
@@ -111,7 +110,7 @@
   /**
* timeFormat
--- End diff --

Please correct the comment; it still says timeFormat.


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607198#comment-15607198
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85035902
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/DirectDictionaryKeyGeneratorFactory.java
 ---
@@ -39,14 +40,26 @@ private DirectDictionaryKeyGeneratorFactory() {
* @param dataType DataType
* @return the generator instance
*/
-  public static DirectDictionaryGenerator 
getDirectDictionaryGenerator(DataType dataType) {
+  public static DirectDictionaryGenerator 
getDirectDictionaryGenerator(DataType dataType,
+   
String dateFormat) {
 DirectDictionaryGenerator directDictionaryGenerator = null;
 switch (dataType) {
   case TIMESTAMP:
-directDictionaryGenerator = 
TimeStampDirectDictionaryGenerator.instance;
+directDictionaryGenerator = new 
TimeStampDirectDictionaryGenerator(dateFormat);
 break;
   default:
+}
+return directDictionaryGenerator;
+  }
 
+  public static DirectDictionaryGenerator 
getDirectDictionaryGenerator(DataType dataType) {
+DirectDictionaryGenerator directDictionaryGenerator = null;
+switch (dataType) {
+  case TIMESTAMP:
+directDictionaryGenerator = new TimeStampDirectDictionaryGenerator(
+CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
--- End diff --

Here you need to read the CarbonProperties value of 
CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT instead of hard-coding the default.
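
A sketch of what that would look like, based on the pre-change code quoted later in this
thread: the TIMESTAMP case in getDirectDictionaryGenerator(DataType) would then read the
configured format and fall back to the default, roughly like:

    case TIMESTAMP:
      directDictionaryGenerator = new TimeStampDirectDictionaryGenerator(
          CarbonProperties.getInstance().getProperty(
              CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
              CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT));
      break;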


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607191#comment-15607191
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036534
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/DirectDictionaryKeyGeneratorFactory.java
 ---
@@ -39,14 +40,26 @@ private DirectDictionaryKeyGeneratorFactory() {
* @param dataType DataType
* @return the generator instance
*/
-  public static DirectDictionaryGenerator 
getDirectDictionaryGenerator(DataType dataType) {
+  public static DirectDictionaryGenerator 
getDirectDictionaryGenerator(DataType dataType,
+   
String dateFormat) {
--- End diff --

Please keep to the Java code style for the parameter indentation.


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607194#comment-15607194
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036811
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java
 ---
@@ -92,23 +87,24 @@ private TimeStampDirectDictionaryGenerator() {
   cutOffTimeStampLocal = -1;
 } else {
   try {
-SimpleDateFormat timeParser = new 
SimpleDateFormat(CarbonProperties.getInstance()
-.getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
-CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT));
+SimpleDateFormat timeParser = new SimpleDateFormat(
+CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
--- End diff --

Why use just the default value here instead of the configured format? 


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607188#comment-15607188
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036431
  
--- Diff: 
core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java
 ---
@@ -39,37 +39,32 @@
  */
 public class TimeStampDirectDictionaryGenerator implements 
DirectDictionaryGenerator {
 
-  private TimeStampDirectDictionaryGenerator() {
+  private ThreadLocal threadLocal = new ThreadLocal<>();
 
-  }
-
-  public static TimeStampDirectDictionaryGenerator instance =
-  new TimeStampDirectDictionaryGenerator();
+  private String dateFormat;
 
   /**
* The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis.
*/
-  public static final long granularityFactor;
+  public  long granularityFactor;
   /**
* The date timestamp to be considered as start date for calculating the 
timestamp
* java counts the number of milliseconds from  start of "January 1, 
1970", this property is
* customized the start of position. for example "January 1, 2000"
*/
-  public static final long cutOffTimeStamp;
+  public  long cutOffTimeStamp;
   /**
* Logger instance
*/
+
   private static final LogService LOGGER =
-  
LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName());
+  
LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName());
--- End diff --

Please correct the code style (indentation) here.


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for the 
> same is present in carbon.properties and hence is global for all tables.
> This global configuration for the timestamp format cannot support scenarios 
> where different tables, or different Timestamp columns in the same table, 
> require different formats.
> Suggest providing an option in the create table DDL itself to define the format 
> for each Timestamp column. Also provide defaults so that users can create 
> tables with Timestamp columns without having to always define the Date/Time 
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607190#comment-15607190
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039192
  
--- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---
@@ -470,6 +474,36 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K
           break;
       }
     }
+    HashMap dateformatsHashMap = new HashMap();
+    if (meta.dateFormat != null) {
+      String[] dateformats = meta.dateFormat.split(",");
+      for (String dateFormat:dateformats) {
+        String[] dateFormatSplits = dateFormat.split(":", 2);
+        dateformatsHashMap.put(dateFormatSplits[0],dateFormatSplits[1]);
+        // TODO  verify the dateFormatSplits is valid or not
+      }
+    }
+    directDictionaryGenerators =
+        new DirectDictionaryGenerator[meta.getDimensionColumnIds().length];
+    for (int i = 0; i < meta.getDimensionColumnIds().length; i++) {
--- End diff --

It is not good to invoke getDimensionColumnIds() multiple times; store the result in a local variable instead.
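
A compact sketch of the same mapping logic with the column-id array read only once. The "col1:format1,col2:format2" option shape mirrors the hunk above, but this standalone helper is illustrative, not the actual CarbonCSVBasedSeqGenStep code.

import java.util.HashMap;
import java.util.Map;

public final class DateFormatMapping {

  private DateFormatMapping() {
  }

  // Parse "col1:yyyy-MM-dd,col2:dd/MM/yyyy HH:mm:ss" into a column -> pattern map.
  public static Map<String, String> parse(String dateFormatOption) {
    Map<String, String> formats = new HashMap<>();
    if (dateFormatOption == null || dateFormatOption.trim().isEmpty()) {
      return formats;
    }
    for (String entry : dateFormatOption.split(",")) {
      // split on the first ':' only, so the pattern itself may contain ':'
      String[] parts = entry.split(":", 2);
      if (parts.length == 2 && !parts[0].trim().isEmpty()) {
        formats.put(parts[0].trim(), parts[1].trim());
      }
    }
    return formats;
  }

  // Read the column ids once instead of calling the getter inside the loop.
  public static String[] resolveFormats(String[] dimensionColumnIds, String dateFormatOption) {
    Map<String, String> formats = parse(dateFormatOption);
    String[] resolved = new String[dimensionColumnIds.length];
    for (int i = 0; i < dimensionColumnIds.length; i++) {
      resolved[i] = formats.get(dimensionColumnIds[i]);
    }
    return resolved;
  }
}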


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it
> lives in carbon.properties and is therefore global for all tables.
> This global timestamp format configuration cannot cover scenarios where
> different tables, or different Timestamp columns in the same table, need
> different formats.
> Suggest providing an option in the create table DDL itself to define the format
> for each Timestamp column. Also provide defaults so that users can create
> tables with Timestamp columns without always having to define the Date/Time
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607187#comment-15607187
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040709
  
--- Diff: processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java ---
@@ -37,7 +37,7 @@
   private int surrogateKey = -1;
 
   @Before public void setUp() throws Exception {
-    TimeStampDirectDictionaryGenerator generator = TimeStampDirectDictionaryGenerator.instance;
+    TimeStampDirectDictionaryGenerator generator = new TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
--- End diff --

The generator should be created from the carbon property, not the default value.
Please correct all occurrences.
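
A sketch of the setUp the comment asks for, reading the configured property and falling back to the default. The one-argument constructor is taken from the diff; the class name is illustrative and the CarbonProperties import path is assumed.

import org.apache.carbondata.core.constants.CarbonCommonConstants;
import org.apache.carbondata.core.keygenerator.directdictionary.timestamp.TimeStampDirectDictionaryGenerator;
import org.apache.carbondata.core.util.CarbonProperties;
import org.junit.Before;

public class TimeStampDirectDictionaryGeneratorSetupSketch {

  private TimeStampDirectDictionaryGenerator generator;

  @Before public void setUp() throws Exception {
    // resolve the format the same way production code does: configured value first
    String format = CarbonProperties.getInstance()
        .getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
            CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
    generator = new TimeStampDirectDictionaryGenerator(format);
  }
}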


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it
> lives in carbon.properties and is therefore global for all tables.
> This global timestamp format configuration cannot cover scenarios where
> different tables, or different Timestamp columns in the same table, need
> different formats.
> Suggest providing an option in the create table DDL itself to define the format
> for each Timestamp column. Also provide defaults so that users can create
> tables with Timestamp columns without always having to define the Date/Time
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607197#comment-15607197
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040702
  
--- Diff: processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java ---
@@ -37,7 +37,7 @@
   private int surrogateKey = -1;
 
   @Before public void setUp() throws Exception {
-    TimeStampDirectDictionaryGenerator generator = TimeStampDirectDictionaryGenerator.instance;
+    TimeStampDirectDictionaryGenerator generator = new TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT);
--- End diff --

The generator should be created from the carbon property, not the default value.
Please correct all occurrences.


> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it
> lives in carbon.properties and is therefore global for all tables.
> This global timestamp format configuration cannot cover scenarios where
> different tables, or different Timestamp columns in the same table, need
> different formats.
> Suggest providing an option in the create table DDL itself to define the format
> for each Timestamp column. Also provide defaults so that users can create
> tables with Timestamp columns without always having to define the Date/Time
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (CARBONDATA-37) Support Date/Time format for Timestamp columns to be defined at column level

2016-10-25 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/CARBONDATA-37?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15607195#comment-15607195
 ] 

ASF GitHub Bot commented on CARBONDATA-37:
--

Github user QiangCai commented on a diff in the pull request:

https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038702
  
--- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java ---
@@ -117,9 +113,11 @@ private TimeStampDirectDictionaryGenerator() {
    * @return dictionary value
    */
   @Override public int generateDirectSurrogateKey(String memberStr) {
-    SimpleDateFormat timeParser = new SimpleDateFormat(CarbonProperties.getInstance()
-        .getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT,
-            CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT));
+    SimpleDateFormat timeParser = threadLocal.get();
+    if(timeParser == null){
+      timeParser = new SimpleDateFormat(dateFormat);
+      threadLocal.set(timeParser);
+    }
     timeParser.setLenient(false);

Please extract the above code into a new initialization method and invoke that method from each thread.
It is not good to run this code inside the generateDirectSurrogateKey method.
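
One way to do the extraction is to build the per-thread parser up front by overriding ThreadLocal.initialValue(), so generateDirectSurrogateKey never has to lazily initialize it. A sketch under the assumption that the generator keeps a per-instance dateFormat field; the class and method names here are illustrative.

import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ThreadLocalTimeParserSketch {

  private final String dateFormat;

  // initialValue() is the extracted initialization: it runs once per thread,
  // the first time that thread calls timeParser.get().
  private final ThreadLocal<SimpleDateFormat> timeParser = new ThreadLocal<SimpleDateFormat>() {
    @Override protected SimpleDateFormat initialValue() {
      SimpleDateFormat parser = new SimpleDateFormat(dateFormat);
      parser.setLenient(false);
      return parser;
    }
  };

  public ThreadLocalTimeParserSketch(String dateFormat) {
    this.dateFormat = dateFormat;
  }

  public long parseToMillis(String memberStr) throws ParseException {
    return timeParser.get().parse(memberStr).getTime();
  }
}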



> Support Date/Time format for Timestamp columns to be defined at column level
> 
>
> Key: CARBONDATA-37
> URL: https://issues.apache.org/jira/browse/CARBONDATA-37
> Project: CarbonData
>  Issue Type: Improvement
>Reporter: Vimal Das Kammath
>Assignee: Lionx
>
> Carbon supports defining the Date/Time format, but the configuration for it
> lives in carbon.properties and is therefore global for all tables.
> This global timestamp format configuration cannot cover scenarios where
> different tables, or different Timestamp columns in the same table, need
> different formats.
> Suggest providing an option in the create table DDL itself to define the format
> for each Timestamp column. Also provide defaults so that users can create
> tables with Timestamp columns without always having to define the Date/Time
> format.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

