Re: [DISCUSSION] Support new feature: bitmap encode
Hi,

In my opinion, we need at least the following:
1. The CREATE TABLE DDL should be able to specify a bitmap option.
2. Add a new encoding: BITMAP.

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Support-new-feature-bitmap-encode-tp10913p11210.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
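To make the proposal concrete, here is a minimal illustrative sketch (plain Python, not CarbonData code; all names are hypothetical) of what a BITMAP encoding for a low-cardinality column could look like: each distinct value maps to a bitmap of the row ids where it occurs, and an equality filter becomes a single bitmap lookup.

```python
def bitmap_encode(column):
    """Build one bitmap (stored as a Python int) per distinct value;
    bit i set means the value occurs at row i."""
    bitmaps = {}
    for row_id, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps

def rows_matching(bitmaps, value):
    """Equality filter: look up the value's bitmap and list its set bits."""
    bits = bitmaps.get(value, 0)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]
```

With this layout, an equality predicate never scans the raw column, and AND/OR of predicates become bitwise `&`/`|` on the bitmaps.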
[jira] [Created] (CARBONDATA-941) 7. Compaction of Partition Table
QiangCai created CARBONDATA-941:

Summary: 7. Compaction of Partition Table
Key: CARBONDATA-941
URL: https://issues.apache.org/jira/browse/CARBONDATA-941
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

Compact the same partition across segments.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CARBONDATA-940) 6. Alter table add/drop partition
QiangCai created CARBONDATA-940:

Summary: 6. Alter table add/drop partition
Key: CARBONDATA-940
URL: https://issues.apache.org/jira/browse/CARBONDATA-940
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai
[jira] [Created] (CARBONDATA-939) 5. Partition tables join on partition column
QiangCai created CARBONDATA-939:

Summary: 5. Partition tables join on partition column
Key: CARBONDATA-939
URL: https://issues.apache.org/jira/browse/CARBONDATA-939
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai
[jira] [Created] (CARBONDATA-938) 4. Detail filter query on partition column
QiangCai created CARBONDATA-938:

Summary: 4. Detail filter query on partition column
Key: CARBONDATA-938
URL: https://issues.apache.org/jira/browse/CARBONDATA-938
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

Use the filter (equal, range, in, etc.) to get the partition id list, then use this partition id list to prune the BTree.
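The pruning step described above can be sketched as follows (an illustrative Python sketch, not CarbonData code; the list-partition layout and all names are assumptions): evaluate the partition-column filter against the partition definitions to get a partition id list, then keep only the data blocks tagged with those ids.

```python
def matching_partition_ids(partition_values, predicate):
    """partition_values: list where entry pid holds the values assigned to
    that partition. Evaluate the filter (equal, range, in, ...) per partition."""
    return [pid for pid, values in enumerate(partition_values)
            if any(predicate(v) for v in values)]

def prune_blocks(blocks, partition_ids):
    """blocks: list of (partition_id, block). Keep only blocks that belong
    to a matching partition, skipping the rest without scanning them."""
    wanted = set(partition_ids)
    return [block for pid, block in blocks if pid in wanted]
```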
[jira] [Created] (CARBONDATA-937) 3. Data loading of partition table
QiangCai created CARBONDATA-937:

Summary: 3. Data loading of partition table
Key: CARBONDATA-937
URL: https://issues.apache.org/jira/browse/CARBONDATA-937
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

- Use PartitionInfo to generate a Partitioner (hash, list, range).
- Use the Partitioner to repartition the input data file, reusing the loadDataFrame flow.
- Use the partition id to replace the task number in the carbondata/index file names.
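The repartition step for the hash case can be sketched like this (an illustrative Python sketch under assumed names, not the CarbonData implementation): a hash partitioner maps each row's partition-key value to a partition id, and the loader groups rows by that id so that each partition can be written as its own carbondata/index file.

```python
def hash_partitioner(num_partitions):
    """Return a function mapping a partition-key value to a partition id."""
    return lambda key: hash(key) % num_partitions

def repartition(rows, key_index, partitioner, num_partitions):
    """Group input rows by partition id, as the loading flow would do
    before writing one file set per partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partitioner(row[key_index])].append(row)
    return parts
```

List and range partitioners would only differ in the key-to-id mapping; the grouping step stays the same.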
[jira] [Created] (CARBONDATA-936) 2. Create Table with Partition
QiangCai created CARBONDATA-936:

Summary: 2. Create Table with Partition
Key: CARBONDATA-936
URL: https://issues.apache.org/jira/browse/CARBONDATA-936
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

CarbonSparkSqlParser parses the partition clause to generate PartitionInfo and adds it to TableModel. CreateTable adds PartitionInfo to TableInfo and stores it in TableSchema. Support spark 2.1 at first.
[jira] [Created] (CARBONDATA-935) 1. Define PartitionInfo model
QiangCai created CARBONDATA-935:

Summary: 1. Define PartitionInfo model
Key: CARBONDATA-935
URL: https://issues.apache.org/jira/browse/CARBONDATA-935
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

Modify schema.thrift to define PartitionInfo; add PartitionInfo to TableSchema.
Re: [Discussion] Implement Partition Table Feature
Sub-task list of the Partition Table feature:

1. Define PartitionInfo model
   Modify schema.thrift to define PartitionInfo; add PartitionInfo to TableSchema.
2. Create Table with Partition
   CarbonSparkSqlParser parses the partition clause to generate PartitionInfo and adds it to TableModel. CreateTable adds PartitionInfo to TableInfo and stores it in TableSchema.
3. Data loading of partition table
   Use PartitionInfo to generate a Partitioner (hash, list, range); use the Partitioner to repartition the input data file, reusing the loadDataFrame flow; use the partition id to replace the task number in carbondata/index file names.
4. Detail filter query on partition column
   Support the equal filter to get a partition id and use it to prune the BTree. In the future, other filters (range, in, ...) will be supported.
5. Partition tables join on partition column
6. Alter table add/drop partition

Any suggestions?

Best Regards,
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Implement-Partition-Table-Feature-tp10938p11151.html
Re: [Discussion] Implement Partition Table Feature
Hi Cao Lu,

I suggest mentioning the following:
1. Table creation: modify schema.thrift and add optional partitioner information to TableSchema.
2. Alter table add/drop partition.
3. Data loading of a partition table: use the partitioner information in TableSchema to generate the table partitioner, use this partitioner to repartition the input RDD, and finally reuse the loadDataFrame flow. Use the partition id to replace the task number in carbondata/index file names, so there is no need to store partition information in the footer and index files.
4. Detail query on a partition table with a partition-column filter: use the filter to get the partition id list, then use this list to prune the BTree.
5. Partition tables join on the partition column.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Implement-Partition-Table-Feature-tp10938p11139.html
[jira] [Created] (CARBONDATA-923) InsertInto reading from one row not working
QiangCai created CARBONDATA-923:

Summary: InsertInto reading from one row not working
Key: CARBONDATA-923
URL: https://issues.apache.org/jira/browse/CARBONDATA-923
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Assignee: QiangCai

Reproduce:
create table OneRowTable(col1 string, col2 string, col3 int, col4 double) stored by 'carbondata'
insert into OneRowTable select '0.1', 'a.b', 1, 1.2

Exception:
org.apache.spark.sql.AnalysisException: cannot resolve '`0.1`' given input columns: [0.1, a.b, 1, 1.2];;
'Project ['0.1, 'a.b]
+- Project [0.1 AS 0.1#11, a.b AS a.b#12, 1 AS 1#13, 1.2 AS 1.2#14]
   +- OneRowRelation$
Re: how to distribute carbon.properties file
+1
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/how-to-distribute-cabon-properties-file-tp10687p10869.html
Re: bucket table
Hi Lu,

Please see the FAQ page: http://carbondata.apache.org/docs/latest/faq.html

I think you should add the option 'BAD_RECORDS_ACTION'='FORCE' to the load SQL, e.g.:
load data ... options('BAD_RECORDS_ACTION'='FORCE')

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/bucket-table-tp10862p10865.html
Re: CarbonLock Exception refactor
I agree with refactoring the code to give detailed information.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonLock-Exception-refractor-tp10686p10864.html
Re: What is the problem of insert overwrite a table stored by carbondata
Hi,

CarbonData does not currently implement overwrite for InsertInto. It is a bug that should be fixed, and then we can implement overwrite InsertInto.

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/What-is-the-problem-of-insert-overwrite-a-table-stored-by-carbondata-tp10691p10813.html
[jira] [Created] (CARBONDATA-887) lazy rdd iterator for InsertInto
QiangCai created CARBONDATA-887:

Summary: lazy rdd iterator for InsertInto
Key: CARBONDATA-887
URL: https://issues.apache.org/jira/browse/CARBONDATA-887
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai
Assignee: QiangCai
[jira] [Created] (CARBONDATA-886) remove all redundant local variables
QiangCai created CARBONDATA-886:

Summary: remove all redundant local variables
Key: CARBONDATA-886
URL: https://issues.apache.org/jira/browse/CARBONDATA-886
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Re: how to load the dictionary file when loading data.
Hi,

You can have a look at the AllDictionaryExample in the spark module. An example of a dict file:

2,usa
2,china
1,2015/7/26
1,2015/7/23
1,2015/7/30
3,aaa3
3,aaa10

The line format is "column-index,value".

Best regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/how-to-load-the-dictionary-file-when-loading-data-tp10459p10485.html
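A small sketch (plain Python, not the CarbonData loader; the function name is an assumption) of how such a dict file can be parsed, grouping the pre-generated dictionary values by column index:

```python
from collections import defaultdict

def parse_all_dictionary(lines):
    """Each line has the form 'column-index,value'; group the values
    per column index, preserving file order."""
    dicts = defaultdict(list)
    for line in lines:
        idx, value = line.strip().split(",", 1)  # split only on the first comma
        dicts[int(idx)].append(value)
    return dicts
```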
Re: Carbondata with Datastax / Cassandra
Hi,

CarbonData does not currently support the CFS file system. I think we can try to support it.

Best regards
David CaiQiang
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Carbondata-with-Datastax-Cassandra-tp10031p10138.html
[DISCUSSION] Implement delta encoding for numeric type column in SORT_COLUMNS
Hi all,

We now plan to implement delta encoding for numeric type columns in SORT_COLUMNS:
1. Use delta encoding to encode the numeric type data.
2. Write presence metadata to the page header to record null values.
3. Improve the compression of no-dictionary string columns: use RLE to compress the array of lengths in the LV encoding.

Any thoughts, comments and questions?

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-implement-delta-encoding-for-numeric-type-column-in-SORT-COLUMNS-tp10124.html
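Points 1 and 3 can be sketched in a few lines (an illustrative Python sketch, not the CarbonData codec; all names are assumptions): delta encoding stores the first value plus successive differences, which for sorted numeric columns are small and compress well, and the LV length array collapses under run-length encoding when many values share a length.

```python
def delta_encode(values):
    """First value plus successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(encoded):
    """Rebuild the original values by a running sum over the deltas."""
    if not encoded:
        return []
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

def rle_encode(lengths):
    """Run-length encode the LV length array as (value, run) pairs."""
    runs = []
    for x in lengths:
        if runs and runs[-1][0] == x:
            runs[-1] = (x, runs[-1][1] + 1)
        else:
            runs.append((x, 1))
    return runs
```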
Re: Dimension column of integer type - to exclude from dictionary
SORT_COLUMNS can make a numeric type column a dimension without dictionary encoding. The SORT_COLUMNS feature was implemented in the 12-dev branch.

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p9977.html
[DISCUSSION] Support new feature: Partition Table
Hi all,

Let's start the discussion on the partition table. What should we do to support partition tables?
1. Create table with partition: support Range, Hash, List and Composite partitioning, and write the partition info to the schema.
2. During data loading, re-partition the input data, start one task per partition, and write partition information to the footer and index file.
3. During data query, prune the B+Tree by partition if the filter contains the partition column, or prune data blocks by partition when there is only a partition-column predicate.
4. Optimize the join performance of two partition tables when the partition column is the join column.

Any thoughts, comments and questions? Thanks!

Best Regards
David
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-support-new-feature-Partition-Table-tp9935.html
[jira] [Created] (CARBONDATA-842) when SORT_COLUMN is empty, no need to sort data.
QiangCai created CARBONDATA-842:

Summary: when SORT_COLUMN is empty, no need to sort data.
Key: CARBONDATA-842
URL: https://issues.apache.org/jira/browse/CARBONDATA-842
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai
[jira] [Created] (CARBONDATA-841) improve the compression encoding for numeric type columns to give good performance
QiangCai created CARBONDATA-841:

Summary: improve the compression encoding for numeric type columns to give good performance
Key: CARBONDATA-841
URL: https://issues.apache.org/jira/browse/CARBONDATA-841
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

No-dictionary columns currently use LV (length-value) encoding, which isn't the best choice for numeric type columns.
Re: carbondata find a bug
+1

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbondata-find-a-bug-tp9747p9749.html
Re: [New Feature] Range Filter Optimization
+1

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/New-Feature-Range-Filter-Optimization-tp9343p9383.html
Re: [PROPOSAL] Update on the Jenkins CarbonData job
+1

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/PROPOSAL-Update-on-the-Jenkins-CarbonData-job-tp9202p9231.html
[jira] [Created] (CARBONDATA-782) Support SORT_COLUMNS
QiangCai created CARBONDATA-782:

Summary: Support SORT_COLUMNS
Key: CARBONDATA-782
URL: https://issues.apache.org/jira/browse/CARBONDATA-782
Project: CarbonData
Issue Type: New Feature
Reporter: QiangCai
Assignee: QiangCai

The tasks of SORT_COLUMNS:
1. Support creating a table with the sort_columns property, e.g. tblproperties('sort_columns' = 'col7,col3'). A table with the SORT_COLUMNS property will be sorted by SORT_COLUMNS, and the sort order of the columns is decided by SORT_COLUMNS.
2. Change the encoding rule of SORT_COLUMNS. Firstly, the column encoding rule will stay consistent with the previous behavior. Secondly, if a column of SORT_COLUMNS was previously a measure, it will now be created as a dimension, and this dimension is a no-dictionary column (better to use another direct dictionary). Thirdly, the dimensions of SORT_COLUMNS have RLE and ROWID pages; other dimensions have only RLE (not sorted).
3. The start/end key should be composed of SORT_COLUMNS. Use SORT_COLUMNS to build the start/end key during data loading and select queries.
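The sort-order and start/end-key ideas in the tasks above can be sketched as follows (an illustrative Python sketch, not CarbonData code; the row layout as dicts and all names are assumptions): rows are ordered by the columns in the order SORT_COLUMNS lists them, and a block's start/end key is built only from those columns.

```python
def sort_by_sort_columns(rows, sort_columns):
    """Order rows by the SORT_COLUMNS list; column priority follows the
    order given in tblproperties, e.g. ['col7', 'col3']."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in sort_columns))

def start_end_key(sorted_rows, sort_columns):
    """Start/end key of a sorted block, composed only of SORT_COLUMNS."""
    first, last = sorted_rows[0], sorted_rows[-1]
    return (tuple(first[c] for c in sort_columns),
            tuple(last[c] for c in sort_columns))
```

A query with a predicate on the sort columns can then compare its key range against each block's start/end key and skip non-overlapping blocks.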
Re: column auto mapping when loading data from csv file
Hi Ravindra,

How about using 'NOT_AUTOFILEHEADER'='true' as follows? I think 'AUTOFILEHEADER'='true' should be the default behavior.

if (load sql contains "FILEHEADER") {
  1. input files shouldn't contain a file header
  2. use the "FILEHEADER" parameter to load data after passing the column check
} else {
  if (the 'NOT_AUTOFILEHEADER' option is not set) {
    1. auto-map the first row of the input files to the table's columns
    if (the first row contains all column names) {
      2. use the first row as the file header to load data
    } else if (the first row contains part of the column names) {
      2. stop loading
    } else {
      2. use the original order of the table's columns to load data
    }
  } else {
    1. input files should contain a file header
    2. use the first row as the file header to load data after passing the column check
  }
}
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/column-auto-mapping-when-loading-data-from-csv-file-tp8717p8753.html
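The auto-mapping branch of the pseudocode above can be sketched as a small decision function (an illustrative Python sketch; the function name, return tags, and argument shapes are all hypothetical, not a CarbonData API):

```python
def resolve_header(first_row, table_columns):
    """Decide how to interpret the CSV's first row when no FILEHEADER
    option is given, mirroring the auto-mapping proposal."""
    matched = [c for c in first_row if c in table_columns]
    if len(matched) == len(first_row):
        # every cell of the first row is a known column name
        return ("use_first_row_as_header", list(first_row))
    if matched:
        # only some names match: ambiguous, refuse to load
        return ("stop_loading", None)
    # no names match: treat the first row as data, map by table order
    return ("use_table_column_order", list(table_columns))
```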
[jira] [Created] (CARBONDATA-765) dataframe writer needs to drop the table first, otherwise loading says table not found
QiangCai created CARBONDATA-765:

Summary: dataframe writer needs to drop the table first, otherwise loading says table not found
Key: CARBONDATA-765
URL: https://issues.apache.org/jira/browse/CARBONDATA-765
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Assignee: QiangCai
[jira] [Created] (CARBONDATA-764) Improving Non-dictionary storage & performance
QiangCai created CARBONDATA-764:

Summary: Improving Non-dictionary storage & performance
Key: CARBONDATA-764
URL: https://issues.apache.org/jira/browse/CARBONDATA-764
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai

Mailing list: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Improving-Non-dictionary-storage-amp-performance-td8146.html
[jira] [Created] (CARBONDATA-762) modify all schemaName->databaseName, cubeName->tableName
QiangCai created CARBONDATA-762:

Summary: modify all schemaName->databaseName, cubeName->tableName
Key: CARBONDATA-762
URL: https://issues.apache.org/jira/browse/CARBONDATA-762
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
[jira] [Created] (CARBONDATA-763) Add L5 loading support, global sorting like HBase
QiangCai created CARBONDATA-763:

Summary: Add L5 loading support, global sorting like HBase
Key: CARBONDATA-763
URL: https://issues.apache.org/jira/browse/CARBONDATA-763
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
[jira] [Created] (CARBONDATA-761) Dictionary server should not be shutdown after loading
QiangCai created CARBONDATA-761:

Summary: Dictionary server should not be shutdown after loading
Key: CARBONDATA-761
URL: https://issues.apache.org/jira/browse/CARBONDATA-761
Project: CarbonData
Issue Type: Bug
Components: data-load
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor

Code: CarbonTableSchema/LoadTable
[jira] [Created] (CARBONDATA-760) Should avoid ERROR log for successful select query
QiangCai created CARBONDATA-760:

Summary: Should avoid ERROR log for successful select query
Key: CARBONDATA-760
URL: https://issues.apache.org/jira/browse/CARBONDATA-760
Project: CarbonData
Issue Type: Bug
Components: data-query
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor

Tables without delete or update operations may not have delta files, and a select query shouldn't record an ERROR log.
Code: SegmentUpdateStatusManager.getDeltaFiles
Log detail:
ERROR 06-03 19:21:37,531 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/0
ERROR 06-03 19:21:37,948 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/1
ERROR 06-03 19:21:38,517 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/2
ERROR 06-03 19:21:38,909 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/3
ERROR 06-03 19:21:39,292 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/4
Re: I loaded the data with the timestamp field unsuccessful
try /M/dd

Best Regards
David CaiQiang
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/I-loaded-the-data-with-the-timestamp-field-unsuccessful-tp8417p8419.html
Re: [DISCUSS] For the dimension default should be no dictionary
+1

It is not easy for users to understand the previous options. The logic of these two options, SORT_COLUMNS and TABLE_DICTIONARY, is very clear. I am implementing the SORT_COLUMNS option this way.

Best Regards
David Caiqiang
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8122.html
Re: Re: data lost when loading data from csv file to carbon table
Maybe you can check PR594; it fixes a bug that affects the loading result.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/data-lost-when-loading-data-from-csv-file-to-carbon-table-tp7554p7639.html
[jira] [Created] (CARBONDATA-701) There is a memory leak issue in no kettle loading flow
QiangCai created CARBONDATA-701:

Summary: There is a memory leak issue in no kettle loading flow
Key: CARBONDATA-701
URL: https://issues.apache.org/jira/browse/CARBONDATA-701
Project: CarbonData
Issue Type: Improvement
Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Fix For: 1.0.1-incubating

When loading larger amounts of data, an OOM exception is thrown.
[jira] [Created] (CARBONDATA-659) Should add WhitespaceAround and ParenPad to javastyle
QiangCai created CARBONDATA-659:

Summary: Should add WhitespaceAround and ParenPad to javastyle
Key: CARBONDATA-659
URL: https://issues.apache.org/jira/browse/CARBONDATA-659
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai
Assignee: QiangCai
Priority: Trivial
[jira] [Created] (CARBONDATA-627) Fix Union unit test case for spark2
QiangCai created CARBONDATA-627:

Summary: Fix Union unit test case for spark2
Key: CARBONDATA-627
URL: https://issues.apache.org/jira/browse/CARBONDATA-627
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Fix For: 1.0.0-incubating

UnionTestCase fails in spark2; we should fix it.
[jira] [Created] (CARBONDATA-614) Should fix dictionary locked issue
QiangCai created CARBONDATA-614:

Summary: Should fix dictionary locked issue
Key: CARBONDATA-614
URL: https://issues.apache.org/jira/browse/CARBONDATA-614
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Fix For: 1.0.0-incubating

Even when carbon.properties.filepath is configured correctly, the following exception still appears.

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 8, hadoop-slave-2): java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.(CarbonGlobalDictionaryRDD.scala:364)
at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (CARBONDATA-600) Should reuse unit test case for integration module
QiangCai created CARBONDATA-600:

Summary: Should reuse unit test case for integration module
Key: CARBONDATA-600
URL: https://issues.apache.org/jira/browse/CARBONDATA-600
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Fix For: 1.0.0-incubating
[jira] [Created] (CARBONDATA-601) Should reuse unit test case for integration module
QiangCai created CARBONDATA-601:

Summary: Should reuse unit test case for integration module
Key: CARBONDATA-601
URL: https://issues.apache.org/jira/browse/CARBONDATA-601
Project: CarbonData
Issue Type: Test
Components: spark-integration
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Fix For: 1.0.0-incubating
Re: carbon thrift server for spark 2.0 showing unusual behaviour
Is the column name "int" and the type "String"? Better to try another column name.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbon-thrift-server-for-spark-2-0-showing-unusual-behaviour-tp5384p5454.html
Re: carbontable compact throw err
You can check the following and share the results:
1) select * from test limit 1
2) show segments for table test limit 1000
3) alter table test compact 'major'

It would be better to provide more log info.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbontable-compact-throw-err-tp5382p5426.html
Re: Dictionary file is locked for Updation, unable to Load
I think you can have a look at this mailing list thread:
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-td5076.html

Have a look at the following guide and pay attention to the carbon.properties file:
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide

For spark yarn cluster mode:
1. Both the driver side and the executor side need the same carbon.properties file.
2. Set carbon.lock.type=HDFSLOCK
3. Set carbon.properties.filepath:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=
spark.driver.extraJavaOptions -Dcarbon.properties.filepath=
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-Updation-unable-to-Load-tp5359p5422.html
Re: why there is a table name option in carbon source format?
For Spark 2, when using SparkSession to create a carbon table, the tableName option is needed to create the carbon schema in the store location folder. It is better to use CarbonSession to create carbon tables now.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/why-there-is-a-table-name-option-in-carbon-source-format-tp5385p5420.html
Re: CatalystAnalysy
You can try -Dscala.version=
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CatalystAnalysy-tp5129p5141.html
Re: Re: Dictionary file is locked for updation
Please correct the path of the carbon.properties file:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=carbon.properties
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-tp5076p5092.html
Re: Re: Dictionary file is locked for updation
Please try to add carbon.storelocation to the carbon.properties file, e.g.:
carbon.storelocation=hdfs://master:9000/carbondata/store

You can have a look at the following guide and pay attention to the carbon.properties file:
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-tp5076p5090.html
[Discussion] Simplify the deployment of carbondata
Hi all,

I suggest simplifying the deployment of CarbonData as follows:
1. Remove the kettle dependency completely: no need to deploy the "carbonplugins" folder on each node, and no need to set "carbon.kettle.home".
2. Remove the carbon.properties file from the executor side; pass the CarbonData configuration from the driver side to the executor side.
3. Use "spark.sql.warehouse.dir" (spark2) or "hive.metastore.warehouse.dir" (spark1) instead of "carbon.storelocation".

So in the future we will just need to deploy the CarbonData jars in cluster mode.

What's your opinion?

Best Regards
David Cai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Simplify-the-deployment-of-carbondata-tp5000.html
Re: same query and I change the value than throw a error
Please provide the executor-side log.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/same-query-and-I-change-the-value-than-throw-a-error-tp4811p4893.html
Re: Re: etl.DataLoadingException: The input file does not exist
Please find the following item in the carbon.properties file and give it a proper path (hdfs://master:9000/):
carbon.ddl.base.hdfs.url

During loading, this URL will be combined with the data file path.

BTW, it would be better to provide the version number.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/etl-DataLoadingException-The-input-file-does-not-exist-tp4853p4888.html
Re: carbondata-0.2 load data failed in yarn mode
I think the root cause is the metadata lock type. Please add the "carbon.lock.type" configuration to carbon.properties as follows:

#Local mode
carbon.lock.type=LOCALLOCK

#Cluster mode
carbon.lock.type=HDFSLOCK
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbondata-0-2-load-data-failed-in-yarn-molde-tp3908p4887.html
[jira] [Created] (CARBONDATA-540) Support insertInto without kettle for spark2
QiangCai created CARBONDATA-540:

Summary: Support insertInto without kettle for spark2
Key: CARBONDATA-540
URL: https://issues.apache.org/jira/browse/CARBONDATA-540
Project: CarbonData
Issue Type: Improvement
Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Fix For: 1.0.0-incubating
Re: [DISCUSSION] CarbonData loading solution discussion
+1. We should flexibly choose the loading solution according to Scenarios 1 and 2, and we will get performance benefits.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-CarbonData-loading-solution-discussion-tp4490p4520.html
[jira] [Created] (CARBONDATA-535) carbondata should support datatype: Date and Char
QiangCai created CARBONDATA-535: --- Summary: carbondata should support datatype: Date and Char Key: CARBONDATA-535 URL: https://issues.apache.org/jira/browse/CARBONDATA-535 Project: CarbonData Issue Type: Improvement Components: file-format Affects Versions: 1.0.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 1.0.0-incubating carbondata should support datatype: Date and Char -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-497) [Spark2] fix datatype issue of CarbonLateDecoderRule
QiangCai created CARBONDATA-497: --- Summary: [Spark2] fix datatype issue of CarbonLateDecoderRule Key: CARBONDATA-497 URL: https://issues.apache.org/jira/browse/CARBONDATA-497 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 1.0.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 1.0.0-incubating In Spark 2, the LogicalPlan resolve method needs to check the input data type. If the data type is wrong, the logical plan will stay unresolved. CarbonLateDecoderRule should correct the datatype of dictionary dimensions so that the logical plan can be resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-493) InsertInto SQL cannot select from an empty table
QiangCai created CARBONDATA-493: --- Summary: InsertInto SQL cannot select from an empty table Key: CARBONDATA-493 URL: https://issues.apache.org/jira/browse/CARBONDATA-493 Project: CarbonData Issue Type: Bug Affects Versions: 1.0.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 1.0.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-488) add InsertInto feature for spark2
QiangCai created CARBONDATA-488: --- Summary: add InsertInto feature for spark2 Key: CARBONDATA-488 URL: https://issues.apache.org/jira/browse/CARBONDATA-488 Project: CarbonData Issue Type: New Feature Components: data-load Affects Versions: 0.3.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 0.3.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-486) Reading dataframe concurrently will lead to wrong data
QiangCai created CARBONDATA-486: --- Summary: Reading dataframe concurrently will lead to wrong data Key: CARBONDATA-486 URL: https://issues.apache.org/jira/browse/CARBONDATA-486 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 0.3.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 0.3.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-481) [SPARK2]fix late decoder and support whole stage code gen
QiangCai created CARBONDATA-481: --- Summary: [SPARK2]fix late decoder and support whole stage code gen Key: CARBONDATA-481 URL: https://issues.apache.org/jira/browse/CARBONDATA-481 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 0.2.0-incubating Reporter: QiangCai Fix For: 0.3.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [Feature Proposal] Spark 2 integration with CarbonData
+1 I think I can finish some tasks. Please assign some tasks to me. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-CarbonData-tp3236p3320.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/VOTE-Apache-CarbonData-0-2-0-incubating-release-tp2823p2836.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:
I look forward to releasing this version. CarbonData has improved query and load performance, and it is good news that there is no need to install Thrift to build the project. Btw, how many PRs were merged into this version? -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/As-planed-we-are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
[jira] [Created] (CARBONDATA-368) Should improve performance of DataFrame loading
QiangCai created CARBONDATA-368: --- Summary: Should improve performance of DataFrame loading Key: CARBONDATA-368 URL: https://issues.apache.org/jira/browse/CARBONDATA-368 Project: CarbonData Issue Type: Improvement Components: data-load Affects Versions: 0.3.0-incubating Reporter: QiangCai Assignee: QiangCai -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-carbondata pull request #278: [CARBONDATA-85][WIP] support insert ...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/278 [CARBONDATA-85][WIP] support insert into carbon table select from table **1.Support insert into carbon table select from table** **2.Improve performance of dataframe loading** You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata loaddataframe Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/278.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #278 commit 217947dc6f167f8ae490d28254eed4785eea73d3 Author: QiangCai <david.c...@gmail.com> Date: 2016-10-24T02:54:20Z DataLoadCoalescedRDD DataLoadPartitionCoalescer concurrently read dataframe commit 39d517179184c8412a488e44b5b914412ec24451 Author: QiangCai <qiang...@qq.com> Date: 2016-11-01T09:39:57Z add test case --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058166 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -22,28 +22,44 @@ import java.io.DataOutput; import java.io.IOException; import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.Distributable; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.path.CarbonTablePath; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.Writable; import org.apache.hadoop.mapreduce.lib.input.FileSplit; + /** * Carbon input split to allow distributed read of CarbonInputFormat. */ -public class CarbonInputSplit extends FileSplit implements Serializable, Writable { +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable { private static final long serialVersionUID = 3520344046772190207L; private String segmentId; - /** + public String taskId = "0"; + + /* * Number of BlockLets in a block */ private int numberOfBlocklets = 0; - public CarbonInputSplit() { -super(null, 0, 0, new String[0]); + public CarbonInputSplit() { } - public CarbonInputSplit(String segmentId, Path path, long start, long length, + private void parserPath(Path path) { --- End diff -- please use CarbonTablePath.DataFileUtil.getTaskNo --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058188 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -22,28 +22,44 @@ import java.io.DataOutput; import java.io.IOException; import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.Distributable; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.path.CarbonTablePath; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.Writable; import org.apache.hadoop.mapreduce.lib.input.FileSplit; + /** * Carbon input split to allow distributed read of CarbonInputFormat. */ -public class CarbonInputSplit extends FileSplit implements Serializable, Writable { +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable { private static final long serialVersionUID = 3520344046772190207L; private String segmentId; - /** + public String taskId = "0"; + + /* * Number of BlockLets in a block */ private int numberOfBlocklets = 0; - public CarbonInputSplit() { -super(null, 0, 0, new String[0]); + public CarbonInputSplit() { } - public CarbonInputSplit(String segmentId, Path path, long start, long length, + private void parserPath(Path path) { +String[] nameParts = path.getName().split("-"); +if (nameParts != null && nameParts.length >= 3) { + this.taskId = nameParts[2]; +} + } + + private CarbonInputSplit(String segmentId, Path path, long start, long length, --- End diff -- please initialize taskId --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
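The two review comments above (use CarbonTablePath.DataFileUtil.getTaskNo, and initialize taskId) boil down to: don't hand-split the file name in place, and always give taskId a safe default. A minimal sketch of the defensive parsing — the "third dash-separated token" layout comes from the diff (nameParts[2]); the default of "0" also appears in the diff, and the method name here is a stand-in:

```java
// Sketch of the taskId parsing discussed above. Only the nameParts[2]
// layout and the "0" default come from the diff; the rest is scaffolding.
public class TaskIdParser {

    static String parseTaskId(String fileName) {
        String taskId = "0"; // initialized up front, even if parsing fails
        String[] nameParts = fileName.split("-");
        if (nameParts.length >= 3) {
            taskId = nameParts[2];
        }
        return taskId;
    }

    public static void main(String[] args) {
        System.out.println(parseTaskId("part-0-5-1480000000000.carbondata"));
        System.out.println(parseTaskId("weird.carbondata"));
    }
}
```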
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85632892 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +472,34 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<String, String>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(CarbonCommonConstants.COMMA); + for (String dateFormat:dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0].toLowerCase().trim(), +dateFormatSplits[1].trim()); + } +} +String[] DimensionColumnIds = meta.getDimensionColumnIds(); +directDictionaryGenerators = +new DirectDictionaryGenerator[DimensionColumnIds.length]; +for (int i = 0; i < DimensionColumnIds.length; i++) { + ColumnSchemaDetails columnSchemaDetails = columnSchemaDetailsWrapper.get( + DimensionColumnIds[i]); + if (columnSchemaDetails.isDirectDictionary()) { +String columnName = columnSchemaDetails.getColumnName(); +DataType columnType = columnSchemaDetails.getColumnType(); +if (dateformatsHashMap.containsKey(columnName)) { --- End diff -- better to use "get" method, just look up map once. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
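The suggestion above — call get once instead of containsKey followed by get — works because Map.get returns null for an absent key, so two hash lookups collapse into one. A sketch with hypothetical method and variable names:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the single-lookup pattern suggested above. Names here are
// hypothetical, not taken from CarbonCSVBasedSeqGenStep.
public class SingleLookup {

    static String resolveDateFormat(Map<String, String> formats,
                                    String columnName,
                                    String defaultFormat) {
        // One lookup: get returns null when the key is absent
        String fmt = formats.get(columnName.toLowerCase().trim());
        return (fmt != null) ? fmt : defaultFormat;
    }

    public static void main(String[] args) {
        Map<String, String> formats = new HashMap<>();
        formats.put("saledate", "yyyy/MM/dd");
        System.out.println(resolveDateFormat(formats, "SaleDate ", "yyyy-MM-dd"));
        System.out.println(resolveDateFormat(formats, "other", "yyyy-MM-dd"));
    }
}
```

Note this pattern is also null-hostile: if null values are legal in the map, containsKey is needed after all, but date-format strings are never null here.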
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85463723 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/Segment.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * Within a carbon table, each data load becomes one Segment, which stores all data files belong to this load in + * the segment folder. 
+ */ +public abstract class Segment { + + protected String id; + + /** + * Path of the segment folder + */ + private String path; + + public Segment(String id, String path) { +this.id = id; +this.path = path; + } + + public String getId() { +return id; + } + + public String getPath() { +return path; + } + + /** + * return all InputSplit of this segment, each file is a InputSplit + * @param job job context + * @return all InputSplit + * @throws IOException + */ + public List getAllSplits(JobContext job) throws IOException { --- End diff -- I suggest to return List --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; --- End diff -- please use internal.CarbonInputSplit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85461092 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle( Seq.empty } + private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]): + Unit = { +if (dateFormat == "") { + throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.") +} else { + var dateFormats: Array[String] = dateFormat.split(",") --- End diff -- CarbonCommonConstant.COMMA --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460088 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle( Seq.empty } + private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]): + Unit = { +if (dateFormat == "") { + throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.") +} else { + var dateFormats: Array[String] = dateFormat.split(",") + for (singleDateFormat <- dateFormats) { +var dateFormatSplits: Array[String] = singleDateFormat.split(":", 2) +if (!dateDimensionsName.contains(dateFormatSplits(0))) { --- End diff -- take care case-insensitive --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459286 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle( val allDictionaryPath = options.getOrElse("all_dictionary_path", "") val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$") val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:") + val timeFormat = options.getOrElse("timeformat", null) --- End diff -- "timeFormat" is useless --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460589 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -343,7 +345,8 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K } data.setGenerator( - KeyGeneratorFactory.getKeyGenerator(getUpdatedLens(meta.dimLens, meta.dimPresent))); + KeyGeneratorFactory.getKeyGenerator( + getUpdatedLens(meta.dimLens, meta.dimPresent))); --- End diff -- keep code style --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459810 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle( val allDictionaryPath = options.getOrElse("all_dictionary_path", "") val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$") val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:") + val timeFormat = options.getOrElse("timeformat", null) + val dateFormat = options.getOrElse("dateformat", null) + val tableDimensions: util.List[CarbonDimension] = table.getDimensionByTableName(tableName) + val dateDimensionsName = new ArrayBuffer[String] + tableDimensions.toArray.foreach { +dimension => { + val columnSchema: ColumnSchema = dimension.asInstanceOf[CarbonDimension].getColumnSchema + if (columnSchema.getDataType.name == "TIMESTAMP") { +dateDimensionsName += columnSchema.getColumnName + } +} + } + if (dateFormat != null) { +validateDateFormat(dateFormat, dateDimensionsName) + } --- End diff -- please move these code into method validateDateFormat --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
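Pulling the whole check into validateDateFormat, as the comment asks, might look roughly like this Java sketch. The "col1:format1,col2:format2" grammar and the split(":", 2) limit come from the diff; the case-insensitive matching follows the other review comment on this PR; the exception type and messages are simplified assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch of a consolidated validateDateFormat: the caller passes the raw
// option plus the table's timestamp column names, and all parsing and
// validation lives in one method. Not the actual carbonTableSchema.scala code.
public class DateFormatOption {

    static void validateDateFormat(String dateFormat, List<String> timestampColumns) {
        if (dateFormat.isEmpty()) {
            throw new IllegalArgumentException("Option DateFormat is set to an empty string.");
        }
        for (String single : dateFormat.split(",")) {
            // limit 2: the format itself may contain ':' (e.g. HH:mm:ss)
            String[] parts = single.split(":", 2);
            if (parts.length != 2 || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Invalid DateFormat entry: " + single);
            }
            String column = parts[0].trim().toLowerCase(Locale.ROOT);
            boolean known = timestampColumns.stream()
                .anyMatch(c -> c.toLowerCase(Locale.ROOT).equals(column));
            if (!known) {
                throw new IllegalArgumentException("Unknown timestamp column: " + parts[0]);
            }
        }
    }

    public static void main(String[] args) {
        validateDateFormat("saleDate:yyyy/MM/dd,updateTime:yyyy-MM-dd HH:mm:ss",
            Arrays.asList("SaleDate", "UpdateTime"));
        System.out.println("ok");
    }
}
```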
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284][WIP] Abstracting in...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061184 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import 
org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { + + private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class); + private Segment segment; + + InMemoryBTreeIndex(Segment segment) { +this.segment = segment; + } + + @Override + public String getName() { +return null; + } + + @Override + public List filter(JobContext job, FilterResolverIntf filter) + throws IOException { + +List result = new LinkedList(); + +FilterExpressionProcessor filterExpressionProcessor = new FilterExpressionProcessor(); + +AbsoluteTableIdentifier absoluteTableIdentifier = null; + //CarbonInputFormatUtil.getAbsoluteTableIdentifier(job.getConfiguration()); + +//for this segment fetch blocks matching filter in BTree +List dataRefNodes = null; +try { + dataRefNodes = getDataBlocksOfSegment(job, filterExpressionProcessor, absoluteTableIdentifier, + filter, segment.getId()); +} catch (IndexBuilderException e) { + throw new IOException(e.getMessage()); +} +for (DataRefNode dataRefNode : dataRefNodes) { + BlockBTreeLeafNode leafNode = (BlockBTreeLeafNode) dataRefNode; + TableBlockInfo tableBlockInfo = leafNode.getTableBlockInfo(); + result.add(
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284][WIP] Abstracting in...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061025 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { --- End diff -- I understand InMemoryBTreeIndex is segment level's index. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040305 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -111,7 +110,7 @@ /** * timeFormat --- End diff -- please correct comment --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039192 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +474,36 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<String, String>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(","); + for (String dateFormat:dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0],dateFormatSplits[1]); +// TODO verify the dateFormatSplits is valid or not + } +} +directDictionaryGenerators = +new DirectDictionaryGenerator[meta.getDimensionColumnIds().length]; +for (int i = 0; i < meta.getDimensionColumnIds().length; i++) { --- End diff -- not good to invoke getDimensionColumnIds many times --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
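The fix for the comment above is a local variable: call the getter once and index into the local inside the loop. (The later revision of this diff does exactly that, caching meta.getDimensionColumnIds().) A generic sketch with stand-in names:

```java
// Sketch of hoisting a repeated getter out of a loop, as requested above.
// The class, field, and loop body are stand-ins, not CarbonCSVBasedSeqGenMeta.
public class HoistGetter {

    private final String[] dimensionColumnIds;

    HoistGetter(String[] dimensionColumnIds) {
        this.dimensionColumnIds = dimensionColumnIds;
    }

    String[] getDimensionColumnIds() {
        return dimensionColumnIds;
    }

    static String[] upperCaseIds(HoistGetter meta) {
        // One getter call, hoisted; the loop reuses the local reference
        // instead of calling meta.getDimensionColumnIds() each iteration.
        String[] ids = meta.getDimensionColumnIds();
        String[] out = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = ids[i].toUpperCase();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] result = upperCaseIds(new HoistGetter(new String[]{"c1", "c2"}));
        System.out.println(result[0] + "," + result[1]);
    }
}
```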
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039860 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); - /** - * initialization block for granularityFactor and cutOffTimeStamp - */ - static { + public TimeStampDirectDictionaryGenerator(String dateFormat) { --- End diff -- Please keep a no-argument TimeStampDirectDictionaryGenerator() constructor that uses the default dateformat. If the data loading command did not provide a dateformat option for some column, we can fall back to the no-argument constructor. ---
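The reviewer's suggestion, a no-argument constructor that falls back to a default format, might look like the minimal sketch below. The class name, the getter, and the hard-coded default pattern are assumptions for illustration; CarbonData itself reads the real default from CarbonCommonConstants/CarbonProperties.

```java
public class TimestampGenerator {
    // Hypothetical default; CarbonData would take this from its properties.
    static final String DEFAULT_FORMAT = "yyyy-MM-dd HH:mm:ss";

    private final String dateFormat;

    // No-arg constructor delegates to the parameterized one, as the review
    // suggests, so callers without a per-column format keep working.
    public TimestampGenerator() {
        this(DEFAULT_FORMAT);
    }

    public TimestampGenerator(String dateFormat) {
        this.dateFormat = dateFormat;
    }

    public String getDateFormat() {
        return dateFormat;
    }
}
```

Delegating rather than duplicating the initialization keeps both constructors consistent.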
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039488 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +474,36 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<String, String>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(","); + for (String dateFormat:dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0],dateFormatSplits[1]); +// TODO verify the dateFormatSplits is valid or not + } +} +directDictionaryGenerators = +new DirectDictionaryGenerator[meta.getDimensionColumnIds().length]; +for (int i = 0; i < meta.getDimensionColumnIds().length; i++) { + ColumnSchemaDetails columnSchemaDetails = columnSchemaDetailsWrapper.get( + meta.getDimensionColumnIds()[i]); + if (columnSchemaDetails.isDirectDictionary()) { +if (dateformatsHashMap.containsKey(columnSchemaDetails.getColumnName())) { + directDictionaryGenerators[i] = + DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator( + columnSchemaDetails.getColumnType(), + dateformatsHashMap.get(columnSchemaDetails.getColumnName())); +} else { + String dateFormat = CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, + CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); + directDictionaryGenerators[i] = + DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator( + columnSchemaDetails.getColumnType(), dateFormat); --- End diff -- 1. Move CarbonProperties.getInstance().getProperty out of the for loop. 2. For the default dateformat, use the method getDirectDictionaryGenerator(DataType dataType). ---
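Both suggestions, reading the fallback format once outside the loop and overriding it only per column, can be sketched as below. The class name, method, property key, and default value are hypothetical stand-ins for the CarbonData API (java.util.Properties stands in for CarbonProperties).

```java
import java.util.Map;
import java.util.Properties;

public class GeneratorSetup {
    // Resolves a format per column: the per-column override if present,
    // otherwise a fallback that is read ONCE, hoisted out of the loop as the
    // review asks, instead of on every iteration.
    public static String[] resolveFormats(String[] columns,
                                          Map<String, String> overrides,
                                          Properties props) {
        // Hypothetical key/default standing in for CarbonCommonConstants.
        String fallback = props.getProperty("carbon.timestamp.format",
                                            "yyyy-MM-dd HH:mm:ss");
        String[] formats = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            formats[i] = overrides.getOrDefault(columns[i], fallback);
        }
        return formats;
    }
}
```

The property lookup is paid once per load, not once per dimension column.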
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038170 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1129,6 +1130,9 @@ case class LoadTable( carbonLoadModel.setEscapeChar(escapeChar) carbonLoadModel.setQuoteChar(quoteChar) carbonLoadModel.setCommentChar(commentchar) + carbonLoadModel.setDateFormat(dateFormat) --- End diff -- It is necessary to validate the input "dateFormat" before data loading. ---
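One way to validate the dateFormat option up front, as requested, is to let SimpleDateFormat's constructor reject illegal patterns, since it throws IllegalArgumentException for unknown pattern letters. The validator class below is an illustrative sketch, not CarbonData code.

```java
import java.text.SimpleDateFormat;

public class DateFormatValidator {
    // Returns true if the pattern is a valid SimpleDateFormat pattern.
    // SimpleDateFormat's constructor throws IllegalArgumentException for
    // illegal pattern characters, so construction doubles as validation.
    public static boolean isValidPattern(String pattern) {
        if (pattern == null || pattern.trim().isEmpty()) {
            return false;
        }
        try {
            new SimpleDateFormat(pattern.trim());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Running this check at command-parse time lets the load fail fast with a clear error instead of mid-load.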
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038977 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -651,6 +654,7 @@ public void setDefault() { columnSchemaDetails = ""; columnsDataTypeString=""; tableOption = ""; +dateFormat = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT; --- End diff -- An empty string should be used here. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040709 --- Diff: processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java --- @@ -37,7 +37,7 @@ private int surrogateKey = -1; @Before public void setUp() throws Exception { -TimeStampDirectDictionaryGenerator generator = TimeStampDirectDictionaryGenerator.instance; +TimeStampDirectDictionaryGenerator generator = new TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- The generator should be created using the carbon property, not the default value. Please correct all occurrences. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036554 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); --- End diff -- Please correct the code style everywhere: the wrapped-line indentation length is 4. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036811 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -92,23 +87,24 @@ private TimeStampDirectDictionaryGenerator() { cutOffTimeStampLocal = -1; } else { try { -SimpleDateFormat timeParser = new SimpleDateFormat(CarbonProperties.getInstance() -.getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, -CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)); +SimpleDateFormat timeParser = new SimpleDateFormat( +CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- Why use only the default value here? ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038702 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -117,9 +113,11 @@ private TimeStampDirectDictionaryGenerator() { * @return dictionary value */ @Override public int generateDirectSurrogateKey(String memberStr) { -SimpleDateFormat timeParser = new SimpleDateFormat(CarbonProperties.getInstance() -.getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, -CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)); +SimpleDateFormat timeParser = threadLocal.get(); +if(timeParser == null){ + timeParser = new SimpleDateFormat(dateFormat); + threadLocal.set(timeParser); +} timeParser.setLenient(false); --- End diff -- Please extract the code above into a new initialization method and invoke that method from each thread; it is not good to run this code inside the generateDirectSurrogateKey method. ---
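The extracted-initialization idea can be sketched with ThreadLocal.withInitial, which moves the per-thread SimpleDateFormat setup out of the hot generateDirectSurrogateKey path entirely; each thread builds its parser lazily on first access, so the parse method needs no null check. Class and method names here are illustrative, not the CarbonData API.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ThreadSafeParser {
    private final String dateFormat;
    // One SimpleDateFormat per thread (it is not thread-safe); the supplier
    // runs once per thread, keeping initialization out of the parse hot path.
    private final ThreadLocal<SimpleDateFormat> parser;

    public ThreadSafeParser(String dateFormat) {
        this.dateFormat = dateFormat;
        this.parser = ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat(this.dateFormat);
            f.setLenient(false); // reject e.g. month 13, as the original does
            return f;
        });
    }

    // The hot path: just fetch the thread's parser and parse.
    public long parseMillis(String value) throws ParseException {
        return parser.get().parse(value).getTime();
    }
}
```

Compared with the null-check-and-set pattern in the diff, withInitial keeps the initialization logic in one place per the review comment.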
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85035902 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/DirectDictionaryKeyGeneratorFactory.java --- @@ -39,14 +40,26 @@ private DirectDictionaryKeyGeneratorFactory() { * @param dataType DataType * @return the generator instance */ - public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType) { + public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType, + String dateFormat) { DirectDictionaryGenerator directDictionaryGenerator = null; switch (dataType) { case TIMESTAMP: -directDictionaryGenerator = TimeStampDirectDictionaryGenerator.instance; +directDictionaryGenerator = new TimeStampDirectDictionaryGenerator(dateFormat); break; default: +} +return directDictionaryGenerator; + } + public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType) { +DirectDictionaryGenerator directDictionaryGenerator = null; +switch (dataType) { + case TIMESTAMP: +directDictionaryGenerator = new TimeStampDirectDictionaryGenerator( +CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- The carbon property CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT needs to be used here. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036534 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/DirectDictionaryKeyGeneratorFactory.java --- @@ -39,14 +40,26 @@ private DirectDictionaryKeyGeneratorFactory() { * @param dataType DataType * @return the generator instance */ - public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType) { + public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType, + String dateFormat) { --- End diff -- Please keep the Java code style. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036431 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); --- End diff -- Please correct the code style. ---
[GitHub] incubator-carbondata pull request #127: [CARBONDATA-213] Remove dependency: ...
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/127 ---
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r84068389 --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.carbondata.hadoop.csv; + +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; + +import org.apache.carbondata.hadoop.io.StringArrayWritable; + +import junit.framework.TestCase; +import org.junit.Assert; +import org.junit.Test; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.compress.BZip2Codec; +import org.apache.hadoop.io.compress.CompressionOutputStream; +import org.apache.hadoop.io.compress.GzipCodec; +import org.apache.hadoop.io.compress.Lz4Codec; +import org.apache.hadoop.io.compress.SnappyCodec; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; + +public class CSVInputFormatTest extends TestCase { + + /** + * generate compressed files, no need to call this method. 
+ * @throws Exception + */ + public void testGenerateCompressFiles() throws Exception { +String pwd = new File("src/test/resources").getCanonicalPath(); +String inputFile = pwd + "/data.csv"; +FileInputStream input = new FileInputStream(inputFile); +Configuration conf = new Configuration(); + +// .gz +String outputFile = pwd + "/data.csv.gz"; +FileOutputStream output = new FileOutputStream(outputFile); +GzipCodec gzip = new GzipCodec(); +gzip.setConf(conf); +CompressionOutputStream outputStream = gzip.createOutputStream(output); +int i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + +// .bz2 +input = new FileInputStream(inputFile); +outputFile = pwd + "/data.csv.bz2"; +output = new FileOutputStream(outputFile); +BZip2Codec bzip2 = new BZip2Codec(); +bzip2.setConf(conf); +outputStream = bzip2.createOutputStream(output); +i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + +// .snappy +input = new FileInputStream(inputFile); +outputFile = pwd + "/data.csv.snappy"; +output = new FileOutputStream(outputFile); +SnappyCodec snappy = new SnappyCodec(); +snappy.setConf(conf); +outputStream = snappy.createOutputStream(output); +i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + +//.lz4 +input = new FileInputStream(inputFile); +outputFile = pwd + "/data.csv.lz4"; +output = new FileOutputStream(outputFile); +Lz4Codec lz4 = new Lz4Codec(); +lz4.setConf(conf); +outputStream = lz4.createOutputStream(output); +i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + + } + + /** + * CSVCheckMapper check the content of csv files. 
+ */ + public static class CSVCheckMapper extends Mapper<NullWritable, StringArrayWritable, NullWritable, + NullWritable> { +@Override +protected void map(NullWritable key, StringArrayWritable value, Context context) +throws IOException, InterruptedException {
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387366 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.carbondata.hadoop.mapreduce; + +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.io.Reader; + +import org.apache.carbondata.hadoop.io.BoundedInputStream; +import org.apache.carbondata.hadoop.io.StringArrayWritable; +import org.apache.carbondata.hadoop.util.CSVInputFormatUtil; + +import com.univocity.parsers.csv.CsvParser; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.Seekable; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.io.compress.CodecPool; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.apache.hadoop.io.compress.CompressionCodecFactory; +import org.apache.hadoop.io.compress.CompressionInputStream; +import org.apache.hadoop.io.compress.Decompressor; +import org.apache.hadoop.io.compress.SplitCompressionInputStream; +import org.apache.hadoop.io.compress.SplittableCompressionCodec; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; +import org.apache.hadoop.util.LineReader; + +/** + * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files. Files are broken into lines. + * Values are the line of csv files. + */ +public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> { + + @Override + public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit, + TaskAttemptContext context) throws IOException, InterruptedException { +return new NewCSVRecordReader(); + } + + /** + * Treats value as line in file. Key is null. 
+ */ + public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> { +
+private long start; +private long end; +private BoundedInputStream boundedInputStream; +private Reader reader; +private CsvParser csvParser; +private StringArrayWritable value; +private String[] columns; +private Seekable filePosition; +private boolean isCompressedInput; +private Decompressor decompressor; + +@Override +public void initialize(InputSplit inputSplit, TaskAttemptContext context) +throws IOException, InterruptedException { + FileSplit split = (FileSplit) inputSplit; + this.start = split.getStart(); --- End diff -- fixed ---
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386474 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.carbondata.hadoop.mapreduce; --- End diff -- fixed ---
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386400 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.carbondata.hadoop.io; + +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; +import java.nio.charset.Charset; +import java.util.Arrays; + +import org.apache.hadoop.io.Writable; + +/** + * A String sequence that is usable as a key or value. + */ +public class StringArrayWritable implements Writable { + private String[] values; + + public String[] toStrings() { +return values; + } + + public void set(String[] values) { +this.values = values; + } + + public String[] get() { +return values; + } + + @Override public void readFields(DataInput in) throws IOException { --- End diff -- fixed ---
[GitHub] incubator-carbondata pull request #127: [CARBONDATA-213] Remove dependency: ...
GitHub user QiangCai reopened a pull request: https://github.com/apache/incubator-carbondata/pull/127 [CARBONDATA-213] Remove dependency: thrift compiler [CARBONDATA-213] Remove dependency: thrift compiler **analysis** I think it is unnecessary for users/developers to download the thrift compiler when building the CarbonData project. **solution** Provide the Java code generated by the thrift compiler. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixthrifterror Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/127.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #127 commit ff895c5276569bef358ec02356400210014911de Author: QiangCai <qiang...@qq.com> Date: 2016-10-13T08:44:22Z add format java module ---
[GitHub] incubator-carbondata pull request #132: [CARBONDATA-218]Remove dependency: s...
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/132 ---