Re: [DISCUSSION] Support new feature: bitmap encode
Hi,

In my opinion, we need at least the following:
1. The CREATE TABLE DDL should be able to specify a bitmap option.
2. Add a new encoding: BITMAP.

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-Support-new-feature-bitmap-encode-tp10913p11210.html
Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
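To make the proposal concrete, here is a minimal illustrative sketch (plain Python, not CarbonData code; all names are hypothetical) of what a BITMAP encoding for a low-cardinality column could look like: each distinct value maps to a bitmap of the row ids where it occurs, and an equality filter becomes a single bitmap lookup.

```python
def bitmap_encode(column):
    """Build one bitmap (stored as a Python int) per distinct value;
    bit i set means the value occurs at row i."""
    bitmaps = {}
    for row_id, value in enumerate(column):
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps

def rows_matching(bitmaps, value):
    """Equality filter: look up the value's bitmap and list its set bits."""
    bits = bitmaps.get(value, 0)
    return [i for i in range(bits.bit_length()) if (bits >> i) & 1]
```

With this layout, an equality predicate never scans the raw column, and AND/OR of predicates become bitwise `&`/`|` on the bitmaps.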
[jira] [Created] (CARBONDATA-941) 7. Compaction of Partition Table
QiangCai created CARBONDATA-941:

Summary: 7. Compaction of Partition Table
Key: CARBONDATA-941
URL: https://issues.apache.org/jira/browse/CARBONDATA-941
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

Compact the same partition across segments.
--
This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (CARBONDATA-940) 6. Alter table add/drop partition
QiangCai created CARBONDATA-940:

Summary: 6. Alter table add/drop partition
Key: CARBONDATA-940
URL: https://issues.apache.org/jira/browse/CARBONDATA-940
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai
[jira] [Created] (CARBONDATA-939) 5. Partition tables join on partition column
QiangCai created CARBONDATA-939:

Summary: 5. Partition tables join on partition column
Key: CARBONDATA-939
URL: https://issues.apache.org/jira/browse/CARBONDATA-939
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai
[jira] [Created] (CARBONDATA-938) 4. Detail filter query on partition column
QiangCai created CARBONDATA-938:

Summary: 4. Detail filter query on partition column
Key: CARBONDATA-938
URL: https://issues.apache.org/jira/browse/CARBONDATA-938
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

Use the filter (equal, range, in, etc.) to get the partition id list, then use this partition id list to prune the BTree.
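The pruning step described above can be sketched as follows (an illustrative Python sketch, not CarbonData code; the list-partition layout and all names are assumptions): evaluate the partition-column filter against the partition definitions to get a partition id list, then keep only the data blocks tagged with those ids.

```python
def matching_partition_ids(partition_values, predicate):
    """partition_values: list where entry pid holds the values assigned to
    that partition. Evaluate the filter (equal, range, in, ...) per partition."""
    return [pid for pid, values in enumerate(partition_values)
            if any(predicate(v) for v in values)]

def prune_blocks(blocks, partition_ids):
    """blocks: list of (partition_id, block). Keep only blocks that belong
    to a matching partition, skipping the rest without scanning them."""
    wanted = set(partition_ids)
    return [block for pid, block in blocks if pid in wanted]
```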
[jira] [Created] (CARBONDATA-937) 3. Data loading of partition table
QiangCai created CARBONDATA-937:

Summary: 3. Data loading of partition table
Key: CARBONDATA-937
URL: https://issues.apache.org/jira/browse/CARBONDATA-937
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

- Use PartitionInfo to generate a Partitioner (hash, list, range).
- Use the Partitioner to repartition the input data file, reusing the loadDataFrame flow.
- Use the partition id to replace the task number in the carbondata/index file names.
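The repartition step for the hash case can be sketched like this (an illustrative Python sketch under assumed names, not the CarbonData implementation): a hash partitioner maps each row's partition-key value to a partition id, and the loader groups rows by that id so that each partition can be written as its own carbondata/index file.

```python
def hash_partitioner(num_partitions):
    """Return a function mapping a partition-key value to a partition id."""
    return lambda key: hash(key) % num_partitions

def repartition(rows, key_index, partitioner, num_partitions):
    """Group input rows by partition id, as the loading flow would do
    before writing one file set per partition."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[partitioner(row[key_index])].append(row)
    return parts
```

List and range partitioners would only differ in the key-to-id mapping; the grouping step stays the same.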
[jira] [Created] (CARBONDATA-936) 2. Create Table with Partition
QiangCai created CARBONDATA-936:

Summary: 2. Create Table with Partition
Key: CARBONDATA-936
URL: https://issues.apache.org/jira/browse/CARBONDATA-936
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

CarbonSparkSqlParser parses the partition clause to generate PartitionInfo and adds it to TableModel. CreateTable adds PartitionInfo to TableInfo and stores it in TableSchema. Support spark 2.1 at first.
[jira] [Created] (CARBONDATA-935) 1. Define PartitionInfo model
QiangCai created CARBONDATA-935:

Summary: 1. Define PartitionInfo model
Key: CARBONDATA-935
URL: https://issues.apache.org/jira/browse/CARBONDATA-935
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

Modify schema.thrift to define PartitionInfo; add PartitionInfo to TableSchema.
Re: [Discussion] Implement Partition Table Feature
Sub-task list of the Partition Table feature:

1. Define PartitionInfo model
   Modify schema.thrift to define PartitionInfo; add PartitionInfo to TableSchema.
2. Create Table with Partition
   CarbonSparkSqlParser parses the partition clause to generate PartitionInfo and adds it to TableModel. CreateTable adds PartitionInfo to TableInfo and stores it in TableSchema.
3. Data loading of partition table
   Use PartitionInfo to generate a Partitioner (hash, list, range); use the Partitioner to repartition the input data file, reusing the loadDataFrame flow; use the partition id to replace the task number in carbondata/index file names.
4. Detail filter query on partition column
   Support the equal filter to get a partition id and use it to prune the BTree. In the future, other filters (range, in, ...) will be supported.
5. Partition tables join on partition column
6. Alter table add/drop partition

Any suggestions?

Best Regards,
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Implement-Partition-Table-Feature-tp10938p11151.html
Re: [Discussion] Implement Partition Table Feature
Hi Cao Lu,

I suggest mentioning the following:
1. Table creation: modify schema.thrift and add optional partitioner information to TableSchema.
2. Alter table add/drop partition.
3. Data loading of a partition table: use the partitioner information in TableSchema to generate the table partitioner, use this partitioner to repartition the input RDD, and finally reuse the loadDataFrame flow. Use the partition id to replace the task number in carbondata/index file names, so there is no need to store partition information in the footer and index files.
4. Detail query on a partition table with a partition-column filter: use the filter to get the partition id list, then use this list to prune the BTree.
5. Partition tables join on the partition column.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Implement-Partition-Table-Feature-tp10938p11139.html
[jira] [Created] (CARBONDATA-923) InsertInto reading from one row not working
QiangCai created CARBONDATA-923:

Summary: InsertInto reading from one row not working
Key: CARBONDATA-923
URL: https://issues.apache.org/jira/browse/CARBONDATA-923
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Assignee: QiangCai

Reproduce:
create table OneRowTable(col1 string, col2 string, col3 int, col4 double) stored by 'carbondata'
insert into OneRowTable select '0.1', 'a.b', 1, 1.2

Exception:
org.apache.spark.sql.AnalysisException: cannot resolve '`0.1`' given input columns: [0.1, a.b, 1, 1.2];;
'Project ['0.1, 'a.b]
+- Project [0.1 AS 0.1#11, a.b AS a.b#12, 1 AS 1#13, 1.2 AS 1.2#14]
   +- OneRowRelation$
Re: how to distribute carbon.properties file
+1
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/how-to-distribute-cabon-properties-file-tp10687p10869.html
Re: bucket table
Hi Lu,

Please see the FAQ page: http://carbondata.apache.org/docs/latest/faq.html

I think you should add the option 'BAD_RECORDS_ACTION'='FORCE' to the load SQL, e.g.:
load data ... options('BAD_RECORDS_ACTION'='FORCE')

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/bucket-table-tp10862p10865.html
Re: CarbonLock Exception refactor
I agree with refactoring the code to give detailed information.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CarbonLock-Exception-refractor-tp10686p10864.html
Re: What is the problem of insert overwrite a table stored by carbondata
Hi,

CarbonData does not currently implement overwrite for InsertInto. It is a bug that should be fixed, and then we can implement overwrite InsertInto.

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/What-is-the-problem-of-insert-overwrite-a-table-stored-by-carbondata-tp10691p10813.html
[jira] [Created] (CARBONDATA-887) lazy rdd iterator for InsertInto
QiangCai created CARBONDATA-887:

Summary: lazy rdd iterator for InsertInto
Key: CARBONDATA-887
URL: https://issues.apache.org/jira/browse/CARBONDATA-887
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai
Assignee: QiangCai
[jira] [Created] (CARBONDATA-886) remove all redundant local variables
QiangCai created CARBONDATA-886:

Summary: remove all redundant local variables
Key: CARBONDATA-886
URL: https://issues.apache.org/jira/browse/CARBONDATA-886
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Re: how to load the dictionary file when loading data.
Hi,

You can have a look at the AllDictionaryExample in the spark module. An example of a dict file:

2,usa
2,china
1,2015/7/26
1,2015/7/23
1,2015/7/30
3,aaa3
3,aaa10

The line format is "column-index,value".

Best regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/how-to-load-the-dictionary-file-when-loading-data-tp10459p10485.html
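A small sketch (plain Python, not the CarbonData loader; the function name is an assumption) of how such a dict file can be parsed, grouping the pre-generated dictionary values by column index:

```python
from collections import defaultdict

def parse_all_dictionary(lines):
    """Each line has the form 'column-index,value'; group the values
    per column index, preserving file order."""
    dicts = defaultdict(list)
    for line in lines:
        idx, value = line.strip().split(",", 1)  # split only on the first comma
        dicts[int(idx)].append(value)
    return dicts
```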
Re: Carbondata with Datastax / Cassandra
Hi,

CarbonData does not currently support the CFS file system. I think we can try to support it.

Best regards
David CaiQiang
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Carbondata-with-Datastax-Cassandra-tp10031p10138.html
[DISCUSSION] Implement delta encoding for numeric type column in SORT_COLUMNS
Hi all,

We now plan to implement delta encoding for numeric type columns in SORT_COLUMNS:
1. Use delta encoding to encode the numeric type data.
2. Write presence metadata to the page header to record null values.
3. Improve the compression of no-dictionary string columns: use RLE to compress the array of lengths in the LV encoding.

Any thoughts, comments and questions?

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-implement-delta-encoding-for-numeric-type-column-in-SORT-COLUMNS-tp10124.html
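Points 1 and 3 can be sketched in a few lines (an illustrative Python sketch, not the CarbonData codec; all names are assumptions): delta encoding stores the first value plus successive differences, which for sorted numeric columns are small and compress well, and the LV length array collapses under run-length encoding when many values share a length.

```python
def delta_encode(values):
    """First value plus successive differences."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(encoded):
    """Rebuild the original values by a running sum over the deltas."""
    if not encoded:
        return []
    out = [encoded[0]]
    for d in encoded[1:]:
        out.append(out[-1] + d)
    return out

def rle_encode(lengths):
    """Run-length encode the LV length array as (value, run) pairs."""
    runs = []
    for x in lengths:
        if runs and runs[-1][0] == x:
            runs[-1] = (x, runs[-1][1] + 1)
        else:
            runs.append((x, 1))
    return runs
```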
Re: Dimension column of integer type - to exclude from dictionary
SORT_COLUMNS can make a numeric type column a dimension without dictionary encoding. The SORT_COLUMNS feature was implemented in the 12-dev branch.

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dimension-column-of-integer-type-to-exclude-from-dictionary-tp9961p9977.html
[DISCUSSION] Support new feature: Partition Table
Hi all,

Let's start the discussion on the partition table. What should we do to support partition tables?
1. Create table with partition: support Range, Hash, List and Composite partitioning, and write the partition info to the schema.
2. During data loading, re-partition the input data, start one task per partition, and write partition information to the footer and index file.
3. During data query, prune the B+Tree by partition if the filter contains the partition column, or prune data blocks by partition when there is only a partition-column predicate.
4. Optimize the join performance of two partition tables when the partition column is the join column.

Any thoughts, comments and questions? Thanks!

Best Regards
David
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-support-new-feature-Partition-Table-tp9935.html
[jira] [Created] (CARBONDATA-842) when SORT_COLUMN is empty, no need to sort data.
QiangCai created CARBONDATA-842:

Summary: when SORT_COLUMN is empty, no need to sort data.
Key: CARBONDATA-842
URL: https://issues.apache.org/jira/browse/CARBONDATA-842
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai
[jira] [Created] (CARBONDATA-841) improve the compression encoding for numeric type columns to give good performance
QiangCai created CARBONDATA-841:

Summary: improve the compression encoding for numeric type columns to give good performance
Key: CARBONDATA-841
URL: https://issues.apache.org/jira/browse/CARBONDATA-841
Project: CarbonData
Issue Type: Sub-task
Reporter: QiangCai

No-dictionary columns currently use LV (length-value) encoding, which isn't the best choice for numeric type columns.
Re: carbondata find a bug
+1

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbondata-find-a-bug-tp9747p9749.html
Re: [New Feature] Range Filter Optimization
+1

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/New-Feature-Range-Filter-Optimization-tp9343p9383.html
Re: [PROPOSAL] Update on the Jenkins CarbonData job
+1

Best Regards
David QiangCai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/PROPOSAL-Update-on-the-Jenkins-CarbonData-job-tp9202p9231.html
[jira] [Created] (CARBONDATA-782) Support SORT_COLUMNS
QiangCai created CARBONDATA-782:

Summary: Support SORT_COLUMNS
Key: CARBONDATA-782
URL: https://issues.apache.org/jira/browse/CARBONDATA-782
Project: CarbonData
Issue Type: New Feature
Reporter: QiangCai
Assignee: QiangCai

The tasks of SORT_COLUMNS:
1. Support creating a table with the sort_columns property, e.g. tblproperties('sort_columns' = 'col7,col3'). A table with the SORT_COLUMNS property will be sorted by SORT_COLUMNS, and the sort order of the columns is decided by SORT_COLUMNS.
2. Change the encoding rule of SORT_COLUMNS. Firstly, the column encoding rule will stay consistent with the previous behavior. Secondly, if a column of SORT_COLUMNS was previously a measure, it will now be created as a dimension, and this dimension is a no-dictionary column (better to use another direct dictionary). Thirdly, the dimensions of SORT_COLUMNS have RLE and ROWID pages; other dimensions have only RLE (not sorted).
3. The start/end key should be composed of SORT_COLUMNS. Use SORT_COLUMNS to build the start/end key during data loading and select queries.
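The sort-order and start/end-key ideas in the tasks above can be sketched as follows (an illustrative Python sketch, not CarbonData code; the row layout as dicts and all names are assumptions): rows are ordered by the columns in the order SORT_COLUMNS lists them, and a block's start/end key is built only from those columns.

```python
def sort_by_sort_columns(rows, sort_columns):
    """Order rows by the SORT_COLUMNS list; column priority follows the
    order given in tblproperties, e.g. ['col7', 'col3']."""
    return sorted(rows, key=lambda r: tuple(r[c] for c in sort_columns))

def start_end_key(sorted_rows, sort_columns):
    """Start/end key of a sorted block, composed only of SORT_COLUMNS."""
    first, last = sorted_rows[0], sorted_rows[-1]
    return (tuple(first[c] for c in sort_columns),
            tuple(last[c] for c in sort_columns))
```

A query with a predicate on the sort columns can then compare its key range against each block's start/end key and skip non-overlapping blocks.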
Re: column auto mapping when loading data from csv file
Hi Ravindra,

How about using 'NOT_AUTOFILEHEADER'='true' as follows? I think 'AUTOFILEHEADER'='true' should be the default behavior.

if (load sql contains "FILEHEADER") {
  1. input files shouldn't contain a file header
  2. use the "FILEHEADER" parameter to load data after passing the column check
} else {
  if (the 'NOT_AUTOFILEHEADER' option is not set) {
    1. auto-map the first row of the input files to the table's columns
    if (the first row contains all column names) {
      2. use the first row as the file header to load data
    } else if (the first row contains part of the column names) {
      2. stop loading
    } else {
      2. use the original order of the table's columns to load data
    }
  } else {
    1. input files should contain a file header
    2. use the first row as the file header to load data after passing the column check
  }
}
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/column-auto-mapping-when-loading-data-from-csv-file-tp8717p8753.html
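The auto-mapping branch of the pseudocode above can be sketched as a small decision function (an illustrative Python sketch; the function name, return tags, and argument shapes are all hypothetical, not a CarbonData API):

```python
def resolve_header(first_row, table_columns):
    """Decide how to interpret the CSV's first row when no FILEHEADER
    option is given, mirroring the auto-mapping proposal."""
    matched = [c for c in first_row if c in table_columns]
    if len(matched) == len(first_row):
        # every cell of the first row is a known column name
        return ("use_first_row_as_header", list(first_row))
    if matched:
        # only some names match: ambiguous, refuse to load
        return ("stop_loading", None)
    # no names match: treat the first row as data, map by table order
    return ("use_table_column_order", list(table_columns))
```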
[jira] [Created] (CARBONDATA-765) dataframe writer needs to drop the table first, otherwise loading says table not found
QiangCai created CARBONDATA-765:

Summary: dataframe writer needs to drop the table first, otherwise loading says table not found
Key: CARBONDATA-765
URL: https://issues.apache.org/jira/browse/CARBONDATA-765
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Assignee: QiangCai
[jira] [Created] (CARBONDATA-764) Improving Non-dictionary storage & performance
QiangCai created CARBONDATA-764:

Summary: Improving Non-dictionary storage & performance
Key: CARBONDATA-764
URL: https://issues.apache.org/jira/browse/CARBONDATA-764
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai

Mailing list: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Improving-Non-dictionary-storage-amp-performance-td8146.html
[jira] [Created] (CARBONDATA-762) modify all schemaName->databaseName, cubeName->tableName
QiangCai created CARBONDATA-762:

Summary: modify all schemaName->databaseName, cubeName->tableName
Key: CARBONDATA-762
URL: https://issues.apache.org/jira/browse/CARBONDATA-762
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
[jira] [Created] (CARBONDATA-763) Add L5 loading support, global sorting like HBase
QiangCai created CARBONDATA-763:

Summary: Add L5 loading support, global sorting like HBase
Key: CARBONDATA-763
URL: https://issues.apache.org/jira/browse/CARBONDATA-763
Project: CarbonData
Issue Type: Bug
Reporter: QiangCai
[jira] [Created] (CARBONDATA-761) Dictionary server should not be shutdown after loading
QiangCai created CARBONDATA-761:

Summary: Dictionary server should not be shutdown after loading
Key: CARBONDATA-761
URL: https://issues.apache.org/jira/browse/CARBONDATA-761
Project: CarbonData
Issue Type: Bug
Components: data-load
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor

Code: CarbonTableSchema/LoadTable
[jira] [Created] (CARBONDATA-760) Should avoid ERROR log for successful select query
QiangCai created CARBONDATA-760:

Summary: Should avoid ERROR log for successful select query
Key: CARBONDATA-760
URL: https://issues.apache.org/jira/browse/CARBONDATA-760
Project: CarbonData
Issue Type: Bug
Components: data-query
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor

Tables without delete or update operations may not have delta files, and a select query shouldn't record an ERROR log.
Code: SegmentUpdateStatusManager.getDeltaFiles
Log detail:
ERROR 06-03 19:21:37,531 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/0
ERROR 06-03 19:21:37,948 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/1
ERROR 06-03 19:21:38,517 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/2
ERROR 06-03 19:21:38,909 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/3
ERROR 06-03 19:21:39,292 - pool-475-thread-1 Invalid tuple id arbonstore/default/comparetest_carbon/Fact/0/0/0-0-0-1488799238178/4
Re: I loaded the data with the timestamp field unsuccessful
try /M/dd

Best Regards
David CaiQiang
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/I-loaded-the-data-with-the-timestamp-field-unsuccessful-tp8417p8419.html
Re: [DISCUSS] For the dimension default should be no dictionary
+1

It is not easy for users to understand the previous options. The logic of these two options, SORT_COLUMNS and TABLE_DICTIONARY, is very clear. I am implementing the SORT_COLUMNS option this way.

Best Regards
David Caiqiang
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSS-For-the-dimension-default-should-be-no-dictionary-tp8010p8122.html
Re: Re: data lost when loading data from csv file to carbon table
Maybe you can check PR594; it fixes a bug that affects the loading result.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/data-lost-when-loading-data-from-csv-file-to-carbon-table-tp7554p7639.html
[jira] [Created] (CARBONDATA-701) There is a memory leak issue in no kettle loading flow
QiangCai created CARBONDATA-701:

Summary: There is a memory leak issue in no kettle loading flow
Key: CARBONDATA-701
URL: https://issues.apache.org/jira/browse/CARBONDATA-701
Project: CarbonData
Issue Type: Improvement
Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Fix For: 1.0.1-incubating

When loading larger amounts of data, an OOM exception is thrown.
[jira] [Created] (CARBONDATA-659) Should add WhitespaceAround and ParenPad to javastyle
QiangCai created CARBONDATA-659:

Summary: Should add WhitespaceAround and ParenPad to javastyle
Key: CARBONDATA-659
URL: https://issues.apache.org/jira/browse/CARBONDATA-659
Project: CarbonData
Issue Type: Improvement
Reporter: QiangCai
Assignee: QiangCai
Priority: Trivial
[jira] [Created] (CARBONDATA-627) Fix Union unit test case for spark2
QiangCai created CARBONDATA-627:

Summary: Fix Union unit test case for spark2
Key: CARBONDATA-627
URL: https://issues.apache.org/jira/browse/CARBONDATA-627
Project: CarbonData
Issue Type: Bug
Components: data-query
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Fix For: 1.0.0-incubating

UnionTestCase fails in spark2; we should fix it.
[jira] [Created] (CARBONDATA-614) Should fix dictionary locked issue
QiangCai created CARBONDATA-614:

Summary: Should fix dictionary locked issue
Key: CARBONDATA-614
URL: https://issues.apache.org/jira/browse/CARBONDATA-614
Project: CarbonData
Issue Type: Bug
Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Fix For: 1.0.0-incubating

Even when carbon.properties.filepath is configured correctly, the following exception still appears.

Error: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 8, hadoop-slave-2): java.lang.RuntimeException: Dictionary file name is locked for updation. Please try after some time
at scala.sys.package$.error(package.scala:27)
at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD$$anon$1.(CarbonGlobalDictionaryRDD.scala:364)
at org.apache.carbondata.spark.rdd.CarbonGlobalDictionaryGenerateRDD.compute(CarbonGlobalDictionaryRDD.scala:302)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
[jira] [Created] (CARBONDATA-600) Should reuse unit test case for integration module
QiangCai created CARBONDATA-600:

Summary: Should reuse unit test case for integration module
Key: CARBONDATA-600
URL: https://issues.apache.org/jira/browse/CARBONDATA-600
Project: CarbonData
Issue Type: Bug
Components: spark-integration
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Fix For: 1.0.0-incubating
[jira] [Created] (CARBONDATA-601) Should reuse unit test case for integration module
QiangCai created CARBONDATA-601:

Summary: Should reuse unit test case for integration module
Key: CARBONDATA-601
URL: https://issues.apache.org/jira/browse/CARBONDATA-601
Project: CarbonData
Issue Type: Test
Components: spark-integration
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Priority: Minor
Fix For: 1.0.0-incubating
Re: carbon thrift server for spark 2.0 showing unusual behaviour
Is the column name "int" and the type "String"? Better to try another column name.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbon-thrift-server-for-spark-2-0-showing-unusual-behaviour-tp5384p5454.html
Re: carbontable compact throw err
You can check the following and share the results:
1) select * from test limit 1
2) show segments for table test limit 1000
3) alter table test compact 'major'

It would be better to provide more log info.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbontable-compact-throw-err-tp5382p5426.html
Re: Dictionary file is locked for Updation, unable to Load
I think you can have a look at this mailing list thread:
http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-td5076.html

Have a look at the following guide and pay attention to the carbon.properties file:
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide

For spark yarn cluster mode:
1. Both the driver side and the executor side need the same carbon.properties file.
2. Set carbon.lock.type=HDFSLOCK
3. Set carbon.properties.filepath:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=
spark.driver.extraJavaOptions -Dcarbon.properties.filepath=
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-Updation-unable-to-Load-tp5359p5422.html
Re: why there is a table name option in carbon source format?
For Spark 2, when using SparkSession to create a carbon table, the tableName option is needed to create the carbon schema in the store location folder. It is better to use CarbonSession to create carbon tables now.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/why-there-is-a-table-name-option-in-carbon-source-format-tp5385p5420.html
Re: CatalystAnalysy
You can try -Dscala.version=
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/CatalystAnalysy-tp5129p5141.html
Re: Re: Dictionary file is locked for updation
Please correct the path of the carbon.properties file:
spark.executor.extraJavaOptions -Dcarbon.properties.filepath=carbon.properties
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-tp5076p5092.html
Re: Re: Dictionary file is locked for updation
Please try to add carbon.storelocation to the carbon.properties file, e.g.:
carbon.storelocation=hdfs://master:9000/carbondata/store

You can have a look at the following guide and pay attention to the carbon.properties file:
https://cwiki.apache.org/confluence/display/CARBONDATA/Cluster+deployment+guide
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Dictionary-file-is-locked-for-updation-tp5076p5090.html
[Discussion] Simplify the deployment of carbondata
Hi all,

I suggest simplifying the deployment of CarbonData as follows:
1. Remove the kettle dependency completely: no need to deploy the "carbonplugins" folder on each node, and no need to set "carbon.kettle.home".
2. Remove the carbon.properties file from the executor side; pass the CarbonData configuration from the driver side to the executor side.
3. Use "spark.sql.warehouse.dir" (spark2) or "hive.metastore.warehouse.dir" (spark1) instead of "carbon.storelocation".

So in the future we will just need to deploy the CarbonData jars in cluster mode.

What's your opinion?

Best Regards
David Cai
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Discussion-Simplify-the-deployment-of-carbondata-tp5000.html
Re: same query and I change the value than throw a error
Please provide the executor-side log.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/same-query-and-I-change-the-value-than-throw-a-error-tp4811p4893.html
Re: Re: etl.DataLoadingException: The input file does not exist
Please find the following item in the carbon.properties file and give it a proper path (hdfs://master:9000/):
carbon.ddl.base.hdfs.url

During loading, this URL will be combined with the data file path.

BTW, it would be better to provide the version number.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/etl-DataLoadingException-The-input-file-does-not-exist-tp4853p4888.html
Re: carbondata-0.2 load data failed in yarn mode
I think the root cause is the metadata lock type. Please add the "carbon.lock.type" configuration to carbon.properties as follows:

#Local mode
carbon.lock.type=LOCALLOCK

#Cluster mode
carbon.lock.type=HDFSLOCK
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/carbondata-0-2-load-data-failed-in-yarn-molde-tp3908p4887.html
[jira] [Created] (CARBONDATA-540) Support insertInto without kettle for spark2
QiangCai created CARBONDATA-540:

Summary: Support insertInto without kettle for spark2
Key: CARBONDATA-540
URL: https://issues.apache.org/jira/browse/CARBONDATA-540
Project: CarbonData
Issue Type: Improvement
Components: data-load
Affects Versions: 1.0.0-incubating
Reporter: QiangCai
Assignee: QiangCai
Fix For: 1.0.0-incubating
Re: [DISCUSSION] CarbonData loading solution discussion
+1. We should flexibly choose the loading solution according to Scenarios 1 and 2, and we will get performance benefits.
--
View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/DISCUSSION-CarbonData-loading-solution-discussion-tp4490p4520.html
[jira] [Created] (CARBONDATA-535) carbondata should support datatype: Date and Char
QiangCai created CARBONDATA-535: --- Summary: carbondata should support datatype: Date and Char Key: CARBONDATA-535 URL: https://issues.apache.org/jira/browse/CARBONDATA-535 Project: CarbonData Issue Type: Improvement Components: file-format Affects Versions: 1.0.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 1.0.0-incubating carbondata should support datatype: Date and Char -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-497) [Spark2] fix datatype issue of CarbonLateDecoderRule
QiangCai created CARBONDATA-497: --- Summary: [Spark2] fix datatype issue of CarbonLateDecoderRule Key: CARBONDATA-497 URL: https://issues.apache.org/jira/browse/CARBONDATA-497 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 1.0.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 1.0.0-incubating In Spark 2, the LogicalPlan resolve method needs to check the input data type. If the data type is wrong, the logical plan will stay unresolved. CarbonLateDecoderRule should correct the datatype of dictionary dimensions so that the logical plan can be resolved. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-493) InsertInto SQL cannot select from an empty table
QiangCai created CARBONDATA-493: --- Summary: InsertInto SQL cannot select from an empty table Key: CARBONDATA-493 URL: https://issues.apache.org/jira/browse/CARBONDATA-493 Project: CarbonData Issue Type: Bug Affects Versions: 1.0.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 1.0.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-488) add InsertInto feature for spark2
QiangCai created CARBONDATA-488: --- Summary: add InsertInto feature for spark2 Key: CARBONDATA-488 URL: https://issues.apache.org/jira/browse/CARBONDATA-488 Project: CarbonData Issue Type: New Feature Components: data-load Affects Versions: 0.3.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 0.3.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-486) Reading dataframe concurrently will lead to wrong data
QiangCai created CARBONDATA-486: --- Summary: Reading dataframe concurrently will lead to wrong data Key: CARBONDATA-486 URL: https://issues.apache.org/jira/browse/CARBONDATA-486 Project: CarbonData Issue Type: Bug Components: data-load Affects Versions: 0.3.0-incubating Reporter: QiangCai Assignee: QiangCai Fix For: 0.3.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (CARBONDATA-481) [SPARK2]fix late decoder and support whole stage code gen
QiangCai created CARBONDATA-481: --- Summary: [SPARK2]fix late decoder and support whole stage code gen Key: CARBONDATA-481 URL: https://issues.apache.org/jira/browse/CARBONDATA-481 Project: CarbonData Issue Type: Bug Components: data-query Affects Versions: 0.2.0-incubating Reporter: QiangCai Fix For: 0.3.0-incubating -- This message was sent by Atlassian JIRA (v6.3.4#6332)
Re: [Feature Proposal] Spark 2 integration with CarbonData
+1 I think I can finish some tasks. Please assign some tasks to me. -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/Feature-Proposal-Spark-2-integration-with-CarbonData-tp3236p3320.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: [VOTE] Apache CarbonData 0.2.0-incubating release
+1 -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/VOTE-Apache-CarbonData-0-2-0-incubating-release-tp2823p2836.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
Re: As planned, we are ready to make Apache CarbonData 0.2.0 release:
I look forward to releasing this version. CarbonData has improved query and load performance, and it is good news that there is no need to install Thrift to build the project. Btw, how many PRs were merged into this version? -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/As-planed-we-are-ready-to-make-Apache-CarbonData-0-2-0-release-tp2738p2752.html Sent from the Apache CarbonData Mailing List archive mailing list archive at Nabble.com.
[jira] [Created] (CARBONDATA-368) Should improve performance of DataFrame loading
QiangCai created CARBONDATA-368: --- Summary: Should improve performance of DataFrame loading Key: CARBONDATA-368 URL: https://issues.apache.org/jira/browse/CARBONDATA-368 Project: CarbonData Issue Type: Improvement Components: data-load Affects Versions: 0.3.0-incubating Reporter: QiangCai Assignee: QiangCai -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[GitHub] incubator-carbondata pull request #278: [CARBONDATA-85][WIP] support insert ...
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/278 [CARBONDATA-85][WIP] support insert into carbon table select from table **1.Support insert into carbon table select from table** **2.Improve performance of dataframe loading** You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata loaddataframe Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/278.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #278 commit 217947dc6f167f8ae490d28254eed4785eea73d3 Author: QiangCai <david.c...@gmail.com> Date: 2016-10-24T02:54:20Z DataLoadCoalescedRDD DataLoadPartitionCoalescer concurrently read dataframe commit 39d517179184c8412a488e44b5b914412ec24451 Author: QiangCai <qiang...@qq.com> Date: 2016-11-01T09:39:57Z add test case --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058166 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -22,28 +22,44 @@ import java.io.DataOutput; import java.io.IOException; import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.Distributable; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.path.CarbonTablePath; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.Writable; import org.apache.hadoop.mapreduce.lib.input.FileSplit; + /** * Carbon input split to allow distributed read of CarbonInputFormat. */ -public class CarbonInputSplit extends FileSplit implements Serializable, Writable { +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable { private static final long serialVersionUID = 3520344046772190207L; private String segmentId; - /** + public String taskId = "0"; + + /* * Number of BlockLets in a block */ private int numberOfBlocklets = 0; - public CarbonInputSplit() { -super(null, 0, 0, new String[0]); + public CarbonInputSplit() { } - public CarbonInputSplit(String segmentId, Path path, long start, long length, + private void parserPath(Path path) { --- End diff -- please use CarbonTablePath.DataFileUtil.getTaskNo --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #262: [CARBONDATA-308] Use CarbonInputForm...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/262#discussion_r86058188 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/CarbonInputSplit.java --- @@ -22,28 +22,44 @@ import java.io.DataOutput; import java.io.IOException; import java.io.Serializable; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.Distributable; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.path.CarbonTablePath; import org.apache.hadoop.fs.Path; import org.apache.hadoop.io.Writable; import org.apache.hadoop.mapreduce.lib.input.FileSplit; + /** * Carbon input split to allow distributed read of CarbonInputFormat. */ -public class CarbonInputSplit extends FileSplit implements Serializable, Writable { +public class CarbonInputSplit extends FileSplit implements Distributable, Serializable, Writable { private static final long serialVersionUID = 3520344046772190207L; private String segmentId; - /** + public String taskId = "0"; + + /* * Number of BlockLets in a block */ private int numberOfBlocklets = 0; - public CarbonInputSplit() { -super(null, 0, 0, new String[0]); + public CarbonInputSplit() { } - public CarbonInputSplit(String segmentId, Path path, long start, long length, + private void parserPath(Path path) { +String[] nameParts = path.getName().split("-"); +if (nameParts != null && nameParts.length >= 3) { + this.taskId = nameParts[2]; +} + } + + private CarbonInputSplit(String segmentId, Path path, long start, long length, --- End diff -- please initialize taskId --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
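The two review comments above (use CarbonTablePath.DataFileUtil.getTaskNo, and initialize taskId) boil down to: don't hand-split the file name in place, and always give taskId a safe default. A minimal sketch of the defensive parsing — the "third dash-separated token" layout comes from the diff (nameParts[2]); the default of "0" also appears in the diff, and the method name here is a stand-in:

```java
// Sketch of the taskId parsing discussed above. Only the nameParts[2]
// layout and the "0" default come from the diff; the rest is scaffolding.
public class TaskIdParser {

    static String parseTaskId(String fileName) {
        String taskId = "0"; // initialized up front, even if parsing fails
        String[] nameParts = fileName.split("-");
        if (nameParts.length >= 3) {
            taskId = nameParts[2];
        }
        return taskId;
    }

    public static void main(String[] args) {
        System.out.println(parseTaskId("part-0-5-1480000000000.carbondata"));
        System.out.println(parseTaskId("weird.carbondata"));
    }
}
```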
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85632892 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +472,34 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<String, String>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(CarbonCommonConstants.COMMA); + for (String dateFormat:dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0].toLowerCase().trim(), +dateFormatSplits[1].trim()); + } +} +String[] DimensionColumnIds = meta.getDimensionColumnIds(); +directDictionaryGenerators = +new DirectDictionaryGenerator[DimensionColumnIds.length]; +for (int i = 0; i < DimensionColumnIds.length; i++) { + ColumnSchemaDetails columnSchemaDetails = columnSchemaDetailsWrapper.get( + DimensionColumnIds[i]); + if (columnSchemaDetails.isDirectDictionary()) { +String columnName = columnSchemaDetails.getColumnName(); +DataType columnType = columnSchemaDetails.getColumnType(); +if (dateformatsHashMap.containsKey(columnName)) { --- End diff -- better to use "get" method, just look up map once. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
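The suggestion above — call get once instead of containsKey followed by get — works because Map.get returns null for an absent key, so two hash lookups collapse into one. A sketch with hypothetical method and variable names:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the single-lookup pattern suggested above. Names here are
// hypothetical, not taken from CarbonCSVBasedSeqGenStep.
public class SingleLookup {

    static String resolveDateFormat(Map<String, String> formats,
                                    String columnName,
                                    String defaultFormat) {
        // One lookup: get returns null when the key is absent
        String fmt = formats.get(columnName.toLowerCase().trim());
        return (fmt != null) ? fmt : defaultFormat;
    }

    public static void main(String[] args) {
        Map<String, String> formats = new HashMap<>();
        formats.put("saledate", "yyyy/MM/dd");
        System.out.println(resolveDateFormat(formats, "SaleDate ", "yyyy-MM-dd"));
        System.out.println(resolveDateFormat(formats, "other", "yyyy-MM-dd"));
    }
}
```

Note this pattern is also null-hostile: if null values are legal in the map, containsKey is needed after all, but date-format strings are never null here.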
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85463723 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/Segment.java --- @@ -0,0 +1,94 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.List; + +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.hadoop.fs.FileStatus; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +/** + * Within a carbon table, each data load becomes one Segment, which stores all data files belong to this load in + * the segment folder. 
+ */ +public abstract class Segment { + + protected String id; + + /** + * Path of the segment folder + */ + private String path; + + public Segment(String id, String path) { +this.id = id; +this.path = path; + } + + public String getId() { +return id; + } + + public String getPath() { +return path; + } + + /** + * return all InputSplit of this segment, each file is a InputSplit + * @param job job context + * @return all InputSplit + * @throws IOException + */ + public List getAllSplits(JobContext job) throws IOException { --- End diff -- I suggest to return List --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284] Abstracting index a...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85464310 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/segment/impl/IndexedSegment.java --- @@ -0,0 +1,73 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.carbondata.hadoop.internal.segment.impl; + +import java.io.IOException; +import java.util.LinkedList; +import java.util.List; + +import org.apache.carbondata.hadoop.CarbonInputSplit; --- End diff -- please use internal.CarbonInputSplit --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85461092 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle( Seq.empty } + private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]): + Unit = { +if (dateFormat == "") { + throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.") +} else { + var dateFormats: Array[String] = dateFormat.split(",") --- End diff -- CarbonCommonConstant.COMMA --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460088 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1244,6 +1260,25 @@ case class LoadTableUsingKettle( Seq.empty } + private def validateDateFormat(dateFormat: String, dateDimensionsName: ArrayBuffer[String]): + Unit = { +if (dateFormat == "") { + throw new MalformedCarbonCommandException("Error: Option DateFormat is set an empty string.") +} else { + var dateFormats: Array[String] = dateFormat.split(",") + for (singleDateFormat <- dateFormats) { +var dateFormatSplits: Array[String] = singleDateFormat.split(":", 2) +if (!dateDimensionsName.contains(dateFormatSplits(0))) { --- End diff -- take care case-insensitive --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459286 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle( val allDictionaryPath = options.getOrElse("all_dictionary_path", "") val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$") val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:") + val timeFormat = options.getOrElse("timeformat", null) --- End diff -- "timeFormat" is useless --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85460589 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -343,7 +345,8 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K } data.setGenerator( - KeyGeneratorFactory.getKeyGenerator(getUpdatedLens(meta.dimLens, meta.dimPresent))); + KeyGeneratorFactory.getKeyGenerator( + getUpdatedLens(meta.dimLens, meta.dimPresent))); --- End diff -- keep code style --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85459810 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1143,6 +1141,21 @@ case class LoadTableUsingKettle( val allDictionaryPath = options.getOrElse("all_dictionary_path", "") val complex_delimiter_level_1 = options.getOrElse("complex_delimiter_level_1", "\\$") val complex_delimiter_level_2 = options.getOrElse("complex_delimiter_level_2", "\\:") + val timeFormat = options.getOrElse("timeformat", null) + val dateFormat = options.getOrElse("dateformat", null) + val tableDimensions: util.List[CarbonDimension] = table.getDimensionByTableName(tableName) + val dateDimensionsName = new ArrayBuffer[String] + tableDimensions.toArray.foreach { +dimension => { + val columnSchema: ColumnSchema = dimension.asInstanceOf[CarbonDimension].getColumnSchema + if (columnSchema.getDataType.name == "TIMESTAMP") { +dateDimensionsName += columnSchema.getColumnName + } +} + } + if (dateFormat != null) { +validateDateFormat(dateFormat, dateDimensionsName) + } --- End diff -- please move these code into method validateDateFormat --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
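Pulling the whole check into validateDateFormat, as the comment asks, might look roughly like this Java sketch. The "col1:format1,col2:format2" grammar and the split(":", 2) limit come from the diff; the case-insensitive matching follows the other review comment on this PR; the exception type and messages are simplified assumptions:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Sketch of a consolidated validateDateFormat: the caller passes the raw
// option plus the table's timestamp column names, and all parsing and
// validation lives in one method. Not the actual carbonTableSchema.scala code.
public class DateFormatOption {

    static void validateDateFormat(String dateFormat, List<String> timestampColumns) {
        if (dateFormat.isEmpty()) {
            throw new IllegalArgumentException("Option DateFormat is set to an empty string.");
        }
        for (String single : dateFormat.split(",")) {
            // limit 2: the format itself may contain ':' (e.g. HH:mm:ss)
            String[] parts = single.split(":", 2);
            if (parts.length != 2 || parts[1].trim().isEmpty()) {
                throw new IllegalArgumentException("Invalid DateFormat entry: " + single);
            }
            String column = parts[0].trim().toLowerCase(Locale.ROOT);
            boolean known = timestampColumns.stream()
                .anyMatch(c -> c.toLowerCase(Locale.ROOT).equals(column));
            if (!known) {
                throw new IllegalArgumentException("Unknown timestamp column: " + parts[0]);
            }
        }
    }

    public static void main(String[] args) {
        validateDateFormat("saleDate:yyyy/MM/dd,updateTime:yyyy-MM-dd HH:mm:ss",
            Arrays.asList("SaleDate", "UpdateTime"));
        System.out.println("ok");
    }
}
```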
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284][WIP] Abstracting in...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061184 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,220 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.core.util.CarbonTimeStatisticsFactory; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import 
org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { + + private static final Log LOG = LogFactory.getLog(InMemoryBTreeIndex.class); + private Segment segment; + + InMemoryBTreeIndex(Segment segment) { +this.segment = segment; + } + + @Override + public String getName() { +return null; + } + + @Override + public List filter(JobContext job, FilterResolverIntf filter) + throws IOException { + +List result = new LinkedList(); + +FilterExpressionProcessor filterExpressionProcessor = new FilterExpressionProcessor(); + +AbsoluteTableIdentifier absoluteTableIdentifier = null; + //CarbonInputFormatUtil.getAbsoluteTableIdentifier(job.getConfiguration()); + +//for this segment fetch blocks matching filter in BTree +List dataRefNodes = null; +try { + dataRefNodes = getDataBlocksOfSegment(job, filterExpressionProcessor, absoluteTableIdentifier, + filter, segment.getId()); +} catch (IndexBuilderException e) { + throw new IOException(e.getMessage()); +} +for (DataRefNode dataRefNode : dataRefNodes) { + BlockBTreeLeafNode leafNode = (BlockBTreeLeafNode) dataRefNode; + TableBlockInfo tableBlockInfo = leafNode.getTableBlockInfo(); + result.add(
[GitHub] incubator-carbondata pull request #208: [CARBONDATA-284][WIP] Abstracting in...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/208#discussion_r85061025 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/internal/index/memory/InMemoryBTreeIndex.java --- @@ -0,0 +1,214 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package org.apache.carbondata.hadoop.internal.index.memory; + +import java.io.IOException; +import java.util.ArrayList; +import java.util.HashMap; +import java.util.LinkedList; +import java.util.List; +import java.util.Map; + +import org.apache.carbondata.core.carbon.AbsoluteTableIdentifier; +import org.apache.carbondata.core.carbon.datastore.DataRefNode; +import org.apache.carbondata.core.carbon.datastore.DataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.IndexKey; +import org.apache.carbondata.core.carbon.datastore.SegmentTaskIndexStore; +import org.apache.carbondata.core.carbon.datastore.block.AbstractIndex; +import org.apache.carbondata.core.carbon.datastore.block.BlockletInfos; +import org.apache.carbondata.core.carbon.datastore.block.SegmentProperties; +import org.apache.carbondata.core.carbon.datastore.block.TableBlockInfo; +import org.apache.carbondata.core.carbon.datastore.exception.IndexBuilderException; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BTreeDataRefNodeFinder; +import org.apache.carbondata.core.carbon.datastore.impl.btree.BlockBTreeLeafNode; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatistic; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsConstants; +import org.apache.carbondata.core.carbon.querystatistics.QueryStatisticsRecorder; +import org.apache.carbondata.core.keygenerator.KeyGenException; +import org.apache.carbondata.hadoop.CarbonInputSplit; +import org.apache.carbondata.hadoop.internal.index.Index; +import org.apache.carbondata.hadoop.internal.segment.Segment; +import org.apache.carbondata.hadoop.util.CarbonInputFormatUtil; +import org.apache.carbondata.scan.executor.exception.QueryExecutionException; +import org.apache.carbondata.scan.filter.FilterExpressionProcessor; +import org.apache.carbondata.scan.filter.FilterUtil; +import org.apache.carbondata.scan.filter.resolver.FilterResolverIntf; +import org.apache.commons.logging.Log; +import 
org.apache.commons.logging.LogFactory; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.JobContext; + +class InMemoryBTreeIndex implements Index { --- End diff -- I understand InMemoryBTreeIndex is segment level's index. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040305 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -111,7 +110,7 @@ /** * timeFormat --- End diff -- please correct comment --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039192 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +474,36 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<String, String>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(","); + for (String dateFormat:dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0],dateFormatSplits[1]); +// TODO verify the dateFormatSplits is valid or not + } +} +directDictionaryGenerators = +new DirectDictionaryGenerator[meta.getDimensionColumnIds().length]; +for (int i = 0; i < meta.getDimensionColumnIds().length; i++) { --- End diff -- not good to invoke getDimensionColumnIds many times --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
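The fix for the comment above is a local variable: call the getter once and index into the local inside the loop. (The later revision of this diff does exactly that, caching meta.getDimensionColumnIds().) A generic sketch with stand-in names:

```java
// Sketch of hoisting a repeated getter out of a loop, as requested above.
// The class, field, and loop body are stand-ins, not CarbonCSVBasedSeqGenMeta.
public class HoistGetter {

    private final String[] dimensionColumnIds;

    HoistGetter(String[] dimensionColumnIds) {
        this.dimensionColumnIds = dimensionColumnIds;
    }

    String[] getDimensionColumnIds() {
        return dimensionColumnIds;
    }

    static String[] upperCaseIds(HoistGetter meta) {
        // One getter call, hoisted; the loop reuses the local reference
        // instead of calling meta.getDimensionColumnIds() each iteration.
        String[] ids = meta.getDimensionColumnIds();
        String[] out = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            out[i] = ids[i].toUpperCase();
        }
        return out;
    }

    public static void main(String[] args) {
        String[] result = upperCaseIds(new HoistGetter(new String[]{"c1", "c2"}));
        System.out.println(result[0] + "," + result[1]);
    }
}
```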
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039860 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); - /** - * initialization block for granularityFactor and cutOffTimeStamp - */ - static { + public TimeStampDirectDictionaryGenerator(String dateFormat) { --- End diff -- Please keep a no-argument TimeStampDirectDictionaryGenerator() constructor that uses the default dateformat. If the data loading command did not provide a dateformat option for some column, we can fall back to the no-argument constructor. ---
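The reviewer's suggestion, a no-argument constructor that falls back to a default format, might look like the minimal sketch below. The class name, the getter, and the hard-coded default pattern are assumptions for illustration; CarbonData itself reads the real default from CarbonCommonConstants/CarbonProperties.

```java
public class TimestampGenerator {
    // Hypothetical default; CarbonData would take this from its properties.
    static final String DEFAULT_FORMAT = "yyyy-MM-dd HH:mm:ss";

    private final String dateFormat;

    // No-arg constructor delegates to the parameterized one, as the review
    // suggests, so callers without a per-column format keep working.
    public TimestampGenerator() {
        this(DEFAULT_FORMAT);
    }

    public TimestampGenerator(String dateFormat) {
        this.dateFormat = dateFormat;
    }

    public String getDateFormat() {
        return dateFormat;
    }
}
```

Delegating rather than duplicating the initialization keeps both constructors consistent.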
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85039488 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@ -470,6 +474,36 @@ public boolean processRow(StepMetaInterface smi, StepDataInterface sdi) throws K break; } } +HashMap<String, String> dateformatsHashMap = new HashMap<String, String>(); +if (meta.dateFormat != null) { + String[] dateformats = meta.dateFormat.split(","); + for (String dateFormat:dateformats) { +String[] dateFormatSplits = dateFormat.split(":", 2); + dateformatsHashMap.put(dateFormatSplits[0],dateFormatSplits[1]); +// TODO verify the dateFormatSplits is valid or not + } +} +directDictionaryGenerators = +new DirectDictionaryGenerator[meta.getDimensionColumnIds().length]; +for (int i = 0; i < meta.getDimensionColumnIds().length; i++) { + ColumnSchemaDetails columnSchemaDetails = columnSchemaDetailsWrapper.get( + meta.getDimensionColumnIds()[i]); + if (columnSchemaDetails.isDirectDictionary()) { +if (dateformatsHashMap.containsKey(columnSchemaDetails.getColumnName())) { + directDictionaryGenerators[i] = + DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator( + columnSchemaDetails.getColumnType(), + dateformatsHashMap.get(columnSchemaDetails.getColumnName())); +} else { + String dateFormat = CarbonProperties.getInstance() + .getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, + CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); + directDictionaryGenerators[i] = + DirectDictionaryKeyGeneratorFactory.getDirectDictionaryGenerator( + columnSchemaDetails.getColumnType(), dateFormat); --- End diff -- 1. Move CarbonProperties.getInstance().getProperty out of the for loop. 2. For the default dateformat, use the method getDirectDictionaryGenerator(DataType dataType). ---
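Both suggestions, reading the fallback format once outside the loop and overriding it only per column, can be sketched as below. The class name, method, property key, and default value are hypothetical stand-ins for the CarbonData API (java.util.Properties stands in for CarbonProperties).

```java
import java.util.Map;
import java.util.Properties;

public class GeneratorSetup {
    // Resolves a format per column: the per-column override if present,
    // otherwise a fallback that is read ONCE, hoisted out of the loop as the
    // review asks, instead of on every iteration.
    public static String[] resolveFormats(String[] columns,
                                          Map<String, String> overrides,
                                          Properties props) {
        // Hypothetical key/default standing in for CarbonCommonConstants.
        String fallback = props.getProperty("carbon.timestamp.format",
                                            "yyyy-MM-dd HH:mm:ss");
        String[] formats = new String[columns.length];
        for (int i = 0; i < columns.length; i++) {
            formats[i] = overrides.getOrDefault(columns[i], fallback);
        }
        return formats;
    }
}
```

The property lookup is paid once per load, not once per dimension column.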
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038170 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1129,6 +1130,9 @@ case class LoadTable( carbonLoadModel.setEscapeChar(escapeChar) carbonLoadModel.setQuoteChar(quoteChar) carbonLoadModel.setCommentChar(commentchar) + carbonLoadModel.setDateFormat(dateFormat) --- End diff -- It is necessary to validate the input "dateFormat" before data loading. ---
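One way to validate the dateFormat option up front, as requested, is to let SimpleDateFormat's constructor reject illegal patterns, since it throws IllegalArgumentException for unknown pattern letters. The validator class below is an illustrative sketch, not CarbonData code.

```java
import java.text.SimpleDateFormat;

public class DateFormatValidator {
    // Returns true if the pattern is a valid SimpleDateFormat pattern.
    // SimpleDateFormat's constructor throws IllegalArgumentException for
    // illegal pattern characters, so construction doubles as validation.
    public static boolean isValidPattern(String pattern) {
        if (pattern == null || pattern.trim().isEmpty()) {
            return false;
        }
        try {
            new SimpleDateFormat(pattern.trim());
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

Running this check at command-parse time lets the load fail fast with a clear error instead of mid-load.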
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038977 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenMeta.java --- @@ -651,6 +654,7 @@ public void setDefault() { columnSchemaDetails = ""; columnsDataTypeString=""; tableOption = ""; +dateFormat = CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT; --- End diff -- An empty string should be used here. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85040709 --- Diff: processing/src/test/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGeneratorTest.java --- @@ -37,7 +37,7 @@ private int surrogateKey = -1; @Before public void setUp() throws Exception { -TimeStampDirectDictionaryGenerator generator = TimeStampDirectDictionaryGenerator.instance; +TimeStampDirectDictionaryGenerator generator = new TimeStampDirectDictionaryGenerator(CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- The generator should be created using the carbon property, not the default value. Please correct all occurrences. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036554 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); --- End diff -- Please correct the code style everywhere: the wrapped-line indentation length is 4. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036811 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -92,23 +87,24 @@ private TimeStampDirectDictionaryGenerator() { cutOffTimeStampLocal = -1; } else { try { -SimpleDateFormat timeParser = new SimpleDateFormat(CarbonProperties.getInstance() -.getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, -CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)); +SimpleDateFormat timeParser = new SimpleDateFormat( +CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- Why use only the default value here? ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85038702 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -117,9 +113,11 @@ private TimeStampDirectDictionaryGenerator() { * @return dictionary value */ @Override public int generateDirectSurrogateKey(String memberStr) { -SimpleDateFormat timeParser = new SimpleDateFormat(CarbonProperties.getInstance() -.getProperty(CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT, -CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT)); +SimpleDateFormat timeParser = threadLocal.get(); +if(timeParser == null){ + timeParser = new SimpleDateFormat(dateFormat); + threadLocal.set(timeParser); +} timeParser.setLenient(false); --- End diff -- Please extract the code above into a new initialization method and invoke that method from each thread; it is not good to run this code inside the generateDirectSurrogateKey method. ---
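The extracted-initialization idea can be sketched with ThreadLocal.withInitial, which moves the per-thread SimpleDateFormat setup out of the hot generateDirectSurrogateKey path entirely; each thread builds its parser lazily on first access, so the parse method needs no null check. Class and method names here are illustrative, not the CarbonData API.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class ThreadSafeParser {
    private final String dateFormat;
    // One SimpleDateFormat per thread (it is not thread-safe); the supplier
    // runs once per thread, keeping initialization out of the parse hot path.
    private final ThreadLocal<SimpleDateFormat> parser;

    public ThreadSafeParser(String dateFormat) {
        this.dateFormat = dateFormat;
        this.parser = ThreadLocal.withInitial(() -> {
            SimpleDateFormat f = new SimpleDateFormat(this.dateFormat);
            f.setLenient(false); // reject e.g. month 13, as the original does
            return f;
        });
    }

    // The hot path: just fetch the thread's parser and parse.
    public long parseMillis(String value) throws ParseException {
        return parser.get().parse(value).getTime();
    }
}
```

Compared with the null-check-and-set pattern in the diff, withInitial keeps the initialization logic in one place per the review comment.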
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85035902 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/DirectDictionaryKeyGeneratorFactory.java --- @@ -39,14 +40,26 @@ private DirectDictionaryKeyGeneratorFactory() { * @param dataType DataType * @return the generator instance */ - public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType) { + public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType, + String dateFormat) { DirectDictionaryGenerator directDictionaryGenerator = null; switch (dataType) { case TIMESTAMP: -directDictionaryGenerator = TimeStampDirectDictionaryGenerator.instance; +directDictionaryGenerator = new TimeStampDirectDictionaryGenerator(dateFormat); break; default: +} +return directDictionaryGenerator; + } + public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType) { +DirectDictionaryGenerator directDictionaryGenerator = null; +switch (dataType) { + case TIMESTAMP: +directDictionaryGenerator = new TimeStampDirectDictionaryGenerator( +CarbonCommonConstants.CARBON_TIMESTAMP_DEFAULT_FORMAT); --- End diff -- The carbon property CarbonCommonConstants.CARBON_TIMESTAMP_FORMAT needs to be used here. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036534 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/DirectDictionaryKeyGeneratorFactory.java --- @@ -39,14 +40,26 @@ private DirectDictionaryKeyGeneratorFactory() { * @param dataType DataType * @return the generator instance */ - public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType) { + public static DirectDictionaryGenerator getDirectDictionaryGenerator(DataType dataType, + String dateFormat) { --- End diff -- Please keep the Java code style. ---
[GitHub] incubator-carbondata pull request #219: [CARBONDATA-37]Support different tim...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/219#discussion_r85036431 --- Diff: core/src/main/java/org/apache/carbondata/core/keygenerator/directdictionary/timestamp/TimeStampDirectDictionaryGenerator.java --- @@ -39,37 +39,32 @@ */ public class TimeStampDirectDictionaryGenerator implements DirectDictionaryGenerator { - private TimeStampDirectDictionaryGenerator() { + private ThreadLocal threadLocal = new ThreadLocal<>(); - } - - public static TimeStampDirectDictionaryGenerator instance = - new TimeStampDirectDictionaryGenerator(); + private String dateFormat; /** * The value of 1 unit of the SECOND, MINUTE, HOUR, or DAY in millis. */ - public static final long granularityFactor; + public long granularityFactor; /** * The date timestamp to be considered as start date for calculating the timestamp * java counts the number of milliseconds from start of "January 1, 1970", this property is * customized the start of position. for example "January 1, 2000" */ - public static final long cutOffTimeStamp; + public long cutOffTimeStamp; /** * Logger instance */ + private static final LogService LOGGER = - LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); + LogServiceFactory.getLogService(TimeStampDirectDictionaryGenerator.class.getName()); --- End diff -- Please correct the code style. ---
[GitHub] incubator-carbondata pull request #127: [CARBONDATA-213] Remove dependency: ...
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/127 ---
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r84068389 --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java --- @@ -0,0 +1,171 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.carbondata.hadoop.csv; + +import java.io.File; +import java.io.FileInputStream; +import java.io.FileOutputStream; +import java.io.IOException; + +import org.apache.carbondata.hadoop.io.StringArrayWritable; + +import junit.framework.TestCase; +import org.junit.Assert; +import org.junit.Test; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.compress.BZip2Codec; +import org.apache.hadoop.io.compress.CompressionOutputStream; +import org.apache.hadoop.io.compress.GzipCodec; +import org.apache.hadoop.io.compress.Lz4Codec; +import org.apache.hadoop.io.compress.SnappyCodec; +import org.apache.hadoop.mapreduce.Job; +import org.apache.hadoop.mapreduce.Mapper; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat; + +public class CSVInputFormatTest extends TestCase { + + /** + * generate compressed files, no need to call this method. 
+ * @throws Exception + */ + public void testGenerateCompressFiles() throws Exception { +String pwd = new File("src/test/resources").getCanonicalPath(); +String inputFile = pwd + "/data.csv"; +FileInputStream input = new FileInputStream(inputFile); +Configuration conf = new Configuration(); + +// .gz +String outputFile = pwd + "/data.csv.gz"; +FileOutputStream output = new FileOutputStream(outputFile); +GzipCodec gzip = new GzipCodec(); +gzip.setConf(conf); +CompressionOutputStream outputStream = gzip.createOutputStream(output); +int i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + +// .bz2 +input = new FileInputStream(inputFile); +outputFile = pwd + "/data.csv.bz2"; +output = new FileOutputStream(outputFile); +BZip2Codec bzip2 = new BZip2Codec(); +bzip2.setConf(conf); +outputStream = bzip2.createOutputStream(output); +i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + +// .snappy +input = new FileInputStream(inputFile); +outputFile = pwd + "/data.csv.snappy"; +output = new FileOutputStream(outputFile); +SnappyCodec snappy = new SnappyCodec(); +snappy.setConf(conf); +outputStream = snappy.createOutputStream(output); +i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + +//.lz4 +input = new FileInputStream(inputFile); +outputFile = pwd + "/data.csv.lz4"; +output = new FileOutputStream(outputFile); +Lz4Codec lz4 = new Lz4Codec(); +lz4.setConf(conf); +outputStream = lz4.createOutputStream(output); +i = -1; +while ((i = input.read()) != -1) { + outputStream.write(i); +} +outputStream.close(); +input.close(); + + } + + /** + * CSVCheckMapper check the content of csv files. 
+ */ + public static class CSVCheckMapper extends Mapper<NullWritable, StringArrayWritable, NullWritable, + NullWritable> { +@Override +protected void map(NullWritable key, StringArrayWritable value, Context context) +throws IOException, InterruptedException {
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387366 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ +package org.apache.carbondata.hadoop.mapreduce; + +import java.io.IOException; +import java.io.InputStream; +import java.io.InputStreamReader; +import java.io.Reader; + +import org.apache.carbondata.hadoop.io.BoundedInputStream; +import org.apache.carbondata.hadoop.io.StringArrayWritable; +import org.apache.carbondata.hadoop.util.CSVInputFormatUtil; + +import com.univocity.parsers.csv.CsvParser; +import org.apache.hadoop.conf.Configuration; +import org.apache.hadoop.fs.FSDataInputStream; +import org.apache.hadoop.fs.FileSystem; +import org.apache.hadoop.fs.Path; +import org.apache.hadoop.fs.Seekable; +import org.apache.hadoop.io.NullWritable; +import org.apache.hadoop.io.Text; +import org.apache.hadoop.io.compress.CodecPool; +import org.apache.hadoop.io.compress.CompressionCodec; +import org.apache.hadoop.io.compress.CompressionCodecFactory; +import org.apache.hadoop.io.compress.CompressionInputStream; +import org.apache.hadoop.io.compress.Decompressor; +import org.apache.hadoop.io.compress.SplitCompressionInputStream; +import org.apache.hadoop.io.compress.SplittableCompressionCodec; +import org.apache.hadoop.mapreduce.InputSplit; +import org.apache.hadoop.mapreduce.RecordReader; +import org.apache.hadoop.mapreduce.TaskAttemptContext; +import org.apache.hadoop.mapreduce.lib.input.FileInputFormat; +import org.apache.hadoop.mapreduce.lib.input.FileSplit; +import org.apache.hadoop.util.LineReader; + +/** + * An {@link org.apache.hadoop.mapreduce.InputFormat} for csv files. Files are broken into lines. + * Values are the line of csv files. + */ +public class CSVInputFormat extends FileInputFormat<NullWritable, StringArrayWritable> { + + @Override + public RecordReader<NullWritable, StringArrayWritable> createRecordReader(InputSplit inputSplit, + TaskAttemptContext context) throws IOException, InterruptedException { +return new NewCSVRecordReader(); + } + + /** + * Treats value as line in file. Key is null. 
+ */ + public static class NewCSVRecordReader extends RecordReader<NullWritable, StringArrayWritable> { +
+private long start; +private long end; +private BoundedInputStream boundedInputStream; +private Reader reader; +private CsvParser csvParser; +private StringArrayWritable value; +private String[] columns; +private Seekable filePosition; +private boolean isCompressedInput; +private Decompressor decompressor; + +@Override +public void initialize(InputSplit inputSplit, TaskAttemptContext context) +throws IOException, InterruptedException { + FileSplit split = (FileSplit) inputSplit; + this.start = split.getStart(); --- End diff -- fixed ---
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386474 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.carbondata.hadoop.mapreduce; --- End diff -- fixed ---
[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386400 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ +package org.apache.carbondata.hadoop.io; + +import java.io.DataInput; +import java.io.DataOutput; +import java.io.IOException; +import java.nio.charset.Charset; +import java.util.Arrays; + +import org.apache.hadoop.io.Writable; + +/** + * A String sequence that is usable as a key or value. + */ +public class StringArrayWritable implements Writable { + private String[] values; + + public String[] toStrings() { +return values; + } + + public void set(String[] values) { +this.values = values; + } + + public String[] get() { +return values; + } + + @Override public void readFields(DataInput in) throws IOException { --- End diff -- fixed ---
[GitHub] incubator-carbondata pull request #127: [CARBONDATA-213] Remove dependency: ...
GitHub user QiangCai reopened a pull request: https://github.com/apache/incubator-carbondata/pull/127 [CARBONDATA-213] Remove dependency: thrift compiler [CARBONDATA-213] Remove dependency: thrift compiler **analysis** I think it is unnecessary for users/developers to download the thrift compiler when building the CarbonData project. **solution** Provide the Java code generated by the thrift compiler. You can merge this pull request into a Git repository by running: $ git pull https://github.com/QiangCai/incubator-carbondata fixthrifterror Alternatively you can review and apply these changes as the patch at: https://github.com/apache/incubator-carbondata/pull/127.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #127 commit ff895c5276569bef358ec02356400210014911de Author: QiangCai <qiang...@qq.com> Date: 2016-10-13T08:44:22Z add format java module ---
[GitHub] incubator-carbondata pull request #132: [CARBONDATA-218]Remove dependency: s...
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/132 ---