[GitHub] incubator-carbondata pull request #236: [CARBONDATA-Pending] Add dictionary ...

2016-10-13 Thread jackylk
GitHub user jackylk opened a pull request: https://github.com/apache/incubator-carbondata/pull/236 [CARBONDATA-Pending] Add dictionary interface for new data load flow In this PR, the following is added: - a BiDictionary interface which can retrieve key by value, or retrieve value
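The snippet above is cut off, so the actual interface is not visible here. As a rough illustration of what a two-way dictionary of this kind might look like, the following is a minimal sketch. The names `SimpleBiDictionary`, `InMemoryBiDictionary`, and every method signature are assumptions made for illustration, not the API added in the pull request.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names and signatures are illustrative assumptions,
// not the BiDictionary interface from the pull request.
public class BiDictionarySketch {

  /** A two-way mapping between values and surrogate integer keys. */
  interface SimpleBiDictionary<V> {
    int getOrGenerateKey(V value); // assigns a new key if the value is unseen
    Integer getKey(V value);       // null when the value is not present
    V getValue(int key);           // null when the key is not present
    int size();
  }

  /** In-memory implementation backed by a hash map and a list. */
  static final class InMemoryBiDictionary<V> implements SimpleBiDictionary<V> {
    private final Map<V, Integer> valueToKey = new HashMap<>();
    private final List<V> keyToValue = new ArrayList<>();

    @Override
    public synchronized int getOrGenerateKey(V value) {
      Integer key = valueToKey.get(value);
      if (key == null) {
        keyToValue.add(value);
        key = keyToValue.size(); // keys start at 1 in this sketch
        valueToKey.put(value, key);
      }
      return key;
    }

    @Override
    public synchronized Integer getKey(V value) {
      return valueToKey.get(value);
    }

    @Override
    public synchronized V getValue(int key) {
      // Key k is stored at list index k - 1.
      return (key >= 1 && key <= keyToValue.size()) ? keyToValue.get(key - 1) : null;
    }

    @Override
    public synchronized int size() {
      return keyToValue.size();
    }
  }

  public static void main(String[] args) {
    InMemoryBiDictionary<String> dict = new InMemoryBiDictionary<>();
    int first = dict.getOrGenerateKey("beijing");
    int second = dict.getOrGenerateKey("shenzhen");
    System.out.println(first + " " + second + " " + dict.getValue(first));
  }
}
```

Keeping the reverse lookup in a list makes key-to-value retrieval a constant-time index access, at the cost of assuming keys are dense and never deleted.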

[GitHub] incubator-carbondata pull request #237: [CARBONDATA-317] - CSV having only s...

2016-10-13 Thread mohammadshahidkhan
GitHub user mohammadshahidkhan opened a pull request: https://github.com/apache/incubator-carbondata/pull/237 [CARBONDATA-317] - CSV having only space char is throwing NullPointer… Problem: Data loading fails if csv is having only empty chars Analysis: During data load,

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Jihong Ma
the question is what would be the default implementation? Load data without dictionary? My thought is we can provide a tool to generate global dictionary using sample data set, so the initial global dictionaries are available before normal data loading. We shall be able to perform

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/212#discussion_r83336930 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala --- @@ -55,18 +55,11 @@ class CarbonSource

[GitHub] incubator-carbondata pull request #236: [CARBONDATA-299] Add dictionary inte...

2016-10-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/236#discussion_r83359504 --- Diff: core/src/main/java/org/apache/carbondata/core/devapi/BiDictionary.java --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] incubator-carbondata pull request #236: [CARBONDATA-299] Add dictionary inte...

2016-10-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/236#discussion_r83359748 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/dictionary/InMemBiDictionary.java --- @@ -0,0 +1,85 @@ +/*

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83355953 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapred/CSVInputFormat.java --- @@ -0,0 +1,193 @@ +/* + * Licensed to the

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359938 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/util/CSVInputFormatUtil.java --- @@ -0,0 +1,57 @@ +/* + * Licensed to the

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83360131 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359593 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83360081 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83355842 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359637 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83359457 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

[GitHub] incubator-carbondata pull request #236: [CARBONDATA-299] Add dictionary inte...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/236#discussion_r83360359 --- Diff: core/src/main/java/org/apache/carbondata/core/devapi/BiDictionary.java --- @@ -0,0 +1,53 @@ +/* + * Licensed to the Apache

[GitHub] incubator-carbondata pull request #236: [CARBONDATA-299] Add dictionary inte...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/236#discussion_r83360819 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/dictionary/InMemBiDictionary.java --- @@ -0,0 +1,85 @@ +/*

[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83354231 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java --- @@ -73,15 +72,15 @@ public

[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83354325 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java --- @@ -55,14 +54,14 @@ public

[GitHub] incubator-carbondata pull request #238: [WIP] Correct Some Spelling Mistakes

2016-10-13 Thread lion-x
GitHub user lion-x opened a pull request: https://github.com/apache/incubator-carbondata/pull/238 [WIP] Correct Some Spelling Mistakes # Why raise this PR? Correct some simple spelling mistakes.

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-13 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83354112 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,15 @@

[GitHub] incubator-carbondata pull request #223: [CARBONDATA-292] add infomation for ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/223#discussion_r83355181 --- Diff: docs/DML-Operations-on-Carbon.md --- @@ -91,6 +91,12 @@ Following are the options that can be used in load data: ```ruby

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Liang Chen
Hi Jihong, I am not sure that users will accept using an extra tool to do this work, because providing a tool or scanning once per table for most global dictionaries costs the same from the users' perspective, and maintaining the dict file also costs the same; they always expect that the system can automatically

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
I have the following comments: 1. If an external dictionary is provided, we accept it. This interface should be generic enough that we can perform lookup, add, delete, create and drop operations. I believe we already have this functionality to some extent. As long as we are able to maintain the

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Aniket Adnaik
After rethinking point 4 in my previous email: it will be very expensive to rebuild and re-encode the values, so it may not be a viable option. Only future loads can benefit from it. But then we will end up having some segments using the global dictionary and some using a local dictionary. Maybe we

Re: Discussion shall CI support run carbondata based on multi version spark?

2016-10-13 Thread Liang Chen
Yes, we need to solve it; the CI should support different Spark versions. Regards Liang zhujin wrote > One issue: > I modified the spark.version in pom.xml, using spark1.6.2, then compilation > failed. > > > Root cause: > There was an "unused import statement" warning in CarbonOptimizer class >

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83349721 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1422,6 +1422,7 @@

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

2016-10-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/212#discussion_r83350430 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -861,9 +861,11 @@

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-13 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83352166 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1422,6 +1422,7 @@

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

2016-10-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/212#discussion_r83350489 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/CarbonDatasourceRelation.scala --- @@ -55,18 +55,11 @@ class CarbonSource

[GitHub] incubator-carbondata pull request #212: [CARBONDATA-285] Use path parameter ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/212#discussion_r83351395 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -861,9 +861,11 @@

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83361557 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1422,6 +1422,7 @@

[GitHub] incubator-carbondata pull request #127: [CARBONDATA-213] Remove dependency: ...

2016-10-13 Thread QiangCai
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/127

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-13 Thread QiangCai
GitHub user QiangCai opened a pull request: https://github.com/apache/incubator-carbondata/pull/233 [CARBONDATA-296]1.Add CSVInputFormat to read csv files. **1 Add CSVInputFormat to read csv files** MRv1:

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-13 Thread Ravindra Pesala
Hi Jihong/Aniket, In the current implementation of CarbonData we are already handling external dictionary while loading the data. But here the question is what would be the default implementation? Load data without dictionary? Regards, Ravi On 13 October 2016 at 03:50, Aniket Adnaik

[GitHub] incubator-carbondata pull request #132: [CARBONDATA-218]Remove dependency: s...

2016-10-13 Thread QiangCai
Github user QiangCai closed the pull request at: https://github.com/apache/incubator-carbondata/pull/132

[GitHub] incubator-carbondata pull request #127: [CARBONDATA-213] Remove dependency: ...

2016-10-13 Thread QiangCai
GitHub user QiangCai reopened a pull request: https://github.com/apache/incubator-carbondata/pull/127 [CARBONDATA-213] Remove dependency: thrift complier **analysis** I think it is unnecessary for user/developer to

[jira] [Created] (CARBONDATA-315) Data loading fails if parsing a double value returns infinity

2016-10-13 Thread Manish Gupta (JIRA)
Manish Gupta created CARBONDATA-315: --- Summary: Data loading fails if parsing a double value returns infinity Key: CARBONDATA-315 URL: https://issues.apache.org/jira/browse/CARBONDATA-315 Project:

[GitHub] incubator-carbondata pull request #234: [CARBONDATA-315] Data loading fails ...

2016-10-13 Thread manishgupta88
GitHub user manishgupta88 opened a pull request: https://github.com/apache/incubator-carbondata/pull/234 [CARBONDATA-315] Data loading fails if parsing a double value returns infinity Problem: Data loading fails if parsing a double value returns infinity Analysis: During
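The analysis and fix in this pull request are cut off in the snippet, so the following is only a general illustration of the underlying Java behavior, not the change made in the PR. `Double.parseDouble` does not throw when the input overflows the double range; it returns an infinite value, which a loader has to detect explicitly. The helper name `parseFiniteOrNull` and the choice to return `null` for unusable input are assumptions for this sketch.

```java
// Hypothetical helper for illustration; not the code from the pull request.
public class FiniteDoubleParser {

  /**
   * Parses text as a double and returns null when the text is missing,
   * malformed, or parses to an infinite or NaN value.
   */
  static Double parseFiniteOrNull(String text) {
    if (text == null) {
      return null;
    }
    try {
      double value = Double.parseDouble(text.trim());
      // Overflowing input such as "1e400" parses to Infinity without an exception.
      if (Double.isInfinite(value) || Double.isNaN(value)) {
        return null;
      }
      return value;
    } catch (NumberFormatException e) {
      return null;
    }
  }

  public static void main(String[] args) {
    System.out.println(parseFiniteOrNull("1.5"));
    System.out.println(parseFiniteOrNull("1e400"));
    System.out.println(parseFiniteOrNull("abc"));
  }
}
```

A caller could treat a `null` result as a bad record rather than letting an infinite value reach later processing steps; whether that matches the project's bad-record handling is not stated in the snippet.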

[GitHub] incubator-carbondata pull request #235: [CARBONDATA-316] Change BAD_RECORDS_...

2016-10-13 Thread mohammadshahidkhan
GitHub user mohammadshahidkhan opened a pull request: https://github.com/apache/incubator-carbondata/pull/235 [CARBONDATA-316] Change BAD_RECORDS_LOGGER_ACTION to BAD_RECORDS_ACTION **Problem** The name BAD_RECORDS_LOGGER_ACTION is not related to logging the bad records, its

[GitHub] incubator-carbondata pull request #223: [CARBONDATA-292] add infomation for ...

2016-10-13 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/223#discussion_r83205716 --- Diff: docs/DML-Operations-on-Carbon.md --- @@ -104,8 +109,10 @@ Following are the options that can be used in load data:

[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-13 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83208961 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/DataLoadProcessorStep.java --- @@ -0,0 +1,40 @@ +package

[jira] [Created] (CARBONDATA-316) Change BAD_RECORDS_LOGGER_ACTION to BAD_RECORDS_ACTION

2016-10-13 Thread Mohammad Shahid Khan (JIRA)
Mohammad Shahid Khan created CARBONDATA-316: --- Summary: Change BAD_RECORDS_LOGGER_ACTION to BAD_RECORDS_ACTION Key: CARBONDATA-316 URL: https://issues.apache.org/jira/browse/CARBONDATA-316

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-13 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83208043 --- Diff: processing/src/main/java/org/apache/carbondata/processing/store/writer/AbstractFactDataWriter.java --- @@ -252,6 +252,9 @@ private

[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-13 Thread ravipesala
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83208986 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/iterators/RecordReaderIterator.java --- @@ -0,0 +1,40 @@