[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/240#discussion_r83506371 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/steps/input/InputProcessorStepImpl.java --- @@ -0,0 +1,171 @@

[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391617 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/BadRecordslogger.java --- @@

[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391382 --- Diff: integration/spark/src/main/java/org/apache/carbondata/spark/load/CarbonLoadModel.java --- @@ -117,9 +117,9 @@

[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread mohammadshahidkhan
Github user mohammadshahidkhan commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/218#discussion_r83391717 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java ---

[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-14 Thread ravipesala
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/229#discussion_r83373827 --- Diff: processing/src/main/java/org/apache/carbondata/processing/newflow/AbstractDataLoadProcessorStep.java --- @@ -73,15 +72,15 @@

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread QiangCai
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386474 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread QiangCai
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83386400 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/io/StringArrayWritable.java --- @@ -0,0 +1,69 @@ +/* + * Licensed to the

Re: Subscribe mailing list

2016-10-14 Thread Ravindra Pesala
Hi, please send a mail to dev-subscr...@carbondata.incubator.apache.org to subscribe to the mailing list. Thanks, Ravi. On 14 October 2016 at 11:45, Anurag Srivastava wrote: > Hello , > > I want add my mail in your mailing list. > > -- > *Thanks* > > > *Anurag

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-14 Thread Jay357089
Github user Jay357089 commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83376650 --- Diff: integration/spark/src/main/scala/org/apache/spark/sql/execution/command/carbonTableSchema.scala --- @@ -1422,6 +1422,7 @@

[GitHub] incubator-carbondata pull request #239: [CARBONDATA-305] Add load option for...

2016-10-14 Thread jackylk
GitHub user jackylk opened a pull request: https://github.com/apache/incubator-carbondata/pull/239 [CARBONDATA-305] Add load option for new loading flow Added new option for SQL and dataframe.write - For SQL, option is: `USE_KETTLE`, default value is true - For dataframe,

[GitHub] incubator-carbondata pull request #229: [CARBONDATA-297]Added interface for ...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/229 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread QiangCai
Github user QiangCai commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83387366 --- Diff: hadoop/src/main/java/org/apache/carbondata/hadoop/mapreduce/CSVInputFormat.java --- @@ -0,0 +1,180 @@ +/* + * Licensed to

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jacky Li
Hi, I can offer one more approach for this discussion. Since new dictionary values are rare in the case of incremental load (provided the first load covers as many dictionary values as possible), synchronization should be rare. So how about using ZooKeeper + an HDFS file to provide this service? This is

[jira] [Created] (CARBONDATA-318) Implement an ExternalSorter that makes maximum usage of memory while sorting

2016-10-14 Thread Jacky Li (JIRA)
Jacky Li created CARBONDATA-318: --- Summary: Implement an ExternalSorter that makes maximum usage of memory while sorting Key: CARBONDATA-318 URL: https://issues.apache.org/jira/browse/CARBONDATA-318

[GitHub] incubator-carbondata pull request #213: [CARBONDATA-286] Support Append mode...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/213

[GitHub] incubator-carbondata pull request #234: [CARBONDATA-315] Data loading fails ...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/234#discussion_r83424392 --- Diff: processing/src/main/java/org/apache/carbondata/processing/surrogatekeysgenerator/csvbased/CarbonCSVBasedSeqGenStep.java --- @@

[GitHub] incubator-carbondata pull request #204: [CARBONDATA-280]Fix the bug that whe...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/204

[GitHub] incubator-carbondata pull request #218: [CARBONDATA-288] In hdfs bad record ...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/218

[GitHub] incubator-carbondata pull request #234: [CARBONDATA-315] Data loading fails ...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/234

[GitHub] incubator-carbondata pull request #236: [CARBONDATA-299] Add dictionary inte...

2016-10-14 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/incubator-carbondata/pull/236

Subscribe to carbondata

2016-10-14 Thread Abhishek Giri
Hi, please add me to the CarbonData mailing list. Regards, Abhishek Giri 9717415895

Subscribe mailing list

2016-10-14 Thread Anurag Srivastava
Hello, I want to add my email to your mailing list. -- Thanks, Anurag Srivastava, Software Consultant, Knoldus Software LLP, India - US - Canada

[GitHub] incubator-carbondata pull request #240: [CARBONDATA-298]Added InputProcessor...

2016-10-14 Thread ravipesala
GitHub user ravipesala opened a pull request: https://github.com/apache/incubator-carbondata/pull/240 [CARBONDATA-298]Added InputProcessorStep to read data from csv reader iterator. Add InputProcessorStep which should iterate recordreader of csv input and parse the data as per the

[GitHub] incubator-carbondata pull request #230: [CARBONDATA-306]Add block size info ...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/230#discussion_r83421288 --- Diff: processing/src/main/java/org/apache/carbondata/processing/mdkeygen/MDKeyGenStep.java --- @@ -314,7 +314,7 @@ private boolean

[GitHub] incubator-carbondata pull request #233: [CARBONDATA-296]1.Add CSVInputFormat...

2016-10-14 Thread jackylk
Github user jackylk commented on a diff in the pull request: https://github.com/apache/incubator-carbondata/pull/233#discussion_r83422823 --- Diff: hadoop/src/test/java/org/apache/carbondata/hadoop/csv/CSVInputFormatTest.java --- @@ -0,0 +1,171 @@ +/* + * Licensed to the

Re: [Discussion] Code generation in carbon result preparation

2016-10-14 Thread Vimal Das Kammath
Hi Vishal, I think we need both Solution 1 and Solution 2. Solution 1 may need redesigning several parts of Carbon's query process, from scanner and aggregator to result preparation. This can help avoid the frequent cache invalidation. In Solution 2, code generation will not solve the frequent cache

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
Hi, 1. Using the external tool to generate the dictionary: I think it cannot be the default solution; it is just one option for users who are willing to generate the dictionary separately and provide it to Carbon while loading the data to boost performance. 2. Using the 2-pass solution (current solution):

Re: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Ravindra Pesala
Hi Jihong, I agree, we can use an external tool for the first load, but for incremental load we should have a solution to add global dictionary. So this solution should be enough to generate the global dictionary even if the user does not use an external tool the first time. That solution could be distributed map or

RE: Discussion(New feature) regarding single pass data loading solution.

2016-10-14 Thread Jihong Ma
Hi Ravi, the major concern I have about generating the global dictionary from scratch with a single scan is performance; handling an occasional update to the dictionary is far simpler and more cost-effective in terms of synchronization cost and refreshing the global/local cache copy. There are a

Re: Discussion shall CI support run carbondata based on multi version spark?

2016-10-14 Thread Vimal Das Kammath
Yes, I agree. CI should be configured to build Carbon on different Spark versions. On Fri, Oct 14, 2016 at 7:56 AM, Liang Chen wrote: > > Yes, need to solve it , the CI should support different spark version. > > Regards > Liang > > > zhujin wrote > > One issue: > > I