[jira] [Created] (CARBONDATA-456) Select count(*) from table is slower.

2016-11-27 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created CARBONDATA-456: -- Summary: Select count(*) from table is slower. Key: CARBONDATA-456 URL: https://issues.apache.org/jira/browse/CARBONDATA-456 Project: CarbonData

One Pass Load Design Document

2016-11-27 Thread Lion.X
One Pass Load Design Document; please review and give your suggestions. https://docs.google.com/document/d/1m6rY7vJMu604FagIJmrOhhy_RiUoK53-LPO6qE8jeNU/edit?usp=sharing -- View this message in context: http://apache-carbondata-mailing-list-archive.1130556.n5.nabble.com/One-Pass-Load-Design-Docum

Re: [Feature Proposal] Spark 2 integration with CarbonData

2016-11-27 Thread Venkata Gollamudi
Hi All, +1 I agree with Jacky: it is important for the CarbonData community to work on Spark2.x, as Spark2.x has major design and interface changes. It is also a challenge to support both Spark2.x and Spark1.x. We can start creating sub-tasks under the issue (CARBONDATA-322). Regards, Ramana On Sun, Nov

Re: [improvement] Support unsafe in-memory sort in carbondata

2016-11-27 Thread Venkata Gollamudi
This proposal looks good; it should improve performance and reduce GC issues during data load. Please create an issue in Jira. We can create unsafe functions in the common module (just like Spark) so that they can be used across modules/components, and we can also check whether any can be reused from Spark's unsafe code. On Sun, Nov 2

Re: [New Feature] Adding bucketed table feature to Carbondata

2016-11-27 Thread Ravindra Pesala
Hi Raghu, In Hive's or Spark's terminology, partitioning and bucketing are different. Partitioning divides a large amount of data into a number of folders based on the values of table columns. Here the number of partitions created depends on the cardinality of the partitioned column. So it is ve
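
The point about partition count following column cardinality can be illustrated with a small sketch. This is not CarbonData, Hive, or Spark code; `partition_by_value` and the sample rows are hypothetical, and the `column=value` folder naming is used only for illustration.

```python
from collections import defaultdict


def partition_by_value(rows, column):
    """Group rows into one 'folder' per distinct value of the partition column."""
    partitions = defaultdict(list)
    for row in rows:
        # Hypothetical folder name in the column=value style.
        partitions[f"{column}={row[column]}"].append(row)
    return dict(partitions)


sample_rows = [
    {"country": "IN", "id": 1},
    {"country": "US", "id": 2},
    {"country": "IN", "id": 3},
]
parts = partition_by_value(sample_rows, "country")
# The number of folders equals the number of distinct country values.
print(len(parts), sorted(parts))
```

A high-cardinality column would therefore produce a correspondingly large number of folders.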

[improvement] Support unsafe in-memory sort in carbondata

2016-11-27 Thread Ravindra Pesala
Hi All, In the current CarbonData system, loading performance is not so encouraging, since we need to sort the data at the executor level for data loading. CarbonData collects a batch of data and sorts it before dumping it to temporary files, and finally it does a merge sort from those temporary files to finis
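
The load flow described in this message (sort a batch, dump it to a temporary file, then merge-sort the temporary files) is a form of external merge sort. Below is a minimal sketch of that general technique, assuming integer records and line-oriented temp files. The function names and file suffix are hypothetical; this is not CarbonData's implementation and does not show the proposed unsafe (off-heap) variant.

```python
import heapq
import os
import tempfile


def spill_sorted_batches(records, batch_size):
    """Sort records batch by batch and write each sorted batch to a temp file."""
    paths = []
    batch = []

    def flush():
        batch.sort()
        fd, path = tempfile.mkstemp(suffix=".sorttemp")  # hypothetical suffix
        with os.fdopen(fd, "w") as out:
            for value in batch:
                out.write(f"{value}\n")
        paths.append(path)
        batch.clear()

    for record in records:
        batch.append(record)
        if len(batch) >= batch_size:
            flush()
    if batch:
        flush()
    return paths


def read_ints(handle):
    """Yield one integer per line from an open text file."""
    for line in handle:
        yield int(line)


def merge_sorted_files(paths):
    """Lazily merge the sorted temp files, then close and delete them."""
    handles = [open(path) for path in paths]
    try:
        yield from heapq.merge(*(read_ints(h) for h in handles))
    finally:
        for handle in handles:
            handle.close()
        for path in paths:
            os.remove(path)


def external_sort(records, batch_size):
    """Return all records in sorted order using batch spills plus a final merge."""
    return list(merge_sorted_files(spill_sorted_batches(records, batch_size)))
```

For example, `external_sort([5, 3, 9, 1, 7, 2, 8], batch_size=3)` spills three sorted files and merges them into one sorted list. Only one batch is held in memory during the spill phase, which is why the batch sort is the part an in-memory optimization would target.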

Re: [New Feature] Adding bucketed table feature to Carbondata

2016-11-27 Thread Raghunandan S
How is this different from partitioning? On Sun, 27 Nov 2016 at 11:21 PM, Ravindra Pesala wrote: > Hi All, > > The bucketing concept is based on hash partitioning of the bucketed column as per > the configured number of buckets. Records with the same bucketed column value always go to > the same bucket. Physical

[New Feature] Adding bucketed table feature to Carbondata

2016-11-27 Thread Ravindra Pesala
Hi All, The bucketing concept is based on hash partitioning of the bucketed column as per the configured number of buckets. Records with the same bucketed column value always go to the same bucket. Physically, each bucket is a file or set of files in the table directory. Advantages Bucketed table is a useful feature to do the map
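
The hash-based assignment described here can be sketched as follows. This is an illustration under assumptions, not the proposed CarbonData design: `bucket_id` and `assign_buckets` are hypothetical names, and CRC32 is chosen only because it is a stable hash in the Python standard library; the actual hash function is not stated in the message.

```python
import zlib


def bucket_id(value, num_buckets):
    """Map a bucketed-column value to a bucket number with a stable hash."""
    # CRC32 is an assumption made for this sketch.
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def assign_buckets(rows, column, num_buckets):
    """Place every row in the bucket chosen by hashing its bucketed column."""
    buckets = {i: [] for i in range(num_buckets)}
    for row in rows:
        buckets[bucket_id(row[column], num_buckets)].append(row)
    return buckets


sample_rows = [
    {"user": "alice", "amount": 10},
    {"user": "bob", "amount": 20},
    {"user": "alice", "amount": 30},
]
buckets = assign_buckets(sample_rows, "user", num_buckets=4)
```

The bucket count is fixed by configuration regardless of how many distinct values the column has, and both `alice` rows land in the same bucket.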

Re: CarbonData propose major version number increment for next version (to 1.0.0)

2016-11-27 Thread Vimal Das Kammath
+1 -vimal On Nov 23, 2016 9:39 PM, "Venkata Gollamudi" wrote: > Hi All, > > CarbonData 0.2.0 has been good work and a stable release, with a lot of > defects fixed and a number of performance improvements. > https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D% > 20CARBONDATA%20

[jira] [Created] (CARBONDATA-455) Benchmark for HashMap and DAT

2016-11-27 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created CARBONDATA-455: -- Summary: Benchmark for HashMap and DAT Key: CARBONDATA-455 URL: https://issues.apache.org/jira/browse/CARBONDATA-455 Project: CarbonData Issue Type: Sub-

[jira] [Created] (CARBONDATA-454) Add new unit test for DAT

2016-11-27 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created CARBONDATA-454: -- Summary: Add new unit test for DAT Key: CARBONDATA-454 URL: https://issues.apache.org/jira/browse/CARBONDATA-454 Project: CarbonData Issue Type: Sub-task

[jira] [Created] (CARBONDATA-453) Implement DAT(Double Array Trie) for Dictionary

2016-11-27 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created CARBONDATA-453: -- Summary: Implement DAT(Double Array Trie) for Dictionary Key: CARBONDATA-453 URL: https://issues.apache.org/jira/browse/CARBONDATA-453 Project: CarbonData

[jira] [Created] (CARBONDATA-452) Optimize structure of Dictionary use Trie in place of HashMap

2016-11-27 Thread He Xiaoqiao (JIRA)
He Xiaoqiao created CARBONDATA-452: -- Summary: Optimize structure of Dictionary use Trie in place of HashMap Key: CARBONDATA-452 URL: https://issues.apache.org/jira/browse/CARBONDATA-452 Project: Carb

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-27 Thread Xiaoqiao He
Hi Kumar Vishal, I'll create a task to track this issue. Thanks for your suggestions. Regards, He Xiaoqiao On Sun, Nov 27, 2016 at 1:41 AM, Kumar Vishal wrote: > Hi Xiaoqiao He, > > You can go ahead with the DAT implementation, based on the result. > I look forward to your PR. > > Please let m