Ravindra Pesala created CARBONDATA-456:
--
Summary: Select count(*) from table is slower.
Key: CARBONDATA-456
URL: https://issues.apache.org/jira/browse/CARBONDATA-456
Project: CarbonData
One Pass Load Design Document: please review it and give your suggestions.
https://docs.google.com/document/d/1m6rY7vJMu604FagIJmrOhhy_RiUoK53-LPO6qE8jeNU/edit?usp=sharing
--
Hi All,
+1
I agree with Jacky; it is important for the CarbonData community to work on
Spark 2.x, as Spark 2.x has major design and interface changes. It is also a
challenge to support both Spark 2.x and Spark 1.x. We can start creating
sub-tasks under issue CARBONDATA-322.
Regards,
Ramana
On Sun, Nov
This proposal looks good; it should improve performance and reduce GC issues during
data loading. Please create an issue in JIRA. We can create unsafe functions in the
common module (just like Spark) so that they can be used across
modules/components, and we can also check whether any of Spark's unsafe functions can be reused.
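For illustration, a shared off-heap helper of the kind suggested above might look like the following minimal Java sketch. It assumes direct use of sun.misc.Unsafe obtained through reflection, which is the JDK-internal class that Spark's org.apache.spark.unsafe.Platform also wraps. The class name UnsafeMemory and its methods are hypothetical and are not an existing CarbonData or Spark API.

```java
import java.lang.reflect.Field;
import sun.misc.Unsafe;

// Hypothetical shared helper (UnsafeMemory is not a real CarbonData/Spark class).
// Memory allocated here lives outside the Java heap, so the GC never scans or
// moves it; callers must free every allocation explicitly.
final class UnsafeMemory {
    private static final Unsafe UNSAFE;

    static {
        try {
            // sun.misc.Unsafe exposes its singleton through the private "theUnsafe" field.
            Field f = Unsafe.class.getDeclaredField("theUnsafe");
            f.setAccessible(true);
            UNSAFE = (Unsafe) f.get(null);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    static long allocate(long bytes) { return UNSAFE.allocateMemory(bytes); }

    static void free(long address) { UNSAFE.freeMemory(address); }

    static void putLong(long address, long value) { UNSAFE.putLong(address, value); }

    static long getLong(long address) { return UNSAFE.getLong(address); }

    public static void main(String[] args) {
        long addr = allocate(2 * Long.BYTES);
        try {
            putLong(addr, 42L);
            putLong(addr + Long.BYTES, 7L);
            System.out.println(getLong(addr) + getLong(addr + Long.BYTES)); // 49
        } finally {
            free(addr); // off-heap memory is never reclaimed automatically
        }
    }
}
```

The GC savings come from keeping large sort buffers off the heap. The trade-off is manual lifetime management: every allocate must be paired with a free.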
On Sun, Nov 2
Hi Raghu,
In Hive's or Spark's terminology, partitioning and bucketing are different.
Partitioning divides a large amount of data into a number of folders
based on the values of table columns. Here the number of partitions created
depends on the cardinality of the partition column. So it is ve
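A minimal Java sketch of value-based partitioning follows. It assumes rows are plain String arrays and that folders are named column=value in the Hive style. The class PartitionSketch is hypothetical. The sketch shows why the number of partitions equals the cardinality of the partition column.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: each distinct value of the partition column becomes
// its own folder named "column=value"; rows are assumed to be String arrays.
final class PartitionSketch {
    static Map<String, List<String[]>> partitionBy(List<String[]> rows, int colIndex, String colName) {
        // TreeMap keeps folder names sorted so the output is deterministic.
        Map<String, List<String[]>> folders = new TreeMap<>();
        for (String[] row : rows) {
            String folder = colName + "=" + row[colIndex];
            folders.computeIfAbsent(folder, k -> new ArrayList<>()).add(row);
        }
        return folders;
    }

    public static void main(String[] args) {
        List<String[]> rows = Arrays.asList(
                new String[] {"1", "IN"},
                new String[] {"2", "CN"},
                new String[] {"3", "IN"},
                new String[] {"4", "US"});
        Map<String, List<String[]>> folders = partitionBy(rows, 1, "country");
        // Three distinct values in the partition column, so three folders.
        System.out.println(folders.keySet()); // [country=CN, country=IN, country=US]
    }
}
```

A high-cardinality partition column would therefore produce a very large number of folders.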
Hi All,
In the current CarbonData system, loading performance is not very encouraging,
since we need to sort the data at the executor level during data loading.
CarbonData collects a batch of data and sorts it before dumping it to temporary
files, and finally it does a merge sort over those temporary files to finis
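The two-phase sort described above can be sketched in Java: sort each batch, write it out as a sorted run, then merge-sort the runs. This is a minimal in-memory illustration. It assumes integer rows and keeps each sorted run in a List instead of a temporary file. SortMergeSketch and the batch size are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of sort-then-merge loading; each "run" stands in for
// one sorted temporary file.
final class SortMergeSketch {
    // Phase 1: sort fixed-size batches into sorted runs.
    static List<List<Integer>> sortIntoRuns(List<Integer> input, int batchSize) {
        List<List<Integer>> runs = new ArrayList<>();
        for (int start = 0; start < input.size(); start += batchSize) {
            int end = Math.min(start + batchSize, input.size());
            List<Integer> run = new ArrayList<>(input.subList(start, end));
            Collections.sort(run); // sort one batch before "dumping" it
            runs.add(run);
        }
        return runs;
    }

    // Phase 2: k-way merge of the sorted runs using a min-heap.
    static List<Integer> mergeRuns(List<List<Integer>> runs) {
        // Heap entries are {value, runIndex, positionInRun}.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Integer.compare(a[0], b[0]));
        for (int r = 0; r < runs.size(); r++) {
            if (!runs.get(r).isEmpty()) {
                heap.add(new int[] {runs.get(r).get(0), r, 0});
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] top = heap.poll();
            out.add(top[0]);
            List<Integer> run = runs.get(top[1]);
            int next = top[2] + 1;
            if (next < run.size()) {
                heap.add(new int[] {run.get(next), top[1], next});
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Integer> rows = Arrays.asList(9, 4, 7, 1, 8, 2, 6, 3, 5);
        List<List<Integer>> runs = sortIntoRuns(rows, 4);
        System.out.println(runs);            // [[1, 4, 7, 9], [2, 3, 6, 8], [5]]
        System.out.println(mergeRuns(runs)); // [1, 2, 3, 4, 5, 6, 7, 8, 9]
    }
}
```

With k runs, each output row costs one heap operation of O(log k). Even so, every row is written once as part of a run and read back once during the merge, which is the extra work the loading-performance concern refers to.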
How is this different from partitioning?
On Sun, 27 Nov 2016 at 11:21 PM, Ravindra Pesala
wrote:
> Hi All,
>
> The bucketing concept is based on hash partitioning the bucketed column into the
> configured number of buckets. Records with the same bucketed column value always go to
> the same bucket. Physical
Hi All,
The bucketing concept is based on hash partitioning the bucketed column into the
configured number of buckets. Records with the same bucketed column value always go to
the same bucket. Physically, each bucket is one or more files in the table
directory.
Advantages
A bucketed table is a useful feature to do the map
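Below is a minimal Java sketch of hash bucketing. It assumes String bucket-column values and uses String.hashCode() as the hash function. Hive and Spark each use their own hash functions, so real bucket ids will differ. BucketSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch: bucketId = hash(bucketed column value) mod numBuckets.
// String.hashCode() is an illustrative stand-in for the engine's real hash.
final class BucketSketch {
    static int bucketId(String bucketColumnValue, int numBuckets) {
        // floorMod keeps the result in [0, numBuckets) even for negative hash codes.
        return Math.floorMod(bucketColumnValue.hashCode(), numBuckets);
    }

    public static void main(String[] args) {
        int numBuckets = 4;
        List<String> keys = Arrays.asList("alice", "bob", "carol", "alice");
        for (String key : keys) {
            // Each bucket id would correspond to one or more files in the table directory.
            System.out.println(key + " -> bucket " + bucketId(key, numBuckets));
        }
    }
}
```

Unlike partitioning, the number of buckets is fixed by configuration rather than by the column's cardinality. Because equal values always get the same bucket id, two tables bucketed on a join key with the same bucket count can be joined bucket by bucket without a shuffle.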
+1
-vimal
On Nov 23, 2016 9:39 PM, "Venkata Gollamudi" wrote:
> Hi All,
>
> CarbonData 0.2.0 has been a good and stable release, with a lot of
> defects fixed and a number of performance improvements.
> https://issues.apache.org/jira/browse/CARBONDATA-320?jql=project%20%3D%20CARBONDATA%20
He Xiaoqiao created CARBONDATA-455:
--
Summary: Benchmark for HashMap and DAT
Key: CARBONDATA-455
URL: https://issues.apache.org/jira/browse/CARBONDATA-455
Project: CarbonData
Issue Type: Sub-task
He Xiaoqiao created CARBONDATA-454:
--
Summary: Add new unit test for DAT
Key: CARBONDATA-454
URL: https://issues.apache.org/jira/browse/CARBONDATA-454
Project: CarbonData
Issue Type: Sub-task
He Xiaoqiao created CARBONDATA-453:
--
Summary: Implement DAT(Double Array Trie) for Dictionary
Key: CARBONDATA-453
URL: https://issues.apache.org/jira/browse/CARBONDATA-453
Project: CarbonData
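A double-array trie stores trie transitions in two integer arrays, base and check. From state s on input code c, the next state is t = base[s] + c, and the transition is valid only if check[t] records s as the parent. The following is a minimal static Java sketch under stated assumptions: keys are distinct and sorted, and a key's dictionary id is its index in that sorted array. It is not the implementation proposed in CARBONDATA-453, and DoubleArrayTrie is a hypothetical class name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical static double-array trie mapping each key to its index in the
// sorted key array. Code 0 marks end-of-key, so character ch uses code ch + 1.
final class DoubleArrayTrie {
    private int[] base = new int[1024];
    private int[] check = new int[1024]; // check[t] = parent + 1; 0 means free
    private final String[] keys;

    DoubleArrayTrie(String[] sortedDistinctKeys) {
        this.keys = sortedDistinctKeys;
        build(0, 0, keys.length, 0); // state 0 is the root
    }

    private void ensure(int index) {
        if (index >= base.length) {
            int n = Math.max(index + 1, base.length * 2);
            base = Arrays.copyOf(base, n);
            check = Arrays.copyOf(check, n);
        }
    }

    private static int code(String key, int depth) {
        return depth < key.length() ? key.charAt(depth) + 1 : 0;
    }

    // Places the children of state s, which cover keys[left, right) at this depth.
    private void build(int s, int left, int right, int depth) {
        List<Integer> codes = new ArrayList<>();
        List<Integer> starts = new ArrayList<>();
        for (int i = left; i < right; i++) {
            int c = code(keys[i], depth);
            if (codes.isEmpty() || codes.get(codes.size() - 1) != c) {
                codes.add(c);
                starts.add(i);
            }
        }
        starts.add(right);
        // Naive first-fit search for a base where every child slot is free.
        int b = 1;
        outer:
        while (true) {
            for (int c : codes) {
                ensure(b + c);
                if (check[b + c] != 0) { b++; continue outer; }
            }
            break;
        }
        base[s] = b;
        for (int c : codes) check[b + c] = s + 1; // reserve all sibling slots first
        for (int j = 0; j < codes.size(); j++) {
            int t = b + codes.get(j);
            if (codes.get(j) == 0) {
                base[t] = -starts.get(j) - 1; // leaf stores -(keyIndex) - 1
            } else {
                build(t, starts.get(j), starts.get(j + 1), depth + 1);
            }
        }
    }

    /** Returns the key's dictionary id, or -1 if the key is absent. */
    int lookup(String key) {
        int s = 0;
        for (int depth = 0; depth <= key.length(); depth++) {
            int t = base[s] + code(key, depth);
            if (t >= check.length || check[t] != s + 1) return -1;
            if (depth == key.length()) return -base[t] - 1;
            s = t;
        }
        return -1;
    }
}
```

The first-fit search makes construction slow for large dictionaries. Practical implementations track free slots instead. Lookup, however, reads only two int array entries per character and allocates no per-entry objects. That is the kind of trade-off a HashMap vs DAT benchmark would measure.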
He Xiaoqiao created CARBONDATA-452:
--
Summary: Optimize structure of Dictionary use Trie in place of HashMap
Key: CARBONDATA-452
URL: https://issues.apache.org/jira/browse/CARBONDATA-452
Project: CarbonData
Hi Kumar Vishal,
I'll create a task to track this issue.
Thanks for your suggestions.
Regards,
He Xiaoqiao
On Sun, Nov 27, 2016 at 1:41 AM, Kumar Vishal
wrote:
> Hi Xiaoqiao He,
>
> You can go ahead with the DAT implementation, based on the result.
> I look forward to your PR.
>
> Please let m