to be included.
Jihong
-----Original Message-----
From: Xiaoqiao He [mailto:xq.he2...@gmail.com]
Sent: Friday, November 25, 2016 9:52 AM
To: dev@carbondata.incubator.apache.org
Subject: Re: [Improvement] Use Trie in place of HashMap to reduce memory
footprint of Dictionary
Hi Liang, Kumar
Hi Kumar Vishal,
I'll create a task to track this issue.
Thanks for your suggestions.
Regards,
He Xiaoqiao
On Sun, Nov 27, 2016 at 1:41 AM, Kumar Vishal
wrote:
> Hi Xiaoqiao He,
>
> You can go ahead with DAT implementation, based on the result.
> I look forward to your PR.
>
> Please let m
Hi Xiaoqiao He,
You can go ahead with DAT implementation, based on the result.
I look forward to your PR.
Please let me know if you need any support :)
Regards,
Kumar Vishal
On Fri, Nov 25, 2016 at 11:22 PM, Xiaoqiao He wrote:
> Hi Liang, Kumar Vishal,
>
> I have done a standard benchmark
Hi Liang, Kumar Vishal,
I have done a standard benchmark of multiple data structures for the
Dictionary, following your suggestions. Based on the test results, I think
DAT may be the best choice for CarbonData.
*1. Here are 2 test results:*
---
Hi xiaoqiao
OK, I look forward to seeing your test results.
Can you take on the task for this improvement? Please let me know if you
need any support :)
Regards
Liang
hexiaoqiao wrote
> Hi Kumar Vishal,
>
> Thanks for your suggestions. As you said, by choosing a Trie to replace
> HashMap we can get better me
Hi Kumar Vishal,
Thanks for your suggestions. As you said, by choosing a Trie to replace
HashMap we can get a better memory footprint and also good performance. Of
course, DAT is not the only choice, and I will test DAT vs Radix Trie and
release the test results as soon as possible. Thanks for your suggestion
Hi Liang,
Generally, yes, because items in the dictionary that share a prefix do not
need to repeat it in a DAT, and the more data there is, the better the
result.
The real cost of a DAT is building the tree, and I don't think we need to
worry about it, since this cost is paid only once, when the data is loaded.
FYI.
Regards,
Xiaoqiao
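
To make the prefix-sharing point concrete, here is a minimal Java sketch of a plain (not double-array) trie that maps byte arrays to int surrogate keys. The class name `ByteTrieDictionary`, its methods, and the choice to start keys at 1 are hypothetical and are not taken from the CarbonData code base. A real DAT would pack the per-node child maps into two int arrays, which this sketch does not attempt.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration only; not CarbonData's dictionary class. */
class ByteTrieDictionary {
    private static final int NO_KEY = -1;

    /** One trie node per distinct byte prefix. */
    private static final class Node {
        final Map<Byte, Node> children = new HashMap<>();
        int surrogateKey = NO_KEY;
    }

    private final Node root = new Node();
    private int nextKey = 1;   // assumption: surrogate keys start at 1
    private int nodeCount = 1; // counts the root

    /** Returns the existing surrogate key for value, or assigns the next one. */
    int getOrAssign(byte[] value) {
        Node node = root;
        for (byte b : value) {
            Node child = node.children.get(b);
            if (child == null) {
                // Only bytes beyond the shared prefix create new nodes.
                child = new Node();
                node.children.put(b, child);
                nodeCount++;
            }
            node = child;
        }
        if (node.surrogateKey == NO_KEY) {
            node.surrogateKey = nextKey++;
        }
        return node.surrogateKey;
    }

    /** Returns the surrogate key for value, or -1 when it is not present. */
    int lookup(byte[] value) {
        Node node = root;
        for (byte b : value) {
            node = node.children.get(b);
            if (node == null) {
                return NO_KEY;
            }
        }
        return node.surrogateKey;
    }

    /** Number of nodes allocated so far, including the root. */
    int nodeCount() {
        return nodeCount;
    }
}
```

In this sketch the tree is built once by calling `getOrAssign` for each value during load, and later reads go through `lookup`. Values that share a prefix reuse the same nodes, so the node count grows more slowly than the total number of bytes inserted.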
Hi Xiaoqiao He,
+1,
For the forward dictionary case this will be a very good optimisation. Our
case is very specific: storing a byte array to int mapping [data to
surrogate key mapping], so I think we will get a much better memory
footprint, and performance will also be good (2x). We can also try radix tree(radix
Hi xiaoqiao
For the example below, with 600K dictionary data:
Does it mean that using "DAT" saves 36 MB of memory compared with
"ConcurrentHashMap", while losing only a little performance (1718 ms)?
One more question: if the dictionary data size increases, how do the
results compare for "ConcurrentHashMap" VS "
Hi Liang,
Thanks for your reply. I need to correct the experiment results, because
the first column of the result table was in the wrong order.
To compare performance between Trie and HashMap, two different
structures are constructed from the same dictionary data, whose size is
600K and each item's
Hi xiaoqiao
This improvement looks great!
Can you please explain the below data, what does it mean?
ConcurrentHashMap    ~68MB     14543
Double Array Trie    ~104MB    12825
Regards
Liang
2016-11-24 2:04 GMT+08:00 Xiaoqiao He :
> Hi All,
>
> I would like to propose Dictionary improvement which u