Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-28 Thread Xiaoqiao He
Hi Jihong, Thanks for your attention and reply. 1. Actually, I have run benchmarks with English/Chinese dictionaries of size {100K, 200K, 300K, 400K, 500K, 600K} separately, and the test results are basically the same as those mentioned earlier in this thread. I will submit and open the benchmark code and dictionary

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-28 Thread Jihong Ma
Thank you Xiaoqiao for looking into this issue and sharing your results! Have you tried varying the dictionary size when comparing all the alternatives? Also, please pay close attention to the license of the DAT (Double-Array Trie) implementations, as they are under LGPL; generally speaking, it is not legally allowed

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-27 Thread Xiaoqiao He
Hi Kumar Vishal, I'll create a task to track this issue. Thanks for your suggestions. Regards, He Xiaoqiao On Sun, Nov 27, 2016 at 1:41 AM, Kumar Vishal wrote: > Hi Xiaoqiao He, > > You can go ahead with the DAT implementation, based on the result. > I will look forward

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-26 Thread Kumar Vishal
Hi Xiaoqiao He, You can go ahead with the DAT implementation, based on the result. I will look forward to your PR. Please let me know if you need any support :). -Regards Kumar Vishal On Fri, Nov 25, 2016 at 11:22 PM, Xiaoqiao He wrote: > Hi Liang, Kumar Vishal, > > I have

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-25 Thread Xiaoqiao He
Hi Liang, Kumar Vishal, I have run a standard benchmark of multiple data structures for the Dictionary, following your suggestions. Based on the test results, I think DAT may be the best choice for CarbonData. *1. Here are 2 test results:*
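For readers unfamiliar with DAT, a minimal sketch of a static double-array trie is shown below, mapping byte strings to int surrogate keys. It is an illustration under stated assumptions, not the implementation benchmarked in this thread: the class name `DoubleArrayTrie`, the choice of code 0 as the end-of-key marker, and the naive linear search for a free `base` are all hypothetical simplifications. The point is the layout: the whole trie lives in two int arrays (`base`, `check`), and a transition from slot `s` on code `c` is just `t = base[s] + c`, valid only if `check[t] == s`.

```python
from collections import deque
from typing import Dict, List, Optional


class DoubleArrayTrie:
    """Static double-array trie mapping byte strings to int values (sketch)."""

    END = 0  # end-of-key code; byte b is encoded as b + 1 so it never collides with END

    def __init__(self, items: Dict[bytes, int]):
        # Step 1: build an ordinary pointer trie of nested dicts {code: child}.
        root: dict = {}
        for key, value in items.items():
            node = root
            for b in key:
                node = node.setdefault(b + 1, {})
            node[self.END] = value  # the END edge carries the value
        # Step 2: lay the pointer trie out in two parallel arrays.
        # check[t] == s means "slot t is a child of slot s"; -1 marks a free slot.
        self.base: List[int] = [0]
        self.check: List[int] = [0]  # slot 0 is the root and is never free
        queue = deque([(root, 0)])
        while queue:
            children, s = queue.popleft()
            codes = sorted(children)
            # Smallest base >= 1 whose target slots are all free.
            # (Real implementations track free slots instead of scanning linearly.)
            b = 1
            while not all(self._is_free(b + c) for c in codes):
                b += 1
            self.base[s] = b
            for c in codes:  # reserve every child slot before descending
                self._grow(b + c)
                self.check[b + c] = s
            for c in codes:
                if c == self.END:
                    self.base[b + c] = children[c]  # terminal slot keeps the value in base
                else:
                    queue.append((children[c], b + c))

    def _is_free(self, i: int) -> bool:
        return i >= len(self.check) or self.check[i] == -1

    def _grow(self, i: int) -> None:
        while len(self.check) <= i:
            self.base.append(0)
            self.check.append(-1)

    def get(self, key: bytes) -> Optional[int]:
        s = 0
        for b in key:
            t = self.base[s] + b + 1
            if t >= len(self.check) or self.check[t] != s:
                return None
            s = t
        t = self.base[s] + self.END
        if t < len(self.check) and self.check[t] == s:
            return self.base[t]
        return None
```

For example, `DoubleArrayTrie({b"china": 1, b"chinese": 2}).get(b"chinese")` returns 2, while the prefix `b"chin"` returns `None` because it was never inserted as a whole key.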

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-24 Thread Xiaoqiao He
Hi Kumar Vishal, Thanks for your suggestions. As you said, by choosing a Trie instead of a HashMap we can get a better memory footprint and still good performance. Of course, DAT is not the only choice; I will test DAT vs. Radix Trie and release the test results as soon as possible. Thanks for your
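As a point of comparison for the DAT vs. Radix Trie test, here is a minimal sketch of a radix (path-compressed) trie, in which each edge carries a byte string rather than a single byte, and an edge is split only when a new key diverges partway along it. The class names `RadixNode` and `RadixTrie` are hypothetical and this is not the code of any particular library.

```python
from typing import Dict, Optional, Tuple


class RadixNode:
    __slots__ = ("edges", "value")

    def __init__(self) -> None:
        # first byte of an edge label -> (full edge label, child node)
        self.edges: Dict[int, Tuple[bytes, "RadixNode"]] = {}
        self.value: Optional[int] = None


class RadixTrie:
    """Path-compressed trie: edges are labelled with byte strings."""

    def __init__(self) -> None:
        self.root = RadixNode()

    def put(self, key: bytes, value: int) -> None:
        node = self.root
        while key:
            edge = node.edges.get(key[0])
            if edge is None:
                # No edge starts with this byte: hang the rest of the key off one edge.
                leaf = RadixNode()
                leaf.value = value
                node.edges[key[0]] = (key, leaf)
                return
            label, child = edge
            # Length of the common prefix of the edge label and the remaining key.
            n = 0
            while n < len(label) and n < len(key) and label[n] == key[n]:
                n += 1
            if n < len(label):
                # Split the edge: node --label[:n]--> mid --label[n:]--> child.
                mid = RadixNode()
                mid.edges[label[n]] = (label[n:], child)
                node.edges[key[0]] = (label[:n], mid)
                child = mid
            node, key = child, key[n:]
        node.value = value  # key fully consumed at an existing or newly split node

    def get(self, key: bytes) -> Optional[int]:
        node = self.root
        while key:
            edge = node.edges.get(key[0])
            if edge is None:
                return None
            label, child = edge
            if not key.startswith(label):
                return None
            node, key = child, key[len(label):]
        return node.value
```

Inserting `b"china"` and then `b"chinese"` leaves the root with one edge `b"chin"`, whose node has two edges, `b"a"` and `b"ese"`. Unlike the double array, this structure is still pointer-based, which is one reason the two may differ in memory footprint.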

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-24 Thread Xiaoqiao He
Hi Liang, Generally, yes: a prefix shared by several dictionary items does not need to be stored repeatedly in DAT, so the more data there is, the better the result. The main cost of DAT is building the trie, and I don't think we need to worry about it, since this cost is paid only once, when loading data. FYI. Regards, Xiaoqiao
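The prefix-sharing argument can be illustrated by counting nodes in a plain character trie and comparing that with the number of characters a flat key-per-entry map would hold. This is a rough proxy only (a sketch; the helper `trie_node_count` is hypothetical): node count is not byte usage, and a DAT stores nodes as `base`/`check` array slots rather than as objects.

```python
from typing import Iterable


def trie_node_count(keys: Iterable[str]) -> int:
    """Count nodes in a plain character trie, excluding the root."""
    root: dict = {}
    count = 0
    for key in keys:
        node = root
        for ch in key:
            if ch not in node:
                node[ch] = {}
                count += 1  # a new node only where the key leaves the shared prefix
            node = node[ch]
    return count


keys = ["china", "chinese", "chinatown"]
print(sum(len(k) for k in keys))  # 21 characters when every key is stored whole
print(trie_node_count(keys))      # 12 nodes: the shared prefixes "chin" and "china" are stored once
```

The more keys share prefixes, the wider this gap becomes, which matches the expectation that larger dictionaries benefit more.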

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-24 Thread Kumar Vishal
Hi Xiaoqiao He, +1. For the forward dictionary case it will be a very good optimisation, as our case is very specific: storing a byte array to int mapping [data to surrogate key mapping]. I think we will get a much better memory footprint, and performance will also be good (2x). We can also try radix tree (radix
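To make the "byte array to int mapping" concrete, the sketch below shows the HashMap-style baseline for a forward dictionary: each distinct value (as bytes) gets an incrementing surrogate key, and a reverse list maps the key back to the value. The class `ForwardDictionary` and the convention that surrogate keys start at 1 are assumptions for illustration, not CarbonData's actual API; the `_forward` map is the part a Trie or DAT would replace.

```python
from typing import Dict, List


class ForwardDictionary:
    """HashMap-style baseline: value bytes -> surrogate key, plus the reverse list."""

    def __init__(self) -> None:
        self._forward: Dict[bytes, int] = {}  # the map a Trie/DAT would replace
        self._reverse: List[bytes] = []       # index = surrogate key - 1

    def get_or_add(self, value: bytes) -> int:
        key = self._forward.get(value)
        if key is None:
            key = len(self._reverse) + 1  # assumption: surrogate keys start at 1
            self._forward[value] = key
            self._reverse.append(value)
        return key

    def value_of(self, surrogate_key: int) -> bytes:
        return self._reverse[surrogate_key - 1]
```

Every entry in `_forward` holds a full copy (or reference) of the byte array plus hash-table overhead, which is the memory cost the thread is trying to cut.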

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-23 Thread Liang Chen
Hi Xiaoqiao, For the example below, with 600K dictionary items: does this mean that using "DAT" saves 36M of memory compared with "ConcurrentHashMap", while losing only a little performance (1718ms)? One more question: if the dictionary data size increases, what are the comparison results for "ConcurrentHashMap" VS
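Expressed per dictionary item, the quoted saving works out as follows, assuming "36M" means 36 MiB (if it means 36 × 10^6 bytes, the figure is exactly 60 bytes per item):

```latex
\frac{36\ \text{MiB}}{600{,}000\ \text{items}}
  = \frac{36 \times 2^{20}\ \text{bytes}}{600{,}000\ \text{items}}
  = \frac{37{,}748{,}736}{600{,}000}
  \approx 62.9\ \text{bytes saved per item}
```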

Re: [Improvement] Use Trie in place of HashMap to reduce memory footprint of Dictionary

2016-11-23 Thread Xiaoqiao He
Hi Liang, Thanks for your reply. I need to correct the experiment results, because the NO.1 column of the result data table was in the wrong order. To compare performance between Trie and HashMap, two different structures are constructed using the same dictionary data, whose size is 600K, and each item's
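The methodology described here (build both structures from the same dictionary data, then time identical lookups) can be sketched as a small harness. Everything in it is a hypothetical stand-in: the random word generator replaces the real English/Chinese dictionaries, a Python `dict` stands in for `ConcurrentHashMap`, and a nested-dict trie stands in for the Trie. Timings from Python will not reproduce the JVM numbers discussed in this thread; the sketch only shows the shape of a fair comparison.

```python
import random
import string
import time
from typing import Dict, List, Optional, Tuple


def make_dictionary(n: int, seed: int = 0) -> List[bytes]:
    """Hypothetical test data: n unique random lowercase words as byte strings."""
    rng = random.Random(seed)
    words = set()
    while len(words) < n:
        length = rng.randint(3, 12)
        words.add("".join(rng.choices(string.ascii_lowercase, k=length)).encode())
    return sorted(words)


def build_trie(keys: List[bytes]) -> dict:
    """Plain pointer trie of nested dicts; the None entry holds the surrogate key."""
    root: dict = {}
    for surrogate, key in enumerate(keys, start=1):
        node = root
        for b in key:
            node = node.setdefault(b, {})
        node[None] = surrogate
    return root


def trie_get(root: dict, key: bytes) -> Optional[int]:
    node = root
    for b in key:
        node = node.get(b)
        if node is None:
            return None
    return node.get(None)


def benchmark(n: int, lookups: int = 100_000, seed: int = 0) -> Tuple[float, float]:
    """Build both structures from the same data; time the same probes (seconds)."""
    keys = make_dictionary(n, seed)
    table: Dict[bytes, int] = {k: s for s, k in enumerate(keys, start=1)}
    trie = build_trie(keys)
    rng = random.Random(seed + 1)
    probes = [rng.choice(keys) for _ in range(lookups)]  # identical probes for both
    start = time.perf_counter()
    for k in probes:
        table[k]
    mid = time.perf_counter()
    for k in probes:
        trie_get(trie, k)
    end = time.perf_counter()
    return mid - start, end - mid
```

A size sweep like the one mentioned at the top of this thread could then be run as `for n in (100_000, 200_000, 300_000): print(n, benchmark(n))`. Because build time is measured separately from lookup time, the one-off construction cost of the trie does not distort the lookup comparison.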