在 2015年4月21日星期二 UTC+8上午11:41:21,Wenwei Peng写道:
>
> Title: The core of the core of the big data solutions -- Map
> Author: pengwenwei
> Email: wenwei19710430
> Language: c++
> Platform: Windows, linux
> Technology: Perfect hash algorithm
> Level: Advanced
> Description: Map algorithm with high performance
> Section MFC c++ map stl
> SubSection c++ algorithm
> License: (GPLv3)
>
> Download demo project - 1070 Kb
> Download source - 1070 Kb
>
> Introduction:
> For the c++ program, map is used everywhere.And bottleneck of program
> performance is often the performance of map.Especially in the case of large
> data,and the business association closely and unable to realize the data
> distribution and parallel processing condition.So the performance of map
> becomes the key technology.
>
> In the work experience with telecommunications industry and the
> information security industry, I was dealing with the big bottom
> data,especially the most complex information security industry data,all
> can’t do without map.
>
> For example, IP table, MAC table, telephone number list, domain name
> resolution table, ID number table query, the Trojan horse virus
> characteristic code of cloud killing etc..
>
> The map of STL library using binary chop, its has the worst
> performance.Google Hash map has the optimal performance and memory at
> present, but it has repeated collision probability.Now the big data rarely
> use a collision probability map,especially relating to fees, can’t be wrong.
>
> Now I put my algorithms out here,there are three kinds of map,after the
> build is Hash map.We can test the comparison,my algorithm has the zero
> probability of collision,but its performance is also better than the hash
> algorithm, even its ordinary performance has no much difference with Google.
>
> My algorithm is perfect hash algorithm,its key index and the principle of
> compression algorithm is out of the ordinary,the most important is a
> completely different structure,so the key index compression is
> fundamentally different.The most direct benefit for program is that for the
> original map need ten servers for solutions but now I only need one server.
> Declare: the code can not be used for commercial purposes, if for
> commercial applications,you can contact me with QQ 75293192.
> Download:
> https://sourceforge.net/projects/pwwhashmap/files
>
> Applications:
> First,modern warfare can’t be without the mass of information query, if
> the query of enemy target information slows down a second, it could lead to
> the delaying fighter, leading to failure of the entire war. Information
> retrieval is inseparable from the map, if military products use pwwhashMap
> instead of the traditional map,you must be the winner.
>
> Scond,the performance of the router determines the surfing speed, just
> replace open source router code map for pwwHashMap, its speed can increase
> ten times.
> There are many tables to query and set in the router DHCP ptotocol,such as
> IP,Mac ,and all these are completed by map.But until now,all map are using
> STL liabrary,its performance is very low,and using the Hash map has error
> probability,so it can only use multi router packet dispersion treatment.If
> using pwwHashMap, you can save at least ten sets of equipment.
>
> Third,Hadoop is recognized as the big data solutions at present,and its
> most fundamental thing is super heavy use of the map,instead of SQL and
> table.Hadoop assumes the huge amounts of data so that the data is
> completely unable to move, people must carry on the data analysis in the
> local.But as long as the open source Hadoop code of the map changes into
> pwwHashMap, the performance will increase hundredfold without any problems.
>
>
> Background to this article that may be useful such as an introduction to
> the basic ideas presented:
> http://blog.csdn.net/chixinmuzi/article/details/1727195
>
>
>
--
--
v8-dev mailing list
[email protected]
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups
"v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email
to [email protected].
For more options, visit https://groups.google.com/d/optout.
Technology transfer $2000000000
Contact information:
qq: 75293192
email:[email protected],[email protected]
address:Xianggan Batang District 17-18 xiangtan City, Hunan China
postcode:411101
I am currently working in a state secret unit, e-mail need filter, so e-mail
not all reach my mailbox.and phone may be answered or interrupted strangers.
come to my home to Interviews as far as possible 。
Title: The core of the big data solutions -- Map.
Author: pengwenwei
Email: [email protected]
Language: c++
Platform: Windows, linux
Technology: Perfect hash algorithm
Level: Advanced
Description: A high performance map algorithm
Section MFC c++ map stl
SubSection c++ algorithm
License: (GPLv3)
Map is widely used in c++ programs. Its performance is critical to programs'
performance. Especially in big data and the scenarios which can't realize data
distribution and parallel processing.
I have been working on big data analysis for many years in telecommunition and
information security industry. The data analysis is so complicated that they
can't work without map. Especially in information security industry, the data
is much more complicated than others. For example, ip table, mac table,
telephone numbers table, dns table etc.
Currently, the STL map and Google's hash map are the most popular maps. But
they have some disadvantages. The STL map is based on binary chop, which causes
a bad performance. Google Hash map has the best performance at present, but it
has probability of collision. For big data analysis, the collision probability
is unacceptable.
Now I would like to publish pwwMap. It includes three different maps for
different scenarios:
1. Memory Map(memMap): It has a good access speed. But its size is limited by
memory size.
2. Harddisk Map(diskMap): It utilizes hard disk to store data. So it could
accept much more data than memory map.
3. Hashmap(hashMap): It has the best performance and a great lookup speed, but
it doesn't have 'insert' and 'delete' functionality.
MemMap and diskMap could be converted to hashMap by function memMap2HashMap and
diskMap2HashMap. According to the test result, my algorithms' collision
probability is zero. About performance, memMap has a comparable performance
with google, and hashMap's performance is 100 times better than Google's
hashmap.
In summary, pwwhash are perfect hash algorithms with zero collision
probability. You can refer to following artical to find the key index and
compress algorithm theory:
http://blog.csdn.net/chixinmuzi/article/details/1727195
Source code and documents:
https://sourceforge.net/projects/pwwhashmap/files/?source=navbar