How to sort and count efficiently?

2019-06-30 Thread Peng Yu
Hi, I have a long list of strings (each string is on its own line). I need to count the number of occurrences of each string. I currently use `sort` to sort the list and then use another program to do the count. The second program doing the count needs only a small amount of memory as the input is
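The sort-then-count approach described above can be sketched in Python. This is only an illustration, not the poster's actual counting program (which is not shown); `count_sorted` is a hypothetical name. Because equal strings are adjacent in sorted input, only the current string and its running count have to be kept in memory.

```python
from itertools import groupby


def count_sorted(lines):
    """Yield (string, count) pairs from an iterable of already-sorted lines.

    Hypothetical helper: assumes the input was sorted beforehand,
    e.g. by `sort`, so that equal strings sit next to each other.
    """
    stripped = (line.rstrip("\n") for line in lines)
    # groupby only groups *adjacent* equal items, which is exactly
    # what sorted input guarantees.
    for key, group in groupby(stripped):
        yield key, sum(1 for _ in group)


if __name__ == "__main__":
    import sys

    # Example use: sort input.txt | python count_sorted.py
    for key, n in count_sorted(sys.stdin):
        print(f"{n}\t{key}")
```

On the command line, `sort input.txt | uniq -c` does the same job.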

Re: How to sort and count efficiently?

2019-06-30 Thread Peng Yu
The problem with this kind of awk program is that everything will be loaded into memory. But bare `sort` uses external files to save memory. When the hash in awk is too large, accessing it can become very slow (maybe due to cache misses, or to the hash slowing down as its size grows). On
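For comparison, hash-based counting of the kind being discussed might look like the sketch below. The awk program itself is not shown in this snippet, so this is a Python stand-in with a hypothetical name, `count_in_memory`. It needs no sorting, but the table holds one entry per distinct string, so memory grows with the number of distinct strings rather than staying constant.

```python
from collections import Counter


def count_in_memory(lines):
    """Count occurrences with a hash table (hypothetical helper).

    No sorting is needed, but every distinct string stays in memory
    until the whole input has been read.
    """
    return Counter(line.rstrip("\n") for line in lines)


if __name__ == "__main__":
    import sys

    # Example use: python count_in_memory.py < input.txt
    for key, n in count_in_memory(sys.stdin).items():
        print(f"{n}\t{key}")
```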

Re: How to sort and count efficiently?

2019-06-30 Thread Assaf Gordon
Correcting myself:
On Sun, Jun 30, 2019 at 10:08:46AM -0600, Assaf Gordon wrote:
> On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote:
> >
> > I have a long list of strings (each string is on its own line). I need to
> > count the number of occurrences of each string.
> >
> > [...] Does anybody

Re: How to sort and count efficiently?

2019-06-30 Thread Assaf Gordon
On 2019-06-30 11:10 a.m., Peng Yu wrote:
> The problem with this kind of awk program is that everything will be loaded into memory.
Well, those are the two main options: store in memory or resort to disk I/O. Each has its own pros and cons.
> But bare `sort` uses external files to save memory.
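To make the memory-versus-disk trade-off concrete, here is a rough Python sketch of a middle ground: count in a hash table until it reaches a size limit, spill the partial counts to a sorted temporary file, and merge the files at the end. This is not how `sort` or awk work internally and nobody in the thread proposed it; the names `count_with_spill`, `_spill`, `_read_run` and the `max_keys` parameter are all hypothetical.

```python
import heapq
import os
import tempfile
from collections import Counter
from itertools import groupby


def _spill(counts, tmpdir):
    """Write one batch of partial counts to a temp file, sorted by key."""
    fd, path = tempfile.mkstemp(dir=tmpdir)
    with os.fdopen(fd, "w", encoding="utf-8", newline="\n") as f:
        for key in sorted(counts):
            f.write(f"{key}\t{counts[key]}\n")
    return path


def _read_run(path):
    """Stream (key, count) pairs back from a spilled file."""
    with open(path, encoding="utf-8", newline="\n") as f:
        for line in f:
            # Split on the last tab so keys may themselves contain tabs.
            key, _, n = line.rstrip("\n").rpartition("\t")
            yield key, int(n)


def count_with_spill(lines, max_keys=1_000_000):
    """Yield (string, count) pairs in sorted key order.

    At most `max_keys` distinct strings are held in memory at once;
    the rest of the state lives in temporary files on disk.
    """
    with tempfile.TemporaryDirectory() as tmpdir:
        runs = []
        counts = Counter()
        for line in lines:
            counts[line.rstrip("\n")] += 1
            if len(counts) >= max_keys:
                runs.append(_spill(counts, tmpdir))
                counts = Counter()
        if counts:
            runs.append(_spill(counts, tmpdir))
        # Each run is sorted by key, so a k-way merge yields all partial
        # counts for the same key next to each other.
        merged = heapq.merge(*(_read_run(p) for p in runs))
        for key, group in groupby(merged, key=lambda kv: kv[0]):
            yield key, sum(n for _, n in group)
```

A small `max_keys` pushes the work towards disk I/O; a large one pushes it towards memory.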

Re: How to sort and count efficiently?

2019-06-30 Thread Assaf Gordon
Hello,
On Sun, Jun 30, 2019 at 07:34:19AM -0500, Peng Yu wrote:
> Hi,
>
> I have a long list of strings (each string is on its own line). I need to
> count the number of occurrences of each string.
>
> I currently use `sort` to sort the list and then use another program
> to do the count. The second