On Wed, Feb 18, 2009 at 10:14 PM, Pablo Manalastas
<[email protected]> wrote:
> Ramil's original problem is not how to read all those tens
> of gigabytes of text data, but the simpler problem of keeping
> a count of the number of lines read, since if wc used int (fortunately
> it does not), then wc could count only up to about 2 billion lines. But he
> expects to read up to 100 billion lines. Note that he does not need
> to keep them in memory -- he only needs to count the number of lines.
> I believe that wc is even overkill, since the following simple
> code will do the job:
>
> int c;
> unsigned long long n = 0;
> while ((c = getchar()) != EOF) {
>     if (c == '\n') ++n;
> }
> return n;
>
> With this code, you do not even buffer the file, except for the
> buffering (usually 4k) that the C library implementation of the
> getchar() macro requires.
Yup, that would do, doc, but the problem is that getchar() is too slow
compared to fgets() or the read() that wc uses.
Here is a simple example using fgets():
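#include <stdio.h>
#include <stdint.h>
#include <string.h>

#define BUFSIZE 1024

int main(void) {
    char buffer[BUFSIZE];
    uintmax_t lines = 0;

    /* fgets() returns at most BUFSIZE-1 characters per call, so a long
       line arrives in several chunks; count only chunks ending in '\n'. */
    while (fgets(buffer, BUFSIZE, stdin) != NULL) {
        size_t len = strlen(buffer);
        if (len > 0 && buffer[len - 1] == '\n')
            lines++;
    }
    printf("%ju\n", lines);
    return 0;
}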
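Save it as, say, wc2.c, then compile and run it (uintmax_t and %ju are
C99 features, hence -std=c99 rather than -ansi):
gcc -std=c99 -Wall -O3 -o wc2 wc2.c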
./wc2 < /path/to/file
This keeps at most BUFSIZE bytes of input in memory at a time, so size
the buffer for the longest line you expect. It is faster than
getchar() but a little slower than read().
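For comparison, here is a rough sketch of the read() approach mentioned
above. This is not wc's actual source; the function name count_lines_fd
and the 64 KiB CHUNK size are my own assumptions for illustration. It
reads large chunks straight from the file descriptor and lets memchr()
find the newlines.

```c
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 65536  /* assumed chunk size, not taken from wc */

/* Count '\n' bytes readable from fd using read().
   Stores the count in *out and returns 0, or returns -1 on a read error.
   count_lines_fd is a hypothetical helper name. */
static int count_lines_fd(int fd, uintmax_t *out)
{
    static char buf[CHUNK];
    uintmax_t lines = 0;
    ssize_t n;

    for (;;) {
        n = read(fd, buf, sizeof buf);
        if (n == 0)
            break;              /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }

        /* Jump from newline to newline inside the chunk just read. */
        const char *p = buf;
        const char *end = buf + n;
        while ((p = memchr(p, '\n', (size_t)(end - p))) != NULL) {
            lines++;
            p++;
        }
    }

    *out = lines;
    return 0;
}
```

Call it as count_lines_fd(STDIN_FILENO, &lines) from main() and print
the result with %ju. Like wc -l, it counts newline characters, so a
final line without a trailing newline is not counted.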
fooler.
_________________________________________________
Philippine Linux Users' Group (PLUG) Mailing List
http://lists.linux.org.ph/mailman/listinfo/plug
Searchable Archives: http://archives.free.net.ph