Re: [ngram] Re: Using huge-count.pl with lots of files
Hi Catherine,

Just to make sure I'm understanding what you'd like to do, could you send the command you are trying to run, and some idea of the number of files you'd like to process?

Thanks!
Ted

On Sun, Apr 15, 2018 at 6:01 PM, catherine.dejage...@gmail.com [ngram] <ngram@yahoogroups.com> wrote:
> That makes sense, but I'm not sure it will give me the behavior I want. I
> don't want bigrams to span from one file to the next, but I do want them to
> span across newlines. If I concatenate the files, then as I understand it
> my first condition is no longer met. Could I run huge-count.pl on
> subgroups of files, then combine the results? And how would I do that?
[ngram] Re: Using huge-count.pl with lots of files
That makes sense, but I'm not sure it will give me the behavior I want. I don't want bigrams to span from one file to the next, but I do want them to span across newlines. If I concatenate the files, then as I understand it my first condition is no longer met. Could I run huge-count.pl on subgroups of files, then combine the results? And how would I do that?
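To make the requirement above concrete, here is a minimal Python sketch of counting bigrams one file at a time and then summing the per-file counts, so that no bigram spans two files while line breaks inside a file are ignored. It is not how huge-count.pl works internally; the function names `bigrams_in_text` and `aggregate_bigrams` are made up for illustration, and the `\w+` tokenization is an assumption that may not match the token definition you use with NSP.

```python
import re
from collections import Counter
from pathlib import Path


def bigrams_in_text(text):
    """Count adjacent token pairs in one text.

    Newlines are treated like any other separator, so a bigram may
    span a line break. Tokens are runs of word characters (assumed).
    """
    tokens = re.findall(r"\w+", text)
    return Counter(zip(tokens, tokens[1:]))


def aggregate_bigrams(paths):
    """Sum per-file bigram counts over many files.

    Each file is tokenized separately, so the last token of one file
    is never paired with the first token of the next.
    """
    total = Counter()
    for path in paths:
        text = Path(path).read_text(encoding="utf-8")
        total.update(bigrams_in_text(text))
    return total


if __name__ == "__main__":
    # "corpus" is a hypothetical directory name. Listing files from
    # inside Python avoids passing them all on a command line.
    counts = aggregate_bigrams(sorted(Path("corpus").glob("*.txt")))
    for (first, second), n in counts.most_common(10):
        print(first, second, n)
```

The same summing step would apply to counts produced for subgroups of files, provided each subgroup's counts were themselves computed without crossing file boundaries.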
Re: [ngram] Using huge-count.pl with lots of files
I guess my first thought would be to see if there is a simple way to combine the input you are providing to huge-count.pl into fewer files. If you have a lot of files that start with the letter 'a', for example, you could concatenate them all together via a (Linux) command like

    cat a* > myafiles.txt

and then use myafiles.txt as an input to huge-count.pl. This is just one idea, but it's a start perhaps. If this isn't helpful please let us know and we can try again!

On Sun, Apr 15, 2018 at 1:19 PM, catherine.dejage...@gmail.com [ngram] <ngram@yahoogroups.com> wrote:
> I am trying to get the bigram counts aggregated across a lot of files.
> However, when I ran huge-count.pl using the list of files as an input, I
> got the error "Argument list too long". What would you recommend for
> combining many files, when there are too many files to just run
> huge-count.pl as is?
>
> Thank you,
> Catherine
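One caveat with the shell approach: if very many files match the pattern, the `cat` command itself can fail with the same "Argument list too long" error, because the shell expands the wildcard into one long argument list. As a rough sketch (not part of NSP, and with a made-up function name `concatenate`), the concatenation could be done inside a single Python process, where no argument list is built:

```python
import shutil
from pathlib import Path


def concatenate(directory, pattern, output_path):
    """Append every file in `directory` matching `pattern` to `output_path`.

    Files are copied in sorted name order. The output file itself is
    skipped in case it also matches the pattern. Returns the number of
    files copied.
    """
    out = Path(output_path).resolve()
    # Build the list before the output file is opened for writing.
    paths = sorted(
        p for p in Path(directory).glob(pattern)
        if p.is_file() and p.resolve() != out
    )
    with open(out, "wb") as dst:
        for p in paths:
            with open(p, "rb") as src:
                shutil.copyfileobj(src, dst)
    return len(paths)


if __name__ == "__main__":
    # Hypothetical names: files starting with 'a' in the current directory.
    n = concatenate(".", "a*", "myafiles.txt")
    print(f"concatenated {n} files")
```

Note that concatenating this way still lets a bigram form between the last token of one file and the first token of the next, so it only fits cases where that is acceptable.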
[ngram] Using huge-count.pl with lots of files
I am trying to get the bigram counts aggregated across a lot of files. However, when I ran huge-count.pl using the list of files as an input, I got the error "Argument list too long". What would you recommend for combining many files, when there are too many files to just run huge-count.pl as is?

Thank you,
Catherine