Re: [ngram] Re: Using huge-count.pl with lots of files

2018-04-15 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
Hi Catherine,

Just to make sure I'm understanding what you'd like to do, could you send
the command you are trying to run, and some idea of the number of files
you'd like to process?

Thanks!
Ted

On Sun, Apr 15, 2018 at 6:01 PM, catherine.dejage...@gmail.com [ngram] <
ngram@yahoogroups.com> wrote:

> That makes sense, but I'm not sure it will give me the behavior I want. I
> don't want bigrams to span from one file to the next, but I do want them to
> span across newlines. If I concatenate the files, then as I understand it
> my first condition is no longer met. Could I run huge-count.pl on
> subgroups of files, then combine the results? And how would I do that?


[ngram] Re: Using huge-count.pl with lots of files

2018-04-15 Thread catherine.dejage...@gmail.com [ngram]
That makes sense, but I'm not sure it will give me the behavior I want. I don't 
want bigrams to span from one file to the next, but I do want them to span 
across newlines. If I concatenate the files, then as I understand it my first 
condition is no longer met. Could I run huge-count.pl on subgroups of files, 
then combine the results? And how would I do that?
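
One possible way to get both conditions from a single combined input, sketched under assumptions: flatten each file to one line, write one line per file, and then count with n-grams prevented from crossing newlines. count.pl documents a --newLine option for that last step; whether huge-count.pl accepts it in the same way should be checked against the installed NSP version. The function name flatten_files and its arguments are hypothetical, and -maxdepth assumes GNU or BSD find.

```shell
# Write one output line per input file, so that file boundaries become
# the only newlines left in the combined text.
# Assumption: out_file lives outside src_dir, so find does not pick it up.
flatten_files() {
  src_dir=$1    # directory holding the input files (hypothetical layout)
  out_file=$2   # combined text, one line per original file
  find "$src_dir" -maxdepth 1 -type f -exec sh -c '
    for f do
      tr "\n" " " < "$f"   # newlines inside a file become spaces
      printf "\n"          # a single newline marks the end of the file
    done
  ' sh {} + > "$out_file"
}

# Afterwards (assumption, not verified here): run huge-count.pl on out_file
# with --newLine so that bigrams stop at each line, i.e. at each file boundary.
```

Because find passes the file names to the inner shell in batches, this does not run into the "Argument list too long" limit.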

Re: [ngram] Using huge-count.pl with lots of files

2018-04-15 Thread Ted Pedersen tpede...@d.umn.edu [ngram]
I guess my first thought would be to see if there is a simple way to
combine the input you are providing to huge-count.pl into fewer files. If
you have a lot of files that start with the letter 'a', for example, you
could concatenate them all together via a (Linux) command like

cat a* > myafiles.txt

and then use myafiles.txt as an input to huge-count.pl.
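
One caveat: if a single pattern such as a* still matches a very large number of files, the shell can fail with the same "Argument list too long" error, because every matched name ends up on cat's command line. A minimal sketch of one way around that, assuming GNU or BSD find (for -maxdepth); the function name concat_by_prefix and its arguments are made up for illustration:

```shell
# Concatenate every file whose name starts with a given prefix without
# placing all the names on one command line; find hands them to cat in batches.
# Assumption: out_file lives outside src_dir, so find does not pick it up.
concat_by_prefix() {
  src_dir=$1    # directory holding the input files (hypothetical layout)
  prefix=$2     # leading characters of the file names, e.g. a
  out_file=$3   # combined output file
  find "$src_dir" -maxdepth 1 -type f -name "${prefix}*" -exec cat {} + > "$out_file"
}
```

A call such as concat_by_prefix ./corpus a ./myafiles.txt (paths are placeholders) would then stand in for the cat command above. The order in which find visits the files is unspecified, which affects only the bigrams that straddle two files.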

This is just one idea, but it's a start perhaps. If this isn't helpful
please let us know and we can try again!

On Sun, Apr 15, 2018 at 1:19 PM, catherine.dejage...@gmail.com [ngram] <
ngram@yahoogroups.com> wrote:

> I am trying to get the bigram counts aggregated across a lot of files.
> However, when I ran huge-count.pl using the list of files as an input, I
> got the error "Argument list too long". What would you recommend for
> combining many files, when there are too many files to just run
> huge-count.pl as is?
>
> Thank you,
> Catherine


[ngram] Using huge-count.pl with lots of files

2018-04-15 Thread catherine.dejage...@gmail.com [ngram]
I am trying to get the bigram counts aggregated across a lot of files. However, 
when I ran huge-count.pl using the list of files as an input, I got the error 
"Argument list too long". What would you recommend for combining many files, 
when there are too many files to just run huge-count.pl as is?
 

Thank you,
Catherine