Re: spell check - how to?

2008-11-07 Thread Robert Huff

Giorgos Keramidas writes:

  The main drawback of being unable to use the `freebsd' wordlist
  is that you will get many false positives for words that are
  perfectly valid for FreeBSD documentation but are not standard
  English words.

I have a script which does something similar, using ispell.
It's based on the Perl script (found on-line) appended below.
I pseudo-fixed the false-positive problem by running the output
through sort and starting with the least frequent hits.
Attempts to build a project-specific dictionary proved too
confusing, and it was ultimately not worth the effort.
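
For readers without the Perl script at hand, the "sort and start with the
least frequent hits" step can be sketched in plain sh. This is only an
illustration, not the script referred to above: the function name
rank_misspellings is made up, and the input is assumed to be one flagged
word per line (which is what `ispell -l` prints).

```shell
#!/bin/sh
# rank_misspellings (hypothetical helper): read one flagged word per line
# on stdin and print "count word" pairs, least frequent first.
rank_misspellings() {
    LC_ALL=C sort |                    # group identical words together
        uniq -c |                      # prefix each word with its count
        awk '{ print $1, $2 }' |       # drop uniq's padding
        LC_ALL=C sort -k1,1n -k2,2     # ascending by count, then by word
}

# Possible use (file name is illustrative only):
#   ispell -l < article.txt | rank_misspellings | less
```

Words flagged only once or twice come out at the top; whether those are
more likely to be real typos than the frequently flagged ones is a guess
about the reasoning, not something stated in the message.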


Robert Huff



#!/usr/local/bin/perl -W

# WordFreq.pl -- Count word frequency in a text file
$ver = "v1.0"; # 05-Dec-2001 JP Vossen [EMAIL PROTECTED]

# Basics from 8.3, page 280 of _Perl_Cookbook_
# Added stop words

# Remove up to last \ or / and after any .
(($myname = $0) =~ s/^.*(\/|\\)|\..*$//ig);

# Version and copyright info
$Greeting  = "$myname $ver Copyright 2001 JP Vossen (http://www.jpsdomain.org/)\n";
$Greeting .= "Licensed under the GNU GENERAL PUBLIC LICENSE:\n";
$Greeting .= "See http://www.gnu.org/copyleft/gpl.html for full text and details.\n";

%seen = ();   # Create the hash

# Define the stopwords
@stopwords = qw(a an and are as at be but by
                does for from had have her his if in is
                it not of on or that the this to was
                which with you);

# If too many args, or a ? in args - die
if (("@ARGV" =~ /\?/) || (@ARGV > 5)) {
    print STDERR "\n$Greeting\n\tUsage: $myname -i {infile} [-o {outfile}] [-s]\n";
    print STDERR "\nIf -s is used, the list of stop words will NOT be used.\n";
    print STDERR "The stopwords currently defined are:\n\n ";
    foreach $stopword (@stopwords) {
        print STDERR "$stopword ";
    } # end of foreach stopword
    die "\n";
}

use Getopt::Std; # Use Perl5 built-in program argument handler
getopts('i:o:s');# Define possible args.

if (! $opt_i) { $opt_i = "-"; }  # If no input file specified, use STDIN
if (! $opt_o) { $opt_o = "-"; }  # If no output file specified, use STDOUT

# Two-argument open, so that "-" means STDIN and ">-" means STDOUT
open (INFILE, $opt_i) || die "$myname: error opening $opt_i $!\n";
open (OUTFILE, ">$opt_o") || die "$myname: error opening $opt_o $!\n";

print STDERR "\n$Greeting\n";

while (<INFILE>) {                  # Read the input file
    while ( /(\w['\w-]*)/g ) {      # If we have a word
        $seen{lc $1}++;             # Count it in the hash
    } # end of while words
} # end of while input

if (! $opt_s) {                         # If we're using stopwords
    foreach $stopword (@stopwords) {    # for each stopword
        delete($seen{$stopword});       # Remove it from the hash
    } # end of foreach stopword         # This way we only test once for each
} # end of if using stopwords           # stopword, rather than in a loop!


# Print the results, sorted most frequent words at the top
foreach $word ( sort { $seen{$b} <=> $seen{$a} } keys %seen) {
    printf OUTFILE ("%6d %s\n", $seen{$word}, $word);
} # end of foreach word

___
freebsd-questions@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-questions
To unsubscribe, send any mail to [EMAIL PROTECTED]


spell check - how to?

2008-11-06 Thread Eitan Adler

[cc to -questions as it might be a general question]

I'm looking to do a full spell check/fix on the handbook.
I found make spellcheck-txt which apparently removed certain items
that would not go well through a spell checker - however I don't know
how to actually run the spellcheck itself.

I have aspell installed.


Re: spell check - how to?

2008-11-06 Thread Giorgos Keramidas
On Thu, 06 Nov 2008 23:42:52 -0500, Eitan Adler [EMAIL PROTECTED] wrote:
> [cc to -questions as it might be a general question]

Hi Eitan :-)

> I'm looking to do a full spell check/fix on the handbook.  I found
> make spellcheck-txt which apparently removed certain items that
> would not go well through a spell checker - however I don't know how
> to actually run the spellcheck itself.
>
> I have aspell installed.

You will probably have to tweak the ISPELLOPTS to use aspell.  The
current spellcheck targets use a default ISPELLOPTS with a value of:

%%%
doc/el/share/mk/doc.project.mk:ISPELL?= ispell
doc/el/share/mk/doc.project.mk:ISPELLOPTS?= -l -p /usr/share/dict/freebsd ${ISPELLFLAGS}
%%%

The -l option does not mean the same thing to aspell as it does to
ispell (aspell expects a parameter after it), so the following
spellcheck run fails:

: % pwd
: /ws/doc/en_US.ISO8859-1/articles/contributing
: % env ISPELL=aspell make FORMATS=txt spellcheck
: Spellcheck article.txt
: Error: You must specify a parameter for -l.
: *** Error code 1
:

But you can set ISPELL and ISPELLOPTS in the runtime environment to pass
aspell-compatible options:

: % env ISPELL=aspell ISPELLOPTS='list' make FORMATS=txt spellcheck
: Spellcheck article.txt
: jcamou
: IEEE
: ...
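
The override works because both variables are assigned with `?=', which
only supplies a default when the variable is not already set (and the
environment counts as "set").  A rough sh analogue of that behaviour is
sketched below.  The function name spellcheck_command is made up, and
the sketch assumes that the target ends up running something like
"${ISPELL} ${ISPELLOPTS}"; the real rule in doc.project.mk may differ.

```shell
#!/bin/sh
# spellcheck_command (hypothetical helper): print the command line that
# the defaults shown above would produce, honouring any values that are
# already set in the environment, as make's `?=' does.
spellcheck_command() {
    # Default options; ISPELLFLAGS is appended only if it is non-empty.
    default_opts="-l -p /usr/share/dict/freebsd${ISPELLFLAGS:+ $ISPELLFLAGS}"

    # ${VAR-default} uses the default only when VAR is unset.
    printf '%s %s\n' "${ISPELL-ispell}" "${ISPELLOPTS-$default_opts}"
}

# With nothing set this prints the ispell defaults; with ISPELL=aspell
# and ISPELLOPTS=list in the environment it prints "aspell list".
```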

Unfortunately, the wordlist at `/usr/share/dict/freebsd' is not usable
with aspell right now, so if you try to use it you will get errors like:

: % env ISPELL=aspell \
: ISPELLOPTS='-p /usr/share/dict/freebsd ${ISPELLFLAGS}' \
: ISPELLFLAGS='list' make FORMATS=txt spellcheck
: Spellcheck article.txt
: Error: The file /usr/share/dict/freebsd is not in the proper format.
: *** Error code 1

The main drawback of being unable to use the `freebsd' wordlist is that
you will get many false positives for words that are perfectly valid for
FreeBSD documentation but are not standard English words.
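
One possible workaround, sketched here with ordinary tools rather than
with aspell itself, is to filter the reported words against the wordlist
after the fact.  The function name filter_known_words is made up, and
the sketch assumes that the wordlist is a plain text file with one word
per line; matching is exact and case-sensitive.

```shell
#!/bin/sh
# filter_known_words WORDLIST (hypothetical helper): read flagged words,
# one per line, on stdin and print only those that do not appear as a
# whole line in WORDLIST.
filter_known_words() {
    # -v: print non-matching lines   -x: match whole lines only
    # -F: treat patterns as fixed strings   -f: read patterns from a file
    # grep exits with status 1 when nothing is left; treat that as success.
    grep -v -x -F -f "$1" || [ "$?" -eq 1 ]
}

# Possible use, combined with the aspell run shown earlier:
#   env ISPELL=aspell ISPELLOPTS='list' make FORMATS=txt spellcheck |
#       filter_known_words /usr/share/dict/freebsd
```

Progress lines printed by make itself (such as "Spellcheck article.txt")
are not words from the list, so they pass through the filter unchanged.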
