Re: [PLUG] Counting Files

2021-08-19 Thread Michael Barnes
On Mon, Aug 16, 2021 at 7:25 PM wes  wrote:

> To get the count of unique callsigns, you can just feed this same command
> into wc -l.
>
> find Processed -type f -printf '%f\n' | sed "s/@.*//" | sort | uniq -c | wc -l
>
> -wes
>
>
> On Mon, Aug 16, 2021 at 7:21 PM wes  wrote:
>
> > if the @ is consistent with all the files, that makes it relatively easy.
> >
> > find Processed -type f -printf '%f\n' | sed "s/@.*//" | sort | uniq -c
> >
> > -wes
> >
> > On Mon, Aug 16, 2021 at 7:17 PM Michael Barnes wrote:
> >
> >> On Mon, Aug 16, 2021 at 5:29 PM David Fleck wrote:
> >>
> >> > As Wes said, an example or two would help greatly.
> >> >
> >> > --- David Fleck
> >> >
> >> > ‐‐‐ Original Message ‐‐‐
> >> >
> >> > On Monday, August 16th, 2021 at 7:17 PM, wes wrote:
> >> >
> >> > > are firstnames and lastnames always separated by the same character in
> >> > > each filename?
> >> > >
> >> > > are the names separated from the rest of the info in the filename the
> >> > > same way for each file?
> >> > >
> >> > > are you doing this once, or will this be a repeating task that would be
> >> > > handy to automate?
> >> > >
> >> > > would you be able to provide a few sample filenames, perhaps with the
> >> > > personal info obfuscated?
> >> > >
> >> > > generally, the way I would approach this is to pare the filenames down to
> >> > > the people's names, and then run sort and uniq against that list. uniq -c
> >> > > will provide a count of how many times a given string appears in the
> >> > > sorted input. if I'm doing this once, I would generate a text file
> >> > > containing the list of filenames I will be working with, for example:
> >> > >
> >> > > find Processed -type f > processed-files.txt
> >> > >
> >> > > then use a text editor to pare down the entries as described above, using
> >> > > find and replace functions to remove the extra data, so only the people's
> >> > > names remain. then simply sort that file, run uniq -c on the result, and
> >> > > you're done. I personally use vi for this, but just about any editor will
> >> > > do. I like this approach for a number of reasons, not the least of which
> >> > > is that I can spot-check random samples after each editing step to try to
> >> > > spot unexpected results.
> >> > >
> >> > > if you want to automate this, it may be a little more complicated, and the
> >> > > answers to my initial questions become important. if you can provide a
> >> > > little more context, I will try to help further.
> >> > >
> >> > > -wes
> >> > >
> >> > > On Mon, Aug 16, 2021 at 5:01 PM Michael Barnes barnmich...@gmail.com wrote:
> >> > >
> >> > > > Here's a fun trivia task. For an activity I am involved in, I get files
> >> > > > from members to process. The filename starts with the member's name and has
> >> > > > other info to identify the file. After processing, the file goes in the
> >> > > > ./Processed folder. There are thousands of files now in that folder. Right
> >> > > > now, I'm looking for a couple basic pieces of information. First, I want to
> >> > > > know how many unique names I have in the list. Second, I'd like a list of
> >> > > > names and how many files go with each name.
> >> > > >
> >> > > > I'm sure this is trivial, but my mind is blanking out on it. A couple
> >> > > > simple examples would be nice. Non-answers, like "easy to do with 'xxx'" or
> >> > > > references to man pages or George's Book, etc. are not helpful right now.
> >> > > >
> >> > > > Thanks,
> >> > > >
> >> > > > Michael
> >> >
> >>
> >> Actually, they are callsigns instead of names. A couple of examples:
> >>
> >> w7...@k-0496-20210526.txt
> >> wa7...@k-0497-20210714.txt
> >> n8...@k-4386-20210725.txt
> >>
> >> I would like a simple count of the unique callsigns on a random basis and
> >> possibly an occasional report listing each callsign and how many files are
> >> in the folder for each.
> >>
> >> Michael
> >>
> >
>

Thanks Everybody,

This has been educational for me. It looks like there were several working
options. I started with Wes' option refined by Robert.
$ find -type f | cut -d @ -f1 | sort | uniq -c

Since I was working from within the Processed folder, I did not specify it
on the command line.
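
A note on the sort stage in the pipeline above: uniq only collapses
adjacent duplicate lines, and find does not return files in any guaranteed
order, so the sort is what makes the counts trustworthy. A minimal sketch
of the difference, using made-up callsigns (w7abc, n8xyz) because the real
ones are obscured in this thread:

```shell
# Hypothetical sample filenames; not taken from the actual folder.
names='w7abc@k-0496-20210526.txt
n8xyz@k-4386-20210725.txt
w7abc@k-0497-20210714.txt'

# Without sort: the two w7abc lines are not adjacent, so uniq keeps both.
unsorted=$(printf '%s\n' "$names" | cut -d @ -f1 | uniq -c | wc -l)

# With sort: duplicates become adjacent and collapse into a single line.
sorted=$(printf '%s\n' "$names" | cut -d @ -f1 | sort | uniq -c | wc -l)

echo "groups without sort: $unsorted"
echo "groups with sort:    $sorted"
```

The unsorted pipeline reports one group more than there are distinct
callsigns in this sample.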
Then I discovered that some of the callsigns were not capitalized, so I
added the ignore-case option.
$ find -type f | cut -d @ -f1 | sort | uniq -i -c

That gave me usable output #1.
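
One caveat with sort | uniq -i, offered as a sketch rather than a
correction: uniq -i ignores case only when comparing neighbouring lines, so
the sort has to place W7ABC next to w7abc. Whether plain sort does that
depends on the locale; sort -f folds case explicitly. The callsigns below
are hypothetical, and LC_ALL=C is set only to make the byte-order behaviour
reproducible:

```shell
# Hypothetical callsigns with mixed capitalization.
names='W7ABC
n8xyz
w7abc'

# Byte-order sort puts uppercase first, so n8xyz lands between the
# two spellings of w7abc and uniq -i cannot merge them.
plain=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -i -c | wc -l)

# sort -f folds lowercase to uppercase for comparison, so both
# spellings end up adjacent and uniq -i merges them.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f | uniq -i -c | wc -l)

echo "groups with plain sort: $plain"
echo "groups with sort -f:    $folded"
```

In a locale such as en_US.UTF-8 the plain sort often groups the two
spellings together anyway, which may be why the command above worked as
run; sort -f removes the dependence on locale.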

I added the count with
$ find -type f | cut -d @ -f1 | sort | uniq -i -c | wc -l

Which gave me output #2.
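
If only the number of distinct callsigns is wanted, a possible variation is
to normalize the case up front and let sort -u do the de-duplication. This
is a sketch under assumptions: the function name count_unique_callsigns is
made up here, -printf requires GNU find, and every filename is assumed to
contain an @ after the callsign.

```shell
# Print the number of distinct callsigns under a directory (default: .).
# The callsign is taken to be everything before the first @ in the filename.
count_unique_callsigns() {
    find "${1:-.}" -type f -printf '%f\n' |
        cut -d @ -f1 |
        tr '[:lower:]' '[:upper:]' |   # treat w7abc and W7ABC as the same
        sort -u |                      # one line per distinct callsign
        wc -l
}

# Usage (from the parent of the folder):
#   count_unique_callsigns Processed
```

Using '%f\n' strips the leading ./ that a bare find puts in front of each
name, so the callsigns come out clean.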

Finally, I added another sort to give Output #3 for the frequency option.
$ find -type f | cut -d @ -f1 | sort | uniq -i -c | sort -n
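
With sort -n the busiest callsigns land at the bottom of the report. For a
most-active-first listing, one option is to reverse the numeric sort and
break ties alphabetically. A sketch only: callsign_report is a hypothetical
name, and -printf again assumes GNU find.

```shell
# List each callsign with its file count, highest count first.
callsign_report() {
    find "${1:-.}" -type f -printf '%f\n' |
        cut -d @ -f1 |
        tr '[:lower:]' '[:upper:]' |   # fold case before grouping
        sort |
        uniq -c |
        sort -k1,1nr -k2,2             # count descending, then callsign
}

# Usage:
#   callsign_report Processed | head -n 20
```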


I gave Wes' 

Re: [PLUG] Constant Contact html and w3m

2021-08-19 Thread Tomas Kuchta
On Thu, Aug 19, 2021, 17:24 Keith Lofstrom  wrote:

> Many mailing lists use Constant Contact.  The mails arrive
> as an ascii version and an html version.  The unformatted
> ascii version is unreadable.  The html version is readable,
> but I don't use complex web browsers on random content sent
> from uncertified sources - I hope to avoid zero day attacks.
>
> I sorta trust Constant Contact, and look at the formatted
> html version with w3m.  I presume w3m is a simpler tool,
> and not a likely target for attack.
>
> Is this naive?
> .


I may be naive 

I do not consider HTML email unsafe to look at in a normal, maintained Linux
mail client or browser - AS LONG AS I DON'T CLICK ON LINKS.

I occasionally use w3m or, even better, browsh. I do not do it for
security; I use text-based browsers for convenience over ssh.

I cannot praise browsh enough - it is magic - you should try it.
https://github.com/browsh-org/browsh

Best, Tomas


[PLUG] Constant Contact html and w3m

2021-08-19 Thread Keith Lofstrom
Many mailing lists use Constant Contact.  The mails arrive
as an ascii version and an html version.  The unformatted
ascii version is unreadable.  The html version is readable,
but I don't use complex web browsers on random content sent
from uncertified sources - I hope to avoid zero day attacks.   

I sorta trust Constant Contact, and look at the formatted
html version with w3m.  I presume w3m is a simpler tool,
and not a likely target for attack. 

Is this naive?

Keith

-- 
Keith Lofstrom  kei...@keithl.com