Re: I have some bad news

2016-09-07 Thread Bill Cole

On 6 Sep 2016, at 16:04, do...@mail.com wrote:


> On Mon, 05 Sep 2016 20:17:18 "Bill Cole" wrote:
>> On 4 Sep 2016, at 21:11, @lbutlr wrote:
>>> On Sep 1, 2016, at 7:41 PM, David Niklas <do...@mail.com> wrote:
>>>> Would you like to go out to lunch?
>>>
>>> Other than your message, that phrase does not appear in 7 years of my
>>> mail.
>>
>> It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
>> spread over 4 years without other obvious commonalities (other than
>> their use of such tactics.)
>
> It was just an example to make a point. You would need to look at your
> cool database for a non-spamy string and place it in with an equally spamy
> one to figure out if I have found a bug in your cool program.
>
> BTW: You never mentioned if anyone accepted your offer yet.


You seem to have me confused with Marc Perkel. I am not Marc Perkel. 
This should have been apparent from the attribution line you included in 
your message.


The point I was hoping others would infer is simply that different 
people get substantially different mail (ham and spam) which makes 
statistical approaches of all sorts increasingly ineffective as you 
increase the diversity of the recipient population. This latest FUSSP 
proposal is even more fragile to that sort of breakage because all it 
takes to completely burn a classifier token is a single appearance in 
both classes. As one grows a source corpus across a broad enough 
audience, the usable tokens trend inevitably towards zero while the 
remaining usable tokens are those which simply don't occur very often 
and so aren't operationally valuable.
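
As a rough illustration of the point above, here is a minimal sketch of how merging corpora from recipients with different mail shrinks the set of tokens that appear in only one class. The recipients, tokens, and the `usable_tokens` helper are all made up for the example; none of it comes from Marc's implementation.

```python
def usable_tokens(ham_messages, spam_messages):
    """Return (ham_only, spam_only): tokens seen in exactly one class."""
    ham = set()
    for tokens in ham_messages:
        ham |= set(tokens)
    spam = set()
    for tokens in spam_messages:
        spam |= set(tokens)
    # A single appearance in the other class removes a token from both results.
    return ham - spam, spam - ham


# Hypothetical corpora for two recipients with different mail.
alice_ham = [{"lunch", "meeting", "invoice"}]
alice_spam = [{"viagra", "winner"}]
bob_ham = [{"viagra", "prescription"}]   # e.g. a pharmacist's legitimate mail
bob_spam = [{"lunch", "pictures"}]

# Alice alone: "lunch" is a usable ham token, "viagra" a usable spam token.
ham_only, spam_only = usable_tokens(alice_ham, alice_spam)

# Merged corpus: "lunch" and "viagra" now occur in both classes and are burned.
merged_ham_only, merged_spam_only = usable_tokens(
    alice_ham + bob_ham, alice_spam + bob_spam
)
```

In this toy case the merged corpus loses exactly the two tokens that were most useful to each recipient individually.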


Despite Mr. Perkel's extensive insistence to the contrary, his proposal 
does logically reduce to a variation on Bayesian filtering which avoids 
FPs at the cost of not being able to make any judgment at all on the 
actually difficult cases.




Re: I have some bad news

2016-09-06 Thread doark
On Mon, 05 Sep 2016 20:17:18 "Bill Cole" wrote:
> On 4 Sep 2016, at 21:11, @lbutlr wrote:
> 
> > On Sep 1, 2016, at 7:41 PM, David Niklas >
> > <[do...@mail.com]()> wrote: 
> >>  
> >> Would you like to go out to lunch?  
> >
> > Other than your message, that phrase does not appear in 7 years of my
> > mail.  
> 
> It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
> spread over 4 years without other obvious commonalities (other than
> their use of such tactics.)

It was just an example to make a point. You would need to look at your
cool database for a non-spamy string and place it in with an equally spamy
one to figure out if I have found a bug in your cool program.

BTW: You never mentioned if anyone accepted your offer yet.

Sincerely,
David


Re: I have some bad news

2016-09-05 Thread Dave Warren
On Sun, Sep 4, 2016, at 18:11, @lbutlr wrote:
> On Sep 1, 2016, at 7:41 PM, David Niklas  wrote:
>>
>> Would you like to go out to lunch?
>
> Other than your message, that phrase does not appear in 7 years of
> my mail.

And? Replace the string with an example that does appear frequently in
ham. Or, a dozen examples that do, structured into a plausible
paragraph.


Re: I have some bad news

2016-09-05 Thread Bill Cole

On 4 Sep 2016, at 21:11, @lbutlr wrote:

> On Sep 1, 2016, at 7:41 PM, David Niklas <do...@mail.com> wrote:
>> Would you like to go out to lunch?
>
> Other than your message, that phrase does not appear in 7 years of my
> mail.

It's in hash-buster/bayes-buster parts in 5 messages in my spam corpus
spread over 4 years without other obvious commonalities (other than
their use of such tactics.)




Re: I have some bad news

2016-09-05 Thread @lbutlr
On Sep 1, 2016, at 7:41 PM, David Niklas  wrote:
> Would you like to go out to lunch?

Other than your message, that phrase does not appear in 7 years of my mail.

Re: I have some bad news

2016-09-01 Thread David Niklas
On Mon, 15 Aug 2016 22:22:47 -0700
Marc Perkel  wrote:

> Well, this is kind of hard to say so just going to say it. I have stage
> 4 lung cancer and the probably spectrum is not good. I've been fighting
> spam for the last 15 years and I'd like to keep fighting spam from the
> grave. So I'm willing to share my technology with anyone interested.
> 
> Several months ago I talked about a new trick I came up with to fight
> spam and also positively identify good email as good. I've been running
> it now for 7 months and it is a breakthrough. At the time I had
> intended to patent it just to get enough protection to license it to
> the big boys, but now it is unlikely I'll be around long enough for
> that. I have however noticed that because of my condition people are
> paying attention to me more now that there's a deadline.
> 
> Here's my spam filtering trick. It's something that can be easily
> integrated into SpamAssassin. Being that my programming is somewhat
> sloppy at times it can probably be done even better than what I did.
> The thing to keep in mind when reading this is that it's not bayesian
> filtering. Many people in the spam filtering community make that
> mistake. This is done with set operations using Redis. Here's the link.
> 
> http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter
> 
> I'm still doing well for now and if not for this diagnosis I wouldn't
> know I was sick, And I want to get as much done in this window as
> possible. Since I live in Gilroy California I'm thinking I'd like to
> contact the spam filtering person at Google and let them continue to
> really develop what I started. So if someone could hook me up with the
> right person(s) there I would appreciate it. And I'm willing to work
> with anyone else that can make use of my work. (My way of cheating
> death.)
> 
> Below is a letter I wrote to EFF staff where I used to work. It
> summarizes my situation. I'm still doing well considering.
> 
> 
> Hi Cindy,
> 
> Hate to ruin your Monday morning but I have some bad news. I have stage
> 4 lung cancer and the odds are not with me. I'm slowly telling the
> world and realizing the problem with having so many friends is that
> I'm making a lot of people very sad. And that is very difficult for me
> to do.
> 
> I'm dealing with it about as well as can be expected, maybe a little
> better than that. My needs are covered for now, but dealing with
> rolling out the information. Please pass this email on to the staff
> there. I'm somewhat concerned about getting too much response at once.
> There is no specific time frame for me yet but stage 4 lung is almost
> always fatal and it's more likely months and not years.
> 
> I have a lot of friends who are offering to take care of me. I have a
> paid for house, some savings, and I'm still doing well off my spam
> filtering business. I am going to be looking for someone to take over
> my small techno empire in the hopes of keeping my web sites and the
> people who I host for online. While I plan to put up a good fight if I
> get 2 years that would be considered a win. Taking over my empire would
> be a great opportunity for the right person and I need to find someone
> to do that. I am unfortunately really good at what I do and might be
> tricky getting someone to take that over.
> 
> I have lived a good life. I have done more than most people have done
> in 100 lifetimes. At the age of 60 I was already down to my last 1/4
> tank so if I don't get the last 20 years I really have little to
> complain about. At this point my goals are to upload what's left of me
> to the web, which is the afterlife in my world. I have to finish up
> certain philosophical projects with my Church of Reality, which,
> interestingly enough might lead to a solution for the control problem
> for Artificial Intelligence. (Something I need to finish writing up.)
> 
> Oddly enough the idea of being dead doesn't worry me. And that might be
> the denial speaking. However the process of getting there is going to
> be overwhelming. And it's been just a week since I found out. And I'm
> exploring the idea that there might even be an upside to being
> terminal. Maybe new opportunities will open up.
> 
> I do want to say that working at EFF was some of the best times of my
> life and I really appreciate having had that opportunity. The internet
> is the new nervous system of humanity and is therefore sacred space,
> not just in a religious sense, but in a Reality based sense. To protect
> it is to protect the essence of humanity itself. The Internet is our
> common mind and it is t

Re: I have some bad news

2016-08-25 Thread @lbutlr
On 15 Aug 2016, at 23:22, Marc Perkel  wrote:
> Well, this is kind of hard to say so just going to say it. I have stage 4 
> lung cancer and the probably spectrum is not good. I've been fighting spam 
> for the last 15 years and I'd like to keep fighting spam from the grave. So 
> I'm willing to share my technology with anyone interested.

I encourage you to concentrate on fighting cancer right now, and while the
prognosis for stage-4 anything is not good, neither is it certain. It appears
that attitude does help, so pump yourself up to beat it.



Re: I have some bad news

2016-08-25 Thread Ted Mittelstaedt



On 8/19/2016 3:34 AM, Ram wrote:



Marc, that's too bad. But stage 4 lung cancer does not mean you have to
die of it.
And chill about spam. I know you have been great at contributions to
anti-spam ( and we all remember your distinct hate of SPF :-) ).
But antispam is just "commodity" technology.

Probably ML will take over antispam in the future and people would just
subscribe to some good ML antispam service. Running your own antispam is
too much of an attention grabbing task, and no one wants to put in so
much time today



You must not have checked prices on antispam services lately or prices 
on mailboxes.  Just about everyone out there in the web hosting biz 
provides 10-20 free emailboxes (they have to, otherwise small businesses 
would switch to a competitor) and every antispam service out
there charges at least a buck a month per box.  (they have to otherwise 
they would go out of business)


In this environment people have no choice but to run their own antispam.

But you are right in that nobody (including the people running it) wants 
to put in time to doing it.   Do you -want- to clean your toilet?  Do 
you -have-the-money- to pay someone else to do it?


We are all always looking for better toilet-cleaning brushes.  If Marc
has invented a better one, people will want it!   But they won't go buy
a $200 toilet cleaning brush when they can go to the grocery store and
buy a plastic one for $5 that will last 20 years.

Ted




Re: I have some bad news

2016-08-19 Thread Benny Pedersen

On 2016-08-19 12:34, Ram wrote:


And chill about spam. I know you have been great at contributions to
anti-spam ( and we all remember your distinct hate of SPF :-) ).
But antispam is just "commodity" technology.


sid-milter is at fault, i think more users now use pypolicyd-spf to get 
rid of sender-id :=)



Links:
--
[1]
http://www.localhost.localdomain/foo/?utm_source=All-emp&utm_medium=Email-Disclaimer&utm_campaign=Weekly-Webinar-2


is this link better then squid ?




Re: I have some bad news

2016-08-18 Thread Matus UHLAR - fantomas

On 17.08.16 11:02, Marc Perkel wrote:
For what it's worth I have noticed that people who are familiar with 
Bayesian filtering seem to have a mental block when it comes to 
understanding this. People who know nothing about bayesian get it 
instantly. Here's the actual formula.


card(Test_message intersect Spam diff Ham) minus card(Test_message intersect 
Ham diff Spam)


I guess it's because people who are familiar with bayesian filtering say
"this is the same as bayes, just tweaked"

while people who are not think it's really a new idea.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
We are but packets in the Internet of life (userfriendly.org)


Re: I have some bad news

2016-08-17 Thread Marc Perkel
For what it's worth I have noticed that people who are familiar with 
Bayesian filtering seem to have a mental block when it comes to 
understanding this. People who know nothing about bayesian get it 
instantly. Here's the actual formula.


card(Test_message intersect Spam diff Ham) minus card(Test_message intersect 
Ham diff Spam)
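
A minimal sketch of the formula above, using plain Python sets in place of the Redis sets mentioned earlier in the thread. The function name `evolution_score` and the sample fingerprints are assumptions made for illustration; the formula is read as card(Test ∩ (Spam \ Ham)) − card(Test ∩ (Ham \ Spam)), which gives the same result as applying the intersection first.

```python
def evolution_score(message_fps, spam_fps, ham_fps):
    """Positive result leans spam, negative leans ham, zero is no verdict."""
    spam_only = spam_fps - ham_fps   # fingerprints never seen in ham
    ham_only = ham_fps - spam_fps    # fingerprints never seen in spam
    return len(message_fps & spam_only) - len(message_fps & ham_only)


# Hypothetical reference corpora; "some lunch" occurs in both, so it is neutral.
spam_corpus = {"click on this", "this link", "some lunch"}
ham_corpus = {"some lunch", "get some lunch", "let's get"}

friendly = {"let's get", "get some lunch", "some lunch"}
phish = {"some lunch", "click on this", "this link"}

friendly_score = evolution_score(friendly, spam_corpus, ham_corpus)
phish_score = evolution_score(phish, spam_corpus, ham_corpus)
```

With these made-up corpora the friendly message scores below zero and the phishing message above zero, while the shared fingerprint contributes nothing to either.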



On 08/17/16 09:16, Shawn Bakhtiar wrote:


On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:


On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect 
HAM and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.


so, if mail matches both hammy and spammy tokens (or token sets), you
don't classify at all?



I guess what is confusing me (and I imagine others, as alluded to by 
Matus) is the fact that you are describing a special condition 
of Bayes' probability theorem. You are testing two variables (match 
SPAM and match HAM) (not matching is simply the negation of matching) 
thus giving you four conditions:


1) SPAM  && HAM
2) SPAM  && ~HAM
3) ~SPAM && HAM
4) ~SPAM && ~HAM

Here is a great diagram to show the four probable conditions:
https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg

So (if I am correct) Matus is asking: what if condition 1 is true? How
are you classifying an email then? That is the state of most emails,
which is why Naive Bayes spam filtering, which generates a probability
based on Bayes' probability theorem, is the conventional methodology to
date. A Rose by any other name


Condition 4 is obvious: it's nothing you have ever seen, so classifying
it as anything other than HAM would be a huge mistake (IMHO), and it is
fully covered by the aforementioned theorem, as the probability of SPAM
would (should) be 0. Same with condition 3: it never hits SPAM, so
whether it matches HAM or not you're going to treat it as HAM anyway,
same as condition 4.


That leaves condition 2. Which (if I'm not mistaken) is "... it 
matches SPAM and does NOT match HAM - then it's SPAM.". Which brings 
us back to Matus question, what if the email contains a single HAM 
token? Two HAM tokens? This is exactly what Bayes' probability theorem 
is designed for. All you are doing is defining a special condition in 
which the HAM probability is ZERO.


I think that's where I need to understand a bit more about what HAM 
means in this solution, does getting a hit on HAM somehow negate it 
being SPAM completely? In other words if the email contains some set 
of tokens that are SPAM, yet only one HAM token, that single HAM token 
makes it not SPAM? If so, you have a long way to go in convincing me 
that this is a good solution.


So if I say to you, "Let's get some lunch" that's ham because 
spammers never say that, but normal people do. So the way to test 
what "spammers never say" is to store what they do say and see if 
it's NOT in the list. (Thus the infinite set)




Actually I get SPAM with that very set of tokens in it. If somehow the 
HAM rating of it overrides the SPAM, I don't believe it would have a 
desirable effect.


I get plenty of:

"
Hay Shawn,

Hope you have time to do some lunch, click on this link and check out 
my new pictures!


Wannabe Phisher
"

Based on your example there's plenty of HAM and SPAM tokens in there, 
"Click on this link" high probability of SPAM-e-ness, would it get 
HAMed based on "hope you have time to do lunch". Or am I missing 
something?



Similarly, there's only so many ways to misspell viagra, and good 
email wouldn't have it spelled wrong.


Does that make sense?




Again, what you are saying makes sense in that it is a special condition
of the probability theory. What does not make sense is why you would
not simply use the probability theory, which already encompasses that
condition.



--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/

Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-17 Thread Marc Perkel



On 08/17/16 03:51, Antony Stone wrote:

On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:


What I'm doing is looking for fingerprints in email that intersect HAM
and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers
never say that, but normal people do. So the way to test what "spammers
never say" is to store what they do say and see if it's NOT in the list.
(Thus the infinite set)

What length are the tokens you store in the list?  Single words (so the above
lunch example would contain 4 tokens)?  Entire phrases (so the above would be
just 1 token)?  Also how do you deal with spam which contains random cuttings
from legitimate texts (generally along with a graphic attachment and/or a URL
to get across the "real" message)?


I tokenize a lot of different things but the fingerprints are at most 3 
to 4 tokens long. If you go more then you get a database that's too big. 
And in the body I'm just looking at the first 50 words, and a "concept 
parser" that looks at the whole body.


http://wiki.junkemailfilter.com/index.php/Concept_Parsing_Spam_Filter
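
One way to picture the fingerprinting described above is the sketch below. It assumes fingerprints are simply runs of 1 to 4 consecutive words taken from the first 50 words of the body. The tokenizing regex, the function name `fingerprints`, and the default limits are illustrative guesses; the real tokenizer and the "concept parser" are not reproduced here.

```python
import re


def fingerprints(text, max_len=4, max_words=50):
    """Return every run of 1..max_len consecutive words from the first max_words words."""
    words = re.findall(r"[a-z0-9']+", text.lower())[:max_words]
    fps = set()
    for n in range(1, max_len + 1):
        for i in range(len(words) - n + 1):
            fps.add(" ".join(words[i:i + n]))
    return fps


lunch_fps = fingerprints("Let's get some lunch")
```

For a four-word phrase this yields ten fingerprints (four single words, three pairs, two triples, and the whole phrase), which shows why longer fingerprints make the database grow quickly.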




Similarly, there's only so many ways to misspell viagra, and good email
wouldn't have it spelled wrong.

Does this mean that people with bad spelling will more likely get classified as
spam, because they do not match the 'ham' group very well?
No - unless they misspell a lot of words the same way spammers misspell 
it. If a spammer isn't misspelling the same way and normal people are - 
it can count as ham - or be ignored.




Also, what happens to mail that contains lots of tokens which match neither set
(for example, perfectly legitimate email which happens to be in a language the
system hasn't been trained with)?

Mail that doesn't match either side produces no score.




Antony.



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-17 Thread Marc Perkel



On 08/17/16 03:43, Matus UHLAR - fantomas wrote:

On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect 
HAM and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.


so, if mail matches both hammy and spammy tokens (or token sets), you
don't classify at all?



For any given fingerprint, if it matches both, it creates no score on
that item. The idea is to generate a lot of fingerprints so that
something scores. If you look at enough stuff to generate hundreds of
fingerprints and you have big reference corpora then you will usually
get a result on something. Usually a big result in one direction.


But ignoring if it's in both makes it more immune to poisoning.

--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-17 Thread shanew

I'm finding this discussion interesting, because I've been trying to
wrap my head around the theoretical basis of this system.  As such,
I've noticed that several questions have been asked now that are
explained in the document Marc initially pointed to
(http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter).
Given Marc's situation, it seems reasonable to read that document
before asking too many questions.

As a way to (maybe) save Marc some time, test my own knowledge and
perhaps help move the conversation forward, I'm going to summarize
the questions I've seen so far and, as much as possible, the answers
to those questions (and Marc, correct me if I'm getting anything wrong
here):

- How do you classify an email that has tokens from both the ham and
spam set?
Whichever set (out of "only found in ham" and "only found in spam") is
larger (or "better") determines the final classification.

- What length are the tokens?
Marc's examples use multiple length tokens, capturing everything
between 1 and 4 "words", but I suspect the exact maximum token
length might be adjustable.

- What happens when spammers use "hammy" text to avoid detection?
I don't see this directly addressed, but I would guess there are
several things that mitigate against this.  Multi-word tokens
prevent the truly random word salad attempts at poisoning, and
probably help with "cuttings" from other texts because the transition
from one cutting to the next probably doesn't appear in ham, leaving
the "spam-only" aspects of the mail to push it towards a spam
classification.  The unlearning and expiration of fingerprints would
mean that such cuttings would have to appear repeatedly over time in
legitimate mail to tip an email toward a ham classification.

- Will bad spellers (or typists) be seen as spammier?
Again, I don't see this addressed specifically, but I don't think so,
unless they are such tremendously bad spellers that nearly every word
is misspelled.  To take the "let's get some lunch" example, even if I
accidentally mis-type "some" as "som", I still have other tokens to
compare against, and the tokens "som", "get som", "som lunch", "let's
get som", etc. would have to have appeared in spam (and only spam) to
pull the classification toward spam.  So I'd say the occasional typo
or misspelling would come up neutral.

- What happens to messages that have a lot of neutral tokens?
Now I'm really speculating, but unless every token is neutral, there's
still something to decide on, though it does seem that detection
becomes less reliable as the number of non-neutral tokens approaches
zero.  A similar question that I thought of is what happens to
messages where the final sets "only found in spam" and "only found
in ham" are nearly (or exactly) the same size.  If you're using this
filter as part of SA scoring, the answer would seem to be that you
have an appropriately small score for "undetermined" (like bogofilter
does), but if it's acting as a separate filter, I don't know.
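
To make the "undetermined" idea concrete, here is a small sketch of how the two set sizes might be turned into a three-way verdict. The `margin` threshold and the function name `classify` are hypothetical; nothing in the thread or the linked document specifies how close calls are actually handled.

```python
def classify(spam_only_hits, ham_only_hits, margin=3):
    """Map the two hit counts to 'spam', 'ham', or 'undetermined'.

    margin is an assumed tuning knob: the counts must differ by at least
    this much before a verdict is given.
    """
    diff = spam_only_hits - ham_only_hits
    if diff >= margin:
        return "spam"
    if diff <= -margin:
        return "ham"
    return "undetermined"


verdicts = [classify(10, 1), classify(0, 7), classify(4, 4), classify(0, 0)]
```

A message with no non-neutral tokens at all lands in the same bucket as one whose spam-only and ham-only counts cancel out, which is the case the last question above worries about.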

On Wed, 17 Aug 2016, Antony Stone wrote:


On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:


What I'm doing is looking for fingerprints in email that intersect HAM
and not in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers
never say that, but normal people do. So the way to test what "spammers
never say" is to store what they do say and see if it's NOT in the list.
(Thus the infinite set)


What length are the tokens you store in the list?  Single words (so the above
lunch example would contain 4 tokens)?  Entire phrases (so the above would be
just 1 token)?  Also how do you deal with spam which contains random cuttings
from legitimate texts (generally along with a graphic attachment and/or a URL
to get across the "real" message)?


Similarly, there's only so many ways to misspell viagra, and good email
wouldn't have it spelled wrong.


Does this mean that people with bad spelling will more likely get classified as
spam, because they do not match the 'ham' group very well?

Also, what happens to mail that contains lots of tokens which match neither set
(for example, perfectly legitimate email which happens to be in a language the
system hasn't been trained with)?


Antony.




--
Public key #7BBC68D9 at            |  Shane Williams
http://pgp.mit.edu/                |  System Admin - UT CompSci
=----------------------------------+-------------------------------
All syllogisms contain three lines |  sha...@shanew.net
Therefore this is not a syllogism  |  www.ischool.utexas.edu/~shanew


Re: I have some bad news

2016-08-17 Thread Shawn Bakhtiar

On Aug 17, 2016, at 3:43 AM, Matus UHLAR - fantomas <uh...@fantomas.sk> wrote:

On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect HAM and not 
in SPAM - which would be a HAM result.
If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

so, if mail matches both hammy and spammy tokens (or token sets), you don't
classify at all?


I guess what is confusing me (and I imagine others, as alluded to by Matus) is 
the fact that you are describing a special condition of Bayes' probability 
theorem. You are testing two variables (match SPAM and match HAM) (not matching 
is simply the negation of matching) thus giving you four conditions:

1) SPAM  && HAM
2) SPAM  && ~HAM
3) ~SPAM && HAM
4) ~SPAM && ~HAM

Here is a great diagram to show the four probable conditions:
https://en.wikipedia.org/wiki/Bayes%27_theorem#/media/File:Bayes%27_Theorem_2D.svg

So (if I am correct) Matus is asking: what if condition 1 is true? How are you
classifying an email then? That is the state of most emails, which is why
Naive Bayes spam filtering, which generates a probability based on Bayes'
probability theorem, is the conventional methodology to date. A
Rose by any other name

Condition 4 is obvious: it's nothing you have ever seen, so classifying it as
anything other than HAM would be a huge mistake (IMHO), and it is fully covered
by the aforementioned theorem, as the probability of SPAM would (should) be 0.
Same with condition 3: it never hits SPAM, so whether it matches HAM or not
you're going to treat it as HAM anyway, same as condition 4.

That leaves condition 2. Which (if I'm not mistaken) is "... it matches SPAM 
and does NOT match HAM - then it's SPAM.". Which brings us back to Matus 
question, what if the email contains a single HAM token? Two HAM tokens? This 
is exactly what Bayes' probability theorem is designed for. All you are doing 
is defining a special condition in which the HAM probability is ZERO.
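
The "special condition" argument can be sketched as follows. This is a deliberately naive per-token frequency estimate and is not SpamAssassin's actual Bayes code. The function names and counts are illustrative assumptions. The point is only that the set-difference rule keeps the tokens whose estimated spam probability is exactly 1 or exactly 0 and discards everything in between.

```python
def token_spam_probability(spam_count, ham_count):
    """Naive estimate of P(spam | token) from raw counts; None if never seen."""
    total = spam_count + ham_count
    if total == 0:
        return None
    return spam_count / total


def set_rule_vote(spam_count, ham_count):
    """Vote under the set-difference rule: +1 spam-only, -1 ham-only, 0 otherwise."""
    p = token_spam_probability(spam_count, ham_count)
    if p == 1.0:
        return 1    # condition 2: seen in spam, never in ham
    if p == 0.0:
        return -1   # condition 3: seen in ham, never in spam
    return 0        # condition 1 (seen in both) or condition 4 (never seen)


votes = [set_rule_vote(5, 0), set_rule_vote(0, 3), set_rule_vote(5, 1), set_rule_vote(0, 0)]
```

A token seen five times in spam and once in ham still carries information for a probabilistic filter (an estimate of about 0.83 here), but casts no vote under the set rule.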

I think that's where I need to understand a bit more about what HAM means in 
this solution, does getting a hit on HAM somehow negate it being SPAM 
completely? In other words if the email contains some set of tokens that are 
SPAM, yet only one HAM token, that single HAM token makes it not SPAM? If so, 
you have a long way to go in convincing me that this is a good solution.

So if I say to you, "Let's get some lunch" that's ham because spammers never 
say that, but normal people do. So the way to test what "spammers never say" is 
to store what they do say and see if it's NOT in the list. (Thus the infinite 
set)


Actually I get SPAM with that very set of tokens in it. If somehow the HAM 
rating of it overrides the SPAM, I don't believe it would have a desirable 
effect.

I get plenty of:

"
Hay Shawn,

Hope you have time to do some lunch, click on this link and check out my new 
pictures!

Wannabe Phisher
"

Based on your example there's plenty of HAM and SPAM tokens in there, "Click on 
this link" high probability of SPAM-e-ness, would it get HAMed based on "hope 
you have time to do lunch". Or am I missing something?


Similarly, there's only so many ways to misspell viagra, and good email 
wouldn't have it spelled wrong.

Does that make sense?


Again, what you are saying makes sense in that it is a special condition of the
probability theory. What does not make sense is why you would not simply use
the probability theory, which already encompasses that condition.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.



Re: I have some bad news

2016-08-17 Thread Antony Stone
On Wednesday 17 August 2016 at 05:06:50, Marc Perkel wrote:

> What I'm doing is looking for fingerprints in email that intersect HAM
> and not in SPAM - which would be a HAM result.
> If it matches SPAM and does NOT match HAM - then it's SPAM.
> 
> The magic is in the NOT matching on the other side.
> 
> So if I say to you, "Let's get some lunch" that's ham because spammers
> never say that, but normal people do. So the way to test what "spammers
> never say" is to store what they do say and see if it's NOT in the list.
> (Thus the infinite set)

What length are the tokens you store in the list?  Single words (so the above 
lunch example would contain 4 tokens)?  Entire phrases (so the above would be 
just 1 token)?  Also how do you deal with spam which contains random cuttings 
from legitimate texts (generally along with a graphic attachment and/or a URL 
to get across the "real" message)?

> Similarly, there's only so many ways to misspell viagra, and good email
> wouldn't have it spelled wrong.

Does this mean that people with bad spelling will more likely get classified as 
spam, because they do not match the 'ham' group very well?

Also, what happens to mail that contains lots of tokens which match neither set 
(for example, perfectly legitimate email which happens to be in a language the 
system hasn't been trained with)?


Antony.

-- 
Wanted: telepath.   You know where to apply.

   Please reply to the list;
 please *don't* CC me.


Re: I have some bad news

2016-08-17 Thread Matus UHLAR - fantomas

On 16.08.16 20:06, Marc Perkel wrote:
What I'm doing is looking for fingerprints in email that intersect 
HAM and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.


so, if mail matches both hammy and spammy tokens (or token sets), you don't
classify at all?

So if I say to you, "Let's get some lunch" that's ham because 
spammers never say that, but normal people do. So the way to test 
what "spammers never say" is to store what they do say and see if 
it's NOT in the list. (Thus the infinite set)


Similarly, there's only so many ways to misspell viagra, and good 
email wouldn't have it spelled wrong.


Does that make sense?


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.


Re: I have some bad news

2016-08-16 Thread Marc Perkel

Hi Shawn,

What I'm doing is looking for fingerprints in email that intersect HAM 
and not in SPAM - which would be a HAM result.

If it matches SPAM and does NOT match HAM - then it's SPAM.

The magic is in the NOT matching on the other side.

So if I say to you, "Let's get some lunch" that's ham because spammers 
never say that, but normal people do. So the way to test what "spammers 
never say" is to store what they do say and see if it's NOT in the list. 
(Thus the infinite set)


Similarly, there's only so many ways to misspell viagra, and good email 
wouldn't have it spelled wrong.


Does that make sense?


On 08/16/16 12:57, Shawn Bakhtiar wrote:

Marc,

Let me first say I am truly sorry to hear about your cancer. I lost my 
father to cancer just over a decade ago, after a long battle with 
sarcoma of the throat and tongue. So I pray and wish you the best.


I sent this to you in January 2016 (don't recall if I ever got a reply 
to it) but based on your document:


Set theory is not my strongest suit, but your diagram looks incorrect:
http://www.junkemailfilter.com/patent/patent5.pdf

Let:

H be ham
S be spam
E be an email

Then you state that:
HE = (H u E)
SE = (S u E)

But then the next diagram shows that there is some solution in which 
(HE u SE) and thus there may be some set which is (HE / SE). Even 
though in the first diagram S and H do not intersect.

This is not logical. Either (H u S) in which there are tokens common 
to the ham and spam token sets, or it does not, so which is it? In 
other words, if a token is both ham and spam how are you calculating 
its weight? Is it spam or ham?

Clearly it’s the latter (they do not intersect) as described in this:
http://www.junkemailfilter.com/patent/patent2.pdf

In which case you are simply looking to see if (H u E) > (S u E) and 
has nothing to do with what is not in the set, and there is indeed no 
(H u S) or the negation or NOT which is (H / S), so as everyone has 
been trying to explain it has NOTHING to do with what is NOT matched.

By the way, you can’t match an infinite set (well theoretically but 
not actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)

Since the current Bayes learns both SPAM and HAM I imagine that it 
does a very similar thing, other than perhaps the larger multi word 
token sets, which seems a trivial thing to add, and available in other 
tool sets.



I'll only add this, if you believe that your SPAM has been greatly 
reduced. That's awesome! But have you really isolated it to this "new 
technique" or in playing around have you inadvertently changed 
something else that may have changed your results?


I am also not saying that you have not developed some "new technique", 
but that if you have, your description of it does not line up 
logically with the technique itself. Back in January you were looking 
to patent it, today you simply want it to live on. I suggest that if 
it is indeed the latter, then perhaps it's time to release the source 
code/scripts and let a few more eyes look at the logic to see exactly 
what is it doing, that you believe is so different than what is out there.


Again, I pray and hope the best for you,
Shawn




On Aug 16, 2016, at 6:45 AM, Marc Perkel wrote:


Thanks for the encouragement Ted. Unfortunately I know way too much 
about mathematics and I have a deep understanding of probability 
spectrums. There's a curve and I'm going to be somewhere on it. If 
I'm lucky I might be here for some time. But my life is a casino 
right now. And yes - there is also a probability spectrum for any of 
us getting hit by a bus tomorrow as well. SpamAssassin is based on 
statistical probabilities.


I have to have a dual track strategy. On one hand I need to do what 
I can to move the curve into the future. But at the same time I need 
to accomplish things that are important within a limited time slot as 
well.


Spam filtering isn't just another job to me. I actually have a 
passion for it. On a philosophical basis I look at the internet as 
the new nervous system for humanity and is now core to who we are as 
a species. And email is a very key technology in that nervous system.


In that context spam is like poison where predators suck some of the 
life out of humanity, and my real life has always been about the 
progress of the human race.


I am somewhat of a spam fighting savant. I actually run very little 
of my email through SpamAssassin, truth be told. Over the years I've 
thrown some ideas into the mix and sometimes they have been adopted 
to make SA better. Sometimes I just get shouted down by trolls and 
the ideas go nowhere.


At this point however there's a deadline and I have ideas that could 
be implemented in SA very very easily. In fact it was through SA that 
I discovered Redis, and SA already talk

Re: I have some bad news

2016-08-16 Thread Marc Perkel



On 08/16/16 15:22, Ted Mittelstaedt wrote:


I read through the site, and here's why I probably couldn't implement it,
at least not as it stands now.

SpamAssassin basically depends on a diet of spam to feed the learner.
The learner learns what is spam.  If you add some ham into the learner
it works better - but the main thrust of it is feed me spam feed me spam.

Your method depends on a diet of -ham- not spam because you are doing 
the opposite of SA


My problem as an admin is this.  I can guarantee that when a customer
complains about a piece of junk, that what they give me is junk.

But customers don't complain about ham.  So I'm not going to see it.
And I cannot just iterate through all my customer mailboxes and
assume they are all full of ham, because some of my customers are
lazy and won't delete spam, or they don't read their mailbox for
months at a time, etc. etc.  I cannot guarantee I'll get only ham
by doing that - and so therefore I don't have a guaranteed source
of ham.

You said that your existing perl scripts are hacks and ugly.  But,
I'm wagering that most of your ugly programming is user interface
code that somehow coaxes your users to yield up a diet of ham.

My problem is there is a tremendous dearth of user interface code
out there to get EITHER spam or ham.

The closest I have ever found is the mailwatch interface but that is
god-awful complex.  I have it running on an ISP customer of mine's
mailserver but God what a hack.

Without that, all I can do is what I do now, which is make sure that
all customers accessing my server with IMAP have a junk mail folder and
know that if they drag spam into there that I'll suck it into the
learner.  Of course, POP3 clients have nothing and I cannot tell
some POP3 user "Oh if you really want to reduce your spam load then
give up your POP3 email client and use this slick web interface I have 
set up for you to send and receive email."


I'm actually not as interested in your engine as I am in how you get
your customers to participate with it because if you have found a
way to get 'em to do it, that is truly revolutionary.

Mine would rather bitch and moan about spam and when they get it,
just delete it - which while it puts it in a deleted folder that I
can get at (if they are IMAP) it mixes it up with deleted ham, so
I cannot take that mess of mixed unidentified spam and ham and use it 
for anything.


Ted


Hi Ted,

My system depends on a stream of both ham and spam creating a ham corpus 
and a spam corpus. I already had many rules in place (Not SA) to 
identify ham. Actually all you need is my RBL 
hostkarma.junkemailfilter.com with result 127.0.0.1 and the FcRDNS is 
good - there's your ham stream.


SA has a mindset of detecting spam. You have to change that to detecting 
spam and ham. Once you have streams going into the learner then you can 
not only increase spam detection, but you can positively identify good 
email as good and have almost no false positives. Then the output with 
strong scores are fed back into the learner where it learns how people 
who send ham speak and people who send spam speak. And it's very very 
effective. and I'm just giving it away.
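
A rough sketch of how the two training streams described above might be selected. The RBL result 127.0.0.1 and the FCrDNS check come from the message; the `Delivery` record, its field names, the score sign convention, and the `strong` threshold are assumptions made purely for illustration and do not describe the actual system.

```python
# Hypothetical sketch; record fields and the threshold are assumptions.
from dataclasses import dataclass

HAM_RBL_RESULT = "127.0.0.1"   # hostkarma result the post names as the ham signal


@dataclass
class Delivery:
    rbl_result: str      # answer from the RBL lookup, "" if the host is not listed
    fcrdns_ok: bool      # forward-confirmed reverse DNS check passed
    score: float = 0.0   # classifier output; assumed negative = hammy, positive = spammy


def training_label(d, strong=5.0):
    """Pick a corpus for this message, or None to keep it out of training."""
    # Seed stream: a listed good sender with valid FCrDNS counts as ham.
    if d.rbl_result == HAM_RBL_RESULT and d.fcrdns_ok:
        return "ham"
    # Feedback loop: only strongly scored output goes back into the learner.
    if d.score >= strong:
        return "spam"
    if d.score <= -strong:
        return "ham"
    return None


print(training_label(Delivery("127.0.0.1", True)))
print(training_label(Delivery("", False, score=9.2)))
print(training_label(Delivery("", True, score=1.0)))
```

Leaving weakly scored mail out of training is one way to keep the loop from reinforcing its own borderline guesses; whether the real system does this is not stated in the thread.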


Thanks for looking at it though.



--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-16 Thread Ted Mittelstaedt



On 8/16/2016 6:45 AM, Marc Perkel wrote:

Thanks for the encouragement Ted. Unfortunately I know way too much
about mathematics and I have a deep understanding of probability
spectrums. There's a curve and I'm going to be somewhere on it. If I'm
lucky I might be here for some time. But my life is a casino right now.
And yes - there is also a probability spectrum for any of us getting hit
by a bus tomorrow as well. SpamAssassin is based on statistical
probabilities.

I have to have a dual track strategy. On one hand I need to do what I
can to move the curve into the future. But at the same time I need to
accomplish things that are important within a limited time slot as well.

Spam filtering isn't just another job to me. I actually have a passion
for it. On a philosophical basis I look at the internet as the new
nervous system for humanity and is now core to who we are as a species.
And email is a very key technology in that nervous system.

In that context spam is like poison where predators suck some of the
life out of humanity, and my real life has always been about the
progress of the human race.



I think you already have found a way to fight your cancer. :-)


I am somewhat of a spam fighting savant. I actually run very little of
my email through SpamAssassin, truth be told. Over the years I've thrown
some ideas into the mix and sometimes they have been adopted to make SA
better. Sometimes I just get shouted down by trolls and the ideas go
nowhere.

At this point however there's a deadline and I have ideas that could be
implemented in SA very very easily. In fact it was through SA that I
discovered Redis, and SA already talks to redis.

Although my innovation is excellent as a programmer I'm mediocre. Never
worked as a team. Easily frustrated. Probably somewhat autistic and
somewhat arrogant. So mostly living in my own world doing my own
development. I have my little online empire. I work from home. I make a
great living. And I really like (most of) my customers and enjoy doing
tech support. And it's allowed me a lot of free time to do things that
I'm really interested in.

But my ideas are now my immortality, so I'm now releasing this to the
world. And mostly this simple AI method that SA could easily implement.

This new spam filtering trick is not only extremely effective, it's
extremely simple. I had it working in 2 days. The developers here could
probably implement it in 1 day. (At least the core functionality) And
with a team of better programmers probably do a better job and get a
even better result than I get. In fact you don't need or even want my
sloppy code (not in Perl). All you need is to read the description of
how it works and once you get it - coding it is trivial.

So - this is an opportunity to milk the mind of the dying spam savant.
It works, it's easy, and I'm just handing it to you all. There is no
reason I would be making this up. All you all need to do is accept this
gift.



I read through the site, and here's why I probably couldn't implement it,
at least not as it stands now.

SpamAssassin basically depends on a diet of spam to feed the learner.
The learner learns what is spam.  If you add some ham into the learner
it works better - but the main thrust of it is feed me spam feed me spam.

Your method depends on a diet of -ham- not spam because you are doing 
the opposite of SA


My problem as an admin is this.  I can guarantee that when a customer
complains about a piece of junk, that what they give me is junk.

But customers don't complain about ham.  So I'm not going to see it.
And I cannot just iterate through all my customer mailboxes and
assume they are all full of ham, because some of my customers are
lazy and won't delete spam, or they don't read their mailbox for
months at a time, etc. etc.  I cannot guarantee I'll get only ham
by doing that - and so therefore I don't have a guaranteed source
of ham.

You said that your existing perl scripts are hacks and ugly.  But,
I'm wagering that most of your ugly programming is user interface
code that somehow coaxes your users to yield up a diet of ham.

My problem is there is a tremendous dearth of user interface code
out there to get EITHER spam or ham.

The closest I have ever found is the mailwatch interface but that is
god-awful complex.  I have it running on an ISP customer of mine's
mailserver but God what a hack.

Without that, all I can do is what I do now, which is make sure that
all customers accessing my server with IMAP have a junk mail folder and
know that if they drag spam into there that I'll suck it into the
learner.  Of course, POP3 clients have nothing and I cannot tell
some POP3 user "Oh if you really want to reduce your spam load then
give up your POP3 email client and use this slick web interface I have 
set up for you to send and receive email."


I'm actually not as interested in your engine as I am in how you get
your customers to participate with it because if you have found a
way to get 'em to do it,

Re: I have some bad news

2016-08-16 Thread Shawn Bakhtiar
Marc,

Let me first say I am truly sorry to hear about your cancer. I lost my father 
to cancer just over a decade ago, after a long battle with sarcoma of the 
throat and tongue. So I pray and wish you the best.

I sent this to you in January 2016 (don't recall if I ever got a reply to it) 
but based on your document:

Set theory is not my strongest suit,  but your diagram looks incorrect:
http://www.junkemailfilter.com/patent/patent5.pdf

Let:

H be ham
S be spam
E be an email

Then you state that:
HE = (H u E)
SE = (S u E)

But then the next diagram shows that there is some solution in which (HE u SE) 
and thus there may be some set which is (HE / SE). Even though in the first 
diagram S and H do not intersect.

This is not logical. Either (H u S) in which there are tokens common to the ham 
and spam token sets, or it does not, so which is it? In other words, if a 
token is both ham and spam how are you calculating its weight? Is it spam or 
ham?

Clearly it’s the latter (they do not intersect) as described in this:
http://www.junkemailfilter.com/patent/patent2.pdf

In which case you are simply looking to see if (H u E) > (S u E) and has 
nothing to do with what is not in the set, and there is indeed no (H u S) or 
the negation or NOT which is (H / S), so as everyone has been trying to explain 
it has NOTHING to do with what is NOT matched.

By the way, you can’t match an infinite set (well theoretically but not 
actually).
https://en.wikipedia.org/wiki/Intersection_(set_theory)

Since the current Bayes learns both SPAM and HAM I imagine that it does a very 
similar thing, other than perhaps the larger multi word token sets, which seems 
a trivial thing to add, and available in other tool sets.
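
The notation above can be tried out with small Python sets. This sketch assumes that "u" in the message is meant as intersection (the linked Wikipedia article suggests so, but that reading is an assumption) and that "/" is set difference. The token values are invented for illustration.

```python
# Illustrative token sets only; none of these values come from the thread.
H = {"lunch", "meeting", "invoice"}      # tokens learned from ham
S = {"v1agra", "winner", "invoice"}      # tokens learned from spam
E = {"lunch", "invoice", "winner"}       # tokens found in one email

HE = H & E            # the message's (H u E), read as intersection
SE = S & E            # the message's (S u E), read as intersection
shared = H & S        # tokens common to both corpora
ham_only = HE - S     # matched ham and NOT present in spam
spam_only = SE - H    # matched spam and NOT present in ham

print(sorted(HE), sorted(SE))
print(sorted(shared))
print(sorted(ham_only), sorted(spam_only))
print(len(HE) > len(SE))   # the plain size comparison, with no NOT involved
```

Because HE is a subset of E, `HE - SE` and `HE - S` give the same set, so the (HE / SE) region in the message is exactly the "ham and not spam" match. A shared token such as "invoice" counts toward both sizes in the plain comparison but is excluded from both differences.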


I'll only add this, if you believe that your SPAM has been greatly reduced. 
That's awesome! But have you really isolated it to this "new technique" or in 
playing around have you inadvertently changed something else that may have 
changed your results?

I am also not saying that you have not developed some "new technique", but that 
if you have, your description of it does not line up logically with the 
technique itself. Back in January you were looking to patent it, today you 
simply want it to live on. I suggest that if it is indeed the latter, then 
perhaps it's time to release the source code/scripts and let a few more eyes 
look at the logic to see exactly what is it doing, that you believe is so 
different than what is out there.

Again, I pray and hope the best for you,
Shawn




On Aug 16, 2016, at 6:45 AM, Marc Perkel 
<supp...@junkemailfilter.com> wrote:

Thanks for the encouragement Ted. Unfortunately I know way too much about 
mathematics and I have a deep understanding of probability spectrums. There's a 
curve and I'm going to be somewhere on it. If I'm lucky I might be here for 
some time. But my life is a casino right now. And yes - there is also a 
probability spectrum for any of us getting hit by a bus tomorrow as well. 
SpamAssassin is based on statistical probabilities.

I have to have a dual track strategy. On one hand I need to do what I can to 
move the curve into the future. But at the same time I need to accomplish things 
that are important within a limited time slot as well.

Spam filtering isn't just another job to me. I actually have a passion for it. 
On a philosophical basis I look at the internet as the new nervous system for 
humanity and is now core to who we are as a species. And email is a very key 
technology in that nervous system.

In that context spam is like poison where predators suck some of the life out 
of humanity, and my real life has always been about the progress of the human 
race.

I am somewhat of a spam fighting savant. I actually run very little of my email 
through SpamAssassin, truth be told. Over the years I've thrown some ideas into 
the mix and sometimes they have been adopted to make SA better. Sometimes I 
just get shouted down by trolls and the ideas go nowhere.

At this point however there's a deadline and I have ideas that could be 
implemented in SA very very easily. In fact it was through SA that I discovered 
Redis, and SA already talks to redis.

Although my innovation is excellent as a programmer I'm mediocre. Never worked 
as a team. Easily frustrated. Probably somewhat autistic and somewhat arrogant. 
So mostly living in my own world doing my own development. I have my little 
online empire. I work from home. I make a great living. And I really like (most 
of) my customers and enjoy doing tech support. And it's allowed me a lot of 
free time to do things that I'm really interested in.

But my ideas are now my immortality, so I'm now releasing this to the world. 
And mostly this simple AI method that SA could easily implement.

This new spam filtering trick is not only extremely effective, it's extremely 
simple. I had it working in 2 days. The developers here could probably 
implement it in 1 day. (At least the core functionali

Re: I have some bad news

2016-08-16 Thread Marc Perkel
Thanks for the encouragement Ted. Unfortunately I know way too much 
about mathematics and I have a deep understanding of probability 
spectrums. There's a curve and I'm going to be somewhere on it. If I'm 
lucky I might be here for some time. But my life is a casino right now. 
And yes - there is also a probability spectrum for any of us getting hit 
by a bus tomorrow as well. SpamAssassin is based on statistical 
probabilities.


I have to have a dual track strategy. On one hand I need to do what I 
can to move the curve into the future. But at the same time I need to 
accomplish things that are important within a limited time slot as well.


Spam filtering isn't just another job to me. I actually have a passion 
for it. On a philosophical basis I look at the internet as the new 
nervous system for humanity and is now core to who we are as a species. 
And email is a very key technology in that nervous system.


In that context spam is like poison where predators suck some of the 
life out of humanity, and my real life has always been about the 
progress of the human race.


I am somewhat of a spam fighting savant. I actually run very little of 
my email through SpamAssassin, truth be told. Over the years I've thrown 
some ideas into the mix and sometimes they have been adopted to make SA 
better. Sometimes I just get shouted down by trolls and the ideas go 
nowhere.


At this point however there's a deadline and I have ideas that could be 
implemented in SA very very easily. In fact it was through SA that I 
discovered Redis, and SA already talks to redis.


Although my innovation is excellent as a programmer I'm mediocre. Never 
worked as a team. Easily frustrated. Probably somewhat autistic and 
somewhat arrogant. So mostly living in my own world doing my own 
development. I have my little online empire. I work from home. I make a 
great living. And I really like (most of) my customers and enjoy doing 
tech support. And it's allowed me a lot of free time to do things that 
I'm really interested in.


But my ideas are now my immortality, so I'm now releasing this to the 
world. And mostly this simple AI method that SA could easily implement.


This new spam filtering trick is not only extremely effective, it's 
extremely simple. I had it working in 2 days. The developers here could 
probably implement it in 1 day. (At least the core functionality) And 
with a team of better programmers probably do a better job and get a 
even better result than I get. In fact you don't need or even want my 
sloppy code (not in Perl). All you need is to read the description of 
how it works and once you get it - coding it is trivial.


So - this is an opportunity to milk the mind of the dying spam savant. 
It works, it's easy, and I'm just handing it to you all. There is no 
reason I would be making this up. All you all need to do is accept this 
gift.



On 08/16/16 01:03, Ted Mittelstaedt wrote:

Hi Marc,

  Back in 1994 I was diagnosed with testicular cancer, it was 
essentially "stage 4" as it had metastasized throughout my body.


  But, it responded to chemo and here I am today.  In fact ironically
my original oncologist died a few years ago - on a fishing trip he had
an accident and drowned.

  The Universe has an interesting sense of humor and likes to throw
curve balls.  Take what you have been told about your "probability
spectrum" and toss it in the trash - hakuna matata.   You could 
accidentally step in front of a bus tomorrow and be dead.   You could
live another 20 years.   Statistics on people only have meaning on
large groups of people - they are irrelevant when it comes to the
individual.

  I've met a number of people who had serious cancers.  And I learned
one thing from that.   The people who survived - every one of them,
fighters.  And everyone fights differently.  Some get on the food 
bandwagon and try overdosing on green tea and every alleged 
anti-cancer food out there.  Others jump into yoga, and I knew one guy 
who went out and binge-watched Monty Python to spend as much time 
laughing as possible.  Me, I fought on a more mental approach.  I 
dropped everything in my life that I was not completely satisfied with 
- I turned my back on my job, my apartment, etc. - every burden or 
responsibility that I had which I didn't like and didn't really want - 
and dove into the treatment, and I never let myself believe I was in 
any danger of dying.


  Of course, not all who fight, survive.  But I will say with absolute
conviction that everyone I ever met who had a serious cancer and had
that "attitude of acceptance", later died.  You are a fighter or you
wouldn't even be here.  Now, fight to win.

Ted




--
Marc Perkel - Sales/Support
supp...@junkemailfilter.com
http://www.junkemailfilter.com
Junk Email Filter dot com
415-992-3400



Re: I have some bad news

2016-08-16 Thread Ted Mittelstaedt

Hi Marc,

  Back in 1994 I was diagnosed with testicular cancer, it was 
essentially "stage 4" as it had metastasized throughout my body.


  But, it responded to chemo and here I am today.  In fact ironically
my original oncologist died a few years ago - on a fishing trip he had
an accident and drowned.

  The Universe has an interesting sense of humor and likes to throw
curve balls.  Take what you have been told about your "probability
spectrum" and toss it in the trash - hakuna matata.   You could 
accidentally step in front of a bus tomorrow and be dead.   You could
live another 20 years.   Statistics on people only have meaning on
large groups of people - they are irrelevant when it comes to the
individual.

  I've met a number of people who had serious cancers.  And I learned
one thing from that.   The people who survived - every one of them,
fighters.  And everyone fights differently.  Some get on the food 
bandwagon and try overdosing on green tea and every alleged anti-cancer 
food out there.  Others jump into yoga, and I knew one guy who went out 
and binge-watched Monty Python to spend as much time laughing as 
possible.  Me, I fought on a more mental approach.  I dropped everything 
in my life that I was not completely satisfied with - I turned my back 
on my job, my apartment, etc. - every burden or responsibility that I 
had which I didn't like and didn't really want - and dove into the 
treatment, and I never let myself believe I was in any danger of dying.


  Of course, not all who fight, survive.  But I will say with absolute
conviction that everyone I ever met who had a serious cancer and had
that "attitude of acceptance", later died.  You are a fighter or you
wouldn't even be here.  Now, fight to win.

Ted

On 8/15/2016 10:22 PM, Marc Perkel wrote:

Well, this is kind of hard to say so just going to say it. I have stage
4 lung cancer and the probability spectrum is not good. I've been fighting
spam for the last 15 years and I'd like to keep fighting spam from the
grave. So I'm willing to share my technology with anyone interested.

Several months ago I talked about a new trick I came up with to fight
spam and also positively identify good email as good. I've been running
it now for 7 months and it is a breakthrough. At the time I had intended
to patent it just to get enough protection to license it to the big
boys, but now it is unlikely I'll be around long enough for that. I have
however noticed that because of my condition people are paying attention
to me more now that there's a deadline.

Here's my spam filtering trick. It's something that can be easily
integrated into SpamAssassin. Being that my programming is somewhat
sloppy at times it can probably be done even better than what I did. The
thing to keep in mind when reading this is that it's not bayesian
filtering. Many people in the spam filtering community make that
mistake. This is done with set operations using Redis. Here's the link.

http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter

I'm still doing well for now and if not for this diagnosis I wouldn't
know I was sick, And I want to get as much done in this window as
possible. Since I live in Gilroy California I'm thinking I'd like to
contact the spam filtering person at Google and let them continue to
really develop what I started. So if someone could hook me up with the
right person(s) there I would appreciate it. And I'm willing to work
with anyone else that can make use of my work. (My way of cheating death.)

Below is a letter I wrote to EFF staff where I used to work. It
summarizes my situation. I'm still doing well considering.


Hi Cindy,

Hate to ruin your Monday morning but I have some bad news. I have stage
4 lung cancer and the odds are not with me. I'm slowly telling the world
and realizing that the problem with having so many friends is that I'm
making a lot of people very sad. And that is very difficult for me to do.

I'm dealing with it about as well as can be expected, maybe a little
better than that. My needs are covered for now, but dealing with rolling
out the information. Please pass this email on to the staff there. I'm
somewhat concerned about getting too much response at once. There is no
specific time frame for me yet but stage 4 lung is almost always fatal
and it's more likely months and not years.

I have a lot of friends who are offering to take care of me. I have a
paid for house, some savings, and I'm still doing well off my spam
filtering business. I am going to be looking for someone to take over my
small techno empire in the hopes of keeping my web sites and the people
who I host for online. While I plan to put up a good fight if I get 2
years that would be considered a win. Taking over my empire would be a
great opportunity for the right person and I need to find so

I have some bad news

2016-08-15 Thread Marc Perkel
Well, this is kind of hard to say so just going to say it. I have stage 
4 lung cancer and the probability spectrum is not good. I've been fighting 
spam for the last 15 years and I'd like to keep fighting spam from the 
grave. So I'm willing to share my technology with anyone interested.


Several months ago I talked about a new trick I came up with to fight 
spam and also positively identify good email as good. I've been running 
it now for 7 months and it is a breakthrough. At the time I had intended 
to patent it just to get enough protection to license it to the big 
boys, but now it is unlikely I'll be around long enough for that. I have 
however noticed that because of my condition people are paying attention 
to me more now that there's a deadline.


Here's my spam filtering trick. It's something that can be easily 
integrated into SpamAssassin. Being that my programming is somewhat 
sloppy at times it can probably be done even better than what I did. The 
thing to keep in mind when reading this is that it's not bayesian 
filtering. Many people in the spam filtering community make that 
mistake. This is done with set operations using Redis. Here's the link.


http://wiki.junkemailfilter.com/index.php/The_Evolution_Spam_Filter

I'm still doing well for now and if not for this diagnosis I wouldn't 
know I was sick, And I want to get as much done in this window as 
possible. Since I live in Gilroy California I'm thinking I'd like to 
contact the spam filtering person at Google and let them continue to 
really develop what I started. So if someone could hook me up with the 
right person(s) there I would appreciate it. And I'm willing to work 
with anyone else that can make use of my work. (My way of cheating death.)


Below is a letter I wrote to EFF staff where I used to work. It 
summarizes my situation. I'm still doing well considering.



Hi Cindy,

Hate to ruin your Monday morning but I have some bad news. I have stage 
4 lung cancer and the odds are not with me. I'm slowly telling the world 
and realizing that the problem with having so many friends is that I'm 
making a lot of people very sad. And that is very difficult for me to do.


I'm dealing with it about as well as can be expected, maybe a little 
better than that. My needs are covered for now, but dealing with rolling 
out the information. Please pass this email on to the staff there. I'm 
somewhat concerned about getting too much response at once. There is no 
specific time frame for me yet but stage 4 lung is almost always fatal 
and it's more likely months and not years.


I have a lot of friends who are offering to take care of me. I have a 
paid for house, some savings, and I'm still doing well off my spam 
filtering business. I am going to be looking for someone to take over my 
small techno empire in the hopes of keeping my web sites and the people 
who I host for online. While I plan to put up a good fight if I get 2 
years that would be considered a win. Taking over my empire would be a 
great opportunity for the right person and I need to find someone to do 
that. I am unfortunately really good at what I do and might be tricky 
getting someone to take that over.


I have lived a good life. I have done more than most people have done in 
100 lifetimes. At the age of 60 I was already down to my last 1/4 tank 
so if I don't get the last 20 years I really have little to complain 
about. At this point my goals are to upload what's left of me to the 
web, which is the afterlife in my world. I have to finish up certain 
philosophical projects with my Church of Reality, which, interestingly 
enough might lead to a solution for the control problem for Artificial 
Intelligence. (Something I need to finish writing up.)


Oddly enough the idea of being dead doesn't worry me. And that might be 
the denial speaking. However the process of getting there is going to be 
overwhelming. And it's been just a week since I found out. And I'm 
exploring the idea that there might even be an upside to being terminal. 
Maybe new opportunities will open up.


I do want to say that working at EFF was some of the best times of my 
life and I really appreciate having had that opportunity. The internet 
is the new nervous system of humanity and is therefore sacred space, not 
just in a religious sense, but in a Reality based sense. To protect it 
is to protect the essence of humanity itself. The Internet is our common 
mind and it is the core of who we are as a human species. (Note to legal 
team, I think there is a legal argument opportunity in this statement.)


A person's story is everything they do from the moment they are born to 
the moment they die. And then your story is the effect you had on 
advancing the evolution of life from what we were, to what we are, to 
what we will become. So my story will become part of