Re: Logging of Web Usage

2003-04-05 Thread Bill Stewart
At 11:32 AM 04/03/2003 -0800, Bill Frantz wrote:
>Ah yes, I haven't updated my timings for the new machines that are faster
>than my 550Mhz.  :-)
>The only other item of importance is that the exhaustive search time isn't
>the time to reverse one IP, but the time to reverse all the IPs that have
>been recorded.

Also, until recently, there was the problem that storing a hash value
for every IP address took 8-10 bytes * 2**32, and the resulting 32-40GB
was an annoyingly large storage quantity, requiring a deck of Exabyte tapes
or corporate-budget quantities of disk drive, which meant that
sorting the results was awkward as well.  These days, disk drive prices
are $1/GB at Fry's for 3.5" IDE drives, so there's no reason not to have
120GB on your desktop.
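
The storage figure above can be checked with a few lines of arithmetic.
This is only a sketch: it assumes "GB" means 2**30 bytes, and the name
table_size_gib is made up for illustration.

```python
ADDRESSES = 2 ** 32  # every possible IPv4 address

def table_size_gib(bytes_per_entry: int) -> float:
    """Size in GiB (2**30 bytes) of a table with one entry per IPv4 address."""
    return ADDRESSES * bytes_per_entry / 2 ** 30

for width in (8, 10):
    print(f"{width} bytes/entry -> {table_size_gib(width):.0f} GiB")
# 8 bytes/entry -> 32 GiB
# 10 bytes/entry -> 40 GiB
```
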
This does mean that if you're keeping hashed logs you should probably
use some sort of keyed hash - even if you don't change the keys often,
you've at least prevented pre-computed dictionary attacks over the
entire IPv4 address space, and the key should be long enough (e.g. 128 bit)
so that dictionary attacks on the "IP addresses of Usual Suspects"
also can't be precomputed.
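
A minimal sketch of such a keyed hash, assuming HMAC-SHA-256 as the
construction and a 128-bit key from the operating system's random source.
The thread names no specific algorithm; new_log_key and blind_ip are
hypothetical names, and 192.0.2.1 is a documentation address.

```python
import hashlib
import hmac
import ipaddress
import secrets

def new_log_key() -> bytes:
    """Return a fresh 128-bit key from the OS random source."""
    return secrets.token_bytes(16)

def blind_ip(key: bytes, ip: str) -> str:
    """Return the HMAC-SHA-256 (hex) of an IPv4 address under `key`."""
    packed = ipaddress.IPv4Address(ip).packed  # canonical 4-byte form
    return hmac.new(key, packed, hashlib.sha256).hexdigest()

key = new_log_key()
# The same address always maps to the same token under one key, so
# per-visitor statistics still work, but a table precomputed without
# the key is useless.
print(blind_ip(key, "192.0.2.1"))
```
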
A related question is keeping lists of public information,
e.g. don't-spam lists, in some form that isn't readily abusable,
such as hashed addresses.  The possible namespace there is much larger,
but the actual namespace isn't likely to be more than a couple of billion,
in spite of the number of spammers selling their lists of 9 billion names.
There's also the question of how exact a match you need -
if mail is for [EMAIL PROTECTED], you'd ideally like to be able to check
[EMAIL PROTECTED], [EMAIL PROTECTED], and @example.com,
which makes the lookup process more complex.
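
A rough sketch of the multi-form lookup. The specific address variants are
elided in the archive, so this only covers the exact address and the bare
"@domain" form; alice@example.com is a made-up address, the function names
are hypothetical, and plain SHA-256 plus lower-casing are assumptions.

```python
import hashlib

def candidate_keys(address: str) -> list[str]:
    """Forms of an address to look up, most specific first."""
    local, _, domain = address.lower().rpartition("@")
    return [f"{local}@{domain}", f"@{domain}"]

def digest(entry: str) -> str:
    """Hash one list entry; the list stores only these digests."""
    return hashlib.sha256(entry.encode("utf-8")).hexdigest()

def is_listed(address: str, hashed_list: set[str]) -> bool:
    """True if any candidate form of `address` appears in the hashed list."""
    return any(digest(k) in hashed_list for k in candidate_keys(address))

# A list that opts out the whole (example) domain.
hashed_list = {digest("@example.com")}
print(is_listed("alice@example.com", hashed_list))  # True
print(is_listed("alice@example.org", hashed_list))  # False
```

Each extra form costs one more hash and one more lookup per message, which
is the added complexity mentioned above.
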
-
The Cryptography Mailing List
Unsubscribe by sending "unsubscribe cryptography" to [EMAIL PROTECTED]


Re: Logging of Web Usage

2003-04-04 Thread Aaron D. Gifford
So instead of one-way-hashing just the IP, hash the IP and a temporary 
throw-away secret that gets cycled at some regular interval (daily, 
weekly, monthly).  Yes, this means that the logged IPs are still 
decipherable by anyone with access to that secret, but anyone with 
access to the machine in question, the software, etc. already has the 
ability to create a covert unhashed log.  Just be sure you safely cycle 
the secret (i.e. generate it from a secure random source, store it only 
in memory or securely on the file system, don't back it up or copy it 
anywhere else, and when you discard it, make sure the memory is 
overwritten and/or the file system safely overwritten so that it cannot 
be recovered).
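
A sketch of the cycling scheme, assuming HMAC-SHA-256 and a key held only 
in process memory. The class name RotatingBlinder and its parameters are 
hypothetical. Note that Python cannot guarantee the old key is really gone 
from memory, so the overwrite below is best-effort only.

```python
import hashlib
import hmac
import secrets
import time

class RotatingBlinder:
    """Blind IPs with an in-memory key replaced every `interval` seconds."""

    def __init__(self, interval: float, clock=time.monotonic):
        self._interval = interval
        self._clock = clock
        self._key = bytearray(secrets.token_bytes(16))
        self._expires = clock() + interval

    def _rotate_if_due(self) -> None:
        now = self._clock()
        if now >= self._expires:
            # Best-effort wipe of the old key, then replace it in place.
            for i in range(len(self._key)):
                self._key[i] = 0
            self._key[:] = secrets.token_bytes(16)
            self._expires = now + self._interval

    def blind(self, ip: str) -> str:
        """Return the keyed hash of `ip` under the current key."""
        self._rotate_if_due()
        return hmac.new(self._key, ip.encode("ascii"), hashlib.sha256).hexdigest()

blinder = RotatingBlinder(interval=24 * 3600)  # daily cycle, as one option
print(blinder.blind("192.0.2.1"))
```

Tokens from before and after a rotation cannot be matched to each other, 
which is the statistics trade-off described in the next paragraph.
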

One of the problems is that cycling the secret means you can't do the 
blind log statistics gathering across secret changes that you were 
keeping the logs around for in the first place.  So you'd have to choose 
a cycling interval to balance your statistical or other log analysis 
needs against IP blinding requirements.

This does defeat some of the usefulness of the idea in the first place, 
but hey, as has been shown, just hashing the IP isn't such a good idea.

Aaron out.





Re: Logging of Web Usage

2003-04-04 Thread Ben Laurie
Bill Frantz wrote:
> At 6:16 PM -0800 4/2/03, Seth David Schoen wrote:
> >Bill Frantz writes:
> >
> >> The http://cryptome.org/usage-logs.htm URL says:
> >>
> >> >Low resolution data in most cases is intended to be sufficient for
> >> >marketing analyses.  It may take the form of IP addresses that have been
> >> >subjected to a one way hash, to refer URLs that exclude information other
> >> >than the high level domain, or temporary cookies.
> >>
> >> Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a
> >> computer for a few hours can reverse a one way hash by exhaustive search.
> >> Truncating IPs seems a much more privacy friendly approach.
> >>
> >> This problem would be less acute with IPv6 addresses.
> >
> >I'm skeptical that it will even take "a few hours"; on a 1.5 GHz
> >desktop machine, using "openssl speed", I see about a million hash
> >operations per second.  (It depends slightly on which hash you choose.)
> >This is without compiling OpenSSL with processor-specific optimizations.
>
> Ah yes, I haven't updated my timings for the new machines that are faster
> than my 550Mhz.  :-)
>
> The only other item of importance is that the exhaustive search time isn't
> the time to reverse one IP, but the time to reverse all the IPs that have
> been recorded.

You only need to build the dictionary once.
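
To illustrate the point, here is a small sketch that builds the dictionary
for a /24 documentation network rather than the full 2**32 space. It
assumes the log stores an unkeyed SHA-256 of the 4-byte packed address; the
real log format is not given in the thread, and hash_ip and
build_dictionary are hypothetical names.

```python
import hashlib
import ipaddress

def hash_ip(ip: ipaddress.IPv4Address) -> bytes:
    """Unkeyed hash of an address, as an assumed stand-in for the log format."""
    return hashlib.sha256(ip.packed).digest()

def build_dictionary(network: str) -> dict[bytes, str]:
    """Map hash -> address for every address in `network`."""
    return {hash_ip(ip): str(ip) for ip in ipaddress.ip_network(network)}

# Built once...
table = build_dictionary("192.0.2.0/24")

# ...then every logged hash is reversed with a single lookup.
logged = [hash_ip(ipaddress.IPv4Address(a)) for a in ("192.0.2.7", "192.0.2.200")]
print([table[h] for h in logged])  # ['192.0.2.7', '192.0.2.200']
```
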

Cheers,

Ben.

--
http://www.apache-ssl.org/ben.html   http://www.thebunker.net/
"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff


Re: Logging of Web Usage

2003-04-03 Thread Roop Mukherjee
Could this not use most of the code from the Onion Router itself? I am 
assuming that the code was made freely available and that someone has a 
copy of it.

-- roop

On Thu, 3 Apr 2003, Ben Laurie wrote:
> Ben.
> 
> [1] FWIW, I'd be willing to work on that, but not on my own (unless 
> someone wants to keep me in the style to which I am accustomed, that is).
> 
> 




Re: Logging of Web Usage

2003-04-03 Thread Bill Frantz
At 6:16 PM -0800 4/2/03, Seth David Schoen wrote:
>Bill Frantz writes:
>
>> The http://cryptome.org/usage-logs.htm URL says:
>>
>> >Low resolution data in most cases is intended to be sufficient for
>> >marketing analyses.  It may take the form of IP addresses that have been
>> >subjected to a one way hash, to refer URLs that exclude information other
>> >than the high level domain, or temporary cookies.
>>
>> Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a
>> computer for a few hours can reverse a one way hash by exhaustive search.
>> Truncating IPs seems a much more privacy friendly approach.
>>
>> This problem would be less acute with IPv6 addresses.
>
>I'm skeptical that it will even take "a few hours"; on a 1.5 GHz
>desktop machine, using "openssl speed", I see about a million hash
>operations per second.  (It depends slightly on which hash you choose.)
>This is without compiling OpenSSL with processor-specific optimizations.

Ah yes, I haven't updated my timings for the new machines that are faster
than my 550Mhz.  :-)

The only other item of importance is that the exhaustive search time isn't
the time to reverse one IP, but the time to reverse all the IPs that have
been recorded.

Cheers - Bill


-
Bill Frantz   | Due process for all| Periwinkle -- Consulting
(408)356-8506 | used to be the | 16345 Englewood Ave.
[EMAIL PROTECTED] | American way.  | Los Gatos, CA 95032, USA





Re: Logging of Web Usage

2003-04-03 Thread Ben Laurie
John Young wrote:
>Ben,
>
>Would you care to comment for publication on web logging
>described in these two files:
>
>  http://cryptome.org/no-logs.htm
>
>  http://cryptome.org/usage-logs.htm
>
>Cryptome invites comments from others who know the capabilities
>of servers to log or not, and other means for protecting user privacy
>by users themselves rather than by reliance upon privacy policies
>of site operators and government regulation.
>
>This relates to the data retention debate and current initiatives
>of law enforcement to subpoena, surveil, steal and manipulate
>log data.

I don't have time right now to comment in detail (I will try to later), 
but it seems to me that, as someone else commented, relying on operators 
to not keep logs is really not the way to go. If you want privacy or 
anonymity, then you have to create it for yourself, not expect others to 
provide it for you.

Of course, it is possible to reduce your exposure to others whilst still 
taking advantage of privacy-enhancing services they offer. Two obvious 
examples of this are the mixmaster anonymous remailer network, and onion 
routing.

It seems to me if you want to make serious inroads into privacy w.r.t. 
logging of traffic, then what you want to put your energy into is onion 
routing. There is _still_ no deployable free software to do it, and that 
is ridiculous[1]. It seems to me that this is the single biggest win we 
can have against all sorts of privacy invasions.

Make log retention useless for any purpose other than statistics and 
maintenance. Don't try to make it only used for those purposes.

Cheers,

Ben.

[1] FWIW, I'd be willing to work on that, but not on my own (unless 
someone wants to keep me in the style to which I am accustomed, that is).

--
http://www.apache-ssl.org/ben.html   http://www.thebunker.net/
"There is no limit to what a man can do or how far he can go if he
doesn't mind who gets the credit." - Robert Woodruff


Re: Logging of Web Usage

2003-04-03 Thread Seth David Schoen
Bill Frantz writes:

> The http://cryptome.org/usage-logs.htm URL says:
> 
> >Low resolution data in most cases is intended to be sufficient for
> >marketing analyses.  It may take the form of IP addresses that have been
> >subjected to a one way hash, to refer URLs that exclude information other
> >than the high level domain, or temporary cookies.
> 
> Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a
> computer for a few hours can reverse a one way hash by exhaustive search.
> Truncating IPs seems a much more privacy friendly approach.
> 
> This problem would be less acute with IPv6 addresses.

I'm skeptical that it will even take "a few hours"; on a 1.5 GHz
desktop machine, using "openssl speed", I see about a million hash
operations per second.  (It depends slightly on which hash you choose.)
This is without compiling OpenSSL with processor-specific optimizations.

That would imply a mean time to reverse the hash of about 2100 seconds,
which we could probably improve with processor-specific optimizations
or by buying a more recent machine.  What's more, we can exclude from our
search parts of the IP address space which haven't been allocated, and
optimize the search by beginning with IP networks which are more
likely to be the source of hits based on prior statistical evidence.  Even
without _any_ of these improvements, it's just about 35 minutes on average.
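
The arithmetic behind those figures, as a short sketch. The only inputs are
the 32-bit address space and the rate of about a million hashes per second
quoted above; search_seconds is a name made up for illustration.

```python
ADDRESS_SPACE = 2 ** 32
HASHES_PER_SECOND = 1_000_000  # the "openssl speed" figure quoted above

def search_seconds(fraction: float = 0.5) -> float:
    """Time to try `fraction` of the IPv4 space at the quoted hash rate."""
    return ADDRESS_SPACE * fraction / HASHES_PER_SECOND

mean = search_seconds()       # on average the match turns up halfway through
worst = search_seconds(1.0)   # the match is the last address tried
print(f"mean:  {mean:.0f} s ({mean / 60:.1f} min)")    # mean:  2147 s (35.8 min)
print(f"worst: {worst:.0f} s ({worst / 60:.1f} min)")  # worst: 4295 s (71.6 min)
```
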

I used to advocate one-way hashing for logs, but a 35-minute search on
an ordinary desktop PC is not much obstacle.  It might still be
helpful if you used a keyed hash and then threw away the key after a
short time period (perhaps every 6 hours).  Then you can't identify or
link visitors across 6-hour periods.  If the key is very long,
reversing the hash could become very hard.

The logging problem will depend on what server operators are trying to
accomplish.  Some people just want to try to count unique visitors;
strangely enough, they might get more privacy-protective (and comparably
precise) results by issuing short-lived cookies.

-- 
Seth David Schoen <[EMAIL PROTECTED]> | Very frankly, I am opposed to people
 http://www.loyalty.org/~schoen/   | being programmed by others.
 http://vitanuova.loyalty.org/ | -- Fred Rogers (1928-2003),
   |464 U.S. 417, 445 (1984)



Re: Logging of Web Usage

2003-04-02 Thread Bill Frantz
At 2:58 PM -0800 4/2/03, John Young wrote:
>Ben,
>
>Would you care to comment for publication on web logging
>described in these two files:
>
>  http://cryptome.org/no-logs.htm
>
>  http://cryptome.org/usage-logs.htm
>
>Cryptome invites comments from others who know the capabilities
>of servers to log or not, and other means for protecting user privacy
>by users themselves rather than by reliance upon privacy policies
>of site operators and government regulation.
>
>This relates to the data retention debate and current initiatives
>of law enforcement to subpoena, surveil, steal and manipulate
>log data.
>
>Thanks,
>
>John

The http://cryptome.org/usage-logs.htm URL says:

>Low resolution data in most cases is intended to be sufficient for
>marketing analyses.  It may take the form of IP addresses that have been
>subjected to a one way hash, to refer URLs that exclude information other
>than the high level domain, or temporary cookies.

Note that since IPv4 addresses are 32 bits, anyone willing to dedicate a
computer for a few hours can reverse a one way hash by exhaustive search.
Truncating IPs seems a much more privacy friendly approach.
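
A minimal sketch of truncation, assuming the low-order bits are simply
zeroed before the address is logged. How many bits to keep is not stated
above; the default of 24 and the name truncate_ip are illustrative only.

```python
import ipaddress

def truncate_ip(ip: str, keep_bits: int = 24) -> str:
    """Zero the low-order bits of an IPv4 address, keeping `keep_bits`."""
    net = ipaddress.ip_network(f"{ip}/{keep_bits}", strict=False)
    return str(net.network_address)

print(truncate_ip("192.0.2.57"))      # 192.0.2.0
print(truncate_ip("192.0.2.57", 16))  # 192.0.0.0
```

Unlike a hash, the discarded bits are not in the log at all, so no amount
of searching recovers them.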

This problem would be less acute with IPv6 addresses.

Cheers - Bill


-
Bill Frantz   | Due process for all| Periwinkle -- Consulting
(408)356-8506 | used to be the | 16345 Englewood Ave.
[EMAIL PROTECTED] | American way.  | Los Gatos, CA 95032, USA


