Re: Hash functions

2010-12-22 Thread Boris Zbarsky

On 12/20/10 9:49 PM, Glenn Maynard wrote:

Pushing gigabytes through it is plausible for most of these uses.


OK.  So we're talking multiple seconds even with a C hash 
implementation, right?  I'm looking at 5 seconds per gigabyte here for 
the command-line utility I see.


I guess at that point even a 2-3x slowdown over C is pretty noticeable.  :(

-Boris



Re: Hash functions

2010-12-22 Thread Charles Pritchard

On 12/21/2010 11:58 AM, Tab Atkins Jr. wrote:

On Mon, Dec 20, 2010 at 5:49 PM, Boris Zbarskybzbar...@mit.edu  wrote:

On 12/20/10 7:42 PM, Glenn Maynard wrote:

Has a hash functions API been considered, so browsers can expose, for
example, a native SHA-1 implementation?  Doing this in JS is possible,
but painfully slow, even with current JS implementations.

Before we go further into this, can we quantify painfully slow so we have
some idea of the magnitude of the problem?

Using the testcase at https://bugzilla.mozilla.org/attachment.cgi?id=487844
but modifying the input string to be 768 chars, I see a current js
implementation on modern laptop hardware take around 7 seconds to hash it
50,000 times.

I get similar times for an MD5 implementation I found and chopped down.


It'd be nice to see an SHA1 JS test case setup for performance testing 
using recent APIs like ArrayBuffer.


These kinds of self-contained functions are low hanging fruit for 
compiler optimization.
use strict, Typed Arrays and Object.seal give JS compilers a 
reasonable chance

to hit the performance of C compilers.

I've not been able to find a 'public domain' JS SHA1 implementation on 
the web.


If you need a quick hash function, there are faster ones than SHA1.


I still think it may be useful for the security use-case as well,
where you explicitly want a slow hash to begin with.  If JS imposes a
slowdown on top of that, it could render a good hash too slow to
actually use in practice.  Plus, you have to depend on the hash
implementation you pulled off the web or hacked together yourself,
which you probably didn't manually verify before starting to use.

I'd only expect vendors to implement sha1 and md5.

NaCl-LLVM makes sense if you're looking at just porting existing C over 
to the web.

Or the Mozilla ChromeWorker proposal.

Current JS code, BSD licensed SHA1 and MD5 scripts have been through 
extensive open source review.

Sure, a lot of open source code floats around without due review.
They're no different in credibility than BSD licensed code in C.


-Charles




Re: Hash functions

2010-12-22 Thread Glenn Maynard
On Wed, Dec 22, 2010 at 4:39 PM, Charles Pritchard ch...@visc.us wrote:
 It'd be nice to see an SHA1 JS test case setup for performance testing using
 recent APIs like ArrayBuffer.

 These kinds of self-contained functions are low hanging fruit for compiler
 optimization.
 use strict, Typed Arrays and Object.seal give JS compilers a reasonable
 chance
 to hit the performance of C compilers.

I'm skeptical, but that's from past history with high-level languages,
and the JS optimizations lately have surprised everyone, so of course
it'd be great if this was the case.

 I've not been able to find a 'public domain' JS SHA1 implementation on the
 web.
 Current JS code, BSD licensed SHA1 and MD5 scripts have been through 
 extensive open source review.

Were you looking for public domain code rather than BSD-licensed code
for some reason?

 If you need a quick hash function, there are faster ones than SHA1.

That's usually not a very appealing workaround.

 I still think it may be useful for the security use-case as well,
 where you explicitly want a slow hash to begin with.  If JS imposes a
 slowdown on top of that, it could render a good hash too slow to
 actually use in practice.  Plus, you have to depend on the hash
 implementation you pulled off the web or hacked together yourself,
 which you probably didn't manually verify before starting to use.

 I'd only expect vendors to implement sha1 and md5.

I hope they'd implement the other SHA hashes with a little nudging.
SHA-2 code is just as readily available, and I'd hate for Javascript
to be one more excuse not to upgrade a hash.

-- 
Glenn Maynard



Re: Hash functions

2010-12-21 Thread Toni Ruottu
I would love even a painfully slow implementation provided by the
browser. I have encountered lots of cases where being able to talk a
protocol requires computing a sha1 or an md5 hash. Speed has never
been the problem for me, but external javascript library dependencies
are painful to maintain.

  --Toni

On Tue, Dec 21, 2010 at 2:42 AM, Glenn Maynard gl...@zewt.org wrote:
 Has a hash functions API been considered, so browsers can expose, for
 example, a native SHA-1 implementation?  Doing this in JS is possible,
 but painfully slow, even with current JS implementations.

 Some fairly obvious use cases:

  - Avoid uploading a file to the server if it already has a copy.  For
 example, if you attach a large file to an email, and you already have
 a copy of that file in your mailbox attached to another mail, don't
 upload the whole file; just send a reference the existing one.
  - Resumable file uploads.  An implementation of a chunked, resumable
 uploader will want to validate that the file the user is sending is
 actually what's been received by the server so far, and roll back the
 transfer partially or completely if they're out of sync.
  - Local file validation and updating.  A web-based game may want to
 save large blocks of resources locally, rather than depending on HTTP
 caching to do it, which is inappropriate for a game with several
 hundred megabytes or more of resources.  Native hashing would help
 automatic updating of data.

 If there's a more appropriate place for this, let me know.

 --
 Glenn Maynard





Re: Hash functions

2010-12-21 Thread Tab Atkins Jr.
On Mon, Dec 20, 2010 at 5:49 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 On 12/20/10 7:42 PM, Glenn Maynard wrote:

 Has a hash functions API been considered, so browsers can expose, for
 example, a native SHA-1 implementation?  Doing this in JS is possible,
 but painfully slow, even with current JS implementations.

 Before we go further into this, can we quantify painfully slow so we have
 some idea of the magnitude of the problem?

 Using the testcase at https://bugzilla.mozilla.org/attachment.cgi?id=487844
 but modifying the input string to be 768 chars, I see a current js
 implementation on modern laptop hardware take around 7 seconds to hash it
 50,000 times.

I get similar times for an MD5 implementation I found and chopped down.


 So I guess the question is how much data we want to be pushing through the
 hash function and what throughput we expect, and whether we think JS engines
 simply won't get there or will take too long to do so.

Notice that all three of the OP's use-cases were based on checksumming
files.  I don't know how reading in a Blob and then hashing it would
compare to just hashing an equivalent string, but I suspect it would
have a decent perf hit.  This seems like a pretty useful ability in
general, enough so that it's probably worthwhile to build it in
directly as a Blob.checksum() function or something.

I still think it may be useful for the security use-case as well,
where you explicitly want a slow hash to begin with.  If JS imposes a
slowdown on top of that, it could render a good hash too slow to
actually use in practice.  Plus, you have to depend on the hash
implementation you pulled off the web or hacked together yourself,
which you probably didn't manually verify before starting to use.

~TJ



Re: Hash functions

2010-12-21 Thread Glenn Maynard
On Tue, Dec 21, 2010 at 2:58 PM, Tab Atkins Jr. jackalm...@gmail.com wrote:
 Notice that all three of the OP's use-cases were based on checksumming
 files.  I don't know how reading in a Blob and then hashing it would
 compare to just hashing an equivalent string, but I suspect it would
 have a decent perf hit.  This seems like a pretty useful ability in
 general, enough so that it's probably worthwhile to build it in
 directly as a Blob.checksum() function or something.

I wouldn't expect a big performance hit from that.  Start reading the
next block (say, 1MB at a time) asynchronously, run the SHA1 for the
previous block while that's running, and repeat.  (I'd do the whole
thing in a worker thread, so the synchronous SHA1 call itself wouldn't
bog down the UI.)  I'm not up on the internals of the file API
implementations, but it seems like that should be pretty efficient.

(Even the async requests should be unnecessary, since the OS should be
doing read-ahead anyway.)

 I still think it may be useful for the security use-case as well,
 where you explicitly want a slow hash to begin with.  If JS imposes a
 slowdown on top of that, it could render a good hash too slow to
 actually use in practice.  Plus, you have to depend on the hash
 implementation you pulled off the web or hacked together yourself,
 which you probably didn't manually verify before starting to use.

I had to think pretty hard to come up with a use case for something
like bcrypt in Javascript, and the only one I've come up with seems
somewhat flimsy:

Sites could avoid ever sending passwords from the client to the
server.  Instead of sending a password directly, the server sends the
salt for the user's password (or a random salt for a new password);
the client sends the password hashed with that salt.  This means that
sniffing the traffic--even on the server itself where you might in
underneith HTTPS--couldn't record a large log of user passwords to be
used elsewhere; the server would never know the password to begin
with, not even when the user first sets his password.

The flimsy part is it depends on the Javascript doing the hashing
being secure; if the server is compromised, the script itself could
just be replaced.  That's still potentially a win, because that's
harder to hide--the backdoored authentication code now has to stash
the unencrypted password in the request somewhere, rather than being
done in the server where it's completely invisible to users.

Did you have any particular uses in mind?

-- 
Glenn Maynard



Re: Hash functions

2010-12-20 Thread Boris Zbarsky

On 12/20/10 7:42 PM, Glenn Maynard wrote:

Has a hash functions API been considered, so browsers can expose, for
example, a native SHA-1 implementation?  Doing this in JS is possible,
but painfully slow, even with current JS implementations.


Before we go further into this, can we quantify painfully slow so we 
have some idea of the magnitude of the problem?


Using the testcase at 
https://bugzilla.mozilla.org/attachment.cgi?id=487844 but modifying the 
input string to be 768 chars, I see a current js implementation on 
modern laptop hardware take around 7 seconds to hash it 50,000 times.


Or if I modify it to only calculate one hash, but have that be the hash 
of a 3,840,000 character string, I get times around 400ms.


Running a command-line shasum utility on the same 3,840,000 characters 
(as ASCII in a file, etc) on the same hardware seems to be about 8x 
faster than that for me (50ms or so).


I have not looked at the actual JS SHA-1 implemementation involved here, 
by the way; chances are it's far from optimal for a JS 
implementation  Chances also are it will get faster as efforts like 
Crankshaft, Mozilla's type inference, etc get deployed.


And of course for a different hashing algorithm all bets would be off in 
terms of js perf; it might be more or less js-amenable


So I guess the question is how much data we want to be pushing through 
the hash function and what throughput we expect, and whether we think JS 
engines simply won't get there or will take too long to do so.


-Boris



Re: Hash functions

2010-12-20 Thread Glenn Maynard
On Mon, Dec 20, 2010 at 8:49 PM, Boris Zbarsky bzbar...@mit.edu wrote:
 Or if I modify it to only calculate one hash, but have that be the hash of a
 3,840,000 character string, I get times around 400ms.

 Running a command-line shasum utility on the same 3,840,000 characters (as
 ASCII in a file, etc) on the same hardware seems to be about 8x faster than
 that for me (50ms or so).

I see similar magnitudes comparing FF4b7 vs. sha1sum(1).

 So I guess the question is how much data we want to be pushing through the
 hash function and what throughput we expect, and whether we think JS engines
 simply won't get there or will take too long to do so.

Pushing gigabytes through it is plausible for most of these uses.
Even something as simple as a web-based sha1sum application would have
saved me and a user time recently while validating a large download.

-- 
Glenn Maynard