Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-30 Thread Alexey Proskuryakov

 24 нояб. 2014 г., в 1:28, Antti Koivisto koivi...@iki.fi написал(а):
 
 I don't think this is really 32bit vs 64bit platform issue. The vast majority 
 of 64bit systems our users have (that is iOS devices) can't use memory 
 buffers sized anywhere near the 32bit limit even in theory. Also when using 
 Vector auto-grow capabilities (which is really the point of using a vector 
 instead of just allocating a buffer) you need way more memory than the actual 
 data size. Growing a Vectoruint8_t beyond 4GB has peak allocation of 9GB.

The argument that we should support the same functionality across all devices 
is a pretty strong one. However it's not as simple as it may sound.

1. The user impact is different. I haven't seen any reports of people trying 
this on iOS devices, so the importance of fixing the problem on iOS is lower 
than the importance of fixing it on OS X, where people actually do encounter it 
in practice.

2. Relative cost of supporting large files on memory constrained devices may be 
different (e.g. using off_t as Maciej proposed might be too big of a 
performance hit). When we know it, we can decide whether the ability to upload 
large files is worth the cost.

3. The scope of work required to make this work is likely different. I suspect 
that uploading such huge files from iOS Safari is currently essentially 
impossible for other reasons (we can discuss it in more detail offline).

Also, the argument against huge Vectors is a straw man one. What I'm saying is 
we have a lot of code that already needs to deal with large sizes, and this 
code is everywhere. It's not like we have a large data world and a small data 
world inside WebKit, and can perform magnitude checks at the boundary. It's 
quite the opposite, everything works together, and the practical solution is to 
have checks isolated inside Vector.

 Are there any examples of Vectors in the current code base where we would 
 usefully fix an actual problem even in medium term by switching to 64bit 
 storage sizes? I don't think they exists. Cases like file uploads should 
 stream the data or use some mapped-file backed buffer type that is not a 
 Vector.

With modern APIs that are now exposed to JS code, file uploads are no longer an 
isolated code path. The Blob is sliced, partially read into memory and 
processed. It is almost certain that YouTube and Google Drive don't attempt to 
read huge slices, so 32-bit lengths in Vectors and even ArrayBuffers will 
probably work in practice. But it's not appropriate for all users of Vector to 
perform the checks, because failure to do them right will have serious 
consequences.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-24 Thread Antti Koivisto
I don't think this is really 32bit vs 64bit platform issue. The vast
majority of 64bit systems our users have (that is iOS devices) can't use
memory buffers sized anywhere near the 32bit limit even in theory. Also
when using Vector auto-grow capabilities (which is really the point of
using a vector instead of just allocating a buffer) you need way more
memory than the actual data size. Growing a Vectoruint8_t beyond 4GB has
peak allocation of 9GB.

Are there any examples of Vectors in the current code base where we would
usefully fix an actual problem even in medium term by switching to 64bit
storage sizes? I don't think they exists. Cases like file uploads should
stream the data or use some mapped-file backed buffer type that is not a
Vector.

With this in mind I think the right direction is to make the Vector API
match the implementation and just start using unsigned indexes everywhere.


   antti

On Fri, Nov 21, 2014 at 2:59 AM, Maciej Stachowiak m...@apple.com wrote:


 On Nov 20, 2014, at 4:51 PM, Maciej Stachowiak m...@apple.com wrote:


 On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:


 19 нояб. 2014 г., в 14:58, Alexey Proskuryakov a...@webkit.org написал(а):

 These and related uses are all over the place - see also Vectors
 in FormDataBuilder, data returned from
 FrameLoader::loadResourceSynchronously, plug-in code that loads from
 network, SharedBuffer etc.


 Another way to say this is: we need to deal with large data arrays
 throughout loading code. We do not really need full size vectors in most
 other code - it's sort of OK for HTML parser or for image decoder to fail
 catastrophically when there is too much data fed to it.

 This is somewhat questionable design, but if we are going to stick to it,
 then magnitude checks should be centralized, not sprinkled throughout the
 code. We should not make this check explicitly when feeding a network
 resource to the parser, for example.

 A 64-bit API for Vector solves this nearly flawlessly. We do not perform
 the checks manually every time we use a Vector, Vector does them for us.

 Other options are:

 - uint64_t everywhere. This way, we'll solve practical problems with large
 resources once and for all. Also, this may prove to be necessary to solve
 even YouTube/Google Drive uploads, I do not know that yet.

 - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones
 will still be limited. I like this option, because it won't make us pay the
 memory and performance cost on old crippled 32-bit machines, which are
 unlikely to be used for manipulating huge volumes of data anyway.


 We probably want YouTube upload of large files to work on 32-bit machines.
 Though presumably if you want to upload a file larger than the virtual
 address space, you need to represent it in some way other than a Vector.


 Thinking about this more - I think file sizes should probably be an off_t,
 not a size_t, so on 32-bit platforms some care must still be taken in the
 conversion between file sizes and vector sizes.

 In general, I like the idea of Vector having a size_t API and a choice of
 smaller or larger internal size. We might also want to change other
 containers with size-related interfaces to match.

 Regards,
 Maciej



 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Carlos Garcia Campos
El mié, 19-11-2014 a las 12:20 -0800, Chris Dumez escribió:
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity
 internally to save memory on 64-bit, causing a mismatch between the
 API and the internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters
 iterating over vectors (which looks unsafe as the Vector API, e.g.
 size(), returns a size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should
 do it

I started this [1] some time ago, copy pasting what Darin told me in a
review. Whatever we decide could be documented there.

 However, Alexey recently raised concerns about this change. it doesn't
 strike him as the right direction. 4Gb is not much, and we should
 have more of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but
 now that it seems it is, I thought I would raise the question on
 webkit-dev to see what people think about this.
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 
 [1] http://trac.webkit.org/changeset/176275
 [2] http://trac.webkit.org/changeset/176293
 [3] http://trac.webkit.org/changeset/148891

[1] http://trac.webkit.org/wiki/TypesForSize



___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Alexey Proskuryakov

19 нояб. 2014 г., в 14:58, Alexey Proskuryakov a...@webkit.org написал(а):

 These and related uses are all over the place - see also Vectors in 
 FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, 
 plug-in code that loads from network, SharedBuffer etc.

Another way to say this is: we need to deal with large data arrays throughout 
loading code. We do not really need full size vectors in most other code - it's 
sort of OK for HTML parser or for image decoder to fail catastrophically when 
there is too much data fed to it.

This is somewhat questionable design, but if we are going to stick to it, then 
magnitude checks should be centralized, not sprinkled throughout the code. We 
should not make this check explicitly when feeding a network resource to the 
parser, for example.

A 64-bit API for Vector solves this nearly flawlessly. We do not perform the 
checks manually every time we use a Vector, Vector does them for us.

Other options are:

- uint64_t everywhere. This way, we'll solve practical problems with large 
resources once and for all. Also, this may prove to be necessary to solve even 
YouTube/Google Drive uploads, I do not know that yet.

- size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will 
still be limited. I like this option, because it won't make us pay the memory 
and performance cost on old crippled 32-bit machines, which are unlikely to be 
used for manipulating huge volumes of data anyway.

We'll also need to develop some magic for 53-bit JS bindings, which I'm not 
sure about.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Alexey Proskuryakov

20 нояб. 2014 г., в 10:45, Filip Pizlo fpi...@apple.com написал(а):

 - uint64_t everywhere. This way, we'll solve practical problems with large 
 resources once and for all. Also, this may prove to be necessary to solve 
 even YouTube/Google Drive uploads, I do not know that yet.
 
 How does this solve the problem of 4GB data on 32-bit systems? 

OK, that was not very thoughtful of me indeed. This option is not a good one.

 Are you saying that because the code that measures file sizes uses a 64-bit 
 type then therefore the code that measures memory object sizes should also 
 use that same type?

I'm not; it seems practical enough to isolate code that deals with local files, 
so they don't need to affect the design.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Geoffrey Garen
Yes, good point. I should have been more specific. I believe this math might 
overflow and then truncate:

templatetypename T, unsigned inlineCapacity, typename OverflowHandler
bool VectorT, inlineCapacity, OverflowHandler::tryExpandCapacity(unsigned 
newMinCapacity)
{
return tryReserveCapacity(std::max(newMinCapacity, std::max(16u, 
static_castunsigned(capacity() + capacity() / 4 + 1;
}

Geoff

 On Nov 19, 2014, at 2:04 PM, Chris Dumez cdu...@apple.com wrote:
 
 Hi,
 
 I believe the Vector class is already checking for overflows. I looked 
 quickly and it seems that when trying to allocate / reallocate the Vector 
 buffer, we will call CRASH() if:
 (newCapacity  std::numeric_limitsunsigned::max() / sizeof(T))
 
 Kr, 
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 
 
 On Nov 19, 2014, at 1:57 PM, Geoffrey Garen gga...@apple.com 
 mailto:gga...@apple.com wrote:
 
 I don’t haver a specific opinion on size_t vs unsigned, but I have a related 
 point:
 
 If Vector uses unsigned, then it must RELEASE_ASSERT that it does not 
 overflow unsigned when growing. 
 
 Even if most normal content allocates  4GB, exploit content surely will try 
 to allocate  4GB.
 
 Chris, maybe you can make this change in trunk, since trunk uses unsigned?
 
 Geoff
 
 On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 That makes good sense for internal implementation, do you think that class 
 API should also use a custom type, or should it use size_t?
 
 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is 
 exposed as a JS API or deals with external world can hit sizes over 4Gb. 
 That's not out of reach in most scenarios, not even for resources loaded 
 from network.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 ArrayBuffers are very special because they are part of ECMAScript.
 
 We use unsigned for the length, because once upon a time, that would have 
 been the right type; these days the right type would be a 53-bit integer.  
 So, size_t would be wrong.  I believe that ArrayBuffers should be changed 
 to use a very special size type; probably it wouldn't even be a primitive 
 type but a class that carefully checked that you never overflowed 53 bits.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading 
 multi-gigabyte files to YouTube, Google Drive and other similar services, 
 although I haven't checked whether it's because of ArrayBuffers, or 
 because of a use of int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit 
 checks when casting down to an unsigned). Or perhaps the rule can be even 
 simpler, and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what 
 type they use.  I'm actually fine with Vector internally using unsigned 
 even if the API uses size_t, but right now we have lots of code that 
 uses a mix of size_t and unsigned when indexing into Vectors.  That's 
 confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
 Even in a world where we had more than 32GB of memory to use within a 
 single web process, I would hope that we wouldn't use it all on a single 
 Vector and that if we did, we would treat that one specially for a bunch 
 of other sensible reasons (among them being that allocating a contiguous 
 slab of virtual memory of that size is rather taxing).  So, size_t would 
 buy us almost nothing since if we had a vector grow large enough to need 
 it, we would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to 
 use Vector to store more than UINT_MAX things?  Another interesting 
 question is: What's the largest number of things that we store into any 
 Vector?  We could use such a metric to project how big Vectors might get 
 in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for 

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Filip Pizlo
I think I'm starting to see what you're saying, so let me restate it my way to 
be sure:

Code that deals with various loaded blobs must be resilient against them 
getting larger than a 32-bit size can handle and this is the subject of ongoing 
development effort.  Such code may or may not use a Vector.  It definitely must 
use size_t for all of its sizes.  The trouble with Vector using unsigned in its 
API then causes the following troubles:

If the code uses a Vector for the large data: in addition to not being able to 
actually hold all of the data, there will be a risk of truncation from size_t 
to unsigned on the client side of the Vector, which is much harder to prevent 
than a truncation inside Vector itself.  For such code the ideal would be a 
Vector that measures itself using size_t, both in its internal data structure 
and in its API.

If the code does not use Vector for that large data but uses it nonetheless for 
other things: things won't obviously go wrong but you end up in sketchy 
territory: you will have code that is juggling unsigned's for the Vector and 
size_t's for the large data.  Any arithmetic between the two can easily become 
suspect.  Consider a case where you might use a metric of size or offset in the 
large data in some arithmetic that flows into Vector.  If Vector's indexing 
operations use unsigned, then you will have truncation.

These problems fall into two categories:

1) Actually not being able to store the large data but failing gracefully when 
it happens.

2) Truncation leading to some kind of undesirable behavior.

Someone mentioned something about warnings that can tell us about (2).  
Apparently we cannot turn the warning on right now.  The hope is that if we 
convert everything to unsigned then we won't have the warnings in trunk and 
then we can turn the warning on, and catch such bugs.  This seems very 
aspirational.

Also, it means that everyone hacking on code that mixes sizes of files with 
sizes of vectors will have to scratch their heads about which type to use, 
and then if they do it wrong they will get some warnings.  Of course, the 
warnings can easily be silenced with a cast.  That's super dangerous.  In my 
experience, if the easiest way to make the compiler STFU is by doing the wrong 
thing (static_cast, in this case) then people will do the wrong thing quite 
frequently.

(A particularly funny example of this is checked exceptions in Java.  If you 
call a method declared to throw an exception then the caller must either catch 
it or declare that they throw it, too.  The easiest way to silence the error is 
to catch it with an empty catch block that ignores the exception, since that 
requires the fewest keystrokes.  Someone once did a study of large Java 
codebases and found such empty catch blocks to be a chronic problem - people 
did it a lot and it was responsible for bad bugs.  The lesson: a type error is 
only a good thing if the path of least resistance to making the compiler STFU 
involves actually improving the code.)

An alternate approach is to agree to use size_t for sizes.  This is an easy to 
explain rule and it should be easy for reviewers to catch it.  It might also 
make that compiler warning more meaningful, in the sense that in a world where 
we are mixing size_t and unsigned we will *have* to have some casts between the 
two but in a world where we use size_t uniformly any such cast is an immediate 
red flag.

So, I think I'm starting to agree with the size_t approach.

I wonder what the downsides are to this approach.  Footprint of Vector?

-Filip


 On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 19 нояб. 2014 г., в 14:58, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org написал(а):
 
 These and related uses are all over the place - see also Vectors in 
 FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, 
 plug-in code that loads from network, SharedBuffer etc.
 
 Another way to say this is: we need to deal with large data arrays throughout 
 loading code. We do not really need full size vectors in most other code - 
 it's sort of OK for HTML parser or for image decoder to fail catastrophically 
 when there is too much data fed to it.
 
 This is somewhat questionable design, but if we are going to stick to it, 
 then magnitude checks should be centralized, not sprinkled throughout the 
 code. We should not make this check explicitly when feeding a network 
 resource to the parser, for example.
 
 A 64-bit API for Vector solves this nearly flawlessly. We do not perform the 
 checks manually every time we use a Vector, Vector does them for us.
 
 Other options are:
 
 - uint64_t everywhere. This way, we'll solve practical problems with large 
 resources once and for all. Also, this may prove to be necessary to solve 
 even YouTube/Google Drive uploads, I do not know that yet.
 
 - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will 
 still be 

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Geoffrey Garen
 I wonder what the downsides are to this approach.  Footprint of Vector?

It looks like the original change was motivated by shrinking Vector:

https://bugs.webkit.org/show_bug.cgi?id=97268 
https://bugs.webkit.org/show_bug.cgi?id=97268

Sadly, it didn’t include any data on the observed benefit :(.

Geoff___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Filip Pizlo
If there was a benefit, then the following approach would make sense to me:

- Have both Vector and SmallVector.  SmallVector will use unsigned internally.

- Use SmallVector in the places where sizeof(Vector) matters.  This would take 
some effort but I think that we have the tools that would be needed to be 
systematic about it.

- Both Vector and SmallVector will have an API based on size_t.  SmallVector 
will do truncation checks in all of the places.

-Filip


 On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:
 
 I wonder what the downsides are to this approach.  Footprint of Vector?
 
 It looks like the original change was motivated by shrinking Vector:
 
   https://bugs.webkit.org/show_bug.cgi?id=97268 
 https://bugs.webkit.org/show_bug.cgi?id=97268
 
 Sadly, it didn’t include any data on the observed benefit :(.
 
 Geoff

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Chris Dumez
The corresponding Blink bug did contain some performance data:
CrBug#229226 https://code.google.com/p/chromium/issues/detail?id=229226

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA




 On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:
 
 I wonder what the downsides are to this approach.  Footprint of Vector?
 
 It looks like the original change was motivated by shrinking Vector:
 
   https://bugs.webkit.org/show_bug.cgi?id=97268 
 https://bugs.webkit.org/show_bug.cgi?id=97268
 
 Sadly, it didn’t include any data on the observed benefit :(.
 
 Geoff
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Filip Pizlo
That looks like a pretty good performance win.

I'd advocate for SmallVector and Vector, then.

-Filip


 On Nov 20, 2014, at 11:38 AM, Chris Dumez cdu...@apple.com wrote:
 
 The corresponding Blink bug did contain some performance data:
 CrBug#229226 https://code.google.com/p/chromium/issues/detail?id=229226
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 
 
 On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com 
 mailto:gga...@apple.com wrote:
 
 I wonder what the downsides are to this approach.  Footprint of Vector?
 
 It looks like the original change was motivated by shrinking Vector:
 
  https://bugs.webkit.org/show_bug.cgi?id=97268 
 https://bugs.webkit.org/show_bug.cgi?id=97268
 
 Sadly, it didn’t include any data on the observed benefit :(.
 
 Geoff
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org mailto:webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Antti Koivisto
Or Vector and ByteBuffer.

All these cases requiring very large buffers seem to be about untyped data.


   antti

On Thu, Nov 20, 2014 at 9:40 PM, Filip Pizlo fpi...@apple.com wrote:

 That looks like a pretty good performance win.

 I'd advocate for SmallVector and Vector, then.

 -Filip


 On Nov 20, 2014, at 11:38 AM, Chris Dumez cdu...@apple.com wrote:

 The corresponding Blink bug did contain some performance data:
 CrBug#229226 https://code.google.com/p/chromium/issues/detail?id=229226

 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA




 On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:

 I wonder what the downsides are to this approach.  Footprint of Vector?


 It looks like the original change was motivated by shrinking Vector:

 https://bugs.webkit.org/show_bug.cgi?id=97268

 Sadly, it didn’t include any data on the observed benefit :(.

 Geoff
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev




 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Filip Pizlo

 On Nov 20, 2014, at 2:06 PM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 SmallVector and Vector seem reasonable to me. I think that this is the right 
 nomenclature - using limited size vectors should be a conscious choice.
 
 Even SmallVector should probably have a size_t API, so that we could 
 centralize magnitude checks. What do you think?

Agreed.

This avoids any risk of someone computing a size_t quantity and flowing it into 
the SmallVector API in a way that would result in truncation that SmallVector 
cannot internally catch.

-Filip


 
 - Alexey
 
 
 20 нояб. 2014 г., в 11:40, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 That looks like a pretty good performance win.
 
 I'd advocate for SmallVector and Vector, then.
 
 -Filip
 
 
 On Nov 20, 2014, at 11:38 AM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 The corresponding Blink bug did contain some performance data:
 CrBug#229226 https://code.google.com/p/chromium/issues/detail?id=229226
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 
 
 On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com 
 mailto:gga...@apple.com wrote:
 
 I wonder what the downsides are to this approach.  Footprint of Vector?
 
 It looks like the original change was motivated by shrinking Vector:
 
https://bugs.webkit.org/show_bug.cgi?id=97268 
 https://bugs.webkit.org/show_bug.cgi?id=97268
 
 Sadly, it didn’t include any data on the observed benefit :(.
 
 Geoff
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org mailto:webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev 
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 
 
 
 

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Maciej Stachowiak

 On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 19 нояб. 2014 г., в 14:58, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org написал(а):
 
 These and related uses are all over the place - see also Vectors in 
 FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, 
 plug-in code that loads from network, SharedBuffer etc.
 
 Another way to say this is: we need to deal with large data arrays throughout 
 loading code. We do not really need full size vectors in most other code - 
 it's sort of OK for HTML parser or for image decoder to fail catastrophically 
 when there is too much data fed to it.
 
 This is somewhat questionable design, but if we are going to stick to it, 
 then magnitude checks should be centralized, not sprinkled throughout the 
 code. We should not make this check explicitly when feeding a network 
 resource to the parser, for example.
 
 A 64-bit API for Vector solves this nearly flawlessly. We do not perform the 
 checks manually every time we use a Vector, Vector does them for us.
 
 Other options are:
 
 - uint64_t everywhere. This way, we'll solve practical problems with large 
 resources once and for all. Also, this may prove to be necessary to solve 
 even YouTube/Google Drive uploads, I do not know that yet.
 
 - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will 
 still be limited. I like this option, because it won't make us pay the memory 
 and performance cost on old crippled 32-bit machines, which are unlikely to 
 be used for manipulating huge volumes of data anyway.

We probably want YouTube upload of large files to work on 32-bit machines. 
Though presumably if you want to upload a file larger than the virtual address 
space, you need to represent it in some way other than a Vector.

 - Maciej


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-20 Thread Maciej Stachowiak

 On Nov 20, 2014, at 4:51 PM, Maciej Stachowiak m...@apple.com wrote:
 
 
 On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 19 нояб. 2014 г., в 14:58, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org написал(а):
 
 These and related uses are all over the place - see also Vectors in 
 FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, 
 plug-in code that loads from network, SharedBuffer etc.
 
 Another way to say this is: we need to deal with large data arrays 
 throughout loading code. We do not really need full size vectors in most 
 other code - it's sort of OK for HTML parser or for image decoder to fail 
 catastrophically when there is too much data fed to it.
 
 This is somewhat questionable design, but if we are going to stick to it, 
 then magnitude checks should be centralized, not sprinkled throughout the 
 code. We should not make this check explicitly when feeding a network 
 resource to the parser, for example.
 
 A 64-bit API for Vector solves this nearly flawlessly. We do not perform the 
 checks manually every time we use a Vector, Vector does them for us.
 
 Other options are:
 
 - uint64_t everywhere. This way, we'll solve practical problems with large 
 resources once and for all. Also, this may prove to be necessary to solve 
 even YouTube/Google Drive uploads, I do not know that yet.
 
 - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will 
 still be limited. I like this option, because it won't make us pay the 
 memory and performance cost on old crippled 32-bit machines, which are 
 unlikely to be used for manipulating huge volumes of data anyway.
 
 We probably want YouTube upload of large files to work on 32-bit machines. 
 Though presumably if you want to upload a file larger than the virtual 
 address space, you need to represent it in some way other than a Vector.

Thinking about this more - I think file sizes should probably be an off_t, not 
a size_t, so on 32-bit platforms some care must still be taken in the 
conversion between file sizes and vector sizes.

In general, I like the idea of Vector having a size_t API and a choice of 
smaller or larger internal size. We might also want to change other containers 
with size-related interfaces to match.

Regards,
Maciej


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Filip Pizlo
Whatever we do, the clients of Vector should be consistent about what type they 
use.  I'm actually fine with Vector internally using unsigned even if the API 
uses size_t, but right now we have lots of code that uses a mix of size_t and 
unsigned when indexing into Vectors.  That's confusing.

If I picked one type to use for all Vector indices, it would be unsigned rather 
than size_t.  Vector being limited to unsigned doesn't imply 4GB unless you do 
Vectorchar.  Usually we have Vectors of pointer-width things, which means 
32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  Even in a world where we 
had more than 32GB of memory to use within a single web process, I would hope 
that we wouldn't use it all on a single Vector and that if we did, we would 
treat that one specially for a bunch of other sensible reasons (among them 
being that allocating a contiguous slab of virtual memory of that size is 
rather taxing).  So, size_t would buy us almost nothing since if we had a 
vector grow large enough to need it, we would be having a bad time already.

I wonder: are there cases that anyone remembers where we have tried to use 
Vector to store more than UINT_MAX things?  Another interesting question is: 
What's the largest number of things that we store into any Vector?  We could 
use such a metric to project how big Vectors might get in the future.

-Filip


 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types instead 
 of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally to 
 save memory on 64-bit, causing a mismatch between the API and the internal 
 representation [3]
 - Some reviewers have asked me to use unsigned for loop counters iterating 
 over vectors (which looks unsafe as the Vector API, e.g. size(), returns a 
 size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do it
 
 However, Alexey recently raised concerns about this change. it doesn't 
 strike him as the right direction. 4Gb is not much, and we should have more 
 of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but now that 
 it seems it is, I thought I would raise the question on webkit-dev to see 
 what people think about this.
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 [1] http://trac.webkit.org/changeset/176275 
 http://trac.webkit.org/changeset/176275
 [2] http://trac.webkit.org/changeset/176293 
 http://trac.webkit.org/changeset/176293
 [3] http://trac.webkit.org/changeset/148891 
 http://trac.webkit.org/changeset/148891
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Alexey Proskuryakov

This is not exactly about Vector, but if one uses 
FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
overflows ArrayBuffer. WebKit actually crashes when uploading multi-gigabyte 
files to YouTube, Google Drive and other similar services, although I haven't 
checked whether it's because of ArrayBuffers, or because of a use of 
int/unsigned in another code path.

I think that we should use size_t everywhere except for perhaps a few key 
places where memory impact is critical (and of course we need explicit checks 
when casting down to an unsigned). Or perhaps the rule can be even simpler, and 
unsigned may never be used for indices and sizes, period.

- Alexey


19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com написал(а):

 Whatever we do, the clients of Vector should be consistent about what type 
 they use.  I'm actually fine with Vector internally using unsigned even if 
 the API uses size_t, but right now we have lots of code that uses a mix of 
 size_t and unsigned when indexing into Vectors.  That's confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width things, 
 which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  Even in a 
 world where we had more than 32GB of memory to use within a single web 
 process, I would hope that we wouldn't use it all on a single Vector and that 
 if we did, we would treat that one specially for a bunch of other sensible 
 reasons (among them being that allocating a contiguous slab of virtual memory 
 of that size is rather taxing).  So, size_t would buy us almost nothing since 
 if we had a vector grow large enough to need it, we would be having a bad 
 time already.
 
 I wonder: are there cases that anyone remembers where we have tried to use 
 Vector to store more than UINT_MAX things?  Another interesting question is: 
 What's the largest number of things that we store into any Vector?  We could 
 use such a metric to project how big Vectors might get in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally to 
 save memory on 64-bit, causing a mismatch between the API and the internal 
 representation [3]
 - Some reviewers have asked me to use unsigned for loop counters iterating 
 over vectors (which looks unsafe as the Vector API, e.g. size(), returns a 
 size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do it
 
 However, Alexey recently raised concerns about this change. it doesn't 
 strike him as the right direction. 4Gb is not much, and we should have more 
 of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but now that 
 it seems it is, I thought I would raise the question on webkit-dev to see 
 what people think about this.
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 [1] http://trac.webkit.org/changeset/176275
 [2] http://trac.webkit.org/changeset/176293
 [3] http://trac.webkit.org/changeset/148891
 
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Filip Pizlo
ArrayBuffers are very special because they are part of ECMAScript.

We use unsigned for the length, because once upon a time, that would have been 
the right type; these days the right type would be a 53-bit integer.  So, 
size_t would be wrong.  I believe that ArrayBuffers should be changed to use a 
very special size type; probably it wouldn't even be a primitive type but a 
class that carefully checked that you never overflowed 53 bits.

-Filip


 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading multi-gigabyte 
 files to YouTube, Google Drive and other similar services, although I haven't 
 checked whether it's because of ArrayBuffers, or because of a use of 
 int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit checks 
 when casting down to an unsigned). Or perhaps the rule can be even simpler, 
 and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what type 
 they use.  I'm actually fine with Vector internally using unsigned even if 
 the API uses size_t, but right now we have lots of code that uses a mix of 
 size_t and unsigned when indexing into Vectors.  That's confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  Even 
 in a world where we had more than 32GB of memory to use within a single web 
 process, I would hope that we wouldn't use it all on a single Vector and 
 that if we did, we would treat that one specially for a bunch of other 
 sensible reasons (among them being that allocating a contiguous slab of 
 virtual memory of that size is rather taxing).  So, size_t would buy us 
 almost nothing since if we had a vector grow large enough to need it, we 
 would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to use 
 Vector to store more than UINT_MAX things?  Another interesting question is: 
 What's the largest number of things that we store into any Vector?  We could 
 use such a metric to project how big Vectors might get in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally 
 to save memory on 64-bit, causing a mismatch between the API and the 
 internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters iterating 
 over vectors (which looks unsafe as the Vector API, e.g. size(), returns a 
 size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do it
 
 However, Alexey recently raised concerns about this change. it doesn't 
 strike him as the right direction. 4Gb is not much, and we should have 
 more of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but now 
 that it seems it is, I thought I would raise the question on webkit-dev to 
 see what people think about this.
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 [1] http://trac.webkit.org/changeset/176275 
 http://trac.webkit.org/changeset/176275
 [2] http://trac.webkit.org/changeset/176293 
 http://trac.webkit.org/changeset/176293
 [3] http://trac.webkit.org/changeset/148891 
 http://trac.webkit.org/changeset/148891
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org mailto:webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev 
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org mailto:webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 
 

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Antti Koivisto
I agree with Phil, we should just use unsigned consistently. For very large
memory allocations Vector is probably not the right type to use.


  antti

On Wed, Nov 19, 2014 at 10:32 PM, Filip Pizlo fpi...@apple.com wrote:

 Whatever we do, the clients of Vector should be consistent about what type
 they use.  I'm actually fine with Vector internally using unsigned even if
 the API uses size_t, but right now we have lots of code that uses a mix of
 size_t and unsigned when indexing into Vectors.  That's confusing.

 If I picked one type to use for all Vector indices, it would be unsigned
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB
 unless you do Vectorchar.  Usually we have Vectors of pointer-width
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).
 Even in a world where we had more than 32GB of memory to use within a
 single web process, I would hope that we wouldn't use it all on a single
 Vector and that if we did, we would treat that one specially for a bunch of
 other sensible reasons (among them being that allocating a contiguous slab
 of virtual memory of that size is rather taxing).  So, size_t would buy us
 almost nothing since if we had a vector grow large enough to need it, we
 would be having a bad time already.

 I wonder: are there cases that anyone remembers where we have tried to use
 Vector to store more than UINT_MAX things?  Another interesting question
 is: What's the largest number of things that we store into any Vector?  We
 could use such a metric to project how big Vectors might get in the future.

 -Filip


 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com wrote:

 Hi all,

 I recently started updating the WTF::Vector API to use unsigned types
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally
 to save memory on 64-bit, causing a mismatch between the API and the
 internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters iterating
 over vectors (which looks unsafe as the Vector API, e.g. size(), returns a
 size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do it

 However, Alexey recently raised concerns about this change. it doesn't
 strike him as the right direction. 4Gb is not much, and we should have
 more of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but now
 that it seems it is, I thought I would raise the question on webkit-dev to
 see what people think about this.

 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA


 [1] http://trac.webkit.org/changeset/176275
 [2] http://trac.webkit.org/changeset/176293
 [3] http://trac.webkit.org/changeset/148891

 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev



 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev


___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Benjamin Poulain

I agree with Filip and Antti.

The internal type of Vector is an orthogonal issue.

Keeping the old size_t API with the unsigned internal has caused a 
mess of types in the codebase and some confusion for contributors. We 
should fix the API and clean up our code.


There are cases were we can expect large vectors. It would be reasonable 
to have the storage type as a template argument of Vector. But in most 
cases we have a lot of very small vectors, not a small amount of large 
vectors.
We could also have a BigVector class that would be optimized for bigger 
allocations.


Benjamin

On 11/19/14 12:54 PM, Alexey Proskuryakov wrote:


This is not exactly about Vector, but if one
uses FileReader.prototype.readAsArrayBuffer() on a large file, I think
that it overflows ArrayBuffer. WebKit actually crashes when uploading
multi-gigabyte files to YouTube, Google Drive and other similar
services, although I haven't checked whether it's because of
ArrayBuffers, or because of a use of int/unsigned in another code path.

I think that we should use size_t everywhere except for perhaps a few
key places where memory impact is critical (and of course we need
explicit checks when casting down to an unsigned). Or perhaps the rule
can be even simpler, and unsigned may never be used for indices and
sizes, period.

- Alexey


19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com
mailto:fpi...@apple.com написал(а):


Whatever we do, the clients of Vector should be consistent about what
type they use.  I'm actually fine with Vector internally using
unsigned even if the API uses size_t, but right now we have lots of
code that uses a mix of size_t and unsigned when indexing into
Vectors.  That's confusing.

If I picked one type to use for all Vector indices, it would be
unsigned rather than size_t.  Vector being limited to unsigned doesn't
imply 4GB unless you do Vectorchar.  Usually we have Vectors of
pointer-width things, which means 32GB on 64-bit systems (UINT_MAX *
sizeof(void*)).  Even in a world where we had more than 32GB of memory
to use within a single web process, I would hope that we wouldn't use
it all on a single Vector and that if we did, we would treat that one
specially for a bunch of other sensible reasons (among them being that
allocating a contiguous slab of virtual memory of that size is rather
taxing).  So, size_t would buy us almost nothing since if we had a
vector grow large enough to need it, we would be having a bad time
already.

I wonder: are there cases that anyone remembers where we have tried to
use Vector to store more than UINT_MAX things?  Another interesting
question is: What's the largest number of things that we store into
any Vector?  We could use such a metric to project how big Vectors
might get in the future.

-Filip



On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com
mailto:cdu...@apple.com wrote:

Hi all,

I recently started updating the WTF::Vector API to use unsigned types
instead of size_t [1][2], because:
- WTF::Vector is already using unsigned type for size/capacity
internally to save memory on 64-bit, causing a mismatch between the
API and the internal representation [3]
- Some reviewers have asked me to use unsigned for loop counters
iterating over vectors (which looks unsafe as the Vector API, e.g.
size(), returns a size_t).
- I heard from Joseph that this type mismatch is forcing us (and
other projects using WTF) to disable some build time warnings
- The few people I talked to before making that change said we should
do it

However, Alexey recently raised concerns about this change. it
doesn't strike him as the right direction. 4Gb is not much, and we
should have more of WebKit work with the right data types, not less.”.
I did not initially realize that this change was controversial, but
now that it seems it is, I thought I would raise the question on
webkit-dev to see what people think about this.

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA


[1] http://trac.webkit.org/changeset/176275
[2] http://trac.webkit.org/changeset/176293
[3] http://trac.webkit.org/changeset/148891

___
webkit-dev mailing list
webkit-dev@lists.webkit.org mailto:webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


___
webkit-dev mailing list
webkit-dev@lists.webkit.org mailto:webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev





___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev



___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Alexey Proskuryakov

That makes good sense for internal implementation, do you think that class API 
should also use a custom type, or should it use size_t?

With Vector though, I don't know how we would differentiate code paths that 
need large allocations from ones that don't. Nearly anything that is exposed as 
a JS API or deals with external world can hit sizes over 4Gb. That's not out of 
reach in most scenarios, not even for resources loaded from network.

- Alexey


19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com написал(а):

 ArrayBuffers are very special because they are part of ECMAScript.
 
 We use unsigned for the length, because once upon a time, that would have 
 been the right type; these days the right type would be a 53-bit integer.  
 So, size_t would be wrong.  I believe that ArrayBuffers should be changed to 
 use a very special size type; probably it wouldn't even be a primitive type 
 but a class that carefully checked that you never overflowed 53 bits.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading multi-gigabyte 
 files to YouTube, Google Drive and other similar services, although I 
 haven't checked whether it's because of ArrayBuffers, or because of a use of 
 int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit 
 checks when casting down to an unsigned). Or perhaps the rule can be even 
 simpler, and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what type 
 they use.  I'm actually fine with Vector internally using unsigned even if 
 the API uses size_t, but right now we have lots of code that uses a mix of 
 size_t and unsigned when indexing into Vectors.  That's confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
 Even in a world where we had more than 32GB of memory to use within a 
 single web process, I would hope that we wouldn't use it all on a single 
 Vector and that if we did, we would treat that one specially for a bunch of 
 other sensible reasons (among them being that allocating a contiguous slab 
 of virtual memory of that size is rather taxing).  So, size_t would buy us 
 almost nothing since if we had a vector grow large enough to need it, we 
 would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to use 
 Vector to store more than UINT_MAX things?  Another interesting question 
 is: What's the largest number of things that we store into any Vector?  We 
 could use such a metric to project how big Vectors might get in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally 
 to save memory on 64-bit, causing a mismatch between the API and the 
 internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters iterating 
 over vectors (which looks unsafe as the Vector API, e.g. size(), returns a 
 size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do it
 
 However, Alexey recently raised concerns about this change. it doesn't 
 strike him as the right direction. 4Gb is not much, and we should have 
 more of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but now 
 that it seems it is, I thought I would raise the question on webkit-dev to 
 see what people think about this.
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 [1] http://trac.webkit.org/changeset/176275
 [2] http://trac.webkit.org/changeset/176293
 [3] http://trac.webkit.org/changeset/148891
 
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 
 ___
 webkit-dev mailing list
 webkit-dev@lists.webkit.org
 https://lists.webkit.org/mailman/listinfo/webkit-dev
 
 
 


___
webkit-dev mailing list

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Geoffrey Garen
I don’t haver a specific opinion on size_t vs unsigned, but I have a related 
point:

If Vector uses unsigned, then it must RELEASE_ASSERT that it does not overflow 
unsigned when growing. 

Even if most normal content allocates  4GB, exploit content surely will try to 
allocate  4GB.

Chris, maybe you can make this change in trunk, since trunk uses unsigned?

Geoff

 On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 That makes good sense for internal implementation, do you think that class 
 API should also use a custom type, or should it use size_t?
 
 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is exposed 
 as a JS API or deals with external world can hit sizes over 4Gb. That's not 
 out of reach in most scenarios, not even for resources loaded from network.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 ArrayBuffers are very special because they are part of ECMAScript.
 
 We use unsigned for the length, because once upon a time, that would have 
 been the right type; these days the right type would be a 53-bit integer.  
 So, size_t would be wrong.  I believe that ArrayBuffers should be changed to 
 use a very special size type; probably it wouldn't even be a primitive type 
 but a class that carefully checked that you never overflowed 53 bits.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading 
 multi-gigabyte files to YouTube, Google Drive and other similar services, 
 although I haven't checked whether it's because of ArrayBuffers, or because 
 of a use of int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit 
 checks when casting down to an unsigned). Or perhaps the rule can be even 
 simpler, and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what type 
 they use.  I'm actually fine with Vector internally using unsigned even if 
 the API uses size_t, but right now we have lots of code that uses a mix of 
 size_t and unsigned when indexing into Vectors.  That's confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
 Even in a world where we had more than 32GB of memory to use within a 
 single web process, I would hope that we wouldn't use it all on a single 
 Vector and that if we did, we would treat that one specially for a bunch 
 of other sensible reasons (among them being that allocating a contiguous 
 slab of virtual memory of that size is rather taxing).  So, size_t would 
 buy us almost nothing since if we had a vector grow large enough to need 
 it, we would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to use 
 Vector to store more than UINT_MAX things?  Another interesting question 
 is: What's the largest number of things that we store into any Vector?  We 
 could use such a metric to project how big Vectors might get in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally 
 to save memory on 64-bit, causing a mismatch between the API and the 
 internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters 
 iterating over vectors (which looks unsafe as the Vector API, e.g. 
 size(), returns a size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do 
 it
 
 However, Alexey recently raised concerns about this change. it doesn't 
 strike him as the right direction. 4Gb is not much, and we should have 
 more of WebKit work with the right data types, not less.”.
 I did not initially realize that this change was controversial, but now 
 that it seems it is, I thought I would raise the question on webkit-dev 
 to see what people think about this.
 
 Kr,
 --
 Chris Dumez - Apple Inc.
 Cupertino, 

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Filip Pizlo

 On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 That makes good sense for internal implementation, do you think that class 
 API should also use a custom type, or should it use size_t?

I would propose having an Int53 type in JSC, because that is now a thing in 
ECMAScript.  ArrayBuffer would use it.

Int53 could have an implicit conversion to size_t, but not the other way around.

 
 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is exposed 
 as a JS API or deals with external world can hit sizes over 4Gb. That's not 
 out of reach in most scenarios, not even for resources loaded from network.

Can you provide an example?

- Except for ArrayBuffers, other kinds of JS arrays are still semantically 
limited to length = 2^31 - 2.

- Anytime we have an array of structured data where individual elements are 
large-ish, a vector of size UINT_MAX would not imply 4GB.  It would imply 
something much larger.  All it takes is Vectorvoid* for example - and even 
with the size being an unsigned, the size limit is actually 32GB on 64-bit.  
So, I don't think it's accurate to speak of a 4GB limit since this has nothing 
to do with bytes.

-Filip


 
 - Alexey
 
 
 19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 ArrayBuffers are very special because they are part of ECMAScript.
 
 We use unsigned for the length, because once upon a time, that would have 
 been the right type; these days the right type would be a 53-bit integer.  
 So, size_t would be wrong.  I believe that ArrayBuffers should be changed to 
 use a very special size type; probably it wouldn't even be a primitive type 
 but a class that carefully checked that you never overflowed 53 bits.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading 
 multi-gigabyte files to YouTube, Google Drive and other similar services, 
 although I haven't checked whether it's because of ArrayBuffers, or because 
 of a use of int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit 
 checks when casting down to an unsigned). Or perhaps the rule can be even 
 simpler, and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what type 
 they use.  I'm actually fine with Vector internally using unsigned even if 
 the API uses size_t, but right now we have lots of code that uses a mix of 
 size_t and unsigned when indexing into Vectors.  That's confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
 Even in a world where we had more than 32GB of memory to use within a 
 single web process, I would hope that we wouldn't use it all on a single 
 Vector and that if we did, we would treat that one specially for a bunch 
 of other sensible reasons (among them being that allocating a contiguous 
 slab of virtual memory of that size is rather taxing).  So, size_t would 
 buy us almost nothing since if we had a vector grow large enough to need 
 it, we would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to use 
 Vector to store more than UINT_MAX things?  Another interesting question 
 is: What's the largest number of things that we store into any Vector?  We 
 could use such a metric to project how big Vectors might get in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity internally 
 to save memory on 64-bit, causing a mismatch between the API and the 
 internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters 
 iterating over vectors (which looks unsafe as the Vector API, e.g. 
 size(), returns a size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do 
 it
 
 However, Alexey recently raised concerns about this 

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Chris Dumez
Hi,

I believe the Vector class is already checking for overflows. I looked quickly 
and it seems that when trying to allocate / reallocate the Vector buffer, we 
will call CRASH() if:
(newCapacity  std::numeric_limitsunsigned::max() / sizeof(T))

Kr, 
--
Chris Dumez - Apple Inc.
Cupertino, CA




 On Nov 19, 2014, at 1:57 PM, Geoffrey Garen gga...@apple.com wrote:
 
 I don’t haver a specific opinion on size_t vs unsigned, but I have a related 
 point:
 
 If Vector uses unsigned, then it must RELEASE_ASSERT that it does not 
 overflow unsigned when growing. 
 
 Even if most normal content allocates  4GB, exploit content surely will try 
 to allocate  4GB.
 
 Chris, maybe you can make this change in trunk, since trunk uses unsigned?
 
 Geoff
 
 On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 That makes good sense for internal implementation, do you think that class 
 API should also use a custom type, or should it use size_t?
 
 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is exposed 
 as a JS API or deals with external world can hit sizes over 4Gb. That's not 
 out of reach in most scenarios, not even for resources loaded from network.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 ArrayBuffers are very special because they are part of ECMAScript.
 
 We use unsigned for the length, because once upon a time, that would have 
 been the right type; these days the right type would be a 53-bit integer.  
 So, size_t would be wrong.  I believe that ArrayBuffers should be changed 
 to use a very special size type; probably it wouldn't even be a primitive 
 type but a class that carefully checked that you never overflowed 53 bits.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading 
 multi-gigabyte files to YouTube, Google Drive and other similar services, 
 although I haven't checked whether it's because of ArrayBuffers, or 
 because of a use of int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit 
 checks when casting down to an unsigned). Or perhaps the rule can be even 
 simpler, and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what 
 type they use.  I'm actually fine with Vector internally using unsigned 
 even if the API uses size_t, but right now we have lots of code that uses 
 a mix of size_t and unsigned when indexing into Vectors.  That's 
 confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
 Even in a world where we had more than 32GB of memory to use within a 
 single web process, I would hope that we wouldn't use it all on a single 
 Vector and that if we did, we would treat that one specially for a bunch 
 of other sensible reasons (among them being that allocating a contiguous 
 slab of virtual memory of that size is rather taxing).  So, size_t would 
 buy us almost nothing since if we had a vector grow large enough to need 
 it, we would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to 
 use Vector to store more than UINT_MAX things?  Another interesting 
 question is: What's the largest number of things that we store into any 
 Vector?  We could use such a metric to project how big Vectors might get 
 in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity 
 internally to save memory on 64-bit, causing a mismatch between the API 
 and the internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters 
 iterating over vectors (which looks unsafe as the Vector API, e.g. 
 size(), returns a size_t).
 - I heard from Joseph that this type mismatch is forcing us (and other 
 projects using WTF) to disable some build time warnings
 - The few people I talked to before making that change said we should do 
 it
 
 However, Alexey recently 

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Oliver Hunt
On a related note I’ve also got a patch up for review that adds bounds checking 
to Vector::iterator, and that vigorously performs bounds and overflow checking.

—Oliver

 On Nov 19, 2014, at 2:04 PM, Chris Dumez cdu...@apple.com wrote:
 
 Hi,
 
 I believe the Vector class is already checking for overflows. I looked 
 quickly and it seems that when trying to allocate / reallocate the Vector 
 buffer, we will call CRASH() if:
 (newCapacity  std::numeric_limitsunsigned::max() / sizeof(T))
 
 Kr, 
 --
 Chris Dumez - Apple Inc.
 Cupertino, CA
 
 
 
 
 On Nov 19, 2014, at 1:57 PM, Geoffrey Garen gga...@apple.com 
 mailto:gga...@apple.com wrote:
 
 I don’t haver a specific opinion on size_t vs unsigned, but I have a related 
 point:
 
 If Vector uses unsigned, then it must RELEASE_ASSERT that it does not 
 overflow unsigned when growing. 
 
 Even if most normal content allocates  4GB, exploit content surely will try 
 to allocate  4GB.
 
 Chris, maybe you can make this change in trunk, since trunk uses unsigned?
 
 Geoff
 
 On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 That makes good sense for internal implementation, do you think that class 
 API should also use a custom type, or should it use size_t?
 
 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is 
 exposed as a JS API or deals with external world can hit sizes over 4Gb. 
 That's not out of reach in most scenarios, not even for resources loaded 
 from network.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 ArrayBuffers are very special because they are part of ECMAScript.
 
 We use unsigned for the length, because once upon a time, that would have 
 been the right type; these days the right type would be a 53-bit integer.  
 So, size_t would be wrong.  I believe that ArrayBuffers should be changed 
 to use a very special size type; probably it wouldn't even be a primitive 
 type but a class that carefully checked that you never overflowed 53 bits.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org 
 mailto:a...@webkit.org wrote:
 
 
 This is not exactly about Vector, but if one uses 
 FileReader.prototype.readAsArrayBuffer() on a large file, I think that it 
 overflows ArrayBuffer. WebKit actually crashes when uploading 
 multi-gigabyte files to YouTube, Google Drive and other similar services, 
 although I haven't checked whether it's because of ArrayBuffers, or 
 because of a use of int/unsigned in another code path.
 
 I think that we should use size_t everywhere except for perhaps a few key 
 places where memory impact is critical (and of course we need explicit 
 checks when casting down to an unsigned). Or perhaps the rule can be even 
 simpler, and unsigned may never be used for indices and sizes, period.
 
 - Alexey
 
 
 19 нояб. 2014 г., в 12:32, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 Whatever we do, the clients of Vector should be consistent about what 
 type they use.  I'm actually fine with Vector internally using unsigned 
 even if the API uses size_t, but right now we have lots of code that 
 uses a mix of size_t and unsigned when indexing into Vectors.  That's 
 confusing.
 
 If I picked one type to use for all Vector indices, it would be unsigned 
 rather than size_t.  Vector being limited to unsigned doesn't imply 4GB 
 unless you do Vectorchar.  Usually we have Vectors of pointer-width 
 things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).  
 Even in a world where we had more than 32GB of memory to use within a 
 single web process, I would hope that we wouldn't use it all on a single 
 Vector and that if we did, we would treat that one specially for a bunch 
 of other sensible reasons (among them being that allocating a contiguous 
 slab of virtual memory of that size is rather taxing).  So, size_t would 
 buy us almost nothing since if we had a vector grow large enough to need 
 it, we would be having a bad time already.
 
 I wonder: are there cases that anyone remembers where we have tried to 
 use Vector to store more than UINT_MAX things?  Another interesting 
 question is: What's the largest number of things that we store into any 
 Vector?  We could use such a metric to project how big Vectors might get 
 in the future.
 
 -Filip
 
 
 On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com 
 mailto:cdu...@apple.com wrote:
 
 Hi all,
 
 I recently started updating the WTF::Vector API to use unsigned types 
 instead of size_t [1][2], because:
 - WTF::Vector is already using unsigned type for size/capacity 
 internally to save memory on 64-bit, causing a mismatch between the API 
 and the internal representation [3]
 - Some reviewers have asked me to use unsigned for loop counters 
 iterating over vectors (which looks unsafe as the 

Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Alexey Proskuryakov

19 нояб. 2014 г., в 13:58, Filip Pizlo fpi...@apple.com написал(а):

 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is exposed 
 as a JS API or deals with external world can hit sizes over 4Gb. That's not 
 out of reach in most scenarios, not even for resources loaded from network.
 
 Can you provide an example?

Yes. XMLHttpRequest::m_binaryResponseBuilder keeps the downloaded data in a 
Vector, so any time there is much data, something bad will happen. This is a 
case that we should support, and not just crash as we would when we think that 
only exploits would try to use as much memory.

All code that is Blob related also uses Vectors, and of course Blobs can 
legitimately be large.

Crypto code uses Vectors internally for the data.

These and related uses are all over the place - see also Vectors in 
FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, 
plug-in code that loads from network, SharedBuffer etc.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev


Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?

2014-11-19 Thread Chris Dumez
If we don’t want to crash on overflow, the callers can use the try*() API I 
believe (e.g. tryAppend()). This returns false (and does not resize the vector) 
instead of crashing, when we reach the size limit.

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA




 On Nov 19, 2014, at 2:58 PM, Alexey Proskuryakov a...@webkit.org wrote:
 
 
 19 нояб. 2014 г., в 13:58, Filip Pizlo fpi...@apple.com 
 mailto:fpi...@apple.com написал(а):
 
 With Vector though, I don't know how we would differentiate code paths that 
 need large allocations from ones that don't. Nearly anything that is 
 exposed as a JS API or deals with external world can hit sizes over 4Gb. 
 That's not out of reach in most scenarios, not even for resources loaded 
 from network.
 
 Can you provide an example?
 
 Yes. XMLHttpRequest::m_binaryResponseBuilder keeps the downloaded data in a 
 Vector, so any time there is much data, something bad will happen. This is a 
 case that we should support, and not just crash as we would when we think 
 that only exploits would try to use as much memory.
 
 All code that is Blob related also uses Vectors, and of course Blobs can 
 legitimately be large.
 
 Crypto code uses Vectors internally for the data.
 
 These and related uses are all over the place - see also Vectors in 
 FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, 
 plug-in code that loads from network, SharedBuffer etc.
 
 - Alexey
 

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev