Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 24, 2014, at 1:28 AM, Antti Koivisto koivi...@iki.fi wrote:

I don't think this is really a 32-bit vs 64-bit platform issue. The vast majority of 64-bit systems our users have (that is, iOS devices) can't use memory buffers sized anywhere near the 32-bit limit even in theory. Also, when using Vector's auto-grow capabilities (which are really the point of using a vector instead of just allocating a buffer), you need far more memory than the actual data size: growing a Vector<uint8_t> beyond 4GB has a peak allocation of 9GB.

The argument that we should support the same functionality across all devices is a pretty strong one. However, it's not as simple as it may sound.

1. The user impact is different. I haven't seen any reports of people trying this on iOS devices, so fixing the problem there is less important than fixing it on OS X, where people actually do encounter it in practice.

2. The relative cost of supporting large files on memory-constrained devices may be different (e.g. using off_t, as Maciej proposed, might be too big a performance hit). Once we know the cost, we can decide whether the ability to upload large files is worth it.

3. The scope of the work required is likely different. I suspect that uploading such huge files from iOS Safari is currently essentially impossible for other reasons (we can discuss this in more detail offline).

Also, the argument against huge Vectors is a straw man. What I'm saying is that we have a lot of code that already needs to deal with large sizes, and this code is everywhere. It's not as if we have a large-data world and a small-data world inside WebKit and can perform magnitude checks at the boundary. Quite the opposite: everything works together, and the practical solution is to have the checks isolated inside Vector.

Are there any examples of Vectors in the current code base where we would usefully fix an actual problem, even in the medium term, by switching to 64-bit storage sizes?
I don't think they exist. Cases like file uploads should stream the data or use some mapped-file-backed buffer type that is not a Vector.

With modern APIs that are now exposed to JS code, file uploads are no longer an isolated code path. The Blob is sliced, partially read into memory, and processed. It is almost certain that YouTube and Google Drive don't attempt to read huge slices, so 32-bit lengths in Vectors and even ArrayBuffers will probably work in practice. But it's not appropriate for all users of Vector to perform the checks, because failure to do them right will have serious consequences.

- Alexey

___
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
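Antti's 9GB figure follows from Vector's growth policy (capacity + capacity/4, per the tryExpandCapacity code quoted later in this thread): pushing past a 4GB buffer allocates a roughly 5GB replacement while the old 4GB buffer is still live for the copy. A minimal sketch of that arithmetic (the function name is illustrative, not WTF code):

```cpp
#include <cstdint>

// Peak bytes live while growing a Vector<uint8_t> past oldCapacity,
// assuming WTF's capacity + capacity / 4 + 1 growth policy: the old
// buffer must coexist with the new one until the copy completes.
uint64_t peakBytesWhileGrowing(uint64_t oldCapacity)
{
    uint64_t newCapacity = oldCapacity + oldCapacity / 4 + 1;
    return oldCapacity + newCapacity;
}
```

For an old capacity of 4GiB this yields 4GiB + (5GiB + 1 byte), which is the roughly 9GB peak Antti cites.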
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
I don't think this is really a 32-bit vs 64-bit platform issue. The vast majority of 64-bit systems our users have (that is, iOS devices) can't use memory buffers sized anywhere near the 32-bit limit even in theory. Also, when using Vector's auto-grow capabilities (which are really the point of using a vector instead of just allocating a buffer), you need far more memory than the actual data size: growing a Vector<uint8_t> beyond 4GB has a peak allocation of 9GB.

Are there any examples of Vectors in the current code base where we would usefully fix an actual problem, even in the medium term, by switching to 64-bit storage sizes? I don't think they exist. Cases like file uploads should stream the data or use some mapped-file-backed buffer type that is not a Vector.

With this in mind, I think the right direction is to make the Vector API match the implementation and just start using unsigned indexes everywhere.

antti

On Fri, Nov 21, 2014 at 2:59 AM, Maciej Stachowiak m...@apple.com wrote:
On Nov 20, 2014, at 4:51 PM, Maciej Stachowiak m...@apple.com wrote:
On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:
On Nov 19, 2014, at 2:58 PM, Alexey Proskuryakov a...@webkit.org wrote:

These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from the network, SharedBuffer, etc. Another way to say this is: we need to deal with large data arrays throughout loading code. We do not really need full-size vectors in most other code - it's sort of OK for the HTML parser or an image decoder to fail catastrophically when too much data is fed to it. This is somewhat questionable design, but if we are going to stick to it, then magnitude checks should be centralized, not sprinkled throughout the code. We should not make this check explicitly when feeding a network resource to the parser, for example. A 64-bit API for Vector solves this nearly flawlessly.
We do not perform the checks manually every time we use a Vector; Vector does them for us. Other options are:

- uint64_t everywhere. This way, we'll solve the practical problems with large resources once and for all. It may also prove necessary for the YouTube/Google Drive uploads; I do not know that yet.

- size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will still be limited. I like this option, because it won't make us pay the memory and performance cost on old, crippled 32-bit machines, which are unlikely to be used for manipulating huge volumes of data anyway.

We probably want YouTube upload of large files to work on 32-bit machines. Though presumably if you want to upload a file larger than the virtual address space, you need to represent it in some way other than a Vector.

Thinking about this more - I think file sizes should probably be an off_t, not a size_t, so on 32-bit platforms some care must still be taken in the conversion between file sizes and vector sizes. In general, I like the idea of Vector having a size_t API and a choice of smaller or larger internal size. We might also want to change other containers with size-related interfaces to match.

Regards,
Maciej
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Wed, Nov 19, 2014, at 12:20 PM -0800, Chris Dumez wrote:

Hi all,

I recently started updating the WTF::Vector API to use unsigned types instead of size_t [1][2], because:
- WTF::Vector already uses unsigned types for size/capacity internally to save memory on 64-bit, causing a mismatch between the API and the internal representation [3]
- Some reviewers have asked me to use unsigned for loop counters iterating over vectors (which looks unsafe, as the Vector API, e.g. size(), returns a size_t)
- I heard from Joseph that this type mismatch is forcing us (and other projects using WTF) to disable some build-time warnings
- The few people I talked to before making the change said we should do it

I started this [4] some time ago, copy-pasting what Darin told me in a review. Whatever we decide could be documented there.

However, Alexey recently raised concerns about this change: "it doesn't strike him as the right direction. 4Gb is not much, and we should have more of WebKit work with the right data types, not less." I did not initially realize that this change was controversial, but now that it seems it is, I thought I would raise the question on webkit-dev to see what people think about it.

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA

[1] http://trac.webkit.org/changeset/176275
[2] http://trac.webkit.org/changeset/176293
[3] http://trac.webkit.org/changeset/148891
[4] http://trac.webkit.org/wiki/TypesForSize
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 19, 2014, at 2:58 PM, Alexey Proskuryakov a...@webkit.org wrote:

These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from the network, SharedBuffer, etc.

Another way to say this is: we need to deal with large data arrays throughout loading code. We do not really need full-size vectors in most other code - it's sort of OK for the HTML parser or an image decoder to fail catastrophically when too much data is fed to it. This is somewhat questionable design, but if we are going to stick to it, then magnitude checks should be centralized, not sprinkled throughout the code. We should not make this check explicitly when feeding a network resource to the parser, for example.

A 64-bit API for Vector solves this nearly flawlessly. We do not perform the checks manually every time we use a Vector; Vector does them for us. Other options are:

- uint64_t everywhere. This way, we'll solve the practical problems with large resources once and for all. It may also prove necessary for the YouTube/Google Drive uploads; I do not know that yet.

- size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will still be limited. I like this option, because it won't make us pay the memory and performance cost on old, crippled 32-bit machines, which are unlikely to be used for manipulating huge volumes of data anyway. We'll also need to develop some magic for the 53-bit JS bindings, which I'm not sure about.

- Alexey
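Filip's suggestion later in the thread - a size type that can never exceed what JavaScript can represent exactly - gives one shape this "53-bit magic" could take. A hypothetical sketch (the class name and error handling are illustrative, not a WTF proposal):

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical checked size type: JavaScript numbers are IEEE-754
// doubles, so the largest exactly representable integer is 2^53 - 1.
// Any size flowing into JS bindings must stay at or below that.
class JSSafeSize {
public:
    static constexpr uint64_t maxValue = (1ull << 53) - 1;

    explicit JSSafeSize(uint64_t value)
        : m_value(value)
    {
        if (value > maxValue)
            throw std::overflow_error("size exceeds 53-bit limit");
    }

    // Both operands are <= 2^53 - 1, so the 64-bit sum cannot wrap;
    // the constructor re-checks the 53-bit limit on the result.
    JSSafeSize operator+(JSSafeSize other) const { return JSSafeSize(m_value + other.m_value); }

    uint64_t value() const { return m_value; }

private:
    uint64_t m_value;
};
```

The point of the class over a raw uint64_t is that the check lives in one place rather than at every bindings call site.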
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 20, 2014, at 10:45 AM, Filip Pizlo fpi...@apple.com wrote:

- uint64_t everywhere. This way, we'll solve the practical problems with large resources once and for all. It may also prove necessary for the YouTube/Google Drive uploads; I do not know that yet.

How does this solve the problem of 4GB data on 32-bit systems?

OK, that was not very thoughtful of me indeed. This option is not a good one.

Are you saying that because the code that measures file sizes uses a 64-bit type, the code that measures memory object sizes should therefore also use that same type?

I'm not; it seems practical enough to isolate the code that deals with local files, so it doesn't need to affect the design.

- Alexey
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
Yes, good point. I should have been more specific. I believe this math might overflow and then truncate:

    template<typename T, unsigned inlineCapacity, typename OverflowHandler>
    bool Vector<T, inlineCapacity, OverflowHandler>::tryExpandCapacity(unsigned newMinCapacity)
    {
        return tryReserveCapacity(std::max(newMinCapacity,
            std::max(16u, static_cast<unsigned>(capacity() + capacity() / 4 + 1))));
    }

Geoff

On Nov 19, 2014, at 2:04 PM, Chris Dumez cdu...@apple.com wrote:

Hi,

I believe the Vector class is already checking for overflows. I looked quickly, and it seems that when trying to allocate / reallocate the Vector buffer, we will call CRASH() if:

    newCapacity > std::numeric_limits<unsigned>::max() / sizeof(T)

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA

On Nov 19, 2014, at 1:57 PM, Geoffrey Garen gga...@apple.com wrote:

I don't have a specific opinion on size_t vs unsigned, but I have a related point: if Vector uses unsigned, then it must RELEASE_ASSERT that it does not overflow unsigned when growing. Even if most normal content never allocates 4GB, exploit content surely will try to. Chris, maybe you can make this change in trunk, since trunk uses unsigned?

Geoff

On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org wrote:

That makes good sense for the internal implementation; do you think that the class API should also use a custom type, or should it use size_t? With Vector, though, I don't know how we would differentiate code paths that need large allocations from ones that don't. Nearly anything that is exposed as a JS API or deals with the external world can hit sizes over 4Gb. That's not out of reach in most scenarios, not even for resources loaded from the network.

- Alexey

On Nov 19, 2014, at 1:19 PM, Filip Pizlo fpi...@apple.com wrote:

ArrayBuffers are very special because they are part of ECMAScript.
We use unsigned for the length because, once upon a time, that would have been the right type; these days the right type would be a 53-bit integer. So size_t would be wrong. I believe that ArrayBuffers should be changed to use a very special size type; probably it wouldn't even be a primitive type but a class that carefully checked that you never overflowed 53 bits.

-Filip

On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org wrote:

This is not exactly about Vector, but if one uses FileReader.prototype.readAsArrayBuffer() on a large file, I think that it overflows ArrayBuffer. WebKit actually crashes when uploading multi-gigabyte files to YouTube, Google Drive and other similar services, although I haven't checked whether it's because of ArrayBuffers, or because of a use of int/unsigned in another code path. I think that we should use size_t everywhere except for perhaps a few key places where memory impact is critical (and of course we need explicit checks when casting down to an unsigned). Or perhaps the rule can be even simpler: unsigned may never be used for indices and sizes, period.

- Alexey

On Nov 19, 2014, at 12:32 PM, Filip Pizlo fpi...@apple.com wrote:

Whatever we do, the clients of Vector should be consistent about what type they use. I'm actually fine with Vector internally using unsigned even if the API uses size_t, but right now we have lots of code that uses a mix of size_t and unsigned when indexing into Vectors. That's confusing. If I picked one type to use for all Vector indices, it would be unsigned rather than size_t. Vector being limited to unsigned doesn't imply 4GB unless you have a Vector<char>. Usually we have Vectors of pointer-width things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)).
Even in a world where we had more than 32GB of memory to use within a single web process, I would hope that we wouldn't use it all on a single Vector, and that if we did, we would treat that one specially for a bunch of other sensible reasons (among them that allocating a contiguous slab of virtual memory of that size is rather taxing). So size_t would buy us almost nothing, since if we had a vector grow large enough to need it, we would already be having a bad time.

I wonder: are there cases anyone remembers where we have tried to use a Vector to store more than UINT_MAX things? Another interesting question: what's the largest number of things we store into any Vector? We could use such a metric to project how big Vectors might get in the future.

-Filip

On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com wrote:

Hi all,

I recently started updating the WTF::Vector API to use unsigned types instead of size_t [1][2], because:
- WTF::Vector is already using unsigned type for
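Geoff's overflow concern can be seen concretely: in the tryExpandCapacity math he quoted, capacity + capacity/4 + 1 is computed in 32-bit unsigned arithmetic, so a capacity near UINT_MAX wraps around and the "grown" capacity comes out smaller than the current one. A sketch isolating just that math (the function name is illustrative; the formula is from the quoted WTF code):

```cpp
#include <algorithm>

// The growth computation from tryExpandCapacity, isolated. With 32-bit
// unsigned capacities, capacity + capacity / 4 + 1 can wrap modulo 2^32.
unsigned grownCapacity(unsigned newMinCapacity, unsigned capacity)
{
    return std::max(newMinCapacity, std::max(16u, capacity + capacity / 4 + 1));
}
```

For a capacity of 4,000,000,000, the 1.25x product is 5,000,000,001, which wraps to 705,032,705 - less than the capacity we started with. This is exactly the kind of result a RELEASE_ASSERT would need to catch.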
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
I think I'm starting to see what you're saying, so let me restate it my way to be sure. Code that deals with various loaded blobs must be resilient against them getting larger than a 32-bit size can handle, and this is the subject of ongoing development effort. Such code may or may not use a Vector; it definitely must use size_t for all of its sizes. Vector using unsigned in its API then causes the following troubles:

If the code uses a Vector for the large data: in addition to not being able to actually hold all of the data, there is a risk of truncation from size_t to unsigned on the client side of the Vector, which is much harder to prevent than a truncation inside Vector itself. For such code the ideal would be a Vector that measures itself using size_t, both in its internal data structures and in its API.

If the code does not use a Vector for the large data but uses one nonetheless for other things: nothing obviously goes wrong, but you end up in sketchy territory: you have code that is juggling unsigneds for the Vector and size_ts for the large data. Any arithmetic between the two can easily become suspect. Consider a case where a metric of size or offset in the large data flows into some arithmetic that feeds a Vector. If Vector's indexing operations use unsigned, then you get truncation.

These problems fall into two categories: (1) not being able to store the large data at all, but failing gracefully when that happens, and (2) truncation leading to some kind of undesirable behavior.

Someone mentioned warnings that can tell us about (2). Apparently we cannot turn the warning on right now. The hope is that if we convert everything to unsigned, then we won't have the warnings in trunk, we can turn the warning on, and we'll catch such bugs. This seems very aspirational.
Also, it means that everyone hacking on code that mixes sizes of files with sizes of vectors will have to scratch their heads about which type to use, and if they get it wrong they will get some warnings. Of course, the warnings can easily be silenced with a cast. That's super dangerous. In my experience, if the easiest way to make the compiler quiet down is by doing the wrong thing (a static_cast, in this case), then people will do the wrong thing quite frequently.

(A particularly funny example of this is checked exceptions in Java. If you call a method declared to throw an exception, then the caller must either catch it or declare that it throws it, too. The easiest way to silence the error is to catch the exception with an empty catch block that ignores it, since that requires the fewest keystrokes. Someone once did a study of large Java codebases and found such empty catch blocks to be a chronic problem - people did it a lot, and it was responsible for bad bugs. The lesson: a type error is only a good thing if the path of least resistance to silencing the compiler involves actually improving the code.)

An alternate approach is to agree to use size_t for sizes. This is an easy rule to explain, and it should be easy for reviewers to enforce. It might also make that compiler warning more meaningful: in a world where we mix size_t and unsigned, we will *have* to have some casts between the two, but in a world where we use size_t uniformly, any such cast is an immediate red flag.

So, I think I'm starting to agree with the size_t approach. I wonder what the downsides are to this approach. Footprint of Vector?

-Filip

On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:

On Nov 19, 2014, at 2:58 PM, Alexey Proskuryakov a...@webkit.org wrote:

These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from the network, SharedBuffer, etc. Another way to say this is: we need to deal with large data arrays throughout loading code. We do not really need full-size vectors in most other code - it's sort of OK for the HTML parser or an image decoder to fail catastrophically when too much data is fed to it. This is somewhat questionable design, but if we are going to stick to it, then magnitude checks should be centralized, not sprinkled throughout the code. We should not make this check explicitly when feeding a network resource to the parser, for example. A 64-bit API for Vector solves this nearly flawlessly. We do not perform the checks manually every time we use a Vector; Vector does them for us. Other options are: - uint64_t everywhere. This way, we'll solve the practical problems with large resources once and for all. It may also prove necessary for the YouTube/Google Drive uploads; I do not know that yet. - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will still be
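The truncation Filip describes is silent in C++: on an LP64 system, passing a size_t computed from a large blob through an unsigned parameter simply drops the high bits. A minimal illustration (the function is hypothetical, standing in for any unsigned Vector-style API):

```cpp
#include <cstddef>

// Stand-in for an unsigned Vector API entry point: the cast compiles
// cleanly (with at most a warning) and silently keeps only the low
// 32 bits of the offset.
unsigned toVectorIndex(size_t offset)
{
    return static_cast<unsigned>(offset);
}
```

An offset of 2^32 + 7 into a >4GB blob comes back as 7: an in-bounds-looking index pointing at entirely the wrong data, which is why a check inside Vector is safer than one at every call site.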
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
I wonder what the downsides are to this approach. Footprint of Vector?

It looks like the original change was motivated by shrinking Vector: https://bugs.webkit.org/show_bug.cgi?id=97268. Sadly, it didn't include any data on the observed benefit :(.

Geoff
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
If there was a benefit, then the following approach would make sense to me:

- Have both Vector and SmallVector. SmallVector will use unsigned internally.
- Use SmallVector in the places where sizeof(Vector) matters. This would take some effort, but I think we have the tools needed to be systematic about it.
- Both Vector and SmallVector will have an API based on size_t. SmallVector will do truncation checks in all of the places.

-Filip

On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:

I wonder what the downsides are to this approach. Footprint of Vector?

It looks like the original change was motivated by shrinking Vector: https://bugs.webkit.org/show_bug.cgi?id=97268. Sadly, it didn't include any data on the observed benefit :(.

Geoff
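A sketch of what Filip's split might look like at the size boundary - SmallVector storing a 32-bit size internally while exposing size_t, with the truncation check centralized in one place. (SmallVector is a proposal in this thread, not existing WTF code; the names and the assert are illustrative.)

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>

// Illustrative core of the proposed SmallVector: unsigned storage,
// size_t API, with the magnitude check done once, centrally, instead
// of at every call site.
class SmallVectorSize {
public:
    void setSize(size_t newSize)
    {
        // Centralized truncation check: trap (real code would CRASH())
        // rather than silently wrap a size that needs more than 32 bits.
        assert(newSize <= std::numeric_limits<uint32_t>::max());
        m_size = static_cast<uint32_t>(newSize);
    }

    size_t size() const { return m_size; } // widening back is always lossless

private:
    uint32_t m_size { 0 }; // 4 bytes instead of 8: the footprint win
};
```

Callers keep thinking in size_t; only the class itself knows the storage is narrower.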
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
The corresponding Blink bug did contain some performance data: CrBug #229226, https://code.google.com/p/chromium/issues/detail?id=229226

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA

On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:

I wonder what the downsides are to this approach. Footprint of Vector?

It looks like the original change was motivated by shrinking Vector: https://bugs.webkit.org/show_bug.cgi?id=97268. Sadly, it didn't include any data on the observed benefit :(.

Geoff
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
That looks like a pretty good performance win. I'd advocate for SmallVector and Vector, then.

-Filip

On Nov 20, 2014, at 11:38 AM, Chris Dumez cdu...@apple.com wrote:

The corresponding Blink bug did contain some performance data: CrBug #229226, https://code.google.com/p/chromium/issues/detail?id=229226

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA

On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:

I wonder what the downsides are to this approach. Footprint of Vector?

It looks like the original change was motivated by shrinking Vector: https://bugs.webkit.org/show_bug.cgi?id=97268. Sadly, it didn't include any data on the observed benefit :(.

Geoff
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
Or Vector and ByteBuffer. All these cases requiring very large buffers seem to be about untyped data.

antti

On Thu, Nov 20, 2014 at 9:40 PM, Filip Pizlo fpi...@apple.com wrote:

That looks like a pretty good performance win. I'd advocate for SmallVector and Vector, then.

-Filip

On Nov 20, 2014, at 11:38 AM, Chris Dumez cdu...@apple.com wrote:

The corresponding Blink bug did contain some performance data: CrBug #229226, https://code.google.com/p/chromium/issues/detail?id=229226

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA

On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:

I wonder what the downsides are to this approach. Footprint of Vector?

It looks like the original change was motivated by shrinking Vector: https://bugs.webkit.org/show_bug.cgi?id=97268. Sadly, it didn't include any data on the observed benefit :(.

Geoff
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 20, 2014, at 2:06 PM, Alexey Proskuryakov a...@webkit.org wrote:

SmallVector and Vector seem reasonable to me. I think this is the right nomenclature - using limited-size vectors should be a conscious choice. Even SmallVector should probably have a size_t API, so that we can centralize the magnitude checks. What do you think?

Agreed. This avoids any risk of someone computing a size_t quantity and flowing it into the SmallVector API in a way that results in truncation that SmallVector cannot internally catch.

-Filip

- Alexey

On Nov 20, 2014, at 11:40 AM, Filip Pizlo fpi...@apple.com wrote:

That looks like a pretty good performance win. I'd advocate for SmallVector and Vector, then.

-Filip

On Nov 20, 2014, at 11:38 AM, Chris Dumez cdu...@apple.com wrote:

The corresponding Blink bug did contain some performance data: CrBug #229226, https://code.google.com/p/chromium/issues/detail?id=229226

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA

On Nov 20, 2014, at 11:32 AM, Geoffrey Garen gga...@apple.com wrote:

I wonder what the downsides are to this approach. Footprint of Vector?

It looks like the original change was motivated by shrinking Vector: https://bugs.webkit.org/show_bug.cgi?id=97268. Sadly, it didn't include any data on the observed benefit :(.

Geoff
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:

On Nov 19, 2014, at 2:58 PM, Alexey Proskuryakov a...@webkit.org wrote:

These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from the network, SharedBuffer, etc. Another way to say this is: we need to deal with large data arrays throughout loading code. We do not really need full-size vectors in most other code - it's sort of OK for the HTML parser or an image decoder to fail catastrophically when too much data is fed to it. This is somewhat questionable design, but if we are going to stick to it, then magnitude checks should be centralized, not sprinkled throughout the code. We should not make this check explicitly when feeding a network resource to the parser, for example. A 64-bit API for Vector solves this nearly flawlessly. We do not perform the checks manually every time we use a Vector; Vector does them for us. Other options are: - uint64_t everywhere. This way, we'll solve the practical problems with large resources once and for all. It may also prove necessary for the YouTube/Google Drive uploads; I do not know that yet. - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will still be limited. I like this option, because it won't make us pay the memory and performance cost on old, crippled 32-bit machines, which are unlikely to be used for manipulating huge volumes of data anyway.

We probably want YouTube upload of large files to work on 32-bit machines. Though presumably if you want to upload a file larger than the virtual address space, you need to represent it in some way other than a Vector.

- Maciej
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 20, 2014, at 4:51 PM, Maciej Stachowiak m...@apple.com wrote:

On Nov 20, 2014, at 9:26 AM, Alexey Proskuryakov a...@webkit.org wrote:

On Nov 19, 2014, at 2:58 PM, Alexey Proskuryakov a...@webkit.org wrote:

These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from the network, SharedBuffer, etc. Another way to say this is: we need to deal with large data arrays throughout loading code. We do not really need full-size vectors in most other code - it's sort of OK for the HTML parser or an image decoder to fail catastrophically when too much data is fed to it. This is somewhat questionable design, but if we are going to stick to it, then magnitude checks should be centralized, not sprinkled throughout the code. We should not make this check explicitly when feeding a network resource to the parser, for example. A 64-bit API for Vector solves this nearly flawlessly. We do not perform the checks manually every time we use a Vector; Vector does them for us. Other options are: - uint64_t everywhere. This way, we'll solve the practical problems with large resources once and for all. It may also prove necessary for the YouTube/Google Drive uploads; I do not know that yet. - size_t everywhere. Same effect on 64-bit platforms, while 32-bit ones will still be limited. I like this option, because it won't make us pay the memory and performance cost on old, crippled 32-bit machines, which are unlikely to be used for manipulating huge volumes of data anyway.

We probably want YouTube upload of large files to work on 32-bit machines. Though presumably if you want to upload a file larger than the virtual address space, you need to represent it in some way other than a Vector.
Thinking about this more - I think file sizes should probably be an off_t, not a size_t, so on 32-bit platforms some care must still be taken in the conversion between file sizes and vector sizes. In general, I like the idea of Vector having a size_t API and a choice of smaller or larger internal size. We might also want to change other containers with size-related interfaces to match.

Regards,
Maciej
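On a 32-bit platform the conversion Maciej mentions is a narrowing one: with large-file support, off_t is 64-bit while size_t is 32-bit, so a file size must be range-checked before it can become a buffer size. A hedged sketch of such a check (the helper name is illustrative; int64_t stands in for off_t to keep the example portable):

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <optional>

// Illustrative checked conversion from a 64-bit file size (off_t-like)
// to a size_t buffer size. Returns nullopt when the file cannot be
// addressed in memory on this platform, instead of truncating.
std::optional<size_t> fileSizeToBufferSize(int64_t fileSize)
{
    if (fileSize < 0 || static_cast<uint64_t>(fileSize) > std::numeric_limits<size_t>::max())
        return std::nullopt;
    return static_cast<size_t>(fileSize);
}
```

On 64-bit targets the range check is vacuous for any real file; on 32-bit targets it is the place where "file too large to buffer" becomes an explicit, handleable condition.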
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
Whatever we do, the clients of Vector should be consistent about what type they use. I'm actually fine with Vector internally using unsigned even if the API uses size_t, but right now we have lots of code that uses a mix of size_t and unsigned when indexing into Vectors. That's confusing. If I picked one type to use for all Vector indices, it would be unsigned rather than size_t.

Vector being limited to unsigned doesn't imply 4GB unless you have a Vector<char>. Usually we have Vectors of pointer-width things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)). Even in a world where we had more than 32GB of memory to use within a single web process, I would hope that we wouldn't use it all on a single Vector, and that if we did, we would treat that one specially for a bunch of other sensible reasons (among them that allocating a contiguous slab of virtual memory of that size is rather taxing). So size_t would buy us almost nothing, since if we had a vector grow large enough to need it, we would already be having a bad time.

I wonder: are there cases anyone remembers where we have tried to use a Vector to store more than UINT_MAX things? Another interesting question: what's the largest number of things we store into any Vector? We could use such a metric to project how big Vectors might get in the future.

-Filip

On Nov 19, 2014, at 12:20 PM, Chris Dumez cdu...@apple.com wrote:

Hi all,

I recently started updating the WTF::Vector API to use unsigned types instead of size_t [1][2], because:
- WTF::Vector already uses unsigned types for size/capacity internally to save memory on 64-bit, causing a mismatch between the API and the internal representation [3]
- Some reviewers have asked me to use unsigned for loop counters iterating over vectors (which looks unsafe, as the Vector API, e.g. size(), returns a size_t).
- I heard from Joseph that this type mismatch is forcing us (and other projects using WTF) to disable some build time warnings - The few people I talked to before making that change said we should do it However, Alexey recently raised concerns about this change. it doesn't strike him as the right direction. 4Gb is not much, and we should have more of WebKit work with the right data types, not less.”. I did not initially realize that this change was controversial, but now that it seems it is, I thought I would raise the question on webkit-dev to see what people think about this. Kr, -- Chris Dumez - Apple Inc. Cupertino, CA [1] http://trac.webkit.org/changeset/176275 http://trac.webkit.org/changeset/176275 [2] http://trac.webkit.org/changeset/176293 http://trac.webkit.org/changeset/176293 [3] http://trac.webkit.org/changeset/148891 http://trac.webkit.org/changeset/148891 ___ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev ___ webkit-dev mailing list webkit-dev@lists.webkit.org https://lists.webkit.org/mailman/listinfo/webkit-dev
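The API/representation mismatch Chris describes can be sketched with a hypothetical toy container (not WTF code): the size is stored as unsigned to keep the header small, but exposed as size_t, so callers that use unsigned counters must narrow a 64-bit value on every access, which is exactly what triggers the build-time warnings.

```cpp
#include <cstddef>

// Toy container mirroring the mismatch: 32-bit storage, size_t API.
struct TinyVector {
    unsigned m_size { 0 };                   // internal: 32-bit to save memory on 64-bit
    size_t size() const { return m_size; }   // API: widens to size_t on return
};

// A caller written with unsigned indices has to narrow size() explicitly,
// otherwise -Wshorten-64-to-32 style warnings fire on 64-bit builds.
unsigned narrowedSize(const TinyVector& v)
{
    return static_cast<unsigned>(v.size());
}
```

Making the API type match the storage type removes the narrowing cast at every call site.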
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
This is not exactly about Vector, but if one uses FileReader.prototype.readAsArrayBuffer() on a large file, I think that it overflows ArrayBuffer. WebKit actually crashes when uploading multi-gigabyte files to YouTube, Google Drive and other similar services, although I haven't checked whether it's because of ArrayBuffers, or because of a use of int/unsigned in another code path.

I think that we should use size_t everywhere except for perhaps a few key places where memory impact is critical (and of course we need explicit checks when casting down to an unsigned). Or perhaps the rule can be even simpler, and unsigned may never be used for indices and sizes, period.

- Alexey
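The kind of overflow Alexey suspects is easy to demonstrate in isolation (a hypothetical illustration, not the actual WebKit code path): storing a multi-gigabyte byte count in an unsigned silently wraps modulo 2^32, so any buffer sized from the truncated value is far too small.

```cpp
#include <cstdint>

// 5 GiB expressed as a 64-bit byte count.
uint64_t fileSizeBytes = 5ull * 1024 * 1024 * 1024;

// Narrowing to unsigned wraps: 5 GiB mod 4 GiB leaves 1 GiB.
unsigned truncated = static_cast<unsigned>(fileSizeBytes);
```

A buffer allocated with `truncated` bytes would then be written with 5 GiB of data, which is one plausible mechanism for the crashes described above.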
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
ArrayBuffers are very special because they are part of ECMAScript. We use unsigned for the length because once upon a time, that would have been the right type; these days the right type would be a 53-bit integer. So, size_t would be wrong. I believe that ArrayBuffers should be changed to use a very special size type; probably it wouldn't even be a primitive type but a class that carefully checked that you never overflowed 53 bits.

-Filip
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
I agree with Phil, we should just use unsigned consistently. For very large memory allocations Vector is probably not the right type to use.

antti
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
I agree with Filip and Antti. The internal type of Vector is an orthogonal issue. Keeping the old size_t API with the unsigned internal has caused a mess of types in the codebase and some confusion for contributors. We should fix the API and clean up our code.

There are cases where we can expect large vectors. It would be reasonable to have the storage type as a template argument of Vector. But in most cases we have a lot of very small vectors, not a small number of large vectors. We could also have a BigVector class that would be optimized for bigger allocations.

Benjamin
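Benjamin's template-argument suggestion can be sketched as follows (a hypothetical design, not WTF's actual Vector): the size storage type becomes a template parameter, so the common case keeps a compact 32-bit header while a BigVector alias opts into 64-bit sizes.

```cpp
#include <cstddef>
#include <cstdint>

// Toy vector whose size/capacity storage width is a template parameter.
template<typename T, typename SizeType = uint32_t>
class SizedVector {
public:
    size_t size() const { return m_size; }              // public API stays size_t
    void setSizeForTesting(SizeType size) { m_size = size; }

private:
    T* m_buffer { nullptr };
    SizeType m_size { 0 };       // storage width chosen per instantiation
    SizeType m_capacity { 0 };
};

// The rare large-allocation case is then just an alias with 64-bit sizes:
template<typename T>
using BigVector = SizedVector<T, uint64_t>;
```

This keeps the memory win for the many small vectors while giving large-data code paths a type that cannot overflow at 4G elements.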
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
That makes good sense for internal implementation. Do you think that the class API should also use a custom type, or should it use size_t?

With Vector though, I don't know how we would differentiate code paths that need large allocations from ones that don't. Nearly anything that is exposed as a JS API or deals with the external world can hit sizes over 4Gb. That's not out of reach in most scenarios, not even for resources loaded from the network.

- Alexey
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
I don't have a specific opinion on size_t vs unsigned, but I have a related point: if Vector uses unsigned, then it must RELEASE_ASSERT that it does not overflow unsigned when growing. Even if most normal content allocates less than 4GB, exploit content surely will try to allocate more than 4GB.

Chris, maybe you can make this change in trunk, since trunk uses unsigned?

Geoff
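A minimal sketch of the check Geoff asks for (hypothetical, not WTF's growth policy): capacity growth on an unsigned-sized vector must trap rather than wrap when the next capacity would exceed UINT_MAX, and doing the arithmetic in 64 bits makes the comparison itself overflow-proof.

```cpp
#include <cstdint>
#include <cstdlib>
#include <limits>

// Double the capacity, trapping (in lieu of RELEASE_ASSERT) if the result
// would no longer fit in unsigned.
unsigned checkedGrowCapacity(unsigned currentCapacity)
{
    uint64_t next = currentCapacity ? 2ull * currentCapacity : 16; // 64-bit math cannot wrap here
    if (next > std::numeric_limits<unsigned>::max())
        abort(); // exploit content will deliberately push the size past UINT_MAX
    return static_cast<unsigned>(next);
}
```

The key property is that the overflow check happens before any narrowing back to unsigned.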
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov <a...@webkit.org> wrote:

> That makes good sense for internal implementation, do you think that class API should also use a custom type, or should it use size_t?

I would propose having an Int53 type in JSC, because that is now a thing in ECMAScript. ArrayBuffer would use it. Int53 could have an implicit conversion to size_t, but not the other way around.

> With Vector though, I don't know how we would differentiate code paths that need large allocations from ones that don't. Nearly anything that is exposed as a JS API or deals with external world can hit sizes over 4Gb. That's not out of reach in most scenarios, not even for resources loaded from network.

Can you provide an example?

- Except for ArrayBuffers, other kinds of JS arrays are still semantically limited to length <= 2^31 - 2.
- Anytime we have an array of structured data where individual elements are large-ish, a vector of size UINT_MAX would not imply 4GB. It would imply something much larger. All it takes is Vector<void*>, for example; even with the size being an unsigned, the size limit is actually 32GB on 64-bit. So, I don't think it's accurate to speak of a 4GB limit, since this has nothing to do with bytes.

-Filip
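The Int53 proposal can be sketched as a small checked class (a hypothetical illustration; JSC has no such type in this form): construction from an arbitrary integer is explicit and verified against the 53-bit limit that ECMAScript numbers can represent exactly, while conversion out to size_t is implicit, giving the one-way conversion Filip describes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

class Int53 {
public:
    static constexpr uint64_t maxValue = (1ull << 53) - 1; // 9007199254740991

    explicit Int53(uint64_t value)
        : m_value(value)
    {
        if (value > maxValue)
            abort(); // checked construction: never silently overflow 53 bits
    }

    // Implicit conversion out is safe on 64-bit targets; 32-bit targets would
    // still need a range check before this narrowing.
    operator size_t() const { return static_cast<size_t>(m_value); }

private:
    uint64_t m_value;
};

size_t byteLength(Int53 length) { return length; } // Int53 -> size_t: allowed
// Int53 bad = size_t(0);  // would not compile: conversion in is explicit
```

The asymmetry encodes the rule "an Int53 is always a valid size, but a size is not automatically a valid Int53".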
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
Hi, I believe the Vector class is already checking for overflows. I looked quickly and it seems that when trying to allocate / reallocate the Vector buffer, we will call CRASH() if: (newCapacity std::numeric_limitsunsigned::max() / sizeof(T)) Kr, -- Chris Dumez - Apple Inc. Cupertino, CA On Nov 19, 2014, at 1:57 PM, Geoffrey Garen gga...@apple.com wrote: I don’t haver a specific opinion on size_t vs unsigned, but I have a related point: If Vector uses unsigned, then it must RELEASE_ASSERT that it does not overflow unsigned when growing. Even if most normal content allocates 4GB, exploit content surely will try to allocate 4GB. Chris, maybe you can make this change in trunk, since trunk uses unsigned? Geoff On Nov 19, 2014, at 1:50 PM, Alexey Proskuryakov a...@webkit.org mailto:a...@webkit.org wrote: That makes good sense for internal implementation, do you think that class API should also use a custom type, or should it use size_t? With Vector though, I don't know how we would differentiate code paths that need large allocations from ones that don't. Nearly anything that is exposed as a JS API or deals with external world can hit sizes over 4Gb. That's not out of reach in most scenarios, not even for resources loaded from network. - Alexey 19 нояб. 2014 г., в 13:19, Filip Pizlo fpi...@apple.com mailto:fpi...@apple.com написал(а): ArrayBuffers are very special because they are part of ECMAScript. We use unsigned for the length, because once upon a time, that would have been the right type; these days the right type would be a 53-bit integer. So, size_t would be wrong. I believe that ArrayBuffers should be changed to use a very special size type; probably it wouldn't even be a primitive type but a class that carefully checked that you never overflowed 53 bits. 
-Filip

On Nov 19, 2014, at 12:54 PM, Alexey Proskuryakov a...@webkit.org wrote:

This is not exactly about Vector, but if one uses FileReader.prototype.readAsArrayBuffer() on a large file, I think that it overflows ArrayBuffer. WebKit actually crashes when uploading multi-gigabyte files to YouTube, Google Drive and other similar services, although I haven't checked whether it's because of ArrayBuffers, or because of a use of int/unsigned in another code path.

I think that we should use size_t everywhere except for perhaps a few key places where memory impact is critical (and of course we need explicit checks when casting down to an unsigned). Or perhaps the rule can be even simpler, and unsigned may never be used for indices and sizes, period.

- Alexey

On Nov 19, 2014, at 12:32, Filip Pizlo fpi...@apple.com wrote:

Whatever we do, the clients of Vector should be consistent about what type they use. I'm actually fine with Vector internally using unsigned even if the API uses size_t, but right now we have lots of code that uses a mix of size_t and unsigned when indexing into Vectors. That's confusing.

If I picked one type to use for all Vector indices, it would be unsigned rather than size_t. Vector being limited to unsigned doesn't imply 4GB unless you have a Vector<char>. Usually we have Vectors of pointer-width things, which means 32GB on 64-bit systems (UINT_MAX * sizeof(void*)). Even in a world where we had more than 32GB of memory to use within a single web process, I would hope that we wouldn't use it all on a single Vector, and that if we did, we would treat that one specially for a bunch of other sensible reasons (among them being that allocating a contiguous slab of virtual memory of that size is rather taxing). So, size_t would buy us almost nothing, since if we had a vector grow large enough to need it, we would be having a bad time already.
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On a related note, I’ve also got a patch up for review that adds bounds checking to Vector::iterator, and that vigorously performs bounds and overflow checking.

—Oliver
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
On Nov 19, 2014, at 13:58, Filip Pizlo fpi...@apple.com wrote:

>> With Vector though, I don't know how we would differentiate code paths that need large allocations from ones that don't. Nearly anything that is exposed as a JS API or deals with the external world can hit sizes over 4Gb. That's not out of reach in most scenarios, not even for resources loaded from the network.
>
> Can you provide an example?

Yes. XMLHttpRequest::m_binaryResponseBuilder keeps the downloaded data in a Vector, so any time there is a lot of data, something bad will happen. This is a case that we should support, and not just crash as we would when we think that only exploits would try to use that much memory.

All code that is Blob related also uses Vectors, and of course Blobs can legitimately be large. Crypto code uses Vectors internally for the data. These and related uses are all over the place - see also Vectors in FormDataBuilder, data returned from FrameLoader::loadResourceSynchronously, plug-in code that loads from the network, SharedBuffer, etc.

- Alexey

_______________________________________________
webkit-dev mailing list
webkit-dev@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-dev
Re: [webkit-dev] size_t vs unsigned in WTF::Vector API ?
If we don’t want to crash on overflow, callers can use the try*() API, I believe (e.g. tryAppend()). This returns false (and does not resize the vector) instead of crashing when we reach the size limit.

Kr,
--
Chris Dumez - Apple Inc.
Cupertino, CA