Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-15 Thread Jonas Sicking
On Mon, Feb 14, 2011 at 11:38 PM, Pablo Castro wrote:
> (sorry for my random out-of-timing previous email on this thread. please see 
> below for an actually up to date reply)
>
> -----Original Message-----
> From: Jonas Sicking [mailto:jo...@sicking.cc]
> Sent: Monday, February 07, 2011 3:31 PM
>
> On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow  wrote:
>> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking  wrote:
>>>
>>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow  wrote:
>>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:
>>> >>
>>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow 
>>> >> wrote:
>>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
>>> >> > wrote:
>>> >> >>
>>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>>> >> >>>
>>> >> >>> My current thinking is that we should have some relatively large
>>> >> >>> limit...maybe on the order of 64k?  It seems like it'd be very
>>> >> >>> difficult
>>> >> >>> to
>>> >> >>> hit such a limit with any sort of legitimate use case, and the
>>> >> >>> chances
>>> >> >>> of
>>> >> >>> some subtle data-dependent error would be much less.  But a 1GB key
>>> >> >>> is
>>> >> >>> just
>>> >> >>> not going to work well in any implementation (if it doesn't simply
>>> >> >>> oom
>>> >> >>> the
>>> >> >>> process!).  So despite what I said earlier, I guess I think we
>>> >> >>> should
>>> >> >>> have
>>> >> >>> some limit...but keep it an order of magnitude or two larger than
>>> >> >>> what
>>> >> >>> we
>>> >> >>> expect any legitimate usage to hit just to keep the system as
>>> >> >>> flexible
>>> >> >>> as
>>> >> >>> possible.
>>> >> >>>
>>> >> >>> Does that sound reasonable to people?
>>> >> >>
>>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>>> >> >>  I'm
>>> >> >> hesitant to spec an exact size as a MUST given how technology has a
>>> >> >> way
>>> >> >> of
>>> >> >> changing in unexpected ways that makes old constraints obsolete.
>>> >> >>  But
>>> >> >> then,
>>> >> >> I may just be overly concerned about this too.
>>> >> >
>>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>>> >> > develop
>>> >> > against one of the implementations that don't place a limit and then
>>> >> > their
>>> >> > app would break on the others.
>>> >> > The reason that I suggested 64K is that it seems outrageously big for
>>> >> > the
>>> >> > data types that we're looking at.  But it's too small to do much with
>>> >> > base64
>>> >> > encoding binary blobs into it or anything else like that that I could
>>> >> > see
>>> >> > becoming rather large.  So it seems like a limit that'd avoid major
>>> >> > abuses
>>> >> > (where someone is probably approaching the problem wrong) but would
>>> >> > not
>>> >> > come
>>> >> > close to limiting any practical use I can imagine.
>>> >> > With our architecture in Chrome, we will probably need to have some
>>> >> > limit.
>>> >> >  We haven't decided what that is yet, but since I remember others
>>> >> > saying
>>> >> > similar things when we talked about this at TPAC, it seems like it
>>> >> > might
>>> >> > be
>>> >> > best to standardize it--even though it does feel a bit dirty.
>>> >>
>>> >> One problem with putting a limit is that it basically forces
>>> >> implementations to use a specific encoding, or pay a hefty price. For
>>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>>> >> data? If it is of UTF8 data, and the implementation uses something
>>> >> else to store the data, you risk having to convert the data just to
>>> >> measure the size. Possibly this would be different if we measured size
>>> >> using UTF16 as javascript more or less enforces that the source string
>>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>>> >> even if the stored data uses a different format.
>>> >
>>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>>> > storage and have non-normative text saying that
>>> > most implementations will
>>> > likely have some limit?  Maybe we can at least spec a minimum limit in
>>> > terms
>>> > of a particular character encoding?  (Implementations could translate
>>> > this
>>> > into the worst case size for their own native encoding and then ensure
>>> > their
>>> > limit is higher.)
>>>
>>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>>> limit. Like Shawn points out, this API is fairly geared towards
>>> JavaScript anyway (and I personally don't think that's a bad thing).
>>> One thing that I just thought of is that even if implementations use
>>> other encodings, you can in the vast majority of cases do a worst-case
>>> estimate and easily see that the key that is used is below 64K.
>>>
>>> That said, does having a 64K limit really help anyone? In SQLite we
>>> can easily store vastly more than that, enough that we don't have to
>>> specify a limit. And my understanding is that in the Microsoft
>>> implementation, the limits for what they can store without resorting
>>> to various tricks, is much lower. So since that implementation will
>>> have to implement special handling of long keys anyway, is there a
>>> difference between saying a 64K limit vs. saying unlimited?

RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-14 Thread Pablo Castro
(sorry for my random out-of-timing previous email on this thread. please see 
below for an actually up to date reply)

-----Original Message-----
From: Jonas Sicking [mailto:jo...@sicking.cc] 
Sent: Monday, February 07, 2011 3:31 PM

On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow  wrote:
> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking  wrote:
>>
>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow  wrote:
>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:
>> >>
>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow 
>> >> wrote:
>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
>> >> > wrote:
>> >> >>
>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >> >>>
>> >> >>> My current thinking is that we should have some relatively large
>> >> >>> limit...maybe on the order of 64k?  It seems like it'd be very
>> >> >>> difficult
>> >> >>> to
>> >> >>> hit such a limit with any sort of legitimate use case, and the
>> >> >>> chances
>> >> >>> of
>> >> >>> some subtle data-dependent error would be much less.  But a 1GB key
>> >> >>> is
>> >> >>> just
>> >> >>> not going to work well in any implementation (if it doesn't simply
>> >> >>> oom
>> >> >>> the
>> >> >>> process!).  So despite what I said earlier, I guess I think we
>> >> >>> should
>> >> >>> have
>> >> >>> some limit...but keep it an order of magnitude or two larger than
>> >> >>> what
>> >> >>> we
>> >> >>> expect any legitimate usage to hit just to keep the system as
>> >> >>> flexible
>> >> >>> as
>> >> >>> possible.
>> >> >>>
>> >> >>> Does that sound reasonable to people?
>> >> >>
>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>> >> >>  I'm
>> >> >> hesitant to spec an exact size as a MUST given how technology has a
>> >> >> way
>> >> >> of
>> >> >> changing in unexpected ways that makes old constraints obsolete.
>> >> >>  But
>> >> >> then,
>> >> >> I may just be overly concerned about this too.
>> >> >
>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>> >> > develop
>> >> > against one of the implementations that don't place a limit and then
>> >> > their
>> >> > app would break on the others.
>> >> > The reason that I suggested 64K is that it seems outrageously big for
>> >> > the
>> >> > data types that we're looking at.  But it's too small to do much with
>> >> > base64
>> >> > encoding binary blobs into it or anything else like that that I could
>> >> > see
>> >> > becoming rather large.  So it seems like a limit that'd avoid major
>> >> > abuses
>> >> > (where someone is probably approaching the problem wrong) but would
>> >> > not
>> >> > come
>> >> > close to limiting any practical use I can imagine.
>> >> > With our architecture in Chrome, we will probably need to have some
>> >> > limit.
>> >> >  We haven't decided what that is yet, but since I remember others
>> >> > saying
>> >> > similar things when we talked about this at TPAC, it seems like it
>> >> > might
>> >> > be
>> >> > best to standardize it--even though it does feel a bit dirty.
>> >>
>> >> One problem with putting a limit is that it basically forces
>> >> implementations to use a specific encoding, or pay a hefty price. For
>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>> >> data? If it is of UTF8 data, and the implementation uses something
>> >> else to store the data, you risk having to convert the data just to
>> >> measure the size. Possibly this would be different if we measured size
>> >> using UTF16 as javascript more or less enforces that the source string
>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>> >> even if the stored data uses a different format.
>> >
>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>> > storage and have non-normative text saying that
>> > most implementations will
>> > likely have some limit?  Maybe we can at least spec a minimum limit in
>> > terms
>> > of a particular character encoding?  (Implementations could translate
>> > this
>> > into the worst case size for their own native encoding and then ensure
>> > their
>> > limit is higher.)
>>
>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>> limit. Like Shawn points out, this API is fairly geared towards
>> JavaScript anyway (and I personally don't think that's a bad thing).
>> One thing that I just thought of is that even if implementations use
>> other encodings, you can in the vast majority of cases do a worst-case
>> estimate and easily see that the key that is used is below 64K.
>>
>> That said, does having a 64K limit really help anyone? In SQLite we
>> can easily store vastly more than that, enough that we don't have to
>> specify a limit. And my understanding is that in the Microsoft
>> implementation, the limits for what they can store without resorting
>> to various tricks, is much lower. So since that implementation will
>> have to implement special handling of long keys anyway, is there a
>> difference between saying a 64K limit vs. saying unlimited?

RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-14 Thread Pablo Castro

>> From: jor...@google.com [mailto:jor...@google.com] On Behalf Of Jeremy Orlow
>> Sent: Sunday, February 06, 2011 12:43 PM
>>
>> On Tue, Dec 14, 2010 at 4:26 PM, Pablo Castro  
>> wrote:
>>
>> From: jor...@google.com [mailto:jor...@google.com] On Behalf Of Jeremy Orlow
>> Sent: Tuesday, December 14, 2010 4:23 PM
>>
>> >> On Wed, Dec 15, 2010 at 12:19 AM, Pablo Castro 
>> >>  wrote:
>> >>
>> >> From: public-webapps-requ...@w3.org 
>> >> [mailto:public-webapps-requ...@w3.org] On Behalf Of Jonas Sicking
>> >> Sent: Friday, December 10, 2010 1:42 PM
>> >>
>> >> >> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  
>> >> >> wrote:
>> >> >> > Any more thoughts on this?
>> >> >>
>> >> >> I don't feel strongly one way or another. Implementation wise I don't
>> >> >> really understand why implementations couldn't use keys of unlimited
>> >> >> size. I wouldn't imagine implementations would want to use fixed-size
>> >> >> allocations for every key anyway, right (which would be a strong
>> >> >> reason to keep maximum size down).
>> >> I don't have a very strong opinion either. I don't quite agree with the 
>> >> guideline of "having something working slowly is better than not working 
>> >> at all"...as having something not work at all sometimes may help 
>> >> developers hit a wall and think differently about their approach for a 
>> >> given problem. That said, if folks think this is an instance where we're 
>> >> better off not having a limit I'm fine with it.
>> >>
>> >> My only concern is that the developer might not hit this wall, but then 
>> >> some user (doing things the developer didn't fully anticipate) could hit 
>> >> that wall.  I can definitely see both sides of the argument though.  And 
>> >> elsewhere we've headed more in the direction of forcing the developer to 
>> >> think about performance, but this case seems a bit more non-deterministic 
>> >> than any of those.
 >>
>> Yeah, that's a good point for this case, avoiding data-dependent errors is 
>> probably worth the perf hit.
 >>
>> My current thinking is that we should have some relatively large 
>> limit...maybe on the order of 64k?  It seems like it'd be very difficult to 
>> hit such a limit with any sort of legitimate use case, and the chances of 
>> some subtle data-dependent error would be much less.  But a 1GB key is just 
>> not going to work well in any implementation (if it doesn't simply oom the 
>> process!).  So despite what I said earlier, I guess I think we should have 
>> some limit...but keep it an order of magnitude or two larger than what we 
>> expect any legitimate usage to hit just to keep the system as flexible as 
>> possible.
>>
>> Does that sound reasonable to people?

I thought we were trying to avoid data-dependent errors and thus shooting for 
having no limit (which may translate into having very large limits in actual 
implementations but not the kind of thing you'd typically hit).  

Specifying an exact size may be a bit weird...I guess an alternative could be 
to spec the minimum size UAs need to support. A related problem is what 
units this is specified in; if it's bytes, then developers need to 
make assumptions about how strings are stored or something.

-pablo
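
For example, if the minimum were specified in UTF-16 code units, an 
implementation that stores keys as UTF-8 could translate it into a worst-case 
byte budget; a hypothetical sketch, with no numbers here coming from the spec:

    var MIN_CODE_UNITS = 64 * 1024;              // hypothetical specced minimum
    // A UTF-16 code unit encodes to at most 3 UTF-8 bytes, so a UTF-8-based
    // store would have to accept keys up to this many bytes.
    var WORST_CASE_UTF8_BYTES = MIN_CODE_UNITS * 3;
    console.log(WORST_CASE_UTF8_BYTES);          // 196608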
 



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-07 Thread Jonas Sicking
On Mon, Feb 7, 2011 at 3:07 PM, Jeremy Orlow  wrote:
> On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking  wrote:
>>
>> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow  wrote:
>> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:
>> >>
>> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow 
>> >> wrote:
>> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
>> >> > wrote:
>> >> >>
>> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >> >>>
>> >> >>> My current thinking is that we should have some relatively large
>> >> >>> limit...maybe on the order of 64k?  It seems like it'd be very
>> >> >>> difficult
>> >> >>> to
>> >> >>> hit such a limit with any sort of legitimate use case, and the
>> >> >>> chances
>> >> >>> of
>> >> >>> some subtle data-dependent error would be much less.  But a 1GB key
>> >> >>> is
>> >> >>> just
>> >> >>> not going to work well in any implementation (if it doesn't simply
>> >> >>> oom
>> >> >>> the
>> >> >>> process!).  So despite what I said earlier, I guess I think we
>> >> >>> should
>> >> >>> have
>> >> >>> some limit...but keep it an order of magnitude or two larger than
>> >> >>> what
>> >> >>> we
>> >> >>> expect any legitimate usage to hit just to keep the system as
>> >> >>> flexible
>> >> >>> as
>> >> >>> possible.
>> >> >>>
>> >> >>> Does that sound reasonable to people?
>> >> >>
>> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>> >> >>  I'm
>> >> >> hesitant to spec an exact size as a MUST given how technology has a
>> >> >> way
>> >> >> of
>> >> >> changing in unexpected ways that makes old constraints obsolete.
>> >> >>  But
>> >> >> then,
>> >> >> I may just be overly concerned about this too.
>> >> >
>> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>> >> > develop
>> >> > against one of the implementations that don't place a limit and then
>> >> > their
>> >> > app would break on the others.
>> >> > The reason that I suggested 64K is that it seems outrageously big for
>> >> > the
>> >> > data types that we're looking at.  But it's too small to do much with
>> >> > base64
>> >> > encoding binary blobs into it or anything else like that that I could
>> >> > see
>> >> > becoming rather large.  So it seems like a limit that'd avoid major
>> >> > abuses
>> >> > (where someone is probably approaching the problem wrong) but would
>> >> > not
>> >> > come
>> >> > close to limiting any practical use I can imagine.
>> >> > With our architecture in Chrome, we will probably need to have some
>> >> > limit.
>> >> >  We haven't decided what that is yet, but since I remember others
>> >> > saying
>> >> > similar things when we talked about this at TPAC, it seems like it
>> >> > might
>> >> > be
>> >> > best to standardize it--even though it does feel a bit dirty.
>> >>
>> >> One problem with putting a limit is that it basically forces
>> >> implementations to use a specific encoding, or pay a hefty price. For
>> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>> >> data? If it is of UTF8 data, and the implementation uses something
>> >> else to store the data, you risk having to convert the data just to
>> >> measure the size. Possibly this would be different if we measured size
>> >> using UTF16 as javascript more or less enforces that the source string
>> >> is UTF16 which means that you can measure utf16 size on the cheap,
>> >> even if the stored data uses a different format.
>> >
>> > That's a very good point.  What's your suggestion then?  Spec unlimited
>> > storage and have non-normative text saying that
>> > most implementations will
>> > likely have some limit?  Maybe we can at least spec a minimum limit in
>> > terms
>> > of a particular character encoding?  (Implementations could translate
>> > this
>> > into the worst case size for their own native encoding and then ensure
>> > their
>> > limit is higher.)
>>
>> I'm fine with relying on UTF16 encoding size and specifying a 64K
>> limit. Like Shawn points out, this API is fairly geared towards
>> JavaScript anyway (and I personally don't think that's a bad thing).
>> One thing that I just thought of is that even if implementations use
>> other encodings, you can in the vast majority of cases do a worst-case
>> estimate and easily see that the key that is used is below 64K.
>>
>> That said, does having a 64K limit really help anyone? In SQLite we
>> can easily store vastly more than that, enough that we don't have to
>> specify a limit. And my understanding is that in the Microsoft
>> implementation, the limits for what they can store without resorting
>> to various tricks, is much lower. So since that implementation will
>> have to implement special handling of long keys anyway, is there a
>> difference between saying a 64K limit vs. saying unlimited?
>
> As I explained earlier: "The reason that I suggested 64K is that it seems
> outrageously big for the data types that we're looking at.  But it's too
> small to do much with base64 encoding binary blobs into it or anything else
> like that that I could see becoming rather large.  So it seems like a limit
> that'd avoid major abuses (where someone is probably approaching the problem
> wrong) but would not come close to limiting any practical use I can imagine."

Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-07 Thread Jeremy Orlow
On Mon, Feb 7, 2011 at 2:49 PM, Jonas Sicking  wrote:

> On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow  wrote:
> > On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:
> >>
> >> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow 
> wrote:
> >> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
> >> > wrote:
> >> >>
> >> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
> >> >>>
> >> >>> My current thinking is that we should have some relatively large
> >> >>> limit...maybe on the order of 64k?  It seems like it'd be very
> >> >>> difficult
> >> >>> to
> >> >>> hit such a limit with any sort of legitimate use case, and the
> chances
> >> >>> of
> >> >>> some subtle data-dependent error would be much less.  But a 1GB key
> is
> >> >>> just
> >> >>> not going to work well in any implementation (if it doesn't simply
> oom
> >> >>> the
> >> >>> process!).  So despite what I said earlier, I guess I think we
> should
> >> >>> have
> >> >>> some limit...but keep it an order of magnitude or two larger than
> what
> >> >>> we
> >> >>> expect any legitimate usage to hit just to keep the system as
> flexible
> >> >>> as
> >> >>> possible.
> >> >>>
> >> >>> Does that sound reasonable to people?
> >> >>
> >> >> Are we thinking about making this a MUST requirement, or a SHOULD?
>  I'm
> >> >> hesitant to spec an exact size as a MUST given how technology has a
> way
> >> >> of
> >> >> changing in unexpected ways that makes old constraints obsolete.  But
> >> >> then,
> >> >> I may just be overly concerned about this too.
> >> >
> >> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
> >> > develop
> >> > against one of the implementations that don't place a limit and then
> >> > their
> >> > app would break on the others.
> >> > The reason that I suggested 64K is that it seems outrageously big for
> >> > the
> >> > data types that we're looking at.  But it's too small to do much with
> >> > base64
> >> > encoding binary blobs into it or anything else like that that I could
> >> > see
> >> > becoming rather large.  So it seems like a limit that'd avoid major
> >> > abuses
> >> > (where someone is probably approaching the problem wrong) but would
> not
> >> > come
> >> > close to limiting any practical use I can imagine.
> >> > With our architecture in Chrome, we will probably need to have some
> >> > limit.
> >> >  We haven't decided what that is yet, but since I remember others
> saying
> >> > similar things when we talked about this at TPAC, it seems like it
> might
> >> > be
> >> > best to standardize it--even though it does feel a bit dirty.
> >>
> >> One problem with putting a limit is that it basically forces
> >> implementations to use a specific encoding, or pay a hefty price. For
> >> example if we choose a 64K limit, is that of UTF8 data or of UTF16
> >> data? If it is of UTF8 data, and the implementation uses something
> >> else to store the data, you risk having to convert the data just to
> >> measure the size. Possibly this would be different if we measured size
> >> using UTF16 as javascript more or less enforces that the source string
> >> is UTF16 which means that you can measure utf16 size on the cheap,
> >> even if the stored data uses a different format.
> >
> > That's a very good point.  What's your suggestion then?  Spec unlimited
> > storage and have non-normative text saying that most implementations will
> > likely have some limit?  Maybe we can at least spec a minimum limit in
> terms
> > of a particular character encoding?  (Implementations could translate
> this
> > into the worst case size for their own native encoding and then ensure
> their
> > limit is higher.)
>
> I'm fine with relying on UTF16 encoding size and specifying a 64K
> limit. Like Shawn points out, this API is fairly geared towards
> JavaScript anyway (and I personally don't think that's a bad thing).
> One thing that I just thought of is that even if implementations use
> other encodings, you can in the vast majority of cases do a worst-case
> estimate and easily see that the key that is used is below 64K.
>
> That said, does having a 64K limit really help anyone? In SQLite we
> can easily store vastly more than that, enough that we don't have to
> specify a limit. And my understanding is that in the Microsoft
> implementation, the limits for what they can store without resorting
> to various tricks, is much lower. So since that implementation will
> have to implement special handling of long keys anyway, is there a
> difference between saying a 64K limit vs. saying unlimited?
>

As I explained earlier: "The reason that I suggested 64K is that it seems
outrageously big for the data types that we're looking at.  But it's too
small to do much with base64 encoding binary blobs into it or anything else
like that that I could see becoming rather large.  So it seems like a limit
that'd avoid major abuses (where someone is probably approaching the problem
wrong) but would not come close to limiting any practical use I can imagine."

Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-07 Thread Jonas Sicking
On Sun, Feb 6, 2011 at 11:41 PM, Jeremy Orlow  wrote:
> On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:
>>
>> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow  wrote:
>> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
>> > wrote:
>> >>
>> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>> >>>
>> >>> My current thinking is that we should have some relatively large
>> >>> limit...maybe on the order of 64k?  It seems like it'd be very
>> >>> difficult
>> >>> to
>> >>> hit such a limit with any sort of legitimate use case, and the chances
>> >>> of
>> >>> some subtle data-dependent error would be much less.  But a 1GB key is
>> >>> just
>> >>> not going to work well in any implementation (if it doesn't simply oom
>> >>> the
>> >>> process!).  So despite what I said earlier, I guess I think we should
>> >>> have
>> >>> some limit...but keep it an order of magnitude or two larger than what
>> >>> we
>> >>> expect any legitimate usage to hit just to keep the system as flexible
>> >>> as
>> >>> possible.
>> >>>
>> >>> Does that sound reasonable to people?
>> >>
>> >> Are we thinking about making this a MUST requirement, or a SHOULD?  I'm
>> >> hesitant to spec an exact size as a MUST given how technology has a way
>> >> of
>> >> changing in unexpected ways that makes old constraints obsolete.  But
>> >> then,
>> >> I may just be overly concerned about this too.
>> >
>> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
>> > develop
>> > against one of the implementations that don't place a limit and then
>> > their
>> > app would break on the others.
>> > The reason that I suggested 64K is that it seems outrageously big for
>> > the
>> > data types that we're looking at.  But it's too small to do much with
>> > base64
>> > encoding binary blobs into it or anything else like that that I could
>> > see
>> > becoming rather large.  So it seems like a limit that'd avoid major
>> > abuses
>> > (where someone is probably approaching the problem wrong) but would not
>> > come
>> > close to limiting any practical use I can imagine.
>> > With our architecture in Chrome, we will probably need to have some
>> > limit.
>> >  We haven't decided what that is yet, but since I remember others saying
>> > similar things when we talked about this at TPAC, it seems like it might
>> > be
>> > best to standardize it--even though it does feel a bit dirty.
>>
>> One problem with putting a limit is that it basically forces
>> implementations to use a specific encoding, or pay a hefty price. For
>> example if we choose a 64K limit, is that of UTF8 data or of UTF16
>> data? If it is of UTF8 data, and the implementation uses something
>> else to store the data, you risk having to convert the data just to
>> measure the size. Possibly this would be different if we measured size
>> using UTF16 as javascript more or less enforces that the source string
>> is UTF16 which means that you can measure utf16 size on the cheap,
>> even if the stored data uses a different format.
>
> That's a very good point.  What's your suggestion then?  Spec unlimited
> storage and have non-normative text saying that most implementations will
> likely have some limit?  Maybe we can at least spec a minimum limit in terms
> of a particular character encoding?  (Implementations could translate this
> into the worst case size for their own native encoding and then ensure their
> limit is higher.)

I'm fine with relying on UTF16 encoding size and specifying a 64K
limit. Like Shawn points out, this API is fairly geared towards
JavaScript anyway (and I personally don't think that's a bad thing).
One thing that I just thought of is that even if implementations use
other encodings, you can in the vast majority of cases do a worst-case
estimate and easily see that the key that is used is below 64K.

That said, does having a 64K limit really help anyone? In SQLite we
can easily store vastly more than that, enough that we don't have to
specify a limit. And my understanding is that in the Microsoft
implementation, the limits for what they can store without resorting
to various tricks, is much lower. So since that implementation will
have to implement special handling of long keys anyway, is there a
difference between saying a 64K limit vs. saying unlimited?

Pablo: Would love to get your input on the above.

/ Jonas



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-07 Thread Shawn Wilsher

On 2/7/2011 12:32 AM, Glenn Maynard wrote:

> Is that a safe assumption to design around?  The API might later be bound to
> other languages fortunate enough not to be stuck in UTF-16.

As I recall, we've already made design decisions based on the fact that 
the primary consumer of this API is going to be JavaScript on the web. 
(What those decisions were about, I don't recall offhand, however.)


Cheers,

Shawn





Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-07 Thread Glenn Maynard
On Mon, Feb 7, 2011 at 2:38 AM, Jonas Sicking  wrote:

> One problem with putting a limit is that it basically forces
> implementations to use a specific encoding, or pay a hefty price. For
> example if we choose a 64K limit, is that of UTF8 data or of UTF16
> data? If it is of UTF8 data, and the implementation uses something
> else to store the data, you risk having to convert the data just to
> measure the size. Possibly this would be different if we measured size
> using UTF16 as javascript more or less enforces that the source string
> is UTF16 which means that you can measure utf16 size on the cheap,
> even if the stored data uses a different format.
>

Is that a safe assumption to design around?  The API might later be bound to
other languages fortunate enough not to be stuck in UTF-16.

-- 
Glenn Maynard


Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-06 Thread Jeremy Orlow
On Sun, Feb 6, 2011 at 11:38 PM, Jonas Sicking  wrote:

> On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow  wrote:
> > On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher 
> wrote:
> >>
> >> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
> >>>
> >>> My current thinking is that we should have some relatively large
> >>> limit...maybe on the order of 64k?  It seems like it'd be very
> difficult
> >>> to
> >>> hit such a limit with any sort of legitimate use case, and the chances
> of
> >>> some subtle data-dependent error would be much less.  But a 1GB key is
> >>> just
> >>> not going to work well in any implementation (if it doesn't simply oom
> >>> the
> >>> process!).  So despite what I said earlier, I guess I think we should
> >>> have
> >>> some limit...but keep it an order of magnitude or two larger than what
> we
> >>> expect any legitimate usage to hit just to keep the system as flexible
> as
> >>> possible.
> >>>
> >>> Does that sound reasonable to people?
> >>
> >> Are we thinking about making this a MUST requirement, or a SHOULD?  I'm
> >> hesitant to spec an exact size as a MUST given how technology has a way
> of
> >> changing in unexpected ways that makes old constraints obsolete.  But
> then,
> >> I may just be overly concerned about this too.
> >
> > If we put a limit, it'd be a MUST for sure.  Otherwise people would
> develop
> > against one of the implementations that don't place a limit and then
> their
> > app would break on the others.
> > The reason that I suggested 64K is that it seems outrageously big for the
> > data types that we're looking at.  But it's too small to do much with
> base64
> > encoding binary blobs into it or anything else like that that I could see
> > becoming rather large.  So it seems like a limit that'd avoid major
> abuses
> > (where someone is probably approaching the problem wrong) but would not
> come
> > close to limiting any practical use I can imagine.
> > With our architecture in Chrome, we will probably need to have some
> limit.
> >  We haven't decided what that is yet, but since I remember others saying
> > similar things when we talked about this at TPAC, it seems like it might
> be
> > best to standardize it--even though it does feel a bit dirty.
>
> One problem with putting a limit is that it basically forces
> implementations to use a specific encoding, or pay a hefty price. For
> example if we choose a 64K limit, is that of UTF8 data or of UTF16
> data? If it is of UTF8 data, and the implementation uses something
> >> else to store the data, you risk having to convert the data just to
> measure the size. Possibly this would be different if we measured size
> using UTF16 as javascript more or less enforces that the source string
> is UTF16 which means that you can measure utf16 size on the cheap,
> even if the stored data uses a different format.
>

That's a very good point.  What's your suggestion then?  Spec unlimited
storage and have non-normative text saying that most implementations will
likely have some limit?  Maybe we can at least spec a minimum limit in terms
of a particular character encoding?  (Implementations could translate this
into the worst case size for their own native encoding and then ensure their
limit is higher.)

J


Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-06 Thread Jonas Sicking
On Sun, Feb 6, 2011 at 2:31 PM, Jeremy Orlow  wrote:
> On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher  wrote:
>>
>> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>>>
>>> My current thinking is that we should have some relatively large
>>> limit...maybe on the order of 64k?  It seems like it'd be very difficult
>>> to
>>> hit such a limit with any sort of legitimate use case, and the chances of
>>> some subtle data-dependent error would be much less.  But a 1GB key is
>>> just
>>> not going to work well in any implementation (if it doesn't simply oom
>>> the
>>> process!).  So despite what I said earlier, I guess I think we should
>>> have
>>> some limit...but keep it an order of magnitude or two larger than what we
>>> expect any legitimate usage to hit just to keep the system as flexible as
>>> possible.
>>>
>>> Does that sound reasonable to people?
>>
>> Are we thinking about making this a MUST requirement, or a SHOULD?  I'm
>> hesitant to spec an exact size as a MUST given how technology has a way of
>> changing in unexpected ways that makes old constraints obsolete.  But then,
>> I may just be overly concerned about this too.
>
> If we put a limit, it'd be a MUST for sure.  Otherwise people would develop
> against one of the implementations that don't place a limit and then their
> app would break on the others.
> The reason that I suggested 64K is that it seems outrageously big for the
> data types that we're looking at.  But it's too small to do much with base64
> encoding binary blobs into it or anything else like that that I could see
> becoming rather large.  So it seems like a limit that'd avoid major abuses
> (where someone is probably approaching the problem wrong) but would not come
> close to limiting any practical use I can imagine.
> With our architecture in Chrome, we will probably need to have some limit.
>  We haven't decided what that is yet, but since I remember others saying
> similar things when we talked about this at TPAC, it seems like it might be
> best to standardize it--even though it does feel a bit dirty.

One problem with putting a limit is that it basically forces
implementations to use a specific encoding, or pay a hefty price. For
example if we choose a 64K limit, is that of UTF8 data or of UTF16
data? If it is of UTF8 data, and the implementation uses something
else to store the data, you risk having to convert the data just to
measure the size. Possibly this would be different if we measured size
using UTF16 as javascript more or less enforces that the source string
is UTF16 which means that you can measure utf16 size on the cheap,
even if the stored data uses a different format.

/ Jonas
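
A rough, hypothetical sketch of the measurement described above, assuming keys
are plain JavaScript strings; the function names and the 64K figure are
illustrative only, not taken from any spec text:

    // A JavaScript string is already a sequence of UTF-16 code units, so its
    // UTF-16 size can be computed without re-encoding anything.
    function utf16Bytes(key) {
      return key.length * 2;            // 2 bytes per UTF-16 code unit
    }

    // An implementation that stores keys as UTF-8 can still do a cheap
    // worst-case check first: a UTF-16 code unit never expands to more than
    // 3 UTF-8 bytes (a surrogate pair, 2 code units, becomes 4 bytes).
    function utf8WorstCaseBytes(key) {
      return key.length * 3;
    }

    var LIMIT = 64 * 1024;              // the 64K figure discussed in this thread
    console.log(utf16Bytes("customer:1234") <= LIMIT);          // true
    console.log(utf8WorstCaseBytes("customer:1234") <= LIMIT);  // true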



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-06 Thread Jeremy Orlow
On Sun, Feb 6, 2011 at 2:03 PM, Shawn Wilsher  wrote:

> On 2/6/2011 12:42 PM, Jeremy Orlow wrote:
>
>> My current thinking is that we should have some relatively large
>> limit...maybe on the order of 64k?  It seems like it'd be very difficult
>> to
>> hit such a limit with any sort of legitimate use case, and the chances of
>> some subtle data-dependent error would be much less.  But a 1GB key is
>> just
>> not going to work well in any implementation (if it doesn't simply oom the
>> process!).  So despite what I said earlier, I guess I think we should have
>> some limit...but keep it an order of magnitude or two larger than what we
>> expect any legitimate usage to hit just to keep the system as flexible as
>> possible.
>>
>> Does that sound reasonable to people?
>>
> Are we thinking about making this a MUST requirement, or a SHOULD?  I'm
> hesitant to spec an exact size as a MUST given how technology has a way of
> changing in unexpected ways that makes old constraints obsolete.  But then,
> I may just be overly concerned about this too.
>

If we put a limit, it'd be a MUST for sure.  Otherwise people would develop
against one of the implementations that don't place a limit and then their
app would break on the others.

The reason that I suggested 64K is that it seems outrageously big for the
data types that we're looking at.  But it's too small to do much with base64
encoding binary blobs into it or anything else like that that I could see
becoming rather large.  So it seems like a limit that'd avoid major abuses
(where someone is probably approaching the problem wrong) but would not come
close to limiting any practical use I can imagine.

With our architecture in Chrome, we will probably need to have some limit.
 We haven't decided what that is yet, but since I remember others saying
similar things when we talked about this at TPAC, it seems like it might be
best to standardize it--even though it does feel a bit dirty.

J
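
A hypothetical sketch of the interoperability concern above: when keys come
from user input, an implementation-specific maximum turns into a failure that
only some users ever hit. The constant and helper below are illustrative only:

    var ASSUMED_KEY_LIMIT = 64 * 1024;   // illustrative, not from any spec

    // store is an IDBObjectStore without a keyPath, so put(value, key) takes
    // the key as its second argument.
    function saveBookmark(store, userSuppliedTitle, url) {
      // If one browser enforces a key-size limit and another does not, this
      // check (or the put itself) only fails for unusually long titles.
      if (userSuppliedTitle.length > ASSUMED_KEY_LIMIT) {
        throw new Error("key exceeds this sketch's assumed limit");
      }
      return store.put({ url: url }, userSuppliedTitle);
    }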


Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-06 Thread Shawn Wilsher

On 2/6/2011 12:42 PM, Jeremy Orlow wrote:

> My current thinking is that we should have some relatively large
> limit...maybe on the order of 64k?  It seems like it'd be very difficult to
> hit such a limit with any sort of legitimate use case, and the chances of
> some subtle data-dependent error would be much less.  But a 1GB key is just
> not going to work well in any implementation (if it doesn't simply oom the
> process!).  So despite what I said earlier, I guess I think we should have
> some limit...but keep it an order of magnitude or two larger than what we
> expect any legitimate usage to hit just to keep the system as flexible as
> possible.
>
> Does that sound reasonable to people?

Are we thinking about making this a MUST requirement, or a SHOULD?  I'm 
hesitant to spec an exact size as a MUST given how technology has a way 
of changing in unexpected ways that makes old constraints obsolete.  But 
then, I may just be overly concerned about this too.


Cheers,

Shawn





Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2011-02-06 Thread Jeremy Orlow
On Tue, Dec 14, 2010 at 4:26 PM, Pablo Castro wrote:

>
> From: jor...@google.com [mailto:jor...@google.com] On Behalf Of Jeremy
> Orlow
> Sent: Tuesday, December 14, 2010 4:23 PM
>
> >> On Wed, Dec 15, 2010 at 12:19 AM, Pablo Castro <
> pablo.cas...@microsoft.com> wrote:
> >>
> >> From: public-webapps-requ...@w3.org [mailto:
> public-webapps-requ...@w3.org] On Behalf Of Jonas Sicking
> >> Sent: Friday, December 10, 2010 1:42 PM
> >>
> >> >> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow 
> wrote:
> >> >> > Any more thoughts on this?
> >> >>
> >> >> I don't feel strongly one way or another. Implementation wise I don't
> >> >> really understand why implementations couldn't use keys of unlimited
> >> >> size. I wouldn't imagine implementations would want to use fixed-size
> >> >> allocations for every key anyway, right (which would be a strong
> >> >> reason to keep maximum size down).
> >> I don't have a very strong opinion either. I don't quite agree with the
> guideline of "having something working slowly is better than not working at
> all"...as having something not work at all sometimes may help developers hit
> a wall and think differently about their approach for a given problem. That
> said, if folks think this is an instance where we're better off not having a
> limit I'm fine with it.
> >>
> >> My only concern is that the developer might not hit this wall, but then
> some user (doing things the developer didn't fully anticipate) could hit
> that wall.  I can definitely see both sides of the argument though.  And
> elsewhere we've headed more in the direction of forcing the developer to
> think about performance, but this case seems a bit more non-deterministic
> than any of those.
>
> Yeah, that's a good point for this case, avoiding data-dependent errors is
> probably worth the perf hit.


My current thinking is that we should have some relatively large
limit...maybe on the order of 64k?  It seems like it'd be very difficult to
hit such a limit with any sort of legitimate use case, and the chances of
some subtle data-dependent error would be much less.  But a 1GB key is just
not going to work well in any implementation (if it doesn't simply oom the
process!).  So despite what I said earlier, I guess I think we should have
some limit...but keep it an order of magnitude or two larger than what we
expect any legitimate usage to hit just to keep the system as flexible as
possible.

Does that sound reasonable to people?

J


RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-14 Thread Pablo Castro

From: jor...@google.com [mailto:jor...@google.com] On Behalf Of Jeremy Orlow
Sent: Tuesday, December 14, 2010 4:23 PM

>> On Wed, Dec 15, 2010 at 12:19 AM, Pablo Castro  
>> wrote:
>>
>> From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] 
>> On Behalf Of Jonas Sicking
>> Sent: Friday, December 10, 2010 1:42 PM
>>
>> >> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  wrote:
>> >> > Any more thoughts on this?
>> >>
>> >> I don't feel strongly one way or another. Implementation wise I don't
>> >> really understand why implementations couldn't use keys of unlimited
>> >> size. I wouldn't imagine implementations would want to use fixed-size
>> >> allocations for every key anyway, right (which would be a strong
>> >> reason to keep maximum size down).
>> I don't have a very strong opinion either. I don't quite agree with the 
>> guideline of "having something working slowly is better than not working at 
>> all"...as having something not work at all sometimes may help developers hit 
>> a wall and think differently about their approach for a given problem. That 
>> said, if folks think this is an instance where we're better off not having a 
>> limit I'm fine with it.
>>
>> My only concern is that the developer might not hit this wall, but then some 
>> user (doing things the developer didn't fully anticipate) could hit that 
>> wall.  I can definitely see both sides of the argument though.  And 
>> elsewhere we've headed more in the direction of forcing the developer to 
>> think about performance, but this case seems a bit more non-deterministic 
>> than any of those.
 
Yeah, that's a good point for this case, avoiding data-dependent errors is 
probably worth the perf hit.

-pc




Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-14 Thread Jeremy Orlow
On Wed, Dec 15, 2010 at 12:19 AM, Pablo Castro wrote:

>
> From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org]
> On Behalf Of Jonas Sicking
> Sent: Friday, December 10, 2010 1:42 PM
>
> >> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow 
> wrote:
> >> > Any more thoughts on this?
> >>
> >> I don't feel strongly one way or another. Implementation wise I don't
> >> really understand why implementations couldn't use keys of unlimited
> >> size. I wouldn't imagine implementations would want to use fixed-size
> >> allocations for every key anyway, right (which would be a strong
> >> reason to keep maximum size down).
>
> I don't have a very strong opinion either. I don't quite agree with the
> guideline of "having something working slowly is better than not working at
> all"...as having something not work at all sometimes may help developers hit
> a wall and think differently about their approach for a given problem. That
> said, if folks think this is an instance where we're better off not having a
> limit I'm fine with it.
>

My only concern is that the developer might not hit this wall, but then some
user (doing things the developer didn't fully anticipate) could hit that
wall.  I can definitely see both sides of the argument though.  And
elsewhere we've headed more in the direction of forcing the developer to
think about performance, but this case seems a bit more non-deterministic
than any of those.


> >> Pablo, do you know why the back ends you were looking at had such
> >> relatively low limits?
>
> Mostly an implementation thing. Keys (and all other non-blob columns)
> typically need to fit in a page.  Predictable perf is also nice (no linked
> lists, high density/locality, etc.), but not as fundamental as page size.
>
> -pablo
>
>


RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-14 Thread Pablo Castro

From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] On 
Behalf Of Jonas Sicking
Sent: Friday, December 10, 2010 1:42 PM

>> On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  wrote:
>> > Any more thoughts on this?
>>
>> I don't feel strongly one way or another. Implementation wise I don't
>> really understand why implementations couldn't use keys of unlimited
>> size. I wouldn't imagine implementations would want to use fixed-size
>> allocations for every key anyway, right (which would be a strong
>> reason to keep maximum size down).

I don't have a very strong opinion either. I don't quite agree with the 
guideline of "having something working slowly is better than not working at 
all"...as having something not work at all sometimes may help developers hit a 
wall and think differently about their approach for a given problem. That said, 
if folks think this is an instance where we're better off not having a limit 
I'm fine with it. 

>> Pablo, do you know why the back ends you were looking at had such
>> relatively low limits?

Mostly an implementation thing. Keys (and all other non-blob columns) typically 
need to fit in a page.  Predictable perf is also nice (no linked lists, high 
density/locality, etc.), but not as fundamental as page size.

-pablo




Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-10 Thread Jonas Sicking
On Fri, Dec 10, 2010 at 7:32 AM, Jeremy Orlow  wrote:
> Any more thoughts on this?

I don't feel strongly one way or another. Implementation wise I don't
really understand why implementations couldn't use keys of unlimited
size. I wouldn't imagine implementations would want to use fixed-size
allocations for every key anyway, right (which would be a strong
reason to keep maximum size down).

Pablo, do you know why the back ends you were looking at had such
relatively low limits?

At the same time, I suspect that very few people would run into
problems if we set the limit at a K or two of bytes.

It's in general a good idea to limit strings to somewhere around 2^30
bytes so as to avoid overflow problems, but such limits are large enough
that I'm not even convinced they need to be specified.

/ Jonas



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-12-10 Thread Jeremy Orlow
Any more thoughts on this?

On Mon, Nov 22, 2010 at 12:05 PM, Jeremy Orlow  wrote:

> Something working (but with degraded performance) is better than not
> working at all.  Especially when keys will often come from user data/input
> and thus simple web apps will likely not handle the exceptions large keys
> might generate.  Throughout the rest of IndexedDB, we've taken quite a bit
> of care to make sure that we don't throw exceptions on hard to anticipate
> edge cases, I don't think this is terribly different.
>
> Storing a prefix and then doing lookups into the actual value seems like a
> good way for implementations to handle it, but it's certainly not the only
> way.  Yes, this will turn into linear performance in the worst case, but in
> practice I think you'll find that before the linear performance kills you,
> various other issues with using IndexedDB like this will kill you.  :-)
>
> I'm fine with us adding non-normative text reminding people that large keys
> will be slow and having a recommended minimum key size that implementations
> should try and make sure hits a reasonably fast path.  But I think we should
> make sure that implementations don't break with big keys.
>
> J
>
>
> On Sat, Nov 20, 2010 at 10:49 AM, Jonas Sicking  wrote:
>
>> On Fri, Nov 19, 2010 at 8:13 PM, Bjoern Hoehrmann 
>> wrote:
>> > * Jonas Sicking wrote:
>> >>The question is in part where the limit for "ridiculous" goes. 1K keys
>> >>are sort of ridiculous, though I'm sure it happens.
>> >
>> > By "ridiculous" I mean that common systems would run out of memory. That
>> > is different among systems, and I would expect developers to consider it
>> > up to an order of magnitude, but not beyond that. Clearly, to me, a DB
>> > system should not fail because I want to store 100 keys á 100KB.
>>
>> Note that at issue here isn't the total size of keys, but the key size
>> of an individual entry. I'm not sure that I'd expect a 100KB key size
>> to work.
>>
>> >>> Note that, since JavaScript does not offer key-value dictionaries for
>> >>> complex keys, and now that JSON.stringify is widely implemented, it's
>> >>> quite common for people to emulate proper dictionaries by using that
>> to
>> >>> work around this particular JavaScript limitation. Which would likely
>> >>> extend to more persistent forms of storage.
>> >>
>> >>I don't understand what you mean here.
>> >
>> > I am saying that it's quite natural to want to have string keys that are
>> > much, much longer than someone might envision the length of string keys,
>> > mainly because their notion of "string keys" is different from the key
>> > length you might get from serializing arbitrary objects.
>>
>> Still not fully sure I follow you. The only issue here is when using
>> plain strings as keys, objects are not allowed to be used as keys. Or
>> are you saying that people will use the return value from
>> JSON.stringify as key?
>>
>> / Jonas
>>
>>
>


Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-22 Thread Jeremy Orlow
Something working (but with degraded performance) is better than not working
at all.  Especially when keys will often come from user data/input and thus
simple web apps will likely not handle the exceptions large keys might
generate.  Throughout the rest of IndexedDB, we've taken quite a bit of care
to make sure that we don't throw exceptions on hard to anticipate edge
cases, I don't think this is terribly different.

Storing a prefix and then doing lookups into the actual value seems like a
good way for implementations to handle it, but it's certainly not the only
way.  Yes, this will turn into linear performance in the worst case, but in
practice I think you'll find that before the linear performance kills you,
various other issues with using IndexedDB like this will kill you.  :-)

I'm fine with us adding non-normative text reminding people that large keys
will be slow and having a recommended minimum key size that implementations
should try and make sure hits a reasonably fast path.  But I think we should
make sure that implementations don't break with big keys.

J

On Sat, Nov 20, 2010 at 10:49 AM, Jonas Sicking  wrote:

> On Fri, Nov 19, 2010 at 8:13 PM, Bjoern Hoehrmann 
> wrote:
> > * Jonas Sicking wrote:
> >>The question is in part where the limit for "ridiculous" goes. 1K keys
> >>are sort of ridiculous, though I'm sure it happens.
> >
> > By "ridiculous" I mean that common systems would run out of memory. That
> > is different among systems, and I would expect developers to consider it
> > up to an order of magnitude, but not beyond that. Clearly, to me, a DB
> > system should not fail because I want to store 100 keys á 100KB.
>
> Note that at issue here isn't the total size of keys, but the key size
> of an individual entry. I'm not sure that I'd expect a 100KB key size
> to work.
>
> >>> Note that, since JavaScript does not offer key-value dictionaries for
> >>> complex keys, and now that JSON.stringify is widely implemented, it's
> >>> quite common for people to emulate proper dictionaries by using that to
> >>> work around this particular JavaScript limitation. Which would likely
> >>> extend to more persistent forms of storage.
> >>
> >>I don't understand what you mean here.
> >
> > I am saying that it's quite natural to want to have string keys that are
> > much, much longer than someone might envision the length of string keys,
> > mainly because their notion of "string keys" is different from the key
> > length you might get from serializing arbitrary objects.
>
> Still not fully sure I follow you. The only issue here is when using
> plain strings as keys, objects are not allowed to be used as keys. Or
> are you saying that people will use the return value from
> JSON.stringify as key?
>
> / Jonas
>
>


Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-20 Thread Jonas Sicking
On Fri, Nov 19, 2010 at 8:13 PM, Bjoern Hoehrmann  wrote:
> * Jonas Sicking wrote:
>>The question is in part where the limit for "ridiculous" goes. 1K keys
>>are sort of ridiculous, though I'm sure it happens.
>
> By "ridiculous" I mean that common systems would run out of memory. That
> is different among systems, and I would expect developers to consider it
> up to an order of magnitude, but not beyond that. Clearly, to me, a DB
> system should not fail because I want to store 100 keys á 100KB.

Note that at issue here isn't the total size of keys, but the key size
of an individual entry. I'm not sure that I'd expect a 100KB key size
to work.

>>> Note that, since JavaScript does not offer key-value dictionaries for
>>> complex keys, and now that JSON.stringify is widely implemented, it's
>>> quite common for people to emulate proper dictionaries by using that to
>>> work around this particular JavaScript limitation. Which would likely
>>> extend to more persistent forms of storage.
>>
>>I don't understand what you mean here.
>
> I am saying that it's quite natural to want to have string keys that are
> much, much longer than someone might envision the length of string keys,
> mainly because their notion of "string keys" is different from the key
> length you might get from serializing arbitrary objects.

Still not fully sure I follow you. The only issue here is when using
plain strings as keys, objects are not allowed to be used as keys. Or
are you saying that people will use the return value from
JSON.stringify as key?

/ Jonas



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-20 Thread Keean Schupke
Just a thought: just because the spec does not limit the key size does not mean
the implementation has to index on huge keys. For example, you may choose to
index only the first 1000 characters of string keys, and then link the
values of key collisions together in the storage node. This way things are
kept fast and compact for the more normal key size, and there is a sensible
limit.

As long as the implementation behaves like it admits arbitrary key sizes, it
can actually implement things how it likes.

Another example would be one index for keys less than size X, and a separate
"oversize" key index for keys of size greater than X. These could use a
different internal structure and disk layout.


Cheers,
Keean.
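
A hypothetical sketch of the prefix idea above, written as application-level
JavaScript since no engine internals appear in this thread; the 1000-character
prefix, the property names, and the helpers are all illustrative:

    var PREFIX_LEN = 1000;   // index only the first 1000 characters of a key

    // Assumes a store created with autoIncrement and an index named "prefix"
    // on the "prefix" property; the full key lives inside each record.
    function putWithHugeKey(store, fullKey, value) {
      return store.put({ prefix: fullKey.slice(0, PREFIX_LEN),
                         fullKey: fullKey,
                         value: value });
    }

    // Lookup: seek by prefix, then walk any collisions comparing full keys.
    function getWithHugeKey(prefixIndex, fullKey, onResult) {
      var range = IDBKeyRange.only(fullKey.slice(0, PREFIX_LEN));
      var request = prefixIndex.openCursor(range);
      request.onsuccess = function () {
        var cursor = request.result;
        if (!cursor) return onResult(undefined);              // no such key
        if (cursor.value.fullKey === fullKey) {
          return onResult(cursor.value.value);                // exact match
        }
        cursor.continue();                                    // prefix collision
      };
    }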


On 20 November 2010 04:13, Bjoern Hoehrmann  wrote:

> * Jonas Sicking wrote:
> >The question is in part where the limit for "ridiculous" goes. 1K keys
> >are sort of ridiculous, though I'm sure it happens.
>
> By "ridiculous" I mean that common systems would run out of memory. That
> is different among systems, and I would expect developers to consider it
> up to an order of magnitude, but not beyond that. Clearly, to me, a DB
> system should not fail because I want to store 100 keys á 100KB.
>
> >> Note that, since JavaScript does not offer key-value dictionaries for
> >> complex keys, and now that JSON.stringify is widely implemented, it's
> >> quite common for people to emulate proper dictionaries by using that to
> >> work around this particular JavaScript limitation. Which would likely
> >> extend to more persistent forms of storage.
> >
> >I don't understand what you mean here.
>
> I am saying that it's quite natural to want to have string keys that are
> much, much longer than someone might envision the length of string keys,
> mainly because their notion of "string keys" is different from the key
> length you might get from serializing arbitrary objects.
> --
> Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
> Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
> 25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/
>
>


Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-19 Thread Bjoern Hoehrmann
* Jonas Sicking wrote:
>The question is in part where the limit for "ridiculous" goes. 1K keys
>are sort of ridiculous, though I'm sure it happens.

By "ridiculous" I mean that common systems would run out of memory. That
is different among systems, and I would expect developers to consider it
up to an order of magnitude, but not beyond that. Clearly, to me, a DB
system should not fail because I want to store 100 keys á 100KB.

>> Note that, since JavaScript does not offer key-value dictionaries for
>> complex keys, and now that JSON.stringify is widely implemented, it's
>> quite common for people to emulate proper dictionaries by using that to
>> work around this particular JavaScript limitation. Which would likely
>> extend to more persistent forms of storage.
>
>I don't understand what you mean here.

I am saying that it's quite natural to want to have string keys that are
much, much longer than someone might envision the length of string keys,
mainly because their notion of "string keys" is different from the key
length you might get from serializing arbitrary objects.
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-19 Thread Jonas Sicking
On Fri, Nov 19, 2010 at 7:03 PM, Bjoern Hoehrmann  wrote:
> * Pablo Castro wrote:
>>>> Just looking at this list, I guess I'm leaning towards _not_ limiting the
>>>> maximum key size and instead pushing it onto implementations to do the
>>>> hard work here.  If so, we should probably have some normative text about
>>>> how bigger keys will probably not be handled very efficiently.
>>
>>I was trying to make up my mind on this, and I'm not sure this is a good
>>idea. What would be the options for an implementation? Hashing keys into
>>smaller values is pretty painful because of sorting requirements (we'd have
>>to index the data twice, once for the key prefix that fits within limits,
>>and a second one for a hash plus some sort of discriminator for collisions).
>>Just storing a prefix as part of the key under the covers obviously won't
>>fly...am I missing some other option?
>>
>>Clearly consistency in these things is important so people don't get caught
>>off guard. I wonder if we just pick a "reasonable" limit, say 1K characters
>>(yeah, trying to do something weird to avoid details of how stuff is
>>actually stored), and run with it. I looked around at a few databases (from
>>a single vendor :)), and they seem to all be well over this but not by
>>orders of magnitude (2KB to 8KB seems to be the range of upper limits for
>>this in practice).
>
> No limit would be reasonable, the general, and reasonable, assumption is
> that if it works for X it will work for Y, unless Y is ridiculous. There
> is also little point in saying for some values of Y performance will be
> poor: implementations will cater for what is common, which is usually
> not a constant, and when you do unusual things, you already know that it
> is not entirely reasonable to expect the "usual" performance.

The question is in part where the limit for "ridiculous" goes. 1K keys
are sort of ridiculous, though I'm sure it happens.

Note that "unusual" performance here means linear search times rather than
logarithmic, which in the case of a join could easily mean quadratic. So it's
quite commonly not "unusual" performance, but "unacceptable".

> Note that, since JavaScript does not offer key-value dictionaries for
> complex keys, and now that JSON.stringify is widely implemented, it's
> quite common for people to emulate proper dictionaries by using that to
> work around this particular JavaScript limitation. Which would likely
> extend to more persistent forms of storage.

I don't understand what you mean here.

/ Jonas



Re: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-19 Thread Bjoern Hoehrmann
* Pablo Castro wrote:
>>> Just looking at this list, I guess I'm leaning towards _not_ limiting the
>>> maximum key size and instead pushing it onto implementations to do the
>>> hard work here.  If so, we should probably have some normative text about
>>> how bigger keys will probably not be handled very efficiently.
>
>I was trying to make up my mind on this, and I'm not sure this is a good
>idea. What would be the options for an implementation? Hashing keys into
>smaller values is pretty painful because of sorting requirements (we'd have
>to index the data twice, once for the key prefix that fits within limits,
>and a second one for a hash plus some sort of discriminator for collisions).
>Just storing a prefix as part of the key under the covers obviously won't
>fly...am I missing some other option?
>
>Clearly consistency in these things is important so people don't get caught
>off guard. I wonder if we just pick a "reasonable" limit, say 1K characters
>(yeah, trying to do something weird to avoid details of how stuff is
>actually stored), and run with it. I looked around at a few databases (from
>a single vendor :)), and they seem to all be well over this but not by
>orders of magnitude (2KB to 8KB seems to be the range of upper limits for
>this in practice).

No limit would be reasonable, the general, and reasonable, assumption is
that if it works for X it will work for Y, unless Y is ridiculous. There
is also little point in saying for some values of Y performance will be
poor: implementations will cater for what is common, which is usually
not a constant, and when you do unusual things, you already know that it
is not entirely reasonable to expect the "usual" performance.

Note that, since JavaScript does not offer key-value dictionaries for
complex keys, and now that JSON.stringify is widely implemented, it's
quite common for people to emulate proper dictionaries by using that to
work around this particular JavaScript limitation. Which would likely
extend to more persistent forms of storage.
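
For illustration, the pattern being described might look like the sketch below
(the object, the values, and the store name are made up); the serialized form
of the object becomes the key, and the same habit carries over naturally to
IndexedDB string keys:

  // Emulating a dictionary keyed on a composite value by serializing it:
  var byLocation = {};
  var where = { lat: 54.73, lon: 8.69, labels: ["home", "coast"] };
  byLocation[JSON.stringify(where)] = "some value";

  // The analogous IndexedDB usage, where the (potentially very long)
  // serialized object becomes the key; "placesStore" is hypothetical:
  //   placesStore.put("some value", JSON.stringify(where));
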
-- 
Björn Höhrmann · mailto:bjo...@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 



RE: [Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-19 Thread Pablo Castro

-Original Message-
From: public-webapps-requ...@w3.org [mailto:public-webapps-requ...@w3.org] On 
Behalf Of bugzi...@jessica.w3.org
Sent: Friday, November 19, 2010 4:16 AM

>> Just looking at this list, I guess I'm leaning towards _not_ limiting the
>> maximum key size and instead pushing it onto implementations to do the hard
>> work here.  If so, we should probably have some normative text about how
>> bigger keys will probably not be handled very efficiently.

I was trying to make up my mind on this, and I'm not sure this is a good idea. 
What would be the options for an implementation? Hashing keys into smaller 
values is pretty painful because of sorting requirements (we'd have to index 
the data twice, once for the key prefix that fits within limits, and a second 
one for a hash plus some sort of discriminator for collisions). Just storing a 
prefix as part of the key under the covers obviously won't fly...am I missing 
some other option?
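
For illustration only, the double indexing described above might look roughly
like this sketch; the prefix limit, the djb2-style stand-in hash, and the
entry layout are all made up:

  var PREFIX_LIMIT = 1024;

  function hash32(s) {                 // stand-in hash; any stable hash works
    var h = 5381;
    for (var i = 0; i < s.length; i++) {
      h = ((h * 33) ^ s.charCodeAt(i)) >>> 0;
    }
    return h;
  }

  // For one logical key, produce the two physical index entries: a sortable
  // (but possibly non-unique) prefix entry, and an exact-match entry made of
  // a hash plus a collision discriminator.
  function indexEntriesFor(key, collisionsSoFar) {
    return {
      prefixEntry: key.slice(0, PREFIX_LIMIT),
      exactEntry: hash32(key).toString(16) + ":" + collisionsSoFar
    };
  }

The prefix entry preserves ordering for range scans within the limit, while
the hash entry gives cheap exact-match lookups.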

Clearly consistency in these things is important so people don't get caught off
guard. I wonder if we just pick a "reasonable" limit, say 1K characters (yeah,
trying to do something weird to avoid details of how stuff is actually stored), 
and run with it. I looked around at a few databases (from a single vendor :)), 
and they seem to all be well over this but not by orders of magnitude (2KB to 
8KB seems to be the range of upper limits for this in practice).

Thanks
-pablo




[Bug 11351] New: [IndexedDB] Should we have a maximum key size (or something like that)?

2010-11-19 Thread bugzilla
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11351

   Summary: [IndexedDB] Should we have a maximum key size (or
something like that)?
   Product: WebAppsWG
   Version: unspecified
  Platform: PC
OS/Version: All
Status: NEW
  Severity: normal
  Priority: P2
 Component: Indexed Database API
AssignedTo: dave.n...@w3.org
ReportedBy: jor...@chromium.org
 QAContact: member-webapi-...@w3.org
CC: m...@w3.org, public-webapps@w3.org


Should we have some sort of maximum key size for what's in IndexedDB?

Pros:
* Most other databases do.
* It's very difficult to handle very large keys efficiently.
* Many backing storage engines have limits.
(These could be worked around by an implementation storing just the first part
of a particularly big key in the backing engine and then looking up the rest in
the value when necessary.  This clearly would add a lot of complexity and slow
things down.)

Cons:
* Pushing complexity onto the web developer.
* May break web apps in ways authors don't anticipate.

There are probably other pros and cons that I'm forgetting (please bring them
up if so!).

Just looking at this list, I guess I'm leaning towards _not_ limiting the
maximum key size and instead pushing it onto implementations to do the hard
work here.  If so, we should probably have some normative text about how bigger
keys will probably not be handled very efficiently.
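
For context, if a hard limit did end up in the spec or in a given
implementation, an application deriving keys from arbitrary data would have to
bound them itself, along the lines of this sketch; the cap and the fallback
digest are hypothetical, and a real application would want a stronger hash
than this stand-in:

  var MAX_KEY_CHARS = 65536;                       // hypothetical cap

  function boundedKey(key) {
    if (key.length <= MAX_KEY_CHARS) return key;
    var h = 5381;                                  // djb2 stand-in digest
    for (var i = 0; i < key.length; i++) {
      h = ((h * 33) ^ key.charCodeAt(i)) >>> 0;
    }
    return key.slice(0, 64) + "#" + h.toString(36);
  }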

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are on the CC list for the bug.