[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15585349#comment-15585349 ] Benjamin Lerer commented on CASSANDRA-11873: Pushed the 3.X patch at ecf05b882658d78f0ce6b87b57c982aa776c5104 and the trunk one into bfd57d13b7914ad909c75e1e126802e9867cc735. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15576856#comment-15576856 ] Tyler Hobbs commented on CASSANDRA-11873: - +1 after the new changes, they look good. bq. In my opinion having some checks will be probably less confusing for the user. With the current binary representation used for duration, the order will be completely random. A completely random order is better than an almost-correct ordering, I think. In any case, it would be easy to go from the current restrictions to allowing durations to be used anywhere (but with random ordering) if we need to, so I think starting with this is fine. bq. I did not tackle that part but I thought that there will be more work to do on the python driver side. Ah, yes, you're correct. We just need to make sure that driver work includes updating the reserved keywords, since that's easy to miss. (cc [~aholmber]) > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15552969#comment-15552969 ] Benjamin Lerer commented on CASSANDRA-11873: bq. Perhaps we should just specify that duration types are sorted by their binary representation and not try to restrict them from being used anywhere that could result in an ordering? In my opinion having some checks will be probably less confusing for the user. With the current binary representation used for duration, the order will be completely random. I have pushed some new branches for 3.X and trunk with some fixes for all the problems that you raised. ||Branch||utests||dtests|| |[3.X|https://github.com/blerer/cassandra/tree/11873-3.X]|[3.X|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-3.X-testall/]|[3.X|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-3.X-dtest/]| |[trunk|https://github.com/blerer/cassandra/tree/11873-trunk]|[trunk|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-testall/]|[trunk|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-dtest/]| bq. We will also need to add "duration" to the python driver's reserved keywords in {{cassandra/metadata.py}}. I did not tackle that part but I thought that there will be more work to do on the python driver side. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15549214#comment-15549214 ] Tyler Hobbs commented on CASSANDRA-11873: - It also occurred to me that set items and map keys are also sorted, and would need to be handled or rejected. Perhaps we should just specify that duration types are sorted by their binary representation and not try to restrict them from being used anywhere that could result in an ordering? I feel like that might bite/surprise a few people, but it might also be the most sane option. If we do this, we should make the docs very clear about it. On a related note, I think we should also clearly document how equality operations work for durations, so that, for example, nobody expects {{1d}} and {{86400s}} to be equal. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15546915#comment-15546915 ] Tyler Hobbs commented on CASSANDRA-11873: - This looks pretty solid so far. I haven't completed my review, but these are my initial comments: * {{doc/source/cql/changes.rst}}: typo: "Adds a dew" * {{doc/source/cql/types.rst}}: ** For clarity, we should define all units in terms of either months, days, or nanoseconds. For example, a 'year' should be 12 months, an hour should be described as 36000 nanoseconds, etc. ** Typo in ".. \_durtaion-limitation:" ** I would rephrase "Duration cannot be used for columns part of the ``PRIMARY KEY`` of a table." as "Duration columns cannot be used in a table's ``PRIMARY KEY``." Do we need to do something special to disallow durations in MV primary keys as well? It seems like the call to {{AbstractType.checkComparable()}} in {{CFMetadata.rebuild()}} may handle this, but I see that you also changed {{CreateTableStatement}} to specifically handle the duration type, so we may need to have a friendlier check for views as well. Perhaps this could utilize {{AbstractType.checkComparable()}} to be more general. Also, we should make sure secondary index or filtering queries with >, >=, <, <= operators error correctly, too. In the past these might have been protected by a special check against counters. We will also need to add "duration" to the python driver's reserved keywords in {{cassandra/metadata.py}}. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15522414#comment-15522414 ] Benjamin Lerer commented on CASSANDRA-11873: |[trunk|https://github.com/blerer/cassandra/tree/11873-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-testall/7/]|[dtests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-dtest/6/]| The patch add the DurationType. Internally, the data are stored as 3 vints: one for the number of months, one for the number of days and one for the number of nanoseconds. This will allow the type to work properly with daylight saving if needed. The type is, as discussed previously, non comparable. Its main goal being to simplify CQL queries for time series data. The type support: * Simple litterals (e.g. {{2mo10d12h}}: 2 months 10 days and 12 hours) with the following symbols (case insensitive): * {{y}}: years * {{mo}}: months * {{w}}: weeks * {{d}}: days * {{h}}: hours * {{m}}: minutes * {{s}}: seconds * {{ms}}: milliseconds * {{µs}} or {{us}}: microseconds * {{ns}}: nanoseconds * ISO 8601 "format with designators": {{PnYnMnDTnHnMnS}} or {{PnW}} (e.g {{P2M10DT12H}}: 2 months 10 days and 12 hours * ISO 8601 "alternative format": {{P-MM-DDThh:mm:ss}} (e.g {{P-02-10T12:00:00}}: 2 months 10 days and 12 hours > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15452092#comment-15452092 ] Benjamin Lerer commented on CASSANDRA-11873: {quote}we make that type non-comparable (we have at least one precedent with the counter type){quote} This will effectively solve the problem in a simple way as I agree that the main interest of the duration type is for simplifying time series related queries. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451804#comment-15451804 ] Sylvain Lebresne commented on CASSANDRA-11873: -- bq. it is not always possible to properly order the duration as they rely on the context. I'm not readily convinced this is such a problem. The motivation for duration types (in C*, not in some general sense) is first and foremost (imo) to make it easier to work with time series. Comparatively, using a duration in a clustering column (that only where the ordering matters) with some close reliance on proper ordering doesn't feel that fundamental to me. For instance, I haven't given it too much though, but we make that type non-comparable (we have at least one precedent with the counter type), which amounts of forbidding it in a clustering column. I don't think it would be that big of a deal: you can use a simple integer if you really need to store a duration in a clustering column, and we can provide conversions functions to make going from duration to integers (said conversion could for instance use fixed conversion factors (say 1 month = 30 days, 1 day = 24h etc.., Postregsql uses this when dealing with fraction of days, which they support) by default with the option for user to pick some other factor; and don't get me wrong, such functions are totally material for follow-ups). bq. I will be in favor of using 2 types but I would like to know your opinion. A priori, I disagree, because I suspect this would be inferior for most of the use cases we primarily targeting (user convenience). Splitting into 2 types just for sorting, when sorting isn't at all the important part, feels wrong to me. Besides, I don't think it even really solve the problem fully: there is leap seconds and daylight saving adjustment, so even ordering the 'day to fraction of second' type can't be perfectly done. Which again, just suggest to me that a duration type can't be properly ordered and we shouldn't try to: if you need to order duration, you *have* to normalize them to a number of seconds (or less) or you're doing it wrong, and we should let user do so (with as much help as we can) rather than "hack" around that fundamental fact. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15451669#comment-15451669 ] Benjamin Lerer commented on CASSANDRA-11873: I realised that there is 2 problem with my proposal: # it is not always possible to properly order the duration as they rely on the context. For example, which is the greater {{1mo}} or {{30d}} (same question for {{1y}} or {{365d}}). I think it is the main reason why Oracle splitted their interval type into 2 types (year to month) and (day to fraction of second). # having the type Byte-ordered (CASSANDRA-6936) will require to store useless data if a part of the duration is not used. Due to these problems I will be in favor of using 2 types but I would like to know your opinion. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336215#comment-15336215 ] Sylvain Lebresne commented on CASSANDRA-11873: -- bq. I understand the rational of the suggestions but I would be in favor of keeping what I proposed. I concur (and will note that we're really just following the [internal standard for those unit|https://en.wikipedia.org/wiki/Second#SI_multiples], which weigths more in my book than any particular database choices). That said, I do agree that making the unit case-insensitive would be nice (because most stuffs in CQL as case-insensitive, and because having the difference been only base on the case is error prone). Which is why I would personally suggest using {{mo}} for months (I guess you could alternatively keep {{m}} for months and force {{min}} for {{minutes}}, but I think it's way more common to abbreviate minutes than month (in life in general) so I think using something slightly irregular for month is less surprising). > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15336117#comment-15336117 ] Benjamin Lerer commented on CASSANDRA-11873: bq. What CQL type has nanosecond resolution? {{Time}} has nanosecond resolution. By consequence I strongly believe that we need microseconds and nanoseconds support. {quote} 1. Change "us" to "u" to be consistent with InfluxDB, and to keep the abbreviations consistently to 1 letter. 2. Change the "ns" to "n" to keep all the abbreviations consistently to one letter. {quote} I think that looking at what other are doing is a good thing but I do not think that we should bother following exactly what they are doing if they are the only one doing it. In my opinion, we should keep {{ms}} for milliseconds as we will have to find something less natural otherwise. In this case it seems to me that we can also keep {{us}} and {{ns}}. I understand the rational of the suggestions but I would be in favor of keeping what I proposed. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335986#comment-15335986 ] Brian Hess commented on CASSANDRA-11873: - That largely covers my thoughts, with a couple things: 1. Change "us" to "u" to be consistent with InfluxDB, and to keep the abbreviations consistently to 1 letter. 2. Change the "ns" to "n" to keep all the abbreviations consistently to one letter. 3. Change "y" to "Y" so year and month are both capital. I'm not sure exactly, but I would be inclined to changing "d" to "D" and "w" to "W" so units smaller than a day are lower-case and the others are upper-case, for user-friendliness. Two other suggestions/questions: a) Is it possible to change Month or Minute to be something other than M/m so that we could make these units case insensitive? I don't have a great suggestion there, but I have no good suggestion there (maybe M for month and N for minute? Like I said, not great). b) What CQL type has nanosecond resolution? Is there any way someone can do any arithmetic on a CQL type where it would do anything with "4ns"? now() is a timestamp which has millisecond resolution. So, I actually don't see the data type that has anything fine grained enough here. If that's the case we can get rid of "ns". I'm not sure what had microsecond resolution either now that I look at our docs, so perhaps the same for "u". > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335863#comment-15335863 ] Benjamin Lerer commented on CASSANDRA-11873: bq. it's not easily compatible with making the type byte-comparable For the moment, I am not even sure what is the best way for ordering the data. {{1 month}} and {{31 days}} are not the same thing so how should we sort them? Based on the number of month first then on the number of days? The sort result might be surprising in some cases :-( > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335787#comment-15335787 ] Sylvain Lebresne commented on CASSANDRA-11873: -- bq. Does it statisfies everybody? Pretty much what I had in mind so mostly good with me with a few remarks: bq. I guess:16 or 20 bytes with custom comparator unless we use some variable length That's certainly more of an implementation details, but probably worth getting at least a tad fancy. It shouldn't be too rare to have duration below the day for instance so maybe at least a format with some initial flags to say if we have the month and day component at all would be nice. vints would be nice but unless I'm brain-farting, it's not easily compatible with making the type byte-comparable (unlike simple presence flags). bq. Support for simple litterals with the following symbols It shouldn't be harder (technically it likely can be reduced to a parser thing) to also support the longer version (basically, {{y}}, {{year}} or {{years}}, etc...) and while I'm personally fine with the abbreviations, some people may like the explicitness of the verbose version, so I'd fine supporting it. bq. Additional support for ISO 8601 "format with designators" and ISO 8601 "alternative format" I'm only mildly convinced it's useful cause the literal syntax is here mostly for convenience in cqlsh and I can't see why you'd want those format over the simpler ones above, but I guess I'm fine with it for the sake of consistency with our support of ISO 8601 in dates. *But* I will note that those syntax are currently valid identifiers so strictly speaking, this would be a breaking change, which is an argument for not bothering. I can admit that having a column named {{P1Y}} is weird in the first place, so I'm not going to agonize over this if we insist on supporting it, but if we do it, we should at least not forget to notify that clearly in the NEWS file. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15335673#comment-15335673 ] Benjamin Lerer commented on CASSANDRA-11873: My proposal is the following: * Support for: year, month, week, day, hour, minute, second, millisecond, microsecond and nanosecond (as {{Time}} support nanosecond precision) by encoding separatly months, days and nanoseconds (I guess:16 bytes with custom comparator). * Support for simple litterals with the following symbols: ** {{y}}: years ** {{M}}: months ** {{w}}: weeks ** {{d}}: days ** {{h}}: hours ** {{m}}: minutes ** {{s}}: seconds ** {{ms}}: milliseconds ** {{µs}} or {{us}}: microseconds ** {{ns}}: nanoseconds * Additional support for ISO 8601 "format with designators" and ISO 8601 "alternative format" So the following litterals will be supported: |1y2M3d4h5m6s| CQL format for: 1 year 2 months 3 days 4 hours 5 minutes 6 seconds| |P1Y2M3DT4H5M6S| ISO 8601 "format with designators"| |P0001-02-03T04:05:06| ISO 8601 "alternative format"| Does it statisfies everybody? > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334073#comment-15334073 ] Tyler Hobbs commented on CASSANDRA-11873: - This is a good point to take a step back and plan out the semantics of our date and time types more thoroughly. I don't think we need to implement everything up front, but we should think about how we want the various date, time, and interval types to work together. We do not currently support a datetime type with timezones. However, it's certainly possible that this may be added in the future, especially if we focus on timeseries (where you may want rollups by conceptual day instead of 24 hour periods). So, I think we should consider how the types might interact with a timezone-aware datetime. The current {{duration}} type is similar to Java's {{Duration}} and Python's {{timedelta}}. It adds a number of nanoseconds to a datetime, ignoring effects like daylight savings time. On the other hand, we may also want something like Java's {{Period}} class, which works in terms of "conceptual" days, months and years. For example, if you add a conceptual day to a datetime and it happens to cross the daylight savings time boundary, it would end up at the same time of day on the next day (instead of being off by one hour, like the equivalent {{duration}} addition would be). Or, we might combine these into an interval type like Postgres's {{interval}} that stores conceptual months and days, but also stores seconds and nanoseconds. This could work in a pretty straightfoward way with our current timestamps (effectively UTC datetimes), but also work well with timezone-aware datetimes when those are added. This type is certainly more complex than the current {{duration}} type, but I think we'll eventually need something like this anyway, and it's good to ask whether we also want to have a naive {{duration}} alongside that type. If we introduce special syntax for {{duration}}, that may force future {{interval}} literals to have a more cumbersome syntax. At the very least, the differences between the two may confuse users. To summarize, if we want to plan for the future, it may be best to go ahead and implement a full {{interval}} type now that handles conceptual time units as well as raw seconds/nanoseconds. bq. By consequence, it can be difficult for the driver to handle such a type. I don't think that this should weigh heavily on how we design Cassandra's types. We are already forced to implement custom types in several of the drivers. For example, the python driver has custom classes for {{OrderedMap}}, {{SortedSet}}, {{Time}}, and {{Date}} to handle things like nested collections and nanosecond resolution. These are slightly less friendly for users than types in the standard library, but it's fairly normal for a database driver to need to do this. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15334052#comment-15334052 ] Brian Hess commented on CASSANDRA-11873: - I will save the discussion/debate on the relationship between CQL and SQL to another venue. The reason to bring it up is in the context of user/developer experience and usability. If SQL has an approach then we should consider it, but if we can do better then by all means we should do that instead (which I think nobody is debating). A few comments: 1. We should certainly consider the month and year durations. These are common uses and we should at least sketch out how we would support that (if not also implement it in this ticket - which I think we should do). 2. How would we abbreviate the example that Sylvain proposes "1 year 2 months 3 days 4 hours 5 minutes 6 seconds"? Specifically, what is the abbreviation for months and minutes? ISO 8601 has M for both, but the P/T format allows for disambiguation. 3. With respect to ISO 8601 that Postgres does also support, if someone bothers to read the CQL documentation on Date formats for Timestamp types he will find that it states "A timestamp type can be entered as an integer for CQL input, or as a string literal in any of the following ISO 8601 formats" (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/timestamp_type_r.html). So, C* already chose ISO 8601 for Date formats. For consistency with CQL itself we should seriously consider making the same choice for durations. 4. According to the C* documentation, the TIMESTAMP data type, which is what is returned from the Now() call, is the "number of milliseconds since the standard base time known as the epoch". How are we going to support microseconds and nanoseconds? Even Version 1 UUIDs (UUID/TimeUUID format for C*) don't support nanosecond resolution. 5. If we choose to stick with the current bespoke syntax, I suggest moving at least to the Influx format. That leaves 2 items: a) change microseconds from "us" to "u", which is what Influx uses b) support weeks with the "w" abbreviation. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333706#comment-15333706 ] Benjamin Lerer commented on CASSANDRA-11873: bq. What about leap year? I think that point worth some discussion. The current patch store the duration in a number of nanoseconds. Which means that some information will be lost. If a user provide {{3y}} or {{3 year}} it will be converted in nanoseconds and the {{now() - 3y}} will not result in the correct date. We can try to guess what the user intended but it is a risky business. If we want to handle properly things like that it means that we have to use a more complex serialization format. Basically we need to store at least {{year}} and {{month}} separately from the remaining time in nanosecond (which is I guess the main reason why Influxdb is not supporting the month and year units). Even if it allow a better handling of some use cases, I think that this solution will bring some problems. {{Java}} for example do not have a type that can be directly mapped to that (if I am not mistaken). It has 2 different classes: {{Period}} for the date part and {{Duration}} for the time part. By consequence, it can be difficult for the driver to handle such a type. I also believe (even if I do not have some concret proof right now) that it will make some computations, like the one needed for CASSANDRA-11871, more expensives. Overall, I am in favor of keeping the thing as simple as possible. Which is for me: storing the duration has nanoseconds, supporting as litterals only a number followed by a symbol (in this first version at least) and not supporting {{month}} or {{year}} units (the current patch does not support {{week}} but it can easily be added). Having said that, I am fully open to discussion. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333585#comment-15333585 ] Sylvain Lebresne commented on CASSANDRA-11873: -- For the record, CQL is not SQL and it's not even close. Artificially forcing ourselves to reuse something existing in SQL *every single time* we need new syntax is largely pointless. Anyone trying to use CQL as if it was SQL is going to have a bad surprise, and small syntax differences is going to be the least of its problem. Don't get me wrong, CQL has the same _general_ structure than SQL and so informing our choices with what SQL (and popular SQL databases) is doing and borrowing good ideas is certainly desirable. But that's only the beginning of the conversation, not the end (even more so when said existing SQL databases don't even agree between themselves). If we think an existing syntax is not particular good and we can do better for instance, why we would pick a lesser solution? And in that particular case, I'm _convinced_ that the syntax currently implemented is better than what Postgres or Oracle do (I reckon that such statement is partly subjective, but I still stand by it). Certainly not a lot better, granted, but better because as intuitive as any of the options but more concise. For that reason, count me as a PMC-binding -1 on *not* supporting it. That said, I'm not against compromises, so please read below before answering. bq. Of the formats I've seen here, Postgres native format is the most user friendly And by "Postgres native format", you mean {{1 year 2 months 3 days 4 hours 5 minutes 6 seconds}} right? If so (and as mentioned previously), I don't really mind supporting that (I guess for the sake of making the live of Postgres developer easier, or pleasing those that want to show off their touch-typing skills). I don't mind it as long as we also support the shorter version (because, if I don't care about Postgres, why wouldn't I be allowed to abbreviate the units? It surely is pretty natural). > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15333529#comment-15333529 ] Brian Hess commented on CASSANDRA-11873: - Being subtlety different on syntax is in some cases worse than being very different. So, if we are thinking we will go with ISO 8601 format (an option that could make sense - it is a widely recognized format and present in more than a few systems (not just databases, I mean)) then we should make sure we include the "P" and the "T". While Postgres does support ISO 8601 formats (of course I bothered to read it), in that format the highest resolution is seconds. There is a good reason to want milliseconds and microseconds (and maybe nanoseconds). The standard Postgres format support all of these (with the exception of nanoseconds, though that addition to their format would be straightforward to understand). If you want to shorten the Postgres format to save typing, what abbreviation do you propose for "minute" and "month"? I will certainly agree that the Oracle syntax is not user-friendly. I think arguing it is desirable is a stretch. I have a reservation on the Influx syntax here, though. Influx does not support month or year. They only have up to week (https://docs.influxdata.com/influxdb/v0.13/query_language/data_exploration/#relative-time). So, it is not possible to say "now() - 2 months" or "now() - 1 year". To do 1 year, what would you do? "now() - 365d"? What about leap year? What about going back one month? In fact, this patch only had a subset of Influx's supported format. I don't see a week unit. Moreover, Influx doesn't use "us", it uses just "u". So, our proposed syntax isn't even consistent (in subtle ways) with Influx's format. Let alone that Influx's format is incomplete (specifically, no support for months and years). Of the formats I've seen here, Postgres native format is the most user friendly, and accomplishes the goals of durations for us. I'm (non-PMC, non-binding) -1 on the currently proposed format from a usability/product/developer POV. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1528#comment-1528 ] Benjamin Lerer commented on CASSANDRA-11873: I am -1 on this. I think those syntaxes are more complex than needed. In my opinion {{now() - 3d}} will be easily understood by everybody and I do not think that there is a need to have to write {{now() -3 day}}. In the case of CASSANDRA-11871 I found that the {{INTERVAL}} syntax is making the query much more verbose and less readable: {{GROUP BY floor(time, INTERVAL '1' HOUR)}}. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1529#comment-1529 ] Benjamin Lerer commented on CASSANDRA-11873: I am -1 on this. I think those syntaxes are more complex than needed. In my opinion {{now() - 3d}} will be easily understood by everybody and I do not think that there is a need to have to write {{now() -3 day}}. In the case of CASSANDRA-11871 I found that the {{INTERVAL}} syntax is making the query much more verbose and less readable: {{GROUP BY floor(time, INTERVAL '1' HOUR)}}. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1527#comment-1527 ] Sylvain Lebresne commented on CASSANDRA-11873: -- Well, # We're not reinventing the wheel, we're reusing [influxdb syntax|https://docs.influxdata.com/influxdb/v0.13/query_language/data_exploration/#time-syntax-in-queries]. Even besides that, calling a syntax like {{2h3m}} "reinventing the wheel" feels to me a bit of a strech. # If one bothers reading the [linked Postgres page|https://www.postgresql.org/docs/current/static/datatype-datetime.html#DATATYPE-INTERVAL-INPUT-EXAMPLES], he'll note that Postgres supports {{P2h3m}} which is pretty damn close (it also supports {{2 hours 3 minutes}} which I don't think is necessary but wouldn't mind supporting as alternative to the shorted version). Surely, Postgres veterans are smart enough to not be thrown off by us dropping the {{P}} at the beginning. # Regarding the Oracle syntax, I think it's terrible. The goal of this ticket is to add a user-friendly syntax for inputing durations, but imo {{now() - (INTERVAL '4 5:12' DAY TO MINUTE)}} (to mean {{now() - 4d5h12m}}) is verbose, unintuitive and plain ugly. And as far as I can tell, it's nowhere near standard (Postgres don't seem to support it for instance). So I'm basically a strong PMC binding -1 on it. Overall, we're not "Making up completely new syntax". {{3h2m5s}} is pretty standard (as in, in life in general) and concise, and it's even supported by some other database (influxdb and, up to a minor detail, Postgres). And I don't see any other syntax being a de-factor standard in other databases. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332694#comment-15332694 ] Patrick McFadin commented on CASSANDRA-11873: - I am +1 on this. Making up completely new syntax is just harsh to the end user. No need to re-invent the wheel. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15332685#comment-15332685 ] Brian Hess commented on CASSANDRA-11873: - So, one question about syntax. SQL has a time interval type with its own syntax. Instead of inventing additional syntax, what about taking on the Postgres or Oracle/SQL-Standard syntax. For example, Postgres uses "1 day ago" or "-1 day" (or hour, minute, etc): https://www.postgresql.org/docs/current/static/datatype-datetime.html Oracle uses "INTERVAL '1' HOUR": https://docs.oracle.com/cd/B28359_01/server.111/b28286/sql_elements003.htm#i38598 I suggest we choose something similar to one that exists, rather than create our own new syntax. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15331333#comment-15331333 ] Benjamin Lerer commented on CASSANDRA-11873: I am not sure of what {{date1}} and {{date2}} would be as Cassandra do not support column identifiers at this place in the {{WHERE}} clause but otherwise the query will be: {code} WHERE T >= (NOW() - 1d) + (date1 - date2); {code} which seems much more readable to me. One problem with your proposal is the {{timeunit}} parameter. Functions only accept arguments associated with a CQL type to be able to deserialize the input. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330686#comment-15330686 ] Tyler Hobbs commented on CASSANDRA-11873: - {quote} Will you be able to bind to the duration? As in: session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > now() - ?") or session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > now() - ?h") {quote} You will be able to use the first example you listed. The unit suffix is part of the value, so the second example wouldn't work. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330154#comment-15330154 ] Brian Hess commented on CASSANDRA-11873: - This raises an interesting question. Will you be able to bind to the duration? As in: session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > now() - ?") or session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > now() - ?h") Because you can do the following, right? session.prepare("SELECT * FROM myTable WHERE pkey=5 AND clustCol > DateDiff(now, ?)") > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330135#comment-15330135 ] Alex P commented on CASSANDRA-11873: I was thinking that from a parameterization and composability pov having functions for these operations would be _simpler_ (_nb: I do see though how the string-only version looks friendlier). For example: with a {{DATE_ADD(date, int, timeunit)}} one could do {{WHERE t >= DATE_ADD(?, ?, ?}}. And composing {{WHERE T >= DATE_ADD(DATE_ADD(NOW(), -1, DAY), DATE_DIFF(date1, date2, DAY), DAY)}}. The latter could probably still be possible but it might involve more complicated expression parser and I'm not very sure how parameterizable it would be. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15330084#comment-15330084 ] Tyler Hobbs commented on CASSANDRA-11873: - In {{CQL.textile}}, the {{jsonEncoding}} section needs to be updated for {{duration}} type. In {{formatting.py}}, instead of using {{int64_unpack(val) + 0x8000L}}, I believe we want {{uint64_unpack(val) - (2 ** 63)}}. {{DurationTypeSerializer.serialize()}} needs a comment indicating that it's relying on overflow behavior. This took me a while to figure out :) On a related note, a few test cases which explicitly show the expected serializations for values could be useful. I wrote this test while convincing myself the patch was correct, if you want to add it when committing: {code} @Test public void testSerialization() { ByteBuffer serialized = DurationSerializer.instance.serialize(Long.MAX_VALUE); assertEquals(ByteBuffer.wrap(new byte[]{ (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff}), serialized); serialized = DurationSerializer.instance.serialize(Long.MIN_VALUE); assertEquals(ByteBuffer.wrap(new byte[]{ (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00}), serialized); serialized = DurationSerializer.instance.serialize(0L); assertEquals(ByteBuffer.wrap(new byte[]{ (byte)0x80, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00}), serialized); serialized = DurationSerializer.instance.serialize(1L); assertEquals(ByteBuffer.wrap(new byte[]{ (byte)0x80, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x01}), serialized); serialized = DurationSerializer.instance.serialize(-1L); assertEquals(ByteBuffer.wrap(new byte[]{ (byte)0x7f, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff, (byte)0xff}), serialized); serialized = DurationSerializer.instance.serialize(Long.MIN_VALUE + 1L); assertEquals(ByteBuffer.wrap(new byte[]{ (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x00, (byte)0x01}), serialized); } {code} These examples might be useful to put in the native protocol spec as well, just to ensure that driver authors implement the serialization correctly. See section 6.23 (varint encoding) for an existing example. Last, it looks like {{COPY FROM}} doesn't handle duration values when prepared statements are used, but I suppose we need {{duration}} support in the python driver for that to be supported. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329877#comment-15329877 ] Jon Haddad commented on CASSANDRA-11873: I'm +1 to all of Sylvain's points. I would definitely prefer to be able to use {code}WHERE t >= now() - 2h{code} as sugar for the more verbose {code}WHERE t >= date_sub(now(), 1h){code} or whatever the equivalent would be. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15329196#comment-15329196 ] Sylvain Lebresne commented on CASSANDRA-11873: -- bq. I was wondering if instead of introducing support for specific math operations and the necessary type conversions, a set of date/time functions wouldn't work better (and be less error prone) Could you maybe define "work better", and why do you think operators are error prone? But I'll also note that adding operators doesn't at all exclude also adding more specific date/time functions if they make sense. What we're looking for here is convenience first, and things like {{WHERE t >= now() - 2h}} looks pretty intuitive to me on that front. bq. Using functions could potentially be more extensible over time by allowing to compose these. Operators *are* functions. And I mean that literally, they will just be syntactic sugar for native functions internally. They will compose with any other function in particular. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15328892#comment-15328892 ] Alex P commented on CASSANDRA-11873: (Apologies for seeing this ticket so late) I was wondering if instead of introducing support for specific math operations and the necessary type conversions, a set of date/time functions wouldn't work better (and be less error prone). Using functions could potentially be more extensible over time by allowing to compose these. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15323457#comment-15323457 ] Benjamin Lerer commented on CASSANDRA-11873: |[branch|https://github.com/apache/cassandra/compare/trunk...blerer:11873-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-testall/4/]|[dtests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-dtest/4/]| The new patch serialize the duration as an unsigned long and accept {{us}} and {{µs}} as microsecond symbol. The patch also add some testing for JSON. Adding tests for MV was not possible due to the fact that the Java driver does not recognize yet the {{Duration}} type. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317719#comment-15317719 ] Sylvain Lebresne commented on CASSANDRA-11873: -- bq. use {{µs}} instead of {{us}} Why _instead_? let's just support both, it's trivial. I agree that in a number of situations it's easier to stick to ascii, but no reason to also be modern. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314868#comment-15314868 ] Jon Haddad commented on CASSANDRA-11873: +1 also on us, instead of µs > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314843#comment-15314843 ] Patrick McFadin commented on CASSANDRA-11873: - +1 on _us_ instead of _µs_ I can't even figure out where to type that on my keyboard. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15314784#comment-15314784 ] Tyler Hobbs commented on CASSANDRA-11873: - Overall, I like the design for the Duration type. I'm a little hesitant to use {{µs}} instead of {{us}}, just because {{u}} is much easier to work with. Since this is a new type, we should go ahead and make the serialization format byte-order comparable by storing it as an unsigned 64bit int centered at 2^63, similar to {{DateType}}. Can you add duration entries to the existing tests in {{JsonTest}}? {{ViewFilteringTest}} and {{ViewSchemaTest}} also have a couple of tests that are supposed to cover all types. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306407#comment-15306407 ] Benjamin Lerer commented on CASSANDRA-11873: [~thobbs] could you review? > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Labels: client-impacting, doc-impacting > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15306403#comment-15306403 ] Benjamin Lerer commented on CASSANDRA-11873: |[branch|https://github.com/apache/cassandra/compare/trunk...blerer:11873-trunk]|[utests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-testall/3/]|[dtests|http://cassci.datastax.com/view/Dev/view/blerer/job/blerer-11873-trunk-dtest/3/]| The patch adds a new {{duration}} type which store the duration in {{nanoseconds}} as a signed 64bits integer. Due to that the duration has a minimum of {{-2 ^63^ ns}} and a maximum of {{2 ^63^ - 1 ns}} (roughly -292/292 years). The patch allow durration to be specified as litterals. A duration litterals is integer followed by a time symbol. The supported symbols are: * {{d}}: days * {{h}}: hours * {{m}}: minutes * {{s}}: seconds * {{ms}}: milliseconds * {{µs}}: microseconds * {{ns}}: nanoseconds The duration can also be specified as a simple integer. In this case the duration unit used is {{nanoseconds}}. As the python driver does not support yet the {{duration}} type. The patch modify the CQLSH scripts to allow it to work with duration columns. > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (CASSANDRA-11873) Add duration type
[ https://issues.apache.org/jira/browse/CASSANDRA-11873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15296147#comment-15296147 ] Benjamin Lerer commented on CASSANDRA-11873: One interesting question is what type of text representation do we want to support for duration. The simplest approach would be to support only a number and a time unit: days ({{d}}), hours ({{h}}), minutes ({{m}}), seconds ({{s}}), milliseconds ({{ms}}) and micro seconds ({{µs}}). A slightly more complex one would be to allow the combination of multiple simple expressions like: 2d8h (2 days and 8 hours). > Add duration type > - > > Key: CASSANDRA-11873 > URL: https://issues.apache.org/jira/browse/CASSANDRA-11873 > Project: Cassandra > Issue Type: Improvement > Components: CQL >Reporter: Benjamin Lerer >Assignee: Benjamin Lerer > Fix For: 3.x > > > For CASSANDRA-11871 or to allow queries with {{WHERE}} clause like: > {{... WHERE reading_time < now() - 2h}}, we need to support some duration > type. > In my opinion, it should be represented internally as a number of > microseconds. -- This message was sent by Atlassian JIRA (v6.3.4#6332)