https://issues.apache.org/jira/browse/AVRO-739
On Tue, Jan 18, 2011 at 11:49 AM, Scott Carey <[email protected]> wrote:

> We should get this discussion into JIRA soon.
>
> On 1/18/11 10:38 AM, "Ron Bodkin" <[email protected]> wrote:
>
> >Overall, yes. A couple of points worth addressing in a design:
> >
> >1) Do we want to allow encoding time zone data in the records? Storing a
> >raw timestamp is sometimes not ideal. It's worth looking at how SQL allows
> >timestamps with and without time zones. Is that simpler, or is it actually
> >more complex?
>
> It is generally 100000x simpler to serialize only in UTC and let libraries
> support what they support W.R.T timezone. Painful memories of design
> mistakes past.
> SQL does a lot of TZ work because they support user input and output
> formatting. In the back-end most databases store in only a limited way.
>
> >2) Do we want to allow dates (for storing a day, without a timestamp)?
>
> Days introduce timezone complexity if you want to find out what day a
> timestamp is in.
> So if we support day, or hour, then that is a significant increase in
> complexity. Furthermore, the timezone may not even be the same per row.
> We could leave that up to the user and support a day type that is merely
> the number of days since some origin point and leaves the timezone
> interpretation (and thus conversion to 'day' from 'datetime') in the
> user's hands, perhaps with metadata support.
>
> >3) It would be nice to allow some flexibility in the implementation
> >classes for dates, e.g., letting Java users use Joda time classes as well
> >as java.util.Date
>
> Absolutely. This is a per-language feature though, so it may not require
> much of the spec. For example, in Java it could simply be a configuration
> parameter passed to the DatumReader/Writers. It doesn't make a lot of
> sense to store metadata on the data that says "this is a Joda object, not
> java.util.Date" -- that is a user choice and not intrinsic to describing
> the data.
>
> There are other questions too -- what are the timestamp units
> (milliseconds? configurable?), what is the origin (1970? 2010?
> configurable?) -- these decisions affect the serialization size.
> I have a manual serialization of timestamps that is a long, in tenths of a
> second since 2008, for example. I have another that is a duration
> measured in tenths of a millisecond. Both were done to reduce the number
> of bytes per value for a specific problem domain.
> Although I could use such flexibility, I'm not sure that is enough of a
> motivator to put that into Avro. I'm not very bothered with converting
> from long to a human readable datetime myself.
>
> >
> >Ron
> >
> >Ron Bodkin
> >CEO
> >Think Big Analytics
> >m: +1 (415) 509-2895
> >
> >On 1/18/11 8:42 AM, "Doug Cutting" <[email protected]> wrote:
> >
> >>The way that I have imagined doing this is to specify a standard schema
> >>for dates, then implementations can optionally map this to a native date
> >>type.
> >>
> >>The schema could be a record containing a long, e.g.:
> >>
> >>{"type": "record", "name":"org.apache.avro.lib.Date", "fields" : [
> >>  {"name": "time", "type": "long"}
> >>  ]
> >>}
> >>
> >>Java could read this into a java.util.Date, Python to a datetime, etc.
> >>Such conventions could be added to the Avro specification.
> >>
> >>Does this sound like a reasonable approach?
> >>
> >>Doug
> >>
> >>On 01/17/2011 05:54 PM, Ron Bodkin wrote:
> >>> Has anyone discussed the possibility of having built-in support for a
> >>> date/time stamp data type in Avro? I think it'd be helpful, since dates
> >>> and timestamps are often used as keys in processing map/reduce data (and
> >>> in RPC systems). It's unpleasant to have to write code that converts
> >>> longs or strings into dates or timestamps. Minimally, it would be useful
> >>> to allow generating date/time stamps from long timestamps in the client
> >>> APIs for the various languages, and to have support for working with
> >>> Dates in the Java reflection API.
> >>>
> >>> I'd like to get feedback from others if they'd also like to see support
> >>> for date/time data types in Avro. It seems like a generally useful
> >>> feature that would be worth adding with a patch.
> >>>
> >>> Thanks,
> >>> Ron
