Re: Different output for the same date field
Just a FYI. Submitted a pull request for https://issues.apache.org/jira/browse/DRILL-6242. The pull request materialize Drill timestamp, date, and time into java.sql.Timestamp, java.sql.Date, and java.sql.Time. With these 3 distinct types returned inside nested values through the JDBC layer, the application using the output can now see the proper types. For example, made a parquet file with date, time, timestamp, list of date, list of time, list of timestamp, and a map of data, time, timestamp. Now querying this returns: 0: jdbc:drill:zk=local> select * from dfs.tmp.`datetime.parquet` limit 1; +-+---+--+-+---+--++ |date | time |timestamp |date_list| time_list |timestamp_list| time_map | +-+---+--+-+---+--++ | 1970-01-11 | 00:00:03.600 | 2018-03-21 20:30:07.364 | ["1970-01-11"] | ["00:00:03.600"] | ["2018-03-21 20:30:07.364"] | {"date":"1970-01-11","time":"00:00:03.600","timestamp":"2018-03-21 20:30:07.364"} | +-+---+--+-+---+--++ Comparing the above to the existing behavior, which uses org.joda.time.DateTime object for all three Drill types when returning nested values such as in list and maps: +--+--+---+---+---++--+ | date | time | timestamp | date_list | time_list | timestamp_list | time_map | +--+--+---+---+---++--+ | 1970-01-11 | 00:00:03.600 | 2018-03-21 20:30:07.364 | [{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":86400,"afterNow":false,"beforeNow":true,"equalNow":false}] | [{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false}] | [{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}] | {"date":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":86400,"afterNow":false,"beforeNow":true,"equalNow":false},"time":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false},"timestamp":{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}} | +--+--+---+---+---++--+ -- Jiang On 3/14/18, 1:44
Re: Different output for the same date field
I will file a Jira issue for this problem with a more detailed description. https://issues.apache.org/jira/browse/DRILL-6242 I put some solution possibilities in the Jira. Someone with more knowledge can take a look at see if any of the solution approach makes sense first. Thanks. -- Jiang On 3/13/18, 9:56 PM, "Kunal Khatua"wrote: Hi Jiang Thanks for bringing this to our attention. However, we're not actively developing the Mongo storage plugin at the moment, so it might be a while before this is updated. Do you think you could help with submitting a pull-request against a JIRA filed for this? You already appear to have become familiar with the Drill codebase, and we're looking to contributions from the community. ~ Kunal On Tue, Mar 13, 2018 at 3:23 PM, Jiang Wu wrote: > Setup: > > > 1. Mongodb with this sample data inside: > > db.test.findOne(); > { > "_id" : ObjectId("5aa8487d470dd39a635a12f5"), > "name" : "orange", > "context" : { >"time" : ISODate("2018-03-13T21:52:54.940Z"), >"user" : "jack" > } > } > > > 1. Connect with Drill 1.12 and run the query: > > > select t.context.`time` as `time`, t.context from mongo.test.test t; > > The result is: > > ++-+ > | time | context | > ++-+ > | 2018-03-13 | {"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2, > ... },"user":"jack"} | > ++-+ > > Note how the output formatting of the same field “time” is different > depending on whether the time is shown as a first level column or as a > nested JSON field. Has anyone seen this behavior? > > Looking at the source code, it appears that the JSON output is produced > from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0). > This class uses its own ObjectMapper instance to serialize the map. > > Would it better to add some Mixin’s to the various complex types that may > appear inside this JsonStringHashMap? For example, DateTime object can be > serialized using the logical representation, rather than the getters of the > DataTime class used by Drill. > > Thanks. > > -- Jiang > >
Re: Different output for the same date field
Hi Jiang Thanks for bringing this to our attention. However, we're not actively developing the Mongo storage plugin at the moment, so it might be a while before this is updated. Do you think you could help with submitting a pull-request against a JIRA filed for this? You already appear to have become familiar with the Drill codebase, and we're looking to contributions from the community. ~ Kunal On Tue, Mar 13, 2018 at 3:23 PM, Jiang Wuwrote: > Setup: > > > 1. Mongodb with this sample data inside: > > db.test.findOne(); > { > "_id" : ObjectId("5aa8487d470dd39a635a12f5"), > "name" : "orange", > "context" : { >"time" : ISODate("2018-03-13T21:52:54.940Z"), >"user" : "jack" > } > } > > > 1. Connect with Drill 1.12 and run the query: > > > select t.context.`time` as `time`, t.context from mongo.test.test t; > > The result is: > > ++-+ > | time | context | > ++-+ > | 2018-03-13 | > {"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2, > ... },"user":"jack"} | > ++-+ > > Note how the output formatting of the same field “time” is different > depending on whether the time is shown as a first level column or as a > nested JSON field. Has anyone seen this behavior? > > Looking at the source code, it appears that the JSON output is produced > from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0). > This class uses its own ObjectMapper instance to serialize the map. > > Would it better to add some Mixin’s to the various complex types that may > appear inside this JsonStringHashMap? For example, DateTime object can be > serialized using the logical representation, rather than the getters of the > DataTime class used by Drill. > > Thanks. > > -- Jiang > >
Different output for the same date field
Setup: 1. Mongodb with this sample data inside: db.test.findOne(); { "_id" : ObjectId("5aa8487d470dd39a635a12f5"), "name" : "orange", "context" : { "time" : ISODate("2018-03-13T21:52:54.940Z"), "user" : "jack" } } 1. Connect with Drill 1.12 and run the query: select t.context.`time` as `time`, t.context from mongo.test.test t; The result is: ++-+ | time | context | ++-+ | 2018-03-13 | {"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2, ... },"user":"jack"} | ++-+ Note how the output formatting of the same field “time” is different depending on whether the time is shown as a first level column or as a nested JSON field. Has anyone seen this behavior? Looking at the source code, it appears that the JSON output is produced from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0). This class uses its own ObjectMapper instance to serialize the map. Would it better to add some Mixin’s to the various complex types that may appear inside this JsonStringHashMap? For example, DateTime object can be serialized using the logical representation, rather than the getters of the DataTime class used by Drill. Thanks. -- Jiang