Re: Different output for the same date field

2018-04-05 Thread Jiang Wu
Just a FYI.  Submitted a pull request for 
https://issues.apache.org/jira/browse/DRILL-6242.  The pull request materialize 
Drill timestamp, date, and time into java.sql.Timestamp, java.sql.Date, and 
java.sql.Time.  With these 3 distinct types returned inside nested values 
through the JDBC layer, the application using the output can now see the proper 
types.

For example, made a parquet file with date, time, timestamp, list of date, list 
of time, list of timestamp, and a map of data, time, timestamp.   Now querying 
this returns:

0: jdbc:drill:zk=local> select * from dfs.tmp.`datetime.parquet` limit 1;
+-+---+--+-+---+--++
|date | time  |timestamp |date_list|
 time_list |timestamp_list| 
 time_map  |
+-+---+--+-+---+--++
| 1970-01-11  | 00:00:03.600  | 2018-03-21 20:30:07.364  | ["1970-01-11"]  | 
["00:00:03.600"]  | ["2018-03-21 20:30:07.364"]  | 
{"date":"1970-01-11","time":"00:00:03.600","timestamp":"2018-03-21 
20:30:07.364"}  |
+-+---+--+-+---+--++
 

Comparing the above to the existing behavior, which uses org.joda.time.DateTime 
object for all three Drill types when returning nested values such as in list 
and maps:

+--+--+---+---+---++--+
| date | time | timestamp | date_list | time_list | timestamp_list | time_map |
+--+--+---+---+---++--+
| 1970-01-11 | 00:00:03.600 | 2018-03-21 20:30:07.364 | 
[{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":86400,"afterNow":false,"beforeNow":true,"equalNow":false}]
 | 
[{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false}]
 | 
[{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}]
 | 
{"date":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":0,"secondOfMinute":0,"secondOfDay":0,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":2,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":0,"dayOfYear":11,"year":1970,"dayOfMonth":11,"dayOfWeek":7,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":86400,"afterNow":false,"beforeNow":true,"equalNow":false},"time":{"weekyear":1970,"monthOfYear":1,"millisOfSecond":600,"secondOfMinute":3,"secondOfDay":3,"minuteOfHour":0,"minuteOfDay":0,"weekOfWeekyear":1,"yearOfEra":1970,"yearOfCentury":70,"centuryOfEra":19,"hourOfDay":0,"millisOfDay":3600,"dayOfYear":1,"year":1970,"dayOfMonth":1,"dayOfWeek":4,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":3600,"afterNow":false,"beforeNow":true,"equalNow":false},"timestamp":{"weekyear":2018,"monthOfYear":3,"millisOfSecond":364,"secondOfMinute":7,"secondOfDay":73807,"minuteOfHour":30,"minuteOfDay":1230,"weekOfWeekyear":12,"yearOfEra":2018,"yearOfCentury":18,"centuryOfEra":20,"hourOfDay":20,"millisOfDay":73807364,"dayOfYear":80,"year":2018,"dayOfMonth":21,"dayOfWeek":3,"era":1,"chronology":{"zone":{"fixed":true,"id":"UTC"}},"zone":{"fixed":true,"id":"UTC"},"millis":1521664207364,"afterNow":false,"beforeNow":true,"equalNow":false}}
 |
+--+--+---+---+---++--+

-- Jiang


On 3/14/18, 1:44 

Re: Different output for the same date field

2018-03-14 Thread Jiang Wu
I will file a Jira issue for this problem with a more detailed description.  
https://issues.apache.org/jira/browse/DRILL-6242

I put some solution possibilities in the Jira.  Someone with more knowledge can 
take a look at see if any of the solution approach makes sense first.

Thanks.
 
-- Jiang


On 3/13/18, 9:56 PM, "Kunal Khatua"  wrote:

Hi Jiang

Thanks for bringing this to our attention.

However, we're not actively developing the Mongo storage plugin at the
moment, so it might be a while before this is updated.

Do you think you could help with submitting a pull-request against a JIRA
filed for this? You already appear to have become familiar with the Drill
codebase, and we're looking to contributions from the community.


~ Kunal

On Tue, Mar 13, 2018 at 3:23 PM, Jiang Wu  wrote:

> Setup:
>
>
>   1.  Mongodb with this sample data inside:
>
> db.test.findOne();
> {
>  "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>  "name" : "orange",
>  "context" : {
>"time" : ISODate("2018-03-13T21:52:54.940Z"),
>"user" : "jack"
>  }
> }
>
>
>   1.  Connect with Drill 1.12 and run the query:
>
>
> select t.context.`time` as `time`, t.context from mongo.test.test t;
>
> The result is:
>
> ++-+
> | time   | context |
> ++-+
> | 2018-03-13 | 
{"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,
> ... },"user":"jack"} |
> ++-+
>
> Note how the output formatting of the same field “time” is different
> depending on whether the time is shown as a first level column or as a
> nested JSON field.  Has anyone seen this behavior?
>
> Looking at the source code, it appears that the JSON output is produced
> from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0).
> This class uses its own ObjectMapper instance to serialize the map.
>
> Would it better to add some Mixin’s to the various complex types that may
> appear inside this JsonStringHashMap?  For example, DateTime object can be
> serialized using the logical representation, rather than the getters of 
the
> DataTime class used by Drill.
>
> Thanks.
>
> -- Jiang
>
>



Re: Different output for the same date field

2018-03-13 Thread Kunal Khatua
Hi Jiang

Thanks for bringing this to our attention.

However, we're not actively developing the Mongo storage plugin at the
moment, so it might be a while before this is updated.

Do you think you could help with submitting a pull-request against a JIRA
filed for this? You already appear to have become familiar with the Drill
codebase, and we're looking to contributions from the community.


~ Kunal

On Tue, Mar 13, 2018 at 3:23 PM, Jiang Wu  wrote:

> Setup:
>
>
>   1.  Mongodb with this sample data inside:
>
> db.test.findOne();
> {
>  "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
>  "name" : "orange",
>  "context" : {
>"time" : ISODate("2018-03-13T21:52:54.940Z"),
>"user" : "jack"
>  }
> }
>
>
>   1.  Connect with Drill 1.12 and run the query:
>
>
> select t.context.`time` as `time`, t.context from mongo.test.test t;
>
> The result is:
>
> ++-+
> | time   | context |
> ++-+
> | 2018-03-13 | 
> {"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2,
> ... },"user":"jack"} |
> ++-+
>
> Note how the output formatting of the same field “time” is different
> depending on whether the time is shown as a first level column or as a
> nested JSON field.  Has anyone seen this behavior?
>
> Looking at the source code, it appears that the JSON output is produced
> from JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0).
> This class uses its own ObjectMapper instance to serialize the map.
>
> Would it better to add some Mixin’s to the various complex types that may
> appear inside this JsonStringHashMap?  For example, DateTime object can be
> serialized using the logical representation, rather than the getters of the
> DataTime class used by Drill.
>
> Thanks.
>
> -- Jiang
>
>


Different output for the same date field

2018-03-13 Thread Jiang Wu
Setup:


  1.  Mongodb with this sample data inside:

db.test.findOne();
{
 "_id" : ObjectId("5aa8487d470dd39a635a12f5"),
 "name" : "orange",
 "context" : {
   "time" : ISODate("2018-03-13T21:52:54.940Z"),
   "user" : "jack"
 }
}


  1.  Connect with Drill 1.12 and run the query:


select t.context.`time` as `time`, t.context from mongo.test.test t;

The result is:

++-+
| time   | context |
++-+
| 2018-03-13 | 
{"time":{"dayOfYear":72,"year":2018,"dayOfMonth":13,"dayOfWeek":2, ... 
},"user":"jack"} |
++-+

Note how the output formatting of the same field “time” is different depending 
on whether the time is shown as a first level column or as a nested JSON field. 
 Has anyone seen this behavior?

Looking at the source code, it appears that the JSON output is produced from 
JsonStringHashMap.java:79 (org.apache.drill.exec:vector:1.12.0).  This class 
uses its own ObjectMapper instance to serialize the map.

Would it better to add some Mixin’s to the various complex types that may 
appear inside this JsonStringHashMap?  For example, DateTime object can be 
serialized using the logical representation, rather than the getters of the 
DataTime class used by Drill.

Thanks.

-- Jiang