Hi Igor,

The sqlTypeOf() function came about when we wrote Learning Apache Drill. We 
wanted a way to show people the actual type of a column, especially in those 
cases where Drill "makes up" the type, such as when a missing column magically 
becomes Nullable INT.

The existing "typeOf" function would just return "NULL" in that case, so we 
could not show the reader what kind of null it was.

So, we added sqlTypeOf() to return the "minor type" portion of the major type 
pair for a column. The name returned is supposed to match the SQL names in the 
Drill documentation. The Drill documentation describes sqlTypeOf() this way: 
"Returns the data type of a column (using the SQL names) whether the column is 
NULL or not."

We also added a "modeOf()" function to return the Mode portion of the MajorType pair. The 
Drill documentation explains, "Returns the cardinality (mode) of the column as 
"NOT NULL", "NULLABLE", or "ARRAY". Drill data types include a cardinality, for 
example Optional Int or Required VarChar."

The result was we could clearly show that a column is, say, a Required Int, 
Nullable Int or Repeated Int; something the original typeOf() could not do. 
Chapter 8 of the book uses these two functions in multiple places to explain 
Drill's type system and schema inference.
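
To make the distinction concrete, here is a toy model in Python. It is only a 
sketch, not Drill's implementation: the MajorType class, the function names 
and the small SQL-name table are stand-ins I made up for illustration, and 
the table covers just two types. The mode labels are the ones quoted from the 
documentation above.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed, partial mapping from minor type to SQL name (illustrative only).
SQL_NAMES = {"INT": "INTEGER", "VARCHAR": "CHARACTER VARYING"}

# Mode labels as quoted from the Drill documentation.
MODE_LABELS = {"REQUIRED": "NOT NULL", "OPTIONAL": "NULLABLE", "REPEATED": "ARRAY"}


@dataclass(frozen=True)
class MajorType:
    """Hypothetical stand-in for the (minor type, mode) pair of a column."""
    minor_type: str
    mode: str


def type_of(col_type: MajorType, value: Optional[object]) -> str:
    # Value-based: a null value hides the column's type.
    if value is None:
        return "NULL"
    return col_type.minor_type


def sql_type_of(col_type: MajorType) -> str:
    # Column-based: reports the type whether or not the value is null.
    return SQL_NAMES.get(col_type.minor_type, col_type.minor_type)


def mode_of(col_type: MajorType) -> str:
    # Column-based: reports only the cardinality (mode).
    return MODE_LABELS[col_type.mode]


# A missing column that was "made up" as Nullable INT.
missing = MajorType("INT", "OPTIONAL")
print(type_of(missing, None), "|", sql_type_of(missing), "|", mode_of(missing))
# prints: NULL | INTEGER | NULLABLE
```

In this model, type_of() alone cannot tell a Nullable Int from any other null, 
while the other two functions together recover both halves of the pair.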

Perhaps a better solution would be to have modified the SQL-standard typeof() 
function to return the full type name: "INT NOT NULL", "INT" or "ARRAY<INT>". 
However, we didn't want to break backward compatibility. And, at the time the 
functions were added (Drill 1.14), the SQL type name syntax was not yet 
available in Drill.

To follow up on Arina's earlier note: perhaps, now that Drill supports the new 
type syntax, it makes sense to modify sqlTypeOf() (which is a Drill-specific 
function) to return the same type syntax as used in the schema files, using the 
format shown above.

If we do that, it still makes sense to be able to get at the name and mode 
separately (so we don't have to parse the full SQL syntax). So, if sqlTypeOf() 
returns "ARRAY<INT>", say, then a new sqlNameOf() could return just "INT".
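
As a rough sketch of that proposal (in Python, purely for illustration): the 
function names below are hypothetical, none of this is existing Drill code, 
and the three output formats are simply the ones listed earlier in this note. 
The mode names REQUIRED, OPTIONAL and REPEATED are assumed labels for the 
three cardinalities.

```python
def full_type_name(name: str, mode: str) -> str:
    """Proposed sqlTypeOf() result: type name and mode combined."""
    if mode == "REQUIRED":
        return f"{name} NOT NULL"
    if mode == "OPTIONAL":
        # Nullable is the default in the SQL syntax, so no suffix.
        return name
    if mode == "REPEATED":
        return f"ARRAY<{name}>"
    raise ValueError(f"unknown mode: {mode}")


def sql_name_of(name: str, mode: str) -> str:
    """Hypothetical sqlNameOf(): just the type name, mode ignored."""
    return name


for mode in ("REQUIRED", "OPTIONAL", "REPEATED"):
    print(full_type_name("INT", mode), "|", sql_name_of("INT", mode))
# prints:
# INT NOT NULL | INT
# INT | INT
# ARRAY<INT> | INT
```

With the name available on its own, a caller never has to pick "INT" back out 
of a string such as "ARRAY<INT>".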

The functions are likely only used for training and debugging (as noted 
earlier) so we are probably free to adjust them as the team thinks is best. If 
anyone was using them in "production" queries, then we would have heard from 
them when the functionality first changed.

I agree that we should have more complete tests; I'll try to find time to add 
them.

Actually, this issue raises a larger question. In the "old days", we would 
create a spec for things like functions. Projects such as Java and Python have 
language specs that spell out how things are meant to work. Then, if someone 
changes something in a way that violates the spec, the team knows to have a 
discussion about the relative merits of that change. (The benefits of the new 
version vs. the risk of breaking compatibility.) We can't really afford to do 
that in Drill. So, I like the idea of using unit tests as the next best thing.

Thanks,
- Paul

 

    On Friday, December 27, 2019, 02:48:40 AM PST, Igor Guzenko 
<[email protected]> wrote:  
 
 Hello Paul,

While working on complex types I also was unable to find any document with
expected results of the *typeOf functions.
Since you've implemented them, could you please describe them in some way?
Or it would be even better to extend test coverage in TestTypeFns.java
so nobody could accidentally break the functionality.

Related Jira tickets:
https://issues.apache.org/jira/browse/DRILL-6360
https://issues.apache.org/jira/browse/DRILL-6362
https://issues.apache.org/jira/browse/DRILL-6377
https://issues.apache.org/jira/browse/DRILL-5189

Thanks,
Igor

On Fri, Dec 27, 2019 at 11:18 AM Arina Yelchiyeva <
[email protected]> wrote:

> Hi Paul,
>
> thanks for verifying the release.
> A regression for the release is considered something that worked in the previous
> version, in our case 1.16, and stopped working in the release candidate, in our
> case 1.17. Vova has checked both sqltypeof issues; they behave the
> same in 1.16, so from the release perspective they are not regressions but
> bugs. Thus they would not have sunk the last release candidate. By the way,
> one of the issues actually was a deliberate change in 1.16; I have left a
> comment in Jira.
>
> Regarding a minor release, since neither issue is a regression, there is
> no point in doing one. Though if they were, I would still say we should not
> prepare a minor release unless the bugs found are something really serious that
> prevents Drill from starting. Mostly because release preparation takes time
> not only for the release manager but for the PMCs and committers to verify
> and cast votes.
>
> Actually, it is a good time to think about when we want to release 1.18 and find
> a volunteer to be the release manager.
>
> Kind regards,
> Arina
>
> > On 27 Dec 2019, at 04:48, Paul Rogers <[email protected]> wrote:
> >
> > Hi Charles,
> >
> > Good question! We have a number of big changes queued up. To do a minor
> release, we'd want to branch off of the Drill 1.17 release, which is extra
> work.
> >
> >
> > Let's see if anyone finds other issues we'd like to fix so we can see if
> the effort would be worthwhile.
> >
> >
> > Thanks,
> > - Paul
> >
> >
> >
> >    On Thursday, December 26, 2019, 6:24:27 PM PST, Charles Givre <
> [email protected]> wrote:
> >
> > Paul,
> > Do you think it's worth fixing these and releasing a minor release in a
> month or so with this and any other minor bug fixes?
> > -- C
> >
> >> On Dec 26, 2019, at 9:17 PM, Paul Rogers <[email protected]>
> wrote:
> >>
> >> Hi All,
> >>
> >> I'm late to the party (was distracted by a certain holiday). Downloaded
> the artifacts and ran through the examples in the first several sections of
> Chapter 8 of the Learning Apache Drill book.
> >>
> >> It turns out we have a regression in the sqlTypeOf() function. See
> DRILL-7499 and DRILL-7501.
> >>
> >> These are minor as they only affect users of the functions. But, the
> errors cause the results in Drill 1.17 to regress from those in Drill 1.14
> (when the book was written.)
> >>
> >> Otherwise, everything seemed to work fine.
> >>
> >> Thanks,
> >> - Paul
> >>
> >>
> >>
> >>    On Sunday, December 22, 2019, 9:00:52 AM PST, Volodymyr Vysotskyi <
> [email protected]> wrote:
> >>
> >> Hi all,
> >>
> >> I'd like to propose the third release candidate (RC2) of Apache Drill,
> >> version 1.17.0.
> >>
> >> Changes since the previous release candidate: fixed show-stopper
> DRILL-7494
> >> <https://issues.apache.org/jira/browse/DRILL-7494>.
> >>
> >> The release candidate covers a total of 205 resolved JIRAs [1]. Thanks
> to
> >> everyone who contributed to this release.
> >>
> >> The tarball artifacts are hosted at [2] and the maven artifacts are
> hosted
> >> at [3].
> >>
> >> This release candidate is based on
> >> commit 2eb6bbe0501cb6553106e63dc1f2810ff10ae375 located at [4].
> >>
> >> Please download and try out the release.
> >>
> >> The vote ends at 5 PM UTC (9 AM PDT, 7 PM EET, 10:30 PM IST), December
> 25,
> >> 2019
> >>
> >> [ ] +1
> >> [ ] +0
> >> [ ] -1
> >>
> >> Here's my vote: +1
> >>
> >> [1]
> >>
> https://issues.apache.org/jira/secure/ReleaseNote.jspa?projectId=12313820&version=12344870
> >> [2] http://home.apache.org/~volodymyr/drill/releases/1.17.0/rc2/
> >> [3]
> https://repository.apache.org/content/repositories/orgapachedrill-1077/
> >> [4] https://github.com/vvysotskyi/drill/commits/drill-1.17.0
> >>
> >> Kind regards,
> >> Volodymyr Vysotskyi
>  
