Re: "Death of Schema-on-Read"

2018-04-06 Thread Paul Rogers
Hi Aman,

As we get into details, I suggested to Hanu that we move the discussion into a 
JIRA ticket.

 >On the subject of CAST pushdown to Scans, there are potential drawbacks

 >  - In general, the planner will see a Scan-Project where the Project has  
CAST functions.  But the Project can have arbitrary expressions,  e.g  CAST(a 
as INT) * 5

Suggestion: push the CAST(a AS INT) down to the scan, do the a * 5 in the 
Project operator.

>  or a combination of 2 CAST functions 

If the user does a two-stage cast, CAST(CAST(a AS INT) AS BIGINT), then one 
simple rule is to push only the innermost cast downwards.

> or non-CAST functions etc.

Just keep it in Project.

 >    It would be quite expensive to examine each expression (there could be 
hundreds) to determine whether it is eligible to be pushed to the Scan.

Just push the trivial form, CAST(column AS type). Even that would be a huge 
win. Note, for CSV, it might have to be CAST(columns[2] AS INT), since 
"columns" is special for CSV.
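To make the "push only the trivial form" rule concrete, here is a rough Python 
sketch of splitting a projection expression into a pushable cast and a residual 
expression. The class and function names are made up for illustration; this is 
not Drill's planner API:

```python
# Toy expression tree: split a projection expression into a simple
# CAST(column AS type) that the scan can apply, plus the residual
# expression that stays in the Project operator.
from dataclasses import dataclass

@dataclass
class Col:            # column reference, e.g. `a`
    name: str

@dataclass
class Cast:           # CAST(child AS type)
    child: object
    type: str

@dataclass
class Mul:            # child * factor, standing in for arbitrary arithmetic
    child: object
    factor: int

def extract_pushable_cast(expr):
    """Return (scan_cast, residual): scan_cast is a simple CAST(column AS
    type) to push to the scan, or None if the leaf has no such cast."""
    if isinstance(expr, Cast) and isinstance(expr.child, Col):
        # Simple CAST(a AS INT): push the whole cast; Project passes through.
        return (expr, Col(expr.child.name))
    if isinstance(expr, Cast) and isinstance(expr.child, Cast):
        # Two-stage CAST(CAST(a AS INT) AS BIGINT): push only the inner cast.
        inner, residual = extract_pushable_cast(expr.child)
        return (inner, Cast(residual, expr.type))
    if isinstance(expr, Mul):
        # CAST(a AS INT) * 5: push the cast, keep the multiply in Project.
        inner, residual = extract_pushable_cast(expr.child)
        return (inner, Mul(residual, expr.factor))
    return (None, expr)   # non-CAST expression: leave it all in Project

# CAST(a AS INT) * 5 splits into a pushed cast and `a * 5` in Project:
cast, residual = extract_pushable_cast(Mul(Cast(Col("a"), "INT"), 5))
```

The point of the sketch is that only the two shallow patterns need to be 
recognized; everything else falls through to Project untouched, so the planner 
never has to analyze arbitrary expressions deeply.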

>   - Expressing Nullability is not possible with CAST.  If a column should be 
> tagged as  (not)nullable, CAST syntax does not allow that.

Can we just add keywords: CAST(a AS INT NULL), CAST(b AS VARCHAR NOT NULL) ?

 >  - Drill currently supports CASTing to a SQL data type, but not to the 
complex types such as arrays and maps.  We would have to add support for that 
from a language perspective as well as the run-time.  This would be non-trivial 
effort.

The term "complex type" is always confusing. Consider a map. The rules would 
apply recursively to the members of the map. (Problem: today, if I reference a 
map member, Drill pulls it to the top level: SELECT m.a creates a new top-level 
field, it does not select "a" within "m". We need to fix that anyway.) So, 
CAST(m.a AS INT) should imply the type of column "a" within map "m".

For arrays, the problem is more complex. Perhaps more syntax: CAST(a[] AS INT) 
to force array elements to INT. Maybe use CAST(a[][] AS INT) for a repeated 
list (2D array).

Unions don't need a solution as they are their own solution (they can hold 
multiple types.) Same for (non-repeated) lists. 

To resolve runs of nulls, maybe allow CAST(m AS MAP). Or we can imply that "m" 
is a Map from the expression CAST(m.a AS INT). For arrays, the previously 
suggested CAST(a[] AS INT). If columns "a" or "m" turn out to be a non-null 
scalar, then we have no good answer.
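For illustration, here is a toy sketch of how a recursive cast rule could apply 
to nested data, where a type spec mirrors the shape of the value: CAST(m.a AS 
INT) becomes {"a": "INT"} and CAST(a[] AS INT) becomes ["INT"]. The spec format 
and names are hypothetical; Drill's value vectors work quite differently:

```python
# Recursively coerce a nested value against a shape-mirroring type spec.
def coerce(value, spec):
    if spec == "INT":
        return None if value is None else int(value)
    if isinstance(spec, dict):        # map: recurse into the named members
        return {k: coerce(value.get(k), spec[k]) for k in spec}
    if isinstance(spec, list):        # array: apply the element spec to each item
        return [coerce(v, spec[0]) for v in value]
    return value                      # no spec: leave the value alone

row = {"m": {"a": "12"}, "xs": ["1", "2"]}
fixed = {
    "m": coerce(row["m"], {"a": "INT"}),   # roughly CAST(m.a AS INT)
    "xs": coerce(row["xs"], ["INT"]),      # roughly CAST(xs[] AS INT)
}
```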

CAST cannot solve the nasty cases of JSON in which some fields are complex, 
some scalar. E.g. {a: 10} {a: [20]} or {m: "foo"} {m: {value: "foo"}}. I 
suppose no solution is perfect...

I'm sure that, if someone gets a chance to design this feature, they'll find 
lots more issues. Maybe cast push-down is only a partial solution. But it 
seems to solve so many of the JSON and CSV cases that I've seen that it seems 
too good to pass up.

Thanks,


- Paul  

Re: "Death of Schema-on-Read"

2018-04-06 Thread Paul Rogers
Hi Hanu,

The problem with views as they stand, even with casts, is that the casting 
comes too late to resolve the issues I highlighted in earlier messages. Ted's 
cast push-down idea causes the conversion to happen at read time so that we 
can, say, cast a string to an int, or cast a null to the proper type.

Today, if we use a cast, such as SELECT CAST(a AS INT) FROM myTable, then we 
get a DAG that has three parts (to keep things simple):

* Scan the data, using types inferred from the data itself
* In a Project operator, convert the data to INT
* In Screen, return the result to the user

If the type is ambiguous in the file, then the first step above fails; data 
never gets far enough for the Project to kick in and apply the cast. Also, if a 
file contains a run of nulls, the scanner will choose Nullable Int, then fail 
when it finds, say, a string.

The key point is that the cast push-down means that the query will not fail due 
to dicey files: the cast resolves the ambiguity. If we push the cast down, then 
it is the Scan operator that resolves the conflict and applies the cast, 
avoiding the failures we've been discussing.
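A minimal sketch of the difference, assuming a toy scanner (nothing here is 
Drill's actual scanner code): without a hint, a run of nulls forces the scanner 
to pick a default type and fail later; with a pushed-down cast acting as a type 
hint, the conflict is resolved as the data is read:

```python
# Toy scanner: infer a column type from the data, or honor a pushed-down hint.
def scan(rows, hint=None):
    declared = hint
    out = []
    for v in rows:
        if v is None:
            if declared is None:
                declared = int      # Drill-like default: a run of nulls -> Nullable Int
            out.append(None)
            continue
        if declared is None:
            declared = type(v)      # first non-null value fixes the type
        if hint is not None:
            v = declared(v)         # cast pushed to the scan: convert eagerly
        elif not isinstance(v, declared):
            raise TypeError(f"schema change: {v!r} is not {declared.__name__}")
        out.append(v)
    return out

dicey = [None, None, "30"]          # a run of nulls, then a string
# scan(dicey) raises TypeError: the nulls locked in Int, then "30" arrived.
# With the hint, the same file reads cleanly:
print(scan(dicey, hint=int))        # [None, None, 30]
```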

I like the idea you seem to be proposing: cascading views. Have a table view 
that cleans up each table. Then, these can be combined in higher-order views 
for specialized purposes.

The beauty of the cast push-down idea is that no metadata is needed other than 
the query. If the user wants metadata, they use existing views (that contain 
the casts and cause the cast push-down.)

This seems like such a simple, elegant solution that we could try it out 
quickly (if we get past the planner issues Aman mentioned). In fact, the new 
scan operator code (done as part of the batch sizing work) already has a 
prototype mechanism for type hints. If type hints are provided to the scanner, 
it uses them; otherwise it infers the types. We'd just hook the cast push-down 
data up to that prototype and could try out the result quickly. (The new scan 
operator is still in my private branch, in case anyone goes looking for it...)

Some of your discussion talks about automatically inferring the schema. I 
really don't think we need to do that. The hint (cast push-down) is sufficient 
to resolve ambiguities in the existing scan-time schema inference.

The syntax trick would be to find a way to provide hints just for those columns 
that are issues. If I have a table with columns a, b, ... z, but only b is a 
problem, I don't want to have to do:

SELECT a, CAST(b AS INT), c, ... z FROM myTable

Would be great if we could just do:

SELECT *, CAST(b AS INT) FROM myTable

I realize the above has issues; the key idea is: provide casts only for the 
problem fields without spelling out all fields.
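Conceptually, the "star plus overrides" idea is just a projection that passes 
every column through unchanged except the listed problem columns. A toy sketch, 
with hypothetical names:

```python
# SELECT *, CAST(b AS INT): project all columns, overriding only `b`.
def project_star(row, overrides):
    return {col: overrides.get(col, lambda v: v)(v) for col, v in row.items()}

row = {"a": "x", "b": "42", "c": 1.5}
print(project_star(row, {"b": int}))   # {'a': 'x', 'b': 42, 'c': 1.5}
```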

If we really want to get fancy, we can do UDF push down for the complex cases 
you mentioned. Maybe:

SELECT *, CAST(b AS INT), parseCode(c) ...

We are diving into design here; maybe you can file a JIRA and we can shift 
detailed design discussion to that JIRA. Salim already has one related to 
schema change errors, which was why the "Death" article caught my eye.

Thanks,
- Paul

 

On Friday, April 6, 2018, 4:59:40 PM PDT, Hanumath Rao Maduri 
 wrote:  
 
 Hello,

Thanks to Ted & Paul for clarifying my questions.
Sorry for not being clear in my previous post: when I said "create view" I
was thinking of simple views where we use cast expressions to cast columns
to types. In that case the planner can use this information to force the
scans to use it as the schema.

If the query fails, it fails at the scan, not after the scanner has
inferred the schema.

I know that views can get complicated with joins and expressions. For
schema hinting through views I assume they should be created on single
tables with corresponding columns one wants to project from the table.


Regarding the same question, today we had a discussion with Aman. Here view
can be considered as a "view" of the table with schema in place.

We can change some syntax to suit it for specifying schema, something like
this:

create schema[optional] view(/virtual table ) v1 as (a: int, b : int)
select a, b from t1 with some other rules as to conversion of scalar to
complex types.

Then the queries when used on this view (below) should enable the scanner
to use this type information and then use it to convert the data into the
appropriate types.
select * from v1

For the case where the schema is not known by the user, maybe use
something like this:

create schema[optional] view(/virtual table) v1 as select a, b from t1
infer schema.

This view when used to query the table should trigger the logic of
inferring and consolidating the schema and attaching that inferred schema
to the view. In future when we use the same view, we should be using the
inferred schema. This view either can be local view pertaining to the
session or a global view so that other queries across sessions can use them.



Re: "Death of Schema-on-Read"

2018-04-06 Thread Aman Sinha
On the subject of CAST pushdown to Scans, there are potential drawbacks
...

   - In general, the planner will see a Scan-Project where the Project has
   CAST functions.  But the Project can have arbitrary expressions,  e.g
   CAST(a as INT) * 5  or a combination of 2 CAST functions or non-CAST
   functions etc.   It would be quite expensive to examine each expression
   (there could be hundreds) to determine whether it is eligible to be pushed
   to the Scan.
   - Expressing Nullability is not possible with CAST.   If a column should
   be tagged as  (not)nullable, CAST syntax does not allow that.
   - Drill currently supports CASTing to a SQL data type, but not to the
   complex types such as arrays and maps.  We would have to add support for
   that from a language perspective as well as the run-time.  This would be
   non-trivial effort.

-Aman


On Fri, Apr 6, 2018 at 4:59 PM, Hanumath Rao Maduri 
wrote:

> Hello,
>
> Thanks for Ted & Paul for clarifying my questions.
> Sorry for not being clear in my previous post, When I said create view I
> was under the impression for simple views where we use cast expressions
> currently to cast them to types. In this case planner can use this
> information to force the scans to use this as the schema.
>
> If the query fails then it fails at the scan and not after inferring the
> schema by the scanner.
>
> I know that views can get complicated with joins and expressions. For
> schema hinting through views I assume they should be created on single
> tables with corresponding columns one wants to project from the table.
>
>
> Regarding the same question, today we had a discussion with Aman. Here view
> can be considered as a "view" of the table with schema in place.
>
> We can change some syntax to suite it for specifying schema. something like
> this.
>
> create schema[optional] view(/virtual table ) v1 as (a: int, b : int)
> select a, b from t1 with some other rules as to conversion of scalar to
> complex types.
>
> Then the queries when used on this view (below) should enable the scanner
> to use this type information and then use it to convert the data into the
> appropriate types.
> select * from v1
>
> For the possibility of schema information not being known by the user, may
> be use something like this.
>
> create schema[optional] view(/virtual table) v1 as select a, b from t1
> infer schema.
>
> This view when used to query the table should trigger the logic of
> inferring and consolidating the schema and attaching that inferred schema
> to the view. In future when we use the same view, we should be using the
> inferred schema. This view either can be local view pertaining to the
> session or a global view so that other queries across sessions can use
> them.
>
>
> By default we can apply certain rules such as converting simple scalar
> values to other scalar values (like int to double etc). But we should be
> also able to give option to the customer to enable rules such as scalar int
> to array[int] when creating the view itself.
>
>
> Thanks,
> -Hanu
>
>
> On Fri, Apr 6, 2018 at 3:10 PM, Paul Rogers 
> wrote:
>
> > Ted, this is why your participation in Drill is such a gift: cast
> > push-down is an elegant, simple solution that even works in views.
> > Beautiful.
> >
> > Thanks,
> > - Paul
> >
> >
> >
> > On Friday, April 6, 2018, 11:35:37 AM PDT, Ted Dunning <
> > ted.dunn...@gmail.com> wrote:
> >
> >  On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
> > wrote:
> >
> > > Great discussion. Really appreciate the insight from the Drill users!
> > >
> > > To Ted's points: the simplest possible solution is to allow a table
> > > function to express types. Just making stuff up:
> > >
> > > SELECT a FROM schema(myTable, (a: INT))
> > >
> >
> > Why not just allow cast to be pushed down to the reader?
> >
> > Why invent new language features?
> >
> > Or, really ugly, a session option:
> > >
> > > ALTER SESSION SET schema.myTable="a: INT"
> > >
> >
> > These are a big problem.
> >
> >
>


[GitHub] drill issue #1203: DRILL-6289: Cluster view should show more relevant inform...

2018-04-06 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1203
  
@arina-ielchiieva could you please review this?


---


[GitHub] drill issue #1203: DRILL-6289: Cluster view should show more relevant inform...

2018-04-06 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1203
  
**PopOut to open in new window:**

![popout](https://user-images.githubusercontent.com/4335237/38448772-f198f47e-39bb-11e8-848f-19c15cb9c24a.png)

Tooltip describing the new columns

![tooltipinfo](https://user-images.githubusercontent.com/4335237/38448769-ef047594-39bb-11e8-8697-8644d8130fe3.png)




---


[GitHub] drill issue #1203: DRILL-6289: Cluster view should show more relevant inform...

2018-04-06 Thread kkhatua
Github user kkhatua commented on the issue:

https://github.com/apache/drill/pull/1203
  
**Snapshot:**

![screenshot](https://user-images.githubusercontent.com/4335237/38448762-e52ac3e8-39bb-11e8-9b42-5276bf3e3449.png)




---


Re: "Death of Schema-on-Read"

2018-04-06 Thread Hanumath Rao Maduri
Hello,

Thanks to Ted & Paul for clarifying my questions.
Sorry for not being clear in my previous post: when I said "create view" I
was thinking of simple views where we use cast expressions to cast columns
to types. In that case the planner can use this information to force the
scans to use it as the schema.

If the query fails, it fails at the scan, not after the scanner has
inferred the schema.

I know that views can get complicated with joins and expressions. For
schema hinting through views I assume they should be created on single
tables with corresponding columns one wants to project from the table.


Regarding the same question, today we had a discussion with Aman. Here view
can be considered as a "view" of the table with schema in place.

We can change some syntax to suit it for specifying schema, something like
this:

create schema[optional] view(/virtual table ) v1 as (a: int, b : int)
select a, b from t1 with some other rules as to conversion of scalar to
complex types.

Then the queries when used on this view (below) should enable the scanner
to use this type information and then use it to convert the data into the
appropriate types.
select * from v1

For the case where the schema is not known by the user, maybe use
something like this:

create schema[optional] view(/virtual table) v1 as select a, b from t1
infer schema.

This view when used to query the table should trigger the logic of
inferring and consolidating the schema and attaching that inferred schema
to the view. In future when we use the same view, we should be using the
inferred schema. This view either can be local view pertaining to the
session or a global view so that other queries across sessions can use them.


By default we can apply certain rules such as converting simple scalar
values to other scalar values (like int to double etc). But we should be
also able to give option to the customer to enable rules such as scalar int
to array[int] when creating the view itself.


Thanks,
-Hanu


On Fri, Apr 6, 2018 at 3:10 PM, Paul Rogers 
wrote:

> Ted, this is why your participation in Drill is such a gift: cast
> push-down is an elegant, simple solution that even works in views.
> Beautiful.
>
> Thanks,
> - Paul
>
>
>
> On Friday, April 6, 2018, 11:35:37 AM PDT, Ted Dunning <
> ted.dunn...@gmail.com> wrote:
>
>  On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
> wrote:
>
> > Great discussion. Really appreciate the insight from the Drill users!
> >
> > To Ted's points: the simplest possible solution is to allow a table
> > function to express types. Just making stuff up:
> >
> > SELECT a FROM schema(myTable, (a: INT))
> >
>
> Why not just allow cast to be pushed down to the reader?
>
> Why invent new language features?
>
> Or, really ugly, a session option:
> >
> > ALTER SESSION SET schema.myTable="a: INT"
> >
>
> These are a big problem.
>
>


[GitHub] drill pull request #1203: DRILL-6289: Cluster view should show more relevant...

2018-04-06 Thread kkhatua
GitHub user kkhatua opened a pull request:

https://github.com/apache/drill/pull/1203

DRILL-6289: Cluster view should show more relevant information

The commits make use of AJAX calls by the browser to all the Drillbits for 
status metrics, which are then displayed here. This works with unsecured Drill 
sessions, while secure setups will not be able to show the metrics because of 
certificate authentication.
To achieve this, the HTTP port needed to be discoverable, for which changes 
to the protobuf were done to provide the HTTP port number as well.
The Drillbit homepage has been extended to show the following:
* Heap Memory in use
* Direct Memory (actively) in use - since we cannot currently get the total 
memory held by Netty, only what is allocated to running queries, the base 
memory shown is the total peak Direct allocation by Drill. In an active, 
stable production system, this should be around 90%.
* Average (System) Load Factor reported by the OS for that node.
* Pill indicating the current Drillbit whose WebUI is being viewed 
(existing UX feature), with an option to load (in a new window) the UI from 
other Drillbits.

Information such as the User, Data and Control port numbers doesn't help much 
when assessing general cluster health, so it might be worth removing it if 
more real estate is needed. For now, we are retaining it. 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/kkhatua/drill DRILL-6289

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/1203.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #1203


commit 94e8bbaefea7c6fe9b5729bcec3cba1dbbb3bc8f
Author: Kunal Khatua 
Date:   2018-03-26T05:03:12Z

Protobuf change to carry HTTP port info

commit 9b1513da949da870fc8d9a96472c4ca307e7d5e8
Author: Kunal Khatua 
Date:   2018-04-05T22:21:47Z

Allow CORS for access to remote Drillbit metrics

Cross-origin resource sharing (CORS) is required to ensure that the 
WebServer is able to serve REST calls for status pages.

commit eb9ae7c36a26194b5a49685230e2a94282efee84
Author: Kunal Khatua 
Date:   2018-04-05T21:21:18Z

Addition of Bootstrap's Glyphicons

As part of the Bootstrap's components, this meets Apache License criteria
Remove popout.png

commit 45e4831403997661c1df578a95e3c985f64ff4f8
Author: Kunal Khatua 
Date:   2018-04-06T23:08:43Z

Materialize relevant metrics

1. Heap memory (incl usage)
2. Direct memory (incl usage)
3. Average System Load (last 1 min)
4. Option to view from other nodes (pop out)
5. Added Glyphicons

commit dd507ea95a7d0ef4904092ef4fc43ee3b44ef058
Author: Kunal Khatua 
Date:   2018-03-30T06:24:54Z

Update System Table and related tests

1. Updated System Table to show HTTP port
2. Updated unit tests




---


Re: "Death of Schema-on-Read"

2018-04-06 Thread Paul Rogers
Ted, this is why your participation in Drill is such a gift: cast push-down is 
an elegant, simple solution that even works in views. Beautiful.

Thanks,
- Paul

 

On Friday, April 6, 2018, 11:35:37 AM PDT, Ted Dunning 
 wrote:  
 
 On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
wrote:

> Great discussion. Really appreciate the insight from the Drill users!
>
> To Ted's points: the simplest possible solution is to allow a table
> function to express types. Just making stuff up:
>
> SELECT a FROM schema(myTable, (a: INT))
>

Why not just allow cast to be pushed down to the reader?

Why invent new language features?

Or, really ugly, a session option:
>
> ALTER SESSION SET schema.myTable="a: INT"
>

These are a big problem.
  

Re: "Death of Schema-on-Read"

2018-04-06 Thread Paul Rogers
Hi Hanu,
Sorry, I tossed in a new topic late in the discussion. We started by noting 
that views don't always work to resolve low-level schema conflicts for the 
reasons we discussed, so we need something else. That led to the schema hint 
discussion.

The additional point I raised was that views are still very useful for other 
tasks such as, say, computed columns (extended_price = price * quantity), for 
filtering out data (rejecting certain kinds of unwanted records) and so on.

If we need both hints and views (they serve distinct purposes), we'd want to 
ask how a user could combine them into a single file-based schema file so that 
query users just see a simplified version of the table (with the hints and 
views applied).

Since I tossed in new ideas, here is one more. We once saw a wild-and-crazy 
form of JSON with embedded metadata. Assume a list of customers:

{name: {type: "string", value: "Fred"}, age: {type: "int", value: 40}}

In such a case, it would be great to be able to transform the data so that the 
fields become simple values, like this:

{name: "Fred", age: 40}

Views can do this for top-level fields. But, there is no syntax in Drill to do 
this in nested maps:

{... address: {street: {type: "string", value: "301 Cobblestone Way"}, ...}}

Ideally, we'd transform this to:

{name: "Fred", age: 40, address: {street: "301 Cobblestone Way", ...}}
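The unwrapping transform itself is simple to state; here is a hedged Python 
sketch (not a proposed Drill API) that strips the {type, value} wrappers 
recursively, so nested maps like "address" are flattened too:

```python
# Recursively unwrap {"type": ..., "value": ...} nodes into plain values.
def unwrap(node):
    if isinstance(node, dict):
        if set(node) == {"type", "value"}:   # a wrapped scalar
            return node["value"]
        return {k: unwrap(v) for k, v in node.items()}
    if isinstance(node, list):
        return [unwrap(v) for v in node]
    return node

wrapped = {
    "name": {"type": "string", "value": "Fred"},
    "age": {"type": "int", "value": 40},
    "address": {"street": {"type": "string", "value": "301 Cobblestone Way"}},
}
print(unwrap(wrapped))
# {'name': 'Fred', 'age': 40, 'address': {'street': '301 Cobblestone Way'}}
```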

So, if we come up with a metadata hint system, we (or the community) should be 
able to add rules for the type of messy data actually encountered in the field.

Thanks,
- Paul

 

On Thursday, April 5, 2018, 10:22:46 PM PDT, Hanumath Rao Maduri 
 wrote:  
 
 Hello,

Thank you Paul for starting this discussion.
However, I was not clear on the latest point as to how providing hints
differs from creating a view (a mechanism which already exists in Drill).
I do think that creating a view can be cumbersome (in terms of syntax).
Hints are ephemeral, and hence can be used for quick validation of the
schema for a query execution. But if the user knows the schema with
certainty, then I think creating a view and using it might be the better
option. Can you please share your thoughts on this?

Thank you Ted for your valuable suggestions. Regarding your comment that
"metastore is good but centralized is bad", can you please share your
viewpoint on what design issues it can cause? I know that it can be a
bottleneck, but I want to know about the other issues.
Put another way: if a centralized metastore is engineered well enough to
avoid most of the bottlenecks, do you think it could be good to use for
metadata?

Thanks,
-Hanu

On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
wrote:

> Great discussion. Really appreciate the insight from the Drill users!
>
> To Ted's points: the simplest possible solution is to allow a table
> function to express types. Just making stuff up:
>
> SELECT a FROM schema(myTable, (a: INT))
>
> Or, a SQL extension:
>
> SELECT a FROM myTable(a: INT)
>
> Or, really ugly, a session option:
>
> ALTER SESSION SET schema.myTable="a: INT"
>
> All these are ephemeral and not compatible with, say, Tableau.
>
> Building on Ted's suggestion of using the (distributed) file system we can
> toss out a few half-baked ideas. Maybe use a directory to represent a name
> space, with files representing tables. If I have "weblogs" as my directory,
> I might have a file called "jsonlog" to describe the (messy) format of my
> JSON-formatted log files. And "csvlog" to describe my CSV-format logs.
> Different directories represent different SQL databases (schemas),
> different files represent tables within the schema.
>
>
> The table files can store column hints. But, it could do more. Maybe
> define the partitioning scheme (by year, month, day, say) so that can be
> mapped to a column. Wouldn't it be be great if Drill could figure out the
> partitioning itself if we gave a date range?
>
> The file could also define the format plugin to use, and its options, to
> avoid the need to define this format separate from the data, and to reduce
> the need for table functions.
>
> Today, Drill matches files to format plugins using only extensions. The
> table file could provide a regex for those old-style files (such as real
> web logs) that don't use suffixes. Or, to differentiate between "sales.csv"
> and "returns.csv" in the same data directory.
>
>
> While we're at it, the file might as well contain a standard view to apply
> to the table to define computed columns, do data conversions and so on.
>
> If Drill does automatic scans (to detect schema, to gather stats), maybe
> store that alongside the table file: "csvlogs.drill" for the
> Drill-generated info.
>
>
> Voila! A nice schema definition with no formal metastore. Because the info
> is in files, it easy to version using git, etc. (especially if the
> directory can be mounted using NFS as a normal directory.) Atomic updates
> can be done via the rename 

Re: "Death of Schema-on-Read"

2018-04-06 Thread Ted Dunning
On Thu, Apr 5, 2018 at 10:22 PM, Hanumath Rao Maduri 
wrote:

> ...
>
> Thank you Ted for your valuable suggestions, as regards to your comment on
> "metastore is good but centralized is bad" can you please share your view
> point on what all design issues it can cause. I know that it can be
> bottleneck but just want to know about other issues.

> Put in other terms if centralized metastore engineered in a good way to
> avoid most of the bottleneck, then do you think it can be good to use for
> metadata?
>

Centralized metadata stores have caused the following problems in my
experience:

1) they lock versions and make it extremely hard to upgrade applications
incrementally. It is a common fiction that one can upgrade all applications
using the same data at the same moment. It isn't acceptable to require an
outage and force an upgrade on users. It also isn't acceptable to force the
metadata store to never be updated.

2) they go down and take everything else with them.

3) they require elaborate caching. The error message "updating metadata
cache" was the most common string on the impala mailing list for a long
time because of the 30 minute delays that customers were seeing due to this
kind of problem.

4) they limit expressivity. Because it is hard to update a metadata store
safely, they move slowly and typically don't describe new data well. Thus,
Hive metadata store doesn't deal with variable typed data or structured
data worth a darn. The same thing will happen with any new centralized
meta-data store.

5) they inhibit multi-tenancy. Ideally, data describes itself so that
different users can see the same data even if they are nominally not part
of the same org or sub-org.

6) they inhibit data fabrics that extend beyond a single cluster.
Centralized metadata stores are inherently anti-global. Self-describing
data, on the other hand, is inherently global, since wherever the data
goes, so goes the metadata. Note that self-describing data does not have to
be intrinsically self-descriptive in a single file. I view a JSON file with
a schema file alongside as a self-describing pair.
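To illustrate the self-describing pair idea, here is a small sketch in which a 
reader applies a sidecar schema file with no central store involved. The file 
layout and schema format are invented for the example:

```python
# Read newline-delimited JSON, applying the types from a sibling schema file.
import json

def read_with_sidecar(data_path, schema_path, fs):
    schema = json.loads(fs[schema_path])       # e.g. {"a": "int"}
    casts = {"int": int, "string": str}
    rows = []
    for line in fs[data_path].splitlines():
        rec = json.loads(line)
        rows.append({k: casts[t](rec[k]) for k, t in schema.items()})
    return rows

# In-memory stand-in for the distributed file system:
fs = {
    "logs/t1.json": '{"a": "1"}\n{"a": "2"}',
    "logs/t1.schema": '{"a": "int"}',
}
print(read_with_sidecar("logs/t1.json", "logs/t1.schema", fs))
# [{'a': 1}, {'a': 2}]
```

Because the schema travels next to the data, copying or replicating the 
directory carries the metadata along with it, which is the "global" property 
being argued for here.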

As an example, imagine that file extensions were tied to applications by a
central authority (a metadata store). This would mean that you couldn't
change web browsers (.html) or spreadsheets. Or compilers. And frankly,
the fact that my computer has a single idea about how a file is interpreted
is limiting. I would prefer to use Photoshop on images in certain
directories and Preview for other images elsewhere. A single repository
linking file type to application is too limiting even on my laptop.

That is the same issue, ultimately, as a centralized metadata store, except
that my issues with images are tiny compared to the problems that occur when
you have 5000 analysts working on data who all get screwed by a single
broken piece of software.


Re: "Death of Schema-on-Read"

2018-04-06 Thread Ted Dunning
On Thu, Apr 5, 2018 at 9:43 PM, Paul Rogers 
wrote:

> Great discussion. Really appreciate the insight from the Drill users!
>
> To Ted's points: the simplest possible solution is to allow a table
> function to express types. Just making stuff up:
>
> SELECT a FROM schema(myTable, (a: INT))
>

Why not just allow cast to be pushed down to the reader?

Why invent new language features?

Or, really ugly, a session option:
>
> ALTER SESSION SET schema.myTable="a: INT"
>

These are a big problem.


[GitHub] drill issue #1199: DRILL-6303: Provide a button to copy the Drillbit's JStac...

2018-04-06 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1199
  
+1


---


[GitHub] drill issue #1197: DRILL-6279: UI indicates operators that spilled in-memory...

2018-04-06 Thread arina-ielchiieva
Github user arina-ielchiieva commented on the issue:

https://github.com/apache/drill/pull/1197
  
+1


---