Re: [DISCUSS] SPIP: APIs for Table Metadata Operations

2018-09-04, Russell Spitzer
They are based on a physical column; the column is real. The function only exists in the data source. For example: SELECT ttl(a), ttl(b) FROM ks.tab

On Tue, Sep 4, 2018 at 11:16 PM Reynold Xin wrote:
> Russell your special columns wouldn’t actually work with option 1 because
> Spark …

2018-09-04, Reynold Xin
Russell, your special columns wouldn’t actually work with option 1, because Spark would have to fail them in analysis without an actual physical column.

On Tue, Sep 4, 2018 at 9:12 PM Russell Spitzer wrote:
> I'm a big fan of 1 as well. I had to implement something similar using
> custom …

2018-09-04, Russell Spitzer
I'm a big fan of 1 as well. I had to implement something similar using custom expressions, and it was more work than it should have been. In particular, our use case is that columns carry certain metadata (ttl, writetime) that exists not as separate columns but as special values that can be surfaced.
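A minimal Python sketch of source-provided metadata functions such as ttl and writetime, assuming a Cassandra-style store where each cell carries a write timestamp and an optional TTL. The Cell type, METADATA_FUNCTIONS table, and select helper are hypothetical, not connector or Spark code; the units (microseconds for writetime, remaining seconds for TTL) follow CQL's WRITETIME and TTL conventions. The validation loop stands in for an analysis-time check that each function argument is a real physical column.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class Cell:
    """A stored value plus the per-cell metadata a Cassandra-style store keeps."""
    value: Any
    writetime: int        # write timestamp in microseconds since the epoch
    ttl: Optional[int]    # remaining time-to-live in seconds; None if unset


# Hypothetical functions exposed by the data source. Each takes a *physical*
# column and surfaces metadata attached to that column's cell.
METADATA_FUNCTIONS = {
    "ttl": lambda cell: cell.ttl,
    "writetime": lambda cell: cell.writetime,
}

Projection = Tuple[str, str]  # (function name or "col", physical column name)


def select(schema: Set[str], rows: List[Dict[str, Cell]],
           projections: List[Projection]) -> List[tuple]:
    # Validate up front, analogous to analysis: the argument must be a real
    # column and the function must be one the source knows about.
    for func, column in projections:
        if column not in schema:
            raise ValueError(f"no physical column {column!r}")
        if func != "col" and func not in METADATA_FUNCTIONS:
            raise ValueError(f"unknown function {func!r}")
    out = []
    for row in rows:
        values = []
        for func, column in projections:
            cell = row[column]
            values.append(cell.value if func == "col"
                          else METADATA_FUNCTIONS[func](cell))
        out.append(tuple(values))
    return out


# Roughly: SELECT ttl(a), ttl(b) FROM ks.tab
schema = {"a", "b"}
rows = [{"a": Cell(1, 1536100000000000, 3600),
         "b": Cell("x", 1536100000000001, None)}]
print(select(schema, rows, [("ttl", "a"), ("ttl", "b")]))  # [(3600, None)]
```

Because ttl(a) names the physical column a, this check passes; it would fail only for a column that does not exist, which is the distinction the thread is drawing.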

2018-09-04, Ryan Blue
Thanks for posting the summary. I'm strongly in favor of option 1. I think the API footprint is fairly small, but worth it. Not only does it make sources easier to implement, since Spark handles parsing, it also makes sources more reliable, because Spark handles validation the same way across sources. …

2018-09-04, Reynold Xin
Ryan, Michael and I discussed this offline today. Some notes: his use case is to support partitioning data by derived columns rather than physical columns, because he didn't want his users to have to keep adding a "date" column whose values are, in reality, purely derived from some timestamp column.
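A minimal Python sketch of partitioning by a derived column, assuming a day-from-timestamp transform and timezone-aware timestamps. The names day_transform, partition_rows, and partitions_for_range are hypothetical and are not Spark APIs. The partition value is computed from ts, so users never store a separate date column, yet a range filter on ts can still prune partitions.

```python
from datetime import date, datetime, timezone
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def day_transform(ts: datetime) -> date:
    """Derive a partition value (UTC calendar day) from an aware timestamp."""
    return ts.astimezone(timezone.utc).date()


def partition_rows(rows: List[Row], source_col: str,
                   transform: Callable[[datetime], date]) -> Dict[date, List[Row]]:
    """Group rows by a value derived from source_col; no stored date column."""
    partitions: Dict[date, List[Row]] = {}
    for row in rows:
        partitions.setdefault(transform(row[source_col]), []).append(row)
    return partitions


def partitions_for_range(partitions: Dict[date, List[Row]], lo: datetime,
                         hi: datetime,
                         transform: Callable[[datetime], date]) -> List[date]:
    """Return the partitions that can match the filter lo <= ts <= hi.

    Valid only because day_transform is monotonic: a timestamp range maps
    to a contiguous range of days.
    """
    lo_key, hi_key = transform(lo), transform(hi)
    return [key for key in sorted(partitions) if lo_key <= key <= hi_key]


utc = timezone.utc
rows = [
    {"id": 1, "ts": datetime(2018, 9, 4, 23, 16, tzinfo=utc)},
    {"id": 2, "ts": datetime(2018, 9, 5, 1, 0, tzinfo=utc)},
    {"id": 3, "ts": datetime(2018, 9, 4, 9, 12, tzinfo=utc)},
]
parts = partition_rows(rows, "ts", day_transform)
print(sorted(parts))
```

The pruning step depends on the transform being monotonic; a hash-bucket transform, for example, could not prune a range filter this way.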

2018-08-15, Ryan Blue
I think I found a good solution to the problem of using Expression in the TableCatalog API and in the DeleteSupport API. For DeleteSupport, there is already a stable, public subset of Expression, named Filter, that can be used to pass filters. The reason DeleteSupport would use Expression is …
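A minimal Python sketch of passing a small, stable filter vocabulary, rather than internal expressions, to a delete operation. The EqualTo, GreaterThan, and And classes are illustrative stand-ins named after Spark's public filter classes in org.apache.spark.sql.sources; InMemoryTable and delete_where are hypothetical and are not the DeleteSupport signature under discussion. As an assumption of this sketch, the filters in the list are ANDed together.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Union


# Illustrative stand-ins for a stable, public filter vocabulary.
@dataclass(frozen=True)
class EqualTo:
    attribute: str
    value: Any


@dataclass(frozen=True)
class GreaterThan:
    attribute: str
    value: Any


@dataclass(frozen=True)
class And:
    left: "Filter"
    right: "Filter"


Filter = Union[EqualTo, GreaterThan, And]


def evaluate(f: Filter, row: Dict[str, Any]) -> bool:
    """Evaluate one filter against a row; reject anything not understood."""
    if isinstance(f, EqualTo):
        return row[f.attribute] == f.value
    if isinstance(f, GreaterThan):
        return row[f.attribute] > f.value
    if isinstance(f, And):
        return evaluate(f.left, row) and evaluate(f.right, row)
    raise TypeError(f"unsupported filter: {f!r}")


class InMemoryTable:
    """Hypothetical source offering a DeleteSupport-style operation."""

    def __init__(self, rows: List[Dict[str, Any]]):
        self.rows = list(rows)

    def delete_where(self, filters: List[Filter]) -> None:
        # Delete every row that matches all filters (conjunction).
        self.rows = [r for r in self.rows
                     if not all(evaluate(f, r) for f in filters)]


table = InMemoryTable([
    {"id": 1, "day": "2018-08-15"},
    {"id": 2, "day": "2018-08-15"},
    {"id": 3, "day": "2018-08-13"},
])
table.delete_where([EqualTo("day", "2018-08-15"), GreaterThan("id", 1)])
print([r["id"] for r in table.rows])  # [1, 3]
```

Because each filter is a plain value type with a fixed meaning, a source can reject filters it does not understand instead of depending on Spark's internal Expression classes.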

2018-08-15, Ryan Blue
I agree that it would be great to have a stable, public expression API that corresponds to what is parsed rather than to the implementations. But I worry that it will get out of date, and that a data source that needs to support a new expression would have to wait up to 6 months for a public release …

2018-08-15, Reynold Xin
Sorry, I completely disagree with using Expression in critical public APIs that we expect a lot of developers to use. There's a huge difference between exposing InternalRow and exposing Expression. InternalRow is a relatively small surface (though still quite large) that I can see ourselves within a version getting …

2018-08-13, Ryan Blue
Reynold, did you get a chance to look at my response about using `Expression`? I think that it's okay since it is already exposed in the v2 data source API. Plus, I wouldn't want to block this on building a public expression API that is more stable. I think that's the only objection to this SPIP.

2018-07-26, Ryan Blue
I don’t think that we want to block this work until we have a public and stable Expression. Like our decision to expose InternalRow, I think that while this option isn’t great, it at least allows us to move forward. We can hopefully replace it later. Also note that the use of Expression is in the …

2018-07-26, Reynold Xin
Seems reasonable at a high level. I don't think we can use Expression and SortOrder in public APIs, though. Those are not meant to be public and can easily break across versions.

On Tue, Jul 24, 2018 at 9:26 AM Ryan Blue wrote:
> The recently adopted SPIP to standardize logical plans requires …