One other note: The default explain plan is a physical plan. If you want to get a logical plan, you need to do EXPLAIN PLAN WITHOUT IMPLEMENTATION FOR ...
If you're still getting the error around plan consumption, please file a jira. On May 23, 2015 10:28 AM, "Aman Sinha" <[email protected]> wrote: > 1. Yes, certainly the cost estimation is applied for file system sources > as well. For Parquet the row count is > extracted from the metadata available in the files. For text (csv, > json), this is estimated based on the size > of the files and a configurable *store.text.estimated_row_size_bytes* > parameter. The default value is 100 > bytes per row, but you can change it. > > 2. There are 2 things in your query that need special handling in Drill > because it does not rely on a centralized > schema. > First, the presence of '*' wildcard in the SELECT list - the '*' is > expanded at run-time unlike the schema-based > systems where it is expanded at the initial query planning time. > Second, whenever there is a join between 2 > or more tables, the columns in the query need to be qualified by the > table name because a column could > potentially be present in either of the tables. > The following query which explicitly specifies the qualified columns > in SELECT clause should work: > select r_regionkey from ( > select r.r_regionkey, n.n_regionkey from region r join nation n > on r.r_regionkey = n.n_regionkey) > > 3. You might want to file a JIRA with the exact reproducible test case. > 4. I believe the link you sent for the plan document is quite old. The > best place for the latest documentation is > drill.apache.org . > Non-equality joins are supported in 2 situations: > a) if there is another equality join between the same 2 tables. > e.g > select n1.n_nationkey from nation n1, nation n2 > where n1.n_nationkey = n2.n_nationkey and n1.n_regionkey < > n2.n_regionkey > b) if one side of the join is a scalar subquery: > select n1.n_nationkey from nation n1, (select min(n_regionkey) > as min_regionkey from nation n) n2 > where n1.n_regionkey = n2.min_regionkey; > > Pure cartesian joins are currently not supported. > > Aman > > > On Sat, May 23, 2015 at 8:13 AM, Piotr Sokólski <[email protected]> > wrote: > > > Hi, I’ve been playing a bit with v1.0.0 and stumbled upon a few > > questions/issues: > > > > 1. For query cost estimation one usually needs some additional > information > > about a table such as the number of rows. Is the cost estimation > > implemented for fs sources as well? If yes, how is the metadata extracted > > and cached? From my understanding some formats like parquet store it in > the > > file footer, but what about json or csv files? Can this information be > > queried/retrieved somehow by the user? > > > > 2. I’ve been working with the following query: > > > > $q = select * from region join nation on region.R_REGIONKEY = > > nation.N_REGIONKEY; > > > > where region and nation are the sample data files imported into a dfs.tmp > > schema. > > running queries like > > > > select R_REGIONKEY from ($q); > > > > results in an error "Column 'R_REGIONKEY' is ambiguous”. However queries > > like select R_REGIONKEY from (SELECT * FROM region); work fine, as well > as > > saving the result of the join with CREATE TABLE and then replacing $q > with > > the saved table’s name. Why is that and what are the rules for renaming > > columns in join queries? > > > > 3. I’ve been trying to execute a logical plan using the web interface. It > > works fine with a simple scan - project query, but when trying to use the > > output of EXPLAIN … FOR $q (with resultMode changed to “EXEC”) it throws > > the following error: > > > > SYSTEM ERROR: java.lang.IllegalArgumentException: Conflicting > > property-based creators: already had [constructor for > > org.apache.drill.common.logical.data.Join, ... > > > > the whole logical query and full error message are at > > https://gist.github.com/pyetras/bf625b6697de62284996 > > > > 4. What are the supported conditions for joins? The sql interface seems > to > > support only (e1 == e2 [AND])*, but the logical operator reference at > > > https://docs.google.com/document/d/1QTL8warUYS2KjldQrGUse7zp8eA72VKtLOHwfXy6c7I/mobilebasic?pli=1#cmnt7 > > mentions other relations and also cartesian joins. Are those simply not > > implemented for the sql parser or not supported in Drill at all? > > > > Sorry for the long read and thanks for your assistance, > > > > -- > > Piotr Sokólski > > > > >
