I was using sqlline and the web console with Drill 1.9.

I executed a select-all on a very small CSV file and, surprisingly, found a
few additional columns [fqn, filename, filepath, suffix] alongside the array
containing the results. I then converted the results to Parquet by selecting
only the specific required columns, but the Parquet file was created with the
same additional columns despite the explicit column selection.

The column aliases are also missing in queries executed on the resulting
Parquet file. I am not sure whether the two problems are related.

My findings and the query plan are attached as findings.txt.

Regards,
Rahul

On Sun, Dec 4, 2016 at 1:17 PM, Abhishek Girish <[email protected]>
wrote:

> I ran a similar query on a parquet dataset - and it returned the right
> results. This was on Drill 1.9.0 on CentOS. Tried via (1) Sqlline, (2) the
> Drill web UI, and (3) a custom JDBC app. I'm not sure why column aliases
> aren't taking effect for you. What client are you using? Please also share
> the text plan for the query.
>
> > select c_last_review_date as `a`, sum(c_birth_month) over() as `b` from customer limit 1;
> +----------+---------+
> |    a     |    b    |
> +----------+---------+
> | 2452508  | 628122  |
> +----------+---------+
>
> On Sat, Dec 3, 2016 at 11:16 PM, Rahul Raj <[email protected]>
> wrote:
>
> > Ignore the '${}' in the table name ${purchases_by_item_date}, it gets
> > substituted as a valid name.
> > Rahul
> >
> > On Sun, Dec 4, 2016 at 12:36 PM, Rahul Raj <[email protected]>
> > wrote:
> >
> > > The following query:
> > >
> > > SELECT
> > > bill_date,
> > > sum(sell_amt) over() as `cum_purchases_amt`
> > > FROM ${purchases_by_item_date}
> > >
> > > on a parquet file returns the column name as '$1' instead of
> > > cum_purchases_amt. Is there any way to override the name?
> > >
> > > Drill 1.6 also shows the same behaviour.
> > >
> > > Rahul
> > >
> > >
> >
> > --
> > **** This email and any files transmitted with it are confidential and
> > intended solely for the use of the individual or entity to whom it is
> > addressed. If you are not the named addressee then you should not
> > disseminate, distribute or copy this e-mail. Please notify the sender
> > immediately and delete this e-mail from your system.****
> >
>

findings.txt:

cat /nfsshare/myfs/0/x.csv
A,19,US
A,30,UK
B,30,US
C,50,US
C,44,UK


0: jdbc:drill:zk=localhost:2181> select * from dfs.myfs.`0/x.csv`;
+------------------+-------------------------+-----------+-------------------+---------+
|     columns      |           fqn           | filename  |     filepath      | suffix  |
+------------------+-------------------------+-----------+-------------------+---------+
| ["A","19","US"]  | /nfsshare/myfs/0/x.csv  | x.csv     | /nfsshare/myfs/0  | csv     |
| ["A","30","UK"]  | /nfsshare/myfs/0/x.csv  | x.csv     | /nfsshare/myfs/0  | csv     |
| ["B","30","US"]  | /nfsshare/myfs/0/x.csv  | x.csv     | /nfsshare/myfs/0  | csv     |
| ["C","50","US"]  | /nfsshare/myfs/0/x.csv  | x.csv     | /nfsshare/myfs/0  | csv     |
| ["C","44","UK"]  | /nfsshare/myfs/0/x.csv  | x.csv     | /nfsshare/myfs/0  | csv     |
+------------------+-------------------------+-----------+-------------------+---------+
5 rows selected (0.087 seconds)

0: jdbc:drill:zk=localhost:2181> select columns[0] as a, columns[1] as b, columns[2] as c from dfs.myfs.`0/x.csv`;
+----+-----+-----+
| a  |  b  |  c  |
+----+-----+-----+
| A  | 19  | US  |
| A  | 30  | UK  |
| B  | 30  | US  |
| C  | 50  | US  |
| C  | 44  | UK  |
+----+-----+-----+
5 rows selected (0.084 seconds)

0: jdbc:drill:zk=localhost:2181> create table dfs.myfs.`0/sample1` as select columns[0] as a, cast(columns[1] as INTEGER) as b, columns[2] as c from dfs.myfs.`0/x.csv`;


0: jdbc:drill:zk=localhost:2181> select * from dfs.myfs.`0/sample1`;
+----+-----+-----+-----------------------------------------+----------------+---------------------------+----------+
| a  |  b  |  c  |                   fqn                   |    filename    |         filepath          |  suffix  |
+----+-----+-----+-----------------------------------------+----------------+---------------------------+----------+
| A  | 19  | US  | /nfsshare/myfs/0/sample1/0_0_0.parquet  | 0_0_0.parquet  | /nfsshare/myfs/0/sample1  | parquet  |
| A  | 30  | UK  | /nfsshare/myfs/0/sample1/0_0_0.parquet  | 0_0_0.parquet  | /nfsshare/myfs/0/sample1  | parquet  |
| B  | 30  | US  | /nfsshare/myfs/0/sample1/0_0_0.parquet  | 0_0_0.parquet  | /nfsshare/myfs/0/sample1  | parquet  |
| C  | 50  | US  | /nfsshare/myfs/0/sample1/0_0_0.parquet  | 0_0_0.parquet  | /nfsshare/myfs/0/sample1  | parquet  |
| C  | 44  | UK  | /nfsshare/myfs/0/sample1/0_0_0.parquet  | 0_0_0.parquet  | /nfsshare/myfs/0/sample1  | parquet  |
+----+-----+-----+-----------------------------------------+----------------+---------------------------+----------+
5 rows selected (0.083 seconds)
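(As a workaround for the extra implicit columns in the output above, projecting
the CTAS-created columns explicitly instead of using select * should keep
fqn/filename/filepath/suffix out of the result - an untested sketch against the
same paths as above:)

```sql
-- Project only the columns written by the CTAS, skipping the
-- implicit fqn/filename/filepath/suffix columns.
SELECT a, b, c FROM dfs.myfs.`0/sample1`;
```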

0: jdbc:drill:zk=localhost:2181> select a, sum(b) over() as bb from dfs.myfs.`0/sample1`;
+----+------+
| a  |  $1  |
+----+------+
| A  | 173  |
| A  | 173  |
| B  | 173  |
| C  | 173  |
| C  | 173  |
+----+------+
5 rows selected (0.076 seconds)
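(One possible workaround for the lost alias, untested here: compute the window
aggregate in a subquery and re-alias it in the outer projection. The outer
alias name `cum_b` and the subquery alias `t` are illustrative only:)

```sql
-- Hypothetical workaround: the window function runs in the inner
-- query; the outer SELECT re-applies the desired column name.
SELECT a, bb AS cum_b
FROM (
  SELECT a, SUM(b) OVER() AS bb
  FROM dfs.myfs.`0/sample1`
) t;
```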

Query Plan

00-00    Screen : rowType = RecordType(ANY a, BIGINT $1): rowcount = 5.0, cumulative cost = {10.5 rows, 20.5 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 88014
00-01      Project(a=[$0], $1=[$2]) : rowType = RecordType(ANY a, BIGINT $1): rowcount = 5.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 88013
00-02        Window(window#0=[window(partition {} order by [] range between UNBOUNDED PRECEDING and UNBOUNDED FOLLOWING aggs [COUNT($1)])]) : rowType = RecordType(ANY a, ANY b, BIGINT w0$o0): rowcount = 5.0, cumulative cost = {10.0 rows, 20.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 88012
00-03          Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=file:/nfsshare/myfs/0/sample1]], selectionRoot=file:/nfsshare/myfs/0/sample1, numFiles=1, usedMetadataFile=false, columns=[`a`, `b`]]]) : rowType = RecordType(ANY a, ANY b): rowcount = 5.0, cumulative cost = {5.0 rows, 10.0 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 88011
