Re: Drill selects column with the same name of a different table

2017-08-21 Thread Divya Gehlot
Hi,
At times if during the join if the columns names are same the drill suffix
zero for second column name .
like in your case it could be name ,name0,

> Can you please do a select * on join tables and check the column headers ?



Thanks,
Divya

On 21 August 2017 at 22:04, Hielke Hoeve  wrote:

> Hi everyone. I just started using Drill/Zeppelin and I’m facing a strange
> problem. I have a PostgreSQL database linked to a Drill instance. Whenever
> I am trying to join 2 tables which both have a column name and whenever I
> want to select this name Drill selects the wrong name column. What am I
> doing wrong?
>
> Given the following 2 tables:
>
> Department
> | id | name |
> ||--|
> | 1  |  A   |
> | 2  |  B   |
>
> Employee
> | id | name | dept | salary |
> ||--|--||
> | 1  |  U   |  1   |  100   |
> | 2  |  V   |  1   |   75   |
> | 3  |  W   |  1   |  120   |
> | 4  |  X   |  2   |   95   |
> | 5  |  Y   |  2   |  140   |
> | 6  |  Z   |  2   |   55   |
>
> Running
>
> select employee.name, employee.salary
> from employee
> inner join department on employee.dept = department.id
> where department.name = 'A'
>
> returns
>
> | name | salary |
> |--||
> |  A   |  100   |
> |  A   |  75|
> |  A   |  120   |
>
> Running
>
> select dept.name, employee.salary
> from employee
> inner join department on employee.dept = department.id
> where department.name = 'A'
>
> returns
>
> | name | salary |
> |--||
> | null |  100   |
> | null |  75|
> | null |  120   |
>
>
> What does work, but seems very silly to me, is:
>
> select dept.name, employee.salary
> from employee
> inner join (select id, name as deptname from department) as department on
> employee.dept = department.id
> where department.deptname = 'A'
>
>
>


Re: Drill selects column with the same name of a different table

2017-08-21 Thread Timothy Farkas
Hi Kunal/Hielke/Zelaine,

I can confirm reproducing a similar case and have described it in 
DRILL-5713 and 
DRILL-4211 and have provided 
code to reproduce. Looks similar to 
DRILL-4374 .The issue turned 
out to be two pronged:


   1. Drill does not handle ambiguous column names correctly in some cases as 
Zelaine said DRILL-5713

   2. The Jdbc storage plugin does not correctly push your select statement 
down to your Jdbc database in some cases. It breaks it up into pieces and 
executes some parts in your Jdbc store and other parts in Drill incorrectly. 
DRILL-4211


- Fixing DRILL-5713 is tough 
and would require major changes to Drill based on my discussions with Paul and 
Jinfeng.
- Fixing DRILL-4211 is easier 
and would be sufficient for resolving your issue.

Current status of the fix is that I tried and failed to fix 
DRILL-4211. I tabled it and 
it's currently prioritized for me to work on a few weeks from now. I'm happy to 
hand it off if someone is available to work on it sooner though :).

Thanks,
Tim


From: Kunal Khatua 
Sent: Monday, August 21, 2017 3:22:29 PM
To: user@drill.apache.org
Subject: RE: Drill selects column with the same name of a different table

Could you share the profile ( *.sys.drill file or the 
http://:8047/profiles/.json ) ?

This might be a bug with the JDBC Storage plugin.

A quick way to validate this would be to have the similar data as 2 
text/parquet tables and have Drill read from that. If we don't see an issue, 
then it is most likely a storage plugin bug, for which you should file a JIRA.



-Original Message-
From: Hielke Hoeve [mailto:hielke.ho...@topicus.nl]
Sent: Monday, August 21, 2017 7:05 AM
To: user@drill.apache.org
Subject: Drill selects column with the same name of a different table

Hi everyone. I just started using Drill/Zeppelin and I’m facing a strange 
problem. I have a PostgreSQL database linked to a Drill instance. Whenever I am 
trying to join 2 tables which both have a column name and whenever I want to 
select this name Drill selects the wrong name column. What am I doing wrong?

Given the following 2 tables:

Department
| id | name |
||--|
| 1  |  A   |
| 2  |  B   |

Employee
| id | name | dept | salary |
||--|--||
| 1  |  U   |  1   |  100   |
| 2  |  V   |  1   |   75   |
| 3  |  W   |  1   |  120   |
| 4  |  X   |  2   |   95   |
| 5  |  Y   |  2   |  140   |
| 6  |  Z   |  2   |   55   |

Running

select employee.name, employee.salary
from employee
inner join department on employee.dept = department.id where department.name = 
'A'

returns

| name | salary |
|--||
|  A   |  100   |
|  A   |  75|
|  A   |  120   |

Running

select dept.name, employee.salary
from employee
inner join department on employee.dept = department.id where department.name = 
'A'

returns

| name | salary |
|--||
| null |  100   |
| null |  75|
| null |  120   |


What does work, but seems very silly to me, is:

select dept.name, employee.salary
from employee
inner join (select id, name as deptname from department) as department on 
employee.dept = department.id where department.deptname = 'A'




Re: Drill selects column with the same name of a different table

2017-08-21 Thread Zelaine Fong
There was a similar (but not identical) issue reported as 
https://issues.apache.org/jira/browse/DRILL-4374.  In that case, an error was 
returned, but the root cause seemed to stem from using the Postgres JDBC 
storage plugin to access tables with the same column name.  From the comment 
thread on the bug, it looks like the bug wasn’t reproducible.  So, if you have 
a consistent repro with schema and data, it would be good to log a new bug with 
that repro info.

-- Zelaine

On 8/21/17, 3:22 PM, "Kunal Khatua"  wrote:

Could you share the profile ( *.sys.drill file or the 
http://:8047/profiles/.json ) ?

This might be a bug with the JDBC Storage plugin. 

A quick way to validate this would be to have the similar data as 2 
text/parquet tables and have Drill read from that. If we don't see an issue, 
then it is most likely a storage plugin bug, for which you should file a JIRA.



-Original Message-
From: Hielke Hoeve [mailto:hielke.ho...@topicus.nl] 
Sent: Monday, August 21, 2017 7:05 AM
To: user@drill.apache.org
Subject: Drill selects column with the same name of a different table 

Hi everyone. I just started using Drill/Zeppelin and I’m facing a strange 
problem. I have a PostgreSQL database linked to a Drill instance. Whenever I am 
trying to join 2 tables which both have a column name and whenever I want to 
select this name Drill selects the wrong name column. What am I doing wrong?

Given the following 2 tables:

Department
| id | name |
||--|
| 1  |  A   |
| 2  |  B   |

Employee
| id | name | dept | salary |
||--|--||
| 1  |  U   |  1   |  100   |
| 2  |  V   |  1   |   75   |
| 3  |  W   |  1   |  120   |
| 4  |  X   |  2   |   95   |
| 5  |  Y   |  2   |  140   |
| 6  |  Z   |  2   |   55   |

Running

select employee.name, employee.salary
from employee
inner join department on employee.dept = department.id where 
department.name = 'A'

returns

| name | salary |
|--||
|  A   |  100   |
|  A   |  75|
|  A   |  120   |

Running

select dept.name, employee.salary
from employee
inner join department on employee.dept = department.id where 
department.name = 'A'

returns

| name | salary |
|--||
| null |  100   |
| null |  75|
| null |  120   |


What does work, but seems very silly to me, is:

select dept.name, employee.salary
from employee
inner join (select id, name as deptname from department) as department on 
employee.dept = department.id where department.deptname = 'A'






RE: Drill selects column with the same name of a different table

2017-08-21 Thread Kunal Khatua
Could you share the profile ( *.sys.drill file or the 
http://:8047/profiles/.json ) ?

This might be a bug with the JDBC Storage plugin. 

A quick way to validate this would be to have the similar data as 2 
text/parquet tables and have Drill read from that. If we don't see an issue, 
then it is most likely a storage plugin bug, for which you should file a JIRA.



-Original Message-
From: Hielke Hoeve [mailto:hielke.ho...@topicus.nl] 
Sent: Monday, August 21, 2017 7:05 AM
To: user@drill.apache.org
Subject: Drill selects column with the same name of a different table 

Hi everyone. I just started using Drill/Zeppelin and I’m facing a strange 
problem. I have a PostgreSQL database linked to a Drill instance. Whenever I am 
trying to join 2 tables which both have a column name and whenever I want to 
select this name Drill selects the wrong name column. What am I doing wrong?

Given the following 2 tables:

Department
| id | name |
||--|
| 1  |  A   |
| 2  |  B   |

Employee
| id | name | dept | salary |
||--|--||
| 1  |  U   |  1   |  100   |
| 2  |  V   |  1   |   75   |
| 3  |  W   |  1   |  120   |
| 4  |  X   |  2   |   95   |
| 5  |  Y   |  2   |  140   |
| 6  |  Z   |  2   |   55   |

Running

select employee.name, employee.salary
from employee
inner join department on employee.dept = department.id where department.name = 
'A'

returns

| name | salary |
|--||
|  A   |  100   |
|  A   |  75|
|  A   |  120   |

Running

select dept.name, employee.salary
from employee
inner join department on employee.dept = department.id where department.name = 
'A'

returns

| name | salary |
|--||
| null |  100   |
| null |  75|
| null |  120   |


What does work, but seems very silly to me, is:

select dept.name, employee.salary
from employee
inner join (select id, name as deptname from department) as department on 
employee.dept = department.id where department.deptname = 'A'




Drill selects column with the same name of a different table

2017-08-21 Thread Hielke Hoeve
Hi everyone. I just started using Drill/Zeppelin and I’m facing a strange 
problem. I have a PostgreSQL database linked to a Drill instance. Whenever I am 
trying to join 2 tables which both have a column name and whenever I want to 
select this name Drill selects the wrong name column. What am I doing wrong?

Given the following 2 tables:

Department
| id | name |
||--|
| 1  |  A   |
| 2  |  B   |

Employee
| id | name | dept | salary |
||--|--||
| 1  |  U   |  1   |  100   |
| 2  |  V   |  1   |   75   |
| 3  |  W   |  1   |  120   |
| 4  |  X   |  2   |   95   |
| 5  |  Y   |  2   |  140   |
| 6  |  Z   |  2   |   55   |

Running

select employee.name, employee.salary
from employee
inner join department on employee.dept = department.id
where department.name = 'A'

returns

| name | salary |
|--||
|  A   |  100   |
|  A   |  75|
|  A   |  120   |

Running

select dept.name, employee.salary
from employee
inner join department on employee.dept = department.id
where department.name = 'A'

returns

| name | salary |
|--||
| null |  100   |
| null |  75|
| null |  120   |


What does work, but seems very silly to me, is:

select dept.name, employee.salary
from employee
inner join (select id, name as deptname from department) as department on 
employee.dept = department.id
where department.deptname = 'A'




Re: Query Optimization

2017-08-21 Thread Padma Penumarthy
That is definitely not the design strategy. Also, I don’t think what you are 
seeing is same
as DRILL-3846.  The difference between with and without metadata caching is a
factor of 2-4 times in DRILL-3846 where as what you see is huge order of 
magnitude
different.

You should file a JIRA and include details that will help us reproduce the 
problem.
Please add as much information as possible.
A sample dataset, how you are creating the table (i.e. partition info), 
logs, query profiles will be very helpful.

Thanks,
Padma


> On Aug 20, 2017, at 7:03 PM, Divya Gehlot  wrote:
> 
> Hi ,
> Yes As Rahul mentioned I am running into a bug
> https://issues.apache.org/jira/browse/DRILL-3846 ?
> 
> As asked the usedMetadataFile is true once I run the Metadata cache query .
> Any tentative or workaorund for the bug?
> 
> Now my ask is if metadata cache is enabled the does Drill reads all the
> files instead of intended ones ?
> Is it Drill design strategy ?
> 
> Thanks,
> divya
> 
> On 18 August 2017 at 12:13, Padma Penumarthy  wrote:
> 
>> It is supposed to work like you expected. May be you are running into a
>> bug.
>> Why is it reading all files after metadata refresh ? That is difficult to
>> answer without
>> looking at the logs and query profile. If you look at the query profile,
>> you can may
>> be check what usedMetadataFile flag says for scan.
>> Also, I am thinking if you created so many files, your metadata
>> cache file could be big. May be you can manually sanity
>> check if it looks ok (look for .drill.parquet.metadata file in the root
>> directory) and not
>> corrupted ?
>> 
>> Thanks,
>> Padma
>> 
>> 
>> On Aug 17, 2017, at 8:10 PM, Khurram Faraaz > a...@mapr.com>> wrote:
>> 
>> Please share your SQL query and the query plan.
>> 
>> To get the query plan, execute EXPLAIN PLAN FOR ;
>> 
>> 
>> Thanks,
>> 
>> Khurram
>> 
>> 
>> From: Divya Gehlot > Sent: Friday, August 18, 2017 7:15:18 AM
>> To: user@drill.apache.org
>> Subject: Re: Query Optimization
>> 
>> Hi ,
>> Yes its the same query its just the ran the metadata refresh command .
>> My understanding is metadata refresh command saves reading the metadata.
>> How about column values ... Why is it reading all the files after metedata
>> refresh ?
>> Partition helps to retrieve data faster .
>> Like in hive how it happens when you mention the partition column in where
>> condition
>> it just goes and read and improves the query performace .
>> In my query also I where conidtion has  partioning column it should go and
>> read those partitioned files right ?
>> Why is it taking more time ?
>> Does the Drill works in different way compare to hive ?
>> 
>> 
>> Thanks,
>> Divya
>> 
>> On 18 August 2017 at 07:37, Padma Penumarthy  ppenumar...@mapr.com>> wrote:
>> 
>> It might read all those files if some new data gets added after running
>> refresh metadata cache.
>> If everything is same before and after metadata refresh i.e. no
>> new data added and query is exactly the same, then it should not do that.
>> Also, check if you can partition in  a way that will not create so many
>> files in the
>> first place.
>> 
>> Thanks,
>> Padma
>> 
>> 
>> On Aug 16, 2017, at 10:54 PM, Divya Gehlot > mailto:divya.htco...@gmail.com>>
>> wrote:
>> 
>> Hi,
>> Another observation is
>> My query had where conditions based on the partition values
>> 
>> Total number of parquet files in directory  - 102290
>> Before Metadata refresh - Its reading only 4 files
>> After metadata refresh - its reading 102290 files
>> 
>> 
>> This is how the refresh metadata works I mean it scans each and every
>> files
>> and get the results ?
>> 
>> I dont  have access to logs now .
>> 
>> Thanks,
>> Divya
>> 
>> On 17 August 2017 at 13:48, Divya Gehlot  divya.htco...@gmail.com>>
>> wrote:
>> 
>> Hi,
>> Another observation is
>> My query had where conditions based on the partition values
>> Before Metadata refresh - Its reading only 4 files
>> After metadata refresh - its reading 102290 files
>> 
>> Thanks,
>> Divya
>> 
>> On 17 August 2017 at 13:03, Padma Penumarthy  ppenumar...@mapr.com>>
>> wrote:
>> 
>> Does your query have partition filter ?
>> Execution time is increased most likely because partition pruning is
>> not
>> happening.
>> Did you get a chance to look at the logs ?  That might give some clues.
>> 
>> Thanks,
>> Padma
>> 
>> 
>> On Aug 16, 2017, at 9:32 PM, Divya Gehlot  divya.htco...@gmail.com>>
>> wrote:
>> 
>> Hi,
>> Even I am surprised .
>> I am running Drill version 1.10  on MapR enterprise version.
>> *Query *- Selecting all the columns on partitioned parquet table
>> 
>> I observed few things from Query statistics :
>> 
>> Value
>> 
>> Before 

[HANGOUT] Suggestions for topics for 08/22

2017-08-21 Thread Aman Sinha
We will have the hangout at 10 AM PST tomorrow 08/22.  Please send in
suggestions for topics or provide it at the beginning of the hangout.

Hangout link:

https://plus.google.com/hangouts/_/event/ci4rdiju8bv04a64efj5fedd0lc

Thanks.


Re: Drills' Hbase storage plugin

2017-08-21 Thread Ascot Moss
Dear Sorabh,

About putting a hbase-site.xml in classpath, can you please explain a bit
more about this point?

Regards

On Mon, Aug 21, 2017 at 3:38 PM, Ascot Moss  wrote:

> Dear Sorabh,
>
> Your reply helps a lot!!
>
> I have copied hbase-site.xml to $DRILL_HOME/conf,
> restarted drillbit,  ran below:
>
> kinit
> sqlline -n "jdbc:drill:drillbit=n1.testserver:31010;auth=kerberos" -n "
> u1@MY_DOMAIN.COM"
>
> use hbase;
> +--+---+
> | ok|   summary|
> +--+---+
> | true  |   Default schema chnaged to [hbase]  |
> +--+---+
>
>
> show tables;
> (no result, took long time)  <=== any idea how to solve it?
>
>
> Regards
>
>
> On Tue, Aug 15, 2017 at 10:14 AM, Sorabh Hamirwasia 
> wrote:
>
>> Hi Ascot,
>>
>> I am not sure if Kerberos between Hbase storage plugin and Hbase server
>> has been tried before or not. Link in [1] talks about Kerberos setup
>> between Drill Client and Drillbit. However looking into the code for Hbase
>> and link at [2], it looks like if you can put a hbase-site.xml in classpath
>> with corresponding setting, then it should work, since internally Drill's
>> hbase storage plugin creates a HBaseConfiguration instance which loads this
>> file in class-path and set corresponding settings for authentication. You
>> need to create TGT for Drill process user (by running kinit, also
>> information provided in link [2]) which will be running as HBase client to
>> the server.
>>
>>
>> HBaseConnectionKey::getHBaseConf() {
>>
>>   return storeConfig.getHBaseConf();
>> }
>>
>>
>> Under HBaseStoragePluginConfig you can find this is how configuration is
>> created:
>>
>>
>> public Configuration getHBaseConf() {
>>   if (hbaseConf == null) {
>> hbaseConf = HBaseConfiguration.create();
>> ...
>>
>> ...
>>
>>   }
>>
>>   ...
>>
>>   ...
>>
>> }
>>
>>
>>
>> [1]: https://drill.apache.org/docs/configuring-kerberos-authentication/<
>> https://drill.apache.org/docs/configuring-kerberos-authentication/><
>> https://drill.apache.org/docs/configuring-kerberos-authentication/>
>> [2]: http://hbase.apache.org/0.94/book/security.html
>>
>>
>> Thanks,
>> Sorabh
>>
>


Re: Drills' Hbase storage plugin

2017-08-21 Thread Ascot Moss
Dear Sorabh,

Your reply helps a lot!!

I have copied hbase-site.xml to $DRILL_HOME/conf,
restarted drillbit,  ran below:

kinit
sqlline -n "jdbc:drill:drillbit=n1.testserver:31010;auth=kerberos" -n "
u1@MY_DOMAIN.COM"

use hbase;
+--+---+
| ok|   summary|
+--+---+
| true  |   Default schema chnaged to [hbase]  |
+--+---+


show tables;
(no result, took long time)  <=== any idea how to solve it?


Regards


On Tue, Aug 15, 2017 at 10:14 AM, Sorabh Hamirwasia 
wrote:

> Hi Ascot,
>
> I am not sure if Kerberos between Hbase storage plugin and Hbase server
> has been tried before or not. Link in [1] talks about Kerberos setup
> between Drill Client and Drillbit. However looking into the code for Hbase
> and link at [2], it looks like if you can put a hbase-site.xml in classpath
> with corresponding setting, then it should work, since internally Drill's
> hbase storage plugin creates a HBaseConfiguration instance which loads this
> file in class-path and set corresponding settings for authentication. You
> need to create TGT for Drill process user (by running kinit, also
> information provided in link [2]) which will be running as HBase client to
> the server.
>
>
> HBaseConnectionKey::getHBaseConf() {
>
>   return storeConfig.getHBaseConf();
> }
>
>
> Under HBaseStoragePluginConfig you can find this is how configuration is
> created:
>
>
> public Configuration getHBaseConf() {
>   if (hbaseConf == null) {
> hbaseConf = HBaseConfiguration.create();
> ...
>
> ...
>
>   }
>
>   ...
>
>   ...
>
> }
>
>
>
> [1]: https://drill.apache.org/docs/configuring-kerberos-authentication/<
> https://drill.apache.org/docs/configuring-kerberos-authentication/> s://drill.apache.org/docs/configuring-kerberos-authentication/>
> [2]: http://hbase.apache.org/0.94/book/security.html
>
>
> Thanks,
> Sorabh
>