Re: Thinking about Drill 2.0

2017-06-12 Thread Paul Rogers
Thanks for the suggestions!

The issue is only partly Calcite changes. The real challenge for potential 
contributors is that the Drill storage plugin API exposes Calcite mechanisms 
directly. That is, to write a storage plugin, one must know (or, more likely, 
experiment to learn) the odd sequence of calls made to the storage plugin: for 
a group scan, then a sub scan, then this or that. Then, having learned those 
calls, one must map what the plugin needs to do onto them. In some cases, as 
Calcite chugs along, it calls the same methods multiple times, so the plugin 
writer has to be prepared to implement caching to avoid banging on the 
underlying system multiple times for the same data.

The key opportunity here is to observe that the current API is at the 
implementation level: it is a set of callbacks from Calcite. (The Drill “easy” 
storage plugin does hide some of the details, though.) Instead, we’d like an 
API at the definition level: the plugin simply declares that, say, it can 
return a schema, or can handle certain kinds of filter push-down, etc.

If we can define that API at the metadata (planning) level, then we can create 
an adapter between that API and Calcite. Doing so makes it much easier to test 
the plugin, and isolates the plugin from future code changes as Calcite evolves 
and improves: the adapter changes but not the plugin metadata API.
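
A minimal sketch of that split, written in Python for brevity. Every name here
(PluginDefinition, PlannerAdapter, schema_provider, pushdown_ops) is
hypothetical and is not part of Drill or Calcite; the point is only that the
plugin declares what it can do, while the adapter absorbs the planner's
repeated calls by caching.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class PluginDefinition:
    """Hypothetical definition-level API: the plugin only declares capabilities."""
    name: str
    # Fetches {column: type} for a table from the underlying system.
    schema_provider: Callable[[str], Dict[str, str]]
    # Filter operators the plugin is willing to have pushed down.
    pushdown_ops: FrozenSet[str] = frozenset()


@dataclass
class PlannerAdapter:
    """Hypothetical adapter between the definition and planner callbacks."""
    definition: PluginDefinition
    _schema_cache: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def get_schema(self, table: str) -> Dict[str, str]:
        # The planner may ask repeatedly; hit the source only once per table.
        if table not in self._schema_cache:
            self._schema_cache[table] = self.definition.schema_provider(table)
        return self._schema_cache[table]

    def can_push_filter(self, op: str) -> bool:
        return op in self.definition.pushdown_ops


calls = []


def fetch_schema(table: str) -> Dict[str, str]:
    calls.append(table)  # record each trip to the (pretend) external system
    return {"id": "BIGINT", "name": "VARCHAR"}


adapter = PlannerAdapter(
    PluginDefinition("demo", fetch_schema, frozenset({"=", "<"}))
)
for _ in range(3):  # simulate the planner calling the same method repeatedly
    schema = adapter.get_schema("orders")
```

Because the definition has no planner types in it, it can be unit tested
without starting a planner at all, and only the adapter would need to change
when the planner's callback sequence changes.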

As you suggest, the resulting definition API would be handy to share between 
projects.

On the execution side, however, Drill plugins are very specific to Drill’s 
operator framework, Drill’s schema-on-read mechanism, Drill’s special columns 
(file metadata, partitions), Drill’s vector “mutators” and so on. Here, any 
synergy would be with Arrow to define a common “mutator” API so that a “row 
batch reader” written for one system should work with the other.

In any case, this kind of sharing is hard to define up front. We might instead 
keep the discussion going to see what works for Drill, what we can abstract 
out, and how we can make the common abstraction work for other systems beyond 
Drill.

Thanks,

- Paul

> On Jun 9, 2017, at 3:38 PM, Julian Hyde  wrote:
> 
> 
>> On Jun 5, 2017, at 11:59 AM, Paul Rogers  wrote:
>> 
>> Similarly, the storage plugin API exposes details of Calcite (which seems to 
>> evolve with each new version), exposes value vector implementations, and so 
>> on. A cleaner, simpler, more isolated API will allow storage plugins to be 
>> built faster, but will also isolate them from Drill internals changes. 
>> Without isolation, each change to Drill internals would require plugin 
>> authors to update their plugin before Drill can be released.
> 
> Sorry you’re getting burned by Calcite changes. We try to minimize impact, 
> but sometimes it’s difficult to see what you’re breaking.
> 
> I like the goal of a stable storage plugin API. Maybe it’s something Drill 
> and Calcite can collaborate on? Much of the DNA of an adapter is independent 
> of the engine that will consume the data (Drill or otherwise): it concerns 
> how to create a connection, get metadata, push down logical operations, and 
> generate queries in the target system’s query language. Calcite and Drill 
> ought to be able to share that part, rather than maintaining separate 
> collections of adapters.
> 
> Julian
> 



[jira] [Created] (DRILL-5581) Query with CASE statement returns wrong results

2017-06-12 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5581:
-

 Summary: Query with CASE statement returns wrong results
 Key: DRILL-5581
 URL: https://issues.apache.org/jira/browse/DRILL-5581
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
Reporter: Khurram Faraaz


A query that uses a CASE statement returns wrong results.

{noformat}
Apache Drill 1.11.0-SNAPSHOT, commit id: 874bf629

[test@centos-101 ~]# cat order_sample.csv
202634342,2101,20160301

apache drill 1.11.0-SNAPSHOT
"this isn't your grandfather's sql"
0: jdbc:drill:schema=dfs.tmp> ALTER SESSION SET `store.format`='csv';
+-------+------------------------+
|  ok   |        summary         |
+-------+------------------------+
| true  | store.format updated.  |
+-------+------------------------+
1 row selected (0.245 seconds)
0: jdbc:drill:schema=dfs.tmp> CREATE VIEW  `vw_order_sample_csv` as
. . . . . . . . . . . . . . > SELECT
. . . . . . . . . . . . . . > `columns`[0] AS `ND`,
. . . . . . . . . . . . . . > CAST(`columns`[1] AS BIGINT) AS `col1`,
. . . . . . . . . . . . . . > CAST(`columns`[2] AS BIGINT) AS `col2`
. . . . . . . . . . . . . . > FROM `order_sample.csv`;
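+-------+----------------------------------------------------------------------+
|  ok   |                               summary                                |
+-------+----------------------------------------------------------------------+
| true  | View 'vw_order_sample_csv' created successfully in 'dfs.tmp' schema  |
+-------+----------------------------------------------------------------------+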
1 row selected (0.253 seconds)
0: jdbc:drill:schema=dfs.tmp> select
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 > col2 then col1
. . . . . . . . . . . . . . > else col2
. . . . . . . . . . . . . . > end as temp_col,
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 = 2101 and (20170302 - col2) > 1 then 'D'
. . . . . . . . . . . . . . > when col2 = 2101 then 'P'
. . . . . . . . . . . . . . > when col1 - col2 > 1 then '0'
. . . . . . . . . . . . . . > else 'A'
. . . . . . . . . . . . . . > end as status
. . . . . . . . . . . . . . > from  `vw_order_sample_csv`;
+-----------+---------+
| temp_col  | status  |
+-----------+---------+
| 20160301  | A       |
+-----------+---------+
1 row selected (0.318 seconds)

0: jdbc:drill:schema=dfs.tmp> explain plan for
. . . . . . . . . . . . . . > select
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 > col2 then col1
. . . . . . . . . . . . . . > else col2
. . . . . . . . . . . . . . > end as temp_col,
. . . . . . . . . . . . . . > case
. . . . . . . . . . . . . . > when col1 = 2101 and (20170302 - col2) > 1 then 'D'
. . . . . . . . . . . . . . > when col2 = 2101 then 'P'
. . . . . . . . . . . . . . > when col1 - col2 > 1 then '0'
. . . . . . . . . . . . . . > else 'A'
. . . . . . . . . . . . . . > end as status
. . . . . . . . . . . . . . > from  `vw_order_sample_csv`;
+------+------+
| text | json |
+------+------+
| 00-00    Screen
00-01      Project(temp_col=[CASE(>(CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT), CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT)], status=[CASE(AND(=(CAST(ITEM($0, 1)):BIGINT, 2101), >(-(20170302, CAST(ITEM($0, 2)):BIGINT), 1)), 'D', =(CAST(ITEM($0, 2)):BIGINT, 2101), 'P', >(-(CAST(ITEM($0, 1)):BIGINT, CAST(ITEM($0, 2)):BIGINT), 1), '0', 'A')])
00-02        Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/tmp/order_sample.csv, numFiles=1, columns=[`columns`[1], `columns`[2]], files=[maprfs:///tmp/order_sample.csv]]])

// Details of Java compiler from sys.options
0: jdbc:drill:schema=dfs.tmp> select name, status from sys.options where name like '%java_compiler%';
+----------------------------------------+----------+
|                  name                  |  status  |
+----------------------------------------+----------+
| exec.java.compiler.exp_in_method_size  | DEFAULT  |
| exec.java_compiler                     | DEFAULT  |
| exec.java_compiler_debug               | DEFAULT  |
| exec.java_compiler_janino_maxsize      | DEFAULT  |
+----------------------------------------+----------+
4 rows selected (0.21 seconds)

{noformat}
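
For reference, the two CASE expressions can be evaluated by hand against the
single input row (col1 = 2101, col2 = 20160301). Under standard SQL semantics,
where the first WHEN branch whose condition holds wins, the first branch of the
second CASE matches because 20170302 - 20160301 = 10001 > 1, so the status
should be 'D' rather than the 'A' shown above. The following Python sketch
restates that evaluation; it is independent of Drill and only mirrors the query
text.

```python
def evaluate_row(col1: int, col2: int):
    # First CASE: the larger of the two columns.
    temp_col = col1 if col1 > col2 else col2
    # Second CASE: the first WHEN branch whose condition holds wins.
    if col1 == 2101 and (20170302 - col2) > 1:
        status = "D"
    elif col2 == 2101:
        status = "P"
    elif col1 - col2 > 1:
        status = "0"
    else:
        status = "A"
    return temp_col, status


result = evaluate_row(2101, 20160301)  # the one row in order_sample.csv
print(result)  # prints (20160301, 'D')
```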



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (DRILL-5580) Casting DateTime types to Numeric types should be supported

2017-06-12 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-5580:
--

 Summary: Casting DateTime types to Numeric types should be 
supported
 Key: DRILL-5580
 URL: https://issues.apache.org/jira/browse/DRILL-5580
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Query Planning & Optimization
Affects Versions: 1.11.0
Reporter: Abhishek Girish
Assignee: Chunhui Shi


Currently we can cast numeric types such as Int / BigInt to Date / Time / 
Timestamp. 

Example
{code}
Int 1 to Date: 1970-01-01
Int 1 to Time: 00:00:00.001
Int 1 to Timestamp: 1970-01-01 00:00:00.001
{code}

Casting Date / Time / Timestamp to Int / BigInt should also be supported, as 
they are internally stored as BigInt. Currently we get a NumberFormatException 
for explicit casts on these types. 
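
The examples above are consistent with treating the integer as a count of
milliseconds since 1970-01-01 00:00:00 UTC. Under that assumption (it is an
inference from the examples, not a statement about Drill's implementation), the
requested reverse cast is plain subtraction. A small Python sketch with
hypothetical helper names:

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def int_to_timestamp(millis: int) -> datetime:
    # Forward direction: the cast that already works today.
    return EPOCH + timedelta(milliseconds=millis)


def timestamp_to_int(ts: datetime) -> int:
    # Reverse direction: what the requested cast could return.
    # Integer division of timedeltas avoids float rounding.
    return (ts - EPOCH) // timedelta(milliseconds=1)


ts = int_to_timestamp(1)
round_trip = timestamp_to_int(ts)
```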





[jira] [Created] (DRILL-5579) Nested cast to boolean fails for integer values greater than 1

2017-06-12 Thread Abhishek Girish (JIRA)
Abhishek Girish created DRILL-5579:
--

 Summary: Nested cast to boolean fails for integer values greater 
than 1
 Key: DRILL-5579
 URL: https://issues.apache.org/jira/browse/DRILL-5579
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Data Types, Query Planning & Optimization
Affects Versions: 1.11.0
Reporter: Abhishek Girish
Assignee: Chunhui Shi


As per Drill's casting rules, the integer value 0 returns false when cast to 
Boolean. All other integers should return true when cast to Boolean. 

The following query fails:
{code}
> select cast(cast(345 as int) as boolean) as a from (values(1));
Error: SYSTEM ERROR: IllegalArgumentException: Invalid value for boolean: 345
  (java.lang.IllegalArgumentException) Invalid value for boolean: 345
org.apache.drill.exec.expr.BooleanType.get():74
org.apache.drill.exec.test.generated.ProjectorGen10.doSetup():88
org.apache.drill.exec.test.generated.ProjectorGen10.setup():101
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.setupNewSchema():492
org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():78
org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():135
org.apache.drill.exec.record.AbstractRecordBatch.next():162
org.apache.drill.exec.physical.impl.BaseRootExec.next():104
org.apache.drill.exec.physical.impl.ScreenCreator$ScreenRoot.innerNext():81
org.apache.drill.exec.physical.impl.BaseRootExec.next():94
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():232
org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():226
java.security.AccessController.doPrivileged():-2
javax.security.auth.Subject.doAs():422
org.apache.hadoop.security.UserGroupInformation.doAs():1595
org.apache.drill.exec.work.fragment.FragmentExecutor.run():226
org.apache.drill.common.SelfCleaningRunnable.run():38
java.util.concurrent.ThreadPoolExecutor.runWorker():1142
java.util.concurrent.ThreadPoolExecutor$Worker.run():617
java.lang.Thread.run():745 (state=,code=0)
{code}

The following query works as expected:
{code}
> select cast(t.a as boolean) from (select cast(345 as int) as a from (values(1))) t;
+---------+
| EXPR$0  |
+---------+
| true    |
+---------+
1 row selected (0.244 seconds)
{code}
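
The casting rule stated at the top of this report amounts to a one-line
predicate, so both queries above should return true for 345. A Python
restatement of the rule (the function name is hypothetical, and this is not
Drill code):

```python
def int_to_boolean(value: int) -> bool:
    # 0 maps to false; every other integer maps to true.
    return value != 0


expected = [int_to_boolean(v) for v in (0, 1, 345, -7)]
```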






Re: Native C++ Drill client handshake recovery

2017-06-12 Thread Ralph Little

Hi,

Thanks for your response:


> > The original caller to DrillClient::connect() thinks everything is
> > hunky-dory.
>
> Yes, that would be a problem. From what I remember, the recvHandshake call
> blocks in m_ioservice.run. On return from run, the recvHandshake checks if
> the error object m_pError is not null. m_pError is not null iff there has
> been an error. Do you see this not working correctly?

Ah yes, I see that this code is compiled out by default unless
WIN32_SHUTDOWN_ON_TIMEOUT is defined.
I enabled that and it works as you say.


> > Currently, if you attempt a submitQuery() call when the connection is
> > down, it just hangs because m_io_service is not running and m_deadlineTimer
> > never triggers as a fallback.
> >
> > Opinions?
>
> It is a good idea to check connection status before sending any message to
> the server. LMK if you want to submit a patch :), I can review and merge it
> in.


I have added something and will send a patch shortly.

As an aside, I'm trying to shore up resilience against query failures
from the back end.
If I set a query timeout and then pause the Hadoop back end (in a VM) so that
it is unresponsive, the application still hangs.
This seems to be because the query timeout is reset every time a
heartbeat (PONG) is received by the Native Client DLL.
So again we get no application-side timeout.

I still suspect that there may be a number of boundary scenarios that
could cause the Native Client to lock up, so I'm looking into a way to
add a "cancel" application API so that the application can time out
on its own and cancel the pending query.
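
A rough sketch of the behaviour being aimed for, in Python rather than the
client's C++ and with hypothetical names throughout (QueryDeadline,
on_heartbeat, should_abort are not part of the Native Client): heartbeats
update a liveness timestamp but never extend the per-query deadline, and the
application can cancel at any time. Time is passed in explicitly so the example
is deterministic.

```python
class QueryDeadline:
    """Hypothetical helper: a per-query deadline that heartbeats cannot extend."""

    def __init__(self, timeout_s: float, now: float):
        self.expires_at = now + timeout_s
        self.last_heartbeat = now
        self.cancelled = False

    def on_heartbeat(self, now: float) -> None:
        # A PONG proves the connection is alive, not that the query is
        # progressing, so only the liveness timestamp moves.
        self.last_heartbeat = now

    def cancel(self) -> None:
        # Application-initiated cancel of the pending query.
        self.cancelled = True

    def should_abort(self, now: float) -> bool:
        return self.cancelled or now >= self.expires_at


deadline = QueryDeadline(timeout_s=30.0, now=0.0)
for t in (10.0, 20.0, 29.0):
    deadline.on_heartbeat(now=t)  # heartbeats keep arriving
still_waiting = not deadline.should_abort(now=29.0)
timed_out = deadline.should_abort(now=31.0)
```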

When I'm happy with what we have, I'll submit a patch for your perusal.

Cheers,
Ralph



Re: [ANNOUNCE] New Committer: Charles Givre

2017-06-12 Thread Paul Rogers
Congratulations! Well deserved!

- Paul

> On Jun 12, 2017, at 9:54 AM, Julian Hyde  wrote:
> 
> Congratulations, Charles, and welcome! Thank you, not only for your code 
> contributions, but also for your work promoting Drill by writing and 
> speaking at conferences. A simple search[1] turns up a lot of material.
> 
> Julian
> 
> [1] https://www.google.com/search?q=charles+givre+apache+drill 
> 
> 
>> On Jun 12, 2017, at 9:47 AM, Parth Chandra  wrote:
>> 
>> The Project Management Committee (PMC) for Apache Drill has invited Charles
>> Givre to become a committer, and we are pleased to announce that he has
>> accepted.
>> 
>> Charles was instrumental in taking the HTTPD format plugin to completion
>> and since then has remained active on the user and dev list. He also
>> started a Drill UDFs list that he maintains here [1].
>> 
>> Welcome Charles, and thank you for your contributions.  Keep up the good
>> work !
>> 
>> - Parth
>> (on behalf of the Apache Drill PMC)
>> 
>> 
>> [1] http://thedataist.com/drill-udfs/
> 



[HANGOUT] Topics for 6/12/17

2017-06-12 Thread Padma Penumarthy

Drill hangout will be tomorrow, 10 AM PST.

In the last hangout, we talked about discussing one of the ongoing Drill 
projects in detail.
Please let me know who wants to volunteer to discuss the topic they are working 
on: memory fragmentation, spill to disk for hash agg, external sort, or schema 
change.

Also, please let me know if you have any topics you want to discuss by 
responding to this email. 
We will also ask for topics at the beginning of the hangout.

Thanks,
Padma

Re: [ANNOUNCE] New Committer: Charles Givre

2017-06-12 Thread Julian Hyde
Congratulations, Charles, and welcome! Thank you, not only for your code 
contributions, but also for your work promoting Drill by writing and 
speaking at conferences. A simple search[1] turns up a lot of material.

Julian

[1] https://www.google.com/search?q=charles+givre+apache+drill 


> On Jun 12, 2017, at 9:47 AM, Parth Chandra  wrote:
> 
> The Project Management Committee (PMC) for Apache Drill has invited Charles
> Givre to become a committer, and we are pleased to announce that he has
> accepted.
> 
> Charles was instrumental in taking the HTTPD format plugin to completion
> and since then has remained active on the user and dev list. He also
> started a Drill UDFs list that he maintains here [1].
> 
> Welcome Charles, and thank you for your contributions.  Keep up the good
> work !
> 
> - Parth
> (on behalf of the Apache Drill PMC)
> 
> 
> [1] http://thedataist.com/drill-udfs/



Re: [ANNOUNCE] New Committer: Charles Givre

2017-06-12 Thread Arina Yelchiyeva
Congrats, Charles!

On Mon, Jun 12, 2017 at 7:47 PM, Parth Chandra  wrote:

> The Project Management Committee (PMC) for Apache Drill has invited Charles
> Givre to become a committer, and we are pleased to announce that he has
> accepted.
>
> Charles was instrumental in taking the HTTPD format plugin to completion
> and since then has remained active on the user and dev list. He also
> started a Drill UDFs list that he maintains here [1].
>
> Welcome Charles, and thank you for your contributions.  Keep up the good
> work !
>
> - Parth
> (on behalf of the Apache Drill PMC)
>
>
> [1] http://thedataist.com/drill-udfs/
>


[ANNOUNCE] New Committer: Charles Givre

2017-06-12 Thread Parth Chandra
The Project Management Committee (PMC) for Apache Drill has invited Charles
Givre to become a committer, and we are pleased to announce that he has
accepted.

Charles was instrumental in taking the HTTPD format plugin to completion
and since then has remained active on the user and dev list. He also
started a Drill UDFs list that he maintains here [1].

Welcome Charles, and thank you for your contributions.  Keep up the good
work!

- Parth
(on behalf of the Apache Drill PMC)


[1] http://thedataist.com/drill-udfs/


[jira] [Resolved] (DRILL-5541) C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

2017-06-12 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra resolved DRILL-5541.
--
Resolution: Fixed

> C++ Client Crashes During Simple "Man in the Middle" Attack Test with 
> Exploitable Write AV
> --
>
> Key: DRILL-5541
> URL: https://issues.apache.org/jira/browse/DRILL-5541
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Client - C++
>Affects Versions: 1.10.0
>Reporter: Rob Wu
>Priority: Minor
>  Labels: ready-to-commit
>
> drillClient!boost_sb::shared_ptr::reset+0xa7:
> 07fe`c292f827 f0ff4b08lock dec dword ptr [rbx+8] 
> ds:07fe`c2b3de78=c29e6060
> Exploitability Classification: EXPLOITABLE
> Recommended Bug Title: Exploitable - User Mode Write AV starting at 
> drillClient!boost_sb::shared_ptr::reset+0x00a7
>  (Hash=0x4ae7fdff.0xb15af658)
> User mode write access violations that are not near NULL are exploitable.
> ==
> Stack Trace:
> Child-SP  RetAddr   Call Site
> `030df630 07fe`c295bca1 
> drillClient!boost_sb::shared_ptr::reset+0xa7
>  
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\smart_ptr\shared_ptr.hpp
>  @ 620]
> `030df680 07fe`c295433c 
> drillClient!Drill::DrillClientImpl::processSchemasResult+0x281 
> [c:\users\bamboo\desktop\make_win_drill\drill-1.10.0.1\drill-1.10.0.1\contrib\native\client\src\clientlib\drillclientimpl.cpp
>  @ 1227]
> `030df7a0 07fe`c294cbf6 
> drillClient!Drill::DrillClientImpl::handleRead+0x75c 
> [c:\users\bamboo\desktop\make_win_drill\drill-1.10.0.1\drill-1.10.0.1\contrib\native\client\src\clientlib\drillclientimpl.cpp
>  @ 1555]
> `030df9c0 07fe`c294ce9f 
> drillClient!boost_sb::asio::detail::win_iocp_socket_recv_op  
> >,boost_sb::asio::mutable_buffers_1,boost_sb::asio::detail::transfer_all_t,boost_sb::_bi::bind_t  char * __ptr64,boost_sb::system::error_code const & __ptr64,unsigned 
> __int64>,boost_sb::_bi::list4,boost_sb::_bi::value __ptr64>,boost_sb::arg<1>,boost_sb::arg<2> > > > >::do_complete+0x166 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\win_iocp_socket_recv_op.hpp
>  @ 97]
> `030dfa90 07fe`c296009d 
> drillClient!boost_sb::asio::detail::win_iocp_io_service::do_one+0x27f 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\impl\win_iocp_io_service.ipp
>  @ 406]
> `030dfb70 07fe`c295ffc9 
> drillClient!boost_sb::asio::detail::win_iocp_io_service::run+0xad 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\detail\impl\win_iocp_io_service.ipp
>  @ 164]
> `030dfbd0 07fe`c2aa5b53 
> drillClient!boost_sb::asio::io_service::run+0x29 
> [c:\users\bamboo\desktop\make_win_drill\sb_boost\include\boost-1_57\boost\asio\impl\io_service.ipp
>  @ 60]
> `030dfc10 07fe`c2ad3e03 drillClient!boost_sb::`anonymous 
> namespace'::thread_start_function+0x43
> `030dfc50 07fe`c2ad404e drillClient!_callthreadstartex+0x17 
> [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 376]
> `030dfc80 `779e59cd drillClient!_threadstartex+0x102 
> [f:\dd\vctools\crt\crtw32\startup\threadex.c @ 354]
> `030dfcb0 `77c1a561 kernel32!BaseThreadInitThunk+0xd
> `030dfce0 ` ntdll!RtlUserThreadStart+0x1d
> ==
> Register:
> rax=0284bae0 rbx=07fec2b3de70 rcx=027ec210
> rdx=027ec210 rsi=027f2638 rdi=027f25d0
> rip=07fec292f827 rsp=030df630 rbp=027ec210
>  r8=027ec210  r9= r10=027d32fc
> r11=27eb001b0003 r12= r13=028035a0
> r14=027ec210 r15=
> iopl=0 nv up ei pl nz na pe nc
> cs=0033  ss=002b  ds=002b  es=002b  fs=0053  gs=002b efl=00010200
> drillClient!boost_sb::shared_ptr::reset+0xa7:
> 07fe`c292f827 f0ff4b08lock dec dword ptr [rbx+8] 
> ds:07fe`c2b3de78=c29e6060





[jira] [Resolved] (DRILL-5545) Add findbugs to build

2017-06-12 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra resolved DRILL-5545.
--
Resolution: Fixed

> Add findbugs to build 
> --
>
> Key: DRILL-5545
> URL: https://issues.apache.org/jira/browse/DRILL-5545
> Project: Apache Drill
>  Issue Type: Improvement
>Reporter: Parth Chandra
>  Labels: ready-to-commit
>
> We should allow the manual invocation of findbugs on the code base so that 
> developers can check and make sure they are not introducing hard-to-find 
> bugs. Findbugs can take a long time and a lot of memory, so the invocation 
> should be manual so as not to slow the build down.





[jira] [Resolved] (DRILL-5560) Create configuration file for distribution specific configuration

2017-06-12 Thread Parth Chandra (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Parth Chandra resolved DRILL-5560.
--
Resolution: Fixed

> Create configuration file for distribution specific configuration
> -
>
> Key: DRILL-5560
> URL: https://issues.apache.org/jira/browse/DRILL-5560
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.10.0
>Reporter: Padma Penumarthy
>Assignee: Padma Penumarthy
>  Labels: ready-to-commit
> Fix For: 1.11.0
>
>
> Create a configuration file, "drill-distrib.conf", for distribution-specific 
> settings. 
> This will be used to add distribution-specific configuration. 
> The order in which configuration is loaded and overridden is 
> "drill-default.conf", the per-module configuration files "drill-module.conf", 
> "drill-distrib.conf", and "drill-override.conf".
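
The load order described in this issue can be pictured as a chain of layers in
which each later file overrides the earlier ones. The Python sketch below uses
flat dictionaries and made-up keys (example.a, example.b, example.c) purely for
illustration; it does not model nested settings or how Drill actually parses
these files.

```python
def merge_layers(*layers: dict) -> dict:
    merged = {}
    for layer in layers:  # later layers override earlier ones
        merged.update(layer)
    return merged


# Hypothetical contents, listed in the load order from the issue.
drill_default = {"example.a": 1, "example.b": 1, "example.c": 1}
drill_module = {"example.b": 2}
drill_distrib = {"example.b": 3, "example.c": 3}
drill_override = {"example.c": 4}

effective = merge_layers(drill_default, drill_module, drill_distrib, drill_override)
```

Here a distribution can change a default (example.b ends up as 3), while a
user's "drill-override.conf" still has the last word (example.c ends up as 4).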


