[GitHub] drill pull request #850: DRILL-5541: C++ Client Crashes During Simple "Man i...

2017-06-05 Thread superbstreak
GitHub user superbstreak opened a pull request:

https://github.com/apache/drill/pull/850

DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/superbstreak/drill DRILL-5541

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/850.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #850


commit 716db51df61d0ee47804217a6a133d1d1152b64a
Author: Rob Wu 
Date:   2017-06-05T21:06:33Z

DRILL-5541: C++ Client Crashes During Simple "Man in the Middle" Attack Test with Exploitable Write AV






[jira] [Resolved] (DRILL-5567) Review changes for DRILL 5514

2017-06-05 Thread Karthikeyan Manivannan (JIRA)

 [ https://issues.apache.org/jira/browse/DRILL-5567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Karthikeyan Manivannan resolved DRILL-5567.
---
Resolution: Done

> Review changes for DRILL 5514
> -
>
> Key: DRILL-5567
> URL: https://issues.apache.org/jira/browse/DRILL-5567
> Project: Apache Drill
>  Issue Type: Sub-task
>Reporter: Karthikeyan Manivannan
>Assignee: Karthikeyan Manivannan
> Fix For: 1.11.0
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>






[GitHub] drill pull request #849: DRILL-5568: Include hadoop-common jars inside drill...

2017-06-05 Thread sohami
GitHub user sohami opened a pull request:

https://github.com/apache/drill/pull/849

DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar

More details on this PR are in [JIRA](https://issues.apache.org/jira/browse/DRILL-5568).

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sohami/drill DRILL-5568

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/849.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #849


commit e84ce5bb6317e7a8caa50c7ffc85dfc416616596
Author: Sorabh Hamirwasia 
Date:   2017-06-05T20:45:27Z

DRILL-5568: Include hadoop-common jars inside drill-jdbc-all.jar






[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-05 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r120198724
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
 
+  /**
+   * Merge two schemas to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
+    }
+    List<MaterializedField> mergedFields = new ArrayList<>();
--- End diff --

List<MaterializedField> mergedFields = new ArrayList<>(this.fields.size() + otherSchema.fields.size()) would avoid having to potentially grow the ArrayList twice.
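
For illustration, a minimal runnable sketch of the suggested pre-sizing (the field type is simplified to String here; BatchSchema actually stores MaterializedField):

import java.util.ArrayList;
import java.util.List;

class MergeSketch {
    final List<String> fields = new ArrayList<>();

    // Sizing the list up front to the exact merged size avoids the
    // incremental array copies ArrayList performs while growing.
    List<String> mergeFields(MergeSketch other) {
        List<String> merged = new ArrayList<>(fields.size() + other.fields.size());
        merged.addAll(fields);
        merged.addAll(other.fields);
        return merged;
    }
}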




[GitHub] drill pull request #837: DRILL-5514: Enhance VectorContainer to merge two ro...

2017-06-05 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/837#discussion_r118797793
  
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/record/BatchSchema.java ---
@@ -157,4 +158,26 @@ private boolean majorTypeEqual(MajorType t1, MajorType t2) {
     return true;
   }
 
+  /**
+   * Merge two schemas to produce a new, merged schema. The caller is responsible
+   * for ensuring that column names are unique. The order of the fields in the
+   * new schema is the same as that of this schema, with the other schema's fields
+   * appended in the order defined in the other schema. The resulting selection
+   * vector mode is the same as this schema. (That is, this schema is assumed to
+   * be the main part of the batch, possibly with a selection vector, with the
+   * other schema representing additional, new columns.)
+   * @param otherSchema the schema to merge with this one
+   * @return the new, merged, schema
+   */
+
+  public BatchSchema merge(BatchSchema otherSchema) {
+    if (otherSchema.selectionVectorMode != SelectionVectorMode.NONE &&
+        selectionVectorMode != otherSchema.selectionVectorMode) {
+      throw new IllegalArgumentException("Left schema must carry the selection vector mode");
--- End diff --

"Left schema must carry the same selection vector mode"  + "as the right 
schema"?




Re: Thinking about Drill 2.0

2017-06-05 Thread Parth Chandra
Adding to my list of things to consider for Drill 2.0: getting Drill off our forks of Calcite and Parquet should also be a goal, though a tactical one.



On Mon, Jun 5, 2017 at 1:51 PM, Parth Chandra  wrote:

> Nice suggestion Paul, to start a discussion on 2.0 (it's about time). I
> would like to make this a broader discussion than just APIs, though APIs
> are a good place to start. In particular, we usually get the opportunity to
> break backward compatibility only for a major release and that is the time
> we have to finalize the APIs.
>
> In the broader discussion I feel we also need to consider some other
> aspects -
>   1) Formalize Drill's support for schema-free operations.
>   2) Drill's execution engine architecture and its 'optimistic' use of
> resources.
>
> Re the APIs:
>   One more public API is the UDFs. This and the storage plugin APIs
> together are tied at the hip with vectors and memory management. I'm not
> sure if we can cleanly separate the underlying representation of vectors
> from the interfaces to these APIs, but I agree we need to clarify this
> part. For instance, some of the performance benefits in the Parquet scan
> come from vectorizing writes to the vector especially for null or repeated
> values. We could provide interfaces to provide the same without which the
> scans would have to be vector-internals aware. The same goes for UDFs.
> Assuming that a 2.0 goal would be to provide vectorized interfaces for
> users to write table (or aggregate) UDFs, one now needs a standardized data
> set representation. If you choose this data set representation to be
> columnar (for better vectorization), will you end up with ValueVector/Arrow
> based RecordBatches? I included Arrow in this since the project is
> formalizing exactly this requirement.
>
> For the client APIs, I believe that ODBC and JDBC drivers initially were
> written using record based APIs provided by vendors, but to get better
> performance started to move to working with raw streams coming over the
> wire (eg TDS with Sybase/MS-SQLServer [1] ). So what Drill does is in fact
> similar to that approach. The client APIs are really thin layers on top of
> the vector data stream and provide row based, read only access to the
> vector.
>
> Lest I begin to sound too contrary,  thank you for starting this
> discussion. It is really needed!
>
> Parth
>
>
>
>
>
>
>
> On Mon, Jun 5, 2017 at 11:59 AM, Paul Rogers  wrote:
>
>> Hi All,
>>
>> A while back there was a discussion about the scope of Drill 2.0. Got me
>> thinking about possible topics. My two cents:
>>
>> Drill 2.0 should focus on making Drill’s external APIs production ready.
>> This means five things:
>>
>> * Clearly identify and define each API.
>> * (Re)design each API to ensure it fully isolates the client from Drill
>> internals.
>> * Ensure the API allows full version compatibility: Allow mixing of
>> old/new clients and servers with some limits.
>> * Fully test each API.
>> * Fully document each API.
>>
>> Once client code is isolated from Drill internals, we are free to evolve
>> the internals in either Drill 2.0 or a later release.
>>
>> In my mind, the top APIs to revisit are:
>>
>> * The Drill client API.
>> * The storage plugin API.
>>
>> (Explanation below.)
>>
>> What other APIs should we consider? Here are some examples, please
>> suggest items you know about:
>>
>> * Command line scripts and arguments
>> * REST API
>> * Names and contents of system tables
>> * Structure of the storage plugin configuration JSON
>> * Structure of the query profile
>> * Structure of the EXPLAIN PLAN output.
>> * Semantics of Drill functions, such as the date functions recently
>> partially fixed by adding “ANSI” alternatives.
>> * Naming of config and system/session options.
>> * (Your suggestions here…)
>>
>> I’ve taken the liberty of moving some API-breaking tickets in the Apache
>> Drill JIRA to 2.0. Perhaps we can add others so that we have a good
>> inventory of 2.0 candidates.
>>
>> Here are the reasons for my two suggestions.
>>
>> Today, we expose Drill value vectors to the client. This means if we want
>> to enhance anything about Drill’s internal memory format (i.e. value
>> vectors, such as a possible move to Arrow), we break compatibility with old
>> clients. Using value vectors also means we need a very large percentage of
>> Drill’s internal code on the client in Java or C++. We are learning that
>> doing so is a challenge.
>>
>> A new client API should follow established SQL database tradition: a
>> synchronous, row-based API designed for versioning, for forward and
>> backward compatibility, and to support ODBC and JDBC users.
>>
>> We can certainly maintain the existing full, async, heavy-weight client
>> for our tests and for applications that would benefit from it.
>>
>> Once we define a new API, we are free to alter Drill’s value vectors to,
>> say, add the needed null states to fully support 

Re: Thinking about Drill 2.0

2017-06-05 Thread Parth Chandra
Nice suggestion Paul, to start a discussion on 2.0 (it's about time). I
would like to make this a broader discussion than just APIs, though APIs
are a good place to start. In particular, we usually get the opportunity to
break backward compatibility only for a major release and that is the time
we have to finalize the APIs.

In the broader discussion I feel we also need to consider some other
aspects -
  1) Formalize Drill's support for schema-free operations.
  2) Drill's execution engine architecture and its 'optimistic' use of
resources.

Re the APIs:
  One more public API is the UDFs. This and the storage plugin APIs
together are tied at the hip with vectors and memory management. I'm not
sure if we can cleanly separate the underlying representation of vectors
from the interfaces to these APIs, but I agree we need to clarify this
part. For instance, some of the performance benefits in the Parquet scan
come from vectorizing writes to the vector especially for null or repeated
values. We could provide interfaces to provide the same without which the
scans would have to be vector-internals aware. The same goes for UDFs.
Assuming that a 2.0 goal would be to provide vectorized interfaces for
users to write table (or aggregate) UDFs, one now needs a standardized data
set representation. If you choose this data set representation to be
columnar (for better vectorization), will you end up with ValueVector/Arrow
based RecordBatches? I included Arrow in this since the project is
formalizing exactly this requirement.
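
As a schematic illustration of the kind of vectorized write interface meant
above (the names below are hypothetical, not Drill's actual ValueVector API):

// Hypothetical column writer: the bulk calls let a scan hand the vector a
// whole run of nulls or repeated values at once, so the implementation can
// use bit-set/fill style operations internally instead of a per-value loop,
// without the scan ever touching vector internals.
interface ColumnWriter {
    void writeInt(int value);                  // one value at a time
    void writeNulls(int count);                // bulk: mark count entries null
    void writeRepeated(int value, int count);  // bulk: run-length write
}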

For the client APIs, I believe that ODBC and JDBC drivers initially were
written using record based APIs provided by vendors, but to get better
performance started to move to working with raw streams coming over the
wire (eg TDS with Sybase/MS-SQLServer [1] ). So what Drill does is in fact
similar to that approach. The client APIs are really thin layers on top of
the vector data stream and provide row based, read only access to the
vector.

Lest I begin to sound too contrary,  thank you for starting this
discussion. It is really needed!

Parth







On Mon, Jun 5, 2017 at 11:59 AM, Paul Rogers  wrote:

> Hi All,
>
> A while back there was a discussion about the scope of Drill 2.0. Got me
> thinking about possible topics. My two cents:
>
> Drill 2.0 should focus on making Drill’s external APIs production ready.
> This means five things:
>
> * Clearly identify and define each API.
> * (Re)design each API to ensure it fully isolates the client from Drill
> internals.
> * Ensure the API allows full version compatibility: Allow mixing of
> old/new clients and servers with some limits.
> * Fully test each API.
> * Fully document each API.
>
> Once client code is isolated from Drill internals, we are free to evolve
> the internals in either Drill 2.0 or a later release.
>
> In my mind, the top APIs to revisit are:
>
> * The Drill client API.
> * The storage plugin API.
>
> (Explanation below.)
>
> What other APIs should we consider? Here are some examples, please suggest
> items you know about:
>
> * Command line scripts and arguments
> * REST API
> * Names and contents of system tables
> * Structure of the storage plugin configuration JSON
> * Structure of the query profile
> * Structure of the EXPLAIN PLAN output.
> * Semantics of Drill functions, such as the date functions recently
> partially fixed by adding “ANSI” alternatives.
> * Naming of config and system/session options.
> * (Your suggestions here…)
>
> I’ve taken the liberty of moving some API-breaking tickets in the Apache
> Drill JIRA to 2.0. Perhaps we can add others so that we have a good
> inventory of 2.0 candidates.
>
> Here are the reasons for my two suggestions.
>
> Today, we expose Drill value vectors to the client. This means if we want
> to enhance anything about Drill’s internal memory format (i.e. value
> vectors, such as a possible move to Arrow), we break compatibility with old
> clients. Using value vectors also means we need a very large percentage of
> Drill’s internal code on the client in Java or C++. We are learning that
> doing so is a challenge.
>
> A new client API should follow established SQL database tradition: a
> synchronous, row-based API designed for versioning, for forward and
> backward compatibility, and to support ODBC and JDBC users.
>
> We can certainly maintain the existing full, async, heavy-weight client
> for our tests and for applications that would benefit from it.
>
> Once we define a new API, we are free to alter Drill’s value vectors to,
> say, add the needed null states to fully support JSON, to change offset
> vectors to not need n+1 values (which doubles vector size in 64K batches),
> and so on. Since vectors become private to Drill (or Arrow) after the new
> client API, we are free to innovate to improve them.
>
> Similarly, the storage plugin API exposes details of Calcite (which seems
> to evolve with each new version), exposes value vector implementations, and
> so on. A 

[jira] [Created] (DRILL-5568) Include Hadoop dependency jars inside drill-jdbc-all.jar

2017-06-05 Thread Sorabh Hamirwasia (JIRA)
Sorabh Hamirwasia created DRILL-5568:


 Summary: Include Hadoop dependency jars inside drill-jdbc-all.jar
 Key: DRILL-5568
 URL: https://issues.apache.org/jira/browse/DRILL-5568
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - JDBC
Reporter: Sorabh Hamirwasia
Assignee: Sorabh Hamirwasia


With SASL support in 1.10, username/password authentication was moved to the Plain mechanism of the SASL framework. A couple of Hadoop classes defined in the hadoop-common package, such as Configuration.java and UserGroupInformation.java, are used in DrillClient for security mechanisms like Plain and Kerberos. Because of this we need to add the hadoop-common dependency inside _drill-jdbc-all.jar_; without it, an application using this driver will fail to connect to Drill when authentication is enabled.

Today this jar (the JDBC driver for Drill) already bundles many other dependencies that DrillClient relies on, such as Netty. These dependencies are added under the *oadd* namespace so that an application using the driver won't conflict with its own version of the same dependencies. As part of this JIRA, the hadoop-common dependencies will be included under the same namespace. This will allow an application to connect to Drill using this driver with security enabled.
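
For illustration, a small sketch of what the relocation means for an application (the relocated class name below is an assumption based on the *oadd* prefix described above, not verified against the published jar):

{noformat}
// The application's own hadoop-common and the driver's relocated copy can
// coexist because the relocated classes live under the "oadd." prefix.
// "oadd.org.apache.hadoop.conf.Configuration" is an assumed relocated name.
public class ShadingCheck {
  public static void main(String[] args) throws Exception {
    Class<?> appCopy    = Class.forName("org.apache.hadoop.conf.Configuration");
    Class<?> driverCopy = Class.forName("oadd.org.apache.hadoop.conf.Configuration");
    System.out.println(appCopy != driverCopy);  // true: two distinct classes
  }
}
{noformat}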





[jira] [Created] (DRILL-5567) Review changes for DRILL 5514

2017-06-05 Thread Karthikeyan Manivannan (JIRA)
Karthikeyan Manivannan created DRILL-5567:
-

 Summary: Review changes for DRILL 5514
 Key: DRILL-5567
 URL: https://issues.apache.org/jira/browse/DRILL-5567
 Project: Apache Drill
  Issue Type: Sub-task
Reporter: Karthikeyan Manivannan
Assignee: Karthikeyan Manivannan








protobuf version

2017-06-05 Thread Ralph Little

Hi List,
I see that Apache Drill is limited to the 2.x series of protobuf.
I cannot find any reference as to why this is.

Could someone explain the dependency restriction?
Did something major change in the 3.x release series for protobuf?

The only reason I ask is that protobuf 3.3 builds much more cleanly in VS 
2015 and has proper CMake support.


Cheers,
Ralph


[jira] [Created] (DRILL-5566) AssertionError: Internal error: invariant violated: call to wrong operator

2017-06-05 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5566:
-

 Summary: AssertionError: Internal error: invariant violated: call to wrong operator
 Key: DRILL-5566
 URL: https://issues.apache.org/jira/browse/DRILL-5566
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
Reporter: Khurram Faraaz


CHARACTER_LENGTH is a non-reserved keyword as per the SQL specification. It is 
a monadic function, i.e., it accepts exactly one operand.

{noformat}
<numeric value function> ::=
    <position expression>
  | <extract expression>
  | <length expression>
  | <cardinality expression>
  | <absolute value expression>
  ...
  ...

<length expression> ::=
    <char length expression>
  | <octet length expression>
<char length expression> ::=
  { CHAR_LENGTH | CHARACTER_LENGTH } <left paren> <character value expression>
  [ USING <char length units> ] <right paren>
...
...
<char length units> ::=
    CHARACTERS
  | OCTETS
{noformat}

Drill reports an assertion error in drillbit.log when the character_length 
function is used in a SQL query.
{noformat}
0: jdbc:drill:schema=dfs.tmp> select character_length(cast('hello' as varchar(10))) col1 from (values(1));
Error: SYSTEM ERROR: AssertionError: Internal error: invariant violated: call to wrong operator


[Error Id: 49198839-5a1b-4786-9257-59739b27d2a8 on centos-01.qa.lab:31010]

  (org.apache.drill.exec.work.foreman.ForemanException) Unexpected exception 
during fragment initialization: Internal error: invariant violated: call to 
wrong operator
org.apache.drill.exec.work.foreman.Foreman.run():297
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745
Caused By (java.lang.AssertionError) Internal error: invariant violated: call 
to wrong operator
org.apache.calcite.util.Util.newInternal():777
org.apache.calcite.util.Util.permAssert():885
org.apache.calcite.sql2rel.ReflectiveConvertletTable$3.convertCall():219
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall():59
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():4148
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():3581
org.apache.calcite.sql.SqlCall.accept():130

org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression():4040
org.apache.calcite.sql2rel.StandardConvertletTable$8.convertCall():185
org.apache.calcite.sql2rel.SqlNodeToRexConverterImpl.convertCall():59
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():4148
org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.visit():3581
org.apache.calcite.sql.SqlCall.accept():130

org.apache.calcite.sql2rel.SqlToRelConverter$Blackboard.convertExpression():4040
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectList():3411
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelectImpl():612
org.apache.calcite.sql2rel.SqlToRelConverter.convertSelect():568
org.apache.calcite.sql2rel.SqlToRelConverter.convertQueryRecursive():2773
org.apache.calcite.sql2rel.SqlToRelConverter.convertQuery():522
org.apache.drill.exec.planner.sql.SqlConverter.toRel():269

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel():623

org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert():195
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan():164
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan():131
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan():79
org.apache.drill.exec.work.foreman.Foreman.runSQL():1050
org.apache.drill.exec.work.foreman.Foreman.run():280
java.util.concurrent.ThreadPoolExecutor.runWorker():1145
java.util.concurrent.ThreadPoolExecutor$Worker.run():615
java.lang.Thread.run():745 (state=,code=0)
{noformat}

Calcite supports the character_length function:
{noformat}
[root@centos-0170 csv]# ./sqlline
sqlline version 1.1.9
sqlline> !connect jdbc:calcite:model=target/test-classes/model.json admin admin
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further 
details.
0: jdbc:calcite:model=target/test-classes/mod> select character_length(cast('hello' as varchar(10))) col1 from (values(1));
+-------+
| COL1  |
+-------+
| 5     |
+-------+
1 row selected (1.379 seconds)
{noformat}

Postgres 9.3 also supports the character_length function:
{noformat}
postgres=# select character_length(cast('hello' as varchar(10))) col1 from (values(1)) foo;
 col1 
------
    5
(1 row)





[jira] [Created] (DRILL-5565) Directory Query fails with Permission denied: access=EXECUTE if dirN name is 'year=2017' or 'month=201704'

2017-06-05 Thread ehur (JIRA)
ehur created DRILL-5565:
---

 Summary: Directory Query fails with Permission denied: access=EXECUTE if dirN name is 'year=2017' or 'month=201704'
 Key: DRILL-5565
 URL: https://issues.apache.org/jira/browse/DRILL-5565
 Project: Apache Drill
  Issue Type: Bug
  Components: Functions - Drill, SQL Parser
Affects Versions: 1.6.0
Reporter: ehur


A query like the following works fine when the directory name in dir0 contains only numerics:
select * from all.my.records
where dir0 >= '20170322'
limit 10;

If the dirN is named according to the convention year=2017, we get one of the following problems:

1. Either a "Permission denied" system error in:
select * from all.my.records
where dir0 >= 'year=2017'
limit 10;

 SYSTEM ERROR: RemoteException: Permission denied: user=myuser, access=EXECUTE,
inode: 
/user/myuser/all/my/records/year=2017/month=201701/day=20170101/application_1485464650247_1917/part-r-0.gz.parquet":myuser:supergroup:-rw-r--r--

at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkTraverse(DefaultAuthorizationProvider.java:180)
at 
org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkPermission(DefaultAuthorizationProvider.java:137)
at 
org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:138)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkPermission(FSNamesystem.java:6609)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4223)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:894)
at 
org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.getFileInfo(AuthorizationProviderProxyClientProtocol.java:526)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:822)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1060)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2086)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2082)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1671)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2080)

2. Or, if the where clause specifies only the numeric part of the directory 
name, it does not blow up, but neither does it return the relevant data, since 
that where clause does not match the actual path to our data:
select * from all.my.records
where dir0 >= '2017'
limit 10;





Thinking about Drill 2.0

2017-06-05 Thread Paul Rogers
Hi All,

A while back there was a discussion about the scope of Drill 2.0. Got me 
thinking about possible topics. My two cents:

Drill 2.0 should focus on making Drill’s external APIs production ready. This 
means five things:

* Clearly identify and define each API.
* (Re)design each API to ensure it fully isolates the client from Drill 
internals.
* Ensure the API allows full version compatibility: Allow mixing of old/new 
clients and servers with some limits.
* Fully test each API.
* Fully document each API.

Once client code is isolated from Drill internals, we are free to evolve the 
internals in either Drill 2.0 or a later release.

In my mind, the top APIs to revisit are:

* The Drill client API.
* The storage plugin API.

(Explanation below.)

What other APIs should we consider? Here are some examples, please suggest 
items you know about:

* Command line scripts and arguments
* REST API
* Names and contents of system tables
* Structure of the storage plugin configuration JSON
* Structure of the query profile
* Structure of the EXPLAIN PLAN output.
* Semantics of Drill functions, such as the date functions recently partially 
fixed by adding “ANSI” alternatives.
* Naming of config and system/session options.
* (Your suggestions here…)

I’ve taken the liberty of moving some API-breaking tickets in the Apache Drill 
JIRA to 2.0. Perhaps we can add others so that we have a good inventory of 2.0 
candidates.

Here are the reasons for my two suggestions.

Today, we expose Drill value vectors to the client. This means if we want to 
enhance anything about Drill’s internal memory format (i.e. value vectors, such 
as a possible move to Arrow), we break compatibility with old clients. Using 
value vectors also means we need a very large percentage of Drill’s internal 
code on the client in Java or C++. We are learning that doing so is a challenge.

A new client API should follow established SQL database tradition: a 
synchronous, row-based API designed for versioning, for forward and backward 
compatibility, and to support ODBC and JDBC users.
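
A sketch of the shape such a row-based API might take; the names are 
hypothetical, not a proposal for the actual interface:

// Rows, not vectors, cross the API boundary, so the wire and vector
// representations can change without breaking old clients.
interface RowClient extends AutoCloseable {
    RowSet execute(String sql);
}
interface RowSet extends AutoCloseable {
    boolean next();               // advance to the next row
    Object get(int columnIndex);  // read one column of the current row
}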

We can certainly maintain the existing full, async, heavy-weight client for our 
tests and for applications that would benefit from it.

Once we define a new API, we are free to alter Drill’s value vectors to, say, 
add the needed null states to fully support JSON, to change offset vectors to 
not need n+1 values (which doubles vector size in 64K batches), and so on. 
Since vectors become private to Drill (or Arrow) after the new client API, we 
are free to innovate to improve them.
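
A back-of-the-envelope illustration of the n+1 point, assuming 4-byte offsets 
and an allocator that rounds buffer sizes up to a power of two:

public class OffsetVectorSize {
    public static void main(String[] args) {
        int records = 65536;               // a full 64K batch
        long needed = (records + 1) * 4L;  // n+1 offsets: 262,148 bytes
        long allocated = Long.highestOneBit(needed - 1) << 1;  // next power of two
        // 65,536 offsets fit exactly in 256 KiB; the one extra entry pushes
        // the allocation to 512 KiB, doubling the vector's size.
        System.out.println(needed + " bytes needed -> " + allocated + " bytes allocated");
    }
}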

Similarly, the storage plugin API exposes details of Calcite (which seems to 
evolve with each new version), exposes value vector implementations, and so on. 
A cleaner, simpler, more isolated API will allow storage plugins to be built 
faster, but will also isolate them from Drill internals changes. Without 
isolation, each change to Drill internals would require plugin authors to 
update their plugin before Drill can be released.

Thoughts? Suggestions?

Thanks,

- Paul

[jira] [Created] (DRILL-5564) IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space (16674816) + prealloc space (0) + child space (0) != allocated (16740352)

2017-06-05 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5564:
-

 Summary: IllegalStateException: allocator[op:21:1:5:HashJoinPOP]: buffer space (16674816) + prealloc space (0) + child space (0) != allocated (16740352)
 Key: DRILL-5564
 URL: https://issues.apache.org/jira/browse/DRILL-5564
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
 Environment: 3 node CentOS cluster
Reporter: Khurram Faraaz


Run a concurrent Java program that executes TPCDS query11. While that program 
is still executing, stop the foreman Drillbit from another shell with the 
command below:
./bin/drillbit.sh stop
You will then see the IllegalStateException: allocator[op:21:1:5:HashJoinPOP] 
in drillbit.log, along with another assertion error:
AssertionError: Failure while stopping processing for operator id 10. Currently 
have states of processing:false, setup:false, waiting:true.

Drill 1.11.0 git commit ID: d11aba2 (with assertions enabled)
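
For reference, a minimal sketch of the kind of concurrent driver program 
described above (the connection URL is a placeholder and the TPCDS query text 
is elided; the original repro program is not attached here):

{noformat}
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentQueryRepro {
  public static void main(String[] args) {
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (int i = 0; i < 4; i++) {
      pool.submit(() -> {
        // Placeholder URL; substitute the cluster's ZooKeeper quorum.
        try (Connection conn = DriverManager.getConnection("jdbc:drill:zk=<zk-quorum>");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("<TPCDS query11 text>")) {
          while (rs.next()) { /* drain results */ }
        } catch (Exception e) {
          e.printStackTrace();  // expected once the foreman is stopped mid-query
        }
      });
    }
    pool.shutdown();
  }
}
{noformat}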
 
Details from drillbit.log on the foreman Drillbit node:
{noformat}
2017-06-05 18:38:33,838 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested RUNNING --> 
FAILED
2017-06-05 18:38:33,849 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] INFO  
o.a.d.e.w.fragment.FragmentExecutor - 
26ca5afa-7f6d-991b-1fdf-6196faddc229:23:1: State change requested FAILED --> 
FINISHED
2017-06-05 18:38:33,852 [26ca5afa-7f6d-991b-1fdf-6196faddc229:frag:23:1] ERROR 
o.a.d.e.w.fragment.FragmentExecutor - SYSTEM ERROR: AssertionError: Failure 
while stopping processing for operator id 10. Currently have states of 
processing:false, setup:false, waiting:true.

Fragment 23:1

[Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: AssertionError: 
Failure while stopping processing for operator id 10. Currently have states of 
processing:false, setup:false, waiting:true.

Fragment 23:1

[Error Id: a116b326-43ed-4569-a20e-a10ba03d215e on centos-01.qa.lab:31010]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:295)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:160)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:264)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_91]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_91]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_91]
Caused by: java.lang.RuntimeException: java.lang.AssertionError: Failure while 
stopping processing for operator id 10. Currently have states of 
processing:false, setup:false, waiting:true.
at 
org.apache.drill.common.DeferredException.addThrowable(DeferredException.java:101)
 ~[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.fail(FragmentExecutor.java:409)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:250)
 [drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
... 4 common frames omitted
Caused by: java.lang.AssertionError: Failure while stopping processing for 
operator id 10. Currently have states of processing:false, setup:false, 
waiting:true.
at 
org.apache.drill.exec.ops.OperatorStats.stopProcessing(OperatorStats.java:167) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.ScanBatch.next(ScanBatch.java:255) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.physical.impl.validate.IteratorValidatorBatchIterator.next(IteratorValidatorBatchIterator.java:215)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:119)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:109)
 ~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 

[jira] [Created] (DRILL-5563) Stop non foreman Drillbit results in IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.

2017-06-05 Thread Khurram Faraaz (JIRA)
Khurram Faraaz created DRILL-5563:
-

 Summary: Stop non foreman Drillbit results in IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.
 Key: DRILL-5563
 URL: https://issues.apache.org/jira/browse/DRILL-5563
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Flow
Affects Versions: 1.11.0
 Environment: 3 node CentOS cluster
Reporter: Khurram Faraaz


Stopping the non-foreman Drillbit normally (as shown below) results in 
IllegalStateException: Allocator[ROOT] closed with outstanding child allocators.

/opt/mapr/drill/drill-1.11.0/bin/drillbit.sh stop

Drill 1.11.0 commit ID: d11aba2

Details from drillbit.log
{noformat}
Mon Jun  5 09:29:09 UTC 2017 Terminating drillbit pid 28182
2017-06-05 09:29:09,651 [Drillbit-ShutdownHook#0] INFO  
o.apache.drill.exec.server.Drillbit - Received shutdown request.
2017-06-05 09:29:11,691 [pool-6-thread-1] INFO  
o.a.drill.exec.rpc.user.UserServer - closed eventLoopGroup 
io.netty.channel.nio.NioEventLoopGroup@55511dc2 in 1004 ms
2017-06-05 09:29:11,691 [pool-6-thread-2] INFO  
o.a.drill.exec.rpc.data.DataServer - closed eventLoopGroup 
io.netty.channel.nio.NioEventLoopGroup@4078d750 in 1004 ms
2017-06-05 09:29:11,692 [pool-6-thread-1] INFO  
o.a.drill.exec.service.ServiceEngine - closed userServer in 1005 ms
2017-06-05 09:29:11,692 [pool-6-thread-2] INFO  
o.a.drill.exec.service.ServiceEngine - closed dataPool in 1005 ms
2017-06-05 09:29:11,701 [Drillbit-ShutdownHook#0] INFO  
o.a.drill.exec.compile.CodeCompiler - Stats: code gen count: 21, cache miss 
count: 7, hit rate: 67%
2017-06-05 09:29:11,709 [Drillbit-ShutdownHook#0] ERROR 
o.a.d.exec.server.BootStrapContext - Error while closing
java.lang.IllegalStateException: Allocator[ROOT] closed with outstanding child 
allocators.
Allocator(ROOT) 0/800/201359872/17179869184 (res/actual/peak/limit)
  child allocators: 4
Allocator(frag:3:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
Allocator(frag:4:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
Allocator(frag:1:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
Allocator(frag:2:2) 200/0/0/200 (res/actual/peak/limit)
  child allocators: 0
  ledgers: 0
  reservations: 0
  ledgers: 0
  reservations: 0

at 
org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:492) 
~[drill-memory-base-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.server.BootStrapContext.close(BootStrapContext.java:247) 
~[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:76) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:64) 
[drill-common-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at org.apache.drill.exec.server.Drillbit.close(Drillbit.java:159) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
at 
org.apache.drill.exec.server.Drillbit$ShutdownThread.run(Drillbit.java:253) 
[drill-java-exec-1.11.0-SNAPSHOT.jar:1.11.0-SNAPSHOT]
2017-06-05 09:29:11,709 [Drillbit-ShutdownHook#0] INFO  
o.apache.drill.exec.server.Drillbit - Shutdown completed (2057 ms).
{noformat}


