[GitHub] drill issue #935: DRILL-5766: Fix XSS vulnerabilities in Drill

2017-09-08 Thread parthchandra
Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/935
  
+1. Looks good.


---


Re: Checkstyle Unused Imports

2017-09-08 Thread Paul Rogers
Hi Vlad,

Java has a wide variety of warnings available; each project decides which to 
ignore, which are warnings and which are errors. It may be that Eclipse, by 
default, has resource warnings turned on. The quick & dirty solution is simply 
to turn off warnings for AutoCloseables and missing @Overrides. This is, as 
they say, “crude but effective."

It seems that the Drill community stand on imports is not to change them. 
Eclipse has an “organize imports” feature. I have to be careful when removing 
unused imports not to invoke this feature as it changes import order and often 
cause reviews to complain about unnecessary code changes.

Would be good if we could 1) agree on a standard and 2) make sure that both 
Eclipse and IntelliJ can automatically organize imports to follow the standard. 
But, I personally don’t worry about imports because Eclipse takes care of it 
for me.

For me, the bigger concern is about code style. Operators are implemented as 
huge, complex, deeply nested methods with many local variables (such as flags) 
set one place and used elsewhere — all with no comments. Would seem like a good 
idea to adopt best practices and require human-digestible method sizes with 
good Javadoc comments. To my mind, that will contribute more to the project 
than import order.

Oh, and the other item that needs addressing is a requirement to create true 
unit tests (not just system tests coded with JUnit.) Good unit test will 
increase our code quality immensely, and will simplify the task for code 
reviews. So, I’d want to push that ahead before worrying about imports.

Just my two cents…

Thanks,

- Paul

> On Sep 8, 2017, at 6:58 PM, Vlad Rozov  wrote:
> 
> Paul, is AutoCloseable warning specific to Eclipse? I don't remember seeing 
> the same warning in IntelliJ or during compilation.
> 
> I know that some communities are significantly more strict regarding code 
> style and enforce not only unused imports, but also order of imports and 
> placement of static imports. What is the Drill community stand on those items?
> 
> Thank you,
> 
> Vlad
> 
> On 9/8/17 18:04, Paul Rogers wrote:
>> I clean up the imports as I find them, but it would be nice to do them all 
>> at once to avoid the constant drip-drip-drop of warnings.
>> 
>> The key problem is the generated code: the templates can’t really tell which 
>> imports are used where. So, we’d need to exclude generated code directories 
>> from the check style rules.
>> 
>> Drill also has thousands of omitted “@Override” annotations and heavy abuse 
>> of AutoCloseable (which triggers warnings when used outside of 
>> try-with-resources).
>> 
>> At present, Eclipse complains about 17,883 warnings in Drill code.
>> 
>> - Paul
>> 
>>> On Sep 8, 2017, at 4:43 PM, Timothy Farkas  wrote:
>>> 
>>> Hi All,
>>> 
>>> I've noticed that a lot of files have unused imports, and I frequently 
>>> accidentally leave unused imports behind when I do refactoring. So I'd like 
>>> to enable checkstyle to check for unused imports.
>>> 
>>> Thanks,
>>> Tim
> 



[GitHub] drill pull request #938: DRILL-5694: Handle HashAgg OOM by spill and retry, ...

2017-09-08 Thread Ben-Zvi
GitHub user Ben-Zvi opened a pull request:

https://github.com/apache/drill/pull/938

DRILL-5694: Handle HashAgg OOM by spill and retry, plus perf improvement

  The main change in this PR is adding a "_second way_" to handle memory 
pressure for the Hash Aggregate: Basically catch OOM failures when processing a 
new input row (during put() into the Hash Table), cleanup internally to allow a 
retry (of the put()) and return a new exception "**RetryAfterSpillException**". 
In such a case the caller spills some partition to free more memory, and 
retries inserting that new row.
   In addition, to reduce the risk of OOM when either creating the "Values 
Batch" (to match the "Keys Batch" in the Hash Table), or when allocating the 
Outgoing vectors (for the Values) -- there are new "_reserves_" -- one reserve 
for each of the two. A "_reserve_" is a memory amount subtracted from the 
memory-limit, which is added back to the limit just before it is needed, so 
hopefully preventing an OOM. After the allocation the code tries to restore 
that reserve (by subtracting from the limit, if possible). We always restore 
the "Outgoing Reserve" first; in case the "Values Batch" reserve runs empty 
just before calling put(), we skip the put() (just like an OOM there) and spill 
to free some memory (and restore that reserve).
   The old "_first way_" is still used. That is the code that predicts the 
memory needs, and triggers a spill if not enough memory is available. The spill 
code was separated into a new method called spillIfNeeded() which is used in 
two modes - either the old way (prediction), or (when called from the new OOM 
catch code) with a flag to force a spill, regardless of available memory. That 
flag is also used to reduce the priority of the "current partition" when 
choosing a partition to spill.

  A new testing option was added (**hashagg_use_memory_prediction**, 
default true) - by setting this to false the old "first way" is disabled. This 
allows stress testing of the OOM handling code (which may not be used under 
normal memory allocation).

  The HashTable put() code was re-written to cleanup partial changes in 
case of an OOM. And so the code around the call of put() to catch the new 
exception, spill and retry. Note that this works for 1st phase aggregation as 
well (return rows early).

For the estimates (in addition to the old "max batch size" estimate) - 
there is an estimate for the Values Batch, and one for for the Outgoing. These 
are used for restoring the "reserves". These estimates may be resized up in 
case actual allocations are bigger.

Other changes:
* Improved the "max batch size estimation" -- using the outgoing batch for 
getting the correct schema (instead of the input batch).
  The only information needed from the input batch is the "max average 
column size" (see change inRecordBatchSizer.java) to have a better estimate for 
VarChars.
  Also computed the size of those "no null" bigint columns added into the 
Values Batch when the aggregation is SUM, MIN or MAX (see changes in 
HashAggBatch.java and HashAggregator.java)
* Using a "plain Java" subclass for the HashTable  because "byte 
manipulation" breaks on the new template code (see ChainedHashTable.java)
* The three Configuration options where changed into System/Session 
options:   min_batches_per_partition , hashagg_max_memory , 
hashagg_num_partitions
* There was a potential memory leak in the HashTable BatchHolder ctor 
(vectors were added to the container only after the successful allocation, and 
the container was cleared in case of OOM. So in case of a partial allocation, 
the allocated part was no accessible). Also (Paul's suggestion) modified some 
vector templates to cleanup after any runtime error (including an OOM).
* Performance improvements: Eliminated the call to updateBatches() before 
each hash computation (instead used only when switching to a new 
SpilledRecordBatch); this was a big overhead.
   Also changed all the "setSafe" calls into "set" for the HashTable (those 
nanoseconds add up, specially when rehashing) - these bigint vectors need no 
resizing.
* Ignore "(spill) file not found" error while cleaning up.
* The unit tests were re-written in a more compact form. And a test with 
the new option (forcing the OOM code) was added (no memory prediction).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ben-Zvi/drill DRILL-5694

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/938.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #938


commit 1a96bb39faf01b7665bd669d88494789693ed9b8
Author: Ben-Zvi 
Date:   2017-09-08T22:52:57Z

DRILL-5694: Handle OOM in HashAggr by spill 

Re: Checkstyle Unused Imports

2017-09-08 Thread Vlad Rozov
Paul, is AutoCloseable warning specific to Eclipse? I don't remember 
seeing the same warning in IntelliJ or during compilation.


I know that some communities are significantly more strict regarding 
code style and enforce not only unused imports, but also order of 
imports and placement of static imports. What is the Drill community 
stand on those items?


Thank you,

Vlad

On 9/8/17 18:04, Paul Rogers wrote:

I clean up the imports as I find them, but it would be nice to do them all at 
once to avoid the constant drip-drip-drop of warnings.

The key problem is the generated code: the templates can’t really tell which 
imports are used where. So, we’d need to exclude generated code directories 
from the check style rules.

Drill also has thousands of omitted “@Override” annotations and heavy abuse of 
AutoCloseable (which triggers warnings when used outside of try-with-resources).

At present, Eclipse complains about 17,883 warnings in Drill code.

- Paul


On Sep 8, 2017, at 4:43 PM, Timothy Farkas  wrote:

Hi All,

I've noticed that a lot of files have unused imports, and I frequently 
accidentally leave unused imports behind when I do refactoring. So I'd like to 
enable checkstyle to check for unused imports.

Thanks,
Tim




[GitHub] drill issue #923: DRILL-5723: Added System Internal Options That can be Modi...

2017-09-08 Thread ilooner
Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/923
  
@paul-rogers Finished applying comments, and cleanup. It's ready for review 
again now.


---


Re: Checkstyle Unused Imports

2017-09-08 Thread Paul Rogers
I clean up the imports as I find them, but it would be nice to do them all at 
once to avoid the constant drip-drip-drop of warnings.

The key problem is the generated code: the templates can’t really tell which 
imports are used where. So, we’d need to exclude generated code directories 
from the check style rules.

Drill also has thousands of omitted “@Override” annotations and heavy abuse of 
AutoCloseable (which triggers warnings when used outside of try-with-resources).

At present, Eclipse complains about 17,883 warnings in Drill code.

- Paul

> On Sep 8, 2017, at 4:43 PM, Timothy Farkas  wrote:
> 
> Hi All,
> 
> I've noticed that a lot of files have unused imports, and I frequently 
> accidentally leave unused imports behind when I do refactoring. So I'd like 
> to enable checkstyle to check for unused imports.
> 
> Thanks,
> Tim



[jira] [Created] (DRILL-5778) Drill seems to run out of memory but completes execution

2017-09-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5778:
-

 Summary: Drill seems to run out of memory but completes execution
 Key: DRILL-5778
 URL: https://issues.apache.org/jira/browse/DRILL-5778
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
select count(*) from (select * from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
d.str) d1 where d1.id=0;
{noformat}

Plan is:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-03  UnionExchange
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02  Project($f0=[0])
01-03SelectionVectorRemover
01-04  Filter(condition=[=($0, 0)])
01-05SingleMergeExchange(sort0=[1 ASC])
02-01  SelectionVectorRemover
02-02Sort(sort0=[$1], dir0=[ASC])
02-03  Project(id=[$0], str=[$1])
02-04HashToRandomExchange(dist0=[[$1]])
03-01  UnorderedMuxExchange
04-01Project(id=[$0], str=[$1], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
04-02  Flatten(flattenField=[$1])
04-03Project(id=[$0], str=[$1])
04-04  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
 numFiles=1, columns=[`id`, `str_list`], 
files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
{noformat}

>From drillbit.log:
{noformat}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
  str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, data 
size: 548360)
  id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
size: 36864)
  Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
262163, Net row width: 143, Density: 1}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR 
o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 1073819648, available memory: 2147483648
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO  
o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug option: 
true
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):

...

2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.exec.compile.ClassTransformer - Compiled and merged 
SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 
bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, 
gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, 
gross = 14057257 bytes, records = 65535
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer memory 
= 2143289744, merge memory = 2128740638
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 4303 us to sort 4096 records
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 266 
bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048572 bytes, 
gross = 1572858 bytes, 

Checkstyle Unused Imports

2017-09-08 Thread Timothy Farkas
Hi All,

I've noticed that a lot of files have unused imports, and I frequently 
accidentally leave unused imports behind when I do refactoring. So I'd like to 
enable checkstyle to check for unused imports.

Thanks,
Tim


OWASP dependency check maven plugin (was: Does Drill Use Apache Struts)

2017-09-08 Thread Vlad Rozov
What the community thinks about implementing OWASP dependency check 
maven plugin that will be disabled by default and enabled in Travis-ci 
and/or Jenkins builds? It has an ability to fail build depending on the 
level of CVE present in the project dependencies.


The topic is probably more suitable for the dev@drill list, so moved it 
there.


Thank you,

Vlad

On 9/8/17 09:27, Bob Rudis wrote:

(This is primarily for John, but may be of use to a broader set of folks)

OWASP's straightforward-yet-uncreatively-named "DependencyCheck" tool
 may be worth looking
into. I haven't had to run it in a while (thankfully I work in R most
of the time now ;-) but it should help diagnose project dependencies
that have vulnerabilities. It takes a wee-bit to get it up and running
(not much, tho) but once you do it shld be able to churn out anything
that's remotely bad dep-wise.

There are likely some OWASPians who wld be willing to help get run on
Drill source, too.

On Fri, Sep 8, 2017 at 11:49 AM, John Omernik  wrote:

That's a great idea Bob.

The difficult thing is a review may find what's vulnerable and known about
at the time of a the assessment, but when new vulnerabilities are released
especially in libraries that may or may not be known to be a part of core
projects, it can be harder to see the impact of those vulnerabilities.  I
will keep checking the poms of things I use (thanks Bob for the pointer
there, I am not a Java person, but it's seems reasonable to use that as the
starting point).  Also, it's good to raise awareness on all of these points
in general so I always appreciate lively discussions :)



On Fri, Sep 8, 2017 at 10:42 AM, Bob Rudis  wrote:


I personally haven't had the cycles to do a thorough appsec review of
the main web interface, the REST interface, access controls or
encryption tools, but I also only run Drill on private AWS instances
or on personal servers / systems, so it hasn't been a huge priority
for me.

I would encourage the Drill team to apply for a CII grant
. CII has funded security audits
of OpenSSL and other OSS software and I believe Drill would be a great
candidate, especially since it's designed to provide access to diverse
data stores (i.e. breach Drill and you get to everything behind it).

MapR or Dremio could likely help speed up said grant application since
they are commercial entities with ties to the OSS side of Drill.

On Fri, Sep 8, 2017 at 11:28 AM, Saurabh Mahapatra
 wrote:

Thanks John, all. I think this discussion thread is important. As a

community member, I learn so much by reading these threads.

Since you work in cyber security research, are there specific things we

should think about from a security standpoint for Drill?

I know that we have a REST API and I am sure there are web apps being

built around it. Are there vulnerabilities that we need to be aware of? How
can we advise users about this?

Thoughts?

Best,
Saurabh

Sent from my iPhone




On Sep 8, 2017, at 7:41 AM, John Omernik  wrote:

Also, thank you for the pointer to the pom.xml


On Fri, Sep 8, 2017 at 9:41 AM, John Omernik  wrote:

So, I thought I was clear that it was unverified, but I also I am in

cyber

security research, and this is what is being discussed in closed

circles. I

agree, it may not be just struts, it's not spreading rumors to say,

this

struts vulnerability is serious, and it's something that should be
considered in a massive breech like this. Also, as with most security
incidents, it is likely only a part of the story. It could be SQLi and

it

could be Struts and it could be both or neither. To imply it was

unrelated

SQLi is just as presumptuous as saying it was struts. Some folks are
talking about attackers using Struts to get to a zone where SQLi was
possible.  I will be clear(er): I have not verified that Equifax is

wholly

struts, or even related to Struts, but my fear right now is focused on

open

source projects that may use Struts and I think this is legitimate.

Putting

it into context, I want to learn more how to ensure vulnerabilities in

one

project/library are handled from a cascading point of view.

John


On Fri, Sep 8, 2017 at 9:15 AM, Bob Rudis  wrote:

Equifax was likely unrelated SQL injection. Don't spread rumors.

Struts had yet-another-remote exploit (three of 'em, actually).

I do this for a living (cybersecurity research).

Drill is not impacted which can be verified by looking at dependencies
in https://github.com/apache/drill/blob/master/pom.xml


On Fri, Sep 8, 2017 at 10:12 AM, John Omernik 

wrote:

Rumors are pointing to it being related to the Equifax breech (no
confirmation from me on that, just seeing it referenced as a

possibility)

http://thehackernews.com/2017/09/apache-struts-vulnerability.html




On Fri, Sep 8, 

[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

2017-09-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r137852118
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
Rowabcdefgh
+ * 
n-2XX--
+ * n-1  
--
+ * n  X!O O 
O 
+ * 
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
--- End diff --

What does an 'O' value mean in the diagram above?


---


[GitHub] drill pull request #914: DRILL-5657: Size-aware vector writer structure

2017-09-08 Thread bitblender
Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r137851895
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
Rowabcdefgh
+ * 
n-2XX--
+ * n-1  
--
+ * n  X!O O 
O 
+ * 
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
+ * 
+ * Column a is written before overflow occurs, b causes overflow, and all 
other
+ * columns either are not written, or written after overflow.
+ * 
+ * The scenarios, identified by column names above, are:
+ * 
+ * a
+ * a contains values for all three rows.
+ * 
+ * Two values were written in the "main" batch, while a third was 
written to
+ * what becomes the overflow row.
+ * When overflow occurs, the last write position is at n. It must be 
moved
+ * back to n-1.
+ * Since data was written to the overflow row, it is copied to the 
look-
+ * ahead batch.
+ * The last write position in the lookahead batch is 0 (since data was
+ * copied into the 0th row.
+ * When harvesting, no empty-filling is needed.
+ * When starting the next batch, the last write position must be set 
to 0 to
+ * reflect the presence of the value for row n.
+ * 
+ * 
+ * b
+ * b contains values for all three rows. The value for row n triggers
+ * overflow.
+ * 
+ * The last write position is at n-1, which is kept for the "main"
+ * vector.
+ * A new overflow vector is created and starts empty, with the last 
write
+ * position at -1.
+ * Once created, b is immediately written to the overflow vector, 
advancing
+ * the last write position to 0.
+ * Harvesting, and starting the next for column b works the same as 
column
+ * a.
+ * 
+ * 
+ * c
+ * Column c has values for all rows.
+ * 
+ * The value for row n is written after overflow.
+ * At overflow, the last write position is at n-1.
+ * At overflow, a new lookahead vector is created with the last write
+ * position at -1.
+ * The value of c is written to the lookahead vector, advancing the 
last
+ * write position to -1.
+ * Harvesting, and starting the next for column c works the same as 
column
+ * a.
+ * 
+ * 
+ * d
+ * Column d writes values to the last two rows before overflow, but 
not to
+ * the overflow row.
+ * 
+ * The last write position for the main batch is at n-1.
+ * The last write position in the lookahead batch remains at -1.
+ * Harvesting for column d requires filling an empty value for row 
n-1.
+ * When starting the next batch, the last write position must be set 
to -1,
+ * indicating no data yet written.
+ * 
+ * 
+ * f
+ * Column f has no data in the last position of the main batch, and no 
data
+ * in the overflow row.
+ * 
+ * The last write position is at n-2.
+ * An empty value must be written into position n-1 during 
harvest.
+ * On start of the next batch, the last write position starts at 
-1.
+ * 
+ * 
+ * g
+ * Column g is added after overflow, and has a value written to the 
overflow
+ * row.
+ * 
+ * On harvest, column g is simply skipped.
+ * On start of the next row, the last write position can be left 
unchanged
+ * since no "exchange" was done.
+ * 
+ * 
+ * 

[jira] [Created] (DRILL-5777) Oracle JDBC Error while access synonym

2017-09-08 Thread Sudhir Kumar (JIRA)
Sudhir Kumar created DRILL-5777:
---

 Summary: Oracle JDBC Error while access synonym
 Key: DRILL-5777
 URL: https://issues.apache.org/jira/browse/DRILL-5777
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - Java
Affects Versions: 1.10.0
Reporter: Sudhir Kumar


Error while accessing individual column in oracle table accessing via synonym.
Query : select  from 
..

Error:
1
2017-09-08 10:13:46,451 [264d3035-1605-9f5b-084f-09a1b525ef75:foreman] INFO  
o.a.d.exec.planner.sql.SqlConverter - User Error Occurred: From line 1, column 
8 to line 1, column 17: Column  not found in any table (From line 
1, column 8 to line 1, column 17: Column  not found in any table)
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 
1, column 8 to line 1, column 17: Column  not found in any table

SQL Query null

[Error Id: 2b7c7c2d-664e-4c90-ba20-67509de90f09 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:178) 
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:622)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
 [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) 
[drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.10.0.jar:1.10.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_92]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, 
column 8 to line 1, column 17: Column 'TABLE_NAME' not found in any table
at sun.reflect.GeneratedConstructorAccessor67.newInstance(Unknown 
Source) ~[na:na]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[na:1.8.0_92]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
~[na:1.8.0_92]
at 
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:765) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:753) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3974)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.EmptyScope.findQualifyingTableName(EmptyScope.java:108)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.DelegatingScope.findQualifyingTableName(DelegatingScope.java:112)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.ListScope.findQualifyingTableName(ListScope.java:150)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:154)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:4460)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:4440)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:4148)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:420)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 

Querying PostgreSQL database.

2017-09-08 Thread alex den
Hi all,

I am fairly new to Apache Drill. I need to query my PostgreSQL database
using Apache Drill.
I went through the documentation  and could
configure a storage plugin for my database.

{

  "type": "jdbc",

  "driver": "org.postgresql.Driver",

  "url": "jdbc:postgresql://localhost/mydb",

  "username": ,

  "password": ,

  "enabled": true

}
I can successfully query tables with primary datatypes but unable to query
the ones that have composite datatype.
I have tried querying it using various ways, example this

.

This is the exception i get:

Caused by: java.lang.NullPointerException: null
at
org.apache.calcite.sql2rel.RelStructuredTypeFlattener.restructureFields(RelStructuredTypeFlattener.java:201)
~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at
org.apache.calcite.sql2rel.RelStructuredTypeFlattener.restructure(RelStructuredTypeFlattener.java:225)
~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at
org.apache.calcite.sql2rel.RelStructuredTypeFlattener.restructureFields(RelStructuredTypeFlattener.java:205)
~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at
org.apache.calcite.sql2rel.RelStructuredTypeFlattener.rewrite(RelStructuredTypeFlattener.java:184)
~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at
org.apache.calcite.sql2rel.SqlToRelConverter.flattenTypes(SqlToRelConverter.java:435)
~[calcite-core-1.4.0-drill-r21.jar:1.4.0-drill-r21]
at
org.apache.drill.exec.planner.sql.SqlConverter.toRel(SqlConverter.java:270)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.convertToRel(DefaultSqlHandler.java:638)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:196)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:165)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
~[drill-java-exec-1.11.0.jar:1.11.0]
at
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
~[drill-java-exec-1.11.0.jar:1.11.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050)
[drill-java-exec-1.11.0.jar:1.11.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:280)
[drill-java-exec-1.11.0.jar:1.11.0]

Please let me know if am missing something.

Thanks,
Alex