[jira] [Commented] (DRILL-4641) Support for lzo compression

2017-09-08 Thread Evgeniy Kazakov (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159759#comment-16159759
 ] 

Evgeniy Kazakov commented on DRILL-4641:


Hello, I am struggling with the same issue with Drill and LZO on Windows. Where 
should I put lzo-hadoop-1.0.5.jar and lzo-core-1.0.5.jar to make Drill use them?

> Support for lzo compression
> ---
>
> Key: DRILL-4641
> URL: https://issues.apache.org/jira/browse/DRILL-4641
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Other
>Affects Versions: Future
> Environment: Not specific to platform
>Reporter: subbu srinivasan
>
> Would love support for querying LZO-compressed files.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159757#comment-16159757
 ] 

Paul Rogers commented on DRILL-5670:


The OOM occurs during the merge phase:

{code}
Completed spill: memory = 0
Starting merge phase. Runs = 62, Alloc. memory = 0
Read 100 records in 73169 us; size = 8480768, memory = 8481024
...
Read 100 records in 81261 us; size = 8480768, memory = 525807872
{code}

Here the "Read 100 records" lines indicate that the sort is loading the first 
batch of each of the 62 spill files. We see that the first spilled batch was 
4,736,187 bytes when written (previous comment), but requires 8,481,024 bytes 
when read. This is larger than the 7,215,150 bytes that the calcs estimated. 
The average load size is 8,480,772 bytes. The 1,265,622-byte delta per batch 
adds up to a 78,468,564-byte error over the 62 batches.
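The estimate-error arithmetic can be checked with a small sketch. The figures come from the log lines quoted above; the class and variable names are illustrative, not Drill code:

```java
// Cross-check of the merge-phase estimate error described above.
// All constants are taken from the quoted log output.
public class MergeEstimateError {
    public static void main(String[] args) {
        long estimatedBatchBytes = 7_215_150L;  // spill batch gross estimate
        long actualLoadBytes     = 8_480_772L;  // average bytes per loaded batch
        int  spillFileCount      = 62;          // number of runs being merged

        long deltaPerBatch = actualLoadBytes - estimatedBatchBytes;
        long totalError    = deltaPerBatch * spillFileCount;

        System.out.println(deltaPerBatch); // 1265622
        System.out.println(totalError);    // 78468564
    }
}
```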

Then, when the code tries to allocate the output batch of 348 records 
(16,739,148 bytes), memory is exhausted. (The limit is 530,579,456 bytes.)

A quick fix is to use the "max buffer size" in computing the number of batches 
that can be merged. (The max buffer size is computed as twice the data size, 
which assumes a 50% internal fragmentation of the memory buffer.)
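The proposed fix amounts to sizing the merge width conservatively. A simplified sketch of the idea (not the actual external sort code; the method and parameter names are hypothetical):

```java
// Sketch of the proposed quick fix: compute how many spilled runs can be
// merged using the "max buffer size" (2x the net data size, allowing for
// 50% internal fragmentation) instead of the net size. Illustrative only.
public class MergeWidth {
    static int maxMergeWidth(long mergeMemory, long spillBatchDataSize) {
        long maxBufferSize = 2 * spillBatchDataSize; // worst-case per-batch memory
        return (int) (mergeMemory / maxBufferSize);
    }

    public static void main(String[] args) {
        // With ~463 MB of buffer memory and ~4.8 MB net spill batches, the
        // conservative merge width is well below the 62 runs attempted above.
        System.out.println(maxMergeWidth(463_104_560L, 4_810_100L)); // 48
    }
}
```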

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | exec.sort.disable_managed updated.  |
> +---+-+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 482344960;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +---++
> 1 row selected (0.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +---+--+
> |  ok   |   summary|
> +---+--+
> | true  | planner.width.max_per_node updated.  |
> +---+--+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 4:53 AM:


Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see size was 4,736,187 bytes per batch, or 47,362 per row. This 
number has no internal fragmentation and so should match our "net" record size 
estimate. Our net estimate is 48,101, so we're pretty close. The error should 
be explained, but our estimate is conservative, and so is safe for memory 
calcs.
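The per-batch and per-row figures follow directly from the file summary quoted above (a quick arithmetic check; names are illustrative):

```java
// Check of the spill-file arithmetic: 246,281,737 bytes were written
// across 52 batches of 100 records each (numbers from the quoted log).
public class SpillFileSizes {
    public static void main(String[] args) {
        long bytesWritten = 246_281_737L;
        int  batches      = 52;
        int  rowsPerBatch = 100;

        long bytesPerBatch = bytesWritten / batches;                          // 4736187
        long bytesPerRow   = Math.round(bytesPerBatch / (double) rowsPerBatch); // 47362

        System.out.println(bytesPerBatch);
        System.out.println(bytesPerRow);
    }
}
```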


was (Author: paul.rogers):
Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see size was 4,736,187 bytes per batch, or 47,362 per row. This 
number has no internal fragmentation and so should match our "net" record size 
estimate. Our net estimate is 48,101, so we're pretty close. The error should 
be explained, but our estimate is conservative, and so is safe for memory 
calcs.

The question is, how does the 4.6 MB per batch balloon to 16 MB or more on read?

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> 

[jira] [Commented] (DRILL-5766) Stored XSS in APACHE DRILL

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159730#comment-16159730
 ] 

ASF GitHub Bot commented on DRILL-5766:
---

Github user parthchandra commented on the issue:

https://github.com/apache/drill/pull/935
  
+1. Looks good.


> Stored XSS in APACHE DRILL
> --
>
> Key: DRILL-5766
> URL: https://issues.apache.org/jira/browse/DRILL-5766
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0
> Environment: Apache drill installed in debian system
>Reporter: Sanjog Panda
>Assignee: Arina Ielchiieva
>Priority: Critical
>  Labels: cross-site-scripting, security, security-issue, xss
> Fix For: 1.12.0
>
> Attachments: XSS - Sink.png, XSS - Source.png
>
>
> Hello Apache security team,
> I have been testing an application which internally uses the Apache drill 
> software v 1.6 as of now.
> I found XSS on the profile page (sink), wherein the user's malicious input 
> comes from the Query page (source) where you run a query. 
> Affected URL : https://localhost:8047/profiles 
> Once the user submits the payload below and loads the profile page, it gets 
> triggered and is stored.
> I have attached the screenshot of payload 
> alert(document.cookie).
> Screenshot links:
> https://drive.google.com/file/d/0B8giJ3591fvUbm5JZWtjUTg3WmEwYmJQeWd6dURuV0gzOVd3/view?usp=sharing
> https://drive.google.com/file/d/0B8giJ3591fvUV2lJRzZWOWRGNzN5S0JzdVlXSG1iNnVwRlAw/view?usp=sharing
>  
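The standard mitigation for stored XSS of this kind is to HTML-escape user-controlled text (here, the submitted query string) before rendering it in the profiles page. A minimal sketch of such escaping, which is not necessarily the approach taken in the actual fix:

```java
// Minimal HTML-escaping sketch; illustrative only, not Drill's actual fix.
// '&' must be escaped first so the entities added later are not re-escaped.
public class HtmlEscape {
    static String escapeHtml(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;")
                .replace("\"", "&quot;")
                .replace("'", "&#39;");
    }

    public static void main(String[] args) {
        System.out.println(escapeHtml("<script>alert(document.cookie)</script>"));
        // &lt;script&gt;alert(document.cookie)&lt;/script&gt;
    }
}
```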





[jira] [Updated] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-08 Thread Boaz Ben-Zvi (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Boaz Ben-Zvi updated DRILL-5694:

Reviewer: Paul Rogers

> hash agg spill to disk, second phase OOM
> 
>
> Key: DRILL-5694
> URL: https://issues.apache.org/jira/browse/DRILL-5694
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Functions - Drill
>Affects Versions: 1.11.0
>Reporter: Chun Chang
>Assignee: Boaz Ben-Zvi
>
> | 1.11.0-SNAPSHOT  | d622f76ee6336d97c9189fc589befa7b0f4189d6  | DRILL-5165: 
> For limit all case, no need to push down limit to scan  | 21.07.2017 @ 
> 10:36:29 PDT
> Second phase agg ran out of memory. It is not supposed to. Test data currently 
> only accessible locally.
> /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q
> Query:
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from parquet_500m_v1 group by row_count order by row_count 
> limit 30
> Failed with exception
> java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory 
> while executing the query.
> HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 
> 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 
> 536870912 so far allocated: 534773760.
> Fragment 1:6
> [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010]
>   (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 
> OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned 
> batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far 
> allocated: 534773760.
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.record.AbstractRecordBatch.next():119
> org.apache.drill.exec.record.AbstractRecordBatch.next():109
> org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51
> 
> org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93
> org.apache.drill.exec.record.AbstractRecordBatch.next():162
> org.apache.drill.exec.physical.impl.BaseRootExec.next():105
> 
> org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92
> org.apache.drill.exec.physical.impl.BaseRootExec.next():95
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234
> org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227
> java.security.AccessController.doPrivileged():-2
> javax.security.auth.Subject.doAs():415
> org.apache.hadoop.security.UserGroupInformation.doAs():1595
> org.apache.drill.exec.work.fragment.FragmentExecutor.run():227
> org.apache.drill.common.SelfCleaningRunnable.run():38
> java.util.concurrent.ThreadPoolExecutor.runWorker():1145
> java.util.concurrent.ThreadPoolExecutor$Worker.run():615
> java.lang.Thread.run():745
>   Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to 
> allocate buffer of size 4194304 due to memory limit. Current allocation: 
> 534773760
> org.apache.drill.exec.memory.BaseAllocator.buffer():238
> org.apache.drill.exec.memory.BaseAllocator.buffer():213
> org.apache.drill.exec.vector.IntVector.allocateBytes():231
> org.apache.drill.exec.vector.IntVector.allocateNew():211
> 
> org.apache.drill.exec.test.generated.HashTableGen2141.allocMetadataVector():778
> 
> org.apache.drill.exec.test.generated.HashTableGen2141.resizeAndRehashIfNeeded():717
> org.apache.drill.exec.test.generated.HashTableGen2141.insertEntry():643
> org.apache.drill.exec.test.generated.HashTableGen2141.put():618
> 
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1173
> org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539
> org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168
> 

[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159676#comment-16159676
 ] 

ASF GitHub Bot commented on DRILL-5694:
---

GitHub user Ben-Zvi opened a pull request:

https://github.com/apache/drill/pull/938

DRILL-5694: Handle HashAgg OOM by spill and retry, plus perf improvement

  The main change in this PR is adding a "_second way_" to handle memory 
pressure for the Hash Aggregate: basically, catch OOM failures when processing a 
new input row (during put() into the Hash Table), clean up internally to allow a 
retry of the put(), and return a new exception "**RetryAfterSpillException**". 
In such a case the caller spills some partition to free more memory, and 
retries inserting the new row.
   In addition, to reduce the risk of OOM when either creating the "Values 
Batch" (to match the "Keys Batch" in the Hash Table), or when allocating the 
Outgoing vectors (for the Values) -- there are new "_reserves_" -- one reserve 
for each of the two. A "_reserve_" is a memory amount subtracted from the 
memory-limit, which is added back to the limit just before it is needed, so 
hopefully preventing an OOM. After the allocation the code tries to restore 
that reserve (by subtracting from the limit, if possible). We always restore 
the "Outgoing Reserve" first; in case the "Values Batch" reserve runs empty 
just before calling put(), we skip the put() (just like an OOM there) and spill 
to free some memory (and restore that reserve).
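The "reserve" mechanism described above can be sketched roughly as follows. This is an assumption-laden illustration of the idea, not the actual HashAgg code; all names here are hypothetical:

```java
// Rough sketch of the "reserve" idea: hold back an amount of memory by
// subtracting it from the working limit, and add it back only just before
// the allocation that needs it. Hypothetical names; not actual Drill code.
public class ReserveSketch {
    long memoryLimit;
    long valuesReserve;

    ReserveSketch(long limit, long reserve) {
        this.valuesReserve = reserve;
        this.memoryLimit = limit - reserve; // hold back the reserve up front
    }

    // Called just before allocating the Values Batch: release the reserve.
    long releaseValuesReserve() {
        memoryLimit += valuesReserve;
        valuesReserve = 0;
        return memoryLimit;
    }

    // Called after the allocation succeeds: try to restore the reserve.
    boolean tryRestoreValuesReserve(long reserve, long allocated) {
        if (memoryLimit - allocated >= reserve) {
            memoryLimit -= reserve;
            valuesReserve = reserve;
            return true;
        }
        return false; // reserve runs empty: caller should spill before put()
    }
}
```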
   The old "_first way_" is still used. That is the code that predicts the 
memory needs, and triggers a spill if not enough memory is available. The spill 
code was separated into a new method called spillIfNeeded() which is used in 
two modes - either the old way (prediction), or (when called from the new OOM 
catch code) with a flag to force a spill, regardless of available memory. That 
flag is also used to reduce the priority of the "current partition" when 
choosing a partition to spill.

  A new testing option was added (**hashagg_use_memory_prediction**, 
default true) - by setting this to false the old "first way" is disabled. This 
allows stress testing of the OOM handling code (which may not be used under 
normal memory allocation).

  The HashTable put() code was re-written to clean up partial changes in 
case of an OOM. The code around the call to put() was likewise changed to catch 
the new exception, spill, and retry. Note that this works for 1st phase 
aggregation as well (returning rows early).
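The catch-and-retry flow around put() amounts to something like the following. Only the RetryAfterSpillException name comes from the PR; everything else is a stand-in so the sketch is self-contained:

```java
// Structural sketch of the spill-and-retry flow around put(). Only
// RetryAfterSpillException is named in the PR; the rest is illustrative.
public class RetrySketch {
    static class RetryAfterSpillException extends Exception {}

    interface HashTable { void put(int row) throws RetryAfterSpillException; }

    static int spills = 0;

    static void spillSomePartition() { spills++; } // stand-in for a real spill

    static void insertRow(HashTable table, int row) throws Exception {
        while (true) {
            try {
                table.put(row);       // may fail under memory pressure
                return;
            } catch (RetryAfterSpillException e) {
                spillSomePartition(); // free some memory, then retry same row
            }
        }
    }

    public static void main(String[] args) throws Exception {
        // A table that fails once, then succeeds after one spill.
        HashTable t = new HashTable() {
            boolean failed = false;
            public void put(int row) throws RetryAfterSpillException {
                if (!failed) { failed = true; throw new RetryAfterSpillException(); }
            }
        };
        insertRow(t, 0);
        System.out.println(spills); // 1
    }
}
```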

For the estimates (in addition to the old "max batch size" estimate) - 
there is an estimate for the Values Batch, and one for the Outgoing. These 
are used for restoring the "reserves". These estimates may be resized up in 
case actual allocations are bigger.

Other changes:
* Improved the "max batch size estimation" -- using the outgoing batch for 
getting the correct schema (instead of the input batch).
  The only information needed from the input batch is the "max average 
column size" (see change in RecordBatchSizer.java) to have a better estimate for 
VarChars.
  Also computed the size of those "no null" bigint columns added into the 
Values Batch when the aggregation is SUM, MIN or MAX (see changes in 
HashAggBatch.java and HashAggregator.java)
* Using a "plain Java" subclass for the HashTable  because "byte 
manipulation" breaks on the new template code (see ChainedHashTable.java)
* The three configuration options were changed into System/Session 
options:   min_batches_per_partition , hashagg_max_memory , 
hashagg_num_partitions
* There was a potential memory leak in the HashTable BatchHolder ctor: 
vectors were added to the container only after a successful allocation, and 
the container was cleared in case of OOM, so in case of a partial allocation 
the allocated part was not accessible. Also (Paul's suggestion) modified some 
vector templates to clean up after any runtime error (including an OOM).
* Performance improvements: Eliminated the call to updateBatches() before 
each hash computation (instead used only when switching to a new 
SpilledRecordBatch); this was a big overhead.
   Also changed all the "setSafe" calls into "set" for the HashTable (those 
nanoseconds add up, especially when rehashing) - these bigint vectors need no 
resizing.
* Ignore "(spill) file not found" error while cleaning up.
* The unit tests were re-written in a more compact form. And a test with 
the new option (forcing the OOM code) was added (no memory prediction).


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/Ben-Zvi/drill DRILL-5694

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/drill/pull/938.patch

To close this pull request, make a commit to your master/trunk branch
with (at 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:36 AM:


Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

From the file summary:

{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}

From this we see size was 4,736,187 bytes per batch, or 47,362 per row. This 
number has no internal fragmentation and so should match our "net" record size 
estimate. Our net estimate is 48,101, so we're pretty close. The error should 
be explained, but our estimate is conservative, and so is safe for memory 
calcs.

The question is, how does the 4.6 MB per batch balloon to 16 MB or more on read?


was (Author: paul.rogers):
Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | exec.sort.disable_managed updated.  |
> +---+-+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 482344960;
> +---++
> |  ok   |  summary   |
> 

[jira] [Commented] (DRILL-5723) Support System/Session Internal Options And Additional Option System Fixes

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159639#comment-16159639
 ] 

ASF GitHub Bot commented on DRILL-5723:
---

Github user ilooner commented on the issue:

https://github.com/apache/drill/pull/923
  
@paul-rogers Finished applying comments, and cleanup. It's ready for review 
again now.


> Support System/Session Internal Options And Additional Option System Fixes
> --
>
> Key: DRILL-5723
> URL: https://issues.apache.org/jira/browse/DRILL-5723
> Project: Apache Drill
>  Issue Type: New Feature
>Reporter: Timothy Farkas
>Assignee: Timothy Farkas
>
> This is a feature proposed by [~ben-zvi].
> Currently all the options are accessible by the user in sys.options. We would 
> like to add internal options which can be altered, but are not visible in the 
> sys.options table. These internal options could be seen by another alias 
> select * from internal.options. The intention would be to put new options we 
> weren't comfortable with exposing to the end user in this table.
> After the options and their corresponding features are considered stable they 
> could be changed to appear in the sys.option table.
> A bunch of other fixes to the Option system have been clubbed into this:
> * OptionValidators no longer hold default values. Default values are 
> contained in the SystemOptionManager
> * Options have an OptionDefinition. The option definition includes:
>   * A validator
>   * Metadata about the options visibility, required permissions, and the 
> scope in which it can be set.
> * The Option Manager interface has been cleaned up so that a Type is not 
> required to be passed in in order to set and delete options





[jira] [Commented] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638
 ] 

Paul Rogers commented on DRILL-5670:


Further investigation. When spilling, we get these log entries:

{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}

The above shows that we are spilling the expected 100 records. The initial 
allocation is good; we didn't resize vectors as we wrote. However, each batch 
consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The 
gross memory used is 105,667 bytes per row, larger than the 84,413 expected.

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, 
> drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's 
> repository|https://github.com/paul-rogers/drill]. Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an 
> improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the 
> merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- 
> DRILL-5522\: OOM during the merge and spill process of the managed external 
> sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of 
> external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable 
> vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to 
> initialize the offset vector\n\nAll of the bugs have to do with handling 
> low-memory conditions, and with\ncorrectly estimating the sizes of vectors, 
> even when those vectors come\nfrom the spill file or from an exchange. Hence, 
> the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an 
> improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET 
> `exec.sort.disable_managed` = false;
> +---+-+
> |  ok   |   summary   |
> +---+-+
> | true  | exec.sort.disable_managed updated.  |
> +---+-+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.memory.max_query_memory_per_node` = 482344960;
> +---++
> |  ok   |  summary   |
> +---++
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +---++
> 1 row selected (0.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_node` = 1;
> +---+--+
> |  ok   |   summary|
> +---+--+
> | true  | planner.width.max_per_node updated.  |
> +---+--+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set 
> `planner.width.max_per_query` = 1;
> +---+---+
> |  ok   |summary|
> +---+---+
> | true  | planner.width.max_per_query updated.  |
> +---+---+
> 1 row selected (0.25 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from 
> dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by 
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158101#comment-16158101
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:18 AM:


On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, I changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

Note that the actual density is 56%. There appears to be an overflow issue that 
causes the reported figure to be only 8%.
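
One plausible mechanism, sketched below purely as an assumption (the actual 
batch-sizer arithmetic in Drill may differ), is 32-bit overflow: multiplying 
the data size by 100 before dividing wraps around in {{int}} arithmetic for 
batches this large, collapsing the density to a single digit.

```java
public class DensityOverflow {
    public static void main(String[] args) {
        int dataSize = 49_207_323;   // "Data size" from the log above
        int totalSize = 86_353_920;  // "Total size" from the log above

        // Correct density, computed with a 64-bit intermediate:
        long correct = (long) dataSize * 100 / totalSize;

        // Hypothetical buggy version: dataSize * 100 wraps around in int
        // arithmetic (4,920,732,300 does not fit in 32 bits), so the
        // quotient collapses to a single digit.
        int overflowed = dataSize * 100 / totalSize;

        System.out.println(correct);     // 56
        System.out.println(overflowed);  // 7
    }
}
```

The wrapped value lands in the single digits (7 here versus the 8% in the log), 
so the exact expression Drill uses is likely slightly different, but the 
overflow pattern fits the symptom.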

The above calcs are identical to those seen on the QA cluster.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This code seems to come from the text reader which relies on default vector 
allocation:

{code}
  @Override
  public void allocate(Map<String, ValueVector> vectorMap) throws 
OutOfMemoryException {
for (final ValueVector v : vectorMap.values()) {
  v.allocateNew();
}
  }
{code}

The above does not specify the number of values to allocate. There seems to be 
some attempt to allocate the same size as the previous allocation, but this 
does not explain the offset vector behavior. Also, if the algorithm is indeed 
"allocate the last amount", then it creates a ratchet effect: the vector will 
only grow in size over batches, retaining the largest size yet seen. Spikes in 
the data can result in wasted space (low-density batches). This is something to 
investigate elsewhere.
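
As a minimal sketch of the suspected ratchet (assuming, purely for 
illustration, that allocation reuses the previous size and doubles on demand; 
this is not Drill's actual vector code):

```java
/** Toy model of an "allocate the last amount" policy: capacity only
 *  ever grows across batches, never shrinks after a spike. */
public class RatchetSketch {
    private int lastCapacity = 4096;   // hypothetical starting allocation

    int allocateForBatch(int neededBytes) {
        // Reuse the previous size; double only if this batch needs more.
        while (lastCapacity < neededBytes) {
            lastCapacity *= 2;
        }
        return lastCapacity;
    }

    public static void main(String[] args) {
        RatchetSketch v = new RatchetSketch();
        System.out.println(v.allocateForBatch(3000));   // 4096
        System.out.println(v.allocateForBatch(60000));  // 65536 after a spike
        System.out.println(v.allocateForBatch(3000));   // still 65536: waste
    }
}
```

One spiky batch permanently inflates every later allocation, which is exactly 
how low-density batches would accumulate.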

Further, the logic does not seem to work with the offset vector, causing 
reallocations. Something else to investigate.

The run takes about an hour. The good news: I can reproduce the problem on the 
Mac:

{code}
Starting merge phase. Runs = 62, Alloc. memory = 0
End of sort. Total write bytes: 82135266715, Total read bytes: 82135266715
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}


was (Author: paul.rogers):
On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

Note that actual density is 56%. There seems to be an overflow issue that gives 
a number of only 8%.

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] 

[jira] [Updated] (DRILL-5778) Drill seems to run out of memory but completes execution

2017-09-08 Thread Robert Hou (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Hou updated DRILL-5778:
--
Attachment: 264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0.sys.drill
drillbit.log

> Drill seems to run out of memory but completes execution
> 
>
> Key: DRILL-5778
> URL: https://issues.apache.org/jira/browse/DRILL-5778
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Robert Hou
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0.sys.drill, 
> drillbit.log
>
>
> Query is:
> {noformat}
> ALTER SESSION SET `exec.sort.disable_managed` = false;
> alter session set `planner.width.max_per_node` = 1;
> alter session set `planner.disable_exchanges` = true;
> alter session set `planner.width.max_per_query` = 1;
> alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
> select count(*) from (select * from (select id, flatten(str_list) str from 
> dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
> d.str) d1 where d1.id=0;
> {noformat}
> Plan is:
> {noformat}
> | 00-00Screen
> 00-01  Project(EXPR$0=[$0])
> 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
> 00-03  UnionExchange
> 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
> 01-02  Project($f0=[0])
> 01-03SelectionVectorRemover
> 01-04  Filter(condition=[=($0, 0)])
> 01-05SingleMergeExchange(sort0=[1 ASC])
> 02-01  SelectionVectorRemover
> 02-02Sort(sort0=[$1], dir0=[ASC])
> 02-03  Project(id=[$0], str=[$1])
> 02-04HashToRandomExchange(dist0=[[$1]])
> 03-01  UnorderedMuxExchange
> 04-01Project(id=[$0], str=[$1], 
> E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
> 04-02  Flatten(flattenField=[$1])
> 04-03Project(id=[$0], str=[$1])
> 04-04  Scan(groupscan=[EasyGroupScan 
> [selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
>  numFiles=1, columns=[`id`, `str_list`], 
> files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
> {noformat}
> From drillbit.log:
> {noformat}
> 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
>   str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, 
> data size: 548360)
>   id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
> size: 36864)
>   Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
> 262163, Net row width: 143, Density: 1}
> 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
> Incoming batch size: 1073819648, available memory: 2147483648
> 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO  
> o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug 
> option: true
> 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):
> ...
> 2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.exec.compile.ClassTransformer - Compiled and merged 
> SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 
> bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, 
> gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, 
> gross = 14057257 bytes, records = 65535
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
> o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer 
> memory = 2143289744, merge memory = 2128740638
> 2017-09-08 05:07:21,571 

[jira] [Created] (DRILL-5778) Drill seems to run out of memory but completes execution

2017-09-08 Thread Robert Hou (JIRA)
Robert Hou created DRILL-5778:
-

 Summary: Drill seems to run out of memory but completes execution
 Key: DRILL-5778
 URL: https://issues.apache.org/jira/browse/DRILL-5778
 Project: Apache Drill
  Issue Type: Bug
  Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
 Fix For: 1.12.0


Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
select count(*) from (select * from (select id, flatten(str_list) str from 
dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by 
d.str) d1 where d1.id=0;
{noformat}

Plan is:
{noformat}
| 00-00Screen
00-01  Project(EXPR$0=[$0])
00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-03  UnionExchange
01-01StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02  Project($f0=[0])
01-03SelectionVectorRemover
01-04  Filter(condition=[=($0, 0)])
01-05SingleMergeExchange(sort0=[1 ASC])
02-01  SelectionVectorRemover
02-02Sort(sort0=[$1], dir0=[ASC])
02-03  Project(id=[$0], str=[$1])
02-04HashToRandomExchange(dist0=[[$1]])
03-01  UnorderedMuxExchange
04-01Project(id=[$0], str=[$1], 
E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
04-02  Flatten(flattenField=[$1])
04-03Project(id=[$0], str=[$1])
04-04  Scan(groupscan=[EasyGroupScan 
[selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json,
 numFiles=1, columns=[`id`, `str_list`], 
files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
{noformat}

>From drillbit.log:
{noformat}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
  str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, data 
size: 548360)
  id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data 
size: 36864)
  Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 
262163, Net row width: 143, Density: 1}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR 
o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. 
Incoming batch size: 1073819648, available memory: 2147483648
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO  
o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug option: 
true
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):

...

2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.exec.compile.ClassTransformer - Compiled and merged 
SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 
bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, 
gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, 
gross = 14057257 bytes, records = 65535
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer memory 
= 2143289744, merge memory = 2128740638
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 4303 us to sort 4096 records
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 266 
bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG 
o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048572 bytes, 
gross = 1572858 bytes, 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157917#comment-16157917
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/8/17 11:26 PM:
-

Analysis:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216,
mSort batch size = 65535
Memory config: Allocator limit = 482344960
Actual batch schema & sizes {
...
  Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 
50507, Net row width: 48101, Density: 13}
Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 
77501904, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
...
Completed load phase: read 978 batches, spilled 194 times, total input bytes: 
49162397056
Starting consolidate phase. Batches = 978, Records = 100, Memory = 
378422096, In-memory batches 8, spilled runs 194
Starting merge phase. Runs = 62, Alloc. memory = 0
...
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}

Here, the batch size is 1023, which means that some operator must have resized 
batches after the scanner read them (the scanner reads batches of 8K rows). 
Still, the sort saw all 1 million records.

Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per 
batch, or 447,339,300 bytes total, producing an output batch of size 16,739,148, 
for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes 
total. So the math works out.
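
The arithmetic, spelled out (all figures taken from the log above):

```java
public class MergeBudget {
    public static void main(String[] args) {
        long runs = 62;                      // spilled runs being merged
        long spillBatchGross = 7_215_150L;   // gross spill batch size
        long outputBatchNet = 16_739_148L;   // merge output batch size
        long allocatorLimit = 482_344_960L;  // sort's memory budget

        long inputMemory = runs * spillBatchGross;   // one batch per run
        long totalMemory = inputMemory + outputBatchNet;

        System.out.println(inputMemory);                   // 447339300
        System.out.println(totalMemory);                   // 464078448
        System.out.println(totalMemory <= allocatorLimit); // true: fits on paper
    }
}
```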

The error message does not say which batch causes the OOM. At the time of OOM, 
the total memory is 525,809,920. So, something is off.

The vector being allocated is 16 MB, but we expected each batch to be ~7MB in 
size. Something does not balance.

The spill batch size is:

{code}
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100;
spill file = 268435456 bytes
{code}

That is, 100 records need about 7 MB, assuming 50% average internal 
fragmentation. But, somehow, we actually try to allocate 16 MB for 100 records, 
which is roughly 160,000 bytes per record, almost four times our expected data 
size. Clearly, something is off.


was (Author: paul.rogers):
Analysis:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216,
mSort batch size = 65535
Memory config: Allocator limit = 482344960
Actual batch schema & sizes {
...
  Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 
50507, Net row width: 48101, Density: 13}
Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 
77501904, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
...
Completed load phase: read 978 batches, spilled 194 times, total input bytes: 
49162397056
Starting consolidate phase. Batches = 978, Records = 100, Memory = 
378422096, In-memory batches 8, spilled runs 194
Starting merge phase. Runs = 62, Alloc. memory = 0
...
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}

Here, batch size is 1023, which means that some operator must have resized 
batches after the scanner read them (the scanner reads batches of 8K rows.) 
However, this sort saw all 1 million records.

Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per 
batch, or 447,339,300 bytes total, producing an output batch of size 16,739,148, 
for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes 
total. So the math works out.

The error message does not say which batch causes the OOM. At the time of OOM, 
the total memory is 525,809,920. So, something is off.

The vector being allocated is 16 MB, but we expected each batch to be ~7MB in 
size. Something does not balance.

> Varchar vector throws an assertion error when allocating a new vector
> -
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Rahul Challapalli
>Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 
> 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 
> 

[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector

2017-09-08 Thread Paul Rogers (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158101#comment-16158101
 ] 

Paul Rogers edited comment on DRILL-5670 at 9/8/17 11:23 PM:
-

On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, I changed the hard-coded batch size to 1023. The 
query now runs past the above problem (but runs into a problem described later):

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This code seems to come from the text reader which relies on default vector 
allocation:

{code}
  @Override
  public void allocate(Map<String, ValueVector> vectorMap) throws 
OutOfMemoryException {
for (final ValueVector v : vectorMap.values()) {
  v.allocateNew();
}
  }
{code}

The above does not say the number of values to allocate. There seems to be some 
attempt to allocate the same size as the previous allocation, but this does not 
explain the offset vector behavior. Also, if the algorithm is, indeed, 
"allocate the last amount", then the algorithm creates a ratchet effect: the 
vector will only grow in size over batches, retaining the largest size yet 
seen. Spikes in data can result in wasted space (low density batches.) This is 
something to investigate elsewhere.

Further, the logic does not seem to work with the offset vector, causing 
reallocations. Something else to investigate.

The run takes about an hour. Good news, I can reproduce the problem on the Mac:

{code}
Starting merge phase. Runs = 62, Alloc. memory = 0
End of sort. Total write bytes: 82135266715, Total read bytes: 82135266715
Unable to allocate buffer of size 16777216 (rounded from 15834000) due to 
memory limit. Current allocation: 525809920
{code}


was (Author: paul.rogers):
On the Mac, single-threaded, the batch size is much different, resulting in 
different behavior:

{code}
  Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 
83754, Net row width: 48101, Density: 1%}

Insufficient memory to merge two batches. Incoming batch size: 678068224, 
available memory: 482344960
{code}

This is because 8096 is the default number of records in the text reader:

{code}
MAX_RECORDS_PER_BATCH = 8096;
{code}

To recreate the original case, changed the hard-coded batch size to 1023. The 
query now runs:

{code}
Config: spill file size = 268435456, spill batch size = 1048576, merge batch 
size = 16777216, mSort batch size = 65535
Memory config: Allocator limit = 482344960
Config: Resetting allocator to 10% safety margin: 530579456

  Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 
84413, Net row width: 48101, Density: 8%}
...
Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 
129530880, records = 1023
Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; 
spill file = 268435456 bytes
Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348
Available memory: 482344960, buffer memory = 463104560, merge memory = 44884
{code}

The above calcs are identical to that seen on the QA cluster.

Note that the CSV reader is producing low-density batches: this is a problem to 
be resolved elsewhere.

The log contains many of the following:

{code}
UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: 
[4096] -> [8192]
{code}

This code seems to come from the text reader which relies on default vector 
allocation:

{code}
  @Override
  public void allocate(Map

[jira] [Commented] (DRILL-5755) TOP_N_SORT operator does not free memory while running

2017-09-08 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159298#comment-16159298
 ] 

Timothy Farkas commented on DRILL-5755:
---

The root cause of the issue is that there is a hyper batch which is the 
combination of a bunch of upstream batches. This hyper batch is purged every N 
windows as dictated by the drill.exec.sort.purge.threshold. There are two 
issues with this:

* *drill.exec.sort.purge.threshold* is currently ill-defined because there is 
no default defined for it.
* I don't agree with the design as it's laid out in 
https://issues.apache.org/jira/browse/DRILL-385. I don't see why we couldn't 
make the priority queue hold the records themselves rather than just the 
indices. It is more work, but if we did that we could eliminate the need for 
keeping a hyper batch that needs to be periodically purged.

> TOP_N_SORT operator does not free memory while running
> --
>
> Key: DRILL-5755
> URL: https://issues.apache.org/jira/browse/DRILL-5755
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.11.0
>Reporter: Boaz Ben-Zvi
>Assignee: Timothy Farkas
> Attachments: 2658c253-20b6-db90-362a-139aae4a327e.sys.drill
>
>
>  The TOP_N_SORT operator should keep the top N rows while processing its 
> input, and free the memory used to hold all rows below the top N.
> For example, the following query uses a table with 125M rows:
> {code}
> select row_count, sum(row_count), avg(double_field), max(double_rand), 
> count(float_rand) from dfs.`/data/tmp` group by row_count order by row_count 
> limit 30;
> {code}
> And failed with an OOM when each of the 3 TOP_N_SORT operators was holding 
> about 2.44 GB !! (see attached profile).  It should take far less memory to 
> hold 30 rows !!



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (DRILL-5138) TopN operator on top of ~110 GB data set is very slow

2017-09-08 Thread Timothy Farkas (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159206#comment-16159206
 ] 

Timothy Farkas commented on DRILL-5138:
---

The memory leak could definitely contribute to the performance issues.

> TopN operator on top of ~110 GB data set is very slow
> -
>
> Key: DRILL-5138
> URL: https://issues.apache.org/jira/browse/DRILL-5138
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Reporter: Rahul Challapalli
>Assignee: Timothy Farkas
>
> git.commit.id.abbrev=cf2b7c7
> No of cores : 23
> No of disks : 5
> DRILL_MAX_DIRECT_MEMORY="24G"
> DRILL_MAX_HEAP="12G"
> The query below ran for more than 4 hours without completing. The table is 
> ~110 GB.
> {code}
> select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
> {code}
> Physical Plan :
> {code}
> 00-00Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative 
> cost = {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io, 
> 4.1287118487552E13 network, 0.0 memory}, id = 352
> 00-01  Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0, 
> cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 
> 4.1287118487552E13 network, 0.0 memory}, id = 351
> 00-02Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount 
> = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 
> 4.1287118487552E13 network, 0.0 memory}, id = 350
> 00-03  SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY 
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = 
> {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 
> network, 0.0 memory}, id = 349
> 00-04Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY 
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = 
> {1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13 
> network, 0.0 memory}, id = 348
> 00-05  SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) : 
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): 
> rowcount = 1.439980416E9, cumulative cost = {1.0079862912E10 rows, 
> 4.1759432064E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
> 01-01SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, 
> ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative 
> cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13 
> network, 0.0 memory}, id = 346
> 01-02  TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY 
> cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative 
> cost = {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 
> network, 0.0 memory}, id = 345
> 01-03Project(T0¦¦*=[$0], cs_quantity=[$1], 
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, 
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = 
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 
> 0.0 memory}, id = 344
> 01-04  HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) : 
> rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = 
> {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 
> 0.0 memory}, id = 343
> 02-01UnorderedMuxExchange : rowType = RecordType(ANY 
> T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY 
> E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = 
> {4.319941248E9 rows, 1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, 
> id = 342
> 03-01  Project(T0¦¦*=[$0], cs_quantity=[$1], 
> cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, 
> hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY 
> cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, 
> cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0 
> network, 0.0 memory}, id = 341
> 03-02Project(T0¦¦*=[$0], cs_quantity=[$1], 
> cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, 
> ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = 
> {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id 
> = 340
> 03-03  Scan(groupscan=[ParquetGroupScan 
> [entries=[ReadEntryWithPath 
> [path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]], 
> selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales, 
> numFiles=1, usedMetadataFile=false, 

[jira] [Commented] (DRILL-5657) Implement size-aware result set loader

2017-09-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159016#comment-16159016
 ] 

ASF GitHub Bot commented on DRILL-5657:
---

Github user bitblender commented on a diff in the pull request:

https://github.com/apache/drill/pull/914#discussion_r137852118
  
--- Diff: 
exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java
 ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ * 
+ * The primary purpose of this loader, and the most complex to understand 
and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of
+ * use cases.
+ * 
+ * 
+ * Row   a  b  c  d  e  f  g  h
+ * n-2   X  X                -  -
+ * n-1                       -  -
+ * n     X  !  O  O          O
+ * Here:
+ * 
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
--- End diff --

What does an 'O' value mean in the diagram above?
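To make the overflow mechanics discussed in that javadoc concrete: when a write would exceed the batch's byte budget mid-row, the completed rows are harvested as a full batch and the in-progress row's already-written values are carried into a fresh batch so the row stays intact. The toy model below illustrates only that idea; all names are invented, and Drill's real result set loader works on value vectors, not lists of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of mid-row overflow: rows are lists of string "column values",
// and a batch overflows when its total bytes would exceed a budget.
class OverflowingBatchWriter {
  private final int byteLimit;
  private int bytesUsed;
  private final List<List<List<String>>> harvested = new ArrayList<>(); // full batches
  private List<List<String>> batch = new ArrayList<>();                 // completed rows
  private List<String> row = new ArrayList<>();                         // in-progress row

  OverflowingBatchWriter(int byteLimit) { this.byteLimit = byteLimit; }

  // Write one column value of the current row.
  void setScalar(String value) {
    if (bytesUsed + value.length() > byteLimit) {
      // Overflow triggered mid-row: ship the completed rows as a full batch,
      // then carry the partial row's already-written values into the new batch.
      harvested.add(batch);
      batch = new ArrayList<>();
      bytesUsed = 0;
      for (String carried : row) bytesUsed += carried.length();
    }
    row.add(value);
    bytesUsed += value.length();
  }

  void saveRow() {                       // mark the current row complete
    batch.add(row);
    row = new ArrayList<>();
  }

  List<List<List<String>>> harvest() {   // flush everything at end of input
    if (!row.isEmpty()) saveRow();
    if (!batch.isEmpty()) harvested.add(batch);
    return harvested;
  }
}
```

The key invariant the sketch preserves is the one the diagram is about: a row is never split across two batches, even when the limit is hit partway through writing it.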


> Implement size-aware result set loader
> --
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: Future
>Reporter: Paul Rogers
>Assignee: Paul Rogers
> Fix For: Future
>
>
> A recent extension to Drill's set of test tools created a "row set" 
> abstraction to allow us to create, and verify, record batches with very few 
> lines of code. Part of this work involved creating a set of "column 
> accessors" in the vector subsystem. Column readers provide a uniform API to 
> obtain data from columns (vectors), while column writers provide a uniform 
> writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size 
> (to avoid memory fragmentation due to Drill's two memory allocators.) The 
> column accessors have proven to be so useful that they will be the basis for 
> the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the 
> size-aware {{setScalar()}} and {{setArray()}} methods introduced in 
> DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer 
> of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware 
> vector writing, including the case in which a vector fills in the middle of a 
> row.
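A rough sketch of the "uniform writer API" idea described above, with a size-aware set method that reports when the underlying vector is full. The interface and class names are invented for illustration, and the boolean-return convention is an assumption, not Drill's actual accessor contract:

```java
// Hypothetical sketch of a uniform, size-aware column writer: the caller
// writes through one interface regardless of vector type, and a false
// return signals the vector is full so overflow handling must take over.
interface ColumnWriter {
  boolean setScalar(Object value); // false = vector full, value not written
}

class IntColumnWriter implements ColumnWriter {
  private final int[] values; // stand-in for a fixed-capacity value vector
  private int index;

  IntColumnWriter(int capacity) { values = new int[capacity]; }

  @Override
  public boolean setScalar(Object value) {
    if (index >= values.length) {
      return false;            // caller must roll over to a new vector
    }
    values[index++] = (Integer) value;
    return true;
  }

  int get(int i) { return values[i]; }
}
```

Because every column type exposes the same interface, the row-level writer can loop over heterogeneous columns and handle a mid-row "vector full" signal in one place.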





[jira] [Created] (DRILL-5777) Oracle JDBC Error while access synonym

2017-09-08 Thread Sudhir Kumar (JIRA)
Sudhir Kumar created DRILL-5777:
---

 Summary: Oracle JDBC Error while access synonym
 Key: DRILL-5777
 URL: https://issues.apache.org/jira/browse/DRILL-5777
 Project: Apache Drill
  Issue Type: Bug
  Components: Client - Java
Affects Versions: 1.10.0
Reporter: Sudhir Kumar


Error when accessing an individual column in an Oracle table accessed via a synonym.
Query : select  from 
..

Error:
2017-09-08 10:13:46,451 [264d3035-1605-9f5b-084f-09a1b525ef75:foreman] INFO  
o.a.d.exec.planner.sql.SqlConverter - User Error Occurred: From line 1, column 
8 to line 1, column 17: Column  not found in any table (From line 
1, column 8 to line 1, column 17: Column  not found in any table)
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 
1, column 8 to line 1, column 17: Column  not found in any table

SQL Query null

[Error Id: 2b7c7c2d-664e-4c90-ba20-67509de90f09 ]
at 
org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544)
 ~[drill-common-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:178) 
[drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:622)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131)
 [drill-java-exec-1.10.0.jar:1.10.0]
at 
org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79)
 [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) 
[drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) 
[drill-java-exec-1.10.0.jar:1.10.0]
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) 
[na:1.8.0_92]
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) 
[na:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, 
column 8 to line 1, column 17: Column 'TABLE_NAME' not found in any table
at sun.reflect.GeneratedConstructorAccessor67.newInstance(Unknown 
Source) ~[na:na]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[na:1.8.0_92]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
~[na:1.8.0_92]
at 
org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:765) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:753) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3974)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.EmptyScope.findQualifyingTableName(EmptyScope.java:108)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.DelegatingScope.findQualifyingTableName(DelegatingScope.java:112)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.ListScope.findQualifyingTableName(ListScope.java:150)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:154)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:4460)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:4440)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) 
~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:4148)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 
org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:420)
 ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at 

[jira] [Closed] (DRILL-5776) Authentication is not performed when updating SYSTEM option from REST api

2017-09-08 Thread Timothy Farkas (JIRA)

 [ 
https://issues.apache.org/jira/browse/DRILL-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Timothy Farkas closed DRILL-5776.
-
Resolution: Invalid

I incorrectly filed this bug; it is not an issue.

> Authentication is not performed when updating SYSTEM option from REST api
> -
>
> Key: DRILL-5776
> URL: https://issues.apache.org/jira/browse/DRILL-5776
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Timothy Farkas
>
> Authentication is not performed when authentication is enabled and a SYSTEM 
> option is updated from the rest api.


