[jira] [Commented] (DRILL-4641) Support for lzo compression
[ https://issues.apache.org/jira/browse/DRILL-4641?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159759#comment-16159759 ] Evgeniy Kazakov commented on DRILL-4641: Hello, I am struggling with the same issue with Drill and LZO on Windows. Where should I put lzo-hadoop-1.0.5.jar and lzo-core-1.0.5.jar to make Drill use them? > Support for lzo compression > --------------------------- > > Key: DRILL-4641 > URL: https://issues.apache.org/jira/browse/DRILL-4641 > Project: Apache Drill > Issue Type: Improvement > Components: Storage - Other > Affects Versions: Future > Environment: Not specific to platform > Reporter: subbu srinivasan > > Would love support for querying lzo compressed files. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159757#comment-16159757 ] Paul Rogers commented on DRILL-5670:
The OOM occurs during the merge phase:
{code}
Completed spill: memory = 0
Starting merge phase. Runs = 62, Alloc. memory = 0
Read 100 records in 73169 us; size = 8480768, memory = 8481024
...
Read 100 records in 81261 us; size = 8480768, memory = 525807872
{code}
Here the "Read 100 records" indicates the sort is loading the first batch of each of 62 spill files. We see that the first spilled batch was 4,736,187 bytes when written (previous comment), but requires 8,481,024 bytes when read. This is larger than the calculated estimate of 7,215,150 bytes. The average load size is 8,480,772 bytes. The 1,265,622-byte delta per batch adds up to a 78,468,564-byte error over the 62 batches. Then, when the code tries to allocate the output batch of 348 records, 16,739,148 bytes, memory is exhausted. (The limit is 530,579,456 bytes.) A quick fix is to use the "max buffer size" in computing the number of batches that can be merged. (The max buffer size is computed as twice the data size, which assumes a 50% internal fragmentation of the memory buffer.)
> Varchar vector throws an assertion error when allocating a new vector
> ---------------------------------------------------------------------
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.11.0
> Reporter: Rahul Challapalli
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's repository|https://github.com/paul-rogers/drill].
> Below is the commit info
> {code}
> git.commit.id.abbrev=d86e16c
> git.commit.user.email=prog...@maprtech.com
> git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- DRILL-5522\: OOM during the merge and spill process of the managed external sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to initialize the offset vector\n\nAll of the bugs have to do with handling low-memory conditions, and with\ncorrectly estimating the sizes of vectors, even when those vectors come\nfrom the spill file or from an exchange. Hence, the changes for all of\nthe above issues are interrelated.\n
> git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an improvements
> git.commit.user.name=Paul Rogers
> git.build.user.name=Rahul Challapalli
> git.commit.id.describe=0.9.0-1078-gd86e16c
> git.build.user.email=challapallira...@gmail.com
> git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659
> git.commit.time=05.07.2017 @ 20\:34\:39 PDT
> git.build.time=12.07.2017 @ 14\:27\:03 PDT
> git.remote.origin.url=g...@github.com\:paul-rogers/drill.git
> {code}
> Below query fails with an Assertion Error
> {code}
> 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET `exec.sort.disable_managed` = false;
> +-------+-------------------------------------+
> |  ok   |               summary               |
> +-------+-------------------------------------+
> | true  | exec.sort.disable_managed updated.  |
> +-------+-------------------------------------+
> 1 row selected (1.044 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.memory.max_query_memory_per_node` = 482344960;
> +-------+----------------------------------------------------+
> |  ok   |                      summary                       |
> +-------+----------------------------------------------------+
> | true  | planner.memory.max_query_memory_per_node updated.  |
> +-------+----------------------------------------------------+
> 1 row selected (0.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_node` = 1;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | planner.width.max_per_node updated.  |
> +-------+--------------------------------------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181>
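The quick fix proposed in the comment above, budgeting the merge width by the "max buffer size" rather than the net data size, can be sketched as follows. This is an illustration using the numbers from the log, not Drill's actual code; the method name and constants are assumptions.

```java
// Sketch (not Drill's actual code) of the proposed quick fix: compute the
// merge width using the "max buffer size" -- twice the net data size, which
// assumes 50% internal fragmentation of each in-memory buffer.
public class MergeWidthSketch {

    static int mergeWidth(long mergeMemoryBytes, long netSpillBatchBytes) {
        long maxBufferSize = 2 * netSpillBatchBytes;  // worst-case in-memory size
        return (int) (mergeMemoryBytes / maxBufferSize);
    }

    public static void main(String[] args) {
        long memory = 463_104_560L;  // "buffer memory" from the log above
        long net    = 4_810_100L;    // net spill batch size from the log above
        // Conservative width: only 48 runs can be merged at once -- fewer than
        // the 62 runs that exhausted memory in this report, forcing an
        // intermediate merge pass instead of an OOM.
        System.out.println("merge width = " + mergeWidth(memory, net));
    }
}
```

With the doubled estimate the sort would cap the merge at 48 runs, so the 62-run merge that failed here would have been split into multiple passes.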
[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638 ] Paul Rogers edited comment on DRILL-5670 at 9/9/17 4:53 AM:
Further investigation. When spilling, we get these log entries:
{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}
The above shows that we are spilling the expected 100 records. The initial allocation is good; we didn't resize vectors as we wrote. However, each batch consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The gross memory used is 105,667 bytes per row, larger than the 84,413 expected.
From the file summary:
{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}
From this we see the written size was 4,736,187 bytes per batch, or 47,362 per row. This number has no internal fragmentation and so should match our "net" record size estimate. Our net estimate is 48,101, so we're pretty close. The error should be explained, but our estimate is conservative, and so is safe for memory calcs.
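The per-batch and per-row figures in the comment above follow directly from the spill summary. A minimal sketch of that arithmetic (illustrative only; the class and method names are not Drill's):

```java
// Derive net per-batch and per-row sizes from the spill-file summary and
// compare them with the sort's "net" record size estimate.
public class SpillAccountingSketch {

    static long netPerBatch(long bytesWritten, int batches) {
        return bytesWritten / batches;                      // no fragmentation on disk
    }

    static long netPerRow(long netPerBatch, int rowsPerBatch) {
        return Math.round((double) netPerBatch / rowsPerBatch);
    }

    public static void main(String[] args) {
        long perBatch = netPerBatch(246_281_737L, 52);      // "Wrote ... bytes", 52 batches
        long perRow   = netPerRow(perBatch, 100);           // 100 records per batch
        long estimate = 48_101L;                            // sort's net record size estimate
        System.out.println("net/batch=" + perBatch + " net/row=" + perRow
            + " estimate conservative: " + (estimate >= perRow));
    }
}
```

The estimate (48,101) slightly exceeds the measured 47,362 bytes per row, which is why the comment calls it conservative and safe for the memory calcs.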
[jira] [Commented] (DRILL-5766) Stored XSS in APACHE DRILL
[ https://issues.apache.org/jira/browse/DRILL-5766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159730#comment-16159730 ] ASF GitHub Bot commented on DRILL-5766: --- Github user parthchandra commented on the issue: https://github.com/apache/drill/pull/935 +1. Looks good. > Stored XSS in APACHE DRILL > -------------------------- > > Key: DRILL-5766 > URL: https://issues.apache.org/jira/browse/DRILL-5766 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 1.6.0, 1.7.0, 1.8.0, 1.9.0, 1.10.0, 1.11.0 > Environment: Apache Drill installed on a Debian system > Reporter: Sanjog Panda > Assignee: Arina Ielchiieva > Priority: Critical > Labels: cross-site-scripting, security, security-issue, xss > Fix For: 1.12.0 > > Attachments: XSS - Sink.png, XSS - Source.png > > > Hello Apache security team, > I have been testing an application which internally uses Apache Drill (v1.6 as of now). > I found XSS on the profile page (sink), wherein the user's malicious input comes from the Query page (source) where you run a query. > Affected URL: https://localhost:8047/profiles > Once the user gives the below payload and loads the profile page, it gets triggered and is stored. > I have attached a screenshot of the payload alert(document.cookie). > [screenshot links] > https://drive.google.com/file/d/0B8giJ3591fvUbm5JZWtjUTg3WmEwYmJQeWd6dURuV0gzOVd3/view?usp=sharing > https://drive.google.com/file/d/0B8giJ3591fvUV2lJRzZWOWRGNzN5S0JzdVlXSG1iNnVwRlAw/view?usp=sharing
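The standard mitigation for a stored XSS like this is output encoding: HTML-escape user-controlled text (here, the stored query text) before rendering it on the profiles page. A minimal sketch of such an escaper; this is an illustration of the technique, not the actual fix in the PR above:

```java
// Minimal HTML output-encoding: escape the five HTML-significant characters
// in user-controlled text before it is embedded in a page.
public class HtmlEscapeSketch {

    static String escapeHtml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '&':  out.append("&amp;");  break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A script-tag payload renders as inert text once encoded.
        System.out.println(escapeHtml("<script>alert(document.cookie)</script>"));
    }
}
```

Encoded this way, a stored payload is displayed as literal text in the profile view instead of executing in the victim's browser.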
[jira] [Updated] (DRILL-5694) hash agg spill to disk, second phase OOM
[ https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Boaz Ben-Zvi updated DRILL-5694: Reviewer: Paul Rogers > hash agg spill to disk, second phase OOM > ---------------------------------------- > > Key: DRILL-5694 > URL: https://issues.apache.org/jira/browse/DRILL-5694 > Project: Apache Drill > Issue Type: Bug > Components: Functions - Drill > Affects Versions: 1.11.0 > Reporter: Chun Chang > Assignee: Boaz Ben-Zvi > > | 1.11.0-SNAPSHOT | d622f76ee6336d97c9189fc589befa7b0f4189d6 | DRILL-5165: For limit all case, no need to push down limit to scan | 21.07.2017 @ 10:36:29 PDT | > Second phase agg ran out of memory. Not supposed to. Test data is currently only accessible locally. > /root/drill-test-framework/framework/resources/Advanced/hash-agg/spill/hagg15.q > Query: > select row_count, sum(row_count), avg(double_field), max(double_rand), count(float_rand) from parquet_500m_v1 group by row_count order by row_count limit 30 > Failed with exception > java.sql.SQLException: RESOURCE ERROR: One or more nodes ran out of memory while executing the query. > HT was: 534773760 OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760. > Fragment 1:6 > [Error Id: a193babd-f783-43da-a476-bb8dd4382420 on 10.10.30.168:31010] > (org.apache.drill.exec.exception.OutOfMemoryException) HT was: 534773760 > OOM at Second Phase. Partitions: 32. Estimated batch size: 4849664. Planned batches: 0. Rows spilled so far: 6459928 Memory limit: 536870912 so far allocated: 534773760.
> > org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1175 > org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.project.ProjectRecordBatch.innerNext():133 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.physical.impl.TopN.TopNBatch.innerNext():191 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.record.AbstractRecordBatch.next():119 > org.apache.drill.exec.record.AbstractRecordBatch.next():109 > org.apache.drill.exec.record.AbstractSingleRecordBatch.innerNext():51 > > org.apache.drill.exec.physical.impl.svremover.RemovingRecordBatch.innerNext():93 > org.apache.drill.exec.record.AbstractRecordBatch.next():162 > org.apache.drill.exec.physical.impl.BaseRootExec.next():105 > > org.apache.drill.exec.physical.impl.SingleSenderCreator$SingleSenderRootExec.innerNext():92 > org.apache.drill.exec.physical.impl.BaseRootExec.next():95 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():234 > org.apache.drill.exec.work.fragment.FragmentExecutor$1.run():227 > java.security.AccessController.doPrivileged():-2 > javax.security.auth.Subject.doAs():415 > org.apache.hadoop.security.UserGroupInformation.doAs():1595 > org.apache.drill.exec.work.fragment.FragmentExecutor.run():227 > org.apache.drill.common.SelfCleaningRunnable.run():38 > java.util.concurrent.ThreadPoolExecutor.runWorker():1145 > java.util.concurrent.ThreadPoolExecutor$Worker.run():615 > 
java.lang.Thread.run():745 > Caused By (org.apache.drill.exec.exception.OutOfMemoryException) Unable to > allocate buffer of size 4194304 due to memory limit. Current allocation: > 534773760 > org.apache.drill.exec.memory.BaseAllocator.buffer():238 > org.apache.drill.exec.memory.BaseAllocator.buffer():213 > org.apache.drill.exec.vector.IntVector.allocateBytes():231 > org.apache.drill.exec.vector.IntVector.allocateNew():211 > > org.apache.drill.exec.test.generated.HashTableGen2141.allocMetadataVector():778 > > org.apache.drill.exec.test.generated.HashTableGen2141.resizeAndRehashIfNeeded():717 > org.apache.drill.exec.test.generated.HashTableGen2141.insertEntry():643 > org.apache.drill.exec.test.generated.HashTableGen2141.put():618 > > org.apache.drill.exec.test.generated.HashAggregatorGen1823.checkGroupAndAggrValues():1173 > org.apache.drill.exec.test.generated.HashAggregatorGen1823.doWork():539 > org.apache.drill.exec.physical.impl.aggregate.HashAggBatch.innerNext():168 >
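The `Unable to allocate buffer of size 4194304` failure in the trace above is simple arithmetic: the allocator's remaining headroom is smaller than the requested buffer. A sketch of that check, using the numbers from the trace (illustrative; not Drill's actual BaseAllocator code):

```java
// An allocation fails when the request would push the allocator past its limit.
public class AllocatorHeadroomSketch {

    static boolean fits(long currentAllocation, long limit, long requestBytes) {
        return currentAllocation + requestBytes <= limit;
    }

    public static void main(String[] args) {
        long allocated = 534_773_760L;  // "Current allocation" from the trace
        long limit     = 536_870_912L;  // "Memory limit" (512 MB)
        long request   = 4_194_304L;    // the 4 MB IntVector buffer being requested

        // Headroom is only 2,097,152 bytes -- half the requested 4 MB buffer.
        System.out.println("headroom = " + (limit - allocated)
            + ", fits = " + fits(allocated, limit, request));
    }
}
```

With only 2 MB of headroom left against a 4 MB request, the hash table's metadata-vector resize had to fail, which is the OOM the retry-and-spill work in the PR below addresses.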
[jira] [Commented] (DRILL-5694) hash agg spill to disk, second phase OOM
[ https://issues.apache.org/jira/browse/DRILL-5694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159676#comment-16159676 ] ASF GitHub Bot commented on DRILL-5694: --- GitHub user Ben-Zvi opened a pull request: https://github.com/apache/drill/pull/938 DRILL-5694: Handle HashAgg OOM by spill and retry, plus perf improvement
The main change in this PR is adding a "_second way_" to handle memory pressure for the Hash Aggregate: basically, catch OOM failures when processing a new input row (during put() into the Hash Table), clean up internally to allow a retry (of the put()), and return a new exception "**RetryAfterSpillException**". In such a case the caller spills some partition to free more memory, and retries inserting that new row.
In addition, to reduce the risk of OOM when either creating the "Values Batch" (to match the "Keys Batch" in the Hash Table), or when allocating the Outgoing vectors (for the Values), there are new "_reserves_" -- one reserve for each of the two. A "_reserve_" is a memory amount subtracted from the memory limit, which is added back to the limit just before it is needed, hopefully preventing an OOM. After the allocation, the code tries to restore that reserve (by subtracting from the limit, if possible). We always restore the "Outgoing Reserve" first; in case the "Values Batch" reserve runs empty just before calling put(), we skip the put() (just like an OOM there) and spill to free some memory (and restore that reserve).
The old "_first way_" is still used. That is the code that predicts the memory needs and triggers a spill if not enough memory is available. The spill code was separated into a new method called spillIfNeeded(), which is used in two modes: either the old way (prediction), or (when called from the new OOM catch code) with a flag to force a spill, regardless of available memory. That flag is also used to reduce the priority of the "current partition" when choosing a partition to spill.
A new testing option was added (**hashagg_use_memory_prediction**, default true); setting this to false disables the old "first way". This allows stress testing of the OOM handling code (which may not be exercised under normal memory allocation). The HashTable put() code was re-written to clean up partial changes in case of an OOM, and so was the code around the call of put(), to catch the new exception, spill, and retry. Note that this works for 1st phase aggregation as well (return rows early). For the estimates (in addition to the old "max batch size" estimate), there is an estimate for the Values Batch, and one for the Outgoing. These are used for restoring the "reserves". These estimates may be resized up in case actual allocations are bigger.
Other changes:
* Improved the "max batch size" estimation -- using the outgoing batch for getting the correct schema (instead of the input batch). The only information needed from the input batch is the "max average column size" (see change in RecordBatchSizer.java) to have a better estimate for VarChars. Also computed the size of those "no null" bigint columns added into the Values Batch when the aggregation is SUM, MIN or MAX (see changes in HashAggBatch.java and HashAggregator.java).
* Using a "plain Java" subclass for the HashTable because "byte manipulation" breaks on the new template code (see ChainedHashTable.java).
* The three Configuration options were changed into System/Session options: min_batches_per_partition, hashagg_max_memory, hashagg_num_partitions.
* There was a potential memory leak in the HashTable BatchHolder ctor (vectors were added to the container only after the successful allocation, and the container was cleared in case of OOM; so in case of a partial allocation, the allocated part was not accessible). Also (Paul's suggestion) modified some vector templates to clean up after any runtime error (including an OOM).
* Performance improvements: eliminated the call to updateBatches() before each hash computation (instead it is used only when switching to a new SpilledRecordBatch); this was a big overhead. Also changed all the "setSafe" calls into "set" for the HashTable (those nanoseconds add up, especially when rehashing) -- these bigint vectors need no resizing.
* Ignore "(spill) file not found" errors while cleaning up.
* The unit tests were re-written in a more compact form, and a test with the new option (forcing the OOM code path, no memory prediction) was added.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/Ben-Zvi/drill DRILL-5694
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/drill/pull/938.patch
To close this pull request, make a commit to your master/trunk branch with (at
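The catch-OOM, spill, and retry loop the PR describes can be sketched as below. This is a toy model, not Drill's code: FakeHashTable, its byte costs, and putWithRetry are invented stand-ins; only the RetryAfterSpillException name and the retry shape come from the PR description.

```java
// Toy sketch of the "second way": catch the retryable exception from put(),
// spill a partition to free memory, then retry inserting the same row.
public class HashAggRetrySketch {

    static class RetryAfterSpillException extends Exception {}

    // Stand-in for the hash table: a byte budget and a fixed per-row cost.
    static class FakeHashTable {
        long allocated = 0;
        final long limit;
        final long rowCost;
        FakeHashTable(long limit, long rowCost) { this.limit = limit; this.rowCost = rowCost; }

        void put(int row) throws RetryAfterSpillException {
            if (allocated + rowCost > limit) {
                throw new RetryAfterSpillException();   // cleaned up; safe to retry
            }
            allocated += rowCost;
        }

        void spill() { allocated = 0; }                 // pretend we spilled everything
    }

    // Returns how many spills were needed to insert this row.
    static int putWithRetry(FakeHashTable ht, int row) {
        int spills = 0;
        while (true) {
            try {
                ht.put(row);
                return spills;
            } catch (RetryAfterSpillException e) {
                ht.spill();                             // force-spill, then retry the put
                spills++;
            }
        }
    }

    public static void main(String[] args) {
        FakeHashTable ht = new FakeHashTable(10, 4);    // room for 2 rows at a time
        int spills = 0;
        for (int r = 0; r < 5; r++) spills += putWithRetry(ht, r);
        System.out.println("rows = 5, spills = " + spills);  // spills = 2
    }
}
```

The point of the pattern is that put() must leave the table consistent when it throws, so the caller can free memory and replay the same insert; that is why the PR re-wrote put() to clean up partial changes.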
[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638 ] Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:36 AM:
Further investigation. When spilling, we get these log entries:
{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}
The above shows that we are spilling the expected 100 records. The initial allocation is good; we didn't resize vectors as we wrote. However, each batch consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The gross memory used is 105,667 bytes per row, larger than the 84,413 expected.
From the file summary:
{code}
Summary: Wrote 246281737 bytes to ...
Spilled 52 output batches, each of 10566656 bytes, 100 records
{code}
From this we see the written size was 4,736,187 bytes per batch, or 47,362 per row. This number has no internal fragmentation and so should match our "net" record size estimate. Our net estimate is 48,101, so we're pretty close. The error should be explained, but our estimate is conservative, and so is safe for memory calcs. The question is, how does the 4.6 MB per batch balloon to 16 MB or more on read?
[jira] [Commented] (DRILL-5723) Support System/Session Internal Options And Additional Option System Fixes
[ https://issues.apache.org/jira/browse/DRILL-5723?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159639#comment-16159639 ] ASF GitHub Bot commented on DRILL-5723: --- Github user ilooner commented on the issue: https://github.com/apache/drill/pull/923 @paul-rogers Finished applying comments and cleanup. It's ready for review again now. > Support System/Session Internal Options And Additional Option System Fixes > -------------------------------------------------------------------------- > > Key: DRILL-5723 > URL: https://issues.apache.org/jira/browse/DRILL-5723 > Project: Apache Drill > Issue Type: New Feature > Reporter: Timothy Farkas > Assignee: Timothy Farkas > > This is a feature proposed by [~ben-zvi]. > Currently all the options are accessible by the user in sys.options. We would like to add internal options which can be altered, but are not visible in the sys.options table. These internal options could be seen via another alias: select * from internal.options. The intention would be to put new options we weren't comfortable exposing to the end user in this table. > After the options and their corresponding features are considered stable, they could be changed to appear in the sys.options table. > A bunch of other fixes to the Option system have been clubbed into this: > * OptionValidators no longer hold default values. Default values are contained in the SystemOptionManager. > * Options have an OptionDefinition. The option definition includes: a validator, and metadata about the option's visibility, required permissions, and the scope in which it can be set. > * The OptionManager interface has been cleaned up so that a Type is not required to be passed in to set and delete options.
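The OptionDefinition the proposal describes, a validator paired with metadata about visibility, permissions, and scope, might look roughly like this. All names and fields here are a hypothetical sketch, not Drill's actual API:

```java
import java.util.function.Predicate;

// Hypothetical shape of an option definition: the validator checks values,
// while metadata (visibility, permissions, scope) and the default live
// alongside it rather than inside the validator.
public class OptionDefinitionSketch {

    enum Scope { SYSTEM, SESSION }

    static final class OptionDefinition {
        final String name;
        final Predicate<Long> validator;  // stand-in for an OptionValidator
        final boolean internal;           // hidden from sys.options; shown in internal.options
        final boolean adminOnly;          // required permission to change it
        final Scope scope;                // where the option may be set
        final long defaultValue;          // default held here, not in the validator

        OptionDefinition(String name, Predicate<Long> validator, boolean internal,
                         boolean adminOnly, Scope scope, long defaultValue) {
            this.name = name; this.validator = validator; this.internal = internal;
            this.adminOnly = adminOnly; this.scope = scope; this.defaultValue = defaultValue;
        }
    }

    public static void main(String[] args) {
        OptionDefinition def = new OptionDefinition(
            "exec.hashagg.num_partitions",        // illustrative option name
            v -> v > 0 && v <= 128,               // range check
            true, false, Scope.SESSION, 32);
        System.out.println(def.name + " internal=" + def.internal
            + " validates(32)=" + def.validator.test(32L));
    }
}
```

Separating metadata from validation is what lets an option later "graduate" from internal.options to sys.options by flipping a flag, without touching its validator.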
[jira] [Commented] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16159638#comment-16159638 ] Paul Rogers commented on DRILL-5670:
Further investigation. When spilling, we get these log entries:
{code}
Initial output batch allocation: 10566656 bytes, 100 records
Took 52893 us to merge 100 records, consuming 10566656 bytes of memory
{code}
The above shows that we are spilling the expected 100 records. The initial allocation is good; we didn't resize vectors as we wrote. However, each batch consumed 10,566,656 bytes of memory, much more than the 7 MB expected. The gross memory used is 105,667 bytes per row, larger than the 84,413 expected.
> Varchar vector throws an assertion error when allocating a new vector
> ---------------------------------------------------------------------
>
> Key: DRILL-5670
> URL: https://issues.apache.org/jira/browse/DRILL-5670
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.11.0
> Reporter: Rahul Challapalli
> Assignee: Paul Rogers
> Fix For: 1.12.0
>
> Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, 269969ca-8d4d-073a-d916-9031e3d3fbf0.sys.drill, drillbit.log, drillbit.log, drillbit.log, drillbit.log, drillbit.out, drill-override.conf
>
>
> I am running this test on a private branch of [paul's repository|https://github.com/paul-rogers/drill].
Below is the commit info > {code} > git.commit.id.abbrev=d86e16c > git.commit.user.email=prog...@maprtech.com > git.commit.message.full=DRILL-5601\: Rollup of external sort fixes an > improvements\n\n- DRILL-5513\: Managed External Sort \: OOM error during the > merge phase\n- DRILL-5519\: Sort fails to spill and results in an OOM\n- > DRILL-5522\: OOM during the merge and spill process of the managed external > sort\n- DRILL-5594\: Excessive buffer reallocations during merge phase of > external sort\n- DRILL-5597\: Incorrect "bits" vector allocation in nullable > vectors allocateNew()\n- DRILL-5602\: Repeated List Vector fails to > initialize the offset vector\n\nAll of the bugs have to do with handling > low-memory conditions, and with\ncorrectly estimating the sizes of vectors, > even when those vectors come\nfrom the spill file or from an exchange. Hence, > the changes for all of\nthe above issues are interrelated.\n > git.commit.id=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.message.short=DRILL-5601\: Rollup of external sort fixes an > improvements > git.commit.user.name=Paul Rogers > git.build.user.name=Rahul Challapalli > git.commit.id.describe=0.9.0-1078-gd86e16c > git.build.user.email=challapallira...@gmail.com > git.branch=d86e16c551e7d3553f2cde748a739b1c5a7a7659 > git.commit.time=05.07.2017 @ 20\:34\:39 PDT > git.build.time=12.07.2017 @ 14\:27\:03 PDT > git.remote.origin.url=g...@github.com\:paul-rogers/drill.git > {code} > Below query fails with an Assertion Error > {code} > 0: jdbc:drill:zk=10.10.100.190:5181> ALTER SESSION SET > `exec.sort.disable_managed` = false; > +---+-+ > | ok | summary | > +---+-+ > | true | exec.sort.disable_managed updated. | > +---+-+ > 1 row selected (1.044 seconds) > 0: jdbc:drill:zk=10.10.100.190:5181> alter session set > `planner.memory.max_query_memory_per_node` = 482344960; > +---++ > | ok | summary | > +---++ > | true | planner.memory.max_query_memory_per_node updated. 
|
> +-------+----------------------------------------------------+
> 1 row selected (0.372 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_node` = 1;
> +-------+--------------------------------------+
> |  ok   |               summary                |
> +-------+--------------------------------------+
> | true  | planner.width.max_per_node updated.  |
> +-------+--------------------------------------+
> 1 row selected (0.292 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> alter session set `planner.width.max_per_query` = 1;
> +-------+---------------------------------------+
> |  ok   |                summary                |
> +-------+---------------------------------------+
> | true  | planner.width.max_per_query updated.  |
> +-------+---------------------------------------+
> 1 row selected (0.25 seconds)
> 0: jdbc:drill:zk=10.10.100.190:5181> select count(*) from (select * from dfs.`/drill/testdata/resource-manager/3500cols.tbl` order by
[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16158101#comment-16158101 ] Paul Rogers edited comment on DRILL-5670 at 9/9/17 1:18 AM: On the Mac, single-threaded, the batch size is much different, resulting in different behavior: {code} Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 83754, Net row width: 48101, Density: 1%} Insufficient memory to merge two batches. Incoming batch size: 678068224, available memory: 482344960 {code} This is because 8096 is the default number of records in the text reader: {code} MAX_RECORDS_PER_BATCH = 8096; {code} To recreate the original case, changed the hard-coded batch size to 1023. The query now runs past the above problem (but runs into a problem described later): {code} Config: spill file size = 268435456, spill batch size = 1048576, merge batch size = 16777216, mSort batch size = 65535 Memory config: Allocator limit = 482344960 Config: Resetting allocator to 10% safety margin: 530579456 Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 84413, Net row width: 48101, Density: 8%} ... Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 129530880, records = 1023 Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; spill file = 268435456 bytes Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348 Available memory: 482344960, buffer memory = 463104560, merge memory = 44884 {code} Note that actual density is 56%. There seems to be an overflow issue that gives a number of only 8%. The above calcs are identical to that seen on the QA cluster. The log contains many of the following: {code} UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. 
# of bytes: [4096] -> [8192] {code} This code seems to come from the text reader, which relies on default vector allocation: {code} @Override public void allocate(Map<String, ValueVector> vectorMap) throws OutOfMemoryException { for (final ValueVector v : vectorMap.values()) { v.allocateNew(); } } {code} The above does not specify the number of values to allocate. There seems to be some attempt to allocate the same size as the previous allocation, but this does not explain the offset vector behavior. Also, if the algorithm is, indeed, "allocate the last amount", then it creates a ratchet effect: the vector will only grow in size over batches, retaining the largest size yet seen. Spikes in data can result in wasted space (low-density batches). This is something to investigate elsewhere. Further, the logic does not seem to work with the offset vector, causing reallocations. Something else to investigate. The run takes about an hour. Good news: I can reproduce the problem on the Mac: {code} Starting merge phase. Runs = 62, Alloc. memory = 0 End of sort. Total write bytes: 82135266715, Total read bytes: 82135266715 Unable to allocate buffer of size 16777216 (rounded from 15834000) due to memory limit. Current allocation: 525809920 {code} was (Author: paul.rogers): On the Mac, single-threaded, the batch size is very different, resulting in different behavior: {code} Records: 8096, Total size: 678068224, Data size: 389425696, Gross row width: 83754, Net row width: 48101, Density: 1%} Insufficient memory to merge two batches. Incoming batch size: 678068224, available memory: 482344960 {code} This is because 8096 is the default number of records in the text reader: {code} MAX_RECORDS_PER_BATCH = 8096; {code} To recreate the original case, I changed the hard-coded batch size to 1023. 
The query now runs past the above problem (but runs into a problem described later): {code} Config: spill file size = 268435456, spill batch size = 1048576, merge batch size = 16777216, mSort batch size = 65535 Memory config: Allocator limit = 482344960 Config: Resetting allocator to 10% safety margin: 530579456 Records: 1023, Total size: 86353920, Data size: 49207323, Gross row width: 84413, Net row width: 48101, Density: 8%} ... Input Batch Estimates: record size = 48101 bytes; net = 86353920 bytes, gross = 129530880, records = 1023 Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; spill file = 268435456 bytes Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348 Available memory: 482344960, buffer memory = 463104560, merge memory = 44884 {code} Note that actual density is 56%. There seems to be an overflow issue that gives a number of only 8%. The above calcs are identical to that seen on the QA cluster. Note that the CSV reader is producing low-density batches: this is a problem to be resolved elsewhere. The log contains many of the following: {code} UInt4Vector - Reallocating vector [$offsets$(UINT4:REQUIRED)]. # of bytes: [4096]
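The "allocate the last amount" ratchet effect described above can be modeled with a toy sketch. This is an illustrative model only, not Drill's actual vector code: `ToyVector` and its doubling-on-overflow behavior are hypothetical stand-ins that show why one spike in batch size wastes memory on every later batch.

```java
/**
 * Illustrative model (not Drill's actual code) of the allocation "ratchet":
 * a vector that never shrinks below the largest size seen so far only grows,
 * so a single data spike inflates every subsequent batch.
 */
public class AllocationRatchet {

    /** Minimal stand-in for a value vector's buffer sizing. */
    static class ToyVector {
        int allocatedBytes = 4096;   // initial allocation, as in the log

        /** Default allocation: reuse the previous size rather than a known row count. */
        void allocateNew() { /* keeps allocatedBytes as-is: the ratchet */ }

        /** Writing past the current allocation doubles the buffer. */
        void write(int bytesNeeded) {
            while (bytesNeeded > allocatedBytes) {
                allocatedBytes *= 2;   // "Reallocating vector ... [4096] -> [8192]"
            }
        }
    }

    /** Feed per-batch data sizes through one vector; return the final allocation. */
    static int finalAllocation(int[] batchDataSizes) {
        ToyVector v = new ToyVector();
        for (int size : batchDataSizes) {
            v.allocateNew();   // never shrinks between batches
            v.write(size);
        }
        return v.allocatedBytes;
    }

    public static void main(String[] args) {
        // One 1 MB spike ratchets the vector up for all following small batches.
        int after = finalAllocation(new int[] {5_000, 1_000_000, 5_000, 5_000});
        System.out.println("final allocation = " + after);
    }
}
```

A value-count-aware overload (allocating for the expected record count of the next batch) would avoid both the ratchet and the repeated offset-vector reallocations.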
[jira] [Updated] (DRILL-5778) Drill seems to run out of memory but completes execution
[ https://issues.apache.org/jira/browse/DRILL-5778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Hou updated DRILL-5778: -- Attachment: 264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0.sys.drill drillbit.log > Drill seems to run out of memory but completes execution > > > Key: DRILL-5778 > URL: https://issues.apache.org/jira/browse/DRILL-5778 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Robert Hou >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0.sys.drill, > drillbit.log > > > Query is: > {noformat} > ALTER SESSION SET `exec.sort.disable_managed` = false; > alter session set `planner.width.max_per_node` = 1; > alter session set `planner.disable_exchanges` = true; > alter session set `planner.width.max_per_query` = 1; > alter session set `planner.memory.max_query_memory_per_node` = 2147483648; > select count(*) from (select * from (select id, flatten(str_list) str from > dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by > d.str) d1 where d1.id=0; > {noformat} > Plan is: > {noformat} > | 00-00Screen > 00-01 Project(EXPR$0=[$0]) > 00-02StreamAgg(group=[{}], EXPR$0=[$SUM0($0)]) > 00-03 UnionExchange > 01-01StreamAgg(group=[{}], EXPR$0=[COUNT()]) > 01-02 Project($f0=[0]) > 01-03SelectionVectorRemover > 01-04 Filter(condition=[=($0, 0)]) > 01-05SingleMergeExchange(sort0=[1 ASC]) > 02-01 SelectionVectorRemover > 02-02Sort(sort0=[$1], dir0=[ASC]) > 02-03 Project(id=[$0], str=[$1]) > 02-04HashToRandomExchange(dist0=[[$1]]) > 03-01 UnorderedMuxExchange > 04-01Project(id=[$0], str=[$1], > E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)]) > 04-02 Flatten(flattenField=[$1]) > 04-03Project(id=[$0], str=[$1]) > 04-04 Scan(groupscan=[EasyGroupScan > [selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json, > numFiles=1, columns=[`id`, `str_list`], > 
files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]]) > {noformat} > From drillbit.log: > {noformat} > 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes { > str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, > data size: 548360) > id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data > size: 36864) > Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: > 262163, Net row width: 143, Density: 1} > 2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR > o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. > Incoming batch size: 1073819648, available memory: 2147483648 > 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO > o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug > option: true > 2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB): > ... > 2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.exec.compile.ClassTransformer - Compiled and merged > SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms. 
> 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records > 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 > bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096 > 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, > gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes > 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, > gross = 14057257 bytes, records = 65535 > 2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG > o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer > memory = 2143289744, merge memory = 2128740638 > 2017-09-08 05:07:21,571
[jira] [Created] (DRILL-5778) Drill seems to run out of memory but completes execution
Robert Hou created DRILL-5778:
-
Summary: Drill seems to run out of memory but completes execution
Key: DRILL-5778
URL: https://issues.apache.org/jira/browse/DRILL-5778
Project: Apache Drill
Issue Type: Bug
Components: Execution - Relational Operators
Affects Versions: 1.11.0
Reporter: Robert Hou
Assignee: Paul Rogers
Fix For: 1.12.0

Query is:
{noformat}
ALTER SESSION SET `exec.sort.disable_managed` = false;
alter session set `planner.width.max_per_node` = 1;
alter session set `planner.disable_exchanges` = true;
alter session set `planner.width.max_per_query` = 1;
alter session set `planner.memory.max_query_memory_per_node` = 2147483648;
select count(*) from (select * from (select id, flatten(str_list) str from dfs.`/drill/testdata/resource-manager/flatten-large-small.json`) d order by d.str) d1 where d1.id=0;
{noformat}
Plan is:
{noformat}
00-00  Screen
00-01    Project(EXPR$0=[$0])
00-02      StreamAgg(group=[{}], EXPR$0=[$SUM0($0)])
00-03        UnionExchange
01-01          StreamAgg(group=[{}], EXPR$0=[COUNT()])
01-02            Project($f0=[0])
01-03              SelectionVectorRemover
01-04                Filter(condition=[=($0, 0)])
01-05                  SingleMergeExchange(sort0=[1 ASC])
02-01                    SelectionVectorRemover
02-02                      Sort(sort0=[$1], dir0=[ASC])
02-03                        Project(id=[$0], str=[$1])
02-04                          HashToRandomExchange(dist0=[[$1]])
03-01                            UnorderedMuxExchange
04-01                              Project(id=[$0], str=[$1], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($1, 1301011)])
04-02                                Flatten(flattenField=[$1])
04-03                                  Project(id=[$0], str=[$1])
04-04                                    Scan(groupscan=[EasyGroupScan [selectionRoot=maprfs:/drill/testdata/resource-manager/flatten-large-small.json, numFiles=1, columns=[`id`, `str_list`], files=[maprfs:///drill/testdata/resource-manager/flatten-large-small.json]]])
{noformat}
From drillbit.log:
{noformat}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Actual batch schema & sizes {
  str(type: REQUIRED VARCHAR, count: 4096, std size: 54, actual size: 134, data size: 548360)
  id(type: OPTIONAL BIGINT, count: 4096, std size: 8, actual size: 9, data size: 36864)
  Records: 4096, Total size: 1073819648, Data size: 585224, Gross row width: 262163, Net row width: 143, Density: 1}
2017-09-08 05:07:21,515 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] ERROR o.a.d.e.p.i.x.m.ExternalSortBatch - Insufficient memory to merge two batches. Incoming batch size: 1073819648, available memory: 2147483648
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] INFO o.a.d.e.c.ClassCompilerSelector - Java compiler policy: DEFAULT, Debug option: true
2017-09-08 05:07:21,517 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.compile.JaninoClassCompiler - Compiling (source size=3.3 KiB):
...
2017-09-08 05:07:21,536 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.exec.compile.ClassTransformer - Compiled and merged SingleBatchSorterGen2677: bytecode size = 3.6 KiB, time = 19 ms.
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 5608 us to sort 4096 records
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 143 bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048476 bytes, gross = 1572714 bytes, records = 7332; spill file = 268435456 bytes
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Output batch size: net = 9371505 bytes, gross = 14057257 bytes, records = 65535
2017-09-08 05:07:21,566 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Available memory: 2147483648, buffer memory = 2143289744, merge memory = 2128740638
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.t.g.SingleBatchSorterGen2677 - Took 4303 us to sort 4096 records
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Input Batch Estimates: record size = 266 bytes; net = 1073819648 bytes, gross = 1610729472, records = 4096
2017-09-08 05:07:21,571 [264d780f-41ac-2c4f-6bc8-bdbb5eeb3df0:frag:0:0] DEBUG o.a.d.e.p.i.x.m.ExternalSortBatch - Spill batch size: net = 1048572 bytes, gross = 1572858 bytes,
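The "Spill batch size" figures in the log above follow simple arithmetic: records per spill batch is the byte target divided by the estimated record size, and the logged numbers suggest gross = 1.5 × net (a 50% internal-fragmentation allowance). The 1.5× factor is inferred from the logged values, not taken from Drill's source; this sketch just reproduces the math.

```java
/**
 * Sketch of the batch-size arithmetic visible in the ExternalSortBatch
 * log lines above. The 1.5x net-to-gross factor is an inference from the
 * logged numbers, not a quote of Drill's implementation.
 */
public class SpillBatchMath {

    /** Records that fit in a spill batch of the given byte target. */
    static int spillRecords(int spillBatchBytes, int recordSize) {
        return spillBatchBytes / recordSize;
    }

    /** Net bytes actually carried by that many records. */
    static long netBytes(int records, int recordSize) {
        return (long) records * recordSize;
    }

    /** Gross size assuming 50% internal fragmentation (gross = 1.5 * net). */
    static long grossBytes(long netBytes) {
        return netBytes * 3 / 2;
    }

    public static void main(String[] args) {
        int recordSize = 143;                          // from the log
        int records = spillRecords(1_048_576, recordSize);  // spill batch target
        long net = netBytes(records, recordSize);
        // Matches the logged "records = 7332", "net = 1048476", "gross = 1572714".
        System.out.printf("records=%d net=%d gross=%d%n", records, net, grossBytes(net));
    }
}
```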
[jira] [Comment Edited] (DRILL-5670) Varchar vector throws an assertion error when allocating a new vector
[ https://issues.apache.org/jira/browse/DRILL-5670?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16157917#comment-16157917 ] Paul Rogers edited comment on DRILL-5670 at 9/8/17 11:26 PM: - Analysis: {code} Config: spill file size = 268435456, spill batch size = 1048576, merge batch size = 16777216, mSort batch size = 65535 Memory config: Allocator limit = 482344960 Actual batch schema & sizes { ... Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 50507, Net row width: 48101, Density: 13} Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 77501904, records = 1023 Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; spill file = 268435456 bytes Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348 Available memory: 482344960, buffer memory = 463104560, merge memory = 44884 ... Completed load phase: read 978 batches, spilled 194 times, total input bytes: 49162397056 Starting consolidate phase. Batches = 978, Records = 100, Memory = 378422096, In-memory batches 8, spilled runs 194 Starting merge phase. Runs = 62, Alloc. memory = 0 ... Unable to allocate buffer of size 16777216 (rounded from 15834000) due to memory limit. Current allocation: 525809920 {code} Here, the batch size is 1,023 records, which means that some operator must have resized batches after the scanner read them (the scanner reads batches of 8K rows). However, this sort saw all 1 million records. Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per batch, or 447,339,300 bytes total, producing an output batch of size 16,739,148, for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes total. So the math works out. The error message does not say which batch causes the OOM. At the time of OOM, the total memory is 525,809,920. So, something is off. The vector being allocated is 16 MB, but we expected each batch to be ~7MB in size. 
Something does not balance. The spill batch size is: {code} Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; spill file = 268435456 bytes {code} That is, 100 records need ~7 MB, assuming 50% average internal fragmentation. But, somehow, we actually try to allocate 16 MB for 100 records, which requires 160,000 bytes per record, almost four times our expected data size. Clearly, something is off. was (Author: paul.rogers): Analysis: {code} Config: spill file size = 268435456, spill batch size = 1048576, merge batch size = 16777216, mSort batch size = 65535 Memory config: Allocator limit = 482344960 Actual batch schema & sizes { ... Records: 1023, Total size: 51667936, Data size: 49207323, Gross row width: 50507, Net row width: 48101, Density: 13} Input Batch Estimates: record size = 48101 bytes; net = 51667936 bytes, gross = 77501904, records = 1023 Spill batch size: net = 4810100 bytes, gross = 7215150 bytes, records = 100; spill file = 268435456 bytes Output batch size: net = 16739148 bytes, gross = 25108722 bytes, records = 348 Available memory: 482344960, buffer memory = 463104560, merge memory = 44884 ... Completed load phase: read 978 batches, spilled 194 times, total input bytes: 49162397056 Starting consolidate phase. Batches = 978, Records = 100, Memory = 378422096, In-memory batches 8, spilled runs 194 Starting merge phase. Runs = 62, Alloc. memory = 0 ... Unable to allocate buffer of size 16777216 (rounded from 15834000) due to memory limit. Current allocation: 525809920 {code} Here, the batch size is 1,023 records, which means that some operator must have resized batches after the scanner read them (the scanner reads batches of 8K rows). However, this sort saw all 1 million records. Let's do the math. The sort is trying to merge 62 runs of 7,215,150 bytes per batch, or 447,339,300 bytes total, producing an output batch of size 16,739,148, for a total memory usage of 464,078,448 bytes. The sort has 482,344,960 bytes total. 
So the math works out. The error message does not say which batch causes the OOM. At the time of OOM, the total memory is 525,809,920. So, something is off. The vector being allocated is 16 MB, but we expected each batch to be ~7MB in size. Something does not balance. > Varchar vector throws an assertion error when allocating a new vector > - > > Key: DRILL-5670 > URL: https://issues.apache.org/jira/browse/DRILL-5670 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Rahul Challapalli >Assignee: Paul Rogers > Fix For: 1.12.0 > > Attachments: 26555749-4d36-10d2-6faf-e403db40c370.sys.drill, > 266290f3-5fdc-5873-7372-e9ee053bf867.sys.drill, >
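The merge-phase budget math from the comment above can be written out directly. The estimated figures (62 runs, 7,215,150-byte spill batches, 16,739,148-byte output batch, 482,344,960-byte limit) come from the logged estimates; the ~8.48 MB actual per-batch read size is the observed value reported later in this issue, used here as an assumption to show how the budget blows up.

```java
/**
 * The sort's merge-phase budget: one batch per spilled run held in memory,
 * plus one output batch. Figures are from the logged estimates in the
 * comment above; the "actual" read size is an observed value, not an estimate.
 */
public class MergeBudget {

    /** Memory needed to merge `runs` runs with one output batch. */
    static long mergeMemory(int runs, long perBatchBytes, long outputBatchBytes) {
        return (long) runs * perBatchBytes + outputBatchBytes;
    }

    public static void main(String[] args) {
        long limit = 482_344_960L;                               // allocator limit
        long estimated = mergeMemory(62, 7_215_150L, 16_739_148L);  // fits the budget
        long actual    = mergeMemory(62, 8_481_024L, 16_739_148L);  // exceeds it
        System.out.println("estimated=" + estimated + " fits=" + (estimated <= limit));
        System.out.println("actual=" + actual + " fits=" + (actual <= limit));
    }
}
```

This is exactly why a per-batch size estimate that is low by ~1.2 MB turns into an OOM only at merge time: the error is multiplied by the number of runs.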
[jira] [Commented] (DRILL-5755) TOP_N_SORT operator does not free memory while running
[ https://issues.apache.org/jira/browse/DRILL-5755?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159298#comment-16159298 ] Timothy Farkas commented on DRILL-5755: --- The root cause of the issue is that there is a hyper batch which is the combination of a bunch of upstream batches. This hyper batch is purged every N windows as dictated by drill.exec.sort.purge.threshold. There are two issues with this: * *drill.exec.sort.purge.threshold* is currently ill-defined because there is no default defined for it. * I don't agree with the design as it's laid out in https://issues.apache.org/jira/browse/DRILL-385 I don't see why we couldn't make the priority queue hold the records themselves, not just the indices. It is more work, but if we did that we could eliminate the need for keeping a hyper batch that needs to be periodically purged. > TOP_N_SORT operator does not free memory while running > -- > > Key: DRILL-5755 > URL: https://issues.apache.org/jira/browse/DRILL-5755 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.11.0 >Reporter: Boaz Ben-Zvi >Assignee: Timothy Farkas > Attachments: 2658c253-20b6-db90-362a-139aae4a327e.sys.drill > > > The TOP_N_SORT operator should keep the top N rows while processing its > input, and free the memory used to hold all rows below the top N. > For example, the following query uses a table with 125M rows: > {code} > select row_count, sum(row_count), avg(double_field), max(double_rand), > count(float_rand) from dfs.`/data/tmp` group by row_count order by row_count > limit 30; > {code} > And failed with an OOM when each of the 3 TOP_N_SORT operators was holding > about 2.44 GB !! (see attached profile). It should take far less memory to > hold 30 rows !! -- This message was sent by Atlassian JIRA (v6.4.14#64029)
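The alternative suggested in the comment above, a priority queue holding the records themselves, can be sketched as a bounded min-heap: memory stays O(N) no matter how many rows stream through. This is an illustrative sketch of the idea, not Drill's TopN operator, which instead tracks indices into a periodically purged hyper batch.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/**
 * Bounded top-N via a min-heap of size N: the heap root is the worst record
 * currently kept, so each new record either displaces it or is discarded.
 * At most N records are ever held, eliminating the need for periodic purges.
 */
public class BoundedTopN<T> {
    private final int limit;
    private final PriorityQueue<T> heap;   // min-heap: root = worst of the kept N

    public BoundedTopN(int limit, Comparator<T> order) {
        this.limit = limit;
        this.heap = new PriorityQueue<>(limit, order);
    }

    public void add(T record) {
        if (heap.size() < limit) {
            heap.offer(record);
        } else if (heap.comparator().compare(record, heap.peek()) > 0) {
            heap.poll();          // evict the current worst...
            heap.offer(record);   // ...so memory use is capped at N records
        }
    }

    /** Kept records, best first. */
    public List<T> result() {
        List<T> out = new ArrayList<>(heap);
        out.sort(heap.comparator().reversed());
        return out;
    }

    public static void main(String[] args) {
        BoundedTopN<Integer> top3 = new BoundedTopN<>(3, Comparator.naturalOrder());
        for (int v : new int[] {5, 1, 9, 7, 3, 8}) top3.add(v);
        System.out.println(top3.result());   // the three largest values, descending
    }
}
```

The trade-off Tim notes is real: holding records (not indices) means copying each retained row out of its incoming batch, but it bounds memory at N rows instead of N windows' worth of batches.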
[jira] [Commented] (DRILL-5138) TopN operator on top of ~110 GB data set is very slow
[ https://issues.apache.org/jira/browse/DRILL-5138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159206#comment-16159206 ] Timothy Farkas commented on DRILL-5138:
---
The memory leak could definitely contribute to performance issues.
> TopN operator on top of ~110 GB data set is very slow
> -----------------------------------------------------
>
> Key: DRILL-5138
> URL: https://issues.apache.org/jira/browse/DRILL-5138
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Reporter: Rahul Challapalli
> Assignee: Timothy Farkas
>
> git.commit.id.abbrev=cf2b7c7
> No of cores : 23
> No of disks : 5
> DRILL_MAX_DIRECT_MEMORY="24G"
> DRILL_MAX_HEAP="12G"
> The below query ran for more than 4 hours and did not complete. The table is ~110 GB.
> {code}
> select * from catalog_sales order by cs_quantity, cs_wholesale_cost limit 1;
> {code}
> Physical Plan :
> {code}
> 00-00  Screen : rowType = RecordType(ANY *): rowcount = 1.0, cumulative cost = {1.00798629141E10 rows, 4.17594320691E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 352
> 00-01    Project(*=[$0]) : rowType = RecordType(ANY *): rowcount = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 351
> 00-02      Project(T0¦¦*=[$0]) : rowType = RecordType(ANY T0¦¦*): rowcount = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 350
> 00-03        SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = {1.0079862914E10 rows, 4.1759432069E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 349
> 00-04          Limit(fetch=[1]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.0, cumulative cost = {1.0079862913E10 rows, 4.1759432068E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 348
> 00-05            SingleMergeExchange(sort0=[1 ASC], sort1=[2 ASC]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {1.0079862912E10 rows, 4.1759432064E10 cpu, 0.0 io, 4.1287118487552E13 network, 0.0 memory}, id = 347
> 01-01              SelectionVectorRemover : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {8.639882496E9 rows, 3.0239588736E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 346
> 01-02                TopN(limit=[1]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {7.19990208E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 345
> 01-03                  Project(T0¦¦*=[$0], cs_quantity=[$1], cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 344
> 01-04                    HashToRandomExchange(dist0=[[$1]], dist1=[[$2]]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = {5.759921664E9 rows, 2.879960832E10 cpu, 0.0 io, 2.3592639135744E13 network, 0.0 memory}, id = 343
> 02-01                      UnorderedMuxExchange : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = {4.319941248E9 rows, 1.1519843328E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 342
> 03-01                        Project(T0¦¦*=[$0], cs_quantity=[$1], cs_wholesale_cost=[$2], E_X_P_R_H_A_S_H_F_I_E_L_D=[hash32AsDouble($2, hash32AsDouble($1))]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost, ANY E_X_P_R_H_A_S_H_F_I_E_L_D): rowcount = 1.439980416E9, cumulative cost = {2.879960832E9 rows, 1.0079862912E10 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 341
> 03-02                          Project(T0¦¦*=[$0], cs_quantity=[$1], cs_wholesale_cost=[$2]) : rowType = RecordType(ANY T0¦¦*, ANY cs_quantity, ANY cs_wholesale_cost): rowcount = 1.439980416E9, cumulative cost = {1.439980416E9 rows, 4.319941248E9 cpu, 0.0 io, 0.0 network, 0.0 memory}, id = 340
> 03-03                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:///drill/testdata/tpcds/parquet/sf1000/catalog_sales]], selectionRoot=maprfs:/drill/testdata/tpcds/parquet/sf1000/catalog_sales, numFiles=1, usedMetadataFile=false,
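For context on the `TopN(limit=[1])` node in the plan above: a top-N operator keeps only the best N rows seen so far, so its memory stays bounded by N regardless of input size. A minimal, self-contained sketch of that idea (illustrative only, not Drill's actual operator) using a bounded max-heap:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Illustrative bounded top-N: keep only the n smallest rows seen so far. */
public class TopNSketch {

    /** Returns the n smallest rows under cmp, in ascending order. */
    static List<int[]> topN(List<int[]> rows, int n, Comparator<int[]> cmp) {
        // Max-heap of size n: the head is the worst row currently retained,
        // so it is the one to evict when a better row arrives.
        PriorityQueue<int[]> heap = new PriorityQueue<>(cmp.reversed());
        for (int[] row : rows) {
            heap.add(row);
            if (heap.size() > n) {
                heap.poll(); // evict the largest; memory stays bounded at n rows
            }
        }
        List<int[]> out = new ArrayList<>(heap);
        out.sort(cmp);
        return out;
    }

    public static void main(String[] args) {
        // Rows of (cs_quantity, cs_wholesale_cost)-like pairs, limit 1.
        List<int[]> rows = List.of(new int[]{5, 2}, new int[]{1, 9}, new int[]{1, 3});
        Comparator<int[]> cmp = Comparator.<int[]>comparingInt(r -> r[0])
                .thenComparingInt(r -> r[1]);
        int[] best = topN(rows, 1, cmp).get(0);
        System.out.println(best[0] + "," + best[1]); // prints 1,3
    }
}
```

In the plan, each minor fragment runs its own TopN over its slice of the scan, and `SingleMergeExchange` merges the per-fragment winners, which is why a limit-1 query should need very little memory in the absence of leaks.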
[jira] [Commented] (DRILL-5657) Implement size-aware result set loader
[ https://issues.apache.org/jira/browse/DRILL-5657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16159016#comment-16159016 ] ASF GitHub Bot commented on DRILL-5657:
---
Github user bitblender commented on a diff in the pull request:
https://github.com/apache/drill/pull/914#discussion_r137852118
--- Diff: exec/java-exec/src/main/java/org/apache/drill/exec/physical/rowSet/impl/package-info.java ---
@@ -0,0 +1,295 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+/**
+ * Handles the details of the result set loader implementation.
+ *
+ * The primary purpose of this loader, and the most complex to understand and
+ * maintain, is overflow handling.
+ *
+ * Detailed Use Cases
+ *
+ * Let's examine it by considering a number of use cases.
+ *
+ * Row   a  b  c  d  e  f  g  h
+ * n-2   X  X              -  -
+ * n-1                     -  -
+ * n     X  !  O  O  O
+ *
+ * Here:
+ *
+ * n-2, n-1, and n are rows. n is the overflow row.
+ * X indicates a value was written before overflow.
+ * Blank indicates no value was written in that row.
+ * ! indicates the value that triggered overflow.
+ * - indicates a column that did not exist prior to overflow.
--- End diff --
What does an 'O' value mean in the diagram above?
> Implement size-aware result set loader
> --------------------------------------
>
> Key: DRILL-5657
> URL: https://issues.apache.org/jira/browse/DRILL-5657
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: Future
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Fix For: Future
>
> A recent extension to Drill's set of test tools created a "row set" abstraction to allow us to create, and verify, record batches with very few lines of code. Part of this work involved creating a set of "column accessors" in the vector subsystem. Column readers provide a uniform API to obtain data from columns (vectors), while column writers provide a uniform writing interface.
> DRILL-5211 discusses a set of changes to limit value vectors to 16 MB in size (to avoid memory fragmentation due to Drill's two memory allocators). The column accessors have proven to be so useful that they will be the basis for the new, size-aware writers used by Drill's record readers.
> A step in that direction is to retrofit the column writers to use the size-aware {{setScalar()}} and {{setArray()}} methods introduced in DRILL-5517.
> Since the test framework row set classes are (at present) the only consumer of the accessors, those classes must also be updated with the changes.
> This then allows us to add a new "row mutator" class that handles size-aware vector writing, including the case in which a vector fills in the middle of a row.
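The overflow case discussed in the diff above can be pictured with a toy writer. This is a hypothetical sketch, not Drill's implementation: when a batch hits its hard size limit in the middle of a row, the partially written row is copied into a fresh "look-ahead" batch, so the batch handed downstream contains only complete rows.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy sketch of mid-row overflow handling (NOT Drill's actual code).
 * A batch holds at most LIMIT values; overflow moves the in-progress
 * row to a new batch so full batches contain only whole rows.
 * Rows longer than LIMIT are not handled in this sketch.
 */
public class OverflowSketch {
    static final int LIMIT = 4;  // values per batch; tiny for illustration

    final List<List<Integer>> fullBatches = new ArrayList<>();
    List<Integer> batch = new ArrayList<>();
    int rowStart = 0;            // index where the in-progress row begins

    void startRow() {
        rowStart = batch.size();
    }

    void writeValue(int v) {
        if (batch.size() >= LIMIT) {
            // Overflow mid-row: copy the partial row to a new batch,
            // remove it from the old batch, and retire the old batch.
            List<Integer> next = new ArrayList<>(batch.subList(rowStart, batch.size()));
            batch.subList(rowStart, batch.size()).clear();
            fullBatches.add(batch);
            batch = next;
            rowStart = 0;
        }
        batch.add(v);
    }

    public static void main(String[] args) {
        OverflowSketch w = new OverflowSketch();
        w.startRow(); w.writeValue(1); w.writeValue(2); w.writeValue(3); // row fits
        w.startRow(); w.writeValue(4); w.writeValue(5); w.writeValue(6); // overflows mid-row
        System.out.println(w.fullBatches + " | " + w.batch); // prints [[1, 2, 3]] | [4, 5, 6]
    }
}
```

Mapping back to the diagram: the value that triggers the limit is the "!" cell, and the values of row n that were already written get relocated to the new batch, which is one plausible reading of the "O" cells the reviewer asks about.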
[jira] [Created] (DRILL-5777) Oracle JDBC Error while access synonym
Sudhir Kumar created DRILL-5777:
---
Summary: Oracle JDBC Error while access synonym
Key: DRILL-5777
URL: https://issues.apache.org/jira/browse/DRILL-5777
Project: Apache Drill
Issue Type: Bug
Components: Client - Java
Affects Versions: 1.10.0
Reporter: Sudhir Kumar

Error while accessing an individual column in an Oracle table accessed via a synonym.
Query : select from ..
Error: 1
2017-09-08 10:13:46,451 [264d3035-1605-9f5b-084f-09a1b525ef75:foreman] INFO o.a.d.exec.planner.sql.SqlConverter - User Error Occurred: From line 1, column 8 to line 1, column 17: Column not found in any table (From line 1, column 8 to line 1, column 17: Column not found in any table)
org.apache.drill.common.exceptions.UserException: VALIDATION ERROR: From line 1, column 8 to line 1, column 17: Column not found in any table
SQL Query null
[Error Id: 2b7c7c2d-664e-4c90-ba20-67509de90f09 ]
at org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:544) ~[drill-common-1.10.0.jar:1.10.0]
at org.apache.drill.exec.planner.sql.SqlConverter.validate(SqlConverter.java:178) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateNode(DefaultSqlHandler.java:622) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.validateAndConvert(DefaultSqlHandler.java:192) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.planner.sql.handlers.DefaultSqlHandler.getPlan(DefaultSqlHandler.java:164) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getQueryPlan(DrillSqlWorker.java:131) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.planner.sql.DrillSqlWorker.getPlan(DrillSqlWorker.java:79) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.work.foreman.Foreman.runSQL(Foreman.java:1050) [drill-java-exec-1.10.0.jar:1.10.0]
at org.apache.drill.exec.work.foreman.Foreman.run(Foreman.java:281) [drill-java-exec-1.10.0.jar:1.10.0]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [na:1.8.0_92]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [na:1.8.0_92]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_92]
Caused by: org.apache.calcite.runtime.CalciteContextException: From line 1, column 8 to line 1, column 17: Column 'TABLE_NAME' not found in any table
at sun.reflect.GeneratedConstructorAccessor67.newInstance(Unknown Source) ~[na:na]
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[na:1.8.0_92]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[na:1.8.0_92]
at org.apache.calcite.runtime.Resources$ExInstWithCause.ex(Resources.java:405) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:765) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlUtil.newContextException(SqlUtil.java:753) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.SqlValidatorImpl.newValidationError(SqlValidatorImpl.java:3974) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.EmptyScope.findQualifyingTableName(EmptyScope.java:108) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.DelegatingScope.findQualifyingTableName(DelegatingScope.java:112) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.ListScope.findQualifyingTableName(ListScope.java:150) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.DelegatingScope.fullyQualify(DelegatingScope.java:154) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:4460) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.SqlValidatorImpl$Expander.visit(SqlValidatorImpl.java:4440) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.SqlIdentifier.accept(SqlIdentifier.java:274) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.SqlValidatorImpl.expand(SqlValidatorImpl.java:4148) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at org.apache.calcite.sql.validate.SqlValidatorImpl.expandSelectItem(SqlValidatorImpl.java:420) ~[calcite-core-1.4.0-drill-r19.jar:1.4.0-drill-r19]
at
[jira] [Closed] (DRILL-5776) Authentication is not performed when updating SYSTEM option from REST api
[ https://issues.apache.org/jira/browse/DRILL-5776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Farkas closed DRILL-5776.
---
Resolution: Invalid
I filed this bug incorrectly; it is not an issue.
> Authentication is not performed when updating SYSTEM option from REST api
> -------------------------------------------------------------------------
>
> Key: DRILL-5776
> URL: https://issues.apache.org/jira/browse/DRILL-5776
> Project: Apache Drill
> Issue Type: Bug
> Reporter: Timothy Farkas
>
> Authentication is not performed when authentication is enabled and a SYSTEM option is updated from the REST api.