[jira] [Commented] (DRILL-8480) Cleanup before finished. 0 out of 1 streams have finished
[ https://issues.apache.org/jira/browse/DRILL-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831903#comment-17831903 ] ASF GitHub Bot commented on DRILL-8480: --- rymarm opened a new pull request, #2897: URL: https://github.com/apache/drill/pull/2897

# [DRILL-8480](https://issues.apache.org/jira/browse/DRILL-8480): Make the Nested Loop Join operator properly process empty batches and batches with a new schema

## Description

The Nested Loop Join operator (`NestedLoopJoinBatch`, `NestedLoopJoin`) improperly handles the batch iteration outcome `OK` with 0 records. Drill's batch-processing design involves 5 outcomes:
* `NONE` (batch can have only 0 records)
* `OK` (batch can have 0+ records)
* `OK_NEW_SCHEMA` (batch can have 0+ records)
* `NOT_YET` (undefined)
* `EMIT` (batch can have 0+ records)

In some circumstances the Nested Loop Join operator could receive an `OK` outcome with 0 records; instead of requesting the next batch, the operator stopped data processing and returned the `NONE` outcome to its caller without freeing the resources of the underlying batches (operators).

## Documentation
-

## Testing

Manual testing with a file from the Jira ticket [DRILL-8480](https://issues.apache.org/jira/browse/DRILL-8480)

> Cleanup before finished. 0 out of 1 streams have finished
> -
>
> Key: DRILL-8480
> URL: https://issues.apache.org/jira/browse/DRILL-8480
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.21.1
> Reporter: Maksym Rymar
> Assignee: Maksym Rymar
> Priority: Major
> Attachments: 1a349ff1-d1f9-62bf-ed8c-26346c548005.sys.drill, tableWithNumber2.parquet
>
> Drill fails to execute a query with the following exception:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: IllegalStateException: Cleanup before finished. 0 out of 1 streams have finished
> Fragment: 1:0
> Please, refer to logs for more information. 
> [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on > compute7.vmcluster.com:31010] > at > org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362) > at > org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38) > at > java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) > at > java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) > at java.base/java.lang.Thread.run(Thread.java:829) > Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of > 1 streams have finished > at > org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111) > at > org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91) > at > org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71) > at > org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121) > at > org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91) > at > org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144) > at > org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581) > at > org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417) > at > org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240) > ... 5 common frames omitted > Suppressed: java.lang.IllegalStateException: Cleanup before finished. > 0 out of 1 streams have finished > ... 
15 common frames omitted > Suppressed: java.lang.IllegalStateException: Memory was leaked by > query. Memory leaked: (32768) > Allocator(op:1:0:8:UnorderedReceiver) 100/32768/32768/100 > (res/actual/peak/limit) > at > org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519) > at > org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159) > at > org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77) > at > org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.ja
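The fix described in the PR above amounts to treating an `OK` outcome that carries 0 records as "keep asking for the next batch" rather than as end-of-data. A minimal sketch of that control flow, using simplified stand-in types rather than Drill's actual `RecordBatch`/`IterOutcome` API (all names here are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Simplified model of Drill's batch iteration outcomes. These types are
// illustrative stand-ins, not Drill's real interfaces.
public class OutcomeLoop {
  enum Outcome { NONE, OK, OK_NEW_SCHEMA, NOT_YET, EMIT }

  static final class Batch {
    final Outcome outcome;
    final int recordCount;
    Batch(Outcome outcome, int recordCount) {
      this.outcome = outcome;
      this.recordCount = recordCount;
    }
  }

  /** Drains the upstream, skipping empty OK batches instead of stopping early. */
  static int totalRecords(Deque<Batch> upstream) {
    int total = 0;
    while (!upstream.isEmpty()) {
      Batch b = upstream.poll();
      switch (b.outcome) {
        case NONE:
          return total;            // genuine end of data
        case OK:
        case OK_NEW_SCHEMA:
        case EMIT:
          total += b.recordCount;  // an empty OK batch adds 0; keep polling
          break;
        case NOT_YET:
          break;                   // no data yet; ask again
      }
    }
    return total;
  }

  public static void main(String[] args) {
    Deque<Batch> up = new ArrayDeque<>();
    up.add(new Batch(Outcome.OK_NEW_SCHEMA, 0));
    up.add(new Batch(Outcome.OK, 0));  // the case the buggy operator treated as NONE
    up.add(new Batch(Outcome.OK, 3));
    up.add(new Batch(Outcome.NONE, 0));
    System.out.println(totalRecords(up)); // prints 3
  }
}
```

The buggy behavior corresponds to returning from the loop as soon as an `OK` batch with 0 records is seen, which both drops the remaining data and leaves the upstream buffers unclosed.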
[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container
[ https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831155#comment-17831155 ] ASF GitHub Bot commented on DRILL-8484: --- shfshihuafeng commented on PR #2889: URL: https://github.com/apache/drill/pull/2889#issuecomment-2021914479

> @shfshihuafeng Can you please resolve merge conflicts.

It is done.

> HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
> Issue Type: Bug
> Components: Server
> Affects Versions: 1.21.1
> Reporter: shihuafeng
> Priority: Major
> Fix For: 1.22.0
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H 1s:
> # 30 concurrent runs of TPC-H sql8
> # set direct memory to 5g
> # when an OutOfMemoryException occurred, stop all SQL
> # find the memory leak
> *Leak info*
> {code:java}
> Allocator(frag:5:0) 500/100/31067136/40041943040 (res/actual/peak/limit)
>   child allocators: 1
>     Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 (res/actual/peak/limit)
>       child allocators: 0
>       ledgers: 2
>         ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: [1703465, life: 16936270178813617..0] holds 4 buffers.
>           DrillBuf[2041995], udle: [1703441 0..957]{code}

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container
[ https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830942#comment-17830942 ] ASF GitHub Bot commented on DRILL-8484: --- cgivre commented on PR #2889: URL: https://github.com/apache/drill/pull/2889#issuecomment-2020523775

@shfshihuafeng Can you please resolve merge conflicts.
[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an oom exception when read data from InputStream
[ https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830330#comment-17830330 ] ASF GitHub Bot commented on DRILL-8485: --- shfshihuafeng commented on PR #2891: URL: https://github.com/apache/drill/pull/2891#issuecomment-2017129746

> LGTM +1 Thanks @shfshihuafeng for all these memory leak fixes.

I am honored to get your approval.

> HashJoinPOP memory leak is caused by an oom exception when read data from InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
> Issue Type: Bug
> Components: Server
> Affects Versions: 1.21.1
> Reporter: shihuafeng
> Priority: Major
> Fix For: 1.21.1
>
> When traversing fieldList while reading data from an InputStream, if the intermediate process throws an exception, we cannot release the previously constructed vectors. This results in a memory leak.
> It is similar to DRILL-8484.
[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an oom exception when read data from InputStream
[ https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830305#comment-17830305 ] ASF GitHub Bot commented on DRILL-8485: --- cgivre merged PR #2891: URL: https://github.com/apache/drill/pull/2891
[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an oom exception when read data from InputStream
[ https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830261#comment-17830261 ] ASF GitHub Bot commented on DRILL-8485: --- shfshihuafeng opened a new pull request, #2891: URL: https://github.com/apache/drill/pull/2891

…n read data from InputStream

# [DRILL-8485](https://issues.apache.org/jira/browse/DRILL-8485): HashJoinPOP memory leak is caused by an OOM exception when reading data from an InputStream

## Description

It is similar to [DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484).

**Exception info**
```
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 16384 (rounded from 15364) due to memory limit (41943040). Current allocation: 4337664
	at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
	at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
	at org.apache.drill.exec.memory.BaseAllocator.read(BaseAllocator.java:856)
```

**Leak info**
```
Allocator(frag:5:1) 500/100/27824128/40041943040 (res/actual/peak/limit)
  child allocators: 1
    Allocator(op:5:1:1:HashJoinPOP) 100/16384/22822912/41943040 (res/actual/peak/limit)
      child allocators: 0
      ledgers: 2
        ledger[442780] allocator: op:5:1:1:HashJoinPOP), isOwning: true, size: 8192, references: 2, life: 4486836603491..0, allocatorManager: [390894, life: 4486836601180..0] holds 4 buffers.
          DrillBuf[458469], udle: [390895 1024..8192]
event log for: DrillBuf[458469]
```

## Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)

## Testing

The testing method for DRILL-8485 is similar to that for [DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484).
We can throw an exception in the method readVectors.
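The DRILL-8485 fix pattern described above, stated generically: when constructing the Nth resource in a loop fails, the N-1 resources already built must be released before the exception propagates. A minimal self-contained sketch, with `Vector` as an illustrative stand-in for Drill's `ValueVector` (none of these names are Drill's actual API):

```java
import java.util.ArrayList;
import java.util.List;

// Generic form of the fix: release everything already constructed when a
// failure occurs mid-loop, so nothing leaks past the exception.
public class CleanupOnFailure {
  static int openCount = 0; // tracks outstanding (unreleased) resources

  static final class Vector implements AutoCloseable {
    Vector() { openCount++; }
    @Override public void close() { openCount--; }
  }

  static List<Vector> loadAll(int n, int failAt) {
    List<Vector> built = new ArrayList<>();
    try {
      for (int i = 0; i < n; i++) {
        if (i == failAt) {
          // stands in for the OOM thrown while reading vector data
          throw new IllegalStateException("simulated OOM while loading vector " + i);
        }
        built.add(new Vector());
      }
      return built;
    } catch (RuntimeException e) {
      // the step the bug was missing: clear what we already allocated
      for (Vector v : built) {
        v.close();
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      loadAll(5, 3);
    } catch (IllegalStateException expected) {
      // expected: construction of vector 3 failed
    }
    System.out.println(openCount); // prints 0: nothing leaked
  }
}
```

Without the catch block, the three vectors built before the failure would still hold allocations when the exception unwinds, which is exactly the leak signature shown in the ticket's allocator dump.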
[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container
[ https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828185#comment-17828185 ] ASF GitHub Bot commented on DRILL-8484: --- shfshihuafeng commented on code in PR #2889: URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261

## exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:

```diff
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);
```

Review Comment: When we prepare to allocate memory with `allocator.buffer(dataLength)` for the HashJoinPOP allocator, and the requested memory exceeds maxAllocation (a parameter calculated by calling computeOperatorMemory), an exception is thrown, as in my test below. The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or reduce concurrency based on actual conditions.
**Throw-exception code**
```java
public DrillBuf buffer(final int initialRequestSize, BufferManager manager) {
  assertOpen();
  AllocationOutcome outcome = allocateBytes(actualRequestSize);
  if (!outcome.isOk()) {
    throw new OutOfMemoryException(createErrorMsg(this, actualRequestSize, initialRequestSize));
  }
```

**My test scenario**
```
Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to allocate buffer of size 16384 (rounded from 14359) due to memory limit (41943040). Current allocation: 22583616
	at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
	at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
	at org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
```
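The limit check quoted in the review comment above can be illustrated with a toy allocator: an allocation fails once outstanding bytes plus the request would exceed the operator's limit. This is purely illustrative and assumes nothing about Drill's `BaseAllocator` beyond the quoted snippet (the real allocator also handles reservations, child allocators, and ledgers); the numbers below are taken from the stack trace in the comment.

```java
// Toy allocator mirroring the quoted limit check. All names and the
// exception type are illustrative, not Drill's API.
public class ToyAllocator {
  private final long limit;
  private long allocated = 0;

  ToyAllocator(long limit) { this.limit = limit; }

  /** Returns the granted size, or throws when the limit would be exceeded. */
  long buffer(long requestSize) {
    if (allocated + requestSize > limit) {
      throw new IllegalStateException(
          "Unable to allocate buffer of size " + requestSize
          + " due to memory limit (" + limit + "). Current allocation: " + allocated);
    }
    allocated += requestSize;
    return requestSize;
  }

  void release(long size) { allocated -= size; }

  long allocated() { return allocated; }

  public static void main(String[] args) {
    ToyAllocator a = new ToyAllocator(41943040L); // limit from the quoted log
    a.buffer(22583616L);                          // prior allocations, as in the log
    try {
      a.buffer(20000000L);                        // 22583616 + 20000000 > limit
      throw new AssertionError("expected the allocation to fail");
    } catch (IllegalStateException expected) {
      System.out.println(expected.getMessage());
    }
  }
}
```

This is why the advice in the comment works: raising DRILL_MAX_DIRECT_MEMORY raises the per-operator limit, and reducing concurrency lowers the outstanding allocation competing for it.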
[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container
[ https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828026#comment-17828026 ] ASF GitHub Bot commented on DRILL-8484: --- cgivre commented on code in PR #2889: URL: https://github.com/apache/drill/pull/2889#discussion_r1528802856

## exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:

```diff
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);
```

Review Comment: Do we know what would cause an error like this? If so, what would the user need to do to fix it?
[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an oom exception when read data from Stream with container
[ https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827949#comment-17827949 ] ASF GitHub Bot commented on DRILL-8484: --- shfshihuafeng opened a new pull request, #2889: URL: https://github.com/apache/drill/pull/2889

# [DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484): HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

## Description

## Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)

## Testing
You can add debugging code to reproduce this scenario as follows, or run the TPC-H test as in [drill8483](https://github.com/apache/drill/pull/2888).

**(1) debug code**
```java
public void readFromStreamWithContainer(VectorContainer myContainer, InputStream input) throws IOException {
  final VectorContainer container = new VectorContainer();
  final UserBitShared.RecordBatchDef batchDef = UserBitShared.RecordBatchDef.parseDelimitedFrom(input);
  recordCount = batchDef.getRecordCount();
  if (batchDef.hasCarriesTwoByteSelectionVector() && batchDef.getCarriesTwoByteSelectionVector()) {
    if (sv2 == null) {
      sv2 = new SelectionVector2(allocator);
    }
    sv2.allocateNew(recordCount * SelectionVector2.RECORD_SIZE);
    sv2.getBuffer().setBytes(0, input, recordCount * SelectionVector2.RECORD_SIZE);
    svMode = BatchSchema.SelectionVectorMode.TWO_BYTE;
  }
  final List<ValueVector> vectorList = Lists.newArrayList();
  final List<SerializedField> fieldList = batchDef.getFieldList();
  int i = 0;
  for (SerializedField metaData : fieldList) {
    i++;
    final int dataLength = metaData.getBufferLength();
    final MaterializedField field = MaterializedField.create(metaData);
    final DrillBuf buf = allocator.buffer(dataLength);
    ValueVector vector = null;
    try {
      buf.writeBytes(input, dataLength);
      vector = TypeHelper.getNewVector(field, allocator);
      if (i == 3) { // injected failure: simulate an allocation error on the third vector
        logger.warn("shf test memory except");
        throw new OutOfMemoryException("test memory except");
      }
      vector.load(metaData, buf);
    } catch (Exception e) {
      if (vectorList.size() > 0) {
        for (ValueVector valueVector : vectorList) {
          DrillBuf[] buffers = valueVector.getBuffers(false);
          logger.warn("shf leak buffers " + Arrays.asList(buffers));
          // valueVector.clear();
        }
      }
      throw e;
    } finally {
      buf.release();
    }
    vectorList.add(vector);
  }
```

**(2) run the following SQL (TPC-H query 8)**
```sql
select o_year,
       sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as mkt_share
from (
  select extract(year from o_orderdate) as o_year,
         l_extendedprice * 1.0 as volume,
         n2.n_name as nation
  from hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem,
       hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1,
       hive.tpch1s.nation n2, hive.tpch1s.region
  where p_partkey = l_partkey
    and s_suppkey = l_suppkey
    and l_orderkey = o_orderkey
    and o_custkey = c_custkey
    and c_nationkey = n1.n_nationkey
    and n1.n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and s_nationkey = n2.n_nationkey
    and o_orderdate between date '1995-01-01' and date '1996-12-31'
    and p_type = 'LARGE BRUSHED BRASS') as all_nations
group by o_year
order by o_year;
```

**(3) you will find a memory leak even though no query is running any more** (screen recording: https://github.com/apache/drill/assets/25974968/e716ab12-4eeb-4a69-9c0f-07664bcb80a4)

> HashJoinPOP memory leak is caused by an oom exception when read data from
> Stream with container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
> Issue Type: Bug
> Components: Server
> Affects Versions: 1.21.1
> Reporter: shihuafeng
> Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, resulting in a HashJoinPOP memory leak
> *To Reproduce*
> prepare data for tpch 1s
> # 30 concurrent for tpch sql8
> # set direct memory 5g
> # when it had OutOfMemoryException, stop all sql.
> # finding memory leak > *leak info* > {code:java} > Allocator(frag:5:0) 500/100/31067136/40041943040 > (res/actual/peak/limit) > child allocators: 1 > Allocator(op:5:0:1:HashJoinPOP) 100/16384/2282
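The commented-out `valueVector.clear()` in the debug code above is the crux of this leak: vectors loaded before the failing allocation are never released. A minimal sketch of the cleanup pattern, using a hypothetical reference-counted `Buffer` class as a stand-in for Drill's `DrillBuf`/`ValueVector` (none of these names are Drill APIs):

```java
import java.util.ArrayList;
import java.util.List;

public class LoadWithCleanup {
  /** Stand-in for a reference-counted buffer such as DrillBuf. */
  static class Buffer {
    static int live = 0;      // number of unreleased buffers
    Buffer() { live++; }
    void release() { live--; }
  }

  /** Allocate n buffers; if allocation failAt throws, release the ones already held. */
  static List<Buffer> loadAll(int n, int failAt) {
    List<Buffer> loaded = new ArrayList<>();
    try {
      for (int i = 0; i < n; i++) {
        if (i == failAt) {
          // like the injected OutOfMemoryException in the debug code
          throw new RuntimeException("simulated OOM");
        }
        loaded.add(new Buffer());
      }
      return loaded;
    } catch (RuntimeException e) {
      // Without this loop, the buffers loaded so far would leak (the bug described above).
      for (Buffer b : loaded) {
        b.release();
      }
      throw e;
    }
  }

  public static void main(String[] args) {
    try {
      loadAll(5, 3); // fails on the fourth allocation
    } catch (RuntimeException ignored) {
    }
    System.out.println("live buffers after failure: " + Buffer.live); // prints 0
  }
}
```

The `catch` block mirrors what the fix has to do where the debug code only logs: walk the partially built vector list and release each entry before rethrowing.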
[jira] [Commented] (DRILL-8483) SpilledRecordBatch memory leak when the program threw an exception during the process of building a hash table
[ https://issues.apache.org/jira/browse/DRILL-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824832#comment-17824832 ] ASF GitHub Bot commented on DRILL-8483: --- shfshihuafeng opened a new pull request, #2888: URL: https://github.com/apache/drill/pull/2888

# [DRILL-8483](https://issues.apache.org/jira/browse/DRILL-8483): SpilledRecordBatch memory leak when the program threw an exception during the process of building a hash table (#2887)

## Description
If an exception is thrown while reading spilled data back from disk to build the in-memory hash table, the SpilledRecordBatch is leaked.

## Documentation
(Please describe user-visible changes similar to what should appear in the Drill documentation.)

## Testing
prepare data for tpch 1s
1. 30 concurrent for tpch sql8
2. set direct memory 5g
3. when it had OutOfMemoryException, stop all sql
4. check for a memory leak

test script
```shell
random_sql(){
  #for i in `seq 1 3`
  while true
  do
    num=$((RANDOM%22+1))
    if [ -f $fileName ]; then
      echo "$fileName exists"
      exit 0
    else
      $drill_home/sqlline -u "jdbc:drill:zk=ip:2181/drillbits_shf" -f tpch_sql8.sql >> sql8.log 2>&1
    fi
  done
}

main(){
  #sleep 2h
  #TPCH power test
  for i in `seq 1 30`
  do
    random_sql &
  done
}
```

> SpilledRecordBatch memory leak when the program threw an exception during the
> process of building a hash table
> --
>
> Key: DRILL-8483
> URL: https://issues.apache.org/jira/browse/DRILL-8483
> Project: Apache Drill
> Issue Type: Bug
> Components: Server
> Affects Versions: 1.21.1
> Reporter: shihuafeng
> Priority: Major
> Fix For: 1.21.2
>
>
> During the process of reading data from disk to building hash tables in
> memory, if an exception is thrown, it will result in a memory
> SpilledRecordBatch leak
> exception log as following
> {code:java}
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to
> allocate buffer of size 8192 due to memory limit (41943040). Current
> allocation: 3684352
> at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
> at org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
> at org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:411)
> at org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:270)
> at org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewVectorContainer(HashPartition.java:215)
> at org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewCurrentBatchAndHV(HashPartition.java:238)
> at org.apache.drill.exec.physical.impl.common.HashPartition.<init>(HashPartition.java:165){code}
> -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823852#comment-17823852 ] ASF GitHub Bot commented on DRILL-8479: --- cgivre merged PR #2878: URL: https://github.com/apache/drill/pull/2878

> mergejoin memory leak when exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
> Issue Type: Bug
> Components: Execution - Relational Operators
> Affects Versions: 1.21.1
> Reporter: shihuafeng
> Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> mergejoin leaks when RecordIterator fails to allocate memory with an OutOfMemoryException
> *Steps to reproduce the behavior*:
> # prepare data for tpch 1s
> # set direct memory 5g
> # set planner.enable_hashjoin = false to ensure the mergejoin operator is used
> # set drill.memory.debug.allocator = true (check for memory leaks)
> # 20 concurrent for tpch sql8
> # when it had OutOfMemoryException or null EXCEPTION, stop all sql
> # check for a memory leak
> *Expected behavior*
> when all sql statements stop, direct memory should be 0 and no leak log like the following should be found.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) /
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem,
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1,
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as
> all_nations group by o_year order by o_year
> {code}
>
> -- This message was sent by Atlassian Jira (v8.20.10#820010)
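The merged patch discussed in the comments on this ticket wraps `leftIterator.close()` so that a failure cannot skip closing the right iterator. A minimal sketch of that ordering guarantee with stand-in `AutoCloseable` resources (the names here are illustrative, not Drill's classes):

```java
import java.util.ArrayList;
import java.util.List;

public class SafeClose {
  static final List<String> closedOrder = new ArrayList<>();

  /** Build a fake iterator-like resource that records its close and optionally fails. */
  static AutoCloseable iterator(String name, boolean failOnClose) {
    return () -> {
      closedOrder.add(name);
      if (failOnClose) {
        throw new IllegalStateException(name + " close failed");
      }
    };
  }

  /** Close left, then right; the finally block guarantees right is attempted even if left throws. */
  static void closeBoth(AutoCloseable left, AutoCloseable right) throws Exception {
    try {
      left.close();
    } finally {
      right.close();
    }
  }

  public static void main(String[] args) {
    try {
      closeBoth(iterator("left", true), iterator("right", false));
    } catch (Exception e) {
      System.out.println("propagated: " + e.getMessage());
    }
    System.out.println(closedOrder); // [left, right]: right was closed despite the failure
  }
}
```

Without the `try`/`finally` (or equivalent `catch`), an exception from the left iterator's drain loop leaves the right iterator's buffers allocated, which is exactly the leak reported here.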
[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823846#comment-17823846 ] ASF GitHub Bot commented on DRILL-8479: --- shfshihuafeng commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1513773463

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
```diff
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
```

Review Comment: stack
```
Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
```
[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823843#comment-17823843 ] ASF GitHub Bot commented on DRILL-8479: --- shfshihuafeng commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1513768876

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
```diff
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
+      rightIterator.close();
+      throw UserException.executionError(e)
```

Review Comment: it throws the exception from the method clearInflightBatches(), but the memory has already been cleared by clear(), so it does not cause a memory leak; see the following code:
```java
public void close() {
  clear();
  clearInflightBatches();
}
```
[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823842#comment-17823842 ] ASF GitHub Bot commented on DRILL-8479: --- shfshihuafeng commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1513766703

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
```diff
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
```

Review Comment: add exception info?
```java
try {
  leftIterator.close();
} catch (QueryCancelledException qce) {
  throw UserException.executionError(qce)
      .message("Failed when depleting incoming batches, probably because the query was cancelled "
          + "or the executor had some error")
      .build(logger);
} catch (Exception e) {
  throw UserException.internalError(e)
      .message("Failed when depleting incoming batches")
      .build(logger);
} finally {
  // todo: attach the exception info, or throw the exception directly by default?
  rightIterator.close();
}
```
[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823643#comment-17823643 ] ASF GitHub Bot commented on DRILL-8479: --- cgivre commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1512908376

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
```diff
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
+      rightIterator.close();
+      throw UserException.executionError(e)
```

Review Comment: What happens if the right iterator doesn't close properly?
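One hedged answer to the reviewer's question: when both closes can fail, a common Java idiom keeps the first exception as the primary failure and attaches the second via `Throwable.addSuppressed`, so neither is silently lost. A sketch of that idiom (not the actual Drill patch):

```java
public class SuppressedClose {
  /** Close both resources; the first failure propagates, a second failure is attached as suppressed. */
  static void closeBoth(AutoCloseable left, AutoCloseable right) throws Exception {
    Exception primary = null;
    try {
      left.close();
    } catch (Exception e) {
      primary = e;
    }
    try {
      right.close();
    } catch (Exception e) {
      if (primary == null) {
        primary = e;
      } else {
        primary.addSuppressed(e); // keep the right-side failure visible in logs
      }
    }
    if (primary != null) {
      throw primary;
    }
  }

  public static void main(String[] args) {
    try {
      closeBoth(
          () -> { throw new IllegalStateException("left failed"); },
          () -> { throw new IllegalStateException("right failed"); });
    } catch (Exception e) {
      System.out.println(e.getMessage() + ", suppressed: " + e.getSuppressed().length);
    }
  }
}
```

This is the same bookkeeping `try`-with-resources performs automatically; it is written out here because the iterators are fields closed from `close()`, not block-scoped resources.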
[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823641#comment-17823641 ] ASF GitHub Bot commented on DRILL-8479: --- shfshihuafeng commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
```diff
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
```

Review Comment: @cgivre In my test case it throws a "QueryCancelledException", because some minor fragment threw an OutOfMemoryException and reported the failure to the foreman. The foreman then sends "QueryCancel" commands to the other minor fragments. The QueryCancelledException is thrown after the "incoming.next()" call reaches checkContinue(). Although the "checkContinue" phase always throws the fixed "QueryCancelledException", I am not sure what the root cause is in general (in my test case an OutOfMemoryException caused it).
```java
public void clearInflightBatches() {
  while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
    // Clear all buffers from incoming.
    for (VectorWrapper<?> wrapper : incoming) {
      wrapper.getValueVector().clear();
    }
    lastOutcome = incoming.next();
  }
}

public void checkContinue() {
  if (!shouldContinue()) {
    throw new QueryCancelledException();
  }
}
```
**stack**
```
Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
```
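The `clearInflightBatches` loop quoted in this comment drains the upstream until it stops returning full batches, clearing each batch's buffers as it goes. A minimal sketch of that drain pattern with a hypothetical stand-in for the incoming batch source (the types here are illustrative, not Drill's):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class DrainIncoming {
  enum IterOutcome { OK, NONE }

  /** Stand-in upstream: a queue of batch record counts; next() pops one, NONE when empty. */
  static class Incoming {
    final Deque<Integer> batches = new ArrayDeque<>();
    int current;  // records held by the batch most recently returned
    int cleared;  // total records cleared by the drain loop

    IterOutcome next() {
      if (batches.isEmpty()) {
        current = 0;
        return IterOutcome.NONE;
      }
      current = batches.pop();
      return IterOutcome.OK;
    }

    void clear() { // like clearing every ValueVector in the batch
      cleared += current;
      current = 0;
    }
  }

  /** Drain: clear the in-flight batch, then keep fetching and clearing until NONE. */
  static void clearInflightBatches(Incoming incoming, IterOutcome lastOutcome) {
    while (lastOutcome == IterOutcome.OK) {
      incoming.clear();
      lastOutcome = incoming.next();
    }
  }

  public static void main(String[] args) {
    Incoming incoming = new Incoming();
    incoming.batches.push(10);
    incoming.batches.push(20);
    IterOutcome last = incoming.next(); // one batch already in flight
    clearInflightBatches(incoming, last);
    System.out.println("cleared " + incoming.cleared + " records"); // cleared 30 records
  }
}
```

The subtlety the stack trace exposes is that the drain itself calls `next()`, which can throw `QueryCancelledException` mid-drain; that is why the caller's `close()` has to tolerate exceptions from this loop.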
[jira] [Commented] (DRILL-8479) mergejion memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823597#comment-17823597 ] ASF GitHub Bot commented on DRILL-8479: --- shfshihuafeng commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128 ## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java: ## @@ -297,7 +297,14 @@ public void close() { batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords()); super.close(); -leftIterator.close(); +try { + leftIterator.close(); +} catch (Exception e) { Review Comment: @cgivre In my test case ,it throw "QueryCancelledException",because some minorfragment throw .OutOfMemoryException ,so it inform foreman failed. foreman send "QueryCancel" commands to other minorfragments. it throws QueryCancelledException after the method "incoming.next()" called method checkContinue() method Although the "checkContinue" phase throws a fixed "QueryCancelledException" message, I am not sure what is causing it (In my test case ,OutOfMemoryException cause exception) ``` public void clearInflightBatches() { while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) { // Clear all buffers from incoming. 
for (VectorWrapper wrapper : incoming) { wrapper.getValueVector().clear(); } lastOutcome = incoming.next(); } } public void checkContinue() { if (!shouldContinue()) { throw new QueryCancelledException(); } } } ``` **stack** ``` Caused by: org.apache.drill.exec.ops.QueryCancelledException: null at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533) at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105) at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59) at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165) at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359) at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365) at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301) ``` > mergejion memory leak when exception > - > > Key: DRILL-8479 > URL: https://issues.apache.org/jira/browse/DRILL-8479 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.21.1 >Reporter: shihuafeng >Priority: Critical > Attachments: 0001-mergejoin-leak.patch > > > *Describe the bug* > megerjoin leak when RecordIterator allocate memory exception with > OutOfMemoryException{*}{*} > {*}Steps to reproduce the behavior{*}: > # prepare data for tpch 1s > # set direct memory 5g > # set planner.enable_hashjoin =false to ensure use mergejoin operator。 > # set drill.memory.debug.allocator =true (Check for memory leaks ) > # 20 concurrent for tpch sql8 > # when it had OutOfMemoryException or null EXCEPTION , stopped all sql. > # finding memory leak > *Expected behavior* > when all sql sop , we should find direct memory is 0 AND could not > find leak log like following. 
> {code:java} > Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 > (res/actual/peak/limit){code} > *Error detail, log output or screenshots* > {code:java} > Unable to allocate buffer of size XX (rounded from XX) due to memory limit > (). Current allocation: xx{code} > [^0001-mergejoin-leak.patch] > sql > {code:java} > // code placeholder > select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / > sum(volume) as mkt_share from ( select extract(year from o_orderdate) as > o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from > hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, > hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, > hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and > s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey > and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name > = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date > '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as > all_nations group by o_year order by o_year > {code} > >
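The control-flow hazard discussed in this thread, where the first iterator's close() throws so the second iterator's close() never runs, can be reproduced with a minimal self-contained sketch. The `Resource` class and the helper methods below are hypothetical stand-ins for Drill's RecordIterator, not Drill code:

```java
// Minimal sketch of the leak: closing two resources sequentially vs. with
// try/finally. "Resource" is a hypothetical stand-in for Drill's RecordIterator.
public class CloseOrderDemo {

  static class Resource implements AutoCloseable {
    private final boolean failOnClose;
    private boolean closed = false;

    Resource(boolean failOnClose) { this.failOnClose = failOnClose; }

    @Override
    public void close() {
      if (failOnClose) {
        // Simulates QueryCancelledException escaping from leftIterator.close().
        throw new RuntimeException("simulated cancellation");
      }
      closed = true;
    }

    boolean isClosed() { return closed; }
  }

  // Naive ordering: if left.close() throws, right.close() never runs (a leak).
  static boolean naiveCloseLeaksRight() {
    Resource left = new Resource(true);
    Resource right = new Resource(false);
    try {
      left.close();
      right.close();
    } catch (RuntimeException ignored) { }
    return !right.isClosed();
  }

  // try/finally guarantees the second close is attempted even if the first throws.
  static boolean safeCloseClosesRight() {
    Resource left = new Resource(true);
    Resource right = new Resource(false);
    try {
      try {
        left.close();
      } finally {
        right.close();
      }
    } catch (RuntimeException ignored) { }
    return right.isClosed();
  }

  public static void main(String[] args) {
    System.out.println(naiveCloseLeaksRight());  // true: right leaked
    System.out.println(safeCloseClosesRight());  // true: right closed
  }
}
```

The exception still propagates in the safe variant; only the cleanup of the second resource is guaranteed.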
[jira] [Commented] (DRILL-8479) mergejion memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823405#comment-17823405 ] ASF GitHub Bot commented on DRILL-8479: --- cgivre commented on code in PR #2878: URL: https://github.com/apache/drill/pull/2878#discussion_r1512093859 ## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java: ## @@ -297,7 +297,14 @@ public void close() { batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords()); super.close(); -leftIterator.close(); +try { + leftIterator.close(); +} catch (Exception e) { Review Comment: Do we know what kind(s) of exceptions to expect here? Also, can we throw a better error message? Specifically, can we tell the user more information about the cause of the crash and how to fix it? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node
[ https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822878#comment-17822878 ] ASF GitHub Bot commented on DRILL-8482: --- cgivre merged PR #2885: URL: https://github.com/apache/drill/pull/2885 > Assign region throw exception when some region is deployed on affinity node > and some on non-affinity node > - > > Key: DRILL-8482 > URL: https://issues.apache.org/jira/browse/DRILL-8482 > Project: Apache Drill > Issue Type: Bug > Components: Storage - HBase >Affects Versions: 1.21.1 >Reporter: shihuafeng >Priority: Major > Fix For: 1.22.0 > > Attachments: > 0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch > > > *[^0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch]Describe > the bug* > Assign region throw exception when some region is deployed on affinity > node and some on non-affinity node。 > *To Reproduce* > Steps to reproduce the behavior: > #
> {code:java}
> NavigableMap regionsToScan = Maps.newTreeMap();
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), SERVER_D);
> final List endpoints = Lists.newArrayList();
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
> HBaseGroupScan scan = new HBaseGroupScan();
> scan.setRegionsToScan(regionsToScan);
> scan.setHBaseScanSpec(new HBaseScanSpec(TABLE_NAME_STR, splits[0], splits[0], null));
> scan.applyAssignments(endpoints);
> {code}
> *Expected behavior*
> A has 3 regions
> B has 2 regions
> C has 3 regions
> *Error detail, log output or screenshots*
> {code:java}
> Caused by: java.lang.NullPointerException: null
> at org.apache.drill.exec.store.hbase.HBaseGroupScan.applyAssignments(HBaseGroupScan.java:283)
> {code}
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node
[ https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822730#comment-17822730 ] ASF GitHub Bot commented on DRILL-8482: --- shfshihuafeng commented on PR #2885: URL: https://github.com/apache/drill/pull/2885#issuecomment-1974124079 @cgivre Yes. When the HBase regions are distributed as follows and you run `select * from table`, no result is returned.
```
NavigableMap regionsToScan = Maps.newTreeMap();
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), SERVER_A);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), SERVER_A);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), SERVER_B);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), SERVER_B);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), SERVER_D);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), SERVER_D);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), SERVER_D);
regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), SERVER_D);
final List endpoints = Lists.newArrayList();
endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
```
-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node
[ https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822598#comment-17822598 ] ASF GitHub Bot commented on DRILL-8482: --- cgivre commented on PR #2885: URL: https://github.com/apache/drill/pull/2885#issuecomment-1973318466 @shfshihuafeng Is this a bug? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8482) Assign region throw exception when some region is deployed on affinity node and some on non-affinity node
[ https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822459#comment-17822459 ] ASF GitHub Bot commented on DRILL-8482: --- shfshihuafeng opened a new pull request, #2885: URL: https://github.com/apache/drill/pull/2885 … on affinity node and some on non-affinity node # [DRILL-8482](https://issues.apache.org/jira/browse/DRILL-8482): Assign region throw exception when some region is deployed on affinity node and some on non-affinity node ## Description Assigning regions throws an exception when some regions are deployed on affinity nodes and some on non-affinity nodes. ## Documentation - ## Testing Refer to the unit test TestHBaseRegionScanAssignments#testHBaseGroupScanAssignmentSomeAfinedAndSomeWithOrphans. -- This message was sent by Atlassian Jira (v8.20.10#820010)
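The bug boils down to assigning regions when only some region hosts have a matching drillbit endpoint. A generic least-loaded fallback for such "orphan" regions can be sketched as follows. Plain strings stand in for Drill's HRegionInfo and DrillbitEndpoint types, and the balancing rule is an assumption for illustration, not the actual HBaseGroupScan.applyAssignments logic:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: assign regions to endpoints, preferring an endpoint on the region's
// own host (affinity) and placing orphan regions (whose host runs no endpoint)
// on the least-loaded endpoint instead of dereferencing a missing entry, which
// is what produced the NullPointerException in the report.
public class RegionAssignmentDemo {

  static Map<String, List<String>> assign(Map<String, String> regionToHost,
                                          List<String> endpointHosts) {
    Map<String, List<String>> assignments = new LinkedHashMap<>();
    for (String ep : endpointHosts) {
      assignments.put(ep, new ArrayList<>());
    }
    List<String> orphans = new ArrayList<>();
    for (Map.Entry<String, String> e : regionToHost.entrySet()) {
      List<String> local = assignments.get(e.getValue());
      if (local != null) {
        local.add(e.getKey());        // affinity: endpoint on the same host
      } else {
        orphans.add(e.getKey());      // no endpoint on this host
      }
    }
    // Place each orphan on the currently least-loaded endpoint.
    for (String region : orphans) {
      String best = endpointHosts.get(0);
      for (String ep : endpointHosts) {
        if (assignments.get(ep).size() < assignments.get(best).size()) {
          best = ep;
        }
      }
      assignments.get(best).add(region);
    }
    return assignments;
  }

  public static void main(String[] args) {
    Map<String, String> regions = new LinkedHashMap<>();
    regions.put("r1", "hostA"); regions.put("r2", "hostA");
    regions.put("r3", "hostB"); regions.put("r4", "hostB");
    regions.put("r5", "hostD"); regions.put("r6", "hostD");
    regions.put("r7", "hostD"); regions.put("r8", "hostD");
    // Endpoints on A, B, C only: the four hostD regions are orphans and are
    // spread across the three endpoints rather than causing a failure.
    System.out.println(assign(regions, Arrays.asList("hostA", "hostB", "hostC")));
  }
}
```

The real implementation also has to honor work-unit counts per minor fragment; the point here is only that an orphan region must be routed somewhere rather than looked up in a map that has no entry for its host.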
[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE
[ https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820931#comment-17820931 ] ASF GitHub Bot commented on DRILL-8475: --- cgivre merged PR #2879: URL: https://github.com/apache/drill/pull/2879 > Update the binary distributions LICENSE > --- > > Key: DRILL-8475 > URL: https://issues.apache.org/jira/browse/DRILL-8475 > Project: Apache Drill > Issue Type: Bug >Affects Versions: 1.21.1 >Reporter: Calvin Kirs >Assignee: James Turton >Priority: Blocker > Fix For: 1.21.2 > > Attachments: dependencies.txt, drill-dep-list.txt > > > I checked the latest released version, and it does not follow the > corresponding rules[1]. This is very important and I hope it will be taken > seriously by the PMC team. I'd be happy to do it if needed. > [1] [https://infra.apache.org/licensing-howto.html#binary] > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE
[ https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819870#comment-17819870 ] ASF GitHub Bot commented on DRILL-8475: --- cgivre commented on PR #2879: URL: https://github.com/apache/drill/pull/2879#issuecomment-1960638549 @jnturton Are we close to merging this? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE
[ https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814095#comment-17814095 ] ASF GitHub Bot commented on DRILL-8475: --- jnturton commented on PR #2879: URL: https://github.com/apache/drill/pull/2879#issuecomment-1925767446 TODO: determine whether too much has been pruned from the JDBC driver, specifically libraries related to Kerberos. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8475) The binary version License and NOTICE do not comply with the corresponding terms.
[ https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814055#comment-17814055 ] ASF GitHub Bot commented on DRILL-8475: --- jnturton opened a new pull request, #2879: URL: https://github.com/apache/drill/pull/2879 # [DRILL-8475](https://issues.apache.org/jira/browse/DRILL-8475): Update the binary dist LICENSE ## Description The LICENSE file included in the binary distributions of Drill becomes an artifact that is generated automatically by the org.codehaus.mojo:license-maven-plugin (and so is no longer part of the Git source tree). Dependencies that it cannot detect are kept in the LICENSE-base.txt file, which is combined with the generated license notices by a new Freemarker template. Various other dependency-related changes are included as part of this work. It is still possible that fat jars have introduced hidden dependencies, but I propose that those are analysed in a subsequent Jira issue. ## Documentation Comments and updated dev docs. ## Testing Comparison of the jars/ directory of a Drill build against the generated LICENSE file to check that every bundled jar has a license notice in LICENSE. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8479) mergejion memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810231#comment-17810231 ] ASF GitHub Bot commented on DRILL-8479: --- shfshihuafeng opened a new pull request, #2878: URL: https://github.com/apache/drill/pull/2878 … (#2876) # [DRILL-8479](https://issues.apache.org/jira/browse/DRILL-8479): mergejoin leak when depleting incoming batches throws an exception ## Description When a fragment fails, close() is called on MergeJoinBatch, but if leftIterator.close() throws an exception, rightIterator.close() is never reached, so its memory cannot be released. ## Documentation - ## Testing The test method is the same as in https://github.com/apache/drill/pull/2875; only one parameter needs to change: set planner.enable_hashjoin = false to ensure the merge join operator is used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
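A common way to harden this kind of teardown, offered here as a generic sketch and not necessarily what PR #2878 implements, is to attempt every close and rethrow the first failure with later failures attached as suppressed exceptions:

```java
import java.util.Arrays;
import java.util.List;

// Sketch: close every resource even if earlier ones throw, then rethrow the
// first failure with subsequent failures attached as suppressed exceptions.
public class CloseAllDemo {

  static void closeAll(List<AutoCloseable> resources) throws Exception {
    Exception first = null;
    for (AutoCloseable r : resources) {
      try {
        r.close();
      } catch (Exception e) {
        if (first == null) {
          first = e;
        } else {
          first.addSuppressed(e);
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }

  public static void main(String[] args) {
    int[] closed = {0};
    AutoCloseable failing = () -> { throw new IllegalStateException("left failed"); };
    AutoCloseable ok = () -> closed[0]++;
    try {
      closeAll(Arrays.asList(failing, ok));
    } catch (Exception e) {
      // The failure still surfaces, but "ok" was closed anyway.
      System.out.println(e.getMessage() + ", closed=" + closed[0]);  // prints "left failed, closed=1"
    }
  }
}
```

This is the same behavior try-with-resources gives for free when the resources are acquired in one statement; an explicit loop like the above is useful when the resources are long-lived fields, as the two iterators are here.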
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810171#comment-17810171 ] ASF GitHub Bot commented on DRILL-8478: --- shfshihuafeng commented on code in PR #2875: URL: https://github.com/apache/drill/pull/2875#discussion_r1464222619 ## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java: ## @@ -1312,7 +1312,9 @@ private void cleanup() { } // clean (and deallocate) each partition, and delete its spill file for (HashPartition partn : partitions) { - partn.close(); + if (partn != null) { +partn.close(); + } Review Comment: The (partn != null) check is necessary; see the "1. fix idea" comment above. > HashPartition memory leak when exception > - > > Key: DRILL-8478 > URL: https://issues.apache.org/jira/browse/DRILL-8478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators >Affects Versions: 1.21.1 >Reporter: shihuafeng >Priority: Major > Fix For: 1.21.2 > > Attachments: > 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch > > > *Describe the bug* > hashpartition leak when allocate memory exception with OutOfMemoryException > *To Reproduce* > Steps to reproduce the behavior: > # prepare data for tpch 1s > # 20 concurrent for tpch sql8 > # set direct memory 5g > # when it had OutOfMemoryException , stopped all sql. > # finding memory leak > *Expected behavior* > (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"} > (2) i run sql8 (sql detail as Additional context) with 20 concurrent > (3) it had OutOfMemoryException when create hashPartion > *Error detail, log output or screenshots* > Unable to allocate buffer of size 262144 (rounded from 262140) due to memory > limit (41943040). 
Current allocation: 20447232 > > sql > {code:java} > // code placeholder > select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / > sum(volume) as mkt_share from ( select extract(year from o_orderdate) as > o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from > hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, > hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, > hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and > s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey > and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name > = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date > '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as > all_nations group by o_year order by o_year > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
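The null guard matters because the partitions array is pre-sized and filled left to right, so a failure mid-initialization leaves trailing null slots. A minimal sketch of the guarded cleanup, with a hypothetical `Partition` type rather than Drill's HashPartition:

```java
// Sketch: cleaning up a partially initialized array. Construction can fail
// partway through, leaving trailing nulls; cleanup must skip them or it
// throws NullPointerException and never reaches the constructed entries.
public class PartialCleanupDemo {

  static class Partition {
    boolean closed = false;
    void close() { closed = true; }
  }

  // Close whatever was actually constructed; null slots are simply skipped.
  static int closeAll(Partition[] partitions) {
    int closed = 0;
    for (Partition p : partitions) {
      if (p != null) {   // guard: slots after the failed index are null
        p.close();
        closed++;
      }
    }
    return closed;
  }

  public static void main(String[] args) {
    Partition[] partitions = new Partition[4];
    partitions[0] = new Partition();
    partitions[1] = new Partition();
    // Initialization "failed" at index 2: slots 2 and 3 stay null.
    System.out.println(closeAll(partitions));  // prints 2
  }
}
```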
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810170#comment-17810170 ] ASF GitHub Bot commented on DRILL-8478: --- shfshihuafeng commented on code in PR #2875: URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148 ## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java: ## @@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained .build(logger); } catch (SchemaChangeException sce) { throw new IllegalStateException("Unexpected Schema Change while creating a hash table",sce); -} -this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator); -tmpBatchesList = new ArrayList<>(); -if (numPartitions > 1) { - allocateNewCurrentBatchAndHV(); +} catch (OutOfMemoryException oom) { + close(); Review Comment: ### 1. fix idea The design is that when any operator fails, the entire operator stack is closed. But partitions is an array initialized with nulls; if a HashPartition object is not created successfully, the constructor throws an exception, so the array entries after the failing index stay null.
```
for (int part = 0; part < numPartitions; part++) {
  partitions[part] = new HashPartition(context, allocator, baseHashTable,
      buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
      spilledState.getCycle(), numPartitions);
}
```
For example, with a partitions array of length 32 and numPartitions = 32: if the constructor throws at part = 10, partitions[11..31] remain null, and the partition at index 10 failed during creation but had already allocated memory. When close() is later called, that partition cannot be closed because its slot is null. ### 2. another fix idea Do not throw from the constructor and do not call close() there; instead catch the exception so that the HashPartition object is still created. Then, when close() is called later, the memory can be released:
```
// 1. add an isException parameter when constructing HashPartition
HashPartition(FragmentContext context, BufferAllocator allocator,
    ChainedHashTable baseHashTable, RecordBatch buildBatch, RecordBatch probeBatch,
    boolean semiJoin, int recordsPerBatch, SpillSet spillSet, int partNum,
    int cycleNum, int numPartitions, boolean isException)

// 2. catch the exception to ensure the HashPartition object is created
} catch (OutOfMemoryException oom) {
  // do not call close, do not rethrow
  isException = true;
}

// 3. deal with the exception in AbstractHashBinaryRecordBatch#initializeBuild
boolean isException = false;
try {
  for (int part = 0; part < numPartitions; part++) {
    if (isException) {
      break;
    }
    partitions[part] = new HashPartition(context, allocator, baseHashTable,
        buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
        spilledState.getCycle(), numPartitions, isException);
  }
} catch (Exception e) {
  isException = true;
}
if (isException) {
  throw UserException.memoryError(exceptions[0])
      .message("Failed to allocate hash partition.")
      .build(logger);
}
```
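The root problem in fix idea 1, a constructor that allocates and then throws, leaking what it already allocated, is conventionally handled by releasing inside the constructor's catch before rethrowing. A minimal sketch with a hypothetical `Buffer` counter, not Drill's real allocator:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: a constructor that allocates several buffers and may fail midway.
// On failure it releases what it already allocated before rethrowing, so the
// caller never ends up holding a half-built, leaking object.
public class SafeConstructionDemo {

  static int liveBuffers = 0;  // stand-in for the allocator's live count

  static class Buffer {
    Buffer() { liveBuffers++; }
    void release() { liveBuffers--; }
  }

  static class Partition {
    private final List<Buffer> buffers = new ArrayList<>();

    // failAt simulates an OutOfMemoryException on the nth allocation.
    Partition(int count, int failAt) {
      try {
        for (int i = 0; i < count; i++) {
          if (i == failAt) {
            throw new RuntimeException("simulated OutOfMemoryException");
          }
          buffers.add(new Buffer());
        }
      } catch (RuntimeException e) {
        close();   // release partial allocations before propagating
        throw e;
      }
    }

    void close() {
      for (Buffer b : buffers) {
        b.release();
      }
      buffers.clear();
    }
  }

  public static void main(String[] args) {
    try {
      new Partition(8, 5);   // fails after allocating 5 buffers
    } catch (RuntimeException ignored) { }
    System.out.println(liveBuffers);  // prints 0: nothing leaked
  }
}
```

This is the shape of the `catch (OutOfMemoryException oom) { close(); ... }` change in the diff above: either the object is fully constructed, or it cleans up after itself and the caller's null-guarded loop handles the rest.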
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810168#comment-17810168 ]

ASF GitHub Bot commented on DRILL-8478:
---
shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
```
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained
         .build(logger);
     } catch (SchemaChangeException sce) {
       throw new IllegalStateException("Unexpected Schema Change while creating a hash table", sce);
-    }
-    this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-    tmpBatchesList = new ArrayList<>();
-    if (numPartitions > 1) {
-      allocateNewCurrentBatchAndHV();
+    } catch (OutOfMemoryException oom) {
+      close();
```

Review Comment:
### 1. Fix idea
The design is that when any operator fails, the entire operator stack is closed. But `partitions` is an array initialized with nulls, and the HashPartition constructor throws when the object cannot be created, so every array slot after the failing index stays null:

```
for (int part = 0; part < numPartitions; part++) {
  partitions[part] = new HashPartition(context, allocator, baseHashTable, buildBatch,
      probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
      spilledState.getCycle(), numPartitions);
}
```

For example, suppose the array length is 32 (numPartitions = 32) and the constructor throws at part 10: partitions[11] through partitions[31] remain null, while the partition at index 10 failed construction after it had already allocated memory. When close() is later called, the partition at index 10 cannot be closed because its array slot is null.

### 2. Another fix idea
Do not throw the exception and do not call close() in the constructor; just catch the exception. The HashPartition object is then still created,
Thus, when close() is called later, its resources can be released:

```
// 1. Add an isException parameter when constructing the HashPartition object
HashPartition(FragmentContext context, BufferAllocator allocator, ChainedHashTable baseHashTable,
    RecordBatch buildBatch, RecordBatch probeBatch, boolean semiJoin, int recordsPerBatch,
    SpillSet spillSet, int partNum, int cycleNum, int numPartitions, boolean isException)

// 2. Catch the exception so that the HashPartition object is still created
} catch (OutOfMemoryException oom) {
  // do not call close(), do not rethrow
  isException = true;
}

// 3. Handle the failure in AbstractHashBinaryRecordBatch#initializeBuild
boolean isException = false;
try {
  for (int part = 0; part < numPartitions; part++) {
    if (isException) {
      break;
    }
    partitions[part] = new HashPartition(context, allocator, baseHashTable, buildBatch,
        probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
        spilledState.getCycle(), numPartitions, isException);
  }
} catch (Exception e) {
  isException = true;
}
if (isException) {
  throw UserException.memoryError(exceptions[0])
      .message("Failed to allocate hash partition.")
      .build(logger);
}
```
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810092#comment-17810092 ]

ASF GitHub Bot commented on DRILL-8474:
---
mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906827568

Ok, so the geo-ip UDF stuff has no special mechanisms or description of those resource files, so the generic code that "scans" must find them and drag them along automatically. That's the behavior I want.

What is "Drill's 3rd Party Jar folder"? If a magic folder just gets dragged over to all nodes, and Drill uses a class loader that arranges for jars in that folder to be searched, then there is very little to do, since a DFDL schema can be just a set of jar files containing related resources plus the classes for Daffodil's own UDFs and layers, which are Java code extensions of its own kind.

> Add Daffodil Format Plugin
> --------------------------
>
>                 Key: DRILL-8474
>                 URL: https://issues.apache.org/jira/browse/DRILL-8474
>             Project: Apache Drill
>          Issue Type: New Feature
>    Affects Versions: 1.21.1
>            Reporter: Charles Givre
>            Priority: Major
>             Fix For: 1.22.0

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
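The "scanning" being discussed relies on standard classpath resource lookup: every jar that bundles a marker file at its root (drill-module.conf in Drill's case) can be discovered by asking the class loader for all copies of that file. A minimal sketch of the discovery step (not Drill's actual scanner, which additionally parses the HOCON and scans the listed packages):

```java
import java.io.IOException;
import java.net.URL;
import java.util.Enumeration;

// Sketch: find every classpath entry that advertises a marker file.
// Drill's real scanner looks for "drill-module.conf"; any jar carrying
// that file at its root shows up in this enumeration.
public class ScanSketch {
  public static void main(String[] args) throws IOException {
    Enumeration<URL> urls =
        ScanSketch.class.getClassLoader().getResources("drill-module.conf");
    while (urls.hasMoreElements()) {
      // Each URL points into a different jar (or directory) on the classpath.
      System.out.println(urls.nextElement());
    }
    System.out.println("scan complete");
  }
}
```

Run outside a Drill classpath this prints nothing before "scan complete"; on a node with UDF jars installed, one URL per advertising jar appears.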
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810091#comment-17810091 ]

ASF GitHub Bot commented on DRILL-8478:
---
paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1463921977

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
```
@@ -1312,7 +1312,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (partn != null) {
+    partn.close();
+  }
```

Review Comment:
The above is OK as a work-around. I wonder, however, where the code added a null pointer to the partition list. That should never happen. If it does, it should be fixed at the point where the null pointer is added to the list. Fixing it here is incomplete: there are other places where we loop through the list, and those will also fail.

## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
```
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained
         .build(logger);
     } catch (SchemaChangeException sce) {
       throw new IllegalStateException("Unexpected Schema Change while creating a hash table", sce);
-    }
-    this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-    tmpBatchesList = new ArrayList<>();
-    if (numPartitions > 1) {
-      allocateNewCurrentBatchAndHV();
+    } catch (OutOfMemoryException oom) {
+      close();
```

Review Comment:
This call is _probably_ fine. However, the design is that if any operator fails, the entire operator stack is closed. So, `close()` should be called by the fragment executor. There is probably no harm in calling `close()` here, as long as the `close()` method is safe to call twice.
If the fragment executor _does not_ call close when the failure occurs during setup, then there is a bug since failing to call `close()` results in just this kind of error.
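The caveat in the review above, that calling `close()` both in the constructor and from the fragment executor is only safe if `close()` is idempotent, can be sketched with a simple guard flag (hypothetical names, not the actual HashPartition code):

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of an idempotent close(): the second call is a no-op, so both
// the operator and the fragment executor may safely invoke it.
public class IdempotentClose implements AutoCloseable {
  static final List<String> log = new ArrayList<>();
  private boolean closed = false;

  @Override
  public void close() {
    if (closed) {
      return;  // already released; a second call must not double-free
    }
    closed = true;
    log.add("resources released");  // stand-in for freeing buffers, spill files
  }

  public static void main(String[] args) {
    IdempotentClose c = new IdempotentClose();
    c.close();
    c.close();  // safe second call, e.g. from the fragment executor
    System.out.println(log);
  }
}
```

With the guard, resources are released exactly once no matter how many layers of error handling call `close()`.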
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810070#comment-17810070 ]

ASF GitHub Bot commented on DRILL-8474:
---
cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906689793

> > > @cgivre @paul-rogers is there an example of a Drill UDF that is not part of the drill repository tree?
> > > I'd like to understand the mechanisms for distributing any jar files and dependencies of the UDF that drill uses. I can't find any such in the quasi-UDFs that are in the Drill tree, because, well, since they are part of Drill, and so are their dependencies, this problem doesn't exist.
> >
> > @mbeckerle Here's an example: https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't able to connect last week.
>
> If I understand this correctly, if a jar is on the classpath and has drill-module.conf in its root dir, then drill will find it and read that HOCON file to get the package to add to drill.classpath.scanning.packages.

I believe that is correct.

> Drill then appears to scan jars for class files for those packages. Not sure what it is doing with the class files. I imagine it is repackaging them somehow so Drill can use them on the drill distributed nodes. But it isn't yet clear to me how this aspect works. Do these classes just get loaded on the distributed drill nodes? Or is the classpath augmented in some way on the drill nodes so that they see a jar that contains all these classes?
>
> I have two questions:
>
> (1) what about dependencies? The UDF may depend on libraries which depend on other libraries, etc.

So UDFs are a bit of a special case, but if they do have dependencies, you have to also include those JAR files in the UDF directory, or in Drill's 3rd party JAR folder.
I'm not that good with Maven, but I've often wondered about making a so-called fat JAR which includes the dependencies as part of the UDF JAR file.

> (2) what about non-class files, e.g., things under src/main/resources of the project that go into the jar, but aren't "class" files? How do those things also get moved? How would code running in the drill node access these? The usual method is to call getResource(URL) with a URL that gives the path within a jar file to the resource in question.

Take a look at this UDF: https://github.com/datadistillr/drill-geoip-functions This UDF has a few external resources including a CSV file and the MaxMind databases.

> Thanks for any info.
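The getResource() pattern discussed here can be sketched as follows. The CSV path is hypothetical; in a real UDF jar such a file would be bundled under src/main/resources and land at the same classpath location:

```java
import java.io.InputStream;
import java.net.URL;

// Sketch: locating resources bundled in a jar from code running on a node.
public class ResourceSketch {
  public static void main(String[] args) throws Exception {
    // A class can always locate its own .class file; the same lookup works
    // for any file packaged alongside it in the jar.
    URL self = ResourceSketch.class.getResource("ResourceSketch.class");
    System.out.println(self != null ? "found own class file" : "not found");

    // Hypothetical bundled resource, e.g. a CSV lookup table. A leading '/'
    // makes the path absolute within the jar rather than package-relative.
    try (InputStream in =
             ResourceSketch.class.getResourceAsStream("/lookup/geoip.csv")) {
      System.out.println(in != null ? "resource found" : "resource missing");
    }
  }
}
```

Since resource lookup goes through the class loader, it works the same whether the file sits in a local build directory or inside a jar distributed to the Drill nodes.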
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810051#comment-17810051 ]

ASF GitHub Bot commented on DRILL-8474:
---
mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906561549

> > @cgivre @paul-rogers is there an example of a Drill UDF that is not part of the drill repository tree?
> > I'd like to understand the mechanisms for distributing any jar files and dependencies of the UDF that drill uses. I can't find any such in the quasi-UDFs that are in the Drill tree, because, well, since they are part of Drill, and so are their dependencies, this problem doesn't exist.
>
> @mbeckerle Here's an example: https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't able to connect last week.

If I understand this correctly, if a jar is on the classpath and has drill-module.conf in its root dir, then drill will find it and read that HOCON file to get the package to add to drill.classpath.scanning.packages.

Drill then appears to scan jars for class files for those packages. I'm not sure what it is doing with the class files. I imagine it is repackaging them somehow so Drill can use them on the distributed nodes, but it isn't yet clear to me how this aspect works. Do these classes just get loaded on the distributed drill nodes? Or is the classpath augmented in some way on the drill nodes so that they see a jar that contains all these classes?

I have two questions:

(1) what about dependencies? The UDF may depend on libraries which depend on other libraries, etc.

(2) what about non-class files, e.g., things under src/main/resources of the project that go into the jar, but aren't "class" files? How do those things also get moved? How would code running in the drill node access these? The usual method is to call getResource(URL) with a URL that gives the path within a jar file to the resource in question.
Thanks for any info. > Add Daffodil Format Plugin > -- > > Key: DRILL-8474 > URL: https://issues.apache.org/jira/browse/DRILL-8474 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.21.1 >Reporter: Charles Givre >Priority: Major > Fix For: 1.22.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
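For the non-class-file question in the comment above, the standard Java mechanism works for any entry packaged in a jar on the classpath, class file or not. A minimal sketch, not Drill-specific; the resource name drill-module.conf is just the file the comment mentions, and whether a lookup succeeds depends entirely on what is actually on the running JVM's classpath:

```java
// Sketch: locating and reading a non-class resource (e.g. drill-module.conf)
// from any jar on the classpath. The same lookup works on a distributed node,
// provided the jar has been placed on that JVM's classpath.
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.nio.charset.StandardCharsets;

public class ResourceLookup {
  public static String readResource(String name) throws IOException {
    ClassLoader cl = Thread.currentThread().getContextClassLoader();
    URL url = cl.getResource(name);   // null if not found on the classpath
    if (url == null) {
      return null;
    }
    try (InputStream in = cl.getResourceAsStream(name)) {
      return new String(in.readAllBytes(), StandardCharsets.UTF_8);
    }
  }

  public static void main(String[] args) {
    // Class files are located the same way as any other jar/module entry.
    URL objectClass = Object.class.getResource("Object.class");
    System.out.println(objectClass != null);
  }
}
```

The lookup is purely classpath-driven, which is why the distribution question above matters: code on a Drill node can only resolve resources from jars that were shipped to and registered on that node's classpath.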
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809982#comment-17809982 ] ASF GitHub Bot commented on DRILL-8478: --- jnturton merged PR #2875: URL: https://github.com/apache/drill/pull/2875 > HashPartition memory leak when exception > - > > Key: DRILL-8478 > URL: https://issues.apache.org/jira/browse/DRILL-8478 > Project: Apache Drill > Issue Type: Bug > Components: Execution - Relational Operators > Affects Versions: 1.21.1 > Reporter: shihuafeng > Priority: Major > Fix For: 1.21.2 > > Attachments: > 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch > > > *Describe the bug* > HashPartition leaks memory when an allocation fails with OutOfMemoryException. > *To Reproduce* > Steps to reproduce the behavior: > # prepare data for TPC-H scale factor 1 > # run 20 concurrent TPC-H sql8 queries > # set direct memory to 5 GB > # when an OutOfMemoryException occurred, stop all queries > # observe the memory leak > *Expected behavior* > (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"} > (2) I ran sql8 (SQL detail in Additional context) with 20 concurrent queries > (3) An OutOfMemoryException occurred when creating a HashPartition > *Error detail, log output or screenshots* > Unable to allocate buffer of size 262144 (rounded from 262140) due to memory > limit (41943040).
Current allocation: 20447232 > > sql > {code:java} > // code placeholder > select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / > sum(volume) as mkt_share from ( select extract(year from o_orderdate) as > o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from > hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, > hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, > hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and > s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey > and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name > = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date > '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as > all_nations group by o_year order by o_year > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809816#comment-17809816 ] ASF GitHub Bot commented on DRILL-8478: --- shfshihuafeng commented on PR #2875: URL: https://github.com/apache/drill/pull/2875#issuecomment-1905599592 > [An unused import crept in](https://github.com/apache/drill/actions/runs/7622586264/job/20762475705#step:6:1277), could you remove it please? Removed it.
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809814#comment-17809814 ] ASF GitHub Bot commented on DRILL-8478: --- jnturton commented on PR #2875: URL: https://github.com/apache/drill/pull/2875#issuecomment-1905598192 [An unused import crept in](https://github.com/apache/drill/actions/runs/7622586264/job/20762475705#step:6:1277), could you remove it please?
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809792#comment-17809792 ] ASF GitHub Bot commented on DRILL-8478: --- shfshihuafeng commented on code in PR #2875: URL: https://github.com/apache/drill/pull/2875#discussion_r1462854154 ## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java: ## @@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained .build(logger); } catch (SchemaChangeException sce) { throw new IllegalStateException("Unexpected Schema Change while creating a hash table",sce); -} -this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator); -tmpBatchesList = new ArrayList<>(); -if (numPartitions > 1) { - allocateNewCurrentBatchAndHV(); +} catch (OutOfMemoryException oom) { + close(); + throw UserException.memoryError(oom) + .message("OutOfMemory while allocate memory for hash partition.") Review Comment: I resubmitted the PR and supplied the test steps.
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809771#comment-17809771 ] ASF GitHub Bot commented on DRILL-8478: --- paul-rogers commented on code in PR #2875: URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821 ## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java: ## @@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, BufferAllocator allocator, Chained .build(logger); } catch (SchemaChangeException sce) { throw new IllegalStateException("Unexpected Schema Change while creating a hash table",sce); -} -this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator); -tmpBatchesList = new ArrayList<>(); -if (numPartitions > 1) { - allocateNewCurrentBatchAndHV(); +} catch (OutOfMemoryException oom) { + close(); + throw UserException.memoryError(oom) + .message("OutOfMemory while allocate memory for hash partition.") Review Comment: Suggested: `"Failed to allocate hash partition."` The `memoryError()` already indicates that it is an OOM error.
## exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java: ## @@ -1312,7 +1313,9 @@ private void cleanup() { } // clean (and deallocate) each partition, and delete its spill file for (HashPartition partn : partitions) { - partn.close(); + if (Objects.nonNull(partn)) { Review Comment: Simpler: `if (partn != null) {`
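The cleanup-loop suggestion in the review above (skip partitions whose constructor failed, leaving a null slot) can be sketched generically. HashPartition and the surrounding batch are Drill classes, so this standalone version uses a stand-in interface rather than the real types:

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {
  // Stand-in for a HashPartition; a null slot represents a partition whose
  // constructor threw before the object was fully created.
  interface Partition {
    void close();
  }

  // Close each partition, guarding against null slots with a plain null
  // check (as the review suggests, rather than Objects.nonNull).
  static int cleanup(List<Partition> partitions) {
    int closed = 0;
    for (Partition partn : partitions) {
      if (partn != null) {
        partn.close();
        closed++;
      }
    }
    return closed;
  }

  public static void main(String[] args) {
    List<Partition> parts = new ArrayList<>();
    parts.add(() -> {});
    parts.add(null); // this partition's constructor failed
    parts.add(() -> {});
    System.out.println(cleanup(parts)); // closes only the non-null entries
  }
}
```

Without the null guard, the first failed partition in the list would turn cleanup itself into a NullPointerException, leaving the remaining partitions (and their buffers) unreleased.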
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception
[ https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809763#comment-17809763 ] ASF GitHub Bot commented on DRILL-8478: --- shfshihuafeng opened a new pull request, #2875: URL: https://github.com/apache/drill/pull/2875 # [DRILL-](https://issues.apache.org/jira/browse/DRILL-): PR Title DRILL-8478. HashPartition memory leak when it allocate memory exception with OutOfMemoryException (#2874) ## Description When allocating memory for a HashPartition fails with an OutOfMemoryException, memory leaks: because the HashPartition object cannot be created successfully, it cannot be cleaned up in the closing phase. ## Documentation (Please describe user-visible changes similar to what should appear in the Drill documentation.) ## Testing (Please describe how this PR has been tested.)
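The fix pattern described above — a constructor that releases whatever it already allocated before rethrowing, because the caller never receives a reference it could clean up — can be sketched outside Drill as follows. The exception and buffer types are stand-ins; a simulated failure replaces Drill's real OutOfMemoryException:

```java
// Sketch of the leak-fix pattern from DRILL-8478: if allocation fails
// partway through construction, close() the partial state, then rethrow.
import java.util.ArrayList;
import java.util.List;

public class PartitionSketch implements AutoCloseable {
  private final List<int[]> buffers = new ArrayList<>();

  // failAt simulates an allocation failure at the given step (-1 = never).
  PartitionSketch(int numBuffers, int failAt) {
    try {
      for (int i = 0; i < numBuffers; i++) {
        if (i == failAt) {
          // Stand-in for Drill's OutOfMemoryException during allocation.
          throw new IllegalStateException("simulated allocation failure");
        }
        buffers.add(new int[256]); // stand-in for a direct-memory buffer
      }
    } catch (IllegalStateException oom) {
      close(); // release partial allocations; the caller never sees `this`
      throw oom;
    }
  }

  int allocated() {
    return buffers.size();
  }

  @Override
  public void close() {
    buffers.clear(); // stand-in for releasing each buffer
  }
}
```

The key point is that a constructor which throws leaves no reference for the caller's cleanup loop to close, so the constructor itself must release any resources it acquired before propagating the exception.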
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809174#comment-17809174 ] ASF GitHub Bot commented on DRILL-8474: --- cgivre commented on PR #2836: URL: https://github.com/apache/drill/pull/2836#issuecomment-1902751729 > @cgivre @paul-rogers is there an example of a Drill UDF that is not part of the Drill repository tree? > > I'd like to understand the mechanisms for distributing any jar files and dependencies of a UDF that Drill uses. I can't find any such example among the quasi-UDFs that are in the Drill tree because, since they are part of Drill, they and their dependencies don't have this problem. @mbeckerle Here's an example: https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't able to connect last week.
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809173#comment-17809173 ] ASF GitHub Bot commented on DRILL-8474: --- mbeckerle commented on PR #2836: URL: https://github.com/apache/drill/pull/2836#issuecomment-1902750285 @cgivre @paul-rogers is there an example of a Drill UDF that is not part of the Drill repository tree? I'd like to understand the mechanisms for distributing any jar files and dependencies of a UDF that Drill uses. I can't find any such example among the quasi-UDFs that are in the Drill tree because, since they are part of Drill, they and their dependencies don't have this problem.
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809172#comment-17809172 ] ASF GitHub Bot commented on DRILL-8474: --- mbeckerle commented on code in PR #2836: URL: https://github.com/apache/drill/pull/2836#discussion_r1461099077 ## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java: ## @@ -0,0 +1,229 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.store.daffodil.schema; + +import org.apache.daffodil.runtime1.api.ChoiceMetadata; +import org.apache.daffodil.runtime1.api.ComplexElementMetadata; +import org.apache.daffodil.runtime1.api.ElementMetadata; +import org.apache.daffodil.runtime1.api.InfosetSimpleElement; +import org.apache.daffodil.runtime1.api.MetadataHandler; +import org.apache.daffodil.runtime1.api.SequenceMetadata; +import org.apache.daffodil.runtime1.api.SimpleElementMetadata; +import org.apache.drill.common.types.TypeProtos.MinorType; +import org.apache.drill.exec.record.metadata.MapBuilder; +import org.apache.drill.exec.record.metadata.SchemaBuilder; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.util.Stack; + +/** + * This class transforms a DFDL/Daffodil schema into a Drill Schema. + */ +public class DrillDaffodilSchemaVisitor extends MetadataHandler { + private static final Logger logger = LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class); + /** + * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share a base class so we + * have a stack of MapBuilders, and when empty we use the SchemaBuilder Review Comment: This is fixed in the latest commit. I created a MapBuilderLike interface shared by SchemaBuilder and MapBuilder, populated only with the methods I needed. The corresponding problem doesn't really occur in the rowWriter area, as tupleWriter is the common underlying class there.
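The shared-interface idea described in the review comment above — a minimal common abstraction so the visitor can treat the top-level schema builder and nested map builders uniformly via a stack — can be sketched in isolation. All names and methods here are illustrative stand-ins, not Drill's actual SchemaBuilder/MapBuilder API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class BuilderStackSketch {
  // Minimal shared interface, populated only with what the visitor needs,
  // mirroring the MapBuilderLike approach the comment describes.
  interface MapBuilderLike {
    void addString(String name);
    MapBuilderLike addMap(String name);
  }

  // Stand-in for the top-level schema builder: collects flattened columns.
  static class SchemaSketch implements MapBuilderLike {
    final List<String> columns = new ArrayList<>();
    public void addString(String name) { columns.add(name); }
    public MapBuilderLike addMap(String name) { return new MapSketch(this, name); }
  }

  // Stand-in for a nested map builder: prefixes names with its path.
  static class MapSketch implements MapBuilderLike {
    final SchemaSketch root;
    final String prefix;
    MapSketch(SchemaSketch root, String prefix) { this.root = root; this.prefix = prefix; }
    public void addString(String name) { root.columns.add(prefix + "." + name); }
    public MapBuilderLike addMap(String name) { return new MapSketch(root, prefix + "." + name); }
  }

  // The visitor keeps a stack of builders; when the stack is empty it is
  // writing at the schema root, exactly as the quoted Javadoc explains.
  public static List<String> demo() {
    SchemaSketch schema = new SchemaSketch();
    Deque<MapBuilderLike> stack = new ArrayDeque<>();
    MapBuilderLike current = schema;
    current.addString("id");
    stack.push(current);
    current = current.addMap("address"); // descend into a complex element
    current.addString("city");
    current = stack.pop();               // ascend back to the root
    current.addString("name");
    return schema.columns;
  }

  public static void main(String[] args) {
    System.out.println(demo()); // [id, address.city, name]
  }
}
```

The design point is that the visitor's descend/ascend logic is written once against the interface, instead of branching on "am I at the root or inside a map" at every element.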
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807233#comment-17807233 ] ASF GitHub Bot commented on DRILL-8474: --- mbeckerle commented on code in PR #2836: URL: https://github.com/apache/drill/pull/2836#discussion_r1453422371 ## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java: ## @@ -0,0 +1,181 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +package org.apache.drill.exec.store.daffodil; + +import org.apache.daffodil.japi.DataProcessor; +import org.apache.drill.common.AutoCloseables; +import org.apache.drill.common.exceptions.CustomErrorContext; +import org.apache.drill.common.exceptions.UserException; +import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader; +import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip; +import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator; +import org.apache.drill.exec.physical.resultSet.RowSetLoader; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory; +import org.apache.drill.exec.store.dfs.DrillFileSystem; +import org.apache.drill.exec.store.dfs.easy.EasySubScan; +import org.apache.hadoop.fs.Path; +import org.slf4j.Logger; +import org.slf4j.LoggerFactory; + +import java.io.IOException; +import java.io.InputStream; +import java.net.URI; +import java.net.URISyntaxException; +import java.util.Objects; + +import static org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory.*; +import static org.apache.drill.exec.store.daffodil.schema.DrillDaffodilSchemaUtils.daffodilDataProcessorToDrillSchema; + +public class DaffodilBatchReader implements ManagedReader { + + private static final Logger logger = LoggerFactory.getLogger(DaffodilBatchReader.class); + private final RowSetLoader rowSetLoader; + private final CustomErrorContext errorContext; + private final DaffodilMessageParser dafParser; + private final InputStream dataInputStream; + + public DaffodilBatchReader(DaffodilReaderConfig readerConfig, EasySubScan scan, + FileSchemaNegotiator negotiator) { + +errorContext = negotiator.parentErrorContext(); +DaffodilFormatConfig dafConfig = readerConfig.plugin.getConfig(); + +String schemaURIString = dafConfig.getSchemaURI(); // "schema/complexArray1.dfdl.xsd"; +String rootName = dafConfig.getRootName(); +String rootNamespace 
= dafConfig.getRootNamespace(); +boolean validationMode = dafConfig.getValidationMode(); + +URI dfdlSchemaURI; +try { + dfdlSchemaURI = new URI(schemaURIString); +} catch (URISyntaxException e) { + throw UserException.validationError(e).build(logger); +} + +FileDescrip file = negotiator.file(); +DrillFileSystem fs = file.fileSystem(); +URI fsSchemaURI = fs.getUri().resolve(dfdlSchemaURI); + +DaffodilDataProcessorFactory dpf = new DaffodilDataProcessorFactory(); +DataProcessor dp; +try { + dp = dpf.getDataProcessor(fsSchemaURI, validationMode, rootName, rootNamespace); +} catch (CompileFailure e) { + throw UserException.dataReadError(e) + .message(String.format("Failed to get Daffodil DFDL processor for: %s", fsSchemaURI)) + .addContext(errorContext).addContext(e.getMessage()).build(logger); +} +// Create the corresponding Drill schema. +// Note: this could be a very large schema. Think of a large complex RDBMS schema, +// all of it, hundreds of tables, but all part of the same metadata tree. +TupleMetadata drillSchema = daffodilDataProcessorToDrillSchema(dp); +// Inform Drill about the schema +negotiator.tableSchema(drillSchema, true); + +// +// DATA TIME: Next we construct the runtime objects, and open files. +// +// We get the DaffodilMessageParser, which is a stateful driver for daffodil that +// actually does the parsing. +rowSetLoader = negotiator.build().writer(); + +// We construct the Daffodil InfosetOutputter which the daffodil parser uses to +// conver
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806651#comment-17806651 ] ASF GitHub Bot commented on DRILL-8188: --- jnturton merged PR #2515: URL: https://github.com/apache/drill/pull/2515 > Convert HDF5 format to EVF2 > --- > > Key: DRILL-8188 > URL: https://issues.apache.org/jira/browse/DRILL-8188 > Project: Apache Drill > Issue Type: Improvement >Affects Versions: 1.20.0 >Reporter: Cong Luo >Assignee: Cong Luo >Priority: Major > > Use EVF V2 instead of old V1. > Also, fixed a few bugs in V2 framework. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806649#comment-17806649 ]

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on code in PR #2515:
URL: https://github.com/apache/drill/pull/2515#discussion_r1446231938

## contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java:

@@ -171,107 +168,109 @@ public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig)
     }
   }

-  public HDF5BatchReader(HDF5ReaderConfig readerConfig, int maxRecords) {
-    this.readerConfig = readerConfig;
-    this.maxRecords = maxRecords;
+  public HDF5BatchReader(HDF5ReaderConfig config, EasySubScan scan, FileSchemaNegotiator negotiator) {
+    errorContext = negotiator.parentErrorContext();
+    file = negotiator.file();
+    readerConfig = config;
     dataWriters = new ArrayList<>();
-    this.showMetadataPreview = readerConfig.formatConfig.showPreview();
-  }
+    showMetadataPreview = readerConfig.formatConfig.showPreview();

-  @Override
-  public boolean open(FileSchemaNegotiator negotiator) {
-    split = negotiator.split();
-    errorContext = negotiator.parentErrorContext();
     // Since the HDF file reader uses a stream to actually read the file, the file name from the
     // module is incorrect.
-    fileName = split.getPath().getName();
-    try {
-      openFile(negotiator);
-    } catch (IOException e) {
-      throw UserException
-          .dataReadError(e)
-          .addContext("Failed to close input file: %s", split.getPath())
-          .addContext(errorContext)
-          .build(logger);
-    }
+    fileName = file.split().getPath().getName();

-    ResultSetLoader loader;
-    if (readerConfig.defaultPath == null) {
-      // Get file metadata
-      List metadata = getFileMetadata(hdfFile, new ArrayList<>());
-      metadataIterator = metadata.iterator();
-
-      // Schema for Metadata query
-      SchemaBuilder builder = new SchemaBuilder()
-          .addNullable(PATH_COLUMN_NAME, MinorType.VARCHAR)
-          .addNullable(DATA_TYPE_COLUMN_NAME, MinorType.VARCHAR)
-          .addNullable(FILE_NAME_COLUMN_NAME, MinorType.VARCHAR)
-          .addNullable(DATA_SIZE_COLUMN_NAME, MinorType.BIGINT)
-          .addNullable(IS_LINK_COLUMN_NAME, MinorType.BIT)
-          .addNullable(ELEMENT_COUNT_NAME, MinorType.BIGINT)
-          .addNullable(DATASET_DATA_TYPE_NAME, MinorType.VARCHAR)
-          .addNullable(DIMENSIONS_FIELD_NAME, MinorType.VARCHAR);
-
-      negotiator.tableSchema(builder.buildSchema(), false);
-
-      loader = negotiator.build();
-      dimensions = new int[0];
-      rowWriter = loader.writer();
-
-    } else {
-      // This is the case when the default path is specified. Since the user is explicitly asking for a dataset
-      // Drill can obtain the schema by getting the datatypes below and ultimately mapping that schema to columns
-      Dataset dataSet = hdfFile.getDatasetByPath(readerConfig.defaultPath);
-      dimensions = dataSet.getDimensions();
-
-      loader = negotiator.build();
-      rowWriter = loader.writer();
-      writerSpec = new WriterSpec(rowWriter, negotiator.providedSchema(),
-          negotiator.parentErrorContext());
-      if (dimensions.length <= 1) {
-        buildSchemaFor1DimensionalDataset(dataSet);
-      } else if (dimensions.length == 2) {
-        buildSchemaFor2DimensionalDataset(dataSet);
-      } else {
-        // Case for datasets of greater than 2D
-        // These are automatically flattened
-        buildSchemaFor2DimensionalDataset(dataSet);
+    { // Opens an HDF5 file

Review Comment:
I guess some of these could become private methods but it's a minor point for me.

## contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java:

@@ -171,107 +164,104 @@ public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig)
     }
   }

-  public HDF5BatchReader(HDF5ReaderConfig readerConfig, int maxRecords) {
-    this.readerConfig = readerConfig;
-    this.maxRecords = maxRecords;
+  public HDF5BatchReader(HDF5ReaderConfig config, EasySubScan scan, FileSchemaNegotiator negotiator) {
+    errorContext = negotiator.parentErrorContext();
+    file = negotiator.file();
+    readerConfig = config;
     dataWriters = new ArrayList<>();
-    this.showMetadataPreview = readerConfig.formatConfig.showPreview();
-  }
+    showMetadataPreview = readerConfig.formatConfig.showPreview();

-  @Override
-  public boolean open(FileSchemaNegotiator negotiator) {
-    split = negotiator.split();
-    errorContext = negotiator.parentErrorContext();
     // Since the HDF file reader uses a stream to actually read the file, the file name from the
     // module is incorrect.
-    fileName = split.getPath().getName();
-    try {
-      openFile(negotiator);
-    } catch (IOException e) {
-
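The hunk above captures the central EVF2 change: the V1 `open(FileSchemaNegotiator)` method disappears and all setup moves into the constructor, which receives the negotiator directly, so a fully constructed reader is already open. A toy sketch of the two lifecycles, using hypothetical stand-in types rather than Drill's real interfaces:

```java
// Hypothetical stand-in for Drill's FileSchemaNegotiator; illustration only.
class Negotiator {
  String fileName() { return "data.h5"; }
}

// V1 style: the framework constructs the reader, then calls open() with the negotiator.
class ReaderV1 {
  String fileName;
  boolean open(Negotiator negotiator) {
    fileName = negotiator.fileName();
    return true;
  }
}

// V2 (EVF2) style: the negotiator arrives in the constructor; setup failures
// surface as constructor exceptions instead of a false return from open().
class ReaderV2 {
  final String fileName;
  ReaderV2(Negotiator negotiator) {
    fileName = negotiator.fileName();
  }
}

public class Evf2Lifecycle {
  public static void main(String[] args) {
    ReaderV1 v1 = new ReaderV1();
    v1.open(new Negotiator());        // two-step V1 lifecycle
    ReaderV2 v2 = new ReaderV2(new Negotiator()); // one-step V2 lifecycle
    System.out.println(v1.fileName + " " + v2.fileName);
  }
}
```

One consequence visible in the diff: fields like `fileName` can become effectively final in V2, since nothing is assigned after construction.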
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806487#comment-17806487 ]

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1890990577

> > @mbeckerle With respect to style, I tried to reply to that comment, but the thread won't let me. In any event, Drill classes will typically start with the constructor, then have whatever methods are appropriate for the class. The logger creation usually happens before the constructor. I think all of your other classes followed this format, so the one or two that didn't kind of jumped out at me.
>
> @cgivre I believe the style issues are all fixed. The build did not get any codestyle issues.

The issue I was referring to was more around the organization of a few classes. Usually we'll have the constructor (if present) at the top followed by any class methods. I think there was a class or two where the constructor was at the bottom or something like that. In any event, consider the issue resolved.

> Add Daffodil Format Plugin
> --------------------------
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
> Issue Type: New Feature
> Affects Versions: 1.21.1
> Reporter: Charles Givre
> Priority: Major
> Fix For: 1.22.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806486#comment-17806486 ]

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451758017

## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java:

@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil;
+
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Objects;
+
+import static org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory.*;
+import static org.apache.drill.exec.store.daffodil.schema.DrillDaffodilSchemaUtils.daffodilDataProcessorToDrillSchema;
+
+public class DaffodilBatchReader implements ManagedReader {
+
+  private static final Logger logger = LoggerFactory.getLogger(DaffodilBatchReader.class);
+  private final RowSetLoader rowSetLoader;
+  private final CustomErrorContext errorContext;
+  private final DaffodilMessageParser dafParser;
+  private final InputStream dataInputStream;
+
+  public DaffodilBatchReader(DaffodilReaderConfig readerConfig, EasySubScan scan,
+      FileSchemaNegotiator negotiator) {
+
+    errorContext = negotiator.parentErrorContext();
+    DaffodilFormatConfig dafConfig = readerConfig.plugin.getConfig();
+
+    String schemaURIString = dafConfig.getSchemaURI(); // "schema/complexArray1.dfdl.xsd";
+    String rootName = dafConfig.getRootName();
+    String rootNamespace = dafConfig.getRootNamespace();
+    boolean validationMode = dafConfig.getValidationMode();
+
+    URI dfdlSchemaURI;
+    try {
+      dfdlSchemaURI = new URI(schemaURIString);
+    } catch (URISyntaxException e) {
+      throw UserException.validationError(e).build(logger);
+    }
+
+    FileDescrip file = negotiator.file();
+    DrillFileSystem fs = file.fileSystem();
+    URI fsSchemaURI = fs.getUri().resolve(dfdlSchemaURI);
+
+    DaffodilDataProcessorFactory dpf = new DaffodilDataProcessorFactory();
+    DataProcessor dp;
+    try {
+      dp = dpf.getDataProcessor(fsSchemaURI, validationMode, rootName, rootNamespace);
+    } catch (CompileFailure e) {
+      throw UserException.dataReadError(e)
+          .message(String.format("Failed to get Daffodil DFDL processor for: %s", fsSchemaURI))
+          .addContext(errorContext).addContext(e.getMessage()).build(logger);
+    }
+    // Create the corresponding Drill schema.
+    // Note: this could be a very large schema. Think of a large complex RDBMS schema,
+    // all of it, hundreds of tables, but all part of the same metadata tree.
+    TupleMetadata drillSchema = daffodilDataProcessorToDrillSchema(dp);
+    // Inform Drill about the schema
+    negotiator.tableSchema(drillSchema, true);
+
+    //
+    // DATA TIME: Next we construct the runtime objects, and open files.
+    //
+    // We get the DaffodilMessageParser, which is a stateful driver for daffodil that
+    // actually does the parsing.
+    rowSetLoader = negotiator.build().writer();
+
+    // We construct the Daffodil InfosetOutputter which the daffodil parser uses to
+    // convert i
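The reader above resolves the configured schema URI against the filesystem URI with `fs.getUri().resolve(dfdlSchemaURI)`, so a relative `schemaURI` lands under the data filesystem while an absolute one is used as given. That is plain `java.net.URI.resolve` (RFC 3986) behavior, sketched here with a hypothetical `hdfs://namenode/` base standing in for `fs.getUri()`:

```java
import java.net.URI;

public class SchemaUriResolution {
  public static void main(String[] args) {
    // Hypothetical filesystem base URI, standing in for fs.getUri().
    URI fsBase = URI.create("hdfs://namenode/");

    // A relative schemaURI from the plugin config resolves under the filesystem.
    URI relative = URI.create("schema/complexArray1.dfdl.xsd");
    System.out.println(fsBase.resolve(relative));
    // hdfs://namenode/schema/complexArray1.dfdl.xsd

    // An absolute URI is returned unchanged, so a schema may live on another scheme.
    URI absolute = URI.create("file:///opt/schemas/msg.dfdl.xsd");
    System.out.println(fsBase.resolve(absolute));
    // file:///opt/schemas/msg.dfdl.xsd
  }
}
```

This is why the constructor parses `schemaURIString` into a `URI` first: a `URISyntaxException` there is a user configuration error, reported via `UserException.validationError`.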
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806484#comment-17806484 ]

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451757410

## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java:

@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.runtime1.api.ChoiceMetadata;
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata;
+import org.apache.daffodil.runtime1.api.ElementMetadata;
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement;
+import org.apache.daffodil.runtime1.api.MetadataHandler;
+import org.apache.daffodil.runtime1.api.SequenceMetadata;
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Stack;
+
+/**
+ * This class transforms a DFDL/Daffodil schema into a Drill Schema.
+ */
+public class DrillDaffodilSchemaVisitor extends MetadataHandler {
+  private static final Logger logger = LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class);
+  /**
+   * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share a base class so we
+   * have a stack of MapBuilders, and when empty we use the SchemaBuilder

Review Comment:
This is likely music to @paul-rogers's ears.
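The comment being reviewed describes a builder-stack pattern: each nested DFDL complex element pushes a map-level builder, and when the stack is empty the visitor writes to the top-level schema builder. A toy sketch of that pattern with hypothetical stand-in builders (not Drill's real `SchemaBuilder`/`MapBuilder` APIs):

```java
import java.util.ArrayDeque;
import java.util.Deque;

public class BuilderStack {
  // Hypothetical stand-ins for Drill's SchemaBuilder / MapBuilder; the real
  // classes are richer but, as the comment notes, share no base class.
  static class TopBuilder {
    final StringBuilder out = new StringBuilder();
    void addColumn(String name) { out.append(name).append(';'); }
  }
  static class MapLevel {
    final String name;
    final StringBuilder out = new StringBuilder();
    MapLevel(String name) { this.name = name; }
  }

  private final TopBuilder top = new TopBuilder();
  private final Deque<MapLevel> maps = new ArrayDeque<>();

  // Entering a complex element pushes a nested builder level.
  void startComplex(String name) { maps.push(new MapLevel(name)); }

  // Leaving it folds the nested columns into the parent level, or the top.
  void endComplex() {
    MapLevel done = maps.pop();
    String rendered = done.name + "{" + done.out + "}";
    if (maps.isEmpty()) {
      top.addColumn(rendered);
    } else {
      maps.peek().out.append(rendered).append(';');
    }
  }

  // Simple elements go to the innermost open level, else to the top builder.
  void addSimple(String name) {
    if (maps.isEmpty()) {
      top.addColumn(name);
    } else {
      maps.peek().out.append(name).append(';');
    }
  }

  String schema() { return top.out.toString(); }

  public static void main(String[] args) {
    BuilderStack v = new BuilderStack();
    v.addSimple("a");
    v.startComplex("m");
    v.addSimple("b");
    v.endComplex();
    System.out.println(v.schema()); // a;m{b;};
  }
}
```

The "empty stack means use the top-level builder" check is the price of the two builder types not sharing an interface; with a common base type the stack could simply be seeded with the schema builder.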
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806482#comment-17806482 ]

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451756763

## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java:

@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.japi.Compiler;
+import org.apache.daffodil.japi.Daffodil;
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.daffodil.japi.Diagnostic;
+import org.apache.daffodil.japi.InvalidParserException;
+import org.apache.daffodil.japi.InvalidUsageException;
+import org.apache.daffodil.japi.ProcessorFactory;
+import org.apache.daffodil.japi.ValidationMode;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.channels.Channels;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * Compiles a DFDL schema (mostly for tests) or loads a pre-compiled DFDL schema so that one can
+ * obtain a DataProcessor for use with DaffodilMessageParser.
+ *
+ * TODO: Needs to use a cache to avoid reloading/recompiling every time.
+ */
+public class DaffodilDataProcessorFactory {
+  // Default constructor is used.
+
+  private static final Logger logger = LoggerFactory.getLogger(DaffodilDataProcessorFactory.class);
+
+  private DataProcessor dp;
+
+  /**
+   * Gets a Daffodil DataProcessor given the necessary arguments to compile or reload it.
+   *
+   * @param schemaFileURI
+   *     pre-compiled dfdl schema (.bin extension) or DFDL schema source (.xsd extension)
+   * @param validationMode
+   *     Use true to request Daffodil built-in 'limited' validation. Use false for no validation.
+   * @param rootName
+   *     Local name of root element of the message. Can be null to use the first element declaration
+   *     of the primary schema file. Ignored if reloading a pre-compiled schema.
+   * @param rootNS
+   *     Namespace URI as a string. Can be null to use the target namespace of the primary schema
+   *     file or if it is unambiguous what element is the rootName. Ignored if reloading a
+   *     pre-compiled schema.
+   * @return the DataProcessor
+   * @throws CompileFailure
+   *     - if schema compilation fails
+   */
+  public DataProcessor getDataProcessor(URI schemaFileURI, boolean validationMode, String rootName,
+      String rootNS)
+      throws CompileFailure {
+
+    DaffodilDataProcessorFactory dmp = new DaffodilDataProcessorFactory();
+    boolean isPrecompiled = schemaFileURI.toString().endsWith(".bin");
+    if (isPrecompiled) {
+      if (Objects.nonNull(rootName) && !rootName.isEmpty()) {
+        // A usage error. You shouldn't supply the name and optionally namespace if loading
+        // precompiled schema because those are built into it. Should be null or "".
+        logger.warn("Root element name '{}' is ignored when used with precompiled DFDL schema.",
+            rootName);
+      }
+      try {
+        dmp.loadSchema(schemaFileURI);
+      } catch (IOException | InvalidParserException e) {
+        throw new CompileFailure(e);
+      }
+      dmp.setupDP(validationMode, null);
+    } else {
+      List pfDiags;
+      try {
+        pfDiags = dmp.compileSchema(schemaFileURI, rootName, rootNS);
+      } catch (URISyntaxException | IOException e) {
+        throw new CompileFailure(e);
+      }
+      dmp.setupDP(validationMode, pfDiags);
+    }
+    return dmp.dp;
+  }
+
+  private void loadSchema(URI schemaFileURI) throws IOException, InvalidParserException {
+    Compiler c = Daffodil.compiler();
+    dp = c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));

Review Comment:
This definitely seems like an area where there is potential for a lot of different things to go wrong. My view is we should just do our best to provide c
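The dispatch in `getDataProcessor` hinges on the schema URI's extension: `.bin` means a pre-compiled schema to reload (and a supplied root name is warned about and ignored), while anything else is DFDL source to compile. A minimal sketch of just that control flow in plain Java, with the returned strings standing in for the `loadSchema`/`compileSchema` code paths:

```java
import java.net.URI;

public class SchemaDispatch {
  // Mirrors the .bin check in DaffodilDataProcessorFactory.getDataProcessor().
  // Returns "reload" or "compile" in place of actually invoking Daffodil.
  public static String dispatch(URI schemaFileURI, String rootName) {
    boolean isPrecompiled = schemaFileURI.toString().endsWith(".bin");
    if (isPrecompiled) {
      if (rootName != null && !rootName.isEmpty()) {
        // Root name and namespace are baked into a pre-compiled schema,
        // so a caller-supplied root name is ignored with a warning.
        System.out.printf(
            "Root element name '%s' is ignored when used with precompiled DFDL schema.%n",
            rootName);
      }
      return "reload";   // loadSchema(...) path: Daffodil Compiler.reload
    }
    return "compile";    // compileSchema(...) path: compile the .xsd source
  }

  public static void main(String[] args) {
    System.out.println(dispatch(URI.create("file:///schemas/msg.dfdl.xsd"), null));
    System.out.println(dispatch(URI.create("file:///schemas/msg.bin"), "root"));
  }
}
```

Note the extension test is purely lexical, which is one reason the review discussion below focuses on clear error messages when the load or compile subsequently fails.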
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806481#comment-17806481 ]

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451756527

## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java:

+      try {
+        dmp.loadSchema(schemaFileURI);
+      } catch (IOException | InvalidParserException e) {
+        throw new CompileFailure(e);

Review Comment:
My thought here would be to fail as quickly as possible. If the DFDL schema can't be read, I'm assuming that we cannot proceed, so throwing an exception would be the right thing to do IMHO. With that said, we should make sure we provide a good error message that would explain what went wrong. One of the issues we worked on for a while with Drill was that it would fail and you'd get a stack trace w/o a clear idea of what the actual issue is and how to rectify it.
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805821#comment-17805821 ]

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1888037847

> Did the recent EVF revisions allow the tests for this PR to pass? Is there anything that is still missing? Also, did the excitement over my botched merge settle down and are we good now?

All the unit tests pass. Whether that means that everything is working... this plugin has a decent amount of tests, so I'd feel pretty good.
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805790#comment-17805790 ]

ASF GitHub Bot commented on DRILL-8188:
---

paul-rogers commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1887901054

Did the recent EVF revisions allow the tests for this PR to pass? Is there anything that is still missing? Also, did the excitement over my botched merge settle down and are we good now?
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805457#comment-17805457 ]

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1886695348

> @paul-rogers I attempted to fix. I kind of suck at git, so I think it's more or less correct now, but there was probably a better way to do this.

Just working through the review comments that @paul-rogers left (the ones unrelated to the needed functionality that was missing from EVF2).
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805188#comment-17805188 ]

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1885044910

@jnturton I did as you suggested. Would you mind please taking a look?
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804917#comment-17804917 ]

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1883962208

> @mbeckerle With respect to style, I tried to reply to that comment, but the thread won't let me. In any event, Drill classes will typically start with the constructor, then have whatever methods are appropriate for the class. The logger creation usually happens before the constructor. I think all of your other classes followed this format, so the one or two that didn't kind of jumped out at me.

@cgivre I believe the style issues are all fixed. The build did not get any codestyle issues.
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804572#comment-17804572 ]

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1882408865

> @cgivre, the `.asf.yaml` file you mentioned has lots of metadata, but does not actually prevent a force push. Perhaps we are missing something? It would generally be a good idea to forbid such things to prevent catastrophic mistakes.

Oh that's interesting. Something's changed since I last went through this with @vvysotskyi to do something that could only be done on master, perhaps testing of the automatic snapshot artifact publishing, which requires access to GitHub Actions secrets.

> Incomplete support for non-projected complex vectors
> ----------------------------------------------------
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.21.1
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.21.2
>
> The `ResultSetLoader` implementation supports all of Drill's vector types.
> However, DRILL-8188 discovered holes in support for non-projected vectors.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804571#comment-17804571 ]

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882403741

I see Git's "patch contents already upstream" feature doesn't automatically clean up the unwanted commits. I've dropped them manually in a new branch in my fork and now suggest

```
git reset --hard origin/master
git pull --rebase https://github.com/jnturton/drill.git 8188-hdf5-evf2
git push --force # to luocooong's fork
```
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804568#comment-17804568 ]

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882389703

> @paul-rogers I attempted to fix. I kind of suck at git, so I think it's more or less correct now, but there was probably a better way to do this.

I think you still want something like

```
git pull --rebase upstream master
git push --force-with-lease
```
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804563#comment-17804563 ]

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882377677

@paul-rogers I attempted to fix. I kind of suck at git, so I think it's more or less correct now, but there was probably a better way to do this.
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804552#comment-17804552 ]

ASF GitHub Bot commented on DRILL-8188:
---

paul-rogers commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882246203

It seems you did this work on top of the master with my unsquashed commits. When you try to push, those commits come along for the ride. I think you should grab the latest master, then rebase your branch on it.

Plan B is to a) grab the latest master, and b) create a new branch that cherry-picks the commit(s) you meant to add.

If even this doesn't work, then I'll clean up this branch for you since I created the mess in the first place...
[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2
[ https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804547#comment-17804547 ]

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882168615

I think I hosed the version control somehow. This PR should only modify a few files in the HDF5 reader.

> Convert HDF5 format to EVF2
> Key: DRILL-8188
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804546#comment-17804546 ]

ASF GitHub Bot commented on DRILL-8375:
---

cgivre commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1882148031

> I successfully squashed the commits, and provided a proper commit message, while preserving the later commit. Did a force push to master to rewrite history.
>
> You should update your own master to pick up the revised history.

You are a braver man than I.

> @cgivre, the `.asf.yaml` file you mentioned has lots of metadata, but does not actually prevent a force push. Perhaps we are missing something? It would generally be a good idea to forbid such things to prevent catastrophic mistakes.

Thanks for flagging... I'm not sure how to do that, but I'll investigate.

> Incomplete support for non-projected complex vectors
> ---
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
> Issue Type: Bug
> Affects Versions: 1.21.1
> Reporter: Paul Rogers
> Assignee: Paul Rogers
> Priority: Major
> Fix For: 1.21.2
>
> The `ResultSetLoader` implementation supports all of Drill's vector types.
> However, DRILL-8188 discovered holes in support for non-projected vectors.
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804507#comment-17804507 ]

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1881965757

I successfully squashed the commits, and provided a proper commit message, while preserving the later commit. Did a force push to master to rewrite history.

You should update your own master to pick up the revised history.

@cgivre, the `.asf.yaml` file you mentioned has lots of metadata, but does not actually prevent a force push. Perhaps we are missing something? It would generally be a good idea to forbid such things to prevent catastrophic mistakes.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
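The squash-and-force-push described here can be sketched in a scratch repository. Names and messages below are invented, and the final `git push --force-with-lease` is left as a comment because it needs a real remote; `git reset --soft` collapses the history while leaving the final tree untouched.

```shell
# Sketch: collapse two WIP commits into their parent before publishing
# rewritten history. Repository and commit names are illustrative.
set -e
scratch=$(mktemp -d)
cd "$scratch"
git init -q repo
cd repo
git config user.email dev@example.com
git config user.name "Dev"

echo 1 > work.txt
git add work.txt && git commit -qm "DRILL-8375: support non-projected complex vectors"
echo 2 >> work.txt && git commit -qam "WIP"
echo 3 >> work.txt && git commit -qam "WIP: review comments"

# Move HEAD back two commits but keep the final tree staged, then fold it
# into a single commit with a proper message
git reset -q --soft HEAD~2
git commit -q --amend -m "DRILL-8375: support non-projected complex vectors (squashed)"
git log --oneline

# Publishing the rewritten history would then need (not run here):
#   git push --force-with-lease origin master
```

`--force-with-lease` refuses the push if the remote has moved since the last fetch, which limits the blast radius of exactly the kind of mistake being discussed.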
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804317#comment-17804317 ]

ASF GitHub Bot commented on DRILL-8375:
---

cgivre commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444727956

## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java ##

[quoted review context: the Apache license header, imports, and the declaration of the new test class `TestResultSetLoaderUnprojected`, whose javadoc reads "Verify the correct functioning of the 'dummy' columns created for unprojected columns."]

Review Comment:
Just to be clear... I was just saying that if this is a major headache and you don't want to deal with it, my vote is to leave it alone. If it isn't a big headache and you want to, I have no issues there as well.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804276#comment-17804276 ]

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444618000

## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java ##

[quoted review context: license header, imports, and the declaration of test class `TestResultSetLoaderUnprojected`]

Review Comment:
@paul-rogers, @cgivre [commented that he's in favour of leaving master as is](https://github.com/apache/drill/pull/2866#issuecomment-1880409413) and I've since merged [a commit on top](https://github.com/apache/drill/commit/f5fb7f5a4023651252afb1f907311d71840eb144). I do think it would still be feasible for us to go back and squash (for exactly the reason you give), but at this point we could also just leave it where it is?

P.S. The process is a little laborious: a conventional commit switching off master branch protection in .asf.yaml, then the force push doing the clean-up and switching master branch protection back on, the latter being achievable in the same breath by simply dropping the switch-off commit.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
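For reference, branch protection settings live under the `github` key of `.asf.yaml`. The fragment below is only a sketch of the idea; the exact keys should be checked against the ASF Infrastructure `.asf.yaml` documentation rather than taken from here.

```yaml
# .asf.yaml (illustrative fragment, schema not guaranteed):
# the laborious dance is one commit deleting this block, a force push
# while it is gone, and the block reappearing when that commit is dropped.
github:
  protected_branches:
    master:
      required_pull_request_reviews:
        required_approving_review_count: 1
```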
[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1
[ https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804258#comment-17804258 ]

ASF GitHub Bot commented on DRILL-8415:
---

jnturton merged PR #2866:
URL: https://github.com/apache/drill/pull/2866

> Upgrade Jackson 2.14.3 → 2.16.1
> ---
>
> Key: DRILL-8415
> Project: Apache Drill
> Issue Type: Improvement
> Affects Versions: 1.21.1
> Reporter: PJ Fanning
> Priority: Major
> Fix For: 1.22.0
>
> I'm not advocating for an upgrade to [Jackson 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security focused enhancements, including a new class called StreamReadConstraints. The defaults on [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html] are pretty high, but it is not inconceivable that some Drill users might need to relax them. Parsing large strings as numbers is worse than linear, thus the default limit of 1000 chars or bytes (depending on input context).
> When the Drill team consider upgrading to Jackson 2.15 or above, you might also want to consider adding some way for users to configure the StreamReadConstraints.
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804150#comment-17804150 ]

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444244003

## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java ##

[quoted review context: license header, imports, and the declaration of test class `TestResultSetLoaderUnprojected`]

Review Comment:
My bad. My other project likes to leave these in master; I forgot Drill does not.

Since there is not much activity, I can squash the commits within the master branch and do a force push. Normally that is a big NO NO in active projects, but it should not actually cause problems here. I'll go ahead and do that tomorrow unless anyone objects.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804151#comment-17804151 ]

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1880502847

Backporting should be safe: as safe as having the change in master itself.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804137#comment-17804137 ]

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1880450409

I think we can regard this as a bug fix to framework code already present in 1.21 and therefore backport it.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804123#comment-17804123 ]

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444185697

## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java ##

[quoted review context: license header, imports, and the declaration of test class `TestResultSetLoaderUnprojected`]

Review Comment:
P.S. I'm happy to live with the WIP commits in master too, just asking what folks think.

> Incomplete support for non-projected complex vectors
> Key: DRILL-8375
[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1
[ https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804122#comment-17804122 ]

ASF GitHub Bot commented on DRILL-8415:
---

cgivre commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880409413

> I haven't rebased this yet in case we decide to squash the WIP commits that were merged into master. Once a decision is made either way this can be rebased and a CI run obtained.

I'm fine with leaving the WIP commits as long as we don't make a habit out of it. It's probably more of a hassle to undo the PR, squash the commits and re-merge them.

> Upgrade Jackson 2.14.3 → 2.16.1
> Key: DRILL-8415
[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1
[ https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804121#comment-17804121 ]

ASF GitHub Bot commented on DRILL-8415:
---

jnturton commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880408041

I haven't rebased this yet in case we decide to squash the WIP commits that were merged into master. Once a decision is made either way, this can be rebased and a CI run obtained.

> Upgrade Jackson 2.14.3 → 2.16.1
> Key: DRILL-8415
[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1
[ https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804120#comment-17804120 ]

ASF GitHub Bot commented on DRILL-8415:
---

cgivre commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880407190

@jnturton This looks good, however there is a merge conflict. Can you please resolve it so that we can run the CI?

> Upgrade Jackson 2.14.3 → 2.16.1
> Key: DRILL-8415
[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1
[ https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804119#comment-17804119 ]

ASF GitHub Bot commented on DRILL-8415:
---

jnturton commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880406941

I started adding configuration support for the new StreamReadConstraints, first globally and then just in the JSON reader, but I got stopped by a sense of YAGNI. It's hard to imagine someone who will need something beyond the default values in Jackson, and more configuration is more complexity that users must contend with. So my opinion at this point is that we should only add that configurability if someone asks for it...

> Upgrade Jackson 2.14.3 → 2.16.1
> Key: DRILL-8415
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804117#comment-17804117 ] ASF GitHub Bot commented on DRILL-8375: --- cgivre commented on code in PR #2867: URL: https://github.com/apache/drill/pull/2867#discussion_r1444180941 ## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java: ## @@ -0,0 +1,455 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.drill.exec.physical.resultSet.impl; + +import static org.junit.Assert.assertEquals; +import static org.junit.Assert.assertFalse; +import static org.junit.Assert.assertTrue; + +import java.util.List; + +import org.apache.drill.common.expression.SchemaPath; +import org.apache.drill.common.types.TypeProtos.MinorType; +import org.apache.drill.exec.physical.resultSet.ResultSetLoader; +import org.apache.drill.exec.physical.resultSet.RowSetLoader; +import org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions; +import org.apache.drill.exec.physical.resultSet.project.Projections; +import org.apache.drill.exec.physical.rowSet.RowSet; +import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet; +import org.apache.drill.exec.physical.rowSet.RowSetTestUtils; +import org.apache.drill.exec.record.metadata.SchemaBuilder; +import org.apache.drill.exec.record.metadata.TupleMetadata; +import org.apache.drill.exec.vector.accessor.ArrayWriter; +import org.apache.drill.exec.vector.accessor.ScalarWriter; +import org.apache.drill.exec.vector.accessor.TupleWriter; +import org.apache.drill.test.SubOperatorTest; +import org.apache.drill.test.rowSet.RowSetUtilities; +import org.junit.Test; + +/** + * Verify the correct functioning of the "dummy" columns created + * for unprojected columns. + */ +public class TestResultSetLoaderUnprojected extends SubOperatorTest { Review Comment: > Thanks for this Paul! We must remember to squash when merging, we got the WIP commits from the feature branch into master. We've all done something similar at some point > Incomplete support for non-projected complex vectors > > > Key: DRILL-8375 > URL: https://issues.apache.org/jira/browse/DRILL-8375 > Project: Apache Drill > Issue Type: Bug >Reporter: Paul Rogers >Assignee: Paul Rogers >Priority: Major > > The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.
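For context on the test class under review: the "dummy" columns it verifies follow a null-object pattern, where an unprojected column still receives a writer, but one that silently discards values, so row-loading code can write every column unconditionally. The stdlib-only sketch below uses invented names (`ColumnWriter`, `RealWriter`, `DummyWriter`), not Drill's actual accessor classes:

```java
import java.util.ArrayList;
import java.util.List;

// Null-object pattern sketch: unprojected columns get a writer that discards
// values, so row-loading code can write every column unconditionally.
// All names are invented for illustration; they are not Drill's real classes.
public class DummyWriterSketch {

  public interface ColumnWriter {
    void setInt(int value);
  }

  // Backs a projected column: values land in the backing "vector".
  public static final class RealWriter implements ColumnWriter {
    public final List<Integer> vector = new ArrayList<>();
    @Override public void setInt(int value) { vector.add(value); }
  }

  // Backs an unprojected column: same interface, no storage.
  public static final class DummyWriter implements ColumnWriter {
    @Override public void setInt(int value) { /* not projected: drop it */ }
  }

  // A reader loop writes both columns without checking projection status.
  public static int loadRows(ColumnWriter projected, ColumnWriter unprojected) {
    for (int i = 0; i < 3; i++) {
      projected.setInt(i);
      unprojected.setInt(i);
    }
    return 3;
  }

  public static void main(String[] args) {
    RealWriter projected = new RealWriter();
    loadRows(projected, new DummyWriter());
    System.out.println(projected.vector); // [0, 1, 2]
  }
}
```

The holes DRILL-8188 found were cases (complex and UNION vectors) where this dummy path was incomplete, which is what the PR's test coverage exercises.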
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804116#comment-17804116 ] ASF GitHub Bot commented on DRILL-8375: --- jnturton commented on code in PR #2867: URL: https://github.com/apache/drill/pull/2867#discussion_r1444180526 ## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java: Review Comment: Thanks for this Paul! We must remember to squash when merging, we got the WIP commits from the feature branch into master. We've all done something similar at some point.
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804074#comment-17804074 ] ASF GitHub Bot commented on DRILL-8375: --- paul-rogers commented on code in PR #2867: URL: https://github.com/apache/drill/pull/2867#discussion_r1444117648 ## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java: Review Comment: The UNION type has been discussed for as long as I've been involved in the project: since 2016. The idea is simple: Drill should be able to read any kind of JSON, and UNION (plus LIST, etc.) have been essential for this. The problem, as we've also discussed for a long time, is that UNION barely works, and JDBC and similar clients can't make sense of it. That is, UNION turned out to be the wrong solution to the problem. Still, there is always the hope that UNION can be made to work.
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804050#comment-17804050 ] ASF GitHub Bot commented on DRILL-8375: --- luocooong commented on code in PR #2867: URL: https://github.com/apache/drill/pull/2867#discussion_r1444101786 ## exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java: Review Comment: This test case provides a good usage guide. In addition, will it be possible for us to remove Union completely from Drill's data types in the future?
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803993#comment-17803993 ] ASF GitHub Bot commented on DRILL-8375: --- paul-rogers merged PR #2867: URL: https://github.com/apache/drill/pull/2867
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803992#comment-17803992 ] ASF GitHub Bot commented on DRILL-8375: --- paul-rogers commented on PR #2867: URL: https://github.com/apache/drill/pull/2867#issuecomment-1880146814 Thanks @cgivre for the comments and review. @luocooong, I'll commit this. When convenient, please see if this addresses the issue you raised long ago. Otherwise, these capabilities are available for the next person who is seduced into trying out the UNION-based types.
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803984#comment-17803984 ] ASF GitHub Bot commented on DRILL-8474: --- cgivre commented on PR #2836: URL: https://github.com/apache/drill/pull/2836#issuecomment-1880110452
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803983#comment-17803983 ] ASF GitHub Bot commented on DRILL-8474: --- cgivre commented on PR #2836: URL: https://github.com/apache/drill/pull/2836#issuecomment-1880109717 @mbeckerle With respect to style, I tried to reply to that comment, but the thread won't let me. In any event, Drill classes will typically start with the constructor, then have whatever methods are appropriate for the class. The logger creation usually happens before the constructor. I think all of your other classes followed this format, so the one or two that didn't kind of jumped out at me. > Add Daffodil Format Plugin > -- > > Key: DRILL-8474 > URL: https://issues.apache.org/jira/browse/DRILL-8474 > Project: Apache Drill > Issue Type: New Feature >Affects Versions: 1.21.1 >Reporter: Charles Givre >Priority: Major > Fix For: 1.22.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors
[ https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803893#comment-17803893 ] ASF GitHub Bot commented on DRILL-8375: --- cgivre commented on PR #2867: URL: https://github.com/apache/drill/pull/2867#issuecomment-1879917039 Once we merge this, we should also rebase https://github.com/apache/drill/pull/2515 on the current master and merge that as well.
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803594#comment-17803594 ] ASF GitHub Bot commented on DRILL-8474: --- mbeckerle commented on code in PR #2836: URL: https://github.com/apache/drill/pull/2836#discussion_r1442993784 ## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java: + /** + * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share a base class so we + * have a stack of MapBuilders, and when empty we use the SchemaBuilder Review Comment: Note that this awkwardness effectively doubles the code size of things that interface to Drill. This duplication of similar behavior for schema and map builders (and rowWriters and mapWriters) is expected and typical of systems that start from a tabular view of the data world and later add the features needed for hierarchical data. Nevertheless it is awkward when one is dealing entirely with hierarchical data. A MetaBuilder that does the map thing if the builder is a map, and the schema thing if the builder is a schema would eliminate this. This could be an interface mixed into both SchemaBuilder and MapBuilder (could also be called MapBuilderLike). The same discontinuity at the base holds for RowWriter vs. MapWriter in the runtime handling of data. Again it doubles the code size/complexity, every fix goes in 2 places, etc. A MapWriterLike interface could be factored out. Maybe we should build such mechanisms to avoid this, and then use them to improve this Daffodil plugin? ## contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaUtils.java:
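The "MapBuilderLike" idea raised in the review can be sketched as a small common interface that both the top-level builder and the nested map builder implement, so visitor code is written once against the shared type. All names below are hypothetical stand-ins, not Drill's actual `SchemaBuilder`/`MapBuilder` API:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a common interface for the two builders discussed in the review.
// All names are hypothetical, not Drill's SchemaBuilder/MapBuilder.
public class MapBuilderLikeSketch {

  public interface MapBuilderLike {
    MapBuilderLike addColumn(String name, String type);
    List<String> columns();
  }

  // Stand-in for the top-level (row/schema) builder.
  public static final class RowBuilder implements MapBuilderLike {
    private final List<String> cols = new ArrayList<>();
    @Override public MapBuilderLike addColumn(String name, String type) {
      cols.add(name + ":" + type);
      return this;
    }
    @Override public List<String> columns() { return cols; }
  }

  // Stand-in for a nested map builder; identical client-facing behavior.
  public static final class NestedMapBuilder implements MapBuilderLike {
    private final List<String> cols = new ArrayList<>();
    @Override public MapBuilderLike addColumn(String name, String type) {
      cols.add(name + ":" + type);
      return this;
    }
    @Override public List<String> columns() { return cols; }
  }

  // Visitor code no longer branches on "am I at the schema root or in a map".
  public static void declareElement(MapBuilderLike builder, String name) {
    builder.addColumn(name, "VARCHAR");
  }

  public static void main(String[] args) {
    RowBuilder root = new RowBuilder();
    declareElement(root, "a");
    declareElement(new NestedMapBuilder(), "b"); // same call shape for nested maps
    System.out.println(root.columns()); // [a:VARCHAR]
  }
}
```

With such an interface, the schema visitor would no longer need a stack of `MapBuilder`s with a special case for the empty-stack/`SchemaBuilder` state; the same refactoring shape would apply to a `MapWriterLike` on the runtime side.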
[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin
[ https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803592#comment-17803592 ] ASF GitHub Bot commented on DRILL-8474: --- mbeckerle commented on PR #2836: URL: https://github.com/apache/drill/pull/2836#issuecomment-1878896878
> @mbeckerle I had a thought about your TODO list. See inline.
> > This is ready for a next review. All the scalar types are now implemented with typed setter calls.
> > The prior review comments have all been addressed I believe.
> > Remaining things to do include:
> > 1. How to get the compiled DFDL schema object so it can be loaded by daffodil out at the distributed Drill nodes.
> I was thinking about this and I remembered something that might be useful. Drill has support for User Defined Functions (UDF) which are written in Java. To add a UDF to Drill, you also have to write some Java classes in a particular way, and include the JARs. Much like the DFDL class files, the UDF JARs must be accessible to all nodes of a Drill cluster.
> Additionally, Drill has the capability of adding UDFs dynamically. This feature was added here: #574. Anyway, I wonder if we could use a similar mechanism to load and store the DFDL files so that they are accessible to all Drill nodes. What do you think?

Excellent: So Drill has all the machinery, it's just a question of repackaging it so it's available for this usage pattern, which is a bit different from Drill's UDFs, but also very similar.

There are two user scenarios, which we can call production and test.

1. Production: binary compiled DFDL schema file + code jars for Daffodil's own UDFs and "layers" plugins. This should, ideally, cache the compiled schema and not reload it for every query (at every node), but keep the same loaded instance in memory in a persistent JVM image on each node. For large production DFDL schemas this is the only sensible mechanism, as it can take minutes to compile large DFDL schemas.
2. Test: on-the-fly centralized compilation of the DFDL schema (from a combination of jars and files) to create and cache (to avoid recompiling) the binary compiled DFDL schema file, then using that compiled binary file as in item 1. For small DFDL schemas this can be fast enough for production use. Ideally, if the DFDL schema is unchanged this would reuse the compiled binary file, but that's an optimization that may not matter much.

Kinds of objects involved are:

- Daffodil plugin code jars
- DFDL schema jars
- DFDL schema files (just not packaged into a jar)
- Daffodil compiled schema binary file
- Daffodil config file - parameters, tunables, and options needed at compile time and/or runtime

Code jars: Daffodil provides two extension features for DFDL users - DFDL UDFs and DFDL 'layers' (ex: plug-ins for uudecode, or gunzip algorithms used in part of the data format). Those are ordinary compiled class files in jars, so in all scenarios those jars are needed on the node class path if the DFDL schema uses them. Daffodil dynamically finds and loads these from the classpath via the regular Java Service-Provider Interface (SPI) mechanisms.

Schema jars: Daffodil packages DFDL schema files (source files, i.e., mySchema.dfdl.xsd) into jar files to allow inter-schema dependencies to be managed using ordinary jar/java-style managed dependencies. Tools like sbt and maven can express the dependencies of one schema on another, grab and pull them together, etc. Daffodil has a resolver, so when one schema file references another with include/import it searches the class path directories and jars for the files. Schema jars are only needed centrally when compiling the schema to a binary file. All references to the jar files for inter-schema file references are compiled into the compiled binary file. It is possible for one DFDL schema 'project' to define a DFDL schema, along with the code for a plugin like a Daffodil UDF or layer.
In that case the one jar created is both a code jar and a schema jar. The schema jar aspects are used when the schema is compiled and ignored at Daffodil runtime. The code jar aspects are used at Daffodil run time and ignored at schema compilation time. So such a jar that is both code and schema jar needs to be on the class path in both places, but there's no interaction of the two things. Binary Compiled Schema File: Centrally, DFDL schemas in files and/or jars are compiled to create a single binary object which can be reloaded in order to actually use the schema to parse/unparse data. - These binary files are tied to a specific version+build of Daffodil. (They are just a java object serialization of the runtime data structures used by Daffodil). - Once reloaded into a JVM to create a Daffodil DataProcessor object, t
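The "keep the same loaded instance in memory" behavior described in scenario 1 could be sketched as a node-local cache keyed by a digest of the compiled schema file's bytes, so repeat queries reuse one in-memory instance while a recompiled schema (different bytes) gets a fresh entry. This is a minimal hypothetical sketch, not Drill or Daffodil code: `SchemaCache` and the placeholder `Object` value (standing in for a reloaded Daffodil `DataProcessor`) are invented names for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of a per-node cache for reloaded compiled DFDL schemas.
// The Object value stands in for Daffodil's reloaded DataProcessor; a real
// implementation would deserialize the binary schema file instead of the stub.
public class SchemaCache {
    static final ConcurrentHashMap<String, Object> CACHE = new ConcurrentHashMap<>();

    // Key the cache by a SHA-256 digest of the binary file's bytes, so a
    // recompiled schema (new bytes) gets a fresh entry while repeat queries
    // with the same bytes reuse one entry.
    static String digest(byte[] schemaBytes) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            StringBuilder sb = new StringBuilder();
            for (byte b : md.digest(schemaBytes)) sb.append(String.format("%02x", b));
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    // Load once per JVM; subsequent calls with the same bytes return the same
    // in-memory instance, avoiding a potentially minutes-long reload.
    static Object loadOrReuse(byte[] schemaBytes) {
        return CACHE.computeIfAbsent(digest(schemaBytes), k -> new Object());
    }

    public static void main(String[] args) {
        byte[] bytes = "compiled-schema-v1".getBytes(StandardCharsets.UTF_8);
        Object first = loadOrReuse(bytes);
        Object second = loadOrReuse(bytes);
        System.out.println(first == second); // prints true: same cached instance
    }
}
```

Because the key is content-derived rather than a file path, the cache also gives the "reuse if unchanged" optimization mentioned for scenario 2 for free.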
[jira] [Commented] (DRILL-8465) Check data input when loading iceberg data
[ https://issues.apache.org/jira/browse/DRILL-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803578#comment-17803578 ] ASF GitHub Bot commented on DRILL-8465: --- jnturton commented on PR #2853: URL: https://github.com/apache/drill/pull/2853#issuecomment-1878847019

Got it @pjfanning. Let's discuss further in the right forum.

> Check data input when loading iceberg data
> ------------------------------------------
>
>                 Key: DRILL-8465
>                 URL: https://issues.apache.org/jira/browse/DRILL-8465
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Security, Storage - Iceberg
>    Affects Versions: 1.21.1
>            Reporter: PJ Fanning
>            Assignee: PJ Fanning
>            Priority: Major
>             Fix For: 1.21.2

-- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (DRILL-8465) Check data input when loading iceberg data
[ https://issues.apache.org/jira/browse/DRILL-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803514#comment-17803514 ] ASF GitHub Bot commented on DRILL-8465: --- pjfanning commented on PR #2853: URL: https://github.com/apache/drill/pull/2853#issuecomment-1878512092

The short background to this is in this link: https://lists.apache.org/thread/vpjz467rg8449m63v1n9nl3o56twwyzt (a private thread requiring ASF login). I'm no expert on Iceberg or the Drill Iceberg plugin, but I was hoping to engage with someone who knows more about how they work and to get an understanding of whether we need some constraints. Due to the security aspect of this, I'm not too comfortable going into more detail here.
[jira] [Commented] (DRILL-8465) Check data input when loading iceberg data
[ https://issues.apache.org/jira/browse/DRILL-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803477#comment-17803477 ] ASF GitHub Bot commented on DRILL-8465: --- jnturton commented on PR #2853: URL: https://github.com/apache/drill/pull/2853#issuecomment-1878400693

I've started looking at this. First question: if we're adding dynamically loaded class checks to protect against untrusted code, then is checking the package name worth much? Or do we need to do something like verify signatures against a list of trusted keys?

Second question: if this is about security, then is the code we're loading actually untrusted, or is it only ever loaded from serialisations that we produced ourselves (e.g. in IcebergWorkSerializer)?

P.S. Please include the "why we're doing this" background that I'm lacking in the Jira issue when it's nontrivial.

> Check data input when loading iceberg data
> ------------------------------------------
>
>                 Key: DRILL-8465
>                 URL: https://issues.apache.org/jira/browse/DRILL-8465
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Storage - Iceberg
>            Reporter: PJ Fanning
>            Priority: Major
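For concreteness, the package-name check being questioned above would look something like this minimal sketch. It is hypothetical, not the actual Drill Iceberg plugin code: `ClassAllowlist`, `loadChecked`, and the prefix list are invented for illustration (`java.util.` is included only so the demo can load a real class).

```java
import java.util.List;

// Hypothetical sketch: permit dynamic class loading only for an allowlist of
// trusted package prefixes, rejecting everything else before Class.forName.
public class ClassAllowlist {
    // Illustrative prefixes; a real plugin would choose its own list.
    static final List<String> TRUSTED_PREFIXES =
            List.of("org.apache.iceberg.", "org.apache.drill.", "java.util.");

    static Class<?> loadChecked(String className) throws ClassNotFoundException {
        boolean trusted = TRUSTED_PREFIXES.stream().anyMatch(className::startsWith);
        if (!trusted) {
            // Fail closed: refuse to load classes outside trusted packages.
            throw new SecurityException("refusing to load untrusted class: " + className);
        }
        return Class.forName(className);
    }

    public static void main(String[] args) throws Exception {
        System.out.println(loadChecked("java.util.ArrayList").getName()); // prints java.util.ArrayList
        try {
            loadChecked("com.evil.Payload");
        } catch (SecurityException e) {
            System.out.println("blocked: " + e.getMessage());
        }
    }
}
```

The sketch also illustrates the weakness raised in the first question: a name-prefix check only constrains what a class is called, not where its bytecode comes from, so anything already on the classpath under a trusted package name passes. Verifying jar signatures against trusted keys, as suggested, would bind the check to the code's origin instead.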