[jira] [Commented] (DRILL-8480) Cleanup before finished. 0 out of 1 streams have finished

2024-03-28 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831903#comment-17831903
 ] 

ASF GitHub Bot commented on DRILL-8480:
---

rymarm opened a new pull request, #2897:
URL: https://github.com/apache/drill/pull/2897

   
   
   # [DRILL-8480](https://issues.apache.org/jira/browse/DRILL-8480): Make 
Nested Loop Join operator properly process empty batches and batches with new 
schema
   
   ## Description
   The Nested Loop Join operator (`NestedLoopJoinBatch`, `NestedLoopJoin`) 
improperly handles the batch iteration outcome `OK` when the batch carries 0 
records. Drill's design for batch processing involves 5 outcomes:
   * `NONE` (the batch can only have 0 records)
   * `OK` (the batch can have 0 or more records)
   * `OK_NEW_SCHEMA` (the batch can have 0 or more records)
   * `NOT_YET` (undefined)
   * `EMIT` (the batch can have 0 or more records)
   
   Under some circumstances the Nested Loop Join operator receives an `OK` 
outcome with 0 records and, instead of requesting the next batch, stops 
processing and returns the `NONE` outcome to the upstream batches (operators) 
without freeing the resources of the underlying batches.
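   
   A minimal sketch of the intended handling (hypothetical helper names, not 
the actual `NestedLoopJoinBatch` source): an `OK` outcome that carries 0 
records should make the operator request the next batch rather than terminate.
   
   ```java
   // Sketch only: keep polling on empty OK batches instead of returning NONE
   // early, which strands the buffers still held by the underlying batches.
   private IterOutcome fetchNonEmptyRight(RecordBatch right) {
     IterOutcome outcome = next(right);
     while (outcome == IterOutcome.OK && right.getRecordCount() == 0) {
       outcome = next(right); // empty batch: ask upstream for the next one
     }
     return outcome; // NONE (or an error) is the only valid reason to stop
   }
   ```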
   
   
   ## Documentation
   -
   
   ## Testing
   Manual testing with a file from the Jira ticket 
[DRILL-8480](https://issues.apache.org/jira/browse/DRILL-8480)
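   
   A hypothetical automated check for the same path, assuming Drill's 
`ClusterTest`/`queryBuilder` test framework; the non-equi join condition and 
the column name are assumptions chosen only to force a nested loop join plan:
   
   ```java
   @Test // sketch, not part of this PR
   public void testNestedLoopJoinEmptyBatches() throws Exception {
     // A non-equi join is typically planned as a nested loop join; the query
     // must complete without "IllegalStateException: Cleanup before finished".
     String sql = "SELECT t1.* FROM dfs.`tableWithNumber2.parquet` t1 "
         + "INNER JOIN dfs.`tableWithNumber2.parquet` t2 ON t1.id < t2.id";
     queryBuilder().sql(sql).run();
   }
   ```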




> Cleanup before finished. 0 out of 1 streams have finished
> -
>
> Key: DRILL-8480
> URL: https://issues.apache.org/jira/browse/DRILL-8480
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Maksym Rymar
>Assignee: Maksym Rymar
>Priority: Major
> Attachments: 1a349ff1-d1f9-62bf-ed8c-26346c548005.sys.drill, 
> tableWithNumber2.parquet
>
>
> Drill fails to execute a query with the following exception:
> {code:java}
> org.apache.drill.common.exceptions.UserException: SYSTEM ERROR: 
> IllegalStateException: Cleanup before finished. 0 out of 1 streams have 
> finished
> Fragment: 1:0
> Please, refer to logs for more information.
> [Error Id: 270da8f4-0bb6-4985-bf4f-34853138881c on 
> compute7.vmcluster.com:31010]
>         at 
> org.apache.drill.common.exceptions.UserException$Builder.build(UserException.java:657)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.sendFinalState(FragmentExecutor.java:395)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:245)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.run(FragmentExecutor.java:362)
>         at 
> org.apache.drill.common.SelfCleaningRunnable.run(SelfCleaningRunnable.java:38)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>         at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>         at java.base/java.lang.Thread.run(Thread.java:829)
> Caused by: java.lang.IllegalStateException: Cleanup before finished. 0 out of 
> 1 streams have finished
>         at 
> org.apache.drill.exec.work.batch.BaseRawBatchBuffer.close(BaseRawBatchBuffer.java:111)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:71)
>         at 
> org.apache.drill.exec.work.batch.AbstractDataCollector.close(AbstractDataCollector.java:121)
>         at 
> org.apache.drill.common.AutoCloseables.close(AutoCloseables.java:91)
>         at 
> org.apache.drill.exec.work.batch.IncomingBuffers.close(IncomingBuffers.java:144)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.java:581)
>         at 
> org.apache.drill.exec.ops.FragmentContextImpl.close(FragmentContextImpl.java:567)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.closeOutResources(FragmentExecutor.java:417)
>         at 
> org.apache.drill.exec.work.fragment.FragmentExecutor.cleanup(FragmentExecutor.java:240)
>         ... 5 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Cleanup before finished. 
> 0 out of 1 streams have finished
>                 ... 15 common frames omitted
>         Suppressed: java.lang.IllegalStateException: Memory was leaked by 
> query. Memory leaked: (32768)
> Allocator(op:1:0:8:UnorderedReceiver) 100/32768/32768/100 
> (res/actual/peak/limit)
>                 at 
> org.apache.drill.exec.memory.BaseAllocator.close(BaseAllocator.java:519)
>                 at 
> org.apache.drill.exec.ops.BaseOperatorContext.close(BaseOperatorContext.java:159)
>                 at 
> org.apache.drill.exec.ops.OperatorContextImpl.close(OperatorContextImpl.java:77)
>                 at 
> org.apache.drill.exec.ops.FragmentContextImpl.suppressingClose(FragmentContextImpl.ja

[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831155#comment-17831155
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on PR #2889:
URL: https://github.com/apache/drill/pull/2889#issuecomment-2021914479

   > @shfshihuafeng Can you please resolve merge conflicts.
   
   It is done.




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830942#comment-17830942
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

cgivre commented on PR #2889:
URL: https://github.com/apache/drill/pull/2889#issuecomment-2020523775

   @shfshihuafeng Can you please resolve merge conflicts.




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an OOM exception when reading data from an InputStream

2024-03-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830330#comment-17830330
 ] 

ASF GitHub Bot commented on DRILL-8485:
---

shfshihuafeng commented on PR #2891:
URL: https://github.com/apache/drill/pull/2891#issuecomment-2017129746

   > LGTM +1 Thanks @shfshihuafeng for all these memory leak fixes.
   
   I am honored to get your approval.




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> an InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.1
>
>
> When traversing fieldList while reading data from an InputStream, if an 
> exception is thrown partway through, the previously constructed vectors 
> cannot be released, which results in a memory leak.
> It is similar to DRILL-8484.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an OOM exception when reading data from an InputStream

2024-03-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830305#comment-17830305
 ] 

ASF GitHub Bot commented on DRILL-8485:
---

cgivre merged PR #2891:
URL: https://github.com/apache/drill/pull/2891




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> an InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.1
>
>
> When traversing fieldList while reading data from an InputStream, if an 
> exception is thrown partway through, the previously constructed vectors 
> cannot be released, which results in a memory leak.
> It is similar to DRILL-8484.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8485) HashJoinPOP memory leak is caused by an OOM exception when reading data from an InputStream

2024-03-24 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830261#comment-17830261
 ] 

ASF GitHub Bot commented on DRILL-8485:
---

shfshihuafeng opened a new pull request, #2891:
URL: https://github.com/apache/drill/pull/2891

   …n read data from InputStream
   
   # [DRILL-8485](https://issues.apache.org/jira/browse/DRILL-8485): 
HashJoinPOP memory leak is caused by an OOM exception when reading data from an 
InputStream
   
   ## Description
   
   It is similar to [DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484).
   
   **exception info** 
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 15364) due to memory limit 
(41943040). Current allocation: 4337664
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.memory.BaseAllocator.read(BaseAllocator.java:856)
   ```
   **Leak info**
   
   ```
   Allocator(frag:5:1) 500/100/27824128/40041943040 (res/actual/peak/limit)
     child allocators: 1
       Allocator(op:5:1:1:HashJoinPOP) 100/16384/22822912/41943040 (res/actual/peak/limit)
         child allocators: 0
         ledgers: 2
           ledger[442780] allocator: op:5:1:1:HashJoinPOP), isOwning: true, size: 8192, references: 2, life: 4486836603491..0, allocatorManager: [390894, life: 4486836601180..0] holds 4 buffers.
             DrillBuf[458469], udle: [390895 1024..8192]
   event log for: DrillBuf[458469]
   ```
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   The testing method for DRILL-8485 is similar to the one for 
[DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484): we can throw an 
exception in the method readVectors, as in the sketch below.
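   
   A minimal sketch of that cleanup pattern applied to a readVectors-style loop 
(names such as `readBuffer` are illustrative stand-ins, mirroring the 
DRILL-8484 diff quoted elsewhere in this thread):
   
   ```java
   // Sketch only: clear the vectors built before the failure so the allocator
   // does not report them as leaked when the fragment closes.
   List<ValueVector> vectorList = Lists.newArrayList();
   try {
     for (SerializedField metaData : fieldList) {
       ValueVector vector = TypeHelper.getNewVector(
           MaterializedField.create(metaData), allocator);
       // readBuffer is a hypothetical stand-in for the actual stream read,
       // which may throw OutOfMemoryException partway through the field list.
       vector.load(metaData, readBuffer(metaData));
       vectorList.add(vector);
     }
   } catch (OutOfMemoryException oom) {
     for (ValueVector v : vectorList) {
       v.clear(); // release buffers owned by the already-loaded vectors
     }
     throw UserException.memoryError(oom).build(logger);
   }
   ```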
   




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> an InputStream
> -
>
> Key: DRILL-8485
> URL: https://issues.apache.org/jira/browse/DRILL-8485
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.1
>
>
> When traversing fieldList while reading data from an InputStream, if an 
> exception is thrown partway through, the previously constructed vectors 
> cannot be released, which results in a memory leak.
> It is similar to DRILL-8484.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828185#comment-17828185
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);

Review Comment:
 When we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the requested memory exceeds maxAllocation (a 
parameter calculated by calling computeOperatorMemory), an exception is thrown, 
as in my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions.
   
   **throw exception code**
   ```
   public DrillBuf buffer(final int initialRequestSize, BufferManager manager) {
     assertOpen();
     // ... actualRequestSize = initialRequestSize rounded up (elided) ...
     AllocationOutcome outcome = allocateBytes(actualRequestSize);
     if (!outcome.isOk()) {
       throw new OutOfMemoryException(createErrorMsg(this, actualRequestSize, initialRequestSize));
     }
   ```
   **my test scenario**
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```
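   
   For intuition, a back-of-the-envelope sketch of how a per-operator cap of 
this kind can arise (all names and numbers are assumptions for illustration, 
not the actual computeOperatorMemory source): a fixed per-query budget split 
across buffered operators and minor fragments.
   
   ```java
   public class OperatorMemorySketch {
     public static void main(String[] args) {
       long maxQueryMemoryPerNode = 2L << 30; // hypothetical 2 GB per-query budget
       int bufferedOperators = 4;             // hash joins, sorts, ... (assumed)
       int minorFragments = 12;               // fragment width (assumed)
       long maxAllocation = maxQueryMemoryPerNode / (bufferedOperators * minorFragments);
       // ~44 MB: the same order as the 41943040-byte (40 MB) limit in the trace above.
       System.out.println(maxAllocation);
     }
   }
   ```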





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828184#comment-17828184
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
      final int dataLength = metaData.getBufferLength();
      final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);

Review Comment:
 When we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the requested memory exceeds maxAllocation (a 
parameter calculated by calling computeOperatorMemory), an exception is thrown, 
as in my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions.
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```



##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);

Review Comment:
 When we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the requested memory exceeds maxAllocation (a 
parameter calculated by calling computeOperatorMemory), an exception is thrown, 
as in my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions.
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when it had OutOfMe

[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828183#comment-17828183
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);

Review Comment:
 When we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the requested memory exceeds maxAllocation (a 
parameter calculated by calling computeOperatorMemory), an exception is thrown, 
as in my test below.
 The user can adjust the direct memory parameter (DRILL_MAX_DIRECT_MEMORY) or 
reduce concurrency based on actual conditions.
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828182#comment-17828182
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1529743261


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);

Review Comment:
   When we prepare to allocate memory using "allocator.buffer(dataLength)" for 
the HashJoinPOP allocator, and the requested memory exceeds maxAllocation (a 
parameter calculated by calling computeOperatorMemory), an exception is thrown, 
as in my test below. The user can adjust the direct memory parameter 
(DRILL_MAX_DIRECT_MEMORY) or reduce concurrency based on actual conditions.
   
   ```
   Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
allocate buffer of size 16384 (rounded from 14359) due to memory limit 
(41943040). Current allocation: 22583616
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
   at 
org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
   at 
org.apache.drill.exec.cache.VectorAccessibleSerializable.readFromStreamWithContainer(VectorAccessibleSerializable.java:172)
   ```





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17828026#comment-17828026
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

cgivre commented on code in PR #2889:
URL: https://github.com/apache/drill/pull/2889#discussion_r1528802856


##
exec/java-exec/src/main/java/org/apache/drill/exec/cache/VectorAccessibleSerializable.java:
##
@@ -155,12 +157,18 @@ public void readFromStreamWithContainer(VectorContainer myContainer, InputStream
     for (SerializedField metaData : fieldList) {
       final int dataLength = metaData.getBufferLength();
       final MaterializedField field = MaterializedField.create(metaData);
-      final DrillBuf buf = allocator.buffer(dataLength);
-      final ValueVector vector;
+      DrillBuf buf = null;
+      ValueVector vector = null;
       try {
+        buf = allocator.buffer(dataLength);
         buf.writeBytes(input, dataLength);
         vector = TypeHelper.getNewVector(field, allocator);
         vector.load(metaData, buf);
+      } catch (OutOfMemoryException oom) {
+        for (ValueVector valueVector : vectorList) {
+          valueVector.clear();
+        }
+        throw UserException.memoryError(oom).message("Allocator memory failed").build(logger);

Review Comment:
   Do we know what would cause an error like this?  If so what would the user 
need to do to fix this?





> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/22822912/41943040 
> (res/actual/peak/limit)
>           child allocators: 0
>           ledgers: 2
>             ledger[1882757] allocator: op:5:0:1:HashJoinPOP), isOwning: true, 
> size: 8192, references: 2, life: 16936270178816167..0, allocatorManager: 
> [1703465, life: 16936270178813617..0] holds 4 buffers.
>                 DrillBuf[2041995], udle: [1703441 0..957]{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8484) HashJoinPOP memory leak is caused by an OOM exception when reading data from a stream with a container

2024-03-18 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17827949#comment-17827949
 ] 

ASF GitHub Bot commented on DRILL-8484:
---

shfshihuafeng opened a new pull request, #2889:
URL: https://github.com/apache/drill/pull/2889

   …en read data from Stream with container
   
   # [DRILL-8484](https://issues.apache.org/jira/browse/DRILL-8484): 
HashJoinPOP memory leak is caused by an OOM exception when reading data from a 
stream with a container
   
   ## Description
   
   
   
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   You can add debugging code to reproduce this scenario as follows, or run the 
TPC-H test as in [DRILL-8483](https://github.com/apache/drill/pull/2888).
   **(1) debug code**
   ```
  public void readFromStreamWithContainer(VectorContainer myContainer, InputStream input) throws IOException {
    final VectorContainer container = new VectorContainer();
    final UserBitShared.RecordBatchDef batchDef = UserBitShared.RecordBatchDef.parseDelimitedFrom(input);
    recordCount = batchDef.getRecordCount();
    if (batchDef.hasCarriesTwoByteSelectionVector() && batchDef.getCarriesTwoByteSelectionVector()) {
      if (sv2 == null) {
        sv2 = new SelectionVector2(allocator);
      }
      sv2.allocateNew(recordCount * SelectionVector2.RECORD_SIZE);
      sv2.getBuffer().setBytes(0, input, recordCount * SelectionVector2.RECORD_SIZE);
      svMode = BatchSchema.SelectionVectorMode.TWO_BYTE;
    }
    final List<ValueVector> vectorList = Lists.newArrayList();
    final List<SerializedField> fieldList = batchDef.getFieldList();
    int i = 0;
    for (SerializedField metaData : fieldList) {
      i++;
      final int dataLength = metaData.getBufferLength();
      final MaterializedField field = MaterializedField.create(metaData);
      final DrillBuf buf = allocator.buffer(dataLength);
      ValueVector vector = null;
      try {
        buf.writeBytes(input, dataLength);
        vector = TypeHelper.getNewVector(field, allocator);
        if (i == 3) {
          // Injected fault: simulate an OOM on the third field.
          logger.warn("shf test memory except");
          throw new OutOfMemoryException("test memory except");
        }
        vector.load(metaData, buf);
      } catch (Exception e) {
        if (vectorList.size() > 0) {
          for (ValueVector valueVector : vectorList) {
            // Log the buffers that will leak; clearing is intentionally disabled.
            DrillBuf[] buffers = valueVector.getBuffers(false);
            logger.warn("shf leak buffers " + Arrays.asList(buffers));
            // valueVector.clear();
          }
        }
        throw e;
      } finally {
        buf.release();
      }
      vectorList.add(vector);
    }
   ```
   **(2) run the following SQL (TPC-H query 8)**
   
   ```
   select
   o_year,
   sum(case when nation = 'CHINA' then volume else 0 end) / sum(volume) as 
mkt_share
   from (
   select
   extract(year from o_orderdate) as o_year,
   l_extendedprice * 1.0 as volume,
   n2.n_name as nation
   from hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
hive.tpch1s.nation n2, hive.tpch1s.region
   where
   p_partkey = l_partkey
   and s_suppkey = l_suppkey
   and l_orderkey = o_orderkey
   and o_custkey = c_custkey
   and c_nationkey = n1.n_nationkey
   and n1.n_regionkey = r_regionkey
   and r_name = 'ASIA'
   and s_nationkey = n2.n_nationkey
   and o_orderdate between date '1995-01-01'
   and date '1996-12-31'
   and p_type = 'LARGE BRUSHED BRASS') as all_nations
   group by o_year
   order by o_year;
   ```
   **(3) you find a memory leak, but no SQL is running**
   
   (screenshot: https://github.com/apache/drill/assets/25974968/e716ab12-4eeb-4a69-9c0f-07664bcb80a4)
   




> HashJoinPOP memory leak is caused by an OOM exception when reading data from 
> a stream with a container
> -
>
> Key: DRILL-8484
> URL: https://issues.apache.org/jira/browse/DRILL-8484
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
>
> *Describe the bug*
> An OOM exception occurred when reading data from a stream with a container, 
> resulting in a HashJoinPOP memory leak.
> *To Reproduce*
> Prepare data for TPC-H scale factor 1, then:
>  # run 30 concurrent instances of TPC-H query 8
>  # set direct memory to 5 GB
>  # when an OutOfMemoryException occurs, stop all queries
>  # observe the memory leak
> *Leak info*
> {code:java}
>    Allocator(frag:5:0) 500/100/31067136/40041943040 
> (res/actual/peak/limit)
>       child allocators: 1
>         Allocator(op:5:0:1:HashJoinPOP) 100/16384/2282

[jira] [Commented] (DRILL-8483) SpilledRecordBatch memory leak when the program throws an exception while building a hash table

2024-03-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17824832#comment-17824832
 ] 

ASF GitHub Bot commented on DRILL-8483:
---

shfshihuafeng opened a new pull request, #2888:
URL: https://github.com/apache/drill/pull/2888

   …exception during the process of building a hash table (#2887)
   
   # [DRILL-8483](https://issues.apache.org/jira/browse/DRILL-8483): 
SpilledRecordBatch memory leak when the program throws an exception while 
building a hash table
   
   (Please replace `PR Title` with actual PR Title)
   
   ## Description
   
   While reading data from disk to build hash tables in memory, an exception 
thrown partway through results in a SpilledRecordBatch memory leak.
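   
   A minimal sketch of the cleanup pattern implied here (hypothetical names 
such as `readSpilledBatch`, not the actual patch): when building the in-memory 
hash table fails, the spilled batch read so far must be closed instead of 
being left allocated.
   
   ```java
   // Sketch only: free the SpilledRecordBatch when the build side fails.
   SpilledRecordBatch spilled = null;
   try {
     spilled = readSpilledBatch(spillFile); // stand-in for the spill reader
     buildHashTableFrom(spilled);           // may throw OutOfMemoryException
   } catch (OutOfMemoryException e) {
     if (spilled != null) {
       spilled.close(); // release the batch's buffers before propagating
     }
     throw e;
   }
   ```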
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   Prepare data for TPC-H scale factor 1, then:
   1. run 30 concurrent instances of TPC-H query 8
   2. set direct memory to 5 GB
   3. when an OutOfMemoryException occurs, stop all queries
   4. observe the memory leak
   
   Test script:
   
   ```
   random_sql(){
     #for i in `seq 1 3`
     while true
     do
       num=$((RANDOM%22+1))
       if [ -f $fileName ]; then
         echo "$fileName exists"
         exit 0
       else
         $drill_home/sqlline -u "jdbc:drill:zk=ip:2181/drillbits_shf" -f tpch_sql8.sql >> sql8.log 2>&1
       fi
     done
   }

   main(){
     #sleep 2h

     #TPC-H power test
     for i in `seq 1 30`
     do
       random_sql &
     done
   }
   ```




> SpilledRecordBatch memory leak when the program throws an exception while 
> building a hash table
> --
>
> Key: DRILL-8483
> URL: https://issues.apache.org/jira/browse/DRILL-8483
> Project: Apache Drill
>  Issue Type: Bug
>  Components:  Server
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
>
> While reading data from disk to build hash tables in memory, an exception 
> thrown partway through results in a SpilledRecordBatch memory leak.
> Exception log as follows:
> {code:java}
> Caused by: org.apache.drill.exec.exception.OutOfMemoryException: Unable to 
> allocate buffer of size 8192 due to memory limit (41943040). Current 
> allocation: 3684352
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:241)
>         at 
> org.apache.drill.exec.memory.BaseAllocator.buffer(BaseAllocator.java:216)
>         at 
> org.apache.drill.exec.vector.VarCharVector.allocateNew(VarCharVector.java:411)
>         at 
> org.apache.drill.exec.vector.NullableVarCharVector.allocateNew(NullableVarCharVector.java:270)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewVectorContainer(HashPartition.java:215)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.allocateNewCurrentBatchAndHV(HashPartition.java:238)
>         at 
> org.apache.drill.exec.physical.impl.common.HashPartition.<init>(HashPartition.java:165){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823852#comment-17823852
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre merged PR #2878:
URL: https://github.com/apache/drill/pull/2878




> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> Merge join leaks memory when RecordIterator hits an allocation failure with 
> an OutOfMemoryException.
> *Steps to reproduce the behavior*:
>  # prepare data for TPC-H scale factor 1
>  # set direct memory to 5 GB
>  # set planner.enable_hashjoin = false to ensure the merge join operator is used
>  # set drill.memory.debug.allocator = true (checks for memory leaks)
>  # run 20 concurrent instances of TPC-H query 8
>  # when an OutOfMemoryException or a null exception occurs, stop all queries
>  # observe the memory leak
> *Expected behavior*
> When all queries stop, direct memory should be 0 and no leak log like the 
> following should be found.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823846#comment-17823846
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513773463


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {

Review Comment:
   Stack trace:
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
   at 
org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
   at 
org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
   at 
org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
   at 
org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
   at 
org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
   at 
org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> Merge join leaks memory when RecordIterator hits an allocation failure with 
> an OutOfMemoryException.
> *Steps to reproduce the behavior*:
>  # prepare data for TPC-H scale factor 1
>  # set direct memory to 5 GB
>  # set planner.enable_hashjoin = false to ensure the merge join operator is used
>  # set drill.memory.debug.allocator = true (checks for memory leaks)
>  # run 20 concurrent instances of TPC-H query 8
>  # when an OutOfMemoryException or a null exception occurs, stop all queries
>  # observe the memory leak
> *Expected behavior*
> When all queries stop, direct memory should be 0 and no leak log like the 
> following should be found.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823843#comment-17823843
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513768876


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
+      rightIterator.close();
+      throw UserException.executionError(e)

Review Comment:
 It throws the exception from the method clearInflightBatches(), but the memory 
has already been released by clear(), so it does not cause a memory leak; see 
the following code:
   
   `public void close() { clear(); clearInflightBatches(); }`





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> Merge join leaks memory when RecordIterator hits an allocation failure with 
> an OutOfMemoryException.
> *Steps to reproduce the behavior*:
>  # prepare data for TPC-H scale factor 1
>  # set direct memory to 5 GB
>  # set planner.enable_hashjoin = false to ensure the merge join operator is used
>  # set drill.memory.debug.allocator = true (checks for memory leaks)
>  # run 20 concurrent instances of TPC-H query 8
>  # when an OutOfMemoryException or a null exception occurs, stop all queries
>  # observe the memory leak
> *Expected behavior*
> When all queries stop, direct memory should be 0 and no leak log like the 
> following should be found.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823842#comment-17823842
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1513766703


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {

Review Comment:
 Add exception info?
   ```
   try {
     leftIterator.close();
   } catch (QueryCancelledException qce) {
     throw UserException.executionError(qce)
         .message("Failed when depleting incoming batches, probably because " +
             "the query was cancelled or the executor had some error")
         .build(logger);
   } catch (Exception e) {
     throw UserException.internalError(e)
         .message("Failed when depleting incoming batches")
         .build(logger);
   } finally {
     // TODO: catch exception info here, or let the exception propagate by default?
     rightIterator.close();
   }
   ```





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> Merge join leaks memory when RecordIterator hits an allocation failure with 
> an OutOfMemoryException.
> *Steps to reproduce the behavior*:
>  # prepare data for TPC-H scale factor 1
>  # set direct memory to 5 GB
>  # set planner.enable_hashjoin = false to ensure the merge join operator is used
>  # set drill.memory.debug.allocator = true (checks for memory leaks)
>  # run 20 concurrent instances of TPC-H query 8
>  # when an OutOfMemoryException or a null exception occurs, stop all queries
>  # observe the memory leak
> *Expected behavior*
> When all queries stop, direct memory should be 0 and no leak log like the 
> following should be found.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823643#comment-17823643
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512908376


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
       batchMemoryManager.getAvgOutputRowWidth(), batchMemoryManager.getTotalOutputRecords());
 
     super.close();
-    leftIterator.close();
+    try {
+      leftIterator.close();
+    } catch (Exception e) {
+      rightIterator.close();
+      throw UserException.executionError(e)

Review Comment:
   What happens if the right iterator doesn't close properly?





> mergejoin memory leak on exception
> -
>
> Key: DRILL-8479
> URL: https://issues.apache.org/jira/browse/DRILL-8479
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Critical
> Attachments: 0001-mergejoin-leak.patch
>
>
> *Describe the bug*
> Merge join leaks memory when RecordIterator hits an allocation failure with 
> an OutOfMemoryException.
> *Steps to reproduce the behavior*:
>  # prepare data for TPC-H scale factor 1
>  # set direct memory to 5 GB
>  # set planner.enable_hashjoin = false to ensure the merge join operator is used
>  # set drill.memory.debug.allocator = true (checks for memory leaks)
>  # run 20 concurrent instances of TPC-H query 8
>  # when an OutOfMemoryException or a null exception occurs, stop all queries
>  # observe the memory leak
> *Expected behavior*
> When all queries stop, direct memory should be 0 and no leak log like the 
> following should be found.
> {code:java}
> Allocator(op:2:0:11:MergeJoinPOP) 100/73728/4874240/100 
> (res/actual/peak/limit){code}
> *Error detail, log output or screenshots*
> {code:java}
> Unable to allocate buffer of size XX (rounded from XX) due to memory limit 
> (). Current allocation: xx{code}
> [^0001-mergejoin-leak.patch]
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak on exception

2024-03-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823641#comment-17823641
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512773128


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   @cgivre 
   In my test case it throws QueryCancelledException, because some minor
fragment threw an OutOfMemoryException and informed the foreman of the failure.
   
   The foreman then sends QueryCancel commands to the other minor fragments, and
QueryCancelledException is thrown after the method incoming.next() calls
checkContinue().
   
   Although the checkContinue phase always throws the same
QueryCancelledException, I am not sure what originally caused it (in my test
case an OutOfMemoryException caused it).
   
   
   ```
   public void clearInflightBatches() {
     while (lastOutcome == IterOutcome.OK || lastOutcome == IterOutcome.OK_NEW_SCHEMA) {
       // Clear all buffers from incoming.
       for (VectorWrapper<?> wrapper : incoming) {
         wrapper.getValueVector().clear();
       }
       lastOutcome = incoming.next();
     }
   }
   
   public void checkContinue() {
     if (!shouldContinue()) {
       throw new QueryCancelledException();
     }
   }
   ```
   
   **stack**
   
   ```
   Caused by: org.apache.drill.exec.ops.QueryCancelledException: null
   at org.apache.drill.exec.work.fragment.FragmentExecutor$ExecutorStateImpl.checkContinue(FragmentExecutor.java:533)
   at org.apache.drill.exec.record.AbstractRecordBatch.checkContinue(AbstractRecordBatch.java:278)
   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:105)
   at org.apache.drill.exec.record.AbstractUnaryRecordBatch.innerNext(AbstractUnaryRecordBatch.java:59)
   at org.apache.drill.exec.record.AbstractRecordBatch.next(AbstractRecordBatch.java:165)
   at org.apache.drill.exec.record.RecordIterator.clearInflightBatches(RecordIterator.java:359)
   at org.apache.drill.exec.record.RecordIterator.close(RecordIterator.java:365)
   at org.apache.drill.exec.physical.impl.join.MergeJoinBatch.close(MergeJoinBatch.java:301)
   ```
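   As the stack trace shows, the drain loop inside RecordIterator.close() can
itself throw once the query is cancelled. A minimal sketch of a close() that
tolerates that and still releases buffers; the class and method names here are
illustrative stand-ins, not Drill's actual code:
   
   ```
   abstract class DrainingCloseable implements AutoCloseable {
     abstract void clearInflightBatches(); // may throw once the query is cancelled
     abstract void releaseBuffers();       // frees any remaining vector memory
   
     @Override
     public void close() {
       try {
         clearInflightBatches();
       } catch (RuntimeException e) {
         // Cancelled mid-drain: the fragment executor already knows the query
         // failed, so do not let this abort the rest of the cleanup.
       } finally {
         releaseBuffers();                 // always runs, so nothing leaks
       }
     }
   }
   ```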





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception

2024-03-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17823405#comment-17823405
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

cgivre commented on code in PR #2878:
URL: https://github.com/apache/drill/pull/2878#discussion_r1512093859


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/MergeJoinBatch.java:
##
@@ -297,7 +297,14 @@ public void close() {
   batchMemoryManager.getAvgOutputRowWidth(), 
batchMemoryManager.getTotalOutputRecords());
 
 super.close();
-leftIterator.close();
+try {
+  leftIterator.close();
+} catch (Exception e) {

Review Comment:
   Do we know what kind(s) of exceptions to expect here? Also, can we throw a
better error message? Specifically, can we tell the user more about the cause
of the crash and how to fix it?








--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throws exception when some regions are deployed on affinity nodes and some on non-affinity nodes

2024-03-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822878#comment-17822878
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

cgivre merged PR #2885:
URL: https://github.com/apache/drill/pull/2885




> Assign region throws exception when some regions are deployed on affinity 
> nodes and some on non-affinity nodes
> -
>
> Key: DRILL-8482
> URL: https://issues.apache.org/jira/browse/DRILL-8482
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Storage - HBase
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.22.0
>
> Attachments: 
> 0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch
>
>
> [^0001-DRILL-8482-Assign-region-throw-exception-when-some-r.patch]
> *Describe the bug*
>    Assigning regions throws an exception when some regions are deployed on
> affinity nodes and some on non-affinity nodes.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # 
> {code:java}
> NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), SERVER_A);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), SERVER_B);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), SERVER_D);
> regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), SERVER_D);
> final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
> endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
> HBaseGroupScan scan = new HBaseGroupScan();
> scan.setRegionsToScan(regionsToScan);
> scan.setHBaseScanSpec(new HBaseScanSpec(TABLE_NAME_STR, splits[0], splits[0], null));
> scan.applyAssignments(endpoints);{code}
> *Expected behavior*
>  A has 3 regions
>  B has 2 regions
>  C has 3 regions
> *Error detail, log output or screenshots*
> {code:java}
> Caused by: java.lang.NullPointerException: null
>         at 
> org.apache.drill.exec.store.hbase.HBaseGroupScan.applyAssignments(HBaseGroupScan.java:283){code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throws exception when some regions are deployed on affinity nodes and some on non-affinity nodes

2024-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822730#comment-17822730
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

shfshihuafeng commented on PR #2885:
URL: https://github.com/apache/drill/pull/2885#issuecomment-1974124079

   @cgivre Yes. When HBase regions are distributed as follows and you run
select * from the table, we do not get a result.
   
   ```
   NavigableMap<HRegionInfo, ServerName> regionsToScan = Maps.newTreeMap();
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[0], splits[1]), SERVER_A);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[1], splits[2]), SERVER_A);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[2], splits[3]), SERVER_B);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[3], splits[4]), SERVER_B);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[6], splits[7]), SERVER_D);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[7], splits[8]), SERVER_D);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[8], splits[9]), SERVER_D);
   regionsToScan.put(new HRegionInfo(TABLE_NAME, splits[9], splits[10]), SERVER_D);
   final List<DrillbitEndpoint> endpoints = Lists.newArrayList();
   endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_A).setControlPort(1234).build());
   endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_B).setControlPort(1234).build());
   endpoints.add(DrillbitEndpoint.newBuilder().setAddress(HOST_C).setControlPort(1234).build());
   ```
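   For context, a simplified model of the assignment this scenario exercises,
using plain strings instead of HRegionInfo/ServerName/DrillbitEndpoint and not
Drill's actual HBaseGroupScan logic: regions whose server matches a drillbit
host keep their affinity, and orphan regions are spread over the least-loaded
endpoints instead of being routed through a missing slot.
   
   ```
   import java.util.ArrayList;
   import java.util.LinkedHashMap;
   import java.util.List;
   import java.util.Map;
   
   public class RegionAssignmentSketch {
     // region -> hosting server in, endpoint host -> assigned regions out
     public static Map<String, List<String>> assign(Map<String, String> regionToServer,
                                                    List<String> endpointHosts) {
       Map<String, List<String>> plan = new LinkedHashMap<>();
       for (String host : endpointHosts) {
         plan.put(host, new ArrayList<>());
       }
       List<String> orphans = new ArrayList<>();
       for (Map.Entry<String, String> e : regionToServer.entrySet()) {
         List<String> affine = plan.get(e.getValue());
         if (affine != null) {
           affine.add(e.getKey());  // affinity: region server is also a drillbit host
         } else {
           orphans.add(e.getKey()); // no drillbit on this region server
         }
       }
       for (String region : orphans) { // least-loaded fallback for orphan regions
         String best = endpointHosts.get(0);
         for (String host : endpointHosts) {
           if (plan.get(host).size() < plan.get(best).size()) {
             best = host;
           }
         }
         plan.get(best).add(region);
       }
       return plan;
     }
   }
   ```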







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throws exception when some regions are deployed on affinity nodes and some on non-affinity nodes

2024-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822598#comment-17822598
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

cgivre commented on PR #2885:
URL: https://github.com/apache/drill/pull/2885#issuecomment-1973318466

   @shfshihuafeng Is this a bug?







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8482) Assign region throws exception when some regions are deployed on affinity nodes and some on non-affinity nodes

2024-03-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17822459#comment-17822459
 ] 

ASF GitHub Bot commented on DRILL-8482:
---

shfshihuafeng opened a new pull request, #2885:
URL: https://github.com/apache/drill/pull/2885

   … on affinity node and some on non-affinity node
   
   # [DRILL-8482](https://issues.apache.org/jira/browse/DRILL-8482): 
   
   Assign region throws exception when some regions are deployed on affinity 
nodes and some on non-affinity nodes
   
   ## Description
   
   Assigning regions throws an exception when some regions are deployed on 
affinity nodes and some on non-affinity nodes.
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
 
   Refer to unit test cases on 
TestHBaseRegionScanAssignments#testHBaseGroupScanAssignmentSomeAfinedAndSomeWithOrphans







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE

2024-02-26 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17820931#comment-17820931
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

cgivre merged PR #2879:
URL: https://github.com/apache/drill/pull/2879




> Update the binary distributions LICENSE
> ---
>
> Key: DRILL-8475
> URL: https://issues.apache.org/jira/browse/DRILL-8475
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Calvin Kirs
>Assignee: James Turton
>Priority: Blocker
> Fix For: 1.21.2
>
> Attachments: dependencies.txt, drill-dep-list.txt
>
>
> I checked the latest released version, and it does not follow the 
> corresponding rules[1]. This is very important and I hope it will be taken 
> seriously by the PMC team. I'd be happy to do it if needed.
> [1] [https://infra.apache.org/licensing-howto.html#binary]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE

2024-02-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819870#comment-17819870
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

cgivre commented on PR #2879:
URL: https://github.com/apache/drill/pull/2879#issuecomment-1960638549

   @jnturton Are we close to merging this?







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) Update the binary distributions LICENSE

2024-02-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814095#comment-17814095
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

jnturton commented on PR #2879:
URL: https://github.com/apache/drill/pull/2879#issuecomment-1925767446

   TODO: determine whether too much has been pruned from the JDBC driver, 
specifically libraries related to Kerberos.







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8475) The binary version License and NOTICE do not comply with the corresponding terms.

2024-02-04 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17814055#comment-17814055
 ] 

ASF GitHub Bot commented on DRILL-8475:
---

jnturton opened a new pull request, #2879:
URL: https://github.com/apache/drill/pull/2879

   # [DRILL-8475](https://issues.apache.org/jira/browse/DRILL-8475): Update the 
binary dist LICENSE
   
   ## Description
   
   The LICENSE file included in the binary distributions of Drill becomes an 
artifact that is generated automatically by the 
org.codehaus.mojo:license-maven-plugin (and so is no longer part of the Git 
source tree). Dependencies that it cannot detect are kept in the 
LICENSE-base.txt file, which is combined with the generated license notices by a 
new Freemarker template. Various other dependency-related changes are included 
as part of this work. It is still possible that fat jars have introduced hidden 
dependencies, but I propose that those be analysed in a subsequent Jira issue.
   
   ## Documentation
   Comments and updated dev docs.
   
   ## Testing
   Comparison of the jars/ directory of a Drill build against the generated 
LICENSE file to check that every bundled jar has a license notice in LICENSE.
   







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8479) mergejoin memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810231#comment-17810231
 ] 

ASF GitHub Bot commented on DRILL-8479:
---

shfshihuafeng opened a new pull request, #2878:
URL: https://github.com/apache/drill/pull/2878

   … (#2876)
   
   # [DRILL-8479](https://issues.apache.org/jira/browse/DRILL-8479): mergejoin 
leak when depleting incoming batches throws an exception
   
   ## Description
   
   When a fragment fails, close() is called on MergeJoinBatch, but if 
leftIterator.close() throws an exception, rightIterator.close() is never called 
to release its memory.
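   
   A self-contained sketch of that fix shape, matching the diff under review
earlier in this digest; the class and field names are hypothetical, and Drill's
UserException wrapping is omitted for brevity:
   
   ```
   final class JoinIteratorsSketch implements AutoCloseable {
     private final AutoCloseable leftIterator;
     private final AutoCloseable rightIterator;
   
     JoinIteratorsSketch(AutoCloseable left, AutoCloseable right) {
       this.leftIterator = left;
       this.rightIterator = right;
     }
   
     @Override
     public void close() throws Exception {
       try {
         leftIterator.close();
       } catch (Exception e) {
         rightIterator.close(); // release the right side before rethrowing
         throw e;
       }
       rightIterator.close();   // normal path
     }
   }
   ```
   
   Note that if rightIterator.close() itself throws inside the catch block, it
masks the original failure; the suppressed-exception variant shown earlier in
this digest avoids that.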
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   
   The test method is the same as in the linked PR; only one parameter needs to 
be modified: set planner.enable_hashjoin = false to ensure the mergejoin 
operator is used.
   https://github.com/apache/drill/pull/2875
   
   







--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810171#comment-17810171
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464222619


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
##
@@ -1312,7 +1312,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (partn != null) {
+partn.close();
+  }

Review Comment:
   The (partn != null) check is necessary; see the comment on fix idea 1 in 
this thread.
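   
   A minimal, runnable illustration of why the null check matters, using a
stand-in Partition resource rather than Drill's HashPartition: if construction
fails at slot i, that slot and everything after it stay null, while the earlier
slots still hold memory that must be released.
   
   ```
   class PartitionArraySketch {
     static class Partition implements AutoCloseable {
       Partition(int i) {
         if (i == 2) { // simulate the allocation failure
           throw new IllegalStateException("simulated OutOfMemoryException");
         }
       }
       @Override
       public void close() { /* release this partition's buffers */ }
     }
   
     public static void main(String[] args) {
       Partition[] partitions = new Partition[4];
       try {
         for (int i = 0; i < partitions.length; i++) {
           partitions[i] = new Partition(i); // throws at i == 2
         }
       } finally {
         for (Partition p : partitions) {
           if (p != null) { // guard against the never-created slots
             p.close();
           }
         }
       }
     }
   }
   ```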





> HashPartition memory leak when exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> HashPartition leaks when a memory allocation fails with an OutOfMemoryException.
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for TPC-H scale factor 1
>  # run TPC-H query 8 with 20 concurrent executions
>  # set direct memory to 5g
>  # when an OutOfMemoryException occurs, stop all SQL
>  # look for the memory leak
> *Expected behavior*
> (1) I set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) I ran query 8 (SQL detail below) with 20 concurrent executions
> (3) an OutOfMemoryException occurred when creating a HashPartition
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810170#comment-17810170
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is that when any operator fails, the entire operator stack is 
closed. But partitions is an array whose slots start out null: if a 
HashPartition object is not created successfully, its constructor throws, and 
every array slot from that index on stays null.
   
   ```
   for (int part = 0; part < numPartitions; part++) {
     partitions[part] = new HashPartition(context, allocator, baseHashTable,
         buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
         spilledState.getCycle(), numPartitions);
   }
   ```
   
   For example, if the partitions array length is 32 (numPartitions = 32) and 
the constructor throws at part = 10, then partitions[10..31] remain null. The 
object at index 10 failed construction but had already allocated memory.
   
   When close() is called, that partition's memory can never be released, 
because its array slot is null.
   
   ### 2. another fix idea
   
   Do not throw the exception and do not call close() inside the constructor; 
just catch and record it. The HashPartition object is then always created, so 
close() can release its memory later.
   
   ```
   // 1. add an isException parameter when constructing HashPartition
   HashPartition(FragmentContext context, BufferAllocator allocator,
       ChainedHashTable baseHashTable, RecordBatch buildBatch,
       RecordBatch probeBatch, boolean semiJoin, int recordsPerBatch,
       SpillSet spillSet, int partNum, int cycleNum, int numPartitions,
       boolean isException)
   
   // 2. catch the exception so the HashPartition object is still created
   } catch (OutOfMemoryException oom) {
     // do not call close, do not rethrow
     isException = true;
   }
   
   // 3. deal with the exception in AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
     for (int part = 0; part < numPartitions; part++) {
       if (isException) {
         break;
       }
       partitions[part] = new HashPartition(context, allocator, baseHashTable,
           buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
           spilledState.getCycle(), numPartitions, isException);
     }
   } catch (Exception e) {
     isException = true;
   }
   if (isException) {
     throw UserException.memoryError(exceptions[0])
         .message("Failed to allocate hash partition.")
         .build(logger);
   }
   ```
   





--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810168#comment-17810168
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is any operator fails, the entire operator stack is closed. but 
partitions is array which is initialed by null。if hashPartition object is not 
created successfully, it throw exception. so partitions array  data after index 
which is null。
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   for example
   
   partitions array length is 32, numPartitions =32 when numPartitions =10 
,throw except. partitions[11-31]  will be null 
object which index  numPartitions =10 was created  failed ,but it had 
allocater memory.
   
   when calling close() , hashpartion  object which numPartitions =10 can not 
call close,beacause it is null。
   
   
   ### 2. another fix idea
   
 we do  not  throw exception and do not call  close, but catch. we can 
create hash partiotn object . thus when calling close() , we can release。
   
   ```
   //add isException parameter when construct HashPartition object
   
   HashPartition(FragmentContext context, BufferAllocator allocator, 
ChainedHashTable baseHashTable,
  RecordBatch buildBatch, RecordBatch probeBatch, 
boolean semiJoin,
  int recordsPerBatch, SpillSet spillSet, int partNum, 
int cycleNum, int numPartitions , boolean **isException** )
   
 } catch (OutOfMemoryException oom) {
//do not call  close ,do  not  throw except
 isException =true;
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions,**isException** );
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpc

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810167#comment-17810167
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. fix idea
   The design is any operator fails, the entire operator stack is closed. but 
partitions is array which is initialed by null。if hashPartition object is not 
created successfully, it throw exception. so partitions array  data after index 
which is null。
   
   `  for (int part = 0; part < numPartitions; part++) {
 partitions[part] = new HashPartition(context, allocator, baseHashTable,
 buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
 spilledState.getCycle(), numPartitions);
   }`
   
   for example
   
   partitions array length is 32, numPartitions =32 when numPartitions =10 
,throw except. partitions[11-31]  will be null 
object which index  numPartitions =10 was created  failed ,but it had 
allocater memory.
   
   when calling close() , hashpartion  object which numPartitions =10 can not 
call close,beacause it is null。
   
   
   ### 2. another fix idea
   
 we do  not  throw exception and do not call  close, but catch. we can 
create hash partiotn object . thus when calling close() , we can release。
   but if 
   
   ```
   //add isException parameter when construct HashPartition object
   
   HashPartition(FragmentContext context, BufferAllocator allocator, 
ChainedHashTable baseHashTable,
  RecordBatch buildBatch, RecordBatch probeBatch, 
boolean semiJoin,
  int recordsPerBatch, SpillSet spillSet, int partNum, 
int cycleNum, int numPartitions , boolean **isException** )
   
 } catch (OutOfMemoryException oom) {
//do not call  close ,do  not  throw except
 isException =true;
   }
   
   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
 for (int part = 0; part < numPartitions; part++) {
   if (isException) {
 break;
   }
   partitions[part] = new HashPartition(context, allocator, 
baseHashTable,
   buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, 
part,
   spilledState.getCycle(), numPartitions,**isException** );
 }
   } catch (Exception e) {
 isException = true;
   }
   if (isException ){
 throw UserException.memoryError(exceptions[0])
 .message("Failed to allocate hash partition.")
 .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810166#comment-17810166
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. Fix idea
   The design is that when any operator fails, the entire operator stack is closed. But `partitions` is an array initialized with nulls: if a `HashPartition` object is not created successfully, its constructor throws, so every slot from the failing index onward is left null.
   
   ```
   for (int part = 0; part < numPartitions; part++) {
     partitions[part] = new HashPartition(context, allocator, baseHashTable,
         buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
         spilledState.getCycle(), numPartitions);
   }
   ```
   
   For example: the `partitions` array length is 32 and `numPartitions` is 32. If the constructor throws at index 10, then `partitions[11..31]` are never attempted and stay null, while the object at index 10 failed during construction but had already allocated memory.
   
   When `close()` is later called, the partition at index 10 cannot be closed, because its slot is also null.
   
   ### 2. Another fix idea
   
   Do not throw the exception and do not call `close()` in the constructor; just catch it. The `HashPartition` object is then created anyway, so its memory can be released when `close()` is called later. For example:
   
   ```
   HashPartition(FragmentContext context, BufferAllocator allocator,
       ChainedHashTable baseHashTable,
       RecordBatch buildBatch, RecordBatch probeBatch, boolean semiJoin,
       int recordsPerBatch, SpillSet spillSet, int partNum,
       int cycleNum, int numPartitions, boolean **isException**)

   } catch (OutOfMemoryException oom) {
     // do not call close() and do not rethrow; just record the failure
     isException = true;
   }

   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
     for (int part = 0; part < numPartitions; part++) {
       if (isException) {
         break;
       }
       partitions[part] = new HashPartition(context, allocator, baseHashTable,
           buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
           spilledState.getCycle(), numPartitions, **isException**);
     }
   } catch (Exception e) {
     isException = true;
   }
   if (isException) {
     throw UserException.memoryError(exceptions[0])
         .message("Failed to allocate hash partition.")
         .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
>

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810162#comment-17810162
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. Fix idea
   The design is that when any operator fails, the entire operator stack is closed. But `partitions` is an array initialized with nulls: if a `HashPartition` object is not created successfully, its constructor throws, so every slot from the failing index onward is left null.
   
   ```
   for (int part = 0; part < numPartitions; part++) {
     partitions[part] = new HashPartition(context, allocator, baseHashTable,
         buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
         spilledState.getCycle(), numPartitions);
   }
   ```
   
   For example: the `partitions` array length is 32 and `numPartitions` is 32. If the constructor throws at index 10, then `partitions[11..31]` are never attempted and stay null, while the object at index 10 failed during construction but had already allocated memory.
   
   When `close()` is later called, the partition at index 10 cannot be closed, because its slot is also null.
   
   ### 2. Another fix idea
   
   Do not throw the exception and do not call `close()` in the constructor; just catch it. The `HashPartition` object is then created anyway, so its memory can be released when `close()` is called later. For example:
   
   ```
   } catch (OutOfMemoryException oom) {
     // do not call close(); only throw the exception
     throw UserException.memoryError(oom)
         .message("Failed to allocate hash partition.")
         .build(logger);
   }

   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
     for (int part = 0; part < numPartitions; part++) {
       if (isException) {
         break;
       }
       partitions[part] = new HashPartition(context, allocator, baseHashTable,
           buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
           spilledState.getCycle(), numPartitions);
     }
   } catch (Exception e) {
     isException = true;
   }
   if (isException) {
     throw UserException.memoryError(exceptions[0])
         .message("Failed to allocate hash partition.")
         .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_na

[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810161#comment-17810161
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1464211148


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   ### 1. Fix idea
   The design is that when any operator fails, the entire operator stack is closed. But `partitions` is an array initialized with nulls: if a `HashPartition` object is not created successfully, its constructor throws, so every slot from the failing index onward is left null.
   
   ```
   for (int part = 0; part < numPartitions; part++) {
     partitions[part] = new HashPartition(context, allocator, baseHashTable,
         buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
         spilledState.getCycle(), numPartitions);
   }
   ```
   
   For example: the `partitions` array length is 32 and `numPartitions` is 32. If the constructor throws at index 10, then `partitions[11..31]` are never attempted and stay null, while the object at index 10 failed during construction but had already allocated memory.
   
   When `close()` is later called, the partition at index 10 cannot be closed, because its slot is also null.
   
   ### 2. Another fix idea
   
   Do not throw the exception and do not call `close()` in the constructor; just catch it. The `HashPartition` object is then created anyway, so its memory can be released when `close()` is called later. For example:
   
   ```
   } catch (OutOfMemoryException oom) {
     // do not call close(); only throw the exception
     throw UserException.memoryError(oom)
         .message("Failed to allocate hash partition.")
         .build(logger);
   }

   AbstractHashBinaryRecordBatch#initializeBuild
   boolean isException = false;
   try {
     for (int part = 0; part < numPartitions; part++) {
       if (isException) {
         break;
       }
       partitions[part] = new HashPartition(context, allocator, baseHashTable,
           buildBatch, probeBatch, semiJoin, RECORDS_PER_BATCH, spillSet, part,
           spilledState.getCycle(), numPartitions);
     }
   } catch (Exception e) {
     isException = true;
   }
   if (isException) {
     throw UserException.memoryError(exceptions[0])
         .message("Failed to allocate hash partition.")
         .build(logger);
   }
   ```
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nation

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810092#comment-17810092
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906827568

   Ok, so the geo-ip UDF stuff has no special mechanisms or description about 
those resource files, so the generic code that "scans" must find them and drag 
them along automatically. 
   
   That's the behavior I want. 
   
   What is "Drill's 3rd Party Jar folder"? 
   
   If a magic folder just gets dragged over to all nodes, and Drill uses a 
class loader that arranges for jars in that folder to be searched, then there 
is very little to do, since a DFDL schema can be just a set of jar files 
containing related resources, plus the classes for Daffodil's own UDFs and 
layers, which are Java code extensions of Daffodil's own kind. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810091#comment-17810091
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1463921977


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
##
@@ -1312,7 +1312,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (partn != null) {
+partn.close();
+  }

Review Comment:
   The above is OK as a work-around. I wonder, however, where the code added a 
null pointer to the partition list. That should never happen. If it does, it 
should be fixed at the point where the null pointer is added to the list. 
Fixing it here is incomplete: there are other places where we loop through the 
list, and those will also fail.
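
   As an illustration of the concern above, here is one hypothetical way to centralize the guard so that every loop over the partitions sees only live objects (the class and method names are illustrative, not the actual Drill code):

   ```
   import java.util.Arrays;
   import java.util.List;
   import java.util.Objects;
   import java.util.stream.Collectors;

   // Stand-in for the real HashPartition, only so the sketch compiles.
   interface PartitionLike {
     void close();
   }

   class PartitionSet {
     private final PartitionLike[] partitions;

     PartitionSet(PartitionLike[] partitions) {
       this.partitions = partitions;
     }

     // Centralizing the null filter means every loop over the partitions --
     // cleanup, spilling, statistics -- sees only live objects, instead of
     // each call site repeating its own null check.
     List<PartitionLike> live() {
       return Arrays.stream(partitions)
           .filter(Objects::nonNull)
           .collect(Collectors.toList());
     }

     void closeAll() {
       live().forEach(PartitionLike::close);
     }
   }
   ```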



##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();

Review Comment:
   This call is _probably_ fine. However, the design is that if any operator 
fails, the entire operator stack is closed. So, `close()` should be called by 
the fragment executor. There is probably no harm in calling `close()` here, as 
long as the `close()` method is safe to call twice.
   
   If the fragment executor _does not_ call close when the failure occurs 
during setup, then there is a bug since failing to call `close()` results in 
just this kind of error.
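
   A minimal sketch of the "safe to call twice" property mentioned above, using a guard flag (illustrative only; the real HashPartition cleanup releases batches and spill files):

   ```
   class IdempotentResource implements AutoCloseable {
     private boolean closed = false;

     @Override
     public void close() {
       if (closed) {
         return;           // second and later calls are harmless no-ops
       }
       closed = true;
       releaseResources(); // the real cleanup runs exactly once
     }

     private void releaseResources() {
       // free direct memory, delete spill files, etc.
     }
   }
   ```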





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810070#comment-17810070
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906689793

   > > > @cgivre @paul-rogers is there an example of a Drill UDF that is not 
part of the drill repository tree?
   > > > I'd like to understand the mechanisms for distributing any jar files 
and dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist.
   > > 
   > > 
   > > @mbeckerle Here's an example: 
https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't 
able to connect last week.
   > 
   > If I understand this correctly, if a jar is on the classpath and has 
drill-module.conf in its root dir, then drill will find it and read that HOCON 
file to get the package to add to drill.classpath.scanning.packages.
   
   I believe that is correct.
   
   > 
   > Drill then appears to scan jars for class files for those packages. Not 
sure what it is doing with the class files. I imagine it is repackaging them 
somehow so Drill can use them on the drill distributed nodes. But it isn't yet 
clear to me how this aspect works. Do these classes just get loaded on the 
distributed drill nodes? Or is the classpath augmented in some way on the drill 
nodes so that they see a jar that contains all these classes?
   > 
   > I have two questions:
   > 
   > (1) what about dependencies? The UDF may depend on libraries which depend 
on other libraries, etc.
   
   So UDFs are a bit of a special case, but if they do have dependencies, you 
have to also include those JAR files in the UDF directory, or in Drill's 3rd 
party JAR folder.   I'm not that good with maven, but I've often wondered about 
making a so-called fat-JAR which includes the dependencies as part of the UDF 
JAR file.
   
   > 
   > (2) what about non-class files, e.g., things under src/main/resources of 
the project that go into the jar, but aren't "class" files? How do those things 
also get moved? How would code running in the drill node access these? The 
usual method is to call getResource(URL) with a URL that gives the path within 
a jar file to the resource in question.
   
   Take a look at this UDF. 
https://github.com/datadistillr/drill-geoip-functions
   This UDF has a few external resources including a CSV file and the MaxMind 
databases.
   
   
   > 
   > Thanks for any info.
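   
   For the resource side of the question, the usual pattern is the one described above: load the file through the class loader that loaded the UDF jar. A minimal sketch, assuming a hypothetical `/lookup-data.csv` bundled under `src/main/resources`:
   
   ```
   import java.io.BufferedReader;
   import java.io.IOException;
   import java.io.InputStream;
   import java.io.InputStreamReader;
   import java.nio.charset.StandardCharsets;
   import java.util.stream.Collectors;
   
   class ResourceLoadingExample {
     // Reads a file that was packaged into the jar alongside the classes.
     static String readBundledCsv() throws IOException {
       try (InputStream in =
           ResourceLoadingExample.class.getResourceAsStream("/lookup-data.csv")) {
         if (in == null) {
           throw new IOException("resource not found on the classpath");
         }
         try (BufferedReader reader =
             new BufferedReader(new InputStreamReader(in, StandardCharsets.UTF_8))) {
           return reader.lines().collect(Collectors.joining("\n"));
         }
       }
     }
   }
   ```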
   
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17810051#comment-17810051
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1906561549

   > > @cgivre @paul-rogers is there an example of a Drill UDF that is not part 
of the drill repository tree?
   > > I'd like to understand the mechanisms for distributing any jar files and 
dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist.
   > 
   > @mbeckerle Here's an example: 
https://github.com/datadistillr/drill-humanname-functions. I'm sorry we weren't 
able to connect last week.
   
   If I understand this correctly, if a jar is on the classpath and has 
drill-module.conf in its root dir, then drill will find it and read that HOCON 
file to get the package to add to drill.classpath.scanning.packages. 
   
   Drill then appears to scan jars for class files for those packages. Not sure 
what it is doing with the class files. I imagine it is repackaging them somehow 
so Drill can use them on the drill distributed nodes. But it isn't yet clear to 
me how this aspect works. Do these classes just get loaded on the distributed 
drill nodes? Or is the classpath augmented in some way on the drill nodes so 
that they see a jar that contains all these classes?
   
   I have two questions: 
   
   (1) what about dependencies? The UDF may depend on libraries which depend on 
other libraries, etc. 
   
   (2) what about non-class files, e.g., things under src/main/resources of the 
project that go into the jar, but aren't "class" files? How do those things 
also get moved? How would code running in the drill node access these? The 
usual method is to call getResource(URL) with a URL that gives the path within 
a jar file to the resource in question. 
   
   Thanks for any info. 
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809982#comment-17809982
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

jnturton merged PR #2875:
URL: https://github.com/apache/drill/pull/2875




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809816#comment-17809816
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on PR #2875:
URL: https://github.com/apache/drill/pull/2875#issuecomment-1905599592

   > [An unused import crept 
in](https://github.com/apache/drill/actions/runs/7622586264/job/20762475705#step:6:1277),
 could you remove it please?
   
   removed it




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-23 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809814#comment-17809814
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

jnturton commented on PR #2875:
URL: https://github.com/apache/drill/pull/2875#issuecomment-1905598192

   [An unused import crept 
in](https://github.com/apache/drill/actions/runs/7622586264/job/20762475705#step:6:1277),
 could you remove it please?




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809792#comment-17809792
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462854154


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();
+  throw UserException.memoryError(oom)
+  .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
   I resubmitted the PR and supplied the test steps.





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809771#comment-17809771
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();
+  throw UserException.memoryError(oom)
+  .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
   Suggested: `"Failed to allocate hash partition."`
   
   The `memoryError()` already indicates that it is an OOM error.
   



##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/join/AbstractHashBinaryRecordBatch.java:
##
@@ -1312,7 +1313,9 @@ private void cleanup() {
 }
 // clean (and deallocate) each partition, and delete its spill file
 for (HashPartition partn : partitions) {
-  partn.close();
+  if (Objects.nonNull(partn)) {

Review Comment:
   Simpler `if (partn != null) {`





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809772#comment-17809772
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

paul-rogers commented on code in PR #2875:
URL: https://github.com/apache/drill/pull/2875#discussion_r1462817821


##
exec/java-exec/src/main/java/org/apache/drill/exec/physical/impl/common/HashPartition.java:
##
@@ -157,11 +162,11 @@ public HashPartition(FragmentContext context, 
BufferAllocator allocator, Chained
 .build(logger);
 } catch (SchemaChangeException sce) {
   throw new IllegalStateException("Unexpected Schema Change while creating 
a hash table",sce);
-}
-this.hjHelper = semiJoin ? null : new HashJoinHelper(context, allocator);
-tmpBatchesList = new ArrayList<>();
-if (numPartitions > 1) {
-  allocateNewCurrentBatchAndHV();
+} catch (OutOfMemoryException oom) {
+  close();
+  throw UserException.memoryError(oom)
+  .message("OutOfMemory while allocate memory for hash partition.")

Review Comment:
   Suggested: `"Failed to allocate hash partition."`
   
   The `memoryError()` already indicates that it is an OOM error.
   





> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8478) HashPartition memory leak when exception

2024-01-22 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809763#comment-17809763
 ] 

ASF GitHub Bot commented on DRILL-8478:
---

shfshihuafeng opened a new pull request, #2875:
URL: https://github.com/apache/drill/pull/2875

   # [DRILL-](https://issues.apache.org/jira/browse/DRILL-): PR Title
   
   DRILL-8478. HashPartition memory leak when it allocate memory exception with 
OutOfMemoryException (#2874)
   
   ## Description
   
   When allocating memory for a HashPartition fails with an 
OutOfMemoryException, it causes a memory leak: because the HashPartition 
object cannot be created successfully, it cannot be cleaned up in the closing 
phase.
   
   
   ## Documentation
   (Please describe user-visible changes similar to what should appear in the 
Drill documentation.)
   
   ## Testing
   (Please describe how this PR has been tested.)
   




> HashPartition memory leak when  exception
> -
>
> Key: DRILL-8478
> URL: https://issues.apache.org/jira/browse/DRILL-8478
> Project: Apache Drill
>  Issue Type: Bug
>  Components: Execution - Relational Operators
>Affects Versions: 1.21.1
>Reporter: shihuafeng
>Priority: Major
> Fix For: 1.21.2
>
> Attachments: 
> 0001-DRILL-8478.-HashPartition-memory-leak-when-it-alloca.patch
>
>
> *Describe the bug*
> hashpartition leak when allocate memory exception with OutOfMemoryException
> *To Reproduce*
> Steps to reproduce the behavior:
>  # prepare data for tpch 1s
>  # 20 concurrent for tpch sql8
>  # set direct memory 5g
>  # when it had OutOfMemoryException , stopped all sql.
>  # finding memory leak
> *Expected behavior*
> (1)i set \{DRILL_MAX_DIRECT_MEMORY:-"5G"}
> (2) i run sql8 (sql detail as Additional context) with 20 concurrent
> (3) it had OutOfMemoryException when create hashPartion
> *Error detail, log output or screenshots*
> Unable to allocate buffer of size 262144 (rounded from 262140) due to memory 
> limit (41943040). Current allocation: 20447232
>  
> sql 
> {code:java}
> // code placeholder
> select o_year, sum(case when nation = 'CHINA' then volume else 0 end) / 
> sum(volume) as mkt_share from ( select extract(year from o_orderdate) as 
> o_year, l_extendedprice * 1.0 as volume, n2.n_name as nation from 
> hive.tpch1s.part, hive.tpch1s.supplier, hive.tpch1s.lineitem, 
> hive.tpch1s.orders, hive.tpch1s.customer, hive.tpch1s.nation n1, 
> hive.tpch1s.nation n2, hive.tpch1s.region where p_partkey = l_partkey and 
> s_suppkey = l_suppkey and l_orderkey = o_orderkey and o_custkey = c_custkey 
> and c_nationkey = n1.n_nationkey and n1.n_regionkey = r_regionkey and r_name 
> = 'ASIA' and s_nationkey = n2.n_nationkey and o_orderdate between date 
> '1995-01-01' and date '1996-12-31' and p_type = 'LARGE BRUSHED BRASS') as 
> all_nations group by o_year order by o_year
> {code}
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809174#comment-17809174
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1902751729

   > @cgivre @paul-rogers is there an example of a Drill UDF that is not part 
of the drill repository tree?
   > 
   > I'd like to understand the mechanisms for distributing any jar files and 
dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist.
   
   
   @mbeckerle Here's an example: 
https://github.com/datadistillr/drill-humanname-functions. I'm sorry we 
weren't able to connect last week.  
   
   




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809173#comment-17809173
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1902750285

   @cgivre @paul-rogers is there an example of a Drill UDF that is not part of 
the drill repository tree? 
   
   I'd like to understand the mechanisms for distributing any jar files and 
dependencies of the UDF that drill uses. I can't find any such in the 
quasi-UDFs that are in the Drill tree, because well, since they are part of 
Drill, and so are their dependencies, this problem doesn't exist. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-21 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17809172#comment-17809172
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1461099077


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.runtime1.api.ChoiceMetadata;
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata;
+import org.apache.daffodil.runtime1.api.ElementMetadata;
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement;
+import org.apache.daffodil.runtime1.api.MetadataHandler;
+import org.apache.daffodil.runtime1.api.SequenceMetadata;
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Stack;
+
+/**
+ * This class transforms a DFDL/Daffodil schema into a Drill Schema.
+ */
+public class DrillDaffodilSchemaVisitor extends MetadataHandler {
+  private static final Logger logger = 
LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class);
+  /**
+   * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share 
a base class so we
+   * have a stack of MapBuilders, and when empty we use the SchemaBuilder

Review Comment:
   This is fixed in the latest commit. Created MapBuilderLike interface shared 
by SchemaBuilder and MapBuilder. I only populated it with the methods I needed. 
   
   The corresponding problem doesn't really occur in the rowWriter area as 
tupleWriter is the common underlying class used. 
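   
   For readers following along, a sketch of what such a shared interface could look like, assuming only the few builder operations the visitor needs (the method set below is a guess, not the committed API):
   
   ```
   // Hypothetical shape of the shared surface for SchemaBuilder and
   // MapBuilder, so the visitor can push and pop nested map scopes without
   // caring which concrete builder sits at the bottom of the stack.
   interface MapBuilderLike {
     MapBuilderLike addMap(String name);  // open a nested map scope
     void addScalar(String name);         // add a leaf column at this level
     void resume();                       // close this scope, pop the stack
   }
   ```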





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-16 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17807233#comment-17807233
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1453422371


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil;
+
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import 
org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Objects;
+
+import static 
org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory.*;
+import static 
org.apache.drill.exec.store.daffodil.schema.DrillDaffodilSchemaUtils.daffodilDataProcessorToDrillSchema;
+
+public class DaffodilBatchReader implements ManagedReader {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(DaffodilBatchReader.class);
+  private final RowSetLoader rowSetLoader;
+  private final CustomErrorContext errorContext;
+  private final DaffodilMessageParser dafParser;
+  private final InputStream dataInputStream;
+
+  public DaffodilBatchReader(DaffodilReaderConfig readerConfig, EasySubScan 
scan,
+  FileSchemaNegotiator negotiator) {
+
+errorContext = negotiator.parentErrorContext();
+DaffodilFormatConfig dafConfig = readerConfig.plugin.getConfig();
+
+String schemaURIString = dafConfig.getSchemaURI(); // 
"schema/complexArray1.dfdl.xsd";
+String rootName = dafConfig.getRootName();
+String rootNamespace = dafConfig.getRootNamespace();
+boolean validationMode = dafConfig.getValidationMode();
+
+URI dfdlSchemaURI;
+try {
+  dfdlSchemaURI = new URI(schemaURIString);
+} catch (URISyntaxException e) {
+  throw UserException.validationError(e).build(logger);
+}
+
+FileDescrip file = negotiator.file();
+DrillFileSystem fs = file.fileSystem();
+URI fsSchemaURI = fs.getUri().resolve(dfdlSchemaURI);
+
+DaffodilDataProcessorFactory dpf = new DaffodilDataProcessorFactory();
+DataProcessor dp;
+try {
+  dp = dpf.getDataProcessor(fsSchemaURI, validationMode, rootName, 
rootNamespace);
+} catch (CompileFailure e) {
+  throw UserException.dataReadError(e)
+  .message(String.format("Failed to get Daffodil DFDL processor for: 
%s", fsSchemaURI))
+  .addContext(errorContext).addContext(e.getMessage()).build(logger);
+}
+// Create the corresponding Drill schema.
+// Note: this could be a very large schema. Think of a large complex RDBMS 
schema,
+// all of it, hundreds of tables, but all part of the same metadata tree.
+TupleMetadata drillSchema = daffodilDataProcessorToDrillSchema(dp);
+// Inform Drill about the schema
+negotiator.tableSchema(drillSchema, true);
+
+//
+// DATA TIME: Next we construct the runtime objects, and open files.
+//
+// We get the DaffodilMessageParser, which is a stateful driver for 
daffodil that
+// actually does the parsing.
+rowSetLoader = negotiator.build().writer();
+
+// We construct the Daffodil InfosetOutputter which the daffodil parser 
uses to
+// conver

[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806651#comment-17806651
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton merged PR #2515:
URL: https://github.com/apache/drill/pull/2515




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806649#comment-17806649
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on code in PR #2515:
URL: https://github.com/apache/drill/pull/2515#discussion_r1446231938


##
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java:
##
@@ -171,107 +168,109 @@ public HDF5ReaderConfig(HDF5FormatPlugin plugin, 
HDF5FormatConfig formatConfig)
 }
   }
 
-  public HDF5BatchReader(HDF5ReaderConfig readerConfig, int maxRecords) {
-this.readerConfig = readerConfig;
-this.maxRecords = maxRecords;
+  public HDF5BatchReader(HDF5ReaderConfig config, EasySubScan scan, 
FileSchemaNegotiator negotiator) {
+errorContext = negotiator.parentErrorContext();
+file = negotiator.file();
+readerConfig = config;
 dataWriters = new ArrayList<>();
-this.showMetadataPreview = readerConfig.formatConfig.showPreview();
-  }
+showMetadataPreview = readerConfig.formatConfig.showPreview();
 
-  @Override
-  public boolean open(FileSchemaNegotiator negotiator) {
-split = negotiator.split();
-errorContext = negotiator.parentErrorContext();
 // Since the HDF file reader uses a stream to actually read the file, the file name from the
 // module is incorrect.
-fileName = split.getPath().getName();
-try {
-  openFile(negotiator);
-} catch (IOException e) {
-  throw UserException
-.dataReadError(e)
-.addContext("Failed to close input file: %s", split.getPath())
-.addContext(errorContext)
-.build(logger);
-}
+fileName = file.split().getPath().getName();
 
-ResultSetLoader loader;
-if (readerConfig.defaultPath == null) {
-  // Get file metadata
-  List metadata = getFileMetadata(hdfFile, new ArrayList<>());
-  metadataIterator = metadata.iterator();
-
-  // Schema for Metadata query
-  SchemaBuilder builder = new SchemaBuilder()
-.addNullable(PATH_COLUMN_NAME, MinorType.VARCHAR)
-.addNullable(DATA_TYPE_COLUMN_NAME, MinorType.VARCHAR)
-.addNullable(FILE_NAME_COLUMN_NAME, MinorType.VARCHAR)
-.addNullable(DATA_SIZE_COLUMN_NAME, MinorType.BIGINT)
-.addNullable(IS_LINK_COLUMN_NAME, MinorType.BIT)
-.addNullable(ELEMENT_COUNT_NAME, MinorType.BIGINT)
-.addNullable(DATASET_DATA_TYPE_NAME, MinorType.VARCHAR)
-.addNullable(DIMENSIONS_FIELD_NAME, MinorType.VARCHAR);
-
-  negotiator.tableSchema(builder.buildSchema(), false);
-
-  loader = negotiator.build();
-  dimensions = new int[0];
-  rowWriter = loader.writer();
-
-} else {
-  // This is the case when the default path is specified. Since the user is explicitly asking for a dataset
-  // Drill can obtain the schema by getting the datatypes below and ultimately mapping that schema to columns
-  Dataset dataSet = hdfFile.getDatasetByPath(readerConfig.defaultPath);
-  dimensions = dataSet.getDimensions();
-
-  loader = negotiator.build();
-  rowWriter = loader.writer();
-  writerSpec = new WriterSpec(rowWriter, negotiator.providedSchema(),
-  negotiator.parentErrorContext());
-  if (dimensions.length <= 1) {
-buildSchemaFor1DimensionalDataset(dataSet);
-  } else if (dimensions.length == 2) {
-buildSchemaFor2DimensionalDataset(dataSet);
-  } else {
-// Case for datasets of greater than 2D
-// These are automatically flattened
-buildSchemaFor2DimensionalDataset(dataSet);
+{ // Opens an HDF5 file

Review Comment:
   I guess some of these could become private methods but it's a minor point 
for me.



##
contrib/format-hdf5/src/main/java/org/apache/drill/exec/store/hdf5/HDF5BatchReader.java:
##
@@ -171,107 +164,104 @@ public HDF5ReaderConfig(HDF5FormatPlugin plugin, HDF5FormatConfig formatConfig)
 }
   }
 
-  public HDF5BatchReader(HDF5ReaderConfig readerConfig, int maxRecords) {
-this.readerConfig = readerConfig;
-this.maxRecords = maxRecords;
+  public HDF5BatchReader(HDF5ReaderConfig config, EasySubScan scan, FileSchemaNegotiator negotiator) {
+errorContext = negotiator.parentErrorContext();
+file = negotiator.file();
+readerConfig = config;
 dataWriters = new ArrayList<>();
-this.showMetadataPreview = readerConfig.formatConfig.showPreview();
-  }
+showMetadataPreview = readerConfig.formatConfig.showPreview();
 
-  @Override
-  public boolean open(FileSchemaNegotiator negotiator) {
-split = negotiator.split();
-errorContext = negotiator.parentErrorContext();
 // Since the HDF file reader uses a stream to actually read the file, the file name from the
 // module is incorrect.
-fileName = split.getPath().getName();
-try {
-  openFile(negotiator);
-} catch (IOException e) {
-

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806487#comment-17806487
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1890990577

   > > @mbeckerle With respect to style, I tried to reply to that comment, but 
the thread won't let me. In any event, Drill classes will typically start with 
the constructor, then have whatever methods are appropriate for the class. The 
logger creation usually happens before the constructor. I think all of your 
other classes followed this format, so the one or two that didn't kind of 
jumped out at me.
   > 
   > @cgivre I believe the style issues are all fixed. The build did not get 
any codestyle issues.
   
   The issue I was referring to was more around the organization of a few 
classes.  Usually we'll have the constructor (if present) at the top followed 
by any class methods.  I think there was a class or two where the constructor 
was at the bottom or something like that.  In any event, consider the issue 
resolved.




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806486#comment-17806486
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451758017


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/DaffodilBatchReader.java:
##
@@ -0,0 +1,181 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil;
+
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.drill.common.AutoCloseables;
+import org.apache.drill.common.exceptions.CustomErrorContext;
+import org.apache.drill.common.exceptions.UserException;
+import org.apache.drill.exec.physical.impl.scan.v3.ManagedReader;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileDescrip;
+import org.apache.drill.exec.physical.impl.scan.v3.file.FileSchemaNegotiator;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory;
+import org.apache.drill.exec.store.dfs.DrillFileSystem;
+import org.apache.drill.exec.store.dfs.easy.EasySubScan;
+import org.apache.hadoop.fs.Path;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.util.Objects;
+
+import static org.apache.drill.exec.store.daffodil.schema.DaffodilDataProcessorFactory.*;
+import static org.apache.drill.exec.store.daffodil.schema.DrillDaffodilSchemaUtils.daffodilDataProcessorToDrillSchema;
+
+public class DaffodilBatchReader implements ManagedReader {
+
+  private static final Logger logger = LoggerFactory.getLogger(DaffodilBatchReader.class);
+  private final RowSetLoader rowSetLoader;
+  private final CustomErrorContext errorContext;
+  private final DaffodilMessageParser dafParser;
+  private final InputStream dataInputStream;
+
+  public DaffodilBatchReader(DaffodilReaderConfig readerConfig, EasySubScan scan,
+  FileSchemaNegotiator negotiator) {
+
+errorContext = negotiator.parentErrorContext();
+DaffodilFormatConfig dafConfig = readerConfig.plugin.getConfig();
+
+String schemaURIString = dafConfig.getSchemaURI(); // "schema/complexArray1.dfdl.xsd";
+String rootName = dafConfig.getRootName();
+String rootNamespace = dafConfig.getRootNamespace();
+boolean validationMode = dafConfig.getValidationMode();
+
+URI dfdlSchemaURI;
+try {
+  dfdlSchemaURI = new URI(schemaURIString);
+} catch (URISyntaxException e) {
+  throw UserException.validationError(e).build(logger);
+}
+
+FileDescrip file = negotiator.file();
+DrillFileSystem fs = file.fileSystem();
+URI fsSchemaURI = fs.getUri().resolve(dfdlSchemaURI);
+
+DaffodilDataProcessorFactory dpf = new DaffodilDataProcessorFactory();
+DataProcessor dp;
+try {
+  dp = dpf.getDataProcessor(fsSchemaURI, validationMode, rootName, rootNamespace);
+} catch (CompileFailure e) {
+  throw UserException.dataReadError(e)
+  .message(String.format("Failed to get Daffodil DFDL processor for: %s", fsSchemaURI))
+  .addContext(errorContext).addContext(e.getMessage()).build(logger);
+}
+// Create the corresponding Drill schema.
+// Note: this could be a very large schema. Think of a large complex RDBMS schema,
+// all of it, hundreds of tables, but all part of the same metadata tree.
+TupleMetadata drillSchema = daffodilDataProcessorToDrillSchema(dp);
+// Inform Drill about the schema
+negotiator.tableSchema(drillSchema, true);
+
+//
+// DATA TIME: Next we construct the runtime objects, and open files.
+//
+// We get the DaffodilMessageParser, which is a stateful driver for daffodil that
+// actually does the parsing.
+rowSetLoader = negotiator.build().writer();
+
+// We construct the Daffodil InfosetOutputter which the daffodil parser uses to
+// convert i

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806484#comment-17806484
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451757410


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.runtime1.api.ChoiceMetadata;
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata;
+import org.apache.daffodil.runtime1.api.ElementMetadata;
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement;
+import org.apache.daffodil.runtime1.api.MetadataHandler;
+import org.apache.daffodil.runtime1.api.SequenceMetadata;
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Stack;
+
+/**
+ * This class transforms a DFDL/Daffodil schema into a Drill Schema.
+ */
+public class DrillDaffodilSchemaVisitor extends MetadataHandler {
+  private static final Logger logger = LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class);
+  /**
+   * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share a base class so we
+   * have a stack of MapBuilders, and when empty we use the SchemaBuilder

Review Comment:
   This is likely music to @paul-rogers's ears.
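   For readers following the thread, a minimal sketch of the push/pop pattern that
   javadoc describes (field names are illustrative, not from the PR):
   
   ```java
   import org.apache.drill.common.types.TypeProtos.MinorType;
   import org.apache.drill.exec.record.metadata.SchemaBuilder;
   import org.apache.drill.exec.record.metadata.TupleMetadata;
   
   public class MapBuilderSketch {
     public static TupleMetadata example() {
       // addMap() pushes a MapBuilder for the nested complex element;
       // resumeSchema() pops back to the SchemaBuilder, which is why the
       // visitor keeps a stack of MapBuilders.
       return new SchemaBuilder()
           .addNullable("id", MinorType.BIGINT)
           .addMap("address")
               .addNullable("street", MinorType.VARCHAR)
               .addNullable("zip", MinorType.VARCHAR)
           .resumeSchema()
           .buildSchema();
     }
   }
   ```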





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806482#comment-17806482
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451756763


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.japi.Compiler;
+import org.apache.daffodil.japi.Daffodil;
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.daffodil.japi.Diagnostic;
+import org.apache.daffodil.japi.InvalidParserException;
+import org.apache.daffodil.japi.InvalidUsageException;
+import org.apache.daffodil.japi.ProcessorFactory;
+import org.apache.daffodil.japi.ValidationMode;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.channels.Channels;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * Compiles a DFDL schema (mostly for tests) or loads a pre-compiled DFDL schema so that one can
+ * obtain a DataProcessor for use with DaffodilMessageParser.
+ * 
+ * TODO: Needs to use a cache to avoid reloading/recompiling every time.
+ */
+public class DaffodilDataProcessorFactory {
+  // Default constructor is used.
+
+  private static final Logger logger = LoggerFactory.getLogger(DaffodilDataProcessorFactory.class);
+
+  private DataProcessor dp;
+
+  /**
+   * Gets a Daffodil DataProcessor given the necessary arguments to compile or reload it.
+   *
+   * @param schemaFileURI
+   * pre-compiled dfdl schema (.bin extension) or DFDL schema source (.xsd extension)
+   * @param validationMode
+   * Use true to request Daffodil built-in 'limited' validation. Use false for no validation.
+   * @param rootName
+   * Local name of root element of the message. Can be null to use the first element declaration
+   * of the primary schema file. Ignored if reloading a pre-compiled schema.
+   * @param rootNS
+   * Namespace URI as a string. Can be null to use the target namespace of the primary schema
+   * file or if it is unambiguous what element is the rootName. Ignored if reloading a
+   * pre-compiled schema.
+   * @return the DataProcessor
+   * @throws CompileFailure
+   * - if schema compilation fails
+   */
+  public DataProcessor getDataProcessor(URI schemaFileURI, boolean validationMode, String rootName,
+  String rootNS)
+  throws CompileFailure {
+
+DaffodilDataProcessorFactory dmp = new DaffodilDataProcessorFactory();
+boolean isPrecompiled = schemaFileURI.toString().endsWith(".bin");
+if (isPrecompiled) {
+  if (Objects.nonNull(rootName) && !rootName.isEmpty()) {
+// A usage error. You shouldn't supply the name and optionally namespace if loading
+// precompiled schema because those are built into it. Should be null or "".
+logger.warn("Root element name '{}' is ignored when used with precompiled DFDL schema.",
+rootName);
+  }
+  try {
+dmp.loadSchema(schemaFileURI);
+  } catch (IOException | InvalidParserException e) {
+throw new CompileFailure(e);
+  }
+  dmp.setupDP(validationMode, null);
+} else {
+  List pfDiags;
+  try {
+pfDiags = dmp.compileSchema(schemaFileURI, rootName, rootNS);
+  } catch (URISyntaxException | IOException e) {
+throw new CompileFailure(e);
+  }
+  dmp.setupDP(validationMode, pfDiags);
+}
+return dmp.dp;
+  }
+
+  private void loadSchema(URI schemaFileURI) throws IOException, InvalidParserException {
+Compiler c = Daffodil.compiler();
+dp = c.reload(Channels.newChannel(schemaFileURI.toURL().openStream()));

Review Comment:
   This definitely seems like an area where there is potential for a lot of 
different things to go wrong.  My view is we should just do our best to provide 
c

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-14 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17806481#comment-17806481
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1451756527


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DaffodilDataProcessorFactory.java:
##
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.japi.Compiler;
+import org.apache.daffodil.japi.Daffodil;
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.daffodil.japi.Diagnostic;
+import org.apache.daffodil.japi.InvalidParserException;
+import org.apache.daffodil.japi.InvalidUsageException;
+import org.apache.daffodil.japi.ProcessorFactory;
+import org.apache.daffodil.japi.ValidationMode;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.io.IOException;
+import java.net.URI;
+import java.net.URISyntaxException;
+import java.nio.channels.Channels;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * Compiles a DFDL schema (mostly for tests) or loads a pre-compiled DFDL schema so that one can
+ * obtain a DataProcessor for use with DaffodilMessageParser.
+ * 
+ * TODO: Needs to use a cache to avoid reloading/recompiling every time.
+ */
+public class DaffodilDataProcessorFactory {
+  // Default constructor is used.
+
+  private static final Logger logger = LoggerFactory.getLogger(DaffodilDataProcessorFactory.class);
+
+  private DataProcessor dp;
+
+  /**
+   * Gets a Daffodil DataProcessor given the necessary arguments to compile or reload it.
+   *
+   * @param schemaFileURI
+   * pre-compiled dfdl schema (.bin extension) or DFDL schema source (.xsd extension)
+   * @param validationMode
+   * Use true to request Daffodil built-in 'limited' validation. Use false for no validation.
+   * @param rootName
+   * Local name of root element of the message. Can be null to use the first element declaration
+   * of the primary schema file. Ignored if reloading a pre-compiled schema.
+   * @param rootNS
+   * Namespace URI as a string. Can be null to use the target namespace of the primary schema
+   * file or if it is unambiguous what element is the rootName. Ignored if reloading a
+   * pre-compiled schema.
+   * @return the DataProcessor
+   * @throws CompileFailure
+   * - if schema compilation fails
+   */
+  public DataProcessor getDataProcessor(URI schemaFileURI, boolean validationMode, String rootName,
+  String rootNS)
+  throws CompileFailure {
+
+DaffodilDataProcessorFactory dmp = new DaffodilDataProcessorFactory();
+boolean isPrecompiled = schemaFileURI.toString().endsWith(".bin");
+if (isPrecompiled) {
+  if (Objects.nonNull(rootName) && !rootName.isEmpty()) {
+// A usage error. You shouldn't supply the name and optionally namespace if loading
+// precompiled schema because those are built into it. Should be null or "".
+logger.warn("Root element name '{}' is ignored when used with precompiled DFDL schema.",
+rootName);
+  }
+  try {
+dmp.loadSchema(schemaFileURI);
+  } catch (IOException | InvalidParserException e) {
+throw new CompileFailure(e);

Review Comment:
   My thought here would be to fail as quickly as possible.  If the DFDL schema 
can't be read, I'm assuming that we cannot proceed, so throwing an exception 
would be the right thing to do IMHO.  With that said, we should make sure we 
provide a good error message that would explain what went wrong. 
   One of the issues we worked on for a while with Drill was that it would fail 
and you'd get a stack trace w/o a clear idea of what the actual issue is and 
how to rectify it. 
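   To make that concrete, one hedged way (a sketch, not the PR's actual code) to
   fail fast with an actionable message is to wrap the failure in Drill's
   UserException builder, assuming org.apache.drill.common.exceptions.UserException
   is imported into the factory:
   
   ```java
   try {
     dmp.loadSchema(schemaFileURI);
   } catch (IOException | InvalidParserException e) {
     // Surface the root cause plus a hint, instead of a bare stack trace.
     throw UserException.dataReadError(e)
         .message("Failed to load DFDL schema: %s", schemaFileURI)
         .addContext("Expected a pre-compiled (.bin) or source (.xsd) DFDL schema")
         .build(logger);
   }
   ```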
   





> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: N

[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805821#comment-17805821
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1888037847

   > Did the recent EVF revisions allow the tests for this PR to pass? Is there 
anything that is still missing? Also, did the excitement over my botched merge 
settle down and are we good now?
   
   All the unit tests pass. Whether that means that everything is working is 
another question, but this plugin has a decent amount of tests, so I'd feel pretty good. 




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805790#comment-17805790
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

paul-rogers commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1887901054

   Did the recent EVF revisions allow the tests for this PR to pass? Is there 
anything that is still missing? Also, did the excitement over my botched merge 
settle down and are we good now?




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-11 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805457#comment-17805457
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1886695348

   > @paul-rogers I attempted to fix. I kind of suck at git, so I think it's 
more or less correct now, but there was probably a better way to do this.
   
   Just working through the review comments that @paul-rogers left (the ones 
unrelated to the needed functionality that was missing from EVF2).




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-10 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805188#comment-17805188
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1885044910

   @jnturton I did as you suggested.  Would you mind please taking a look?




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804917#comment-17804917
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1883962208

   > @mbeckerle With respect to style, I tried to reply to that comment, but 
the thread won't let me. In any event, Drill classes will typically start with 
the constructor, then have whatever methods are appropriate for the class. The 
logger creation usually happens before the constructor. I think all of your 
other classes followed this format, so the one or two that didn't kind of 
jumped out at me.
   
   @cgivre I believe the style issues are all fixed. The build did not get any 
codestyle issues. 




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804572#comment-17804572
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1882408865

   > @cgivre, the `.asf.yaml` file you mentioned has lots of metadata, but does 
not actually prevent a force push. Perhaps we are missing something? It would 
generally be a good idea to forbid such things to prevent catastrophic mistakes.
   
   Oh that's interesting. Something's changed since I last went through this 
with @vvysotskyi to do something that could only be done on master, perhaps 
testing of the automatic snapshot artifact publishing which requires access to 
GitHub Actions secrets.




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804571#comment-17804571
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882403741

I see Git's "patch contents already upstream" feature doesn't automatically 
clean up the unwanted commits. I've dropped them manually in a new branch in my 
fork and now suggest
   ```
   git reset --hard origin/master
   git pull --rebase https://github.com/jnturton/drill.git 8188-hdf5-evf2
   git push --force # to luocooong's fork
   ```




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804568#comment-17804568
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

jnturton commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882389703

   > @paul-rogers I attempted to fix. I kind of suck at git, so I think it's 
more or less correct now, but there was probably a better way to do this.
   
   I think you still want something like
   ```
   git pull --rebase upstream master
   git push --force-with-lease
   ```




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804563#comment-17804563
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882377677

   @paul-rogers I attempted to fix.  I kind of suck at git, so I think it's 
more or less correct now, but there was probably a better way to do this.




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804552#comment-17804552
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

paul-rogers commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882246203

   It seems you did this work on top of the master with my unsquashed commits. 
When you try to push, those commits come along for the ride. I think you should 
grab the latest master, then rebase your branch on it.
   
   Plan B is to a) grab the latest master, and b) create a new branch that 
cherry-picks the commit(s) you meant to add.
   
   If even this doesn't work, then I'll clean up this branch for you since I 
created the mess in the first place...
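   For reference, Plan B spelled out as commands (remote name, new branch name
   and SHAs are placeholders):
   
   ```
   git fetch upstream
   git checkout -b evf2-clean upstream/master
   git cherry-pick <sha-of-your-commit>   # repeat per commit you meant to add
   git push --force-with-lease origin HEAD:8188-hdf5-evf2
   ```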




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8188) Convert HDF5 format to EVF2

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804547#comment-17804547
 ] 

ASF GitHub Bot commented on DRILL-8188:
---

cgivre commented on PR #2515:
URL: https://github.com/apache/drill/pull/2515#issuecomment-1882168615

   I think I hosed the version control somehow. This PR should only modify a 
few files in the HDF5 reader. 




> Convert HDF5 format to EVF2
> ---
>
> Key: DRILL-8188
> URL: https://issues.apache.org/jira/browse/DRILL-8188
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.20.0
>Reporter: Cong Luo
>Assignee: Cong Luo
>Priority: Major
>
> Use EVF V2 instead of old V1.
> Also, fixed a few bugs in V2 framework.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804546#comment-17804546
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

cgivre commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1882148031

   > I successfully squashed the commits, and provided a proper commit message, 
while preserving the later commit. Did a force push to master to rewrite 
history.
   > 
   > You should update your own master to pick up the revised history.
   
   You are a braver man than I. 
   
   > 
   > @cgivre, the `.asf.yaml` file you mentioned has lots of metadata, but does 
not actually prevent a force push. Perhaps we are missing something? It would 
generally be a good idea to forbid such things to prevent catastrophic mistakes.
   
   Thanks for flagging... I'm not sure how to do that, but I'll investigate.
   
   




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804507#comment-17804507
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1881965757

   I successfully squashed the commits, and provided a proper commit message, 
while preserving the later commit. Did a force push to master to rewrite 
history.
   
   You should update your own master to pick up the revised history.
   
   @cgivre, the `.asf.yaml` file you mentioned has lots of metadata, but does 
not actually prevent a force push. Perhaps we are missing something? It would 
generally be a good idea to forbid such things to prevent catastrophic mistakes.
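   For what it's worth, the stanza under discussion would look roughly like the
   sketch below (keys as I read the ASF infra docs for .asf.yaml; whether this
   actually blocks a force push is exactly the open question here):
   
   ```
   github:
     protected_branches:
       master:
         required_pull_request_reviews:
           required_approving_review_count: 1
   ```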




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804317#comment-17804317
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

cgivre commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444727956


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   Just to be clear... I was just saying that if this is a major headache and 
you don't want to deal with it, my vote is to leave it alone.  If it isn't a 
big headache and you want to, I have no issues there as well. 





> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804276#comment-17804276
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444618000


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   @paul-rogers, @cgivre  [commented that he's in favour of leaving master as 
is](https://github.com/apache/drill/pull/2866#issuecomment-1880409413) and I've 
since merged [a commit on 
top](https://github.com/apache/drill/commit/f5fb7f5a4023651252afb1f907311d71840eb144).
 I do think it would still be feasible for us to go back and squash (for 
exactly the reason you give) but at this point we could also just leave it 
where it is?
   
   P.S. The process is a little laborious. A conventional commit switching off 
master branch protection in .asf.yaml, then the force push doing the clean up 
and switching master branch protection back on, the latter being achievable in 
the same breath by simply dropping the switch-on commit.





> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804270#comment-17804270
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444618000


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   @paul-rogers, @cgivre  [commented that he's in favour of leaving master as 
is](https://github.com/apache/drill/pull/2866#issuecomment-1880409413) and I've 
since merged [a commit on 
top](https://github.com/apache/drill/commit/f5fb7f5a4023651252afb1f907311d71840eb144).
 I do think it would still be feasible for us to go back and squash (for 
exactly the reason you give) but at this point we could also just leave it 
where it is?
   
   P.S. The process is a little laborious. A conventional commit switching off 
master branch protection in .asf.yaml, then the force push doing the clean up 
and switching master branch protection back on.





> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1

2024-01-08 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804258#comment-17804258
 ] 

ASF GitHub Bot commented on DRILL-8415:
---

jnturton merged PR #2866:
URL: https://github.com/apache/drill/pull/2866




> Upgrade Jackson 2.14.3 → 2.16.1
> ---
>
> Key: DRILL-8415
> URL: https://issues.apache.org/jira/browse/DRILL-8415
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Priority: Major
> Fix For: 1.22.0
>
>
> I'm not advocating for an upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security focused enhancements including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high but it is not inconceivable that some Drill users might need 
> to relax them. Parsing large strings as numbers is sub-quadratic, thus the 
> default limit of 1000 chars or bytes (depending on input context).
> When the Drill team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.
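
For Jackson 2.15+, relaxing those limits is a one-time factory setup; a minimal
sketch (the raised values are illustrative, not Drill defaults):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public class RelaxedJacksonMapper {
  // Raises the parser caps; maxNumberLength's default is the 1000 chars
  // mentioned in the description above.
  public static ObjectMapper build() {
    StreamReadConstraints constraints = StreamReadConstraints.builder()
        .maxNumberLength(10_000)
        .maxStringLength(50_000_000)
        .build();
    JsonFactory factory = JsonFactory.builder()
        .streamReadConstraints(constraints)
        .build();
    return new ObjectMapper(factory);
  }
}
```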



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804150#comment-17804150
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444244003


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   My bad. My other project likes to leave these in master; I forgot Drill does 
not.
   
   Since there is not much activity, I can squash the commits within the master 
branch and do a force push. Normally that is a big NO NO in active projects, 
but it should not actually cause problems here. I'll go ahead and do that 
tomorrow unless anyone objects.





> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804151#comment-17804151
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1880502847

   Backporting should be safe: as safe as having the change in master itself.




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804137#comment-17804137
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1880450409

   I think we can regard this as a bug fix to framework code already present in 
1.21 and therefore backport it.




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Affects Versions: 1.21.1
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
> Fix For: 1.21.2
>
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804123#comment-17804123
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444185697


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   P.S. I'm happy to live with the WIP commits in master too, just asking what 
folks think.





> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804122#comment-17804122
 ] 

ASF GitHub Bot commented on DRILL-8415:
---

cgivre commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880409413

   > I haven't rebased this yet in case we decide to squash the WIP commits 
that were merged into master. Once a decision is made either way this can be 
rebased and a CI run obtained.
   
   I'm fine with leaving the WIP commits as long as we don't make a habit out 
of it.  It's probably more of a hassle to undo the PR, squash the commits and 
re-merge them. 




> Upgrade Jackson 2.14.3 → 2.16.1
> ---
>
> Key: DRILL-8415
> URL: https://issues.apache.org/jira/browse/DRILL-8415
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Priority: Major
> Fix For: 1.22.0
>
>
> I'm not advocating for an upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security focused enhancements including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high but it is not inconceivable that some Drill users might need 
> to relax them. Parsing large strings as numbers has worse-than-linear 
> (roughly quadratic) cost, hence the default limit of 1000 chars or bytes 
> (depending on input context).
> When the Drill team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804121#comment-17804121
 ] 

ASF GitHub Bot commented on DRILL-8415:
---

jnturton commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880408041

   I haven't rebased this yet in case we decide to squash the WIP commits that 
were merged into master. Once a decision is made either way this can be rebased 
and a CI run obtained.




> Upgrade Jackson 2.14.3 → 2.16.1
> ---
>
> Key: DRILL-8415
> URL: https://issues.apache.org/jira/browse/DRILL-8415
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Priority: Major
> Fix For: 1.22.0
>
>
> I'm not advocating for an upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security focused enhancements including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high but it is not inconceivable that some Drill users might need 
> to relax them. Parsing large strings as numbers has worse-than-linear 
> (roughly quadratic) cost, hence the default limit of 1000 chars or bytes 
> (depending on input context).
> When the Drill team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804120#comment-17804120
 ] 

ASF GitHub Bot commented on DRILL-8415:
---

cgivre commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880407190

   @jnturton This looks good; however, there is a merge conflict. Can you 
please resolve it so that we can run the CI?




> Upgrade Jackson 2.14.3 → 2.16.1
> ---
>
> Key: DRILL-8415
> URL: https://issues.apache.org/jira/browse/DRILL-8415
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Priority: Major
> Fix For: 1.22.0
>
>
> I'm not advocating for an upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security focused enhancements including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high but it is not inconceivable that some Drill users might need 
> to relax them. Parsing large strings as numbers has worse-than-linear 
> (roughly quadratic) cost, hence the default limit of 1000 chars or bytes 
> (depending on input context).
> When the Drill team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8415) Upgrade Jackson 2.14.3 → 2.16.1

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804119#comment-17804119
 ] 

ASF GitHub Bot commented on DRILL-8415:
---

jnturton commented on PR #2866:
URL: https://github.com/apache/drill/pull/2866#issuecomment-1880406941

   I started adding configuration support for the new StreamReadConstraints, 
first globally and then just in the JSON reader, but I got stopped by a sense 
of YAGNI. It's hard to imagine someone who will need something beyond the 
default values in Jackson, and more configuration is more complexity that users 
must contend with. So my opinion at this point is that we should only add that 
configurability if someone asks for it...
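
   For reference, if that configurability is ever requested, this is roughly 
what it would wrap: the Jackson 2.15+ StreamReadConstraints builder applied to 
a JsonFactory. The limit below is an arbitrary illustration, not a proposed 
Drill default.

import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public class RelaxedJsonMapperSketch {
  public static ObjectMapper buildMapper() {
    // Raise the 2.15+ default number-length limit (1000 chars); 10,000 is illustrative.
    StreamReadConstraints constraints = StreamReadConstraints.builder()
        .maxNumberLength(10_000)
        .build();
    JsonFactory factory = JsonFactory.builder()
        .streamReadConstraints(constraints)
        .build();
    return new ObjectMapper(factory);
  }
}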




> Upgrade Jackson 2.14.3 → 2.16.1
> ---
>
> Key: DRILL-8415
> URL: https://issues.apache.org/jira/browse/DRILL-8415
> Project: Apache Drill
>  Issue Type: Improvement
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Priority: Major
> Fix For: 1.22.0
>
>
> I'm not advocating for an upgrade to [Jackson 
> 2.15|https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.15]. 
> 2.15.0-rc1 has just been released and 2.15.0 should be out soon.
> There are some security focused enhancements including a new class called 
> StreamReadConstraints. The defaults on 
> [StreamReadConstraints|https://javadoc.io/static/com.fasterxml.jackson.core/jackson-core/2.15.0-rc1/com/fasterxml/jackson/core/StreamReadConstraints.html]
>  are pretty high but it is not inconceivable that some Drill users might need 
> to relax them. Parsing large strings as numbers has worse-than-linear 
> (roughly quadratic) cost, hence the default limit of 1000 chars or bytes 
> (depending on input context).
> When the Drill team consider upgrading to Jackson 2.15 or above, you might 
> also want to consider adding some way for users to configure the 
> StreamReadConstraints.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804117#comment-17804117
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

cgivre commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444180941


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   > Thanks for this Paul! We must remember to squash when merging; we got the 
WIP commits from the feature branch into master. We've all done something 
similar at some point.

> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804116#comment-17804116
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444180526


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   Thanks for this Paul! We must remember to squash when merging; we got the 
WIP commits from the feature branch into master. We've all done something 
similar at some point.

> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804115#comment-17804115
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

jnturton commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444180526


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   Thanks for this Paul! We must remember to squash when merging; we got the 
WIP commits from the feature branch into master. We've all done something 
similar at some point.

> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804074#comment-17804074
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444117648


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   The UNION type has been discussed for as long as I've been involved in the 
project: since 2016. The idea is simple: Drill should be able to read any kind 
of JSON, and UNION (plus LIST, etc.) have been essential for this. The problem, 
as we've also discussed for a long time, is that UNION barely works, and JDBC 
and similar clients can't make sense of it. That is, UNION turned out to be the 
wrong solution to the problem.
   
   Still, there is always the hope that UNION can be made to work.

> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17804050#comment-17804050
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

luocooong commented on code in PR #2867:
URL: https://github.com/apache/drill/pull/2867#discussion_r1444101786


##
exec/java-exec/src/test/java/org/apache/drill/exec/physical/resultSet/impl/TestResultSetLoaderUnprojected.java:
##
@@ -0,0 +1,455 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.drill.exec.physical.resultSet.impl;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+import java.util.List;
+
+import org.apache.drill.common.expression.SchemaPath;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.physical.resultSet.ResultSetLoader;
+import org.apache.drill.exec.physical.resultSet.RowSetLoader;
+import 
org.apache.drill.exec.physical.resultSet.impl.ResultSetLoaderImpl.ResultSetOptions;
+import org.apache.drill.exec.physical.resultSet.project.Projections;
+import org.apache.drill.exec.physical.rowSet.RowSet;
+import org.apache.drill.exec.physical.rowSet.RowSet.SingleRowSet;
+import org.apache.drill.exec.physical.rowSet.RowSetTestUtils;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.apache.drill.exec.vector.accessor.ArrayWriter;
+import org.apache.drill.exec.vector.accessor.ScalarWriter;
+import org.apache.drill.exec.vector.accessor.TupleWriter;
+import org.apache.drill.test.SubOperatorTest;
+import org.apache.drill.test.rowSet.RowSetUtilities;
+import org.junit.Test;
+
+/**
+ * Verify the correct functioning of the "dummy" columns created
+ * for unprojected columns.
+ */
+public class TestResultSetLoaderUnprojected  extends SubOperatorTest {

Review Comment:
   This test case provides a good usage guide. In addition, will it be possible 
for us to remove the UNION type completely from the data types in the future?





> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803993#comment-17803993
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers merged PR #2867:
URL: https://github.com/apache/drill/pull/2867




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803992#comment-17803992
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

paul-rogers commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1880146814

   Thanks @cgivre for the comments and review. @luocooong, I'll commit this. 
When convenient, please see if this addresses the issue you raised long ago. 
Otherwise, these capabilities are available for the next person who is seduced 
into trying out the UNION-based types. 




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803984#comment-17803984
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1880110452

   > > @mbeckerle I had a thought about your TODO list. See inline.
   > > > This is ready for a next review. All the scalar types are now 
implemented with typed setter calls.
   > > > The prior review comments have all been addressed I believe.
   > > > Remaining things to do include:
   > > > 
   > > > 1. How to get the compiled DFDL schema object so it can be loaded by 
daffodil out at the distributed Drill nodes.
   > > 
   > > 
   > > I was thinking about this and I remembered something that might be 
useful. Drill has support for User Defined Functions (UDF) which are written in 
Java. To add a UDF to Drill, you also have to write some Java classes in a 
particular way, and include the JARs. Much like the DFDL class files, the UDF 
JARs must be accessible to all nodes of a Drill cluster.
   > > Additionally, Drill has the capability of adding UDFs dynamically. This 
feature was added here: #574. Anyway, I wonder if we could use a similar 
mechanism to load and store the DFDL files so that they are accessible to all 
Drill nodes. What do you think?
   > 
   > Excellent: So Drill has all the machinery; it's just a question of 
repackaging it so it's available for this usage pattern, which is a bit 
different from Drill's UDFs, but also very similar.
   > 
   > There are two user scenarios which we can call production and test.
   > 
   > 1. Production: binary compiled DFDL schema file + code jars for Daffodil's 
own UDFs and "layers" plugins. This should, ideally, cache the compiled schema 
and not reload it for every query (at every node), but keep the same loaded 
instance in memory in a persistent JVM image on each node. For large production 
DFDL schemas this is the only sensible mechanism as it can take minutes to 
compile large DFDL schemas.
   > 2. Test: on-the-fly centralized compilation of DFDL schema (from a 
combination of jars and files) to create and cache (to avoid recompiling) the 
binary compiled DFDL schema file. Then using that compiled binary file, as item 
1. For small DFDL schemas this can be fast enough for production use. Ideally, 
if the DFDL schema is unchanged this would reuse the compiled binary file, but 
that's an optimization that may not matter much.
   > 
   > Kinds of objects involved are:
   > 
   > * Daffodil plugin code jars
   > * DFDL schema jars
   > * DFDL schema files (just not packaged into a jar)
   > * Daffodil compiled schema binary file
   > * Daffodil config file - parameters, tunables, and options needed at 
compile time and/or runtime
   > 
   > Code jars: Daffodil provides two extension features for DFDL users - DFDL 
UDFs and DFDL 'layers' (ex: plug-ins for uudecode, or gunzip algorithms used in 
part of the data format). Those are ordinary compiled class files in jars, so 
in all scenarios those jars are needed on the node class path if the DFDL 
schema uses them. Daffodil dynamically finds and loads these from the classpath 
in regular Java Service-Provider Interface (SPI) mechanisms.
   > 
   > Schema jars: Daffodil packages DFDL schema files (source files i.e., 
mySchema.dfdl.xsd) into jar files to allow inter-schema dependencies to be 
managed using ordinary jar/java-style managed dependencies. Tools like sbt and 
maven can express the dependencies of one schema on another, grab and pull them 
together, etc. Daffodil has a resolver so when one schema file referenes 
another with include/import it searches the class path directories and jars for 
the files.
   > 
   > Schema jars are only needed centrally when compiling the schema to a 
binary file. All references to the jar files for inter-schema file references 
are compiled into the compiled binary file.
   > 
   > It is possible for one DFDL schema 'project' to define a DFDL schema, 
along with the code for a plugin like a Daffodil UDF or layer. In that case the 
one jar created is both a code jar and a schema jar. The schema jar aspects are 
used when the schema is compiled and ignored at Daffodil runtime. The code jar 
aspects are used at Daffodil run time and ignored at schema compilation time. 
So such a jar that is both code and schema jar needs to be on the class path in 
both places, but there's no interaction of the two things.
   > 
   > Binary Compiled Schema File: Centrally, DFDL schemas in files and/or jars 
are compiled to create a single binary object which can be reloaded in order to 
actually use the schema to parse/unparse data.
   > 
   > * These binary files are tied to a specific version+build of Daffodil. 
(They are just a java object serialization of the runtime data structures used 
by Daffodil).
   > * Once reloaded i

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803983#comment-17803983
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

cgivre commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1880109717

   @mbeckerle 
   With respect to style, I tried to reply to that comment, but the thread 
won't let me.   In any event, Drill classes will typically start with the 
constructor, then have whatever methods are appropriate for the class.  The 
logger creation usually happens before the constructor.  I think all of your 
other classes followed this format, so the one or two that didn't kind of 
jumped out at me. 
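
   A minimal skeleton of that layout, for illustration only (the class and its 
fields are hypothetical, not Drill code):

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class ExampleBatchReader {
  // Logger creation comes before the constructor.
  private static final Logger logger = LoggerFactory.getLogger(ExampleBatchReader.class);

  private final String tableName;

  // The constructor comes first...
  public ExampleBatchReader(String tableName) {
    this.tableName = tableName;
  }

  // ...followed by whatever methods are appropriate for the class.
  public void open() {
    logger.debug("Opening reader for {}", tableName);
  }
}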




> Add Daffodil Format Plugin
> --
>
> Key: DRILL-8474
> URL: https://issues.apache.org/jira/browse/DRILL-8474
> Project: Apache Drill
>  Issue Type: New Feature
>Affects Versions: 1.21.1
>Reporter: Charles Givre
>Priority: Major
> Fix For: 1.22.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8375) Incomplete support for non-projected complex vectors

2024-01-06 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8375?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803893#comment-17803893
 ] 

ASF GitHub Bot commented on DRILL-8375:
---

cgivre commented on PR #2867:
URL: https://github.com/apache/drill/pull/2867#issuecomment-1879917039

   Once we merge this, we should also rebase 
https://github.com/apache/drill/pull/2515 on the current master and merge that 
as well. 




> Incomplete support for non-projected complex vectors
> 
>
> Key: DRILL-8375
> URL: https://issues.apache.org/jira/browse/DRILL-8375
> Project: Apache Drill
>  Issue Type: Bug
>Reporter: Paul Rogers
>Assignee: Paul Rogers
>Priority: Major
>
> The `ResultSetLoader` implementation supports all of Drill's vector types. 
> However, DRILL-8188 discovered holes in support for non-projected vectors.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803594#comment-17803594
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on code in PR #2836:
URL: https://github.com/apache/drill/pull/2836#discussion_r1442993784


##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaVisitor.java:
##
@@ -0,0 +1,229 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.runtime1.api.ChoiceMetadata;
+import org.apache.daffodil.runtime1.api.ComplexElementMetadata;
+import org.apache.daffodil.runtime1.api.ElementMetadata;
+import org.apache.daffodil.runtime1.api.InfosetSimpleElement;
+import org.apache.daffodil.runtime1.api.MetadataHandler;
+import org.apache.daffodil.runtime1.api.SequenceMetadata;
+import org.apache.daffodil.runtime1.api.SimpleElementMetadata;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.MapBuilder;
+import org.apache.drill.exec.record.metadata.SchemaBuilder;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import java.util.Stack;
+
+/**
+ * This class transforms a DFDL/Daffodil schema into a Drill Schema.
+ */
+public class DrillDaffodilSchemaVisitor extends MetadataHandler {
+  private static final Logger logger = 
LoggerFactory.getLogger(DrillDaffodilSchemaVisitor.class);
+  /**
+   * Unfortunately, SchemaBuilder and MapBuilder, while similar, do not share 
a base class so we
+   * have a stack of MapBuilders, and when empty we use the SchemaBuilder

Review Comment:
   Note that this awkwardness effectively doubles the code size of things that 
interface to Drill. 
   
   This duplication of similar behavior for schema and map builders (and 
rowWriters and mapWriters) is expected and typical of systems that start from a 
tabular view of the data world and later add the features needed for 
hierarchical data. Nevertheless, it is awkward when one is dealing entirely with 
hierarchical data. 
   
   A MetaBuilder that does the map thing if the builder is a map, and the 
schema thing if the builder is a schema, would eliminate this. This could be an 
interface mixed into both SchemaBuilder and MapBuilder (could also be called 
MapBuilderLike). 
   
   The same discontinuity at the base holds for RowWriter vs. MapWriter in the 
runtime handling of data. Again it doubles the code size/complexity, every fix 
goes in 2 places, etc. A MapWriterLike interface could be factored out.
   
   Maybe we should build such mechanisms to avoid this, and then use them to 
improve this Daffodil plugin?
   



##
contrib/format-daffodil/src/main/java/org/apache/drill/exec/store/daffodil/schema/DrillDaffodilSchemaUtils.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.drill.exec.store.daffodil.schema;
+
+import org.apache.daffodil.japi.InvalidParserException;
+import org.apache.daffodil.japi.DataProcessor;
+import org.apache.daffodil.runtime1.api.PrimitiveType;
+import org.apache.drill.common.types.TypeProtos.MinorType;
+import org.apache.drill.exec.record.metadata.TupleMetadata;
+import com.google.common.annotations.VisibleFor

[jira] [Commented] (DRILL-8474) Add Daffodil Format Plugin

2024-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8474?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803592#comment-17803592
 ] 

ASF GitHub Bot commented on DRILL-8474:
---

mbeckerle commented on PR #2836:
URL: https://github.com/apache/drill/pull/2836#issuecomment-1878896878

   > @mbeckerle I had a thought about your TODO list. See inline.
   > 
   > > This is ready for a next review. All the scalar types are now 
implemented with typed setter calls.
   > > The prior review comments have all been addressed I believe.
   > > Remaining things to do include:
   > > 
   > > 1. How to get the compiled DFDL schema object so it can be loaded by 
daffodil out at the distributed Drill nodes.
   > 
   > I was thinking about this and I remembered something that might be useful. 
Drill has support for User Defined Functions (UDF) which are written in Java. 
To add a UDF to Drill, you also have to write some Java classes in a particular 
way, and include the JARs. Much like the DFDL class files, the UDF JARs must be 
accessible to all nodes of a Drill cluster.
   > 
   > Additionally, Drill has the capability of adding UDFs dynamically. This 
feature was added here: #574. Anyway, I wonder if we could use a similar 
mechanism to load and store the DFDL files so that they are accessible to all 
Drill nodes. What do you think?
   
   Excellent: So Drill has all the machinery; it's just a question of 
repackaging it so it's available for this usage pattern, which is a bit 
different from Drill's UDFs, but also very similar. 
   
   There are two user scenarios which we can call production and test.
   
   1. Production: binary compiled DFDL schema file + code jars for Daffodil's 
own UDFs and "layers" plugins. This should, ideally, cache the compiled schema 
and not reload it for every query (at every node), but keep the same loaded 
instance in memory in a persistent JVM image on each node. For large production 
DFDL schemas this is the only sensible mechanism as it can take minutes to 
compile large DFDL schemas. 
   
   2. Test: on-the-fly centralized compilation of DFDL schema (from a 
combination of jars and files) to create and cache (to avoid recompiling) the 
binary compiled DFDL schema file. Then using that compiled binary file, as item 
1. For small DFDL schemas this can be fast enough for production use. Ideally, 
if the DFDL schema is unchanged this would reuse the compiled binary file, but 
that's an optimization that may not matter much. 
   
   Kinds of objects involved are:
   
   - Daffodil plugin code jars
   - DFDL schema jars
   - DFDL schema files (just not packaged into a jar)
   - Daffodil compiled schema binary file
   - Daffodil config file - parameters, tunables, and options needed at compile 
time and/or runtime
   
   Code jars: Daffodil provides two extension features for DFDL users - DFDL 
UDFs and DFDL 'layers' (ex: plug-ins for uudecode, or gunzip algorithms used in 
part of the data format). Those are ordinary compiled class files in jars, so 
in all scenarios those jars are needed on the node class path if the DFDL 
schema uses them. Daffodil dynamically finds and loads these from the classpath 
in regular Java Service-Provider Interface (SPI) mechanisms. 
   
   Schema jars: Daffodil packages DFDL schema files (source files i.e., 
mySchema.dfdl.xsd) into jar files to allow inter-schema dependencies to be 
managed using ordinary jar/java-style managed dependencies. Tools like sbt and 
maven can express the dependencies of one schema on another, grab and pull them 
together, etc. Daffodil has a resolver, so when one schema file references 
another with include/import it searches the class path directories and jars for 
the files. 
   
   Schema jars are only needed centrally when compiling the schema to a binary 
file. All references to the jar files for inter-schema file references are 
compiled into the compiled binary file. 
   
   It is possible for one DFDL schema 'project' to define a DFDL schema, along 
with the code for a plugin like a Daffodil UDF or layer. In that case the one 
jar created is both a code jar and a schema jar. The schema jar aspects are 
used when the schema is compiled and ignored at Daffodil runtime. The code jar 
aspects are used at Daffodil run time and ignored at schema compilation time. 
So such a jar that is both code and schema jar needs to be on the class path in 
both places, but there's no interaction of the two things. 
   
   Binary Compiled Schema File: Centrally, DFDL schemas in files and/or jars 
are compiled to create a single binary object which can be reloaded in order to 
actually use the schema to parse/unparse data. 
   
   - These binary files are tied to a specific version+build of Daffodil. (They 
are just a java object serialization of the runtime data structures used by 
Daffodil). 
   - Once reloaded into a JVM to create a Daffodil DataProcessor object, t
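
   A minimal sketch of the per-node cache described in scenario 1, built on the 
Daffodil Java API's reload entry point (the cache keying and error handling 
here are illustrative, not a proposed design):

import java.io.File;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.daffodil.japi.Daffodil;
import org.apache.daffodil.japi.DataProcessor;
import org.apache.daffodil.japi.InvalidParserException;

public class CompiledSchemaCacheSketch {
  // One reloaded DataProcessor per compiled schema file, kept for the life of the JVM.
  private static final ConcurrentHashMap<String, DataProcessor> CACHE =
      new ConcurrentHashMap<>();

  public static DataProcessor get(File compiledSchema) {
    return CACHE.computeIfAbsent(compiledSchema.getAbsolutePath(), path -> {
      try {
        // Reloading a compiled binary is fast; compiling the schema is what can take minutes.
        return Daffodil.compiler().reload(compiledSchema);
      } catch (InvalidParserException e) {
        throw new IllegalStateException("Cannot reload compiled DFDL schema: " + path, e);
      }
    });
  }
}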

[jira] [Commented] (DRILL-8465) Check data input when loading iceberg data

2024-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803578#comment-17803578
 ] 

ASF GitHub Bot commented on DRILL-8465:
---

jnturton commented on PR #2853:
URL: https://github.com/apache/drill/pull/2853#issuecomment-1878847019

   Got it @pjfanning. Let's discuss further in the right forum.




> Check data input when loading iceberg data
> --
>
> Key: DRILL-8465
> URL: https://issues.apache.org/jira/browse/DRILL-8465
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Security, Storage - Iceberg
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
> Fix For: 1.21.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8465) Check data input when loading iceberg data

2024-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803514#comment-17803514
 ] 

ASF GitHub Bot commented on DRILL-8465:
---

pjfanning commented on PR #2853:
URL: https://github.com/apache/drill/pull/2853#issuecomment-1878512092

   The short background to this is in this link: 
https://lists.apache.org/thread/vpjz467rg8449m63v1n9nl3o56twwyzt (a private 
thread requiring ASF login).
   
   I'm no expert on Iceberg or the Drill Iceberg Plugin but I was hoping to 
maybe engage with someone who knows more about how they work and to get an 
understanding of whether we need some constraints. Due to the security aspect of 
this, I'm not too comfortable going into more detail here.




> Check data input when loading iceberg data
> --
>
> Key: DRILL-8465
> URL: https://issues.apache.org/jira/browse/DRILL-8465
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Security, Storage - Iceberg
>Affects Versions: 1.21.1
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
> Fix For: 1.21.2
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (DRILL-8465) Check data input when loading iceberg data

2024-01-05 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/DRILL-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803477#comment-17803477
 ] 

ASF GitHub Bot commented on DRILL-8465:
---

jnturton commented on PR #2853:
URL: https://github.com/apache/drill/pull/2853#issuecomment-1878400693

   I've started looking at this. First question: if we're adding dynamically 
loaded class checks to protect against untrusted code then is checking the 
package name worth much? Or do we need to do something like verify signatures 
against a list of trusted keys? Second question: if this is about security then 
is the code we're loading actually untrusted or is it only ever loaded from 
serialisations that we produced ourselves (e.g. in IcebergWorkSerializer)?
   
   P.S. Please include this "Why we're doing this" background that I'm lacking 
in the Jira issue when it's nontrivial.
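
   For concreteness, the package-name check under discussion amounts to 
something like the sketch below (the allow-list and method are hypothetical, 
and per the question above, a name check alone proves nothing about trust):

import java.util.Set;

public class CheckedClassLoadingSketch {
  // Hypothetical allow-list; verifying signatures would be needed for real trust.
  private static final Set<String> ALLOWED_PREFIXES =
      Set.of("org.apache.iceberg.", "org.apache.drill.");

  public static Class<?> loadChecked(String className) throws ClassNotFoundException {
    boolean permitted = ALLOWED_PREFIXES.stream().anyMatch(className::startsWith);
    if (!permitted) {
      throw new IllegalArgumentException("Refusing to load class: " + className);
    }
    return Class.forName(className);
  }
}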




> Check data input when loading iceberg data
> --
>
> Key: DRILL-8465
> URL: https://issues.apache.org/jira/browse/DRILL-8465
> Project: Apache Drill
>  Issue Type: Improvement
>  Components: Storage - Iceberg
>Reporter: PJ Fanning
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

