Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-13 Thread via GitHub


maytasm merged PR #19141:
URL: https://github.com/apache/druid/pull/19141


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-12 Thread via GitHub


maytasm commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2927839393


##
docs/configuration/index.md:
##
@@ -2266,6 +2266,7 @@ Supported runtime properties:
 |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-segment string dictionaries. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-query string dictionaries. When the dictionary 
exceeds this size, a spill to disk will be triggered. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, 
per-query, for spilling result sets to disk when either the merging buffer or 
the dictionary fills up. Queries that exceed this limit will fail. Set to zero 
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed 
per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE 
(unlimited)|

Review Comment:
   Im going to say "See groupBy memory tuning and resource limits for details." 
here and add more details to the memory-tuning-and-resource-limits section (so 
as to not repeat myself)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-12 Thread via GitHub


maytasm commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2927839393


##
docs/configuration/index.md:
##
@@ -2266,6 +2266,7 @@ Supported runtime properties:
 |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-segment string dictionaries. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-query string dictionaries. When the dictionary 
exceeds this size, a spill to disk will be triggered. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, 
per-query, for spilling result sets to disk when either the merging buffer or 
the dictionary fills up. Queries that exceed this limit will fail. Set to zero 
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed 
per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE 
(unlimited)|

Review Comment:
   Im going to say "See [groupBy memory tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details." here and add more details to the memory-tuning-and-resource-limits 
section (so as to not repeat myself)



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-12 Thread via GitHub


maytasm commented on PR #19141:
URL: https://github.com/apache/druid/pull/19141#issuecomment-4050736609

   > maybe this should extend InternalServerError so we can classify it? I see 
the other group by exception doesn't do this, but maybe that predates the 
exception classes
   That's not needed. SpillingGrouper.aggregate() doesn't throw the Exception. 
It returns AggregateResult with a failure reason. That will be handled 
upstream. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-12 Thread via GitHub


maytasm commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2927792099


##
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java:
##
@@ -17,42 +17,16 @@
  * under the License.
  */
 
-package org.apache.druid.storage.s3;
+package org.apache.druid.query.groupby.epinephelinae;
 
-import software.amazon.awssdk.services.s3.model.S3Object;
+import org.apache.druid.java.util.common.StringUtils;
 
-/**
- * Pairs an {@link S3Object} with its bucket name.
- * In AWS SDK v2, {@link S3Object} does not carry the bucket name, so callers 
that need it must track it separately.
- */
-public class S3ObjectWithBucket
-{
-  private final String bucket;
-  private final S3Object s3Object;
-
-  public S3ObjectWithBucket(String bucket, S3Object s3Object)
-  {
-this.bucket = bucket;
-this.s3Object = s3Object;
-  }
-
-  public String getBucket()
-  {
-return bucket;
-  }
+import java.io.IOException;
 
-  public String getKey()
-  {
-return s3Object.key();
-  }
-
-  public long getSize()
-  {
-return s3Object.size();
-  }
-
-  public S3Object getS3Object()
+public class TemporaryStorageFileLimitException extends IOException
+{
+  public TemporaryStorageFileLimitException(final int fileCount)
   {
-return s3Object;
+super(StringUtils.format("Cannot write to disk, hit file count limit of 
%,d.", fileCount));

Review Comment:
   Done



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-12 Thread via GitHub


jtuglu1 commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2923085277


##
docs/configuration/index.md:
##
@@ -2266,6 +2266,7 @@ Supported runtime properties:
 |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-segment string dictionaries. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-query string dictionaries. When the dictionary 
exceeds this size, a spill to disk will be triggered. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, 
per-query, for spilling result sets to disk when either the merging buffer or 
the dictionary fills up. Queries that exceed this limit will fail. Set to zero 
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed 
per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE 
(unlimited)|

Review Comment:
   maybe also mention why you might want to use this, either a) to prevent OOM 
and/or to prevent excessive spill files, while the total spill byte size is 
under `druid.query.groupBy.maxOnDiskStorage`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]



Re: [PR] Add spill file count limit for GroupBy query (druid)

2026-03-12 Thread via GitHub


jtuglu1 commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2923078420


##
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java:
##
@@ -17,42 +17,16 @@
  * under the License.
  */
 
-package org.apache.druid.storage.s3;
+package org.apache.druid.query.groupby.epinephelinae;
 
-import software.amazon.awssdk.services.s3.model.S3Object;
+import org.apache.druid.java.util.common.StringUtils;
 
-/**
- * Pairs an {@link S3Object} with its bucket name.
- * In AWS SDK v2, {@link S3Object} does not carry the bucket name, so callers 
that need it must track it separately.
- */
-public class S3ObjectWithBucket
-{
-  private final String bucket;
-  private final S3Object s3Object;
-
-  public S3ObjectWithBucket(String bucket, S3Object s3Object)
-  {
-this.bucket = bucket;
-this.s3Object = s3Object;
-  }
-
-  public String getBucket()
-  {
-return bucket;
-  }
+import java.io.IOException;
 
-  public String getKey()
-  {
-return s3Object.key();
-  }
-
-  public long getSize()
-  {
-return s3Object.size();
-  }
-
-  public S3Object getS3Object()
+public class TemporaryStorageFileLimitException extends IOException
+{
+  public TemporaryStorageFileLimitException(final int fileCount)
   {
-return s3Object;
+super(StringUtils.format("Cannot write to disk, hit file count limit of 
%,d.", fileCount));

Review Comment:
   nit: spill file count



##
docs/configuration/index.md:
##
@@ -2266,6 +2266,7 @@ Supported runtime properties:
 |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-segment string dictionaries. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space 
(approximately) to use for per-query string dictionaries. When the dictionary 
exceeds this size, a spill to disk will be triggered. See [groupBy memory 
tuning and resource 
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for 
details.|1|
 |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, 
per-query, for spilling result sets to disk when either the merging buffer or 
the dictionary fills up. Queries that exceed this limit will fail. Set to zero 
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed 
per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE 
(unlimited)|

Review Comment:
   maybe also mention why you might want to use this, either a) to prevent OOM 
or b) to prevent excessive spill files, while the total spill byte size is 
under `druid.query.groupBy.maxOnDiskStorage`.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]