Re: [PR] Add spill file count limit for GroupBy query (druid)
maytasm merged PR #19141: URL: https://github.com/apache/druid/pull/19141 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add spill file count limit for GroupBy query (druid)
maytasm commented on code in PR #19141: URL: https://github.com/apache/druid/pull/19141#discussion_r2927839393 ## docs/configuration/index.md: ## @@ -2266,6 +2266,7 @@ Supported runtime properties: |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space (approximately) to use for per-segment string dictionaries. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|1| |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for per-query string dictionaries. When the dictionary exceeds this size, a spill to disk will be triggered. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|1| |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)| +|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE (unlimited)| Review Comment: Im going to say "See groupBy memory tuning and resource limits for details." here and add more details to the memory-tuning-and-resource-limits section (so as to not repeat myself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add spill file count limit for GroupBy query (druid)
maytasm commented on code in PR #19141: URL: https://github.com/apache/druid/pull/19141#discussion_r2927839393 ## docs/configuration/index.md: ## @@ -2266,6 +2266,7 @@ Supported runtime properties: |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space (approximately) to use for per-segment string dictionaries. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|1| |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for per-query string dictionaries. When the dictionary exceeds this size, a spill to disk will be triggered. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|1| |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)| +|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE (unlimited)| Review Comment: Im going to say "See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details." here and add more details to the memory-tuning-and-resource-limits section (so as to not repeat myself) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add spill file count limit for GroupBy query (druid)
maytasm commented on PR #19141: URL: https://github.com/apache/druid/pull/19141#issuecomment-4050736609 > maybe this should extend InternalServerError so we can classify it? I see the other group by exception doesn't do this, but maybe that predates the exception classes That's not needed. SpillingGrouper.aggregate() doesn't throw the Exception. It returns AggregateResult with a failure reason. That will be handled upstream. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add spill file count limit for GroupBy query (druid)
maytasm commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2927792099
##
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java:
##
@@ -17,42 +17,16 @@
* under the License.
*/
-package org.apache.druid.storage.s3;
+package org.apache.druid.query.groupby.epinephelinae;
-import software.amazon.awssdk.services.s3.model.S3Object;
+import org.apache.druid.java.util.common.StringUtils;
-/**
- * Pairs an {@link S3Object} with its bucket name.
- * In AWS SDK v2, {@link S3Object} does not carry the bucket name, so callers
that need it must track it separately.
- */
-public class S3ObjectWithBucket
-{
- private final String bucket;
- private final S3Object s3Object;
-
- public S3ObjectWithBucket(String bucket, S3Object s3Object)
- {
-this.bucket = bucket;
-this.s3Object = s3Object;
- }
-
- public String getBucket()
- {
-return bucket;
- }
+import java.io.IOException;
- public String getKey()
- {
-return s3Object.key();
- }
-
- public long getSize()
- {
-return s3Object.size();
- }
-
- public S3Object getS3Object()
+public class TemporaryStorageFileLimitException extends IOException
+{
+ public TemporaryStorageFileLimitException(final int fileCount)
{
-return s3Object;
+super(StringUtils.format("Cannot write to disk, hit file count limit of
%,d.", fileCount));
Review Comment:
Done
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
Re: [PR] Add spill file count limit for GroupBy query (druid)
jtuglu1 commented on code in PR #19141: URL: https://github.com/apache/druid/pull/19141#discussion_r2923085277 ## docs/configuration/index.md: ## @@ -2266,6 +2266,7 @@ Supported runtime properties: |`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space (approximately) to use for per-segment string dictionaries. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|1| |`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space (approximately) to use for per-query string dictionaries. When the dictionary exceeds this size, a spill to disk will be triggered. See [groupBy memory tuning and resource limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for details.|1| |`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use, per-query, for spilling result sets to disk when either the merging buffer or the dictionary fills up. Queries that exceed this limit will fail. Set to zero to disable disk spilling.|0 (disabled)| +|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE (unlimited)| Review Comment: maybe also mention why you might want to use this, either a) to prevent OOM and/or to prevent excessive spill files, while the total spill byte size is under `druid.query.groupBy.maxOnDiskStorage`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] - To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
Re: [PR] Add spill file count limit for GroupBy query (druid)
jtuglu1 commented on code in PR #19141:
URL: https://github.com/apache/druid/pull/19141#discussion_r2923078420
##
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/TemporaryStorageFileLimitException.java:
##
@@ -17,42 +17,16 @@
* under the License.
*/
-package org.apache.druid.storage.s3;
+package org.apache.druid.query.groupby.epinephelinae;
-import software.amazon.awssdk.services.s3.model.S3Object;
+import org.apache.druid.java.util.common.StringUtils;
-/**
- * Pairs an {@link S3Object} with its bucket name.
- * In AWS SDK v2, {@link S3Object} does not carry the bucket name, so callers
that need it must track it separately.
- */
-public class S3ObjectWithBucket
-{
- private final String bucket;
- private final S3Object s3Object;
-
- public S3ObjectWithBucket(String bucket, S3Object s3Object)
- {
-this.bucket = bucket;
-this.s3Object = s3Object;
- }
-
- public String getBucket()
- {
-return bucket;
- }
+import java.io.IOException;
- public String getKey()
- {
-return s3Object.key();
- }
-
- public long getSize()
- {
-return s3Object.size();
- }
-
- public S3Object getS3Object()
+public class TemporaryStorageFileLimitException extends IOException
+{
+ public TemporaryStorageFileLimitException(final int fileCount)
{
-return s3Object;
+super(StringUtils.format("Cannot write to disk, hit file count limit of
%,d.", fileCount));
Review Comment:
nit: spill file count
##
docs/configuration/index.md:
##
@@ -2266,6 +2266,7 @@ Supported runtime properties:
|`druid.query.groupBy.maxSelectorDictionarySize`|Maximum amount of heap space
(approximately) to use for per-segment string dictionaries. See [groupBy memory
tuning and resource
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for
details.|1|
|`druid.query.groupBy.maxMergingDictionarySize`|Maximum amount of heap space
(approximately) to use for per-query string dictionaries. When the dictionary
exceeds this size, a spill to disk will be triggered. See [groupBy memory
tuning and resource
limits](../querying/groupbyquery.md#memory-tuning-and-resource-limits) for
details.|1|
|`druid.query.groupBy.maxOnDiskStorage`|Maximum amount of disk space to use,
per-query, for spilling result sets to disk when either the merging buffer or
the dictionary fills up. Queries that exceed this limit will fail. Set to zero
to disable disk spilling.|0 (disabled)|
+|`druid.query.groupBy.maxSpillFileCount`|Maximum number of spill files allowed
per GroupBy query. Queries that exceed this limit will fail.|Integer.MAX_VALUE
(unlimited)|
Review Comment:
maybe also mention why you might want to use this, either a) to prevent OOM
or b) to prevent excessive spill files, while the total spill byte size is
under `druid.query.groupBy.maxOnDiskStorage`.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
-
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
