[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r556280161 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -232,6 +237,10 @@ public boolean isReady(TaskActionClient taskActionClient) throws Exception throw new UOE("partitionsSpec[%s] is not supported", tuningConfig.getPartitionsSpec().getClass().getName()); } } +InputSource inputSource = getIngestionSchema().getIOConfig().getNonNullInputSource( +getIngestionSchema().getDataSchema().getParser() +); +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: This should be done in `ParallelIndexSupervisorTask.isReady()` as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r556275449 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -1044,17 +1048,21 @@ public IndexIOConfig( if (firehoseFactory != null && inputFormat != null) { throw new IAE("Cannot use firehose and inputFormat together. Try using inputSource instead of firehose."); } + if (inputSource != null) { +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: > Just to double check, if i move it to isReady the task will be accepted by the overlord and it will fail when the task is actually tried to be run. I was thinking that if someone is not allowed, we should block the ingestion at that time when the task is submitted at first place. Is that the behavior you are expecting ? @nishantmonu51 `Task.isReady()` is actually called as soon as the task is submitted to the Overlord. When it returns true, the Overlord changes the task's state from `waiting` to `pending` and tries to assign it to one of the MiddleManagers. So I think it will be effectively the same as what you want to achieve. > isReady is also called multiple times, so not sure if that is any better in terms of number of checks, Good point. I think it won't be that expensive, but maybe we can check the `validated` state and skip further checks once it is already validated?
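The `validated` state suggested above could be sketched along these lines; the class and method names here are illustrative placeholders, not Druid's actual task API:

```java
// Hedged sketch of caching validation so that repeated isReady() calls do not
// re-run the allow/deny prefix check. Names (PrefixValidatingTask,
// validateAllowDenyPrefixList) are illustrative, not Druid's real API.
public class PrefixValidatingTask {
  private boolean validated = false; // set once validation has passed
  private int validationRuns = 0;    // counter for illustration only

  public boolean isReady() {
    if (!validated) {
      validateAllowDenyPrefixList(); // would throw if a prefix is denied
      validated = true;              // skip the check on later calls
    }
    return true;
  }

  private void validateAllowDenyPrefixList() {
    validationRuns++; // a real task would consult the configured prefix lists
  }

  public int getValidationRuns() {
    return validationRuns;
  }
}
```

With this shape, repeated `isReady()` calls from the Overlord pay for the prefix check only once.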
[GitHub] [druid] jihoonson commented on a change in pull request #10746: Tidy up HTTP status codes for query errors
jihoonson commented on a change in pull request #10746: URL: https://github.com/apache/druid/pull/10746#discussion_r556271084 ## File path: server/src/main/java/org/apache/druid/server/QueryResource.java ## @@ -463,36 +481,36 @@ Response ok(Object object) throws IOException Response gotError(Exception e) throws IOException { - return Response.serverError() - .type(contentType) - .entity( - newOutputWriter(null, null, false) - .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e)) - ) - .build(); + return buildNonOkResponse( + Status.INTERNAL_SERVER_ERROR.getStatusCode(), + QueryInterruptedException.wrapIfNeeded(e) + ); } Response gotTimeout(QueryTimeoutException e) throws IOException { - return Response.status(QueryTimeoutException.STATUS_CODE) - .type(contentType) - .entity( - newOutputWriter(null, null, false) - .writeValueAsBytes(e) - ) - .build(); + return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e); } Response gotLimited(QueryCapacityExceededException e) throws IOException { - return Response.status(QueryCapacityExceededException.STATUS_CODE) - .entity(newOutputWriter(null, null, false).writeValueAsBytes(e)) - .build(); + return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e); } Response gotUnsupported(QueryUnsupportedException e) throws IOException { - return Response.status(QueryUnsupportedException.STATUS_CODE) + return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e); +} + +Response gotBadQuery(BadQueryException e) throws IOException +{ + return buildNonOkResponse(BadQueryException.STATUS_CODE, e); Review comment: Thanks! ## File path: processing/src/main/java/org/apache/druid/query/QueryUnsupportedException.java ## @@ -24,21 +24,20 @@ import com.fasterxml.jackson.annotation.JsonProperty; import javax.annotation.Nullable; -import java.net.InetAddress; /** * This exception is for the query engine to surface when a query cannot be run. 
This can be due to the * following reasons: 1) The query is not supported yet. 2) The query is not something Druid would ever supports. * For these cases, the exact causes and details should also be documented in Druid user facing documents. * - * As a {@link QueryException} it is expected to be serialied to a json response, but will be mapped to - * {@link #STATUS_CODE} instead of the default HTTP 500 status. + * As a {@link QueryException} it is expected to be serialied to a json response with a proper HTTP error code Review comment: Oops, thanks.
[GitHub] [druid] jihoonson commented on a change in pull request #10746: Tidy up HTTP status codes for query errors
jihoonson commented on a change in pull request #10746: URL: https://github.com/apache/druid/pull/10746#discussion_r556270879 ## File path: processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java ## @@ -19,18 +19,33 @@ package org.apache.druid.query; +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; import org.apache.druid.java.util.common.StringUtils; /** * Exception indicating that an operation failed because it exceeded some configured resource limit. * - * This is used as a marker exception by {@link QueryInterruptedException} to report the "Resource limit exceeded" - * error code. + * This is a {@link BadQueryException} because it will likely a user's misbehavior when this exception is thrown. + * Druid cluster operators set some set of resource limitations for some good reason. When a user query requires + * too many resources, it will likely mean that the user query should be modified to not use excessive resources. Review comment: @a2l007 oh, you are correct about the exception type in `checkTotalBytesLimit()`. My point was that this exception doesn't seem to be propagated to `JsonParserIterator` anyway because [`NettyHttpClient` doesn't do it](https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/java/util/http/client/NettyHttpClient.java#L263). I think `NettyHttpClient` should call `retVal.setException()` instead of `retVal.set(null)`, but I haven't studied the surrounding code enough to figure out why we are doing this. Anyway, it doesn't look harmful to change the exception type in this PR, so I changed it :slightly_smiling_face: @jon-wei :+1: Updated the javadoc.
[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions
clintropolis commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r556267351 ## File path: docs/misc/math-expr.md ## @@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function. |acos|acos(x) would return the arc cosine of x| |asin|asin(x) would return the arc sine of x| |atan|atan(x) would return the arc tangent of x| +|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation| +|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation| +|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or the copy bits of a double value into a long if the input is a double.| Review comment: btw, testing `bitwiseConvertLongBitsToDouble(bitwiseConvertLongBitsToDouble(..))` uncovered a parser issue when parsing the output of `Expr.stringify` (unit tests cover this round-trip scenario, and when flatten is true the expression is turned into a constant): large doubles with exponents, e.g. `1E10`, could not be parsed correctly. So I expanded the grammar to allow them, [roughly according to these rules](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#valueOf-java.lang.String-)
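For reference, the bit-level conversion the doc rows above describe is, in the JDK, provided by `Double.doubleToLongBits` and `Double.longBitsToDouble`. This standalone sketch (not Druid code) shows the lossless round trip, plus the exponent-literal form that the expression grammar was expanded to accept:

```java
// Demonstrates the IEEE 754 bit copy described in the math-expr docs above,
// using the JDK primitives directly. Illustrative only, not Druid's code.
public class DoubleBitsDemo {
  public static void main(String[] args) {
    double x = 1E10;                              // exponent literal, the parsing gap mentioned above
    long bits = Double.doubleToLongBits(x);       // copy the double's bits into a long
    double back = Double.longBitsToDouble(bits);  // copy them back into a double
    System.out.println(back == x);                // the round trip is lossless
    System.out.println(Double.valueOf("1E10"));   // parses per the Double.valueOf grammar linked above
  }
}
```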
[GitHub] [druid] jihoonson commented on a change in pull request #10743: fix limited queries with subtotals
jihoonson commented on a change in pull request #10743: URL: https://github.com/apache/druid/pull/10743#discussion_r556259073 ## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ByteBufferMinMaxOffsetHeap.java ## @@ -43,6 +43,7 @@ private final LimitedBufferHashGrouper.BufferGrouperOffsetHeapIndexUpdater heapIndexUpdater; private int heapSize; + private int maxHeapSize; Review comment: nit: this name sounds like a maximum limit on the heap size, while it actually remembers the maximum value the heap size has reached. Maybe some javadoc would help.
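One way to resolve the naming nit is javadoc along these lines, sketched here as a minimal stand-in (not the actual `ByteBufferMinMaxOffsetHeap` code):

```java
// Minimal stand-in illustrating the "high-water mark, not a limit" javadoc
// suggested in the review comment above. Not Druid's actual heap code.
public class HeapSizeTracker {
  private int heapSize;

  /**
   * The largest value {@code heapSize} has reached so far.
   * Note: this is a high-water mark, NOT a limit on the heap size.
   */
  private int maxHeapSize;

  public void add() {
    heapSize++;
    maxHeapSize = Math.max(maxHeapSize, heapSize); // update the high-water mark
  }

  public void remove() {
    heapSize--; // shrinking does not lower the recorded maximum
  }

  public int getMaxHeapSize() {
    return maxHeapSize;
  }
}
```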
[GitHub] [druid] jon-wei commented on a change in pull request #10746: Tidy up HTTP status codes for query errors
jon-wei commented on a change in pull request #10746: URL: https://github.com/apache/druid/pull/10746#discussion_r556245770 ## File path: processing/src/main/java/org/apache/druid/query/QueryUnsupportedException.java ## @@ -24,21 +24,20 @@ import com.fasterxml.jackson.annotation.JsonProperty; import javax.annotation.Nullable; -import java.net.InetAddress; /** * This exception is for the query engine to surface when a query cannot be run. This can be due to the * following reasons: 1) The query is not supported yet. 2) The query is not something Druid would ever supports. * For these cases, the exact causes and details should also be documented in Druid user facing documents. * - * As a {@link QueryException} it is expected to be serialied to a json response, but will be mapped to - * {@link #STATUS_CODE} instead of the default HTTP 500 status. + * As a {@link QueryException} it is expected to be serialied to a json response with a proper HTTP error code Review comment: serialied -> serialized ## File path: processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java ## @@ -19,18 +19,33 @@ package org.apache.druid.query; +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; import org.apache.druid.java.util.common.StringUtils; /** * Exception indicating that an operation failed because it exceeded some configured resource limit. * - * This is used as a marker exception by {@link QueryInterruptedException} to report the "Resource limit exceeded" - * error code. + * This is a {@link BadQueryException} because it will likely a user's misbehavior when this exception is thrown. + * Druid cluster operators set some set of resource limitations for some good reason. When a user query requires + * too many resources, it will likely mean that the user query should be modified to not use excessive resources. 
Review comment: Suggest rewording to something like: "This is a {@link BadQueryException} because it likely indicates a user's misbehavior when this exception is thrown. The resource limitations set by Druid cluster operators are typically less flexible than the parameters of a user query, so when a user query requires too many resources, the likely remedy is that the user query should be modified to use fewer resources, or to reduce query volume."
[GitHub] [druid] a2l007 commented on a change in pull request #10746: Tidy up HTTP status codes for query errors
a2l007 commented on a change in pull request #10746: URL: https://github.com/apache/druid/pull/10746#discussion_r556218910 ## File path: processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java ## @@ -19,18 +19,33 @@ package org.apache.druid.query; +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; import org.apache.druid.java.util.common.StringUtils; /** * Exception indicating that an operation failed because it exceeded some configured resource limit. * - * This is used as a marker exception by {@link QueryInterruptedException} to report the "Resource limit exceeded" - * error code. + * This is a {@link BadQueryException} because it will likely a user's misbehavior when this exception is thrown. + * Druid cluster operators set some set of resource limitations for some good reason. When a user query requires + * too many resources, it will likely mean that the user query should be modified to not use excessive resources. Review comment: If we change this, the exception type and code changes from RE to ResourceLimitExceeded for queries exceeding maxScatterGatherBytes. What are the side effects that could happen? 
Sorry if I'm missing something :) ## File path: server/src/main/java/org/apache/druid/server/QueryResource.java ## @@ -463,36 +481,36 @@ Response ok(Object object) throws IOException Response gotError(Exception e) throws IOException { - return Response.serverError() - .type(contentType) - .entity( - newOutputWriter(null, null, false) - .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e)) - ) - .build(); + return buildNonOkResponse( + Status.INTERNAL_SERVER_ERROR.getStatusCode(), + QueryInterruptedException.wrapIfNeeded(e) + ); } Response gotTimeout(QueryTimeoutException e) throws IOException { - return Response.status(QueryTimeoutException.STATUS_CODE) - .type(contentType) - .entity( - newOutputWriter(null, null, false) - .writeValueAsBytes(e) - ) - .build(); + return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e); } Response gotLimited(QueryCapacityExceededException e) throws IOException { - return Response.status(QueryCapacityExceededException.STATUS_CODE) - .entity(newOutputWriter(null, null, false).writeValueAsBytes(e)) - .build(); + return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e); } Response gotUnsupported(QueryUnsupportedException e) throws IOException { - return Response.status(QueryUnsupportedException.STATUS_CODE) + return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e); +} + +Response gotBadQuery(BadQueryException e) throws IOException +{ + return buildNonOkResponse(BadQueryException.STATUS_CODE, e); Review comment: Thanks, it works fine now. I agree that we can avoid deeper tinkering with QueryException before the release and it can be picked up post release.
[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type
jihoonson commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r556211781 ## File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java ## @@ -28,9 +29,17 @@ */ public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig { + @JsonProperty + private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName(); Review comment: Changed the default back to `BasicMonitorScheduler`.
[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type
jihoonson commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r556211566 ## File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java ## @@ -28,9 +29,17 @@ */ public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig { + @JsonProperty + private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName(); Review comment: > So, because the potential reward has a small impact, and the potential risk has a large impact, I think it's best to default to the old scheduler for another release or so. Just until such time as people have been able to do long-running tests in production and have found that there are no issues. This makes sense to me. I think we can do more extensive testing by ourselves instead of rushing to change the default.
[GitHub] [druid] jihoonson commented on a change in pull request #10746: Tidy up HTTP status codes for query errors
jihoonson commented on a change in pull request #10746: URL: https://github.com/apache/druid/pull/10746#discussion_r556202316 ## File path: server/src/main/java/org/apache/druid/server/QueryResource.java ## @@ -463,36 +481,36 @@ Response ok(Object object) throws IOException Response gotError(Exception e) throws IOException { - return Response.serverError() - .type(contentType) - .entity( - newOutputWriter(null, null, false) - .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e)) - ) - .build(); + return buildNonOkResponse( + Status.INTERNAL_SERVER_ERROR.getStatusCode(), + QueryInterruptedException.wrapIfNeeded(e) + ); } Response gotTimeout(QueryTimeoutException e) throws IOException { - return Response.status(QueryTimeoutException.STATUS_CODE) - .type(contentType) - .entity( - newOutputWriter(null, null, false) - .writeValueAsBytes(e) - ) - .build(); + return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e); } Response gotLimited(QueryCapacityExceededException e) throws IOException { - return Response.status(QueryCapacityExceededException.STATUS_CODE) - .entity(newOutputWriter(null, null, false).writeValueAsBytes(e)) - .build(); + return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e); } Response gotUnsupported(QueryUnsupportedException e) throws IOException { - return Response.status(QueryUnsupportedException.STATUS_CODE) + return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e); +} + +Response gotBadQuery(BadQueryException e) throws IOException +{ + return buildNonOkResponse(BadQueryException.STATUS_CODE, e); Review comment: Thanks for testing! I thought I fixed all cases where exceptions are wrapped in `QueryInterruptedException`, but missed `JsonParserIterator`. My intention was to fix it in this PR, so I went ahead and fixed it. It turns out there is an issue where the type of the query exception is lost after JSON deserialization.
I ended up with [this ugly switch clause](https://github.com/apache/druid/pull/10746/files#diff-28a786f3719500e89592ff9617a926736337b723e4245c410601669db0f29ee5R207-R259) to restore the exception type. A better approach would be modifying `QueryException` to store the type using Jackson, but I wasn't 100% sure what the side effects would be there either. The ugly switch clause seems at least safer than modifying `QueryException`. I think we can rework the serde system of `QueryException` after the release. What do you think? ## File path: processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java ## @@ -19,18 +19,33 @@ package org.apache.druid.query; +import com.fasterxml.jackson.annotation.JsonCreator; +import com.fasterxml.jackson.annotation.JsonProperty; import org.apache.druid.java.util.common.StringUtils; /** * Exception indicating that an operation failed because it exceeded some configured resource limit. * - * This is used as a marker exception by {@link QueryInterruptedException} to report the "Resource limit exceeded" - * error code. + * This is a {@link BadQueryException} because it will likely a user's misbehavior when this exception is thrown. + * Druid cluster operators set some set of resource limitations for some good reason. When a user query requires + * too many resources, it will likely mean that the user query should be modified to not use excessive resources. Review comment: Good catch :slightly_smiling_face: It should be, but I wasn't 100% sure whether it could cause any side effects if I did it. I left a comment about it [here](https://github.com/apache/druid/pull/10746/files#diff-28a786f3719500e89592ff9617a926736337b723e4245c410601669db0f29ee5R171). I will also make an issue to track it.
## File path: processing/src/main/java/org/apache/druid/query/BadQueryException.java ## @@ -20,12 +20,16 @@ package org.apache.druid.query; /** - * This exception is thrown when the requested operation cannot be completed due to a lack of available resources. + * An abstract class for all query exceptions that should return a bad request status code (404). Review comment: Oops, yes it should be.
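The "restore the exception type from its error code" idea discussed above can be sketched roughly as follows; the error-code strings and the stand-in exception classes here are hypothetical placeholders, not Druid's actual constants or types:

```java
// Hedged sketch of re-materializing a concrete exception type after JSON
// deserialization, when only the error-code string survives the wire format.
// The code strings and stand-in exception classes are illustrative only.
public class QueryErrorRestorer {
  public static RuntimeException restore(String errorCode, String message) {
    if (errorCode == null) {
      return new RuntimeException(message); // no code at all: generic fallback
    }
    switch (errorCode) {
      case "Query timeout":                 // hypothetical code string
        return new IllegalStateException("timeout: " + message);
      case "Resource limit exceeded":       // hypothetical code string
        return new IllegalArgumentException("limit: " + message);
      default:
        return new RuntimeException(message); // unknown codes fall back too
    }
  }
}
```

Storing the type with Jackson polymorphic serde would avoid the switch entirely, which is the rework deferred to after the release.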
[GitHub] [druid] techdocsmith commented on a change in pull request #10748: reword for clarity
techdocsmith commented on a change in pull request #10748: URL: https://github.com/apache/druid/pull/10748#discussion_r556188376 ## File path: docs/ingestion/native-batch.md ## @@ -192,7 +192,7 @@ that range if there's some stray data with unexpected timestamps. ||---|---|-| |type|The task type, this should always be `index_parallel`.|none|yes| |inputFormat|[`inputFormat`](./data-formats.md#input-format) to specify how to parse input data.|none|yes| -|appendToExisting|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. The current limitation is that you can append to any datasources regardless of their original partitioning scheme, but the appended segments should be partitioned using the `dynamic` partitionsSpec.|false|no| +|appendToExisting|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. This means that you can append new segments to any datasource regardless of its original partitioning scheme. However, you must use the `dynamic` partitioning type for the appended segments.|false|no| Review comment: I am still a noob, but I tried setting the partitionsSpec to `hashed` and appendToExisting to `true`. The task succeeded, but when I check the payload it looks like appendToExisting was set to `false`:
```
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "http",
    "uris": [
      "https://static.imply.io/data/wikipedia.json.gz"
    ],
    "httpAuthenticationUsername": null,
    "httpAuthenticationPassword": null
  },
  "inputFormat": {
    "type": "json",
    "flattenSpec": {
      "useFieldDiscovery": true,
      "fields": []
    },
    "featureSpec": {}
  },
  "appendToExisting": false
```
I'll need to try again and follow up with someone with deeper insight to verify how it is supposed to work in this case.
[GitHub] [druid] jihoonson commented on pull request #10614: re-use expression vector evaluation results for the same offset in expression vector selectors
jihoonson commented on pull request #10614: URL: https://github.com/apache/druid/pull/10614#issuecomment-759125756 LGTM. Thanks @clintropolis.
[druid] branch master updated (aecc9e5 -> 2fc2938)
This is an automated email from the ASF dual-hosted git repository. suneet pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.
from aecc9e5 Remove legacy code from LogUsedSegments duty (#10287)
add 2fc2938 Web console: fix bad results if there is not native bigint (#10741)
No new revisions were added by this update. Summary of changes:
licenses.yaml                          | 34 ++
web-console/package-lock.json          | 14 +-
web-console/package.json               |  2 +-
web-console/src/singletons/api.spec.ts | 28
4 files changed, 40 insertions(+), 38 deletions(-)
[GitHub] [druid] suneet-s commented on pull request #10741: Web console: fix bad results if there is not native bigint
suneet-s commented on pull request #10741: URL: https://github.com/apache/druid/pull/10741#issuecomment-759124141 nice test!
[GitHub] [druid] suneet-s merged pull request #10741: Web console: fix bad results if there is not native bigint
suneet-s merged pull request #10741: URL: https://github.com/apache/druid/pull/10741
[GitHub] [druid] gianm commented on a change in pull request #10748: reword for clarity
gianm commented on a change in pull request #10748: URL: https://github.com/apache/druid/pull/10748#discussion_r556167558 ## File path: docs/ingestion/native-batch.md ## @@ -192,7 +192,7 @@ that range if there's some stray data with unexpected timestamps. ||---|---|-| |type|The task type, this should always be `index_parallel`.|none|yes| |inputFormat|[`inputFormat`](./data-formats.md#input-format) to specify how to parse input data.|none|yes| -|appendToExisting|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. The current limitation is that you can append to any datasources regardless of their original partitioning scheme, but the appended segments should be partitioned using the `dynamic` partitionsSpec.|false|no| +|appendToExisting|Creates segments as additional shards of the latest version, effectively appending to the segment set instead of replacing it. This means that you can append new segments to any datasource regardless of its original partitioning scheme. However, you must use the `dynamic` partitioning type for the appended segments .|false|no| Review comment: The doc should address what happens if you disregard this advice and set appendToExisting: true but don't use dynamic partitioning. (Do you get an error? Does something weird happen?)
[GitHub] [druid] techdocsmith opened a new pull request #10748: reword for clarity
techdocsmith opened a new pull request #10748: URL: https://github.com/apache/druid/pull/10748 Fixes awkward wording around the behavior of `appendToExisting`. ### Description Removes ambiguity around the "current limitation". Describes the option in terms of the current product functionality. This PR has: - [x] been self-reviewed.
[druid] branch master updated (ca32652 -> aecc9e5)
jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.
from ca32652 Fix potential deadlock in batch ingestion (#10736)
add aecc9e5 Remove legacy code from LogUsedSegments duty (#10287)
No new revisions were added by this update. Summary of changes:
.../apache/druid/server/coordinator/DruidCoordinator.java | 2 +-
.../druid/server/coordinator/duty/LogUsedSegments.java    | 12 ++--
2 files changed, 3 insertions(+), 11 deletions(-)
[GitHub] [druid] jihoonson merged pull request #10287: Remove legacy code from LogUsedSegments duty
jihoonson merged pull request #10287: URL: https://github.com/apache/druid/pull/10287
[druid] branch master updated (3984457 -> ca32652)
This is an automated email from the ASF dual-hosted git repository. jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

from 3984457 Add missing unit tests for segment loading in historicals (#10737)
add ca32652 Fix potential deadlock in batch ingestion (#10736)

No new revisions were added by this update.

Summary of changes:
 .../common/task/AbstractBatchIndexTask.java        |  12 +-
 .../druid/indexing/common/task/TaskLockHelper.java |  11 -
 .../druid/indexing/overlord/TaskLockbox.java       |  29 ++-
 .../apache/druid/indexing/overlord/TaskQueue.java  | 134 +-
 .../indexing/common/task/IngestionTestBase.java    |  16 ++
 .../druid/indexing/overlord/TaskQueueTest.java     | 278 +
 6 files changed, 397 insertions(+), 83 deletions(-)
 create mode 100644 indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskQueueTest.java
[GitHub] [druid] jihoonson commented on pull request #10736: Fix potential deadlock in batch ingestion
jihoonson commented on pull request #10736: URL: https://github.com/apache/druid/pull/10736#issuecomment-758968538

@loquisgon @maytasm thanks for the review!
[GitHub] [druid] jihoonson merged pull request #10736: Fix potential deadlock in batch ingestion
jihoonson merged pull request #10736: URL: https://github.com/apache/druid/pull/10736
[GitHub] [druid] a2l007 commented on a change in pull request #10746: Tidy up HTTP status codes for query errors
a2l007 commented on a change in pull request #10746: URL: https://github.com/apache/druid/pull/10746#discussion_r556043383

## File path: processing/src/main/java/org/apache/druid/query/BadQueryException.java

@@ -20,12 +20,16 @@ package org.apache.druid.query;
 /**
- * This exception is thrown when the requested operation cannot be completed due to a lack of available resources.
+ * An abstract class for all query exceptions that should return a bad request status code (404).

Review comment: Shouldn't it be 400 here?

## File path: processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java

@@ -19,18 +19,33 @@ package org.apache.druid.query;
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.druid.java.util.common.StringUtils;
 /**
  * Exception indicating that an operation failed because it exceeded some configured resource limit.
  *
- * This is used as a marker exception by {@link QueryInterruptedException} to report the "Resource limit exceeded"
- * error code.
+ * This is a {@link BadQueryException} because it will likely a user's misbehavior when this exception is thrown.
+ * Druid cluster operators set some set of resource limitations for some good reason. When a user query requires
+ * too many resources, it will likely mean that the user query should be modified to not use excessive resources.

Review comment: Based on the description, shouldn't [this exception](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/client/DirectDruidClient.java#L447) also be categorized as ResourceLimitExceeded?

## File path: server/src/main/java/org/apache/druid/server/QueryResource.java

@@ -463,36 +481,36 @@ Response ok(Object object) throws IOException
 Response gotError(Exception e) throws IOException
 {
-  return Response.serverError()
-                 .type(contentType)
-                 .entity(
-                     newOutputWriter(null, null, false)
-                         .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e))
-                 )
-                 .build();
+  return buildNonOkResponse(
+      Status.INTERNAL_SERVER_ERROR.getStatusCode(),
+      QueryInterruptedException.wrapIfNeeded(e)
+  );
 }

 Response gotTimeout(QueryTimeoutException e) throws IOException
 {
-  return Response.status(QueryTimeoutException.STATUS_CODE)
-                 .type(contentType)
-                 .entity(
-                     newOutputWriter(null, null, false)
-                         .writeValueAsBytes(e)
-                 )
-                 .build();
+  return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e);
 }

 Response gotLimited(QueryCapacityExceededException e) throws IOException
 {
-  return Response.status(QueryCapacityExceededException.STATUS_CODE)
-                 .entity(newOutputWriter(null, null, false).writeValueAsBytes(e))
-                 .build();
+  return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e);
 }

 Response gotUnsupported(QueryUnsupportedException e) throws IOException
 {
-  return Response.status(QueryUnsupportedException.STATUS_CODE)
+  return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e);
+}
+
+Response gotBadQuery(BadQueryException e) throws IOException
+{
+  return buildNonOkResponse(BadQueryException.STATUS_CODE, e);

Review comment: I was curious whether this would catch ResourceLimitExceeded exceptions wrapped under `QueryInterruptedException` and so I ran the PR on my test cluster. It looks like such tests are still giving us 500 instead of 400. Related stack trace:

```
HTTP/1.1 500 Internal Server Error
< Content-Type: application/json
< Content-Length: 372

{
  "error" : "Resource limit exceeded",
  "trace": org.apache.druid.query.QueryInterruptedException: Not enough aggregation buffer space to execute this query.
Try increasing druid.processing.buffer.sizeBytes or enable disk spilling by setting druid.query.groupBy.maxOnDiskStorage to a positive number.
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_231]
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_231]
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_231]
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423) ~[?:1.8.0_231]
    at com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124) ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
    at
```
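The 500-instead-of-400 behavior observed above follows naturally from type-based dispatch: the status code is chosen from the concrete exception type, so once a `ResourceLimitExceededException` has been wrapped in a generic `QueryInterruptedException` before reaching the response handler, an `instanceof` check against the outer exception no longer matches. A minimal sketch of the pitfall, using hypothetical stand-in classes rather than Druid's real ones:

```java
// Hypothetical stand-ins for the exception hierarchy under discussion.
class QueryInterruptedException extends RuntimeException {
  QueryInterruptedException(Throwable cause) { super(cause); }
}
class BadQueryException extends RuntimeException {}
class ResourceLimitExceededException extends BadQueryException {}

public class StatusCodeSketch {
  // Mirrors the dispatch pattern: pick a status from the exception's type.
  static int statusFor(Exception e) {
    if (e instanceof BadQueryException) {
      return 400;   // the gotBadQuery(...) path
    }
    return 500;     // the gotError(...) fallback
  }

  public static void main(String[] args) {
    Exception raw = new ResourceLimitExceededException();
    Exception wrapped = new QueryInterruptedException(raw);
    System.out.println(statusFor(raw));      // matched directly -> 400
    System.out.println(statusFor(wrapped));  // wrapper hides the cause -> 500
    // One possible fix (an assumption, not what the PR does): unwrap first.
    System.out.println(statusFor((Exception) wrapped.getCause()));
  }
}
```

The same shape explains the test-cluster result: if a data node wraps the resource error before the broker classifies it, only the wrapper's type is visible at classification time.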
[GitHub] [druid] suneet-s commented on a change in pull request #10732: Add a config for monitorScheduler type
suneet-s commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r556068500

## File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java

@@ -28,9 +29,17 @@ */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment: I've also thought about this a bunch and have changed my opinion a few times on whether or not we should change the scheduler to a new dependency by default. While changing to use the CronScheduler might fix a bug, it isn't clear whether any users have run into this in the field. I thought about documenting why a user would want to change the scheduler to CronScheduler instead of the older implementation, and I couldn't think of a good user facing reason to do so. So if we set the default to the old implementation, I don't think anyone would test it in production, so it would continue to live as dead code, and we'll have the same dilemma in the next release or 2 when we ask whether or not this has been run in production.

Setting the default to the older implementation reduces the impact of any bug that might show up in long running tests (even though this library was specifically built to fix issues found with long running processes). The drawback here is finding a reason for some users to try this in production so that we can sunset the feature flag in a release or 2.

Writing out this comment, I now think the more cautious approach - keeping the default the same - is better as it's hard to articulate the benefit for switching the scheduling and taking on the risk associated with changing the older behavior.
[GitHub] [druid] gianm commented on a change in pull request #10732: Add a config for monitorScheduler type
gianm commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r556054132

## File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java

@@ -28,9 +29,17 @@ */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment: By the way, if anyone has been running the patch in production for a while already, now would be a good time to speak up. If we have already built up a good amount of confidence then I think it makes sense to default to the new one.
[GitHub] [druid] gianm commented on a change in pull request #10732: Add a config for monitorScheduler type
gianm commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r556053575

## File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java

@@ -28,9 +29,17 @@ */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment: My 2¢: the best plan is to default to the old one, and then in the future (after some people have enabled the new one in production) we should switch to the new one, and remove the old one and remove the config entirely.

Rationale: The new scheduler is designed to eliminate potential clock drift for monitors. This reward is real but is pretty small impact. I don't expect anything bad will happen if the schedule drifts a bit. The main risk of the new scheduler, I suppose, is that there's some case where it goes haywire, and either locks up completely or fires much more often than it should. I'm not sure how likely this is, but it's (a) hard to test for, (b) quite bad if it happens.

So, because the potential reward has a small impact, and the potential risk has a large impact, I think it's best to default to the old scheduler for another release or so. Just until such time as people have been able to do long-running tests in production and have found that there are no issues.

At any rate, it's good that this is undocumented, since it's an inside-baseball sort of config that we would only want to exist for a few releases.
[GitHub] [druid] suneet-s commented on a change in pull request #10743: fix limited queries with subtotals
suneet-s commented on a change in pull request #10743: URL: https://github.com/apache/druid/pull/10743#discussion_r556004791

## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java

@@ -16281,6 +16281,71 @@ public void testTimeStampAddConversion() throws Exception
   );
 }

+  @Test
+  public void testGroupingSetsWithLimit() throws Exception

Review comment: Skimming through the code - I think we should add some tests to test the iterators built for a query with an `ORDER BY` on dimensions that are part of the grouping sets (like `dim2`) and for something that isn't (like `SUM(cnt)` or another dimension)
[GitHub] [druid-website-src] druid-matt opened a new pull request #200: remove 1 meetup, add 2
druid-matt opened a new pull request #200: URL: https://github.com/apache/druid-website-src/pull/200
[GitHub] [druid] maytasm commented on pull request #10740: Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
maytasm commented on pull request #10740: URL: https://github.com/apache/druid/pull/10740#issuecomment-758863964

> ```
> [ERROR] Failures:
> [ERROR] org.apache.druid.segment.realtime.appenderator.AppenderatorTest.testMaxBytesInMemory(org.apache.druid.segment.realtime.appenderator.AppenderatorTest)
> [ERROR] Run 1: AppenderatorTest.testMaxBytesInMemory:204 expected:<0> but was:<364>
> ```
>
> Test failure looks legit

fixed.
[GitHub] [druid] loquisgon commented on a change in pull request #10742: Granularity interval materialization
loquisgon commented on a change in pull request #10742: URL: https://github.com/apache/druid/pull/10742#discussion_r555991064

## File path: server/src/main/java/org/apache/druid/segment/indexing/granularity/GranularitySpec.java

@@ -45,7 +46,7 @@
 *
 * @return set of all time groups

Review comment: done

## File path: server/src/main/java/org/apache/druid/segment/indexing/granularity/UniformGranularitySpec.java

@@ -57,18 +60,17 @@ public UniformGranularitySpec(
   this.segmentGranularity = segmentGranularity == null ? DEFAULT_SEGMENT_GRANULARITY : segmentGranularity;
   if (inputIntervals != null) {
-    List granularIntervals = new ArrayList<>();
-    for (Interval inputInterval : inputIntervals) {
-      Iterables.addAll(granularIntervals, this.segmentGranularity.getIterable(inputInterval));
-    }
-    this.inputIntervals = ImmutableList.copyOf(inputIntervals);
-    this.wrappedSpec = new ArbitraryGranularitySpec(queryGranularity, rollup, granularIntervals);
+    // sort them

Review comment: done
[GitHub] [druid] suneet-s commented on pull request #10740: Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead
suneet-s commented on pull request #10740: URL: https://github.com/apache/druid/pull/10740#issuecomment-758803244

```
[ERROR] Failures:
[ERROR] org.apache.druid.segment.realtime.appenderator.AppenderatorTest.testMaxBytesInMemory(org.apache.druid.segment.realtime.appenderator.AppenderatorTest)
[ERROR] Run 1: AppenderatorTest.testMaxBytesInMemory:204 expected:<0> but was:<364>
```

Test failure looks legit
[GitHub] [druid] loquisgon commented on a change in pull request #10742: Granularity interval materialization
loquisgon commented on a change in pull request #10742: URL: https://github.com/apache/druid/pull/10742#discussion_r555931298

## File path: server/src/main/java/org/apache/druid/segment/indexing/granularity/UniformGranularitySpec.java

@@ -57,18 +60,17 @@ public UniformGranularitySpec(
   this.segmentGranularity = segmentGranularity == null ? DEFAULT_SEGMENT_GRANULARITY : segmentGranularity;
   if (inputIntervals != null) {
-    List granularIntervals = new ArrayList<>();
-    for (Interval inputInterval : inputIntervals) {
-      Iterables.addAll(granularIntervals, this.segmentGranularity.getIterable(inputInterval));
-    }
-    this.inputIntervals = ImmutableList.copyOf(inputIntervals);
-    this.wrappedSpec = new ArbitraryGranularitySpec(queryGranularity, rollup, granularIntervals);
+    // sort them

Review comment: Contract indicates that inputIntervals must be left "as is"; will remove sorting in next commit.
[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
nishantmonu51 commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r555825958

## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java

@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
     throw new IAE("Cannot use firehose and inputFormat together. Try using inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+    inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment: isReady is also called multiple times, so not sure if that is any better in terms of the number of checks. Just to double check: if I move it to isReady, the task will be accepted by the overlord and will fail only when the task is actually tried to be run. I was thinking that if someone is not allowed, we should block the ingestion at the time the task is first submitted. Is that the behavior you are expecting?
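The "number of checks" concern from repeated `isReady()` calls can be addressed by remembering whether validation already passed, as suggested elsewhere in this thread. A rough sketch of that idea, with hypothetical names loosely modeled on the PR rather than its actual code:

```java
// Sketch: validate on the first call, no-op afterwards. Deliberately not
// thread-safe here; the real task lifecycle would decide whether that matters.
abstract class InputSourceSketch {
  private transient boolean validated = false;

  public final void validateAllowDenyPrefixList(Object securityConfig) {
    if (validated) {
      return;                     // isReady() may run many times; check once
    }
    doValidate(securityConfig);   // throws if a URI hits the deny list
    validated = true;             // only reached when validation passed
  }

  protected abstract void doValidate(Object securityConfig);
}

public class ValidateOnceDemo extends InputSourceSketch {
  int calls = 0;

  @Override
  protected void doValidate(Object securityConfig) {
    calls++;
  }

  public static void main(String[] args) {
    ValidateOnceDemo source = new ValidateOnceDemo();
    source.validateAllowDenyPrefixList(null);
    source.validateAllowDenyPrefixList(null);  // skipped: already validated
    System.out.println(source.calls);          // prints 1
  }
}
```

Note the flag only memoizes success: a failing validation throws before `validated` is set, so a rejected task keeps failing on every retry rather than being accidentally whitelisted.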
[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
nishantmonu51 commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r555815427

## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java

@@ -153,6 +154,9 @@
   @JsonIgnore
   private final RetryPolicyFactory retryPolicyFactory;
+  @JsonIgnore
+  private final InputSourceSecurityConfig securityConfig;

Review comment: done.
[GitHub] [druid] nishantmonu51 commented on pull request #10677: Add URI based allowPrefexList and denyPrefixList
nishantmonu51 commented on pull request #10677: URL: https://github.com/apache/druid/pull/10677#issuecomment-758695886

@jihoonson thanks for the review, I have updated the PR. Yes, it will be nice to get this into the 0.21.0 release.
[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
nishantmonu51 commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r555809503

## File path: extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java

@@ -214,10 +215,23 @@ public boolean needsFormat()
 private void cachePathsIfNeeded() throws IOException
 {
   if (cachedPaths == null) {
-    cachedPaths = ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths"));
+    Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths");
+    cachedPaths = ImmutableList.copyOf(paths);
   }
 }

+@Override
+public void validateAllowDenyPrefixList(InputSourceSecurityConfig securityConfig)
+{
+  try {
+    cachePathsIfNeeded();

Review comment: done.
[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
nishantmonu51 commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r555808142

## File path: core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java

@@ -31,13 +31,16 @@ */
 public abstract class AbstractInputSource implements InputSource
 {
+  private transient boolean validated = false;

Review comment: equals tests on the base classes were failing without making it transient, as they were not including this field in the comparison.
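The point about `transient` and equals can be illustrated with a toy class: runtime bookkeeping like `validated` is not part of the source's configuration, so `equals`/`hashCode` compare only the configured fields, and equality-contract test tools (which conventionally skip `transient` fields) then pass. A hypothetical sketch, not the PR's code:

```java
import java.util.Objects;

// Toy input source: `uri` is configuration, `validated` is runtime state.
final class UriSourceSketch {
  private final String uri;
  transient boolean validated;   // deliberately excluded from equals/hashCode

  UriSourceSketch(String uri) {
    this.uri = uri;
  }

  @Override
  public boolean equals(Object o) {
    return o instanceof UriSourceSketch && ((UriSourceSketch) o).uri.equals(uri);
  }

  @Override
  public int hashCode() {
    return Objects.hash(uri);
  }
}

public class TransientEqualsDemo {
  public static void main(String[] args) {
    UriSourceSketch a = new UriSourceSketch("s3://bucket/a");
    UriSourceSketch b = new UriSourceSketch("s3://bucket/a");
    b.validated = true;               // differing runtime state...
    System.out.println(a.equals(b));  // ...still equal: prints true
  }
}
```

Strictly, the `transient` keyword only affects Java serialization; here it doubles as a signal (both to readers and to equals-verification tooling) that the field is ephemeral state, not identity.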
[GitHub] [druid] licl2014 opened a new issue #10747: Hyperunique is giving different results on multiple execution of same query
licl2014 opened a new issue #10747: URL: https://github.com/apache/druid/issues/10747

Hyperunique is giving different results on multiple execution of same query

### Affected Version

0.12.3

### Description

My query is as follows:

```
{
  "queryType": "timeseries",
  "dataSource": {
    "type": "table",
    "name": "datasource"
  },
  "intervals": {
    "type": "LegacySegmentSpec",
    "intervals": [
      "2020-12-31T16:00:00.000Z/2021-01-01T16:00:00.000Z"
    ]
  },
  "descending": false,
  "virtualColumns": [],
  "filter": null,
  "granularity": {
    "type": "period",
    "period": "P1D",
    "timeZone": "Asia/Shanghai",
    "origin": null
  },
  "aggregations": [
    {
      "type": "hyperUnique",
      "name": "hll_cd",
      "fieldName": "cd",
      "isInputHyperUnique": false,
      "round": false
    }
  ],
  "postAggregations": [],
  "context": {
    "groupByStrategy": "v2",
    "skipEmptyBuckets": true,
    "timeout": 3
  }
}
```

**Results**

attempt1:

```
[ {
  "timestamp" : "2021-01-01T00:00:00.000+08:00",
  "result" : {
    "hll_cd" : 3.0347768393719055E7
  }
} ]
```

attempt2:

```
[ {
  "timestamp" : "2021-01-01T00:00:00.000+08:00",
  "result" : {
    "hll_cd" : 3.036636100442829E7
  }
} ]
```
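For context on this report: `hyperUnique` is a HyperLogLog-based approximate aggregator, so bit-for-bit identical results across runs are not guaranteed when segments are merged in a different order. Assuming the classic HLL error bound with 2^11 buckets (an assumption about this cluster's sketch size, not something the issue states), the two results above differ by roughly 0.06%, far inside the sketch's expected standard error:

```java
public class HllDriftCheck {
  public static void main(String[] args) {
    double run1 = 3.0347768393719055e7;  // attempt1 from the report above
    double run2 = 3.036636100442829e7;   // attempt2
    double relDiff = Math.abs(run1 - run2) / run1;
    double stdError = 1.04 / Math.sqrt(1 << 11);  // classic HLL bound, 2^11 buckets
    System.out.printf(
        "relative difference = %.3f%%, expected std error = %.2f%%%n",
        relDiff * 100,
        stdError * 100
    );
    // The difference is well within normal HLL variance.
  }
}
```

In other words, a small run-to-run spread like this is expected behavior for an approximate aggregator, not necessarily a correctness bug.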
[GitHub] [druid] fabes99 commented on issue #8663: Kafka indexing service duplicate entry exception in druid_pendingSegments
fabes99 commented on issue #8663: URL: https://github.com/apache/druid/issues/8663#issuecomment-758589129

Facing the same issue also with version 0.18.1.
[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type
jihoonson commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r555600385

## File path: core/src/main/java/org/apache/druid/java/util/metrics/ClockDriftSafeMonitorScheduler.java

@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.java.util.metrics;
+
+import io.timeandspace.cronscheduler.CronScheduler;
+import io.timeandspace.cronscheduler.CronTask;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * A {@link MonitorScheduler} implementation based on {@link CronScheduler}.
+ */
+public class ClockDriftSafeMonitorScheduler extends MonitorScheduler
+{
+  private static final Logger LOG = new Logger(ClockDriftSafeMonitorScheduler.class);
+
+  private final CronScheduler monitorScheduler;
+  private final ExecutorService monitorRunner;
+
+  public ClockDriftSafeMonitorScheduler(
+      MonitorSchedulerConfig config,
+      ServiceEmitter emitter,
+      List monitors,
+      CronScheduler monitorScheduler,
+      ExecutorService monitorRunner
+  )
+  {
+    super(config, emitter, monitors);
+    this.monitorScheduler = monitorScheduler;
+    this.monitorRunner = monitorRunner;
+  }
+
+  @Override
+  void startMonitor(final Monitor monitor)
+  {
+    monitor.start();
+    long rate = getConfig().getEmitterPeriod().getMillis();
+    final AtomicReference> scheduleFutureReference = new AtomicReference<>();
+    Future scheduledFuture = monitorScheduler.scheduleAtFixedRate(
+        rate,
+        rate,
+        TimeUnit.MILLISECONDS,
+        new CronTask()
+        {
+          private Future scheduleFuture = null;

Review comment: Ah, forgot to clean them up before commit. Thanks :+1:
[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type
jihoonson commented on a change in pull request #10732: URL: https://github.com/apache/druid/pull/10732#discussion_r1

## File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java

@@ -28,9 +29,17 @@ */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment: Yes, I have done some testing before and will do more.
[GitHub] [druid] jihoonson commented on pull request #10746: Tidy up HTTP status codes for query errors
jihoonson commented on pull request #10746: URL: https://github.com/apache/druid/pull/10746#issuecomment-758500465

Will update docs as a follow-up.
[GitHub] [druid] jihoonson opened a new pull request #10746: Tidy up HTTP status codes for query errors
jihoonson opened a new pull request #10746: URL: https://github.com/apache/druid/pull/10746

### Description

The Druid query system returns an HTTP status code on query failures. Currently, most query failures return an internal error (500), which can be misleading in some cases. This PR tidies that up so each failure returns a proper status code. Here is a list of query errors, their current status codes, and the proposed changes.

Query planning errors

| Exception | Description | Current code | Proposed new code |
| - | - | - | - |
| SqlParseException and ValidationException from Calcite | Query planning failed | 500 | **400** |
| CannotPlanException from Calcite | Failed to find the best plan. Likely an internal error | 500 | |
| RelConversionException from Calcite | Conversion from SQL to Druid failed | 500 | |
| CannotBuildQueryException | Failed to create a Druid native query from DruidRel. Likely an internal error | 500 | |
| Others | Unknown errors | 500 | |

Query execution errors

| Exception | Description | Current code | Proposed new code |
| - | - | - | - |
| QueryTimeoutException | Query execution didn't finish within the timeout | 504 | |
| Unauthorized exception | Authentication failed to execute the query | 401 | |
| ForbiddenException | Authorization failed to execute the query | 403 | |
| QueryCapacityExceededException | Query failed to schedule because of the lack of resources available at the time the query was submitted | 429 | |
| ResourceLimitExceededException | Query asked for more resources than the configured threshold | 500 | **400** |
| InsufficientResourceException | Query failed to schedule because of a lack of merge buffers available at the time it was submitted | 500 | **429**, merged into QueryCapacityExceededException |
| QueryUnsupportedException | Unsupported functionality | 400 | **501** |
| Others | Unknown exceptions | 500 | |

`BadQueryException` is added to represent all kinds of query exceptions that should return 400.

This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [x] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.

# Key changed/added classes in this PR
* `BadQueryException`
* `ResourceLimitExceededException`

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
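The exception-to-status mapping described in the tables above can be sketched as a simple lookup with a 500 fallback for unknown errors. This is a minimal, self-contained illustration of the proposed mapping, not Druid's actual implementation; the class and method names here are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch of the status-code mapping proposed in this PR.
// Unknown exception types fall back to 500 (internal error).
public class QueryErrorStatus {
    static final Map<String, Integer> STATUS = Map.of(
        "SqlParseException", 400,            // query planning failed
        "ValidationException", 400,          // query planning failed
        "ResourceLimitExceededException", 400,
        "QueryTimeoutException", 504,
        "QueryCapacityExceededException", 429,
        "InsufficientResourceException", 429, // merged into QueryCapacityExceededException
        "QueryUnsupportedException", 501      // unsupported functionality
    );

    static int statusFor(String exceptionName) {
        // Anything not explicitly mapped (CannotPlanException, RelConversionException,
        // CannotBuildQueryException, unknown errors) stays at 500.
        return STATUS.getOrDefault(exceptionName, 500);
    }

    public static void main(String[] args) {
        System.out.println(statusFor("QueryTimeoutException")); // 504
        System.out.println(statusFor("CannotPlanException"));   // 500
    }
}
```

The key design point from the tables is that client mistakes (bad SQL, oversized queries) move to the 4xx range, while genuine server-side failures keep 5xx codes.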
[GitHub] [druid] kaijianding opened a new pull request #10745: can reset kafka Supervisor to offset by specified time
kaijianding opened a new pull request #10745: URL: https://github.com/apache/druid/pull/10745

### Description

Currently, Druid can only reset a supervisor to the earliest or latest offset. Sometimes users want to read from a specified time, such as the start of today.

- A new optional param `timestamp` is added to the API `POST /druid/indexer/v1/supervisor//reset?timestamp=`.
- `Map getPositionFromTime(long offsetTime);` is added to get offsets from a time. Only Kafka supports this feature.
- `OverrideNotice` is added to replace the whole DataSourceMetadata in the metastore and kill tasks, so that tasks restart and read starting from the new DatasourceMetadata (the same logic as ResetNotice).

This PR has:
- [x] been self-reviewed.
- [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [x] added documentation for new or modified features or behaviors.
- [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [x] been tested in a test Druid cluster.

# Key changed/added classes in this PR
* `KafkaRecordSupplier`
* `IndexerSQLMetadataStorageCoordinator`
* `SeekableStreamSupervisor.java`
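The core of `getPositionFromTime` is a per-partition lookup: find the earliest offset whose record timestamp is at or after the requested time (the same semantics Kafka's `Consumer.offsetsForTimes` provides broker-side). Below is a dependency-free sketch of that lookup; the data shapes and names are illustrative, not Druid's or the Kafka client's actual types.

```java
// Sketch of the time-to-offset lookup behind a time-based supervisor reset.
// Each partition is modeled as (timestamp, offset) pairs sorted by timestamp;
// this is an illustration only, not Druid's implementation.
public class PositionFromTime {
    static long positionFromTime(long[][] tsOffsetPairs, long offsetTime) {
        for (long[] pair : tsOffsetPairs) {
            if (pair[0] >= offsetTime) {
                return pair[1]; // first record at or after the requested time
            }
        }
        return -1; // no record at/after that time; caller may fall back to latest
    }

    public static void main(String[] args) {
        // One partition with records at timestamps 1000, 2000, 3500 (ms).
        long[][] partition0 = { {1000L, 0L}, {2000L, 1L}, {3500L, 2L} };
        System.out.println(positionFromTime(partition0, 1500L)); // 1
        System.out.println(positionFromTime(partition0, 4000L)); // -1
    }
}
```

Once the offsets are resolved, the `OverrideNotice` path described above replaces the stored DataSourceMetadata with them and kills the running tasks so replacements start from the new positions.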