[GitHub] [druid] maytasm opened a new pull request #10740: Fix byte calculation for maxBytesInMemory to take into account Sink/Hydrant object overhead
maytasm opened a new pull request #10740: URL: https://github.com/apache/druid/pull/10740 Fix byte calculation for maxBytesInMemory to take into account Sink/Hydrant object overhead

### Description

Fix the byte calculation for maxBytesInMemory to take Sink/Hydrant object overhead into account. Currently, one sink with 1,000 rows and 1,000 sinks with one row each (the same rows overall) produce the same byte calculation, which is incorrect. This can cause OOMs during ingestion when there are many sinks (for example, with a large time range and/or a small segmentGranularity).

This PR has:
- [ ] been self-reviewed.
  - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious to an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
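The accounting problem described above can be sketched as a minimal, hypothetical model (the constant names and values are illustrative assumptions, not Druid's actual ones): each sink and hydrant contributes a fixed object overhead on top of the raw row bytes, so 1,000 sinks with one row each cost more than one sink with 1,000 rows.

```java
// Hypothetical sketch of maxBytesInMemory accounting with per-object overhead.
// The overhead constants are illustrative assumptions, not Druid's real values.
public class SinkMemoryEstimator {
  static final long SINK_OVERHEAD_BYTES = 5_000;
  static final long HYDRANT_OVERHEAD_BYTES = 1_000;

  // Total bytes = raw row bytes + fixed overhead per sink and per hydrant.
  public static long estimate(long rowBytes, int numSinks, int numHydrants) {
    return rowBytes
        + (long) numSinks * SINK_OVERHEAD_BYTES
        + (long) numHydrants * HYDRANT_OVERHEAD_BYTES;
  }

  public static void main(String[] args) {
    long oneSink = estimate(1_000, 1, 1);
    long manySinks = estimate(1_000, 1_000, 1_000);
    // Same row bytes, but many sinks now correctly count as a larger footprint.
    System.out.println(oneSink + " < " + manySinks);
  }
}
```

Under the pre-fix behavior only `rowBytes` would be counted, making the two scenarios indistinguishable and letting sink-heavy workloads overshoot the limit.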
[druid] branch master updated (2837a9b -> 118b501)
This is an automated email from the ASF dual-hosted git repository. xvrl pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from 2837a9b [Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661) add 118b501 Introduce KafkaRecordEntity to support Kafka headers in InputFormats (#10730) No new revisions were added by this update. Summary of changes: .../org/apache/druid/data/input/InputEntity.java | 3 + .../druid/data/input/kafka/KafkaRecordEntity.java | 53 + .../IncrementalPublishingKafkaIndexTaskRunner.java | 13 +- .../druid/indexing/kafka/KafkaIndexTask.java | 5 +- .../druid/indexing/kafka/KafkaRecordSupplier.java | 10 +- .../indexing/kafka/supervisor/KafkaSupervisor.java | 9 +- .../druid/indexing/kafka/KafkaIndexTaskTest.java | 256 ++--- .../indexing/kafka/KafkaRecordSupplierTest.java| 26 ++- .../kafka/supervisor/KafkaSupervisorTest.java | 3 +- .../druid/indexing/kinesis/KinesisIndexTask.java | 5 +- .../indexing/kinesis/KinesisIndexTaskRunner.java | 9 +- .../indexing/kinesis/KinesisRecordSupplier.java| 61 +++-- .../kinesis/supervisor/KinesisSupervisor.java | 9 +- .../indexing/kinesis/KinesisIndexTaskTest.java | 106 + .../kinesis/KinesisRecordSupplierTest.java | 38 +-- .../indexing/kinesis/KinesisSamplerSpecTest.java | 13 +- .../kinesis/supervisor/KinesisSupervisorTest.java | 3 +- .../seekablestream/RecordSupplierInputSource.java | 16 +- .../seekablestream/SeekableStreamIndexTask.java| 11 +- .../SeekableStreamIndexTaskRunner.java | 28 +-- .../seekablestream/SeekableStreamSamplerSpec.java | 6 +- .../indexing/seekablestream/SequenceMetadata.java | 12 +- .../indexing/seekablestream/StreamChunkParser.java | 14 +- .../common/OrderedPartitionableRecord.java | 25 +- .../seekablestream/common/RecordSupplier.java | 5 +- .../supervisor/SeekableStreamSupervisor.java | 21 +- .../overlord/sampler/InputSourceSamplerTest.java | 35 +-- .../RecordSupplierInputSourceTest.java | 11 +- 
.../SeekableStreamIndexTaskTestBase.java | 48 +++- .../seekablestream/StreamChunkParserTest.java | 13 +- .../SeekableStreamSupervisorStateTest.java | 17 +- 31 files changed, 593 insertions(+), 291 deletions(-) create mode 100644 extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/data/input/kafka/KafkaRecordEntity.java
[GitHub] [druid] xvrl merged pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats
xvrl merged pull request #10730: URL: https://github.com/apache/druid/pull/10730
[GitHub] [druid] xvrl commented on pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats
xvrl commented on pull request #10730: URL: https://github.com/apache/druid/pull/10730#issuecomment-757055227 ok I will merge it as is then. Adding specific tests for the trace logging seems a little overkill. We can add it later if we feel strongly about it.
[druid] branch master updated (d519264 -> 2837a9b)
jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from d519264 remove extra comma (#10670) add 2837a9b [Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout` (#10661) No new revisions were added by this update. Summary of changes: docs/configuration/index.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)
[GitHub] [druid] jihoonson merged pull request #10661: [Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout`
jihoonson merged pull request #10661: URL: https://github.com/apache/druid/pull/10661
[GitHub] [druid] jihoonson commented on pull request #10661: [Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout`
jihoonson commented on pull request #10661: URL: https://github.com/apache/druid/pull/10661#issuecomment-757045393 > I guess if we indeed do want to support shutdown without waiting, we might need another config property to toggle [setStopAtShutdown](https://www.eclipse.org/jetty/javadoc/current/org/eclipse/jetty/server/Server.html#setStopAtShutdown(boolean)). But I doubt the need for such a config as `druid.server.http.gracefulShutdownTimeout` satisfies most of the usecases. I agree. The current config seems enough.
[druid] branch master updated (3624acb -> d519264)
jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from 3624acb fix web-console show json bug (#10710) add d519264 remove extra comma (#10670) No new revisions were added by this update. Summary of changes: integration-tests/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)
[GitHub] [druid] jihoonson merged pull request #10670: [Doc-Fix] Remove extra comma in `druid/integration-tests/README.md`
jihoonson merged pull request #10670: URL: https://github.com/apache/druid/pull/10670
[GitHub] [druid] jihoonson commented on pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on pull request #10677: URL: https://github.com/apache/druid/pull/10677#issuecomment-757043116 @nishantmonu51 BTW, since we are about to cut a branch for the 0.21.0 release (scheduled for next Monday), do you think we can get this PR merged before the branch cut? I'm asking because this change would introduce an incompatibility if merged after the release.
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r554239833 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java ## @@ -153,6 +154,9 @@ @JsonIgnore private final RetryPolicyFactory retryPolicyFactory; + @JsonIgnore + private final InputSourceSecurityConfig securityConfig; Review comment: Why not `InputSourceSecurityConfig.ALLOW_ALL`? The system-wide securityConfig is ignored in DruidInputSource anyway. ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -1044,17 +1048,21 @@ public IndexIOConfig( if (firehoseFactory != null && inputFormat != null) { throw new IAE("Cannot use firehose and inputFormat together. Try using inputSource instead of firehose."); } + if (inputSource != null) { +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: I guess you wanted to validate inputSource in the constructor to fail early. However, this could cause some issues, because it will throw an exception on validation failure, which in turn makes creating an ioConfig instance fail. For example, suppose you had an ingestion spec stored in the metadata store. Then, for some reason, you wanted to update the security config to forbid some URIs that are in that ingestion spec. The update would break reading ingestion specs from the metadata store, which happens in the Overlord. It would also unnecessarily perform the same check multiple times during ingestion. I think this should be called somewhere in the code path of ingestion, such as `Task.isReady()`.
## File path: extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java ## @@ -214,10 +215,23 @@ public boolean needsFormat() private void cachePathsIfNeeded() throws IOException { if (cachedPaths == null) { - cachedPaths = ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths")); + Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths"); + cachedPaths = ImmutableList.copyOf(paths); } } + @Override + public void validateAllowDenyPrefixList(InputSourceSecurityConfig securityConfig) + { +try { + cachePathsIfNeeded(); Review comment: I think you probably won't need to cache paths, since this will unnecessarily populate the cache in the Overlord. ## File path: core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java ## @@ -31,13 +31,16 @@ */ public abstract class AbstractInputSource implements InputSource { + private transient boolean validated = false; Review comment: Why `transient`? We don't use `Serializable`.
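To make the allow/deny discussion concrete, here is a minimal, hypothetical sketch of prefix-list validation (the class and method names are illustrative, not Druid's actual API): a URI passes if it matches at least one allowed prefix (or the allow list is empty) and matches no denied prefix, with deny taking precedence.

```java
import java.net.URI;
import java.util.List;

// Illustrative sketch only; names do not reflect Druid's real classes.
public class UriPrefixValidator {
  public static boolean isAllowed(URI uri, List<String> allowPrefixes, List<String> denyPrefixes) {
    String s = uri.toString();
    // Deny wins over allow.
    boolean denied = denyPrefixes.stream().anyMatch(s::startsWith);
    // An empty allow list means "allow everything not denied".
    boolean allowed = allowPrefixes.isEmpty()
        || allowPrefixes.stream().anyMatch(s::startsWith);
    return allowed && !denied;
  }

  public static void main(String[] args) {
    System.out.println(isAllowed(URI.create("s3://bucket/data"), List.of("s3://bucket/"), List.of()));
  }
}
```

Validating lazily at a point like `Task.isReady()`, as suggested above, keeps deserialization of stored specs from failing when the security config changes later.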
[GitHub] [druid] zhangyue19921010 commented on pull request #10688: Historical reloads damaged segments automatically when lazy on start.
zhangyue19921010 commented on pull request #10688: URL: https://github.com/apache/druid/pull/10688#issuecomment-757039352 close and reopen to run ci again
[GitHub] [druid] zhangyue19921010 closed pull request #10688: Historical reloads damaged segments automatically when lazy on start.
zhangyue19921010 closed pull request #10688: URL: https://github.com/apache/druid/pull/10688
[druid] branch master updated (c62b7c1 -> 3624acb)
vogievetsky pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from c62b7c1 javascript filter result convert to java boolean (#10721) add 3624acb fix web-console show json bug (#10710) No new revisions were added by this update. Summary of changes: .../src/components/json-collapse/json-collapse.spec.tsx | 3 ++- web-console/src/components/json-collapse/json-collapse.tsx | 3 ++- web-console/src/components/json-input/json-input.tsx | 5 +++-- web-console/src/components/show-history/show-history.tsx | 3 ++- web-console/src/components/show-json/show-json.tsx | 3 ++- web-console/src/components/show-log/show-log.tsx | 3 ++- web-console/src/components/table-cell/table-cell.tsx | 3 ++- .../src/dialogs/edit-context-dialog/edit-context-dialog.tsx | 3 ++- .../src/dialogs/history-dialog/history-dialog.spec.tsx | 5 +++-- .../src/dialogs/query-plan-dialog/query-plan-dialog.tsx | 11 --- web-console/src/dialogs/snitch-dialog/snitch-dialog.spec.tsx | 5 +++-- web-console/src/dialogs/spec-dialog/spec-dialog.tsx | 5 - web-console/src/utils/local-storage-backed-array.tsx | 4 +++- web-console/src/utils/local-storage-keys.tsx | 4 +++- web-console/src/utils/object-change.spec.ts | 4 +++- web-console/src/utils/query-history.ts | 4 +++- web-console/src/utils/sampler.ts | 4 +++- web-console/src/views/load-data-view/load-data-view.tsx | 9 + .../load-data-view/parse-data-table/parse-data-table.tsx | 3 ++- .../src/views/query-view/query-output/query-output.tsx | 3 ++- web-console/src/views/query-view/query-view.tsx | 5 +++-- web-console/src/views/segments-view/segments-view.tsx | 5 +++-- 22 files changed, 65 insertions(+), 32 deletions(-)
[GitHub] [druid] vogievetsky merged pull request #10710: fix web-console show json bug
vogievetsky merged pull request #10710: URL: https://github.com/apache/druid/pull/10710
[GitHub] [druid] vogievetsky commented on pull request #10710: fix web-console show json bug
vogievetsky commented on pull request #10710: URL: https://github.com/apache/druid/pull/10710#issuecomment-757038204 I think this could be tested by adding a snapshot test for each of the changes to make sure the correct DOM is rendered even when there are bigints in the JSON. That would be a very big undertaking, though, which I think is out of scope for this immediate fix.
[GitHub] [druid] gianm commented on pull request #10635: Add SQL functions to format numbers into human readable format
gianm commented on pull request #10635: URL: https://github.com/apache/druid/pull/10635#issuecomment-757031980 My 2¢ on the naming thing: I like having 3 separate functions, because I think a 1-function model works best if you have a good spec for format strings, which isn't what this patch is about. An example of a good spec is: https://www.postgresql.org/docs/current/functions-formatting.html#FUNCTIONS-FORMATTING-NUMERIC-TABLE If you just have a few different mutually exclusive options, like we do here, I think it's more SQL-y to have different functions. In the future, we might introduce a postgresql-style number formatting spec, but we don't have to do it today. A couple of things, though:
- IMO the name `decimal_format` is misleading. It sounds like a function that will format a number to a specific number of decimals. Maybe instead we could call it `human_readable_decimal_format`. For consistency it may then be nice to call the others `human_readable_binary_byte_format` and `human_readable_decimal_byte_format`. The function names are long, but I think that's probably better than being misleading. Any thoughts @FrankChen021, @abhishekagarwal87, @clintropolis?
- Please include examples in the documentation of what the formatted numbers will look like.

I'm just writing this as a comment instead of a review, since I didn't read the code.
[druid] branch master updated (f66fdbf -> c62b7c1)
jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from f66fdbf add offsetFetchPeriod to kinesis ingestion doc (#10734) add c62b7c1 javascript filter result convert to java boolean (#10721) No new revisions were added by this update. Summary of changes: .../druid/query/filter/JavaScriptDimFilter.java | 13 ++- .../query/filter/JavaScriptDimFilterTest.java | 44 ++ 2 files changed, 55 insertions(+), 2 deletions(-)
[GitHub] [druid] jihoonson closed issue #10719: Javascript filter return true when the expression result is false
jihoonson closed issue #10719: URL: https://github.com/apache/druid/issues/10719
[GitHub] [druid] jihoonson merged pull request #10721: javascript filter result convert to java boolean
jihoonson merged pull request #10721: URL: https://github.com/apache/druid/pull/10721
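The change merged above converts the JavaScript expression's return value into a proper Java boolean instead of relying on object truthiness. As a rough illustration only (the actual Druid patch may delegate this coercion to the script engine itself), JavaScript-style truthiness can be sketched as:

```java
// Sketch of JavaScript-style truthiness coercion to a Java boolean.
// null/undefined, 0, NaN, and "" are falsy; everything else is truthy.
public class JsTruthiness {
  public static boolean toBoolean(Object result) {
    if (result == null) {
      return false;
    }
    if (result instanceof Boolean) {
      return (Boolean) result;
    }
    if (result instanceof Number) {
      double d = ((Number) result).doubleValue();
      return d != 0 && !Double.isNaN(d);
    }
    if (result instanceof String) {
      return !((String) result).isEmpty();
    }
    return true; // arbitrary objects are truthy
  }
}
```

The bug in #10719 amounted to treating any non-null result object as true; explicit coercion like the above makes a script returning `false` or `0` actually filter rows out.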
[GitHub] [druid] gianm commented on a change in pull request #10635: Add SQL functions to format numbers into human readable format
gianm commented on a change in pull request #10635: URL: https://github.com/apache/druid/pull/10635#discussion_r554228178 ## File path: docs/querying/sql.md ## @@ -563,6 +563,9 @@ The [DataSketches extension](../development/extensions-core/datasketches-extensi |`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL nor empty string.| |`NVL(expr,expr-for-null)`|Returns 'expr-for-null' if 'expr' is null (or empty string for string type).| |`BLOOM_FILTER_TEST(, )`|Returns true if the value is contained in a Base64-serialized bloom filter. See the [Bloom filter extension](../development/extensions-core/bloom-filter.html) documentation for additional details.| +|`BINARY_BYTE_FORMAT(value, [precision])`|Returns the value in human-readable [IEC](https://en.wikipedia.org/wiki/Binary_prefix) format. Supported unit suffix: `B`, `KiB`, `MiB`, `GiB`, `TiB`, `PiB`, `EiB`. `precision` must be in the range of [0,3] (default: 2).| +|`DECIMAL_BYTE_FORMAT(value, [precision])`|Returns the value in human-readable [SI](https://en.wikipedia.org/wiki/Binary_prefix) format. Supported unit suffix: `B`, `KB`, `MB`, `GB`, `TB`, `PB`, `EB`. `precision` must be in the range of [0,3] (default: 2).| +|`DECIMAL_FORMAT(value, [precision])`|Returns the value in human-readable SI format. Supported unit suffix: `K`, `M`, `G`, `T`, `P`, `E`. `precision` must be in the range of [0,3] (default: 2).| Review comment: I think they make sense in the numeric section, because they accept numbers. It's OK that they don't return numbers. It's analogous to `TIME_FORMAT`, which still belongs in the time section.
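A minimal sketch of the IEC-style (base-1024) formatting documented in the rows above. This is illustrative only: the method name, rounding, and sign handling here are assumptions, not the PR's actual implementation.

```java
import java.util.Locale;

// Illustrative IEC (base-1024) byte formatter; names and rounding are assumptions.
public class HumanReadableBytes {
  private static final String[] IEC_UNITS = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};

  public static String binaryByteFormat(double value, int precision) {
    if (precision < 0 || precision > 3) {
      throw new IllegalArgumentException("precision must be in [0,3]");
    }
    String sign = value < 0 ? "-" : "";
    double v = Math.abs(value);
    int unit = 0;
    // Divide down by 1024 until the value fits the current unit.
    while (v >= 1024 && unit < IEC_UNITS.length - 1) {
      v /= 1024;
      unit++;
    }
    return sign + String.format(Locale.ROOT, "%." + precision + "f %s", v, IEC_UNITS[unit]);
  }
}
```

A decimal (SI, base-1000) variant would be identical except for the divisor and the `KB`/`MB`/… unit table.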
[GitHub] [druid] capistrant opened a new pull request #10739: Druid Server Guilds and Guild Aware Replication
capistrant opened a new pull request #10739: URL: https://github.com/apache/druid/pull/10739 Fixes #9816 ### Description ***Note***: I use the term "Guild" throughout this proposal. It is an arbitrary name that I chose for the Historical grouping construct I am proposing. A Guild would be a group of Historical servers. All servers that do not specify a Guild in runtime.properties would be assigned to a default Guild. I am adding the idea of guilds to Druid. A guild is a logical grouping of servers. With this PR, the only use of guilds is replicant distribution across Historical servers. The idea was born out of the desire for HDFS-like rack-aware replication for bare-metal, on-prem deployments. For this use case, Druid Historical services are assigned a guild based on the physical rack they live on. The Coordinator uses SegmentReplicantLookup to build a lookup of segment:guild replicant counts. The Coordinator prefers loading replicants across 2 or more guilds. It is important to note that, rather than loading fewer replicants than specified by Druid load rules, the Coordinator will load replicants on the same guild if it must. This idea can go beyond physical racks in a data center and apply to things such as availability zones or arbitrary Historical server groupings in a virtualized deployment. That is why I came up with the name "guild" instead of just saying rack explicitly. ### Implementation **Configs** runtime.properties: * `druid.server.guild` The STRING guild name assigned to a server. A default value is applied if not specified. * `druid.coordinator.guildReplication.on` The BOOLEAN flag telling the Coordinator whether it should use server guilds as part of its coordination logic. Default value is false. coordinator dynamic config: * `guildReplicaitonMaxPercentOfSegmentsToMove` What % of segments moved during the BalanceSegments duty should be dedicated to moving segments that are not meeting the guild replication threshold of 2?
This is applied against what is left over after moving segments off of decommissioning servers, if there are any. `SegmentReplicantLookup` The Coordinator currently builds this object each time the `HistoricalManagementDuties` run. Prior to this proposal, the object contained two main data structures: `Table segmentsInCluster` - replicant count for each segment by tier, for segments served by the cluster. `Table loadingSegments` - replicant count for each segment by tier, for segments being loaded by the cluster. My proposal adds a third data structure that is specific to guild replication: `Table historicalGuildDistribution` - replicant count for each segment by guild. This new structure is only created if guild replication is enabled. It is not worth the resources if we are not going to use it! The structure is used for quick lookup of the guild replication state for a given segment. It will be used by Coordinator duties when making decisions on loading/moving/dropping segments. `LoadRule` When assigning replicas, use the changes in SegmentReplicantLookup to split `ServerHolders` into groups of servers on a guild that is serving a replicant of the segment and servers on a guild that is not serving a replicant of the segment. If possible, `LoadRule` will load replicants onto the best-scored server(s) from the guild(s) not serving a replicant. However, we always fall back to just loading the specified number of replicants even if replication across guilds cannot be achieved. When picking replicants to drop, we also split the `ServerHolders` in the same way. The segments will be dropped from decommissioning servers first, then from servers on guilds with > 1 replicant of the segment, and lastly from the remaining servers serving a replicant. `BalancerStrategy` Add a method to the interface.
`pickSegmentToMove(List serverHolders, Set broadcastDataSources, DruidCoordinatorRuntimeParams params);` This new method is added so we have the `SegmentReplicantLookup` information needed to pick a segment that is not replicated across > 1 guild when balancing the cluster. It is needed because we introduce the dynamic config and associated balancing phase in BalanceSegments that prioritizes the balancing of segments that are not properly replicated across guilds. We need to pick only segments that meet the requirements. The existing `pickSegmentToMove` does not suffice. RandomBalancerStrategy and CostBalancerStrategy add implementations. `ReservoirSegmentSampler` Adds a method for getting a `SegmentHolder` that violates the goal of being on > 1 guild. This method needs access to the `SegmentReplicantLookup` in order to quickly look up the replication state of the segments it is possibly selecting. It returns the
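The server split described for `LoadRule` can be illustrated with a small, self-contained sketch (the class and method names are hypothetical, not the PR's actual code): servers are partitioned by whether their guild already serves a replicant of the segment, and new replicants prefer the not-yet-serving partition.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of guild-aware server partitioning for replicant assignment.
public class GuildSplit {
  // key=true: servers whose guild does NOT yet serve the segment (preferred targets);
  // key=false: servers on guilds that already serve a replicant (fallback targets).
  public static Map<Boolean, List<String>> partition(
      List<String> servers,
      Map<String, String> serverToGuild,
      Set<String> guildsServingSegment) {
    return servers.stream().collect(Collectors.partitioningBy(
        s -> !guildsServingSegment.contains(serverToGuild.get(s))));
  }
}
```

Per the proposal, assignment would try the `true` partition first and fall back to the `false` partition rather than under-replicate; drops would work through the partitions in the opposite order.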
[druid] branch master updated (6eef0e4 -> f66fdbf)
jihoonson pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from 6eef0e4 Fix collision between #10689 and #10593. (#10738) add f66fdbf add offsetFetchPeriod to kinesis ingestion doc (#10734) No new revisions were added by this update. Summary of changes: docs/development/extensions-core/kinesis-ingestion.md | 1 + 1 file changed, 1 insertion(+)
[GitHub] [druid] jihoonson merged pull request #10734: add offsetFetchPeriod to kinesis ingestion doc
jihoonson merged pull request #10734: URL: https://github.com/apache/druid/pull/10734
[GitHub] [druid] gianm commented on a change in pull request #10605: bitwise math function expressions
gianm commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r554224361 ## File path: core/src/test/java/org/apache/druid/math/expr/FunctionTest.java ## @@ -519,6 +519,31 @@ public void testLeast() assertExpr("least(1, null, 'A')", "1"); } + @Test + public void testBitwise() + { +assertExpr("bitwiseAnd(3, 1)", 1L); +assertExpr("bitwiseAnd(2, 1)", 0L); +assertExpr("bitwiseOr(3, 1)", 3L); +assertExpr("bitwiseOr(2, 1)", 3L); +assertExpr("bitwiseXor(3, 1)", 2L); +assertExpr("bitwiseXor(2, 1)", 3L); +assertExpr("bitwiseShiftLeft(2, 1)", 4L); +assertExpr("bitwiseShiftRight(2, 1)", 1L); +assertExpr("bitwiseAnd(bitwiseComplement(1), 7)", 6L); +assertExpr("bitwiseAnd('2', '1')", null); +assertExpr("bitwiseAnd(2, '1')", 0L); + +assertExpr("bitwiseOr(2.345, 1)", 4612462889363109315L); Review comment: I see, I missed that test. Sounds good.
[GitHub] [druid] gianm commented on a change in pull request #10605: bitwise math function expressions
gianm commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r554224182 ## File path: docs/misc/math-expr.md ## @@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function. |acos|acos(x) would return the arc cosine of x| |asin|asin(x) would return the arc sine of x| |atan|atan(x) would return the arc tangent of x| +|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation| +|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation| +|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or the copy bits of a double value into a long if the input is a double.| Review comment: It should implicitly cast, I think. Generally I think function behavior is easier to understand if the function implicitly casts its inputs to the type that it expects, vs. changing behavior based on its input type.
[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions
clintropolis commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r554206473 ## File path: docs/misc/math-expr.md ## @@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function. |acos|acos(x) would return the arc cosine of x| |asin|asin(x) would return the arc sine of x| |atan|atan(x) would return the arc tangent of x| +|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation| +|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation| +|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or the copy bits of a double value into a long if the input is a double.| Review comment: >I'd think that `bitwiseConvertDouble(bitwiseConvertDouble(x))` would be identical to `bitwiseConvertDouble(x)`. hmm, should the conversion just pass through if the type is already the output type or should it implicitly cast similar to the other bitwise functions?
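The two behaviors under discussion — a type-switching conversion vs. an implicit cast — can be sketched in plain Java (the method name here is illustrative, not Druid's actual implementation):

```java
public class ConvertDoubleDemo
{
  // Type-switching behavior, as documented in the table above:
  //   long input   -> reinterpret its bits as a double
  //   double input -> reinterpret its bits as a long
  public static Object convertTypeSwitching(Object x)
  {
    if (x instanceof Long) {
      return Double.longBitsToDouble((Long) x);
    }
    return Double.doubleToLongBits((Double) x);
  }

  public static void main(String[] args)
  {
    // With type-switching, applying the function twice round-trips back to
    // the original value, rather than being idempotent.
    long bits = Double.doubleToLongBits(2.5);
    System.out.println(convertTypeSwitching(convertTypeSwitching(bits))); // the original bits again
  }
}
```

Under the implicit-cast alternative gianm suggests, the input would first be cast to the expected type, so `bitwiseConvertDouble(bitwiseConvertDouble(x))` would behave differently; the sketch only illustrates the type-switching reading of the docs.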
[GitHub] [druid] gianm commented on pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats
gianm commented on pull request #10730: URL: https://github.com/apache/druid/pull/10730#issuecomment-757000160

> looks like tests are failing due to insufficient diff coverage for some lines. Many of them appear to be where we introduced additional type parameters. @gianm what's the guidance here?

What I've done is either:
- Add tests if it's something that really should be tested
- Add a suppression if it's something that really shouldn't be tested
- Ignore the coverage check if it's something that could be tested, but it isn't that important, and we want to skip it for now. (It's a diff-oriented check, so ignoring it won't affect future patches.)
[GitHub] [druid] xvrl commented on pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats
xvrl commented on pull request #10730: URL: https://github.com/apache/druid/pull/10730#issuecomment-756995841 looks like tests are failing due to insufficient diff coverage for some lines. Many of them appear to be where we introduced additional type parameters. @gianm what's the guidance here?
[GitHub] [druid] gianm merged pull request #10738: Fix collision between #10689 and #10593.
gianm merged pull request #10738: URL: https://github.com/apache/druid/pull/10738
[GitHub] [druid] suneet-s commented on pull request #10710: fix web-console show json bug
suneet-s commented on pull request #10710: URL: https://github.com/apache/druid/pull/10710#issuecomment-756933916 > > @bananaaggle Thanks for this fix - can you please add a test that catches parsing these large numbers in the web-console so that future refactorings don't accidentally break this functionality. Thanks! > > EDIT - tagged the wrong user initially > > I'm not very familiar with console, can you give me some information for creating test about this PR? Like which direct I can refer to for creating test or other hints. Could you help me create it? I'm not familiar with front-end tests either. @vogievetsky do you have any advice on how to write a test for this?
[GitHub] [druid] capistrant commented on a change in pull request #10287: allow the LogUsedSegments duty to be skipped
capistrant commented on a change in pull request #10287: URL: https://github.com/apache/druid/pull/10287#discussion_r554119706 ## File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java ## @@ -669,21 +669,31 @@ private void stopBeingLeader() private List makeHistoricalManagementDuties() { -return ImmutableList.of( -new LogUsedSegments(), +List duties = new ArrayList<>(); +if (config.isLogUsedSegmentsDutyEnabled()) { + duties.add(new LogUsedSegments()); +} +duties.addAll(ImmutableList.of( new UpdateCoordinatorStateAndPrepareCluster(), new RunRules(DruidCoordinator.this), new UnloadUnusedSegments(), new MarkAsUnusedOvershadowedSegments(DruidCoordinator.this), new BalanceSegments(DruidCoordinator.this), new EmitClusterStatsAndMetrics(DruidCoordinator.this) +)); +log.debug( +"Done making historical management duties %s", +duties.stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList()) ); +return duties; Review comment: reverting to the original way of doing this now that LogUsedSegments is always used again
[GitHub] [druid] capistrant commented on a change in pull request #10287: allow the LogUsedSegments duty to be skipped
capistrant commented on a change in pull request #10287: URL: https://github.com/apache/druid/pull/10287#discussion_r554117678 ## File path: server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java ## @@ -669,21 +669,31 @@ private void stopBeingLeader() private List makeHistoricalManagementDuties() { -return ImmutableList.of( -new LogUsedSegments(), +List duties = new ArrayList<>(); +if (config.isLogUsedSegmentsDutyEnabled()) { + duties.add(new LogUsedSegments()); +} +duties.addAll(ImmutableList.of( new UpdateCoordinatorStateAndPrepareCluster(), new RunRules(DruidCoordinator.this), new UnloadUnusedSegments(), new MarkAsUnusedOvershadowedSegments(DruidCoordinator.this), new BalanceSegments(DruidCoordinator.this), new EmitClusterStatsAndMetrics(DruidCoordinator.this) +)); +log.debug( +"Done making historical management duties %s", +duties.stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList()) ); +return duties; Review comment: makeIndexingServiceDuties below used ImmutableList.copyOf(duties) so I will stick with that for consistency
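For context on the consistency point: Guava's `ImmutableList.copyOf` returns an immutable defensive snapshot of the mutable builder list, so callers cannot mutate the returned duty list. The JDK's `List.copyOf` behaves the same way and sketches the pattern (class and duty names here are illustrative, not Druid's actual types):

```java
import java.util.ArrayList;
import java.util.List;

public class CopyOfDemo
{
  // Conditionally build a duty list, then return an immutable snapshot,
  // mirroring the makeHistoricalManagementDuties pattern in the diff above.
  public static List<String> makeDuties(boolean logUsedSegments)
  {
    List<String> duties = new ArrayList<>();
    if (logUsedSegments) {
      duties.add("LogUsedSegments");
    }
    duties.add("RunRules");
    duties.add("BalanceSegments");
    // Immutable copy: later mutation attempts throw UnsupportedOperationException.
    return List.copyOf(duties);
  }

  public static void main(String[] args)
  {
    System.out.println(makeDuties(true).size());  // 3
    System.out.println(makeDuties(false).size()); // 2
  }
}
```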
[GitHub] [druid] capistrant commented on pull request #10287: allow the LogUsedSegments duty to be skipped
capistrant commented on pull request #10287: URL: https://github.com/apache/druid/pull/10287#issuecomment-756913585

> I like this log message from LogUsedSegments and have found it very useful:
>
> ```java
> log.info("Found [%,d] used segments.", params.getUsedSegments().size());
> ```
>
> It's nearly free, since `params.getUsedSegments()` is something that's just passed into the method and doesn't need to be computed. Instead of a config to disable the duty, how about removing this block and leaving everything else the same:
>
> ```java
> DataSourcesSnapshot dataSourcesSnapshot = params.getDataSourcesSnapshot();
> for (DataSegment segment : dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
>   if (segment.getSize() < 0) {
>     log.makeAlert("No size on a segment")
>        .addData("segment", segment)
>        .emit();
>   }
> }
> ```
>
> I'm pretty sure this is ancient debugging code and we can safely get rid of it.
>
> The other potentially expensive part is skipped if log level is INFO or higher. (Which is the default, so it shouldn't be a problem.)

I think I like this idea the most, considering everything we have learned since I created the PR. I am going to amend the PR to remove the configuration to skip the duty. Instead we will remove this legacy code and leave everything else as is. Maybe in the future this duty will change and become expensive, making the ability to skip it beneficial, but let's not put the cart before the horse and overcomplicate things today.
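Two small mechanics behind the quoted suggestion: the `%,d` conversion prints the segment count with grouping separators, and the "skipped if log level is INFO or higher" part is the standard level-guard pattern. A plain-Java sketch of the formatting piece (the method is illustrative, not Druid's logger API):

```java
import java.util.Locale;

public class LogDemo
{
  // %,d inserts locale-specific grouping separators into the integer;
  // Locale.US is pinned here so the output is deterministic.
  public static String formatCount(long count)
  {
    return String.format(Locale.US, "Found [%,d] used segments.", count);
  }

  public static void main(String[] args)
  {
    System.out.println(formatCount(1234567)); // Found [1,234,567] used segments.
  }
}
```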
[GitHub] [druid] gianm commented on pull request #10247: Remove CloseQuietly and migrate its usages to other methods.
gianm commented on pull request #10247: URL: https://github.com/apache/druid/pull/10247#issuecomment-756907541 Fixed up some merge conflicts.
[druid] branch master updated (26bcd47 -> 6eef0e4)
This is an automated email from the ASF dual-hosted git repository. gian pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from 26bcd47 Thread-safety for ResponseContext.REGISTERED_KEYS (#9667) add 6eef0e4 Fix collision between #10689 and #10593. (#10738) No new revisions were added by this update. Summary of changes: .../java/org/apache/druid/segment/IndexMergerTestBase.java | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-)
[GitHub] [druid] gianm commented on pull request #10738: Fix collision between #10689 and #10593.
gianm commented on pull request #10738: URL: https://github.com/apache/druid/pull/10738#issuecomment-756905866 Travis passed. I'll merge this.
[GitHub] [druid] liran-funaro commented on pull request #10593: IncrementalIndex Tests and Benchmarks Parametrization
liran-funaro commented on pull request #10593: URL: https://github.com/apache/druid/pull/10593#issuecomment-756795145 Thanks!
[GitHub] [druid] FrankChen021 edited a comment on issue #10735: Misleading info about tsColumn in the lookups-cached-global docs
FrankChen021 edited a comment on issue #10735: URL: https://github.com/apache/druid/issues/10735#issuecomment-756784327 I guess the reason `getLookupPairs` currently reloads the whole table is that if there are record deletions in the database, a full reload drops those deleted records from the cache. Otherwise, if it loaded records incrementally by tsColumn, there would have to be extra code to check whether records in the cache have been deleted from the database.
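The trade-off FrankChen021 describes can be sketched with in-memory maps standing in for the database table and lookup cache (hypothetical helper names; not Druid's actual polling code): an incremental reload keyed on tsColumn only observes inserted or updated rows, so rows deleted from the database linger in the cache, whereas a full reload rebuilds the cache and drops them.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupReloadDemo
{
  // Full reload: rebuild the cache from the current table contents,
  // so rows deleted from the DB disappear from the cache.
  public static Map<String, String> fullReload(Map<String, String> dbTable)
  {
    return new HashMap<>(dbTable);
  }

  // Incremental reload: merge only rows changed since the last tsColumn
  // watermark into the existing cache; deletions are never observed.
  public static Map<String, String> incrementalReload(
      Map<String, String> cache,
      Map<String, String> rowsChangedSinceLastTs
  )
  {
    Map<String, String> merged = new HashMap<>(cache);
    merged.putAll(rowsChangedSinceLastTs);
    return merged; // rows deleted from the DB are still present here
  }

  public static void main(String[] args)
  {
    Map<String, String> cache = new HashMap<>(Map.of("k1", "v1", "k2", "v2"));
    Map<String, String> db = Map.of("k1", "v1"); // "k2" was deleted in the DB
    System.out.println(fullReload(db).containsKey("k2"));                     // false
    System.out.println(incrementalReload(cache, Map.of()).containsKey("k2")); // true: stale entry survives
  }
}
```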
[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics
ayushkul2910 commented on a change in pull request #10448: URL: https://github.com/apache/druid/pull/10448#discussion_r553893369 ## File path: server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java ## @@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler( return new MonitorScheduler( config.get(), - CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(), + CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(), emitter, monitors, -Executors.newCachedThreadPool() +Execs.multiThreaded(64, "MonitorThread-%d") Review comment: This makes sense, we should reduce the number of monitor threads.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553847083 ## File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java ## @@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector) @Override public boolean canVectorize(InputBindingInspector inspector) { -return inspector.areNumeric(left, right) && inspector.canVectorize(left, right); Review comment: I _think_ canVectorize will still end up false because array types/functions are not yet vectorizable, but yeah, I guess array types would not be well handled here even if they were, though maybe that is the case for non-vectorized expressions as well. I'll do a bit of exploration and see what's up.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553845596 ## File path: processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java ## @@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector decorate(MultiValueDimensionVectorSele throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName()); } + default VectorObjectSelector decorate(VectorObjectSelector selector) Review comment: This is used with the poorly named `VectorColumnSelectorFactory.makeStringObjectSelector`, which should be renamed `VectorColumnSelectorFactory.makeDimensionObjectSelector` or something; it is for doing dimension-oriented things with a vector object selector, similar to what is done for `SingleValueDimensionVectorSelector` or `MultiValueDimensionVectorSelector`. As with the other decorate methods on vector selectors, there are currently no implementations of this; I was just trying for symmetry here so that the object selector can behave similarly to the other string selectors when used in this context.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553840570 ## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java ## @@ -64,7 +65,16 @@ public GroupByVectorColumnSelector makeMultiValueDimensionProcessor( ValueType.STRING == capabilities.getType(), "groupBy dimension processors must be STRING typed" ); -throw new UnsupportedOperationException("Multi-value dimensions not yet implemented for vectorized groupBys"); +throw new UnsupportedOperationException("Multi-value dimensions are not yet implemented for vectorized groupBys"); + } + + @Override + public GroupByVectorColumnSelector makeObjectProcessor( + ColumnCapabilities capabilities, + VectorObjectSelector selector + ) + { +throw new UnsupportedOperationException("Object columns are not yet implemented for vectorized groupBys"); Review comment: Yes, I have a related branch that I will open as a follow-up to this PR that adds support for group-by queries on vectorized string expressions, using an object selector and a newly added dictionary-building `GroupByVectorColumnSelector` similar to the non-vectorized group-by code path. In that branch, this part checks that the column capabilities are string typed; if so, it makes the new dictionary-building selector, else it throws an exception for being an unsupported type. It seems beneficial not to make the string expression vector results masquerade as dictionary encoded, only to be translated to string values and re-encoded into a new dictionary, which is part of my reason for pushing for object selectors for this use case.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553833318 ## File path: processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java ## @@ -27,6 +27,13 @@ { Predicate makeStringPredicate(); + default Predicate makeObjectPredicate() + { +// default to try to use string predicate; Review comment: Currently I don't think it should be possible, because only the expression filter will make a matcher factory with an object-matching predicate, and only for string column types. But I guess cast exceptions could occur if an object selector of a complex column somehow got in here from future code; see the other comment about thinking a bit further about this.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553833052 ## File path: processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java ## @@ -27,6 +27,13 @@ { Predicate makeStringPredicate(); + default Predicate makeObjectPredicate() + { +// default to try to use string predicate; +final Predicate stringPredicate = makeStringPredicate(); +return o -> stringPredicate.apply((String) o); Review comment: Uh, it is sort of built on hope right now... the hope that it will probably be a string, because the only context it will currently happen in is making vector matcher object processors on string expression columns. But if we opened it up to additional non-primitive types like array expressions and complex column types, then this would need to be reconsidered, or stronger checks put in place that the filter supports filtering on the underlying object type. I might need to think about this a bit deeper.
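The `(String) o` cast in the default object predicate is exactly the "built on hope" part: it works while every object reaching it is a string, and fails loudly for anything else. A minimal sketch (hypothetical helper, not the real factory) of that failure mode:

```java
import java.util.function.Predicate;

public class ObjectPredicateDemo
{
  // Mirror of the default-method behavior under discussion:
  // adapt a string predicate to objects by blindly casting to String.
  public static Predicate<Object> fromStringPredicate(Predicate<String> stringPredicate)
  {
    return o -> stringPredicate.test((String) o);
  }

  public static void main(String[] args)
  {
    Predicate<Object> p = fromStringPredicate(s -> s != null && !s.isEmpty());
    System.out.println(p.test("abc")); // true: strings are fine
    try {
      p.test(123L); // a non-string object reaches the cast
    }
    catch (ClassCastException e) {
      System.out.println("ClassCastException for non-string input");
    }
  }
}
```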
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553820619 ## File path: processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java ## @@ -57,6 +57,15 @@ default int getMaxVectorSize() */ MultiValueDimensionVectorSelector makeMultiValueDimensionSelector(DimensionSpec dimensionSpec); + /** + * Returns an object selector for string columns, useful for non-dictionary encoded strings, or when Review comment: Hmm, I have no idea why I named this function like this; it should probably be called `makeDimensionObjectSelector`. It's just expected to make a `VectorObjectSelector` from a `DimensionSpec` rather than a raw column name.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553818153 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called + // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even that seems + // strange at best. + // Rather than tackle removing the annotation and reworking equality tests for now, will leave this to refactor + // as part of https://github.com/apache/druid/issues/8256 which suggests combining Filter and DimFilter + // implementations, which should clean up some of this mess. 
+ @Override + public int hashCode() + { +return super.hashCode(); + } + + @Override + public boolean equals(Object obj) + { +return super.equals(obj); + } +}; + + +final ExprType outputType = theExpr.getOutputType(factory); + +if (outputType == null) { + // if an expression is vectorizable, but the output type is null, the result will be null (or the default + // value in default mode) because expression is either all null constants or missing columns + + // in sql compatible mode, this means no matches ever, so just use the false matcher: + if (NullHandling.sqlCompatible()) { +return new FalseVectorMatcher(factory.getVectorSizeInspector()); + } + // in default mode, just fallback to using a long matcher since nearly all boolean-ish expressions + // output a long value so it is probably a safe bet? idk, ending up here by using all null-ish things + // in default mode is dancing on the edge of insanity anyway... + return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG), + ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) + ).makeMatcher(predicateFactory); +} + +switch (outputType) { + case LONG: +return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG), +ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) +).makeMatcher(predicateFactory); + case DOUBLE: +return VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor( + ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE), +ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr) +).makeMatcher(predicateFactory); + case STRING: +return VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor( +// using 'numeric' capabilities creator so we are configured to NOT be dictionary encoded, etc + 
ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING), Review comment: it is strange, but on purpose for the reason the comment says, since the numeric capabilities defaults are correct in this case. I will make sure all current users are numeric types, and if so then just make these capabilities from scratch, because we shouldn't be counting on something geared towards numeric behavior in case it changes in some unexpected way. Or, if it is used by other non-numbers, then rename it to something more appropriate, like `createSimpleSingleValueColumnCapabilities` or similar.
[GitHub] [druid] gianm commented on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.
gianm commented on pull request #10267: URL: https://github.com/apache/druid/pull/10267#issuecomment-756633682 Fixed up some merge conflicts.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553816206 ## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java ## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuning filterTuni this.filterTuning = filterTuning; } + @Override + public boolean canVectorizeMatcher(ColumnInspector inspector) + { +return expr.get().canVectorize(inspector); + } + + @Override + public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory) + { +final Expr theExpr = expr.get(); + +DruidPredicateFactory predicateFactory = new DruidPredicateFactory() +{ + @Override + public Predicate makeStringPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidLongPredicate makeLongPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidFloatPredicate makeFloatPredicate() + { +return Evals::asBoolean; + } + + @Override + public DruidDoublePredicate makeDoublePredicate() + { +return Evals::asBoolean; + } + + // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining.. + // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called Review comment: yeah, I was planning on addressing this in a separate PR; the comments are partially to help remind us to do it in case I didn't get to it immediately.
[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines
clintropolis commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553814648

## File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java

## @@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities(
     originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType());
     if (type == ValueType.STRING) {
-      if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-        return strategyFactory.makeMultiValueDimensionProcessor(
-            effectiveCapabilites,
-            selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-        );
-      } else {
-        return strategyFactory.makeSingleValueDimensionProcessor(
-            effectiveCapabilites,
-            selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-        );
+      if (!forceSingleValue) {
+        // if column is not dictionary encoded (and not non-existent or complex), use object selector
+        if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+            effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment: By regular string selector do you mean dimension selector? The answer is yes, we _could_, but if the dictionary values are not unique, I came to the conclusion that they aren't especially useful, since in all the cases I can think of you can't use them as-is and you're going to have to look up the string values anyway (e.g. filter matching and vector group by). I was trying to provide an avenue to avoid the extra steps with expression virtual columns, and to allow an object selector to be created directly, since the string is the usable part in this case and the dictionary ids just feel like an obstruction. But maybe this is overly broad, and there are cases where we would want it to act like a dictionary encoded column for some reason that I have missed?
[GitHub] [druid] gianm commented on pull request #10287: allow the LogUsedSegments duty to be skipped
gianm commented on pull request #10287: URL: https://github.com/apache/druid/pull/10287#issuecomment-756630739 > circling back on this. my cluster with 161k segments takes ~20ms to execute `LogUsedSegments`. This duty should scale fairly linearly with number of segments so you'd have to 50x that number of segments to hit 1 second if that is true. I really look forward to when I can deploy #10603 in my larger production environment to confirm. I should be able to do this in mid-Jan. You could also use `sjk` to grab a flame graph of a couple of coordinator run cycles (maybe let the tool run for 5 minutes). It would be illuminating as to exactly where the time is going. Some instructions here: https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs (That's written with profiling queries in mind, but the technique would work just as well for profiling anything.)
[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions
clintropolis commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r553811715

## File path: core/src/test/java/org/apache/druid/math/expr/FunctionTest.java

## @@ -519,6 +519,31 @@ public void testLeast()
     assertExpr("least(1, null, 'A')", "1");
   }

+  @Test
+  public void testBitwise()
+  {
+    assertExpr("bitwiseAnd(3, 1)", 1L);
+    assertExpr("bitwiseAnd(2, 1)", 0L);
+    assertExpr("bitwiseOr(3, 1)", 3L);
+    assertExpr("bitwiseOr(2, 1)", 3L);
+    assertExpr("bitwiseXor(3, 1)", 2L);
+    assertExpr("bitwiseXor(2, 1)", 3L);
+    assertExpr("bitwiseShiftLeft(2, 1)", 4L);
+    assertExpr("bitwiseShiftRight(2, 1)", 1L);
+    assertExpr("bitwiseAnd(bitwiseComplement(1), 7)", 6L);
+    assertExpr("bitwiseAnd('2', '1')", null);
+    assertExpr("bitwiseAnd(2, '1')", 0L);
+
+    assertExpr("bitwiseOr(2.345, 1)", 4612462889363109315L);

Review comment: More exhaustive coverage, which should include these combinations, is done in `VectorExprSanityTest`, where non-vectorized and vectorized evaluation results are asserted to be equal for a variety of combinations of inputs. But I can add explicit tests here, since those only confirm self-consistency between the two evaluation modes, not correctness.
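The `4612462889363109315L` expectation in the test above comes from treating the double operand's IEEE 754 bit pattern as a long before applying the bitwise operator. A quick standalone check of that arithmetic using plain JDK calls (no Druid dependencies):

```java
public class BitwiseDoubleCheck
{
  public static void main(String[] args)
  {
    // bitwiseOr(2.345, 1): the double operand is reinterpreted as its raw long bits,
    // then OR-ed with the long operand.
    long bits = Double.doubleToLongBits(2.345);
    long result = bits | 1L;
    System.out.println(result); // 4612462889363109315
  }
}
```

The bit pattern of 2.345 happens to already have its lowest bit set, so the OR with 1 leaves it unchanged; the test value is simply `doubleToLongBits(2.345)`.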
[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions
clintropolis commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r553810857

## File path: docs/misc/math-expr.md

## @@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function.
|acos|acos(x) would return the arc cosine of x|
|asin|asin(x) would return the arc sine of x|
|atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation|
+|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation|
+|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or copy the bits of a double value into a long if the input is a double.|

Review comment: will split into `bitwiseConvertDoubleToLongBits` and `bitwiseConvertLongBitsToDouble`
[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions
clintropolis commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r553810678

## File path: docs/misc/math-expr.md

## @@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function.
|acos|acos(x) would return the arc cosine of x|
|asin|asin(x) would return the arc sine of x|
|atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation|

Review comment: Thinking back, I think I originally added this behavior before I added `bitwiseConvertDouble`, as a way to do bitwise operations on double values. After adding the explicit conversion function it isn't really necessary anymore, so I will revert to the behavior of casting and assuming long inputs.
[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions
clintropolis commented on a change in pull request #10605: URL: https://github.com/apache/druid/pull/10605#discussion_r553810003

## File path: core/src/main/java/org/apache/druid/math/expr/Evals.java

## @@ -71,4 +71,14 @@ public static boolean asBoolean(@Nullable String x)
   {
     return !NullHandling.isNullOrEquivalent(x) && Boolean.parseBoolean(x);
   }
+
+  public static long doubleToLongBits(double x)
+  {
+    return Double.doubleToLongBits(x);

Review comment: Trying to remember... I think I was just trying to be consistent with the other value conversion functions and put it here, but it could probably just call `Double.doubleToLongBits` directly, as it seems unlikely we would change the function.
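For context, the two JDK primitives backing these helpers round-trip losslessly for non-NaN values, which is why splitting the expression function into explicit to/from variants (as proposed earlier in the thread) is safe. A standalone sketch:

```java
public class DoubleBitsRoundTrip
{
  public static void main(String[] args)
  {
    double x = 2.345;
    // Reinterpret the double's IEEE 754 bit pattern as a long...
    long bits = Double.doubleToLongBits(x);
    // ...and back. For non-NaN values this recovers the exact same double;
    // NaN payloads may be canonicalized by doubleToLongBits.
    double back = Double.longBitsToDouble(bits);
    System.out.println(back); // 2.345
  }
}
```

Since the round trip is exact, `bitwiseConvertLongBitsToDouble(bitwiseConvertDoubleToLongBits(x))` can be expected to return `x` for any finite input.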
[druid] branch master updated (08ab82f -> 26bcd47)
This is an automated email from the ASF dual-hosted git repository. gian pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git. from 08ab82f IncrementalIndex Tests and Benchmarks Parametrization (#10593) add 26bcd47 Thread-safety for ResponseContext.REGISTERED_KEYS (#9667) No new revisions were added by this update. Summary of changes: .../druid/query/context/ResponseContext.java | 34 +++--- 1 file changed, 17 insertions(+), 17 deletions(-)
[GitHub] [druid] gianm merged pull request #9667: Thread-safety for ResponseContext.REGISTERED_KEYS
gianm merged pull request #9667: URL: https://github.com/apache/druid/pull/9667
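The actual fix lives in `ResponseContext.java` (17 insertions and 17 deletions per the commit summary above). As an illustration of the general pattern for making a shared key registry thread-safe, here is a hypothetical sketch using `ConcurrentHashMap` — not the actual Druid implementation:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class KeyRegistry
{
  // ConcurrentMap gives atomic register-if-absent semantics without
  // external synchronization, avoiding the unbalanced-synchronization
  // hazard of guarding only some accesses with a lock.
  private static final ConcurrentMap<String, String> REGISTERED_KEYS = new ConcurrentHashMap<>();

  public static void registerKey(String name, String description)
  {
    String prior = REGISTERED_KEYS.putIfAbsent(name, description);
    // Re-registering the same key with the same definition is harmless;
    // a conflicting re-registration is a programming error.
    if (prior != null && !prior.equals(description)) {
      throw new IllegalStateException("Key [" + name + "] already registered with a different definition");
    }
  }

  public static int size()
  {
    return REGISTERED_KEYS.size();
  }

  public static void main(String[] args)
  {
    registerKey("uncoveredIntervals", "intervals not fully covered by segments");
    registerKey("uncoveredIntervals", "intervals not fully covered by segments"); // idempotent
    System.out.println(size()); // 1
  }
}
```

The key design point is that every read and write goes through the same concurrent structure, so there is no code path that touches the registry unsynchronized — the defect class the linked issue (#9106) describes.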
[GitHub] [druid] gianm closed issue #9106: Unbalanced synchronization of ResponseContext.REGISTERED_KEYS
gianm closed issue #9106: URL: https://github.com/apache/druid/issues/9106
[GitHub] [druid] gianm commented on pull request #9667: Thread-safety for ResponseContext.REGISTERED_KEYS
gianm commented on pull request #9667: URL: https://github.com/apache/druid/pull/9667#issuecomment-756625948 Thanks @alex-plekhanov!
[GitHub] [druid] stale[bot] commented on pull request #10043: Add support for REGEXP_REPLACE
stale[bot] commented on pull request #10043: URL: https://github.com/apache/druid/pull/10043#issuecomment-756624168 This issue is no longer marked as stale.
[GitHub] [druid] gianm commented on pull request #10043: Add support for REGEXP_REPLACE
gianm commented on pull request #10043: URL: https://github.com/apache/druid/pull/10043#issuecomment-756624149 > one more time please stalebot two more times?
[GitHub] [druid] gianm commented on a change in pull request #10613: expression filter support for vectorized query engines
gianm commented on a change in pull request #10613: URL: https://github.com/apache/druid/pull/10613#discussion_r553782719

## File path: core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java

## @@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
   @Override
   public boolean canVectorize(InputBindingInspector inspector)
   {
-    return inspector.areNumeric(left, right) && inspector.canVectorize(left, right);

Review comment: This opens not just STRING but also LONG_ARRAY and DOUBLE_ARRAY. Is that intentional (will numeric arrays work)?

## File path: core/src/main/java/org/apache/druid/math/expr/ExprTypeConversion.java

## @@ -20,24 +20,29 @@
 package org.apache.druid.math.expr;

 import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.Pair;

 import javax.annotation.Nullable;
 import java.util.List;

 public class ExprTypeConversion
 {
+  public static Pair coerceNull(Expr.InputBindingInspector inspector, Expr left, Expr right)

Review comment: This method deserves a javadoc.
## File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java

## @@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities(
     originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType());
     if (type == ValueType.STRING) {
-      if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-        return strategyFactory.makeMultiValueDimensionProcessor(
-            effectiveCapabilites,
-            selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-        );
-      } else {
-        return strategyFactory.makeSingleValueDimensionProcessor(
-            effectiveCapabilites,
-            selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-        );
+      if (!forceSingleValue) {
+        // if column is not dictionary encoded (and not non-existent or complex), use object selector
+        if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+            effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment: Why does it matter if the dictionary values are unique? (Even if they're not, can't we still use a regular string selector?)

## File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java

## @@ -64,7 +65,16 @@ public GroupByVectorColumnSelector makeMultiValueDimensionProcessor(
         ValueType.STRING == capabilities.getType(),
         "groupBy dimension processors must be STRING typed"
     );
-    throw new UnsupportedOperationException("Multi-value dimensions not yet implemented for vectorized groupBys");
+    throw new UnsupportedOperationException("Multi-value dimensions are not yet implemented for vectorized groupBys");
+  }
+
+  @Override
+  public GroupByVectorColumnSelector makeObjectProcessor(
+      ColumnCapabilities capabilities,
+      VectorObjectSelector selector
+  )
+  {
+    throw new UnsupportedOperationException("Object columns are not yet implemented for vectorized groupBys");

Review comment: "yet"? Will they ever be?
## File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java

## @@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final FilterTuni
     this.filterTuning = filterTuning;
   }

+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+    return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory)
+  {
+    final Expr theExpr = expr.get();
+
+    DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+    {
+      @Override
+      public Predicate makeStringPredicate()
+      {
+        return Evals::asBoolean;
+      }
+
+      @Override
+      public DruidLongPredicate makeLongPredicate()
+      {
+        return Evals::asBoolean;
+      }
+
+      @Override
+      public DruidFloatPredicate makeFloatPredicate()
+      {
+        return Evals::asBoolean;
+      }
+
+      @Override
+      public DruidDoublePredicate makeDoublePredicate()
+      {
+        return Evals::asBoolean;
+      }
+
+      // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+      // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called
+      // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even that seems
+      // strange at best.
+      // Rather than tackle removing the annotation and reworking equality tests for now, will leave this to refactor
+      // as part of https://github.com/apache/druid/issues/8256 which suggests