[GitHub] [druid] xvrl opened a new issue #9912: coordinator restart during index task shutdown may cause successful task to get failed
xvrl opened a new issue #9912: URL: https://github.com/apache/druid/issues/9912

### Affected Version

0.17.0

### Description

We got a little unlucky and hit an edge case in task handling during a coordinator restart. If a task takes a little too long to shut down, there is a window during which the task status cannot be queried after it succeeds. A coordinator/overlord in the process of starting might discover the task in ZooKeeper and try to query its status. The status call will fail, causing the coordinator to decide to kill the task that is about to succeed.

While the task is still in the process of shutting down, the worker will get the signal from the supervisor to kill the task, which closes the input stream on the process, triggering an unclean shutdown with a non-zero exit code. This in turn causes the worker to believe the task failed, despite the task logs showing success.

In our case the chance of hitting the problem was exacerbated by `druid.server.http.unannouncePropagationDelay` being set to 5 seconds, which increased the time between the chat handler no longer being reachable and the process exiting.

Timeline from logs below:

```
coordinator    May 20th 2020, 17:54:19.431  Curator-PathChildrenCache-1  INFO  Worker[druid-middle-manager-8.druid-middle-manager.druid.svc.cluster.local:8091] wrote RUNNING status for task [index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd] on [TaskLocation{host='druid-middle-manager-8.druid-middle-manager.druid.svc.cluster.local', port=8100, tlsPort=-1}]
coordinator    May 20th 2020, 17:54:19.592  LeaderSelector[/druid/prod/overlord/_OVERLORD]  INFO  Adding task[index_kafka_telemetry_prod_caf1fb5dc7be8a4_bknnbodd] to activeTasks
task           2020-05-20T17:54:21.520Z  Task completed with status: {
  "id" : "index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd",
  "status" : "SUCCESS",
  "duration" : 3840683
  [...]
task           2020-05-20T17:54:21.527Z  Stopping lifecycle [module] stage [SERVER]
task           2020-05-20T17:54:21.527Z  Sleeping 5000 ms for unannouncement to propagate.
coordinator    May 20th 2020, 17:54:25.328  IndexTaskClient-mydata-0  WARN  Exception while sending request
org.apache.druid.java.util.common.IAE: Received 400 Bad Request with body: {"error":"Can't find chatHandler for handler[index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd]"}
    at org.apache.druid.indexing.common.IndexTaskClient.submitRequest(IndexTaskClient.java:356)
    at org.apache.druid.indexing.common.IndexTaskClient.submitRequestWithEmptyContent(IndexTaskClient.java:220)
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient.getStatus(SeekableStreamIndexTaskClient.java:172)
    at org.apache.druid.indexing.seekablestream.SeekableStreamIndexTaskClient.lambda$getStatusAsync$9(SeekableStreamIndexTaskClient.java:373)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
coordinator    May 20th 2020, 17:54:25.767  KafkaSupervisor-mydata  INFO  Shutdown [index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd] because: [Task [index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd] failed to return status, killing task]
middlemanager  May 20th 2020, 17:54:25.769  qtp1008934993-97  INFO  Shutdown [index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd] because: [shut down request via HTTP endpoint]
middlemanager  May 20th 2020, 17:54:25.769  qtp1008934993-97  INFO  Closing output stream to task[index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd].
task           2020-05-20T17:54:25.769Z  Triggering JVM shutdown.
task           2020-05-20T17:54:25.770Z  Running shutdown hook
task           2020-05-20T17:54:25.770Z  Lifecycle [module] already stopped and stop was called. Silently skipping
coordinator    May 20th 2020, 17:54:25.778  KafkaSupervisor-mydata  INFO  Removing task[index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd] from activeTasks
coordinator    May 20th 2020, 17:54:25.778  KafkaSupervisor-mydata  INFO  Sent shutdown message to worker: druid-middle-manager-8.druid-middle-manager.druid.svc.cluster.local:8091, status 200 OK, response: {"task":"index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd"}
coordinator    May 20th 2020, 17:54:25.778  KafkaSupervisor-mydata  INFO  Removing task[index_kafka_mydata_caf1fb5dc7be8a4_bknnbodd] from TaskLock[TimeChunkLock{type=EXCLUSIVE, groupId='index_kafka_mydata', dataSource='mydata', interval=2020-05-20T16:45:00.000Z/2020-05-20T17:00:00.000Z, version='2020-05-20T16:45:00.403Z', priority=75, revoked=false}]
coordinator    May 20th 2020, 17:54:25.784
```
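For reference, the setting that widened the race window is expressed as an ISO-8601 period in the server's `runtime.properties`. A minimal sketch of the configuration the reporter describes (the 5-second value is the one from the report):

```properties
# Delays Jetty shutdown so unannouncements can propagate; a non-zero value
# widens the window between the chat handler going away and process exit.
druid.server.http.unannouncePropagationDelay=PT5S
```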
[GitHub] [druid] 2bethere commented on a change in pull request #9879: Querying doc refresh tutorial
2bethere commented on a change in pull request #9879: URL: https://github.com/apache/druid/pull/9879#discussion_r429039250

## File path: docs/querying/querying.md

@@ -44,7 +49,7 @@ The Content-Type/Accept Headers can also take 'application/x-jackson-smile'.

```
curl -X POST ':/druid/v2/?pretty' -H 'Content-Type:application/json' -H 'Accept:application/x-jackson-smile' -d @
```

Review comment: Maybe add in a sentence here: "If you are running the quickstart, you should replace `:` with `localhost:8082`." Sometimes I find it frustrating when reading tutorials and can't find the default parameter.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
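Following the review suggestion, a filled-in version of the command might look like this (a hedged sketch: it assumes the quickstart's default broker port 8082 and a query saved in a hypothetical `query.json` file):

```
curl -X POST 'http://localhost:8082/druid/v2/?pretty' \
  -H 'Content-Type: application/json' \
  -H 'Accept: application/x-jackson-smile' \
  -d @query.json
```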
[GitHub] [druid] cloventt edited a comment on issue #9911: Sorting on the value of an injective lookup returns incorrect order
cloventt edited a comment on issue #9911: URL: https://github.com/apache/druid/issues/9911#issuecomment-632469480

This is roughly the query we are running:

```json
{
  "queryType": "groupBy",
  "dataSource": { "type": "table", "name": "test-datasource" },
  "intervals": {
    "type": "LegacySegmentSpec",
    "intervals": [ "2020-04-25T00:00:00.000Z/2020-04-26T00:00:00.000Z" ]
  },
  "descending": false,
  "virtualColumns": [],
  "granularity": "DAY",
  "dimensions": [
    {
      "type": "lookup",
      "dimension": "actualDimension",
      "outputName": "injectiveLookupDimension",
      "name": "test_lookup"
    }
  ],
  "aggregations": [
    { "type": "longSum", "name": "numericalValue", "fieldName": "numericalValue" }
  ],
  "postAggregations": [],
  "limitSpec": {
    "type": "default",
    "limit": 5,
    "columns": [
      { "dimension": "injectiveLookupDimension", "direction": "descending" }
    ]
  },
  "limit": 2147483647
}
```

We expect the results of this to be ordered by the value of `injectiveLookupDimension`, but instead they are ordered by `actualDimension`. The registered lookup is configured to be `injective`.
[GitHub] [druid] cloventt commented on issue #9911: Sorting on the value of an injective lookup returns incorrect order
cloventt commented on issue #9911: URL: https://github.com/apache/druid/issues/9911#issuecomment-632469480

This is roughly the query we are running:

```json
{
  "queryType": "groupBy",
  "dataSource": { "type": "table", "name": "test-datasource" },
  "intervals": {
    "type": "LegacySegmentSpec",
    "intervals": [ "2020-04-25T00:00:00.000Z/2020-04-26T00:00:00.000Z" ]
  },
  "descending": false,
  "virtualColumns": [],
  "granularity": "DAY",
  "dimensions": [
    {
      "type": "lookup",
      "dimension": "actualDimension",
      "outputName": "injectiveLookupDimension",
      "name": "test_lookup"
    }
  ],
  "aggregations": [
    { "type": "longSum", "name": "numericalValue", "fieldName": "numericalValue" }
  ],
  "postAggregations": [],
  "limitSpec": {
    "type": "default",
    "limit": 5,
    "columns": [
      { "dimension": "injectiveLookupDimension", "direction": "descending" }
    ]
  },
  "limit": 2147483647
}
```

We expect the results of this to be ordered by the value of `injectiveLookupDimension`, but instead they are ordered by `actualDimension`.
[GitHub] [druid] cloventt opened a new issue #9911: Sorting on the value of an injective lookup returns incorrect order
cloventt opened a new issue #9911: URL: https://github.com/apache/druid/issues/9911

### Affected Version

0.17.0

### Description

We have been using injective lookups to perform a one-to-one mapping between an ID and some model data. However, when we construct a query to order results by the mapped value, the results are returned ordered by the original dimension value.

For example, if an injective lookup performs this mapping:

```
// key -> value
1 -> C
2 -> B
3 -> A
```

Then when making a query ordering by `value`, we would expect to see:

```
A
B
C
```

But what we actually see is:

```
C
B
A
```

The results are ordered by the key, not the value. This issue does not seem to occur for non-injective lookups. However, when we change to a non-injective lookup, we see a performance hit and a considerable increase in heap usage and GC time on the historical while the query is running. Given how injective lookups are processed, this might be expected behaviour, but as far as I can tell this behaviour is not mentioned in the documentation.
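The two orderings can be illustrated with a small sketch (plain Python, not Druid code), using the example mapping from the report:

```python
# Toy illustration of key-order vs. value-order for a lookup mapping.
lookup = {1: "C", 2: "B", 3: "A"}

# Ordering by the mapped value -- what the reporter expects:
by_value = [lookup[k] for k in sorted(lookup, key=lookup.get)]

# Ordering by the original key -- what the query actually returns
# when the lookup is declared injective:
by_key = [lookup[k] for k in sorted(lookup)]

print(by_value)  # ['A', 'B', 'C']
print(by_key)    # ['C', 'B', 'A']
```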
[GitHub] [druid] maytasm opened a new pull request #9910: Update doc on tmp dir (java.io.tmpdir) best practice
maytasm opened a new pull request #9910: URL: https://github.com/apache/druid/pull/9910

Update doc on tmp dir (java.io.tmpdir) best practice

### Description

We had an issue where our /mnt/tmp was an NFS mount. This can cause the intermediate persists of our streaming ingestion tasks to be extremely slow. This PR updates the docs to caution other users against setting the tmp dir (aka java.io.tmpdir) to an NFS mount, and to stress the importance of having good read and write speed to this dir.

This PR has:
- [x] been self-reviewed.
- [x] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
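As a hedged illustration of the recommendation (the path below is purely hypothetical), the tmp dir can be pointed at fast local storage with a standard JVM flag in the process's `jvm.config`:

```
# Hypothetical path: use fast local disk, not an NFS mount
-Djava.io.tmpdir=/mnt/fast-local-disk/druid/tmp
```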
[GitHub] [druid] yuanlihan commented on issue #9904: Fail to compact overlapping segments
yuanlihan commented on issue #9904: URL: https://github.com/apache/druid/issues/9904#issuecomment-632448083

Hi @jihoonson

> This log means, you got two segments of overlapping intervals and same version. This is never allowed since Druid doesn't know what segment to read in this case. Druid should never create such segments of overlapping intervals but same version. Can you check why and how they were created?

The two partially overlapping segments with the same version are created by the compaction task, as described in step 3 of the reproduction steps above. According to the log of the compaction task, the two partially overlapping input segments (which have different versions) are assigned to two different `indexSpec`s for compaction, producing a `DAY`-granularity segment and an `HOUR`-granularity segment, but with the same version. I pasted the log of the compaction task [here](https://gist.github.com/yuanlihan/a3f54ca6eb6ba7508fcfa653065c79eb) FYI. You can also easily reproduce this bug locally by following the steps listed above.
[GitHub] [druid] averma111 opened a new issue #9909: Need to fix twist lock vulnerabilities for Google Cloud Kubernetes Engine
averma111 opened a new issue #9909: URL: https://github.com/apache/druid/issues/9909

Druid Version: 0.18.1

There are 5 high-priority vulnerabilities which have popped up. They are given below:

- Jackson
- Guava
- netty
- log4j
- logging-log4j

I am trying to build Druid after replacing the versions in the pom.xml, building from source code. I need urgent help with how to build Druid from source code, i.e. the exact commands and steps. Any help is appreciated.

Thanks,
Ashish
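For reference, a common way to produce a distribution from a source checkout looks roughly like this (a sketch based on the project's build docs; the release tag name is an assumption, so confirm it with `git tag` first):

```
git clone https://github.com/apache/druid.git
cd druid
git checkout druid-0.18.1          # assumed tag name; verify it exists
mvn clean install -Pdist -DskipTests
# the distribution tarball should land under distribution/target/
```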
[GitHub] [druid] vogievetsky commented on pull request #9887: Re-order and document format detection in web console
vogievetsky commented on pull request #9887: URL: https://github.com/apache/druid/pull/9887#issuecomment-632396654

Oh I guess you saw a binary file that happened to have more than 3 tabs or commas...
[GitHub] [druid] vogievetsky merged pull request #9887: Re-order and document format detection in web console
vogievetsky merged pull request #9887: URL: https://github.com/apache/druid/pull/9887
[druid] branch master updated: Re-order and document format detection in web console (#9887)
This is an automated email from the ASF dual-hosted git repository.

vogievetsky pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/druid.git

The following commit(s) were added to refs/heads/master by this push:
     new 132a1c9  Re-order and document format detection in web console (#9887)

132a1c9 is described below

commit 132a1c9fe770209792973f52b8aee6f36f06aa3d
Author: Joseph Glanville
AuthorDate: Fri May 22 06:29:39 2020 +0700

    Re-order and document format detection in web console (#9887)

    Motivation for this change is to not inadvertently identify binary formats that contain
    uncompressed string data as TSV or CSV. Moving detection of magic byte headers before
    heuristics should be more robust in general.
---
 web-console/src/utils/ingestion-spec.tsx | 32 +++-
 1 file changed, 19 insertions(+), 13 deletions(-)

```diff
diff --git a/web-console/src/utils/ingestion-spec.tsx b/web-console/src/utils/ingestion-spec.tsx
index 0bc3ff3..9a2d700 100644
--- a/web-console/src/utils/ingestion-spec.tsx
+++ b/web-console/src/utils/ingestion-spec.tsx
@@ -2676,29 +2676,35 @@ function guessInputFormat(sampleData: string[]): InputFormat {
   if (sampleDatum) {
     sampleDatum = String(sampleDatum); // Really ensure it is a string

-    if (sampleDatum.startsWith('{') && sampleDatum.endsWith('}')) {
-      return inputFormatFromType('json');
-    }
-
-    if (sampleDatum.split('\t').length > 3) {
-      return inputFormatFromType('tsv', !/\t\d+\t/.test(sampleDatum));
-    }
-
-    if (sampleDatum.split(',').length > 3) {
-      return inputFormatFromType('csv', !/,\d+,/.test(sampleDatum));
-    }
+    // First check for magic byte sequences as they rarely yield false positives

+    // Parquet 4 byte magic header: https://github.com/apache/parquet-format#file-format
     if (sampleDatum.startsWith('PAR1')) {
       return inputFormatFromType('parquet');
     }
-
+    // ORC 3 byte magic header: https://orc.apache.org/specification/ORCv1/
     if (sampleDatum.startsWith('ORC')) {
       return inputFormatFromType('orc');
     }
-
+    // Avro OCF 4 byte magic header: https://avro.apache.org/docs/current/spec.html#Object+Container+Files
     if (sampleDatum.startsWith('Obj1')) {
       return inputFormatFromType('avro_ocf');
     }
+
+    // After checking for magic byte sequences perform heuristics to deduce string formats
+
+    // If the string starts and ends with curly braces assume JSON
+    if (sampleDatum.startsWith('{') && sampleDatum.endsWith('}')) {
+      return inputFormatFromType('json');
+    }
+    // Contains more than 3 tabs assume TSV
+    if (sampleDatum.split('\t').length > 3) {
+      return inputFormatFromType('tsv', !/\t\d+\t/.test(sampleDatum));
+    }
+    // Contains more than 3 commas assume CSV
+    if (sampleDatum.split(',').length > 3) {
+      return inputFormatFromType('csv', !/,\d+,/.test(sampleDatum));
+    }
   }

   return inputFormatFromType('regex');
```
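The detection order in the commit can be sketched in a few lines (plain Python standing in for the console's TypeScript; the 'Obj1' check mirrors the console code, though the true Avro OCF magic is 'Obj' followed by byte 0x01):

```python
# Sketch of magic-bytes-first format detection: binary headers are
# checked before string heuristics so binary files containing tabs
# or commas are not misidentified as TSV/CSV.
def guess_input_format(sample: str) -> str:
    # Magic byte headers rarely yield false positives, so check them first.
    if sample.startswith("PAR1"):      # Parquet 4-byte magic
        return "parquet"
    if sample.startswith("ORC"):       # ORC 3-byte magic
        return "orc"
    if sample.startswith("Obj1"):      # mirrors the console's Avro OCF check
        return "avro_ocf"
    # Only then fall back to string heuristics.
    if sample.startswith("{") and sample.endswith("}"):
        return "json"
    if sample.count("\t") >= 3:        # >3 tab-split fields in the console code
        return "tsv"
    if sample.count(",") >= 3:         # >3 comma-split fields in the console code
        return "csv"
    return "regex"

print(guess_input_format("PAR1\x00\x00"))   # parquet
print(guess_input_format('{"a": 1}'))       # json
print(guess_input_format("a\tb\tc\td"))     # tsv
```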
[GitHub] [druid] clintropolis merged pull request #9897: Fix web console query view crashing on simple query
clintropolis merged pull request #9897: URL: https://github.com/apache/druid/pull/9897
[druid] branch master updated (b91d500 -> 63baa29)
cwylie pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

```
from b91d500  add some details to the build doc (#9885)
 add 63baa29  Fix web console query view crashing on simple query (#9897)
```

No new revisions were added by this update.

Summary of changes:
```
 licenses.yaml                                      |  2 +-
 web-console/package-lock.json                      |  6 +-
 web-console/package.json                           |  2 +-
 .../__snapshots__/query-view.spec.tsx.snap         | 82 ++
 .../number-menu-items/number-menu-items.spec.tsx   |  8 +--
 .../string-menu-items/string-menu-items.spec.tsx   |  8 +--
 .../time-menu-items/time-menu-items.spec.tsx       |  8 +--
 .../query-view/column-tree/column-tree.spec.tsx    |  6 +-
 .../query-view/query-output/query-output.spec.tsx  |  8 +--
 .../src/views/query-view/query-view.spec.tsx       |  5 ++
 web-console/src/views/query-view/query-view.tsx    | 29
 11 files changed, 120 insertions(+), 44 deletions(-)
```
[GitHub] [druid] vogievetsky commented on pull request #9887: Re-order and document format detection in web console
vogievetsky commented on pull request #9887: URL: https://github.com/apache/druid/pull/9887#issuecomment-632308074

Love it. Makes total sense. Out of curiosity was this change motivated by something you saw IRL that tripped up the current system?
[druid] branch master updated (ca5e66d -> b91d500)
ccaominh pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/druid.git.

```
from ca5e66d  Fixed the Copyright year of Druid (#9859)
 add b91d500  add some details to the build doc (#9885)
```

No new revisions were added by this update.

Summary of changes:
```
 dev/intellij-setup.md     | 16 +++-
 docs/development/build.md |  3 +++
 website/.spelling         |  4 +++-
 3 files changed, 21 insertions(+), 2 deletions(-)
```
[GitHub] [druid] ccaominh merged pull request #9885: add some details to the build doc
ccaominh merged pull request #9885: URL: https://github.com/apache/druid/pull/9885
[GitHub] [druid] suneet-s commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
suneet-s commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428846660

## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java

```diff
@@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception
     );
   }

+  @Test
+  public void testRegexpExtractFilterViaNotNullCheck() throws Exception
+  {
+    // Cannot vectorize due to extractionFn in dimension spec.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT COUNT(*)\n"
+        + "FROM foo\n"
+        + "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || dim1, '^Z2') IS NOT NULL",
+        ImmutableList.of(
+            Druids.newTimeseriesQueryBuilder()
+                  .dataSource(CalciteTests.DATASOURCE1)
+                  .intervals(querySegmentSpec(Filtration.eternity()))
+                  .granularity(Granularities.ALL)
+                  .virtualColumns(
+                      expressionVirtualColumn("v0", "regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING)
+                  )
+                  .filters(
+                      or(
+                          not(selector("dim1", null, new RegexDimExtractionFn("^1", 0, true, null))),
+                          not(selector("v0", null, null))
+                      )
+                  )
+                  .aggregators(new CountAggregatorFactory("a0"))
+                  .context(TIMESERIES_CONTEXT_DEFAULT)
+                  .build()
+        ),
+        ImmutableList.of(
+            new Object[]{3L}
+        )
+    );
+  }
+
+  @Test
+  public void testRegexpLikeFilter() throws Exception
+  {
+    // Cannot vectorize due to usage of regex filter.
+    cannotVectorize();
+
+    testQuery(
+        "SELECT COUNT(*)\n"
+        + "FROM foo\n"
+        + "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')",
```

Review comment: Sounds good to me!

> I could see having a systematic way to test for this (a query generator, maybe) but I don't think adding one just for this function would make sense.

I've been working on the beginnings of a query generator to match all the different rows in the `druid.foo` table - I was kinda sneakily hoping there was already a systematic way we add tests for all these different conditions and I just hadn't seen it yet :)
[GitHub] [druid] gianm commented on a change in pull request #9731: ColumnCapabilities.hasMultipleValues refactor
gianm commented on a change in pull request #9731: URL: https://github.com/apache/druid/pull/9731#discussion_r428840570

## File path: processing/src/main/java/org/apache/druid/segment/incremental/IncrementalIndexStorageAdapter.java

```diff
@@ -142,24 +141,8 @@ public Comparable getMaxValue(String column)
   @Override
   public ColumnCapabilities getColumnCapabilities(String column)
   {
-    // Different from index.getCapabilities because, in a way, IncrementalIndex's string-typed dimensions
-    // are always potentially multi-valued at query time. (Missing / null values for a row can potentially be
-    // represented by an empty array; see StringDimensionIndexer.IndexerDimensionSelector's getRow method.)
-    //
-    // We don't want to represent this as having-multiple-values in index.getCapabilities, because that's used
-    // at index-persisting time to determine if we need a multi-value column or not. However, that means we
-    // need to tweak the capabilities here in the StorageAdapter (a query-time construct), so at query time
-    // they appear multi-valued.
-
-    final ColumnCapabilities capabilitiesFromIndex = index.getCapabilities(column);
-    final IncrementalIndex.DimensionDesc dimensionDesc = index.getDimension(column);
-    if (dimensionDesc != null && dimensionDesc.getCapabilities().getType() == ValueType.STRING) {
-      final ColumnCapabilitiesImpl retVal = ColumnCapabilitiesImpl.copyOf(capabilitiesFromIndex);
-      retVal.setHasMultipleValues(true);
-      return retVal;
-    } else {
-      return capabilitiesFromIndex;
-    }
+    // snapshot the current state
+    return ColumnCapabilitiesImpl.complete(index.getCapabilities(column), false);
```

Review comment: Yes, that's what I was thinking: HMV could be set at any time, including between cap-fetch and selector-creation. By the way, it looks like StringDimensionIndexer's handling of `hasMultipleValues` is not thread-safe (it's updated by the indexing thread and read by querying threads without locking). It'd be good to fix that too.
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428831347

## File path: processing/src/main/java/org/apache/druid/query/expression/RegexpLikeExprMacro.java

```java
/*
 * Licensed to the Apache Software Foundation (ASF) under one
 * or more contributor license agreements.  See the NOTICE file
 * distributed with this work for additional information
 * regarding copyright ownership.  The ASF licenses this file
 * to you under the Apache License, Version 2.0 (the
 * "License"); you may not use this file except in compliance
 * with the License.  You may obtain a copy of the License at
 *
 *   http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

package org.apache.druid.query.expression;

import org.apache.druid.common.config.NullHandling;
import org.apache.druid.java.util.common.IAE;
import org.apache.druid.java.util.common.StringUtils;
import org.apache.druid.math.expr.Expr;
import org.apache.druid.math.expr.ExprEval;
import org.apache.druid.math.expr.ExprMacroTable;
import org.apache.druid.math.expr.ExprType;

import javax.annotation.Nonnull;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro
{
  private static final String FN_NAME = "regexp_like";

  @Override
  public String name()
  {
    return FN_NAME;
  }

  @Override
  public Expr apply(final List<Expr> args)
  {
    if (args.size() < 2 || args.size() > 3) {
```

Review comment: Good catch. I'll fix it.
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428827872 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception ); } + @Test + public void testRegexpExtractFilterViaNotNullCheck() throws Exception + { +// Cannot vectorize due to extractionFn in dimension spec. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || dim1, '^Z2') IS NOT NULL", +ImmutableList.of( +Druids.newTimeseriesQueryBuilder() + .dataSource(CalciteTests.DATASOURCE1) + .intervals(querySegmentSpec(Filtration.eternity())) + .granularity(Granularities.ALL) + .virtualColumns( + expressionVirtualColumn("v0", "regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING) + ) + .filters( + or( + not(selector("dim1", null, new RegexDimExtractionFn("^1", 0, true, null))), + not(selector("v0", null, null)) + ) + ) + .aggregators(new CountAggregatorFactory("a0")) + .context(TIMESERIES_CONTEXT_DEFAULT) + .build() +), +ImmutableList.of( +new Object[]{3L} +) +); + } + + @Test + public void testRegexpLikeFilter() throws Exception + { +// Cannot vectorize due to usage of regex filter. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')", Review comment: > a multi-value column There aren't tests for multi-value columns for this specific expression, nor for other functions that aren't multi-value-aware. (There is a separate system that handles mapping over non-multi-value-aware functions.) I could see having a systematic way to test for this (a query generator, maybe) but I don't think adding one just for this function would make sense. > a numeric column There aren't tests for a numeric column. 
It should be a validation error but we don't currently have very comprehensive tests for the validator. I could add one in an ad-hoc way right now, and that would make sense. But I was actually hoping to add more systematic tests in a future patch, so could do it then too. > matching against null I added a test for matching against null to ExpressionsTest. > The docs say that "The pattern must match starting at the beginning of expr" but it looks like the regex pattern you are passing in is asking that it start at the beginning of the string via ^ in the pattern string. Can I use a $ in my regex to ask that it matches the end of the expr? Wow! After looking into your comment, I realized that I totally screwed up the docs here. I actually have a test in this patch for what happens when you skip the `^`, and it _does_ match substrings that are in the middle of the string, and the test expects that to happen, all in contradiction of what the docs say. I'll fix the docs. Thanks for calling this to my attention. And yes, you can use `$` to ask that it match the end. There are tests for this too. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
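For reference, the anchoring behavior under discussion is plain `java.util.regex` `Matcher.find()` semantics: the pattern may match anywhere in the input unless anchored with `^` or `$`. A standalone sketch (not Druid code; `extract` is an illustrative stand-in for the behavior the corrected docs describe):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexAnchoring {
  // Returns the substring matched by `pattern`, or null if there is no match.
  // Matcher.find() locates the pattern anywhere in the input, which is why the
  // original doc wording ("must match starting at the beginning") was wrong
  // unless the pattern itself is anchored with ^.
  static String extract(String expr, String pattern) {
    Matcher m = Pattern.compile(pattern).matcher(expr);
    return m.find() ? m.group() : null;
  }

  public static void main(String[] args) {
    System.out.println(extract("abc123", "\\d+"));   // 123  (matches mid-string)
    System.out.println(extract("abc123", "^\\d+"));  // null (^ anchors the start)
    System.out.println(extract("abc123", "\\d+$"));  // 123  ($ anchors the end)
  }
}
```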
[GitHub] [druid] suneet-s commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
suneet-s commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428809485 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception ); } + @Test + public void testRegexpExtractFilterViaNotNullCheck() throws Exception + { +// Cannot vectorize due to extractionFn in dimension spec. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || dim1, '^Z2') IS NOT NULL", +ImmutableList.of( +Druids.newTimeseriesQueryBuilder() + .dataSource(CalciteTests.DATASOURCE1) + .intervals(querySegmentSpec(Filtration.eternity())) + .granularity(Granularities.ALL) + .virtualColumns( + expressionVirtualColumn("v0", "regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING) + ) + .filters( + or( + not(selector("dim1", null, new RegexDimExtractionFn("^1", 0, true, null))), + not(selector("v0", null, null)) + ) + ) + .aggregators(new CountAggregatorFactory("a0")) + .context(TIMESERIES_CONTEXT_DEFAULT) + .build() +), +ImmutableList.of( +new Object[]{3L} +) +); + } + + @Test + public void testRegexpLikeFilter() throws Exception + { +// Cannot vectorize due to usage of regex filter. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')", Review comment: The docs say that "The pattern must match starting at the beginning of `expr`" but it looks like the regex pattern you are passing in is asking that it start at the beginning of the string via `^` in the pattern string. Can I use a `$` in my regex to ask that it matches the end of the expr? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [druid] suneet-s commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
suneet-s commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428808032 ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception ); } + @Test + public void testRegexpExtractFilterViaNotNullCheck() throws Exception + { +// Cannot vectorize due to extractionFn in dimension spec. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || dim1, '^Z2') IS NOT NULL", +ImmutableList.of( +Druids.newTimeseriesQueryBuilder() + .dataSource(CalciteTests.DATASOURCE1) + .intervals(querySegmentSpec(Filtration.eternity())) + .granularity(Granularities.ALL) + .virtualColumns( + expressionVirtualColumn("v0", "regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING) + ) + .filters( + or( + not(selector("dim1", null, new RegexDimExtractionFn("^1", 0, true, null))), + not(selector("v0", null, null)) + ) + ) + .aggregators(new CountAggregatorFactory("a0")) + .context(TIMESERIES_CONTEXT_DEFAULT) + .build() +), +ImmutableList.of( +new Object[]{3L} +) +); + } + + @Test + public void testRegexpLikeFilter() throws Exception + { +// Cannot vectorize due to usage of regex filter. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')", Review comment: Do we have tests that check how the function performs against * a multi-value column * a numeric column * matching against null ## File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java ## @@ -7300,6 +7301,74 @@ public void testRegexpExtract() throws Exception ); } + @Test + public void testRegexpExtractFilterViaNotNullCheck() throws Exception + { +// Cannot vectorize due to extractionFn in dimension spec. 
+cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_EXTRACT(dim1, '^1') IS NOT NULL OR REGEXP_EXTRACT('Z' || dim1, '^Z2') IS NOT NULL", +ImmutableList.of( +Druids.newTimeseriesQueryBuilder() + .dataSource(CalciteTests.DATASOURCE1) + .intervals(querySegmentSpec(Filtration.eternity())) + .granularity(Granularities.ALL) + .virtualColumns( + expressionVirtualColumn("v0", "regexp_extract(concat('Z',\"dim1\"),'^Z2')", ValueType.STRING) + ) + .filters( + or( + not(selector("dim1", null, new RegexDimExtractionFn("^1", 0, true, null))), + not(selector("v0", null, null)) + ) + ) + .aggregators(new CountAggregatorFactory("a0")) + .context(TIMESERIES_CONTEXT_DEFAULT) + .build() +), +ImmutableList.of( +new Object[]{3L} +) +); + } + + @Test + public void testRegexpLikeFilter() throws Exception + { +// Cannot vectorize due to usage of regex filter. +cannotVectorize(); + +testQuery( +"SELECT COUNT(*)\n" ++ "FROM foo\n" ++ "WHERE REGEXP_LIKE(dim1, '^1') OR REGEXP_LIKE('Z' || dim1, '^Z2')", Review comment: The docs say that "The pattern must match starting at the beginning of `expr`" but it looks like the regex pattern you are passing in is asking that it start at the beginning of the string. Can I use a `$` in my regex to ask that it matches the end of the expr? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] suneet-s commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
suneet-s commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428792725 ## File path: processing/src/main/java/org/apache/druid/query/expression/RegexpLikeExprMacro.java ## @@ -0,0 +1,96 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package org.apache.druid.query.expression; + +import org.apache.druid.common.config.NullHandling; +import org.apache.druid.java.util.common.IAE; +import org.apache.druid.java.util.common.StringUtils; +import org.apache.druid.math.expr.Expr; +import org.apache.druid.math.expr.ExprEval; +import org.apache.druid.math.expr.ExprMacroTable; +import org.apache.druid.math.expr.ExprType; + +import javax.annotation.Nonnull; +import java.util.List; +import java.util.regex.Matcher; +import java.util.regex.Pattern; + +public class RegexpLikeExprMacro implements ExprMacroTable.ExprMacro +{ + private static final String FN_NAME = "regexp_like"; + + @Override + public String name() + { +return FN_NAME; + } + + @Override + public Expr apply(final List args) + { +if (args.size() < 2 || args.size() > 3) { Review comment: what is the 3rd argument for? 
I only see the first 2 being used in this expr.
```suggestion
if (args.size() != 2) {
```
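For context on the two-argument shape being suggested, here is a simplified model of what a `regexp_like(expr, pattern)` evaluation boils down to. This is a sketch, not the actual macro (the real one also handles Druid's null-handling modes and compiles the literal pattern once, up front):

```java
import java.util.regex.Pattern;

public class RegexpLikeSketch {
  // Simplified two-argument regexp_like: report whether the pattern matches
  // anywhere in the input. A null input never matches.
  static boolean regexpLike(String expr, String pattern) {
    if (expr == null) {
      return false;
    }
    return Pattern.compile(pattern).matcher(expr).find();
  }

  public static void main(String[] args) {
    System.out.println(regexpLike("1abc", "^1"));  // true
    System.out.println(regexpLike("x1abc", "^1")); // false
    System.out.println(regexpLike(null, "^1"));    // false
  }
}
```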
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428782343 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final SqlTypeFamily... operandTypes) return this; } -public OperatorBuilder operandTypeInference(final SqlOperandTypeInference operandTypeInference) +public OperatorBuilder requiredOperands(final int requiredOperands) { - this.operandTypeInference = operandTypeInference; + this.requiredOperands = requiredOperands; return this; } -public OperatorBuilder requiredOperands(final int requiredOperands) +public OperatorBuilder literalOperands(final int... literalOperands) Review comment: By the way, I also renamed `returnType` to `returnTypeNonNull` to make it clearer. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428782087 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -430,36 +434,64 @@ public void inferOperandTypes( /** * Operand type checker that is used in 'simple' situations: there are a particular number of operands, with - * particular types, some of which may be optional or nullable. + * particular types, some of which may be optional or nullable, and some of which may be required to be literals. */ private static class DefaultOperandTypeChecker implements SqlOperandTypeChecker { private final List operandTypes; private final int requiredOperands; private final IntSet nullableOperands; +private final IntSet literalOperands; DefaultOperandTypeChecker( final List operandTypes, final int requiredOperands, -final IntSet nullableOperands +final IntSet nullableOperands, +@Nullable final int[] literalOperands ) { Preconditions.checkArgument(requiredOperands <= operandTypes.size() && requiredOperands >= 0); this.operandTypes = Preconditions.checkNotNull(operandTypes, "operandTypes"); this.requiredOperands = requiredOperands; this.nullableOperands = Preconditions.checkNotNull(nullableOperands, "nullableOperands"); + + if (literalOperands == null) { +this.literalOperands = IntSets.EMPTY_SET; + } else { +this.literalOperands = new IntArraySet(); +Arrays.stream(literalOperands).forEach(this.literalOperands::add); + } } @Override public boolean checkOperandTypes(SqlCallBinding callBinding, boolean throwOnFailure) { - if (operandTypes.size() != callBinding.getOperandCount()) { -// Just like FamilyOperandTypeChecker: assume this is an inapplicable sub-rule of a composite rule; don't throw -return false; - } - for (int i = 0; i < callBinding.operands().size(); i++) { final SqlNode operand = callBinding.operands().get(i); + +if (literalOperands.contains(i)) { + // Verify that 'operand' is a literal. 
+ if (!SqlUtil.isLiteral(operand)) { +return throwOrReturn( +throwOnFailure, +callBinding, +cb -> cb.getValidator() +.newValidationError( +operand, + Static.RESOURCE.argumentMustBeLiteral(callBinding.getOperator().getName()) +) +); + } + + if (!nullableOperands.contains(i) && SqlUtil.isNullLiteral(operand, true)) { Review comment: Sure, I think this makes sense. I changed it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
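The per-operand checks in the diff above can be modeled in isolation. A toy stand-in (the real code inspects Calcite `SqlNode`s via `SqlUtil.isLiteral` / `SqlUtil.isNullLiteral`; the `Operand` class here is purely illustrative):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class OperandCheckSketch {
  // Toy stand-in for a SQL operand: whether it is a literal, and whether it
  // is the NULL literal.
  static class Operand {
    final boolean literal;
    final boolean nullLiteral;
    Operand(boolean literal, boolean nullLiteral) {
      this.literal = literal;
      this.nullLiteral = nullLiteral;
    }
  }

  // Mirrors the per-operand rules in the diff: a position listed in
  // literalOperands must hold a literal, and unless that position is also in
  // nullableOperands, the literal must not be NULL.
  static boolean check(Operand[] operands, Set<Integer> literalOperands, Set<Integer> nullableOperands) {
    for (int i = 0; i < operands.length; i++) {
      if (literalOperands.contains(i)) {
        if (!operands[i].literal) {
          return false; // e.g. a column reference where a literal is required
        }
        if (!nullableOperands.contains(i) && operands[i].nullLiteral) {
          return false; // NULL literal in a non-nullable literal position
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Set<Integer> literals = new HashSet<>(Arrays.asList(1));
    Set<Integer> nullable = new HashSet<>();
    // REGEXP_LIKE(dim1, '^1'): operand 1 (the pattern) is a non-null literal -> accepted
    System.out.println(check(
        new Operand[]{new Operand(false, false), new Operand(true, false)}, literals, nullable));
    // REGEXP_LIKE(dim1, dim2): operand 1 is a column, not a literal -> rejected
    System.out.println(check(
        new Operand[]{new Operand(false, false), new Operand(false, false)}, literals, nullable));
  }
}
```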
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428780412 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final SqlTypeFamily... operandTypes) return this; } -public OperatorBuilder operandTypeInference(final SqlOperandTypeInference operandTypeInference) +public OperatorBuilder requiredOperands(final int requiredOperands) { - this.operandTypeInference = operandTypeInference; + this.requiredOperands = requiredOperands; return this; } -public OperatorBuilder requiredOperands(final int requiredOperands) +public OperatorBuilder literalOperands(final int... literalOperands) Review comment: I added javadocs to all these methods. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on pull request #9905: Fix compact partially overlapping segments
jihoonson commented on pull request #9905: URL: https://github.com/apache/druid/pull/9905#issuecomment-632190086 @yuanlihan thanks for the comment. I think I now understand your use case. > As far as I know Druid will correctly handle this in `DruidInputSource`. Or do I misunderstand something? Oh, now I see what you are doing here. However, I believe this issue is already handled properly by the timeline. I left a comment on https://github.com/apache/druid/issues/9904.
[GitHub] [druid] jihoonson commented on issue #9904: Fail to compact overlapping segments
jihoonson commented on issue #9904: URL: https://github.com/apache/druid/issues/9904#issuecomment-632189936 Hi @yuanlihan, I believe the timeline already handles overlapping segments properly. Looking at the logs above, the issue appears to be this:
```
org.apache.druid.java.util.common.UOE: Cannot add overlapping segments [2020-05-15T05:00:00.000Z/2020-05-15T06:00:00.000Z and 2020-05-15T00:00:00.000Z/2020-05-16T00:00:00.000Z] with the same version [2020-05-17T12:14:26.722Z]
```
This log means you have two segments with overlapping intervals and the _same_ version. That is never allowed, since Druid wouldn't know which segment to read. Druid should never create segments with overlapping intervals but the same version. Can you check why and how they were created? You may need to check which task created that segment in the task logs and overlord logs.
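The invariant behind that error can be sketched independently of Druid's actual timeline implementation: within a single version, segment intervals must not overlap. A minimal illustration (class and method names are hypothetical, not Druid code):

```java
import java.util.ArrayList;
import java.util.List;

public class TimelineInvariant {
  static class Segment {
    final long start;
    final long end; // interval is [start, end), epoch millis
    final String version;
    Segment(long start, long end, String version) {
      this.start = start;
      this.end = end;
      this.version = version;
    }
  }

  // Throws on the condition from the log: overlapping intervals, same version.
  // Overlapping intervals with *different* versions are fine; the newer
  // version simply shadows the older one at query time.
  static void add(List<Segment> timeline, Segment s) {
    for (Segment existing : timeline) {
      boolean overlaps = s.start < existing.end && existing.start < s.end;
      if (overlaps && existing.version.equals(s.version)) {
        throw new IllegalStateException("Cannot add overlapping segments with the same version");
      }
    }
    timeline.add(s);
  }

  public static void main(String[] args) {
    List<Segment> t = new ArrayList<>();
    add(t, new Segment(0, 24, "v1"));
    add(t, new Segment(5, 6, "v2")); // overlaps, but different version: allowed
    System.out.println(t.size());    // 2
  }
}
```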
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428758597 ## File path: docs/querying/sql.md ## @@ -322,17 +322,18 @@ String functions accept strings, and return a type appropriate to the function. |`LOWER(expr)`|Returns expr in all lowercase.| |`PARSE_LONG(string[, radix])`|Parses a string into a long (BIGINT) with the given radix, or 10 (decimal) if a radix is not provided.| |`POSITION(needle IN haystack [FROM fromIndex])`|Returns the index of needle within haystack, with indexes starting from 1. The search will begin at fromIndex, or 1 if fromIndex is not specified. If the needle is not found, returns 0.| -|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression pattern and extract a capture group, or null if there is no match. If index is unspecified or zero, returns the substring that matched the pattern.| +|`REGEXP_EXTRACT(expr, pattern, [index])`|Apply regular expression `pattern` to `expr` and extract a capture group, or `NULL` if there is no match. If index is unspecified or zero, returns the substring that matched the pattern. The pattern must match starting at the beginning of `expr`, but does not need to match the entire string. Note: when `druid.generic.useDefaultValueForNull = true`, it is not possible to differentiate an empty-string match from a non-match (both will return `NULL`).| +|`REGEXP_LIKE(expr, pattern)`|Returns whether `expr` matches regular expression `pattern`. The pattern must match starting at the beginning of `expr`, but does not need to match the entire string. Especially useful in WHERE clauses.| Review comment: Added both notes. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
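The docs caveat about `druid.generic.useDefaultValueForNull = true` can be seen with plain `java.util.regex`: a capture group can legitimately match the empty string, and default-value mode coerces `""` to the same value as "no match". A standalone sketch (class and method names are illustrative, not Druid code):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyStringMatch {
  // regexp-extract sketch: return capture group `index`, or null on no match.
  static String extract(String expr, String pattern, int index) {
    Matcher m = Pattern.compile(pattern).matcher(expr);
    return m.find() ? m.group(index) : null;
  }

  public static void main(String[] args) {
    // Group 1 matches the empty string: the result is "" (a successful match).
    System.out.println("[" + extract("1", "^1(\\d*)", 1) + "]"); // []
    // No match at all: the result is null.
    System.out.println(extract("x", "^1(\\d*)", 1));             // null
    // Under druid.generic.useDefaultValueForNull=true, Druid coerces "" to
    // null, so these two outcomes become indistinguishable -- the caveat
    // added to the REGEXP_EXTRACT docs.
  }
}
```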
[GitHub] [druid] gianm commented on a change in pull request #9893: Add REGEXP_LIKE, fix bugs in REGEXP_EXTRACT.
gianm commented on a change in pull request #9893: URL: https://github.com/apache/druid/pull/9893#discussion_r428757383 ## File path: sql/src/main/java/org/apache/druid/sql/calcite/expression/OperatorConversions.java ## @@ -291,15 +294,15 @@ public OperatorBuilder operandTypes(final SqlTypeFamily... operandTypes) return this; } -public OperatorBuilder operandTypeInference(final SqlOperandTypeInference operandTypeInference) Review comment: Ah, yeah, I removed it because it wasn't being used; I suppose we could add it back if needed in the future.
[GitHub] [druid] vvararu opened a new issue #9908: GroupBy returns multiple records for the same group after intermediate flush to disk when querying real-time with a multi-value dimension involved
vvararu opened a new issue #9908: URL: https://github.com/apache/druid/issues/9908 ### Affected Version Druid 0.17.0 ### Description Dependencies: - Kafka 2.12_1.0.1. Topic with 1 partition, no replication. Steps to reproduce: - Start a Kafka supervisor for realtime ingestion with the following spec: ``` { "type": "kafka", "dataSchema": { "dataSource": "test", "parser": { "type": "string", "parseSpec": { "format": "json", "timestampSpec": { "column": "timestamp", "format": "auto" }, "dimensionsSpec": { "dimensions": [], "dimensionExclusions": [ "test" ], "spatialDimensions": [] } } }, "metricsSpec": [ { "type": "count", "name": "events" } ], "granularitySpec": { "type": "uniform", "segmentGranularity": "hour", "queryGranularity": "hour" } }, "tuningConfig": { "type": "kafka", "maxRowsInMemory": 4, "maxRowsPerSegment": 500, "intermediatePersistPeriod": "PT1M", "maxPendingPersists": 0, "handoffConditionTimeout": 0, "resetOffsetAutomatically": true, "chatRetries": 20, "httpTimeout": "PT60S", "shutdownTimeout": "PT180S" }, "ioConfig": { "topic": "test", "consumerProperties": { "bootstrap.servers": "bootstrap-server:9092", "group.id": "test", "auto.offset.reset": "latest", "max.partition.fetch.bytes": 400 }, "replicas": "1", "taskCount": "1", "taskDuration": "PT3600S", "startDelay": "PT5S", "period": "PT30S", "useEarliestOffset": false, "completionTimeout": "PT30M" } } ``` - Push next messages to Kafka for real-time ingestion. Sending in bunches of 1000 (or 10.000... both work) message of each with small pause between bunches. I've put a timestamp from future, but any timestamp should make it. 
``` {"multiValueDimension":["1"],"zoneId":"1","timestamp":2524608061000} {"zoneId":"1","timestamp":2524608061000} // observe the lack of the multiValueDimension {"zoneId":"2","timestamp":2524608061000} // observe the lack of the multiValueDimension ``` - Do the following groupBy query ``` { "queryType": "groupBy", "dataSource": "test", "intervals": "2050-01-01T00:00:00.000Z/2050-01-01T01:00:00.000Z", "dimensions": ["multiValueDimension","zoneId"], "aggregations": [ { "type": "longSum", "name": "events", "fieldName": "events" } ], "limit" : 2147483647, "granularity": "all" } ``` - At the beginning the results are correct... ``` [ { "version": "v1", "timestamp": "2050-01-01T00:00:00.000Z", "event": { "zoneId": "1", "events": 5000 } }, { "version": "v1", "timestamp": "2050-01-01T00:00:00.000Z", "event": { "zoneId": "2", "events": 4990 } }, { "version": "v1", "timestamp": "2050-01-01T00:00:00.000Z", "event": { "multiValueDimension": "1", "zoneId": "1", "events": 16000 } } ] ``` - ... but after the next intermediate persistence flush to disk... ``` 2020-05-21T13:58:23,854 INFO [[index_kafka_test_f4d3b0f570f9224_jhlpfjga]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Flushed in-memory data for segment[test_2050-01-01T00:00:00.000Z_2050-01-01T01:00:00.000Z_2020-05-21T13:39:14.308Z] spill[2] to disk in [7] ms (3 rows). 
2020-05-21T13:58:23,857 INFO [[index_kafka_test_f4d3b0f570f9224_jhlpfjga]-appenderator-persist] org.apache.druid.segment.realtime.appenderator.AppenderatorImpl - Flushed in-memory data with commit metadata [AppenderatorDriverMetadata{segments={index_kafka_test_f4d3b0f570f9224_0=[SegmentWithState{segmentIdentifier=test_2050-01-01T00:00:00.000Z_2050-01-01T01:00:00.000Z_2020-05-21T13:39:14.308Z, state=APPENDING}]}, lastSegmentIds={index_kafka_test_f4d3b0f570f9224_0=test_2050-01-01T00:00:00.000Z_2050-01-01T01:00:00.000Z_2020-05-21T13:39:14.308Z}, callerMetadata={nextPartitions=SeekableStreamEndSequenceNumbers{stream='test', partitionSequenceNumberMap={0=58000] for segments: test_2050-01-01T00:00:00.000Z_2050-01-01T01:00:00.000Z_2020-05-21T13:39:14.308Z ``` - ... same groupBy starts to return multiple records for the same groups... ``` [ { "version": "v1", "timestamp": "2050-01-01T00:00:00.000Z", "event": { "zoneId": "1", "events": 7000
[GitHub] [druid] liujianhuanzz opened a new pull request #9907: fix docs error in hadoop-based part
liujianhuanzz opened a new pull request #9907: URL: https://github.com/apache/druid/pull/9907 ### Description There are several errors in the Hadoop-based ingestion document:
* missing type for `indexSpecForIntermediatePersists` in the tuningConfig
* table formatting errors for `logParseExceptions` and `maxParseExceptions` in the tuningConfig
[GitHub] [druid] yuanlihan commented on issue #9889: Add new configuration for kill task to skip segments that has been last modified within the config value time
yuanlihan commented on issue #9889: URL: https://github.com/apache/druid/issues/9889#issuecomment-631935055 @maytasm thanks for the explanation. It makes sense to me. This configuration will help a lot to reduce the impact of operator mistakes.
[GitHub] [druid] maytasm commented on issue #9889: Add new configuration for kill task to skip segments that has been last modified within the config value time
maytasm commented on issue #9889: URL: https://github.com/apache/druid/issues/9889#issuecomment-631919882 @yuanlihan I am thinking that the segment will have to satisfy both conditions to be eligible for the kill task. The new `druid.coordinator.kill.fromLastModified` offers additional safety to prevent accidental killing of segments. If the user does not want `druid.coordinator.kill.fromLastModified` and wants the existing behavior, they can simply set `druid.coordinator.kill.fromLastModified` to 0. Basically, this ignores `druid.coordinator.kill.fromLastModified` and only requires the segment to satisfy `druid.coordinator.kill.durationToRetain`. The benefit of `druid.coordinator.kill.fromLastModified` is for old segments that might have been dropped accidentally by incorrectly configured rules (usually rules are set for intervals of many months). For example, you might have a dropForever rule before a load rule (instead of the other way around). Those segments may be older than `druid.coordinator.kill.durationToRetain` but will not be killed, thanks to the `druid.coordinator.kill.fromLastModified` safety net.
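The two-condition eligibility rule described in this thread can be sketched as a simple predicate. This is a hypothetical model of the proposal only: the config names come from the issue, and the exact comparison points (segment interval end for `durationToRetain`, metadata last-modified time for `fromLastModified`) are assumptions, not shipped behavior:

```java
import java.time.Duration;
import java.time.Instant;

public class KillEligibilitySketch {
  // A segment is eligible for the kill task only if BOTH conditions hold:
  // it is older than durationToRetain, AND it has not been modified within
  // the fromLastModified window (the proposed safety net).
  static boolean eligibleForKill(
      Instant now,
      Instant segmentIntervalEnd, // assumed comparison point for durationToRetain
      Instant lastModified,       // assumed comparison point for fromLastModified
      Duration durationToRetain,
      Duration fromLastModified
  ) {
    boolean oldEnough = segmentIntervalEnd.isBefore(now.minus(durationToRetain));
    // fromLastModified = 0 disables the safety net, restoring existing behavior.
    boolean notRecentlyTouched = lastModified.isBefore(now.minus(fromLastModified));
    return oldEnough && notRecentlyTouched;
  }

  public static void main(String[] args) {
    Instant now = Instant.parse("2020-05-21T00:00:00Z");
    // Old segment dropped yesterday by a misconfigured rule: protected.
    System.out.println(eligibleForKill(
        now, now.minus(Duration.ofDays(400)), now.minus(Duration.ofDays(1)),
        Duration.ofDays(90), Duration.ofDays(7))); // false
  }
}
```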
[GitHub] [druid] xvrl commented on pull request #9902: update kafka client version to 2.5.0
xvrl commented on pull request #9902: URL: https://github.com/apache/druid/pull/9902#issuecomment-631902458 @clintropolis yep, I realized the rabbit hole was a little deeper. I think I fixed it now