[GitHub] [druid] maytasm opened a new pull request #10740: Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

2021-01-08 Thread GitBox


maytasm opened a new pull request #10740:
URL: https://github.com/apache/druid/pull/10740


   Fix byte calculation for maxBytesInMemory to take Sink/Hydrant object 
overhead into account
   
   ### Description
   
   Fix byte calculation for maxBytesInMemory to take Sink/Hydrant object 
overhead into account.
   Currently, one sink with 1,000 rows and 1,000 sinks with one row each (the 
same rows in total) produce the same byte calculation, which is incorrect. 
This can cause an OOM during ingestion when there are many sinks (for example, 
a big time range and/or a small segmentGranularity).
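   The idea of the fix can be sketched as follows. This is an illustrative 
model only, not the actual Druid patch: the class name and the overhead 
constants are hypothetical, chosen just to show how charging a fixed 
per-object cost separates the one-sink and many-sink cases.

```java
// Hypothetical sketch of per-object overhead accounting; the class name and
// overhead constants are illustrative, not taken from the actual Druid patch.
public class MemoryEstimator {
    static final long SINK_OVERHEAD_BYTES = 5_000;    // assumed fixed cost per Sink object
    static final long HYDRANT_OVERHEAD_BYTES = 1_000; // assumed fixed cost per Hydrant object

    // Bytes charged for one sink holding rowBytes of row data in hydrantCount hydrants.
    static long estimateSinkBytes(long rowBytes, int hydrantCount) {
        return rowBytes + SINK_OVERHEAD_BYTES + (long) hydrantCount * HYDRANT_OVERHEAD_BYTES;
    }

    public static void main(String[] args) {
        long bytesPerRow = 100;
        // One sink with 1000 rows vs. 1000 sinks with 1 row each:
        long oneSink = estimateSinkBytes(1000 * bytesPerRow, 1);
        long manySinks = 1000 * estimateSinkBytes(bytesPerRow, 1);
        // Counting only row bytes, both cases would be 100,000 bytes; with
        // object overhead included, the many-sink case is charged far more.
        System.out.println(oneSink + " vs " + manySinks); // 106000 vs 6100000
    }
}
```

   With overhead counted per Sink/Hydrant, many small sinks now push the 
estimate toward maxBytesInMemory much sooner, triggering persists before an 
OOM.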
   
   This PR has:
   - [ ] been self-reviewed.
   - [ ] used the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) 
(Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[druid] branch master updated (2837a9b -> 118b501)

2021-01-08 Thread xvrl
This is an automated email from the ASF dual-hosted git repository.

xvrl pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 2837a9b  [Minor Doc Fix] Correct the default value of 
`druid.server.http.gracefulShutdownTimeout` (#10661)
 add 118b501  Introduce KafkaRecordEntity to support Kafka headers in 
InputFormats (#10730)

No new revisions were added by this update.

Summary of changes:
 .../org/apache/druid/data/input/InputEntity.java   |   3 +
 .../druid/data/input/kafka/KafkaRecordEntity.java  |  53 +
 .../IncrementalPublishingKafkaIndexTaskRunner.java |  13 +-
 .../druid/indexing/kafka/KafkaIndexTask.java   |   5 +-
 .../druid/indexing/kafka/KafkaRecordSupplier.java  |  10 +-
 .../indexing/kafka/supervisor/KafkaSupervisor.java |   9 +-
 .../druid/indexing/kafka/KafkaIndexTaskTest.java   | 256 ++---
 .../indexing/kafka/KafkaRecordSupplierTest.java|  26 ++-
 .../kafka/supervisor/KafkaSupervisorTest.java  |   3 +-
 .../druid/indexing/kinesis/KinesisIndexTask.java   |   5 +-
 .../indexing/kinesis/KinesisIndexTaskRunner.java   |   9 +-
 .../indexing/kinesis/KinesisRecordSupplier.java|  61 +++--
 .../kinesis/supervisor/KinesisSupervisor.java  |   9 +-
 .../indexing/kinesis/KinesisIndexTaskTest.java | 106 +
 .../kinesis/KinesisRecordSupplierTest.java |  38 +--
 .../indexing/kinesis/KinesisSamplerSpecTest.java   |  13 +-
 .../kinesis/supervisor/KinesisSupervisorTest.java  |   3 +-
 .../seekablestream/RecordSupplierInputSource.java  |  16 +-
 .../seekablestream/SeekableStreamIndexTask.java|  11 +-
 .../SeekableStreamIndexTaskRunner.java |  28 +--
 .../seekablestream/SeekableStreamSamplerSpec.java  |   6 +-
 .../indexing/seekablestream/SequenceMetadata.java  |  12 +-
 .../indexing/seekablestream/StreamChunkParser.java |  14 +-
 .../common/OrderedPartitionableRecord.java |  25 +-
 .../seekablestream/common/RecordSupplier.java  |   5 +-
 .../supervisor/SeekableStreamSupervisor.java   |  21 +-
 .../overlord/sampler/InputSourceSamplerTest.java   |  35 +--
 .../RecordSupplierInputSourceTest.java |  11 +-
 .../SeekableStreamIndexTaskTestBase.java   |  48 +++-
 .../seekablestream/StreamChunkParserTest.java  |  13 +-
 .../SeekableStreamSupervisorStateTest.java |  17 +-
 31 files changed, 593 insertions(+), 291 deletions(-)
 create mode 100644 
extensions-core/kafka-indexing-service/src/main/java/org/apache/druid/data/input/kafka/KafkaRecordEntity.java





[GitHub] [druid] xvrl merged pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats

2021-01-08 Thread GitBox


xvrl merged pull request #10730:
URL: https://github.com/apache/druid/pull/10730


   






[GitHub] [druid] xvrl commented on pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats

2021-01-08 Thread GitBox


xvrl commented on pull request #10730:
URL: https://github.com/apache/druid/pull/10730#issuecomment-757055227


   OK, I will merge it as is then. Adding specific tests for the trace logging 
seems a little overkill; we can add them later if we feel strongly about it.






[druid] branch master updated (d519264 -> 2837a9b)

2021-01-08 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from d519264  remove extra comma (#10670)
 add 2837a9b  [Minor Doc Fix] Correct the default value of 
`druid.server.http.gracefulShutdownTimeout` (#10661)

No new revisions were added by this update.

Summary of changes:
 docs/configuration/index.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)





[GitHub] [druid] jihoonson merged pull request #10661: [Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout`

2021-01-08 Thread GitBox


jihoonson merged pull request #10661:
URL: https://github.com/apache/druid/pull/10661


   






[GitHub] [druid] jihoonson commented on pull request #10661: [Minor Doc Fix] Correct the default value of `druid.server.http.gracefulShutdownTimeout`

2021-01-08 Thread GitBox


jihoonson commented on pull request #10661:
URL: https://github.com/apache/druid/pull/10661#issuecomment-757045393


   > I guess if we indeed do want to support shutdown without waiting, we might 
need another config property to toggle 
[setStopAtShutdown](https://www.eclipse.org/jetty/javadoc/current/org/eclipse/jetty/server/Server.html#setStopAtShutdown(boolean)).
 But I doubt the need for such a config as 
`druid.server.http.gracefulShutdownTimeout` satisfies most of the usecases.
   
   I agree. The current config seems enough.






[druid] branch master updated (3624acb -> d519264)

2021-01-08 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 3624acb  fix web-console show json bug (#10710)
 add d519264  remove extra comma (#10670)

No new revisions were added by this update.

Summary of changes:
 integration-tests/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)





[GitHub] [druid] jihoonson merged pull request #10670: [Doc-Fix] Remove extra comma in `druid/integration-tests/README.md`

2021-01-08 Thread GitBox


jihoonson merged pull request #10670:
URL: https://github.com/apache/druid/pull/10670


   






[GitHub] [druid] jihoonson commented on pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-08 Thread GitBox


jihoonson commented on pull request #10677:
URL: https://github.com/apache/druid/pull/10677#issuecomment-757043116


   @nishantmonu51 BTW, since we are about to cut a branch for 0.21.0 release 
(scheduled on the next Monday), do you think we can get this PR merged before 
the branch cut? I'm asking because this change will introduce an incompatible 
change after the release. 






[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-08 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r554239833



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##
@@ -153,6 +154,9 @@
   @JsonIgnore
   private final RetryPolicyFactory retryPolicyFactory;
 
+  @JsonIgnore
+  private final InputSourceSecurityConfig securityConfig;

Review comment:
   Why not `InputSourceSecurityConfig.ALLOW_ALL`? The system-wide 
securityConfig is ignored in DruidInputSource anyway.

##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
 throw new IAE("Cannot use firehose and inputFormat together. Try using 
inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   I guess you wanted to validate inputSource in the constructor to fail 
early. However, this seems likely to cause issues because it will throw an 
exception on validation failure, which in turn makes creating an ioConfig 
instance fail. For example, suppose you had an ingestion spec stored in the 
metadata store. Then, for some reason, you wanted to update the security 
config to forbid some URIs that appear in that stored ingestion spec. The 
update would break reading ingestion specs from the metadata store, which 
happens in the Overlord. It would also unnecessarily perform the same check 
multiple times during ingestion. I think this should be called somewhere in 
the code path of ingestion, such as `Task.isReady()`.
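   The suggested alternative can be sketched as follows. All names here 
(`ValidationSketch`, `allowed`, the `Task` stand-in) are hypothetical 
illustrations, not Druid's actual classes; the point is only that the 
constructor stays side-effect free and validation happens on the ingestion 
code path.

```java
import java.util.List;

// Hypothetical sketch: defer allow/deny-prefix validation from the ioConfig
// constructor to the task's readiness check, so specs already stored in the
// metadata store can still be deserialized after the security config changes.
public class ValidationSketch {
    static boolean allowed(String uri, List<String> allowedPrefixes) {
        return allowedPrefixes.stream().anyMatch(uri::startsWith);
    }

    static class Task {
        final String inputUri;              // stored at construction, never validated here
        final List<String> allowedPrefixes; // current security config

        Task(String inputUri, List<String> allowedPrefixes) {
            this.inputUri = inputUri;       // constructor stays side-effect free
            this.allowedPrefixes = allowedPrefixes;
        }

        // Validation runs once, on the ingestion code path (cf. Task.isReady()).
        boolean isReady() {
            if (!allowed(inputUri, allowedPrefixes)) {
                throw new IllegalArgumentException("URI not allowed: " + inputUri);
            }
            return true;
        }
    }

    public static void main(String[] args) {
        // Deserializing/constructing never throws, even for a forbidden URI...
        Task task = new Task("hdfs://secret/data", List.of("s3://public/"));
        try {
            task.isReady(); // ...only running the task does.
        } catch (IllegalArgumentException e) {
            System.out.println("rejected at run time: " + e.getMessage());
        }
    }
}
```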

##
File path: 
extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java
##
@@ -214,10 +215,23 @@ public boolean needsFormat()
   private void cachePathsIfNeeded() throws IOException
   {
 if (cachedPaths == null) {
-  cachedPaths = 
ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths"));
+  Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths");
+  cachedPaths = ImmutableList.copyOf(paths);
 }
   }
 
+  @Override
+  public void validateAllowDenyPrefixList(InputSourceSecurityConfig 
securityConfig)
+  {
+try {
+  cachePathsIfNeeded();

Review comment:
   I think you won't probably need to cache paths since this will 
unnecessarily populate the cache in the Overlord.

##
File path: 
core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java
##
@@ -31,13 +31,16 @@
  */
 public abstract class AbstractInputSource implements InputSource
 {
+  private transient boolean validated = false;

Review comment:
   Why `transient`? We don't use `Serializable`.








[GitHub] [druid] zhangyue19921010 commented on pull request #10688: Historical reloads damaged segments automatically when lazy on start.

2021-01-08 Thread GitBox


zhangyue19921010 commented on pull request #10688:
URL: https://github.com/apache/druid/pull/10688#issuecomment-757039352


   Close and reopen to run CI again.






[GitHub] [druid] zhangyue19921010 closed pull request #10688: Historical reloads damaged segments automatically when lazy on start.

2021-01-08 Thread GitBox


zhangyue19921010 closed pull request #10688:
URL: https://github.com/apache/druid/pull/10688


   






[druid] branch master updated (c62b7c1 -> 3624acb)

2021-01-08 Thread vogievetsky
This is an automated email from the ASF dual-hosted git repository.

vogievetsky pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from c62b7c1  javascript filter result convert to java boolean (#10721)
 add 3624acb  fix web-console show json bug (#10710)

No new revisions were added by this update.

Summary of changes:
 .../src/components/json-collapse/json-collapse.spec.tsx   |  3 ++-
 web-console/src/components/json-collapse/json-collapse.tsx|  3 ++-
 web-console/src/components/json-input/json-input.tsx  |  5 +++--
 web-console/src/components/show-history/show-history.tsx  |  3 ++-
 web-console/src/components/show-json/show-json.tsx|  3 ++-
 web-console/src/components/show-log/show-log.tsx  |  3 ++-
 web-console/src/components/table-cell/table-cell.tsx  |  3 ++-
 .../src/dialogs/edit-context-dialog/edit-context-dialog.tsx   |  3 ++-
 .../src/dialogs/history-dialog/history-dialog.spec.tsx|  5 +++--
 .../src/dialogs/query-plan-dialog/query-plan-dialog.tsx   | 11 ---
 web-console/src/dialogs/snitch-dialog/snitch-dialog.spec.tsx  |  5 +++--
 web-console/src/dialogs/spec-dialog/spec-dialog.tsx   |  5 -
 web-console/src/utils/local-storage-backed-array.tsx  |  4 +++-
 web-console/src/utils/local-storage-keys.tsx  |  4 +++-
 web-console/src/utils/object-change.spec.ts   |  4 +++-
 web-console/src/utils/query-history.ts|  4 +++-
 web-console/src/utils/sampler.ts  |  4 +++-
 web-console/src/views/load-data-view/load-data-view.tsx   |  9 +
 .../load-data-view/parse-data-table/parse-data-table.tsx  |  3 ++-
 .../src/views/query-view/query-output/query-output.tsx|  3 ++-
 web-console/src/views/query-view/query-view.tsx   |  5 +++--
 web-console/src/views/segments-view/segments-view.tsx |  5 +++--
 22 files changed, 65 insertions(+), 32 deletions(-)





[GitHub] [druid] vogievetsky merged pull request #10710: fix web-console show json bug

2021-01-08 Thread GitBox


vogievetsky merged pull request #10710:
URL: https://github.com/apache/druid/pull/10710


   






[GitHub] [druid] vogievetsky commented on pull request #10710: fix web-console show json bug

2021-01-08 Thread GitBox


vogievetsky commented on pull request #10710:
URL: https://github.com/apache/druid/pull/10710#issuecomment-757038204


   I think this could be tested by adding a snapshot test for each of the 
changes, to make sure the correct DOM is rendered even when there are bigints 
in the JSON. That would be a very big undertaking, though, which I think is 
out of scope for this immediate fix.






[GitHub] [druid] gianm commented on pull request #10635: Add SQL functions to format numbers into human readable format

2021-01-08 Thread GitBox


gianm commented on pull request #10635:
URL: https://github.com/apache/druid/pull/10635#issuecomment-757031980


   My 2¢ on the naming thing: I like having 3 separate functions, because I 
think a 1-function model works best if you have a good spec for format strings, 
which isn't what this patch is about. An example of a good spec is: 
https://www.postgresql.org/docs/current/functions-formatting.html#FUNCTIONS-FORMATTING-NUMERIC-TABLE
   
   If you just have a few different mutually exclusive options, like we do 
here, I think it's more SQL-y to have different functions. In the future, we 
might introduce a postgresql-style number formatting spec, but we don't have to 
do it today.
   
   A couple of things though:
   
   - IMO the name `decimal_format` is misleading. It sounds like a function 
that will format a number to a specific amount of decimals. Maybe instead we 
could call it `human_readable_decimal_format`. For consistency it may then be 
nice to call the others `human_readable_binary_byte_format` and 
`human_readable_decimal_byte_format`. The function names are long, but I think 
that's probably better than being misleading. Any thoughts @FrankChen021, 
@abhishekagarwal87, @clintropolis?
   - Please include examples in the documentation of what the formatted numbers 
will look like.
   
   I'm just writing this as a comment instead of a review, since I didn't read 
the code.






[druid] branch master updated (f66fdbf -> c62b7c1)

2021-01-08 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from f66fdbf  add offsetFetchPeriod to kinesis ingestion doc (#10734)
 add c62b7c1  javascript filter result convert to java boolean (#10721)

No new revisions were added by this update.

Summary of changes:
 .../druid/query/filter/JavaScriptDimFilter.java| 13 ++-
 .../query/filter/JavaScriptDimFilterTest.java  | 44 ++
 2 files changed, 55 insertions(+), 2 deletions(-)





[GitHub] [druid] jihoonson closed issue #10719: Javascript filter return true when the expression result is false

2021-01-08 Thread GitBox


jihoonson closed issue #10719:
URL: https://github.com/apache/druid/issues/10719


   






[GitHub] [druid] jihoonson merged pull request #10721: javascript filter result convert to java boolean

2021-01-08 Thread GitBox


jihoonson merged pull request #10721:
URL: https://github.com/apache/druid/pull/10721


   






[GitHub] [druid] gianm commented on a change in pull request #10635: Add SQL functions to format numbers into human readable format

2021-01-08 Thread GitBox


gianm commented on a change in pull request #10635:
URL: https://github.com/apache/druid/pull/10635#discussion_r554228178



##
File path: docs/querying/sql.md
##
@@ -563,6 +563,9 @@ The [DataSketches 
extension](../development/extensions-core/datasketches-extensi
 |`COALESCE(value1, value2, ...)`|Returns the first value that is neither NULL 
nor empty string.|
 |`NVL(expr,expr-for-null)`|Returns 'expr-for-null' if 'expr' is null (or empty 
string for string type).|
 |`BLOOM_FILTER_TEST(, )`|Returns true if the value is 
contained in a Base64-serialized bloom filter. See the [Bloom filter 
extension](../development/extensions-core/bloom-filter.html) documentation for 
additional details.|
+|`BINARY_BYTE_FORMAT(value, [precision])`|Returns the value in human-readable 
[IEC](https://en.wikipedia.org/wiki/Binary_prefix) format. Supported unit 
suffix: `B`, `KiB`, `MiB`, `GiB`, `TiB`, `PiB`, `EiB`. `precision` must be in 
the range of [0,3] (default: 2).|
+|`DECIMAL_BYTE_FORMAT(value, [precision])`|Returns the value in human-readable 
[SI](https://en.wikipedia.org/wiki/Binary_prefix) format. Supported unit 
suffix: `B`, `KB`, `MB`, `GB`, `TB`, `PB`, `EB`. `precision` must be in the 
range of [0,3] (default: 2).|
+|`DECIMAL_FORMAT(value, [precision])`|Returns the value in human-readable SI 
format. Supported unit suffix: `K`, `M`, `G`, `T`, `P`, `E`. `precision` must 
be in the range of [0,3] (default: 2).|

Review comment:
   I think they make sense in the numeric section, because they accept 
numbers. It's OK that they don't return numbers. It's analogous to 
`TIME_FORMAT`, which still belongs in the time section.
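   As a concrete illustration of the three conventions in the documentation 
table above, here is a minimal sketch. The class and method names are made up 
for illustration, and this is not the SQL functions' actual implementation; 
it only demonstrates the IEC-vs-SI distinction and the [0,3] precision clamp.

```java
import java.util.Locale;

// Sketch of the three documented formatting conventions; names are
// illustrative, not the real Druid SQL function implementations.
public class HumanReadableSketch {
    static final String[] IEC = {"B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"};
    static final String[] SI_BYTES = {"B", "KB", "MB", "GB", "TB", "PB", "EB"};
    static final String[] SI_PLAIN = {"", "K", "M", "G", "T", "P", "E"};

    static String format(double value, int precision, double base, String[] units) {
        int p = Math.max(0, Math.min(3, precision)); // precision must be in [0,3]
        int i = 0;
        while (Math.abs(value) >= base && i < units.length - 1) {
            value /= base; // step up to the next unit
            i++;
        }
        return String.format(Locale.US, "%." + p + "f%s", value, units[i]).trim();
    }

    static String binaryByteFormat(double v, int p)  { return format(v, p, 1024, IEC); }
    static String decimalByteFormat(double v, int p) { return format(v, p, 1000, SI_BYTES); }
    static String decimalFormat(double v, int p)     { return format(v, p, 1000, SI_PLAIN); }

    public static void main(String[] args) {
        System.out.println(binaryByteFormat(45678, 2));  // 44.61KiB
        System.out.println(decimalByteFormat(45678, 2)); // 45.68KB
        System.out.println(decimalFormat(45678, 2));     // 45.68K
    }
}
```

   The same input renders differently under the binary (base 1024) and 
decimal (base 1000) conventions, which is why keeping them as separate 
functions avoids ambiguity.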








[GitHub] [druid] capistrant opened a new pull request #10739: Druid Server Guilds and Guild Aware Replication

2021-01-08 Thread GitBox


capistrant opened a new pull request #10739:
URL: https://github.com/apache/druid/pull/10739


   Fixes #9816 
   
   ### Description
   
   ***Note***: I use the term "Guild" throughout this proposal. It is an 
arbitrary name that I chose for the Historical grouping construct that I am 
proposing. A Guild is a group of Historical servers. All servers that do not 
specify a Guild in runtime.properties are assigned to a default Guild.
   
   I am adding the idea of guilds to Druid. A guild is a logical grouping of 
servers. With this PR, the only use of guilds is replicant distribution across 
Historical servers. The idea was born out of the desire for HDFS-like 
rack-aware replication for bare-metal on-prem deployments. For this use case, 
Druid Historical services are assigned a guild based on the physical rack that 
they live on. The Coordinator uses SegmentReplicantLookup to build a lookup of 
segment:guild replicant counts. The Coordinator prefers loading replicants 
across 2 or more guilds. It is important to note that, rather than loading 
fewer replicants than specified by Druid load rules, the Coordinator will load 
replicants on the same guild if it must.
   
   This idea can go beyond physical racks in a data center and apply to things 
such as availability zones or arbitrary Historical server groupings in a 
virtualized deployment. That is why I came up with the name "guild" instead of 
just saying rack explicitly.
   
   ### Implementation
   
   **Configs**
   runtime.properties:
   * `druid.server.guild` The STRING guild name assigned to a server. Default 
value applied if not specified.
   * `druid.coordinator.guildReplication.on` The BOOLEAN flag telling the 
coordinator if it should use server guilds as a part of its coordination logic. 
Default value is false.
   
   coordinator dynamic config:
   * `guildReplicaitonMaxPercentOfSegmentsToMove` What percentage of segments 
moved during the BalanceSegments duty should be dedicated to moving segments 
that are not meeting the guild replication threshold of 2? This is applied 
against what is left over after moving segments off of decommissioning 
servers, if there are any.
   
   `SegmentReplicantLookup`
   
   The coordinator currently builds this object each time the 
`HistoricalManagementDuties` run. Prior to this proposal, the object would 
contain two main data structures:
   
   `Table segmentsInCluster` - replicant count for 
each segment by tier for segments served by the cluster.
   `Table loadingSegments` - replicant count for 
each segment by tier for segments being loaded by the cluster. 
   
   My proposal adds a third data structure that is specific to guild 
replication:
   
   `Table historicalGuildDistribution` - replicant 
count for each segment by guild.
   
   This new structure is only created if guild replication is enabled. It is 
not worth the resources if we are not going to use it! The structure is used 
for quick lookup to get guild replication state for a given segment. It will be 
used by coordinator duties when making decisions on loading/moving/dropping 
segments. 
   
   `LoadRule`
   
   When assigning replicas, use the changes in SegmentReplicantLookup to split 
the `ServerHolders` into servers whose guild is already serving a replicant of 
the segment and servers whose guild is not. If possible, `LoadRule` will load 
replicants to the best-scored server(s) from the guild(s) not yet serving a 
replicant. However, we always fall back to loading the specified number of 
replicants even if replication across guilds cannot be achieved.
   
   When picking replicants to drop, we split the `ServerHolders` in the same 
way. Segments are dropped from decommissioning servers first, then from 
servers on guilds with > 1 replicant of the segment, and lastly from the 
remaining servers serving a replicant.
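   A minimal sketch of that split; the types here are simplified stand-ins 
for `ServerHolder` and the `SegmentReplicantLookup` result, not the actual 
Druid classes, and the server/guild names are invented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified stand-ins for ServerHolder and SegmentReplicantLookup; this only
// illustrates the partition-then-prefer logic, not the real Druid code.
public class GuildSplitSketch {
    static class Server {
        final String name;
        final String guild;
        Server(String name, String guild) { this.name = name; this.guild = guild; }
    }

    // Servers whose guild does NOT yet serve a replicant of the segment.
    static List<Server> preferredTargets(List<Server> candidates, Set<String> guildsServingSegment) {
        List<Server> preferred = new ArrayList<>();
        for (Server s : candidates) {
            if (!guildsServingSegment.contains(s.guild)) {
                preferred.add(s);
            }
        }
        return preferred;
    }

    public static void main(String[] args) {
        List<Server> servers = List.of(
            new Server("historical-1", "rack-a"),
            new Server("historical-2", "rack-b"),
            new Server("historical-3", "rack-a"));
        // The segment currently has replicants only on guild rack-a.
        List<Server> preferred = preferredTargets(servers, Set.of("rack-a"));
        // Load onto the best-scored preferred server; fall back to all
        // candidates if the preferred list is empty.
        System.out.println(preferred.size()); // 1 (historical-2 on rack-b)
    }
}
```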
   
   `BalancerStrategy`
   
   Add a method to the interface. `pickSegmentToMove(List 
serverHolders, Set broadcastDataSources, DruidCoordinatorRuntimeParams 
params);`
   
   This new method is added so that we have the `SegmentReplicantLookup` 
information needed to pick a segment that is not replicated across > 1 guild 
when balancing the cluster. It is needed because we introduce the dynamic 
config and an associated balancing phase in BalanceSegments that prioritizes 
balancing segments that are not properly replicated across guilds, and we need 
to pick only segments that meet that requirement. The existing 
`pickSegmentToMove` does not suffice.
   
   RandomBalancerStrategy and CostBalancerStrategy add implementations.
   
   `ReservoirSegmentSampler`
   
   Adds a method for getting a `SegmentHolder` that violates the goal of being 
on > 1 guild. This method needs access to the `SegmentReplicantLookup` in 
order to quickly look up the replication state of the segments it may select. 
It returns the
[druid] branch master updated (6eef0e4 -> f66fdbf)

2021-01-08 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 6eef0e4  Fix collision between #10689 and #10593. (#10738)
 add f66fdbf  add offsetFetchPeriod to kinesis ingestion doc (#10734)

No new revisions were added by this update.

Summary of changes:
 docs/development/extensions-core/kinesis-ingestion.md | 1 +
 1 file changed, 1 insertion(+)





[GitHub] [druid] jihoonson merged pull request #10734: add offsetFetchPeriod to kinesis ingestion doc

2021-01-08 Thread GitBox


jihoonson merged pull request #10734:
URL: https://github.com/apache/druid/pull/10734


   






[GitHub] [druid] gianm commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


gianm commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r554224361



##
File path: core/src/test/java/org/apache/druid/math/expr/FunctionTest.java
##
@@ -519,6 +519,31 @@ public void testLeast()
 assertExpr("least(1, null, 'A')", "1");
   }
 
+  @Test
+  public void testBitwise()
+  {
+    assertExpr("bitwiseAnd(3, 1)", 1L);
+    assertExpr("bitwiseAnd(2, 1)", 0L);
+    assertExpr("bitwiseOr(3, 1)", 3L);
+    assertExpr("bitwiseOr(2, 1)", 3L);
+    assertExpr("bitwiseXor(3, 1)", 2L);
+    assertExpr("bitwiseXor(2, 1)", 3L);
+    assertExpr("bitwiseShiftLeft(2, 1)", 4L);
+    assertExpr("bitwiseShiftRight(2, 1)", 1L);
+    assertExpr("bitwiseAnd(bitwiseComplement(1), 7)", 6L);
+    assertExpr("bitwiseAnd('2', '1')", null);
+    assertExpr("bitwiseAnd(2, '1')", 0L);
+
+    assertExpr("bitwiseOr(2.345, 1)", 4612462889363109315L);

Review comment:
   I see, I missed that test. Sounds good.
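   For readers puzzling over the `4612462889363109315L` expectation in the quoted test: it is simply the IEEE 754 bit pattern of `2.345` OR'd with `1`. A small standalone sketch (plain Java, not Druid code) reproduces the value:

```java
public class BitwiseDoubleExample
{
  public static void main(String[] args)
  {
    // Doubles are converted to their raw IEEE 754 bit representation before
    // the bitwise operation, so bitwiseOr(2.345, 1) operates on these bits:
    long bits = Double.doubleToLongBits(2.345);
    System.out.println(bits | 1L);  // 4612462889363109315
  }
}
```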








[GitHub] [druid] gianm commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


gianm commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r554224182



##
File path: docs/misc/math-expr.md
##
@@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function.
 |acos|acos(x) would return the arc cosine of x|
 |asin|asin(x) would return the arc sine of x|
 |atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation|
+|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation|
+|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or copy the bits of a double value into a long if the input is a double.|

Review comment:
   It should implicitly cast, I think.
   
   Generally I think function behavior is easier to understand if the function 
implicitly casts its inputs to the type that it expects, vs. changing behavior 
based on its input type.
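   The long/double conversion being discussed maps onto Java's own bit-conversion methods; a minimal sketch of the round-trip semantics described above (plain Java, not Druid's actual implementation):

```java
public class BitwiseConvertDoubleExample
{
  public static void main(String[] args)
  {
    // double input: copy its bits into a long
    long bits = Double.doubleToLongBits(2.345);
    // long input: reinterpret those bits as a double -- the round trip is lossless
    double back = Double.longBitsToDouble(bits);
    System.out.println(back == 2.345);  // true
  }
}
```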








[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r554206473



##
File path: docs/misc/math-expr.md
##
@@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function.
 |acos|acos(x) would return the arc cosine of x|
 |asin|asin(x) would return the arc sine of x|
 |atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation|
+|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation|
+|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or copy the bits of a double value into a long if the input is a double.|

Review comment:
   >I'd think that `bitwiseConvertDouble(bitwiseConvertDouble(x))` would be 
identical to `bitwiseConvertDouble(x)`.
   
   hmm, should the conversion just pass through if the type is already the 
output type or should it implicitly cast similar to the other bitwise functions?








[GitHub] [druid] gianm commented on pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats

2021-01-08 Thread GitBox


gianm commented on pull request #10730:
URL: https://github.com/apache/druid/pull/10730#issuecomment-757000160


   > looks like tests are failing due to insufficient diff coverage for some 
lines. Many of them appear to be where we introduced additional type 
parameters. @gianm what's the guidance here?
   
   What I've done is either,
   
   - Add tests if it's something that really should be tested
   - Add a suppression if it's something that really shouldn't be tested
   - Ignore the coverage check if it's something that could be tested, but it 
isn't that important, and we want to skip it for now. (It's a diff-oriented 
check, so ignoring it won't affect future patches.)






[GitHub] [druid] xvrl commented on pull request #10730: Introduce KafkaRecordEntity to support Kafka headers in InputFormats

2021-01-08 Thread GitBox


xvrl commented on pull request #10730:
URL: https://github.com/apache/druid/pull/10730#issuecomment-756995841


   looks like tests are failing due to insufficient diff coverage for some 
lines. Many of them appear to be where we introduced additional type 
parameters. @gianm what's the guidance here?






[GitHub] [druid] gianm merged pull request #10738: Fix collision between #10689 and #10593.

2021-01-08 Thread GitBox


gianm merged pull request #10738:
URL: https://github.com/apache/druid/pull/10738


   






[GitHub] [druid] suneet-s commented on pull request #10710: fix web-console show json bug

2021-01-08 Thread GitBox


suneet-s commented on pull request #10710:
URL: https://github.com/apache/druid/pull/10710#issuecomment-756933916


   > > @bananaaggle Thanks for this fix - can you please add a test that 
catches parsing these large numbers in the web-console so that future 
refactorings don't accidentally break this functionality. Thanks!
   > > EDIT - tagged the wrong user initially
   > 
   > I'm not very familiar with the console. Can you give me some information 
about creating a test for this PR? Like which directory I can refer to, or other 
hints? Could you help me create it?
   
   I'm not familiar with front-end tests either. @vogievetsky do you have any 
advice on how to write a test for this?






[GitHub] [druid] capistrant commented on a change in pull request #10287: allow the LogUsedSegments duty to be skipped

2021-01-08 Thread GitBox


capistrant commented on a change in pull request #10287:
URL: https://github.com/apache/druid/pull/10287#discussion_r554119706



##
File path: 
server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##
@@ -669,21 +669,31 @@ private void stopBeingLeader()
 
  private List<CoordinatorDuty> makeHistoricalManagementDuties()
   {
-    return ImmutableList.of(
-        new LogUsedSegments(),
+    List<CoordinatorDuty> duties = new ArrayList<>();
+    if (config.isLogUsedSegmentsDutyEnabled()) {
+      duties.add(new LogUsedSegments());
+    }
+    duties.addAll(ImmutableList.of(
         new UpdateCoordinatorStateAndPrepareCluster(),
         new RunRules(DruidCoordinator.this),
         new UnloadUnusedSegments(),
         new MarkAsUnusedOvershadowedSegments(DruidCoordinator.this),
         new BalanceSegments(DruidCoordinator.this),
         new EmitClusterStatsAndMetrics(DruidCoordinator.this)
+    ));
+    log.debug(
+        "Done making historical management duties %s",
+        duties.stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList())
     );
+    return duties;

Review comment:
   reverting to the original way of doing this now that LogUsedSegments is 
always used again








[GitHub] [druid] capistrant commented on a change in pull request #10287: allow the LogUsedSegments duty to be skipped

2021-01-08 Thread GitBox


capistrant commented on a change in pull request #10287:
URL: https://github.com/apache/druid/pull/10287#discussion_r554117678



##
File path: 
server/src/main/java/org/apache/druid/server/coordinator/DruidCoordinator.java
##
@@ -669,21 +669,31 @@ private void stopBeingLeader()
 
  private List<CoordinatorDuty> makeHistoricalManagementDuties()
   {
-    return ImmutableList.of(
-        new LogUsedSegments(),
+    List<CoordinatorDuty> duties = new ArrayList<>();
+    if (config.isLogUsedSegmentsDutyEnabled()) {
+      duties.add(new LogUsedSegments());
+    }
+    duties.addAll(ImmutableList.of(
         new UpdateCoordinatorStateAndPrepareCluster(),
         new RunRules(DruidCoordinator.this),
         new UnloadUnusedSegments(),
         new MarkAsUnusedOvershadowedSegments(DruidCoordinator.this),
         new BalanceSegments(DruidCoordinator.this),
         new EmitClusterStatsAndMetrics(DruidCoordinator.this)
+    ));
+    log.debug(
+        "Done making historical management duties %s",
+        duties.stream().map(duty -> duty.getClass().getName()).collect(Collectors.toList())
     );
+    return duties;

Review comment:
   makeIndexingServiceDuties below used ImmutableList.copyOf(duties) so I 
will stick with that for consistency
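   The pattern under discussion — build the list conditionally, then hand out an immutable copy — can be sketched in isolation. The toy `Duty` types below are hypothetical stand-ins for the coordinator duty classes, and `List.copyOf` stands in for Guava's `ImmutableList.copyOf`:

```java
import java.util.ArrayList;
import java.util.List;

public class ConditionalImmutableListSketch
{
  // Toy stand-ins for the coordinator duty classes in the quoted diff.
  interface Duty {}
  static class LogUsedSegments implements Duty {}
  static class RunRules implements Duty {}

  static List<Duty> makeDuties(boolean logUsedSegmentsEnabled)
  {
    List<Duty> duties = new ArrayList<>();
    if (logUsedSegmentsEnabled) {
      duties.add(new LogUsedSegments());
    }
    duties.add(new RunRules());
    // Returning an unmodifiable snapshot regardless of the branch taken
    // mirrors the ImmutableList.copyOf(duties) convention mentioned above.
    return List.copyOf(duties);
  }

  public static void main(String[] args)
  {
    System.out.println(makeDuties(false).size());  // 1
    System.out.println(makeDuties(true).size());   // 2
  }
}
```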








[GitHub] [druid] capistrant commented on pull request #10287: allow the LogUsedSegments duty to be skipped

2021-01-08 Thread GitBox


capistrant commented on pull request #10287:
URL: https://github.com/apache/druid/pull/10287#issuecomment-756913585


   > I like this log message from LogUsedSegments and have found it very useful:
   > 
   > ```java
   > log.info("Found [%,d] used segments.", params.getUsedSegments().size());
   > ```
   > 
   > It's nearly free, since `params.getUsedSegments()` is something that's 
just passed into the method and doesn't need to be computed. Instead of a 
config to disable the duty, how about removing this block and leaving
everything else the same:
   > 
   > ```java
   > DataSourcesSnapshot dataSourcesSnapshot = 
params.getDataSourcesSnapshot();
   > for (DataSegment segment : 
dataSourcesSnapshot.iterateAllUsedSegmentsInSnapshot()) {
   >   if (segment.getSize() < 0) {
   > log.makeAlert("No size on a segment")
   >.addData("segment", segment)
   >.emit();
   >   }
   > }
   > ```
   > 
   > I'm pretty sure this is ancient debugging code and we can safely get rid 
of it.
   > 
   > The other potentially expensive part is skipped if log level is INFO or 
higher. (Which is the default, so it shouldn't be a problem.)
   
   I think I like this idea the most, considering everything we have learned 
since I created the PR. I am going to amend the PR to remove the configuration 
to skip the duty. Instead we will remove this legacy code and leave everything 
else as is. Maybe in the future this duty will change and become expensive, 
making the ability to skip it beneficial, but let's not put the 
cart ahead of the horse and overcomplicate things today.






[GitHub] [druid] gianm commented on pull request #10247: Remove CloseQuietly and migrate its usages to other methods.

2021-01-08 Thread GitBox


gianm commented on pull request #10247:
URL: https://github.com/apache/druid/pull/10247#issuecomment-756907541


   Fixed up some merge conflicts.






[druid] branch master updated (26bcd47 -> 6eef0e4)

2021-01-08 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 26bcd47  Thread-safety for ResponseContext.REGISTERED_KEYS (#9667)
 add 6eef0e4  Fix collision between #10689 and #10593. (#10738)

No new revisions were added by this update.

Summary of changes:
 .../java/org/apache/druid/segment/IndexMergerTestBase.java   | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)





[GitHub] [druid] gianm commented on pull request #10738: Fix collision between #10689 and #10593.

2021-01-08 Thread GitBox


gianm commented on pull request #10738:
URL: https://github.com/apache/druid/pull/10738#issuecomment-756905866


   Travis passed. I'll merge this.






[GitHub] [druid] liran-funaro commented on pull request #10593: IncrementalIndex Tests and Benchmarks Parametrization

2021-01-08 Thread GitBox


liran-funaro commented on pull request #10593:
URL: https://github.com/apache/druid/pull/10593#issuecomment-756795145


   Thanks!






[GitHub] [druid] FrankChen021 edited a comment on issue #10735: Misleading info about tsColumn in the lookups-cached-global docs

2021-01-08 Thread GitBox


FrankChen021 edited a comment on issue #10735:
URL: https://github.com/apache/druid/issues/10735#issuecomment-756784327


   I guess the reason `getLookupPairs` currently reloads the whole table is that, 
if there are record deletions in the database, such a full reload drops those 
deleted records from the cache. Otherwise, if it loaded records incrementally by 
tsColumn, there would have to be extra code to check whether records in the cache 
have been deleted from the database. 
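   A toy illustration of that point (not Druid code): an incremental poll keyed on a timestamp watermark never revisits rows that vanish from the source table, so a deleted row's cache entry survives until a full reload.

```java
import java.util.HashMap;
import java.util.Map;

public class IncrementalReloadSketch
{
  static final Map<String, String> db = new HashMap<>();     // key -> value in the source table
  static final Map<String, Long> dbTs = new HashMap<>();     // key -> tsColumn value
  static final Map<String, String> cache = new HashMap<>();  // lookup cache
  static long lastTs = 0;

  // Incremental poll: copy only rows whose tsColumn advanced past the watermark.
  static void pollIncremental()
  {
    for (Map.Entry<String, String> e : db.entrySet()) {
      long ts = dbTs.get(e.getKey());
      if (ts > lastTs) {
        cache.put(e.getKey(), e.getValue());
      }
    }
    lastTs = Math.max(lastTs, dbTs.values().stream().mapToLong(Long::longValue).max().orElse(lastTs));
  }

  public static void main(String[] args)
  {
    db.put("k1", "v1"); dbTs.put("k1", 10L);
    db.put("k2", "v2"); dbTs.put("k2", 20L);
    pollIncremental();

    db.remove("k2");    // row deleted in the source database
    dbTs.remove("k2");
    pollIncremental();
    System.out.println(cache.containsKey("k2"));  // true: the stale entry survives
  }
}
```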






[GitHub] [druid] FrankChen021 commented on issue #10735: Misleading info about tsColumn in the lookups-cached-global docs

2021-01-08 Thread GitBox


FrankChen021 commented on issue #10735:
URL: https://github.com/apache/druid/issues/10735#issuecomment-756784327


   I guess the reason `getLookupPairs` currently reloads the whole table is that, 
if there are record deletions in the database, such a full reload drops those 
deleted records from the cache. Otherwise, if it loaded records incrementally by 
tsColumn, there would have to be extra code to check whether records in the cache 
have been deleted from the database. 






[GitHub] [druid] ayushkul2910 commented on a change in pull request #10448: Added CronScheduler support as a proof to clock drift while emitting metrics

2021-01-08 Thread GitBox


ayushkul2910 commented on a change in pull request #10448:
URL: https://github.com/apache/druid/pull/10448#discussion_r553893369



##
File path: 
server/src/main/java/org/apache/druid/server/metrics/MetricsModule.java
##
@@ -108,10 +108,10 @@ public MonitorScheduler getMonitorScheduler(
 
 return new MonitorScheduler(
 config.get(),
-        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorScheduler-%s").build(),
+        CronScheduler.newBuilder(Duration.ofSeconds(1L)).setThreadName("MonitorSchedulerThread").build(),
         emitter,
         monitors,
-        Executors.newCachedThreadPool()
+        Execs.multiThreaded(64, "MonitorThread-%d")

Review comment:
   This makes sense, we should reduce the number of monitor threads.








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553847083



##
File path: 
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##
@@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
   @Override
   public boolean canVectorize(InputBindingInspector inspector)
   {
-    return inspector.areNumeric(left, right) && inspector.canVectorize(left, right);

Review comment:
   I _think_ canVectorize will still end up false because array 
types/functions are not yet vectorizable, but yeah, I guess array types would 
not be well handled here even if they were, though maybe that is the case for 
non-vectorized expressions as well. I'll do a bit of exploration and see what's up








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553845596



##
File path: 
processing/src/main/java/org/apache/druid/query/dimension/DimensionSpec.java
##
@@ -70,6 +71,11 @@ default MultiValueDimensionVectorSelector decorate(MultiValueDimensionVectorSelector selector)
     throw new UOE("DimensionSpec[%s] cannot vectorize", getClass().getName());
   }
 
+  default VectorObjectSelector decorate(VectorObjectSelector selector)

Review comment:
   This is used with the poorly named 
`VectorColumnSelectorFactory.makeStringObjectSelector`, which should be renamed 
`VectorColumnSelectorFactory.makeDimensionObjectSelector` or something, which 
is for doing dimensiony things with a vector object selector similar to what is 
done for `SingleValueDimensionVectorSelector` or 
`MultiValueDimensionVectorSelector`. There are currently no implementations of 
this, like is the case with the other decorate methods with vector selectors, i 
was just trying for symmetry here so that the object selector could behave 
similar to the other string selectors when used in this context








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553840570



##
File path: 
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java
##
@@ -64,7 +65,16 @@ public GroupByVectorColumnSelector makeMultiValueDimensionProcessor(
         ValueType.STRING == capabilities.getType(),
         "groupBy dimension processors must be STRING typed"
     );
-    throw new UnsupportedOperationException("Multi-value dimensions not yet implemented for vectorized groupBys");
+    throw new UnsupportedOperationException("Multi-value dimensions are not yet implemented for vectorized groupBys");
+  }
+
+  @Override
+  public GroupByVectorColumnSelector makeObjectProcessor(
+      ColumnCapabilities capabilities,
+      VectorObjectSelector selector
+  )
+  {
+    throw new UnsupportedOperationException("Object columns are not yet implemented for vectorized groupBys");

Review comment:
   yes, I have a related branch that I will open as a follow-up to this PR 
that adds support for group-by queries on vectorized string expressions, using 
an object selector and a newly added dictionary building 
`GroupByVectorColumnSelector` similar to the non-vectorized group by code-path. 
In that branch this part checks that the column capabilities are string typed, 
and if so then will make the new dictionary building selector, else throw an 
exception for being an unsupported type.
   
   It seems beneficial to not have to make the string expression vector results 
have to masquerade as dictionary encoded, to be translated to string values to 
be re-encoded into a new dictionary, which is part of my reason for pushing for 
object selectors for the use case








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553833318



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##
@@ -27,6 +27,13 @@
 {
  Predicate<String> makeStringPredicate();
 
+  default Predicate<Object> makeObjectPredicate()
+  {
+    // default to try to use string predicate

Review comment:
   currently I don't think it should be possible, because only the expression 
filter will make a matcher factory with an object-matching predicate, and only for 
string column types, but I guess cast exceptions could happen if an object selector 
of a complex column somehow got in here from future code; see the other comment 
about thinking a bit further about this 😅 








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553833052



##
File path: 
processing/src/main/java/org/apache/druid/query/filter/DruidPredicateFactory.java
##
@@ -27,6 +27,13 @@
 {
  Predicate<String> makeStringPredicate();
 
+  default Predicate<Object> makeObjectPredicate()
+  {
+    // default to try to use string predicate
+    final Predicate<String> stringPredicate = makeStringPredicate();
+    return o -> stringPredicate.apply((String) o);

Review comment:
   uh, it is sort of built on hope right now... the hope that it will 
probably be a string because the only context it will currently happen in is 
from making vector matcher object processors on string expression columns, but 
if we opened it up to additional non-primitive number types like array 
expressions and complex column types then this would need to be reconsidered, 
or stronger checks in place that the filter supports filtering on the 
underlying object type. I might need to think about this a bit deeper.
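   One way to make the fallback fail loudly instead of "hoping" would be an explicit type check before applying the string predicate. A hypothetical sketch using `java.util.function.Predicate` (the quoted Druid code uses Guava's `Predicate` and `apply`; this is not the actual implementation):

```java
import java.util.function.Predicate;

public class ObjectPredicateSketch
{
  // Hypothetical defensive variant: only treat the object as a String when it
  // really is one (or null), instead of blindly casting.
  public static Predicate<Object> makeObjectPredicate(final Predicate<String> stringPredicate)
  {
    return o -> {
      if (o == null || o instanceof String) {
        return stringPredicate.test((String) o);
      }
      throw new ClassCastException("cannot filter on object of type " + o.getClass().getName());
    };
  }

  public static void main(String[] args)
  {
    Predicate<Object> p = makeObjectPredicate("a"::equals);
    System.out.println(p.test("a"));  // true
    System.out.println(p.test("b"));  // false
  }
}
```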








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553820619



##
File path: 
processing/src/main/java/org/apache/druid/segment/vector/VectorColumnSelectorFactory.java
##
@@ -57,6 +57,15 @@ default int getMaxVectorSize()
*/
  MultiValueDimensionVectorSelector makeMultiValueDimensionSelector(DimensionSpec dimensionSpec);
 
+  /**
+   * Returns an object selector for string columns, useful for non-dictionary encoded strings, or when

Review comment:
   hmm, I have no idea why I named this function like this; it should 
probably be called `makeDimensionObjectSelector`. It's just expected to make a 
`VectorObjectSelector` from a `DimensionSpec` rather than a raw column name








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553818153



##
File path: 
processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier expr, final 
FilterTuning filterTuni
 this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory 
factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make 
SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact 
only the 'toString' method is called
+  // when testing equality of DimensionPredicateFilter, so it's the truly 
required method, but even that seems
+  // strange at best.
+  // Rather than tackle removing the annotation and reworking equality 
tests for now, will leave this to refactor
+  // as part of https://github.com/apache/druid/issues/8256 which suggests 
combining Filter and DimFilter
+  // implementations, which should clean up some of this mess.
+  @Override
+  public int hashCode()
+  {
+return super.hashCode();
+  }
+
+  @Override
+  public boolean equals(Object obj)
+  {
+return super.equals(obj);
+  }
+};
+
+
+final ExprType outputType = theExpr.getOutputType(factory);
+
+if (outputType == null) {
+  // if an expression is vectorizable, but the output type is null, the result will be null (or the default
+  // value in default mode) because expression is either all null constants or missing columns
+
+  // in sql compatible mode, this means no matches ever, so just use the false matcher:
+  if (NullHandling.sqlCompatible()) {
+    return new FalseVectorMatcher(factory.getVectorSizeInspector());
+  }
+  // in default mode, just fallback to using a long matcher since nearly all boolean-ish expressions
+  // output a long value so it is probably a safe bet? idk, ending up here by using all null-ish things
+  // in default mode is dancing on the edge of insanity anyway...
+  return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+      ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+      ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+  ).makeMatcher(predicateFactory);
+}
+
+switch (outputType) {
+  case LONG:
+    return VectorValueMatcherColumnProcessorFactory.instance().makeLongProcessor(
+        ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.LONG),
+        ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+    ).makeMatcher(predicateFactory);
+  case DOUBLE:
+    return VectorValueMatcherColumnProcessorFactory.instance().makeDoubleProcessor(
+        ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.DOUBLE),
+        ExpressionVectorSelectors.makeVectorValueSelector(factory, theExpr)
+    ).makeMatcher(predicateFactory);
+  case STRING:
+    return VectorValueMatcherColumnProcessorFactory.instance().makeObjectProcessor(
+        // using 'numeric' capabilities creator so we are configured to NOT be dictionary encoded, etc
+        ColumnCapabilitiesImpl.createSimpleNumericColumnCapabilities(ValueType.STRING),

Review comment:
   it is strange, but on purpose for the reason the comment says: the numeric capabilities defaults are correct in this case. I will check whether all current users are numeric types, and if so, build these capabilities from scratch instead, because we shouldn't be counting on something geared towards numeric behavior in case those defaults change in some unexpected way. Or, if it is also used by non-numeric types, rename it to something more appropriate, like `createSimpleSingleValueColumnCapabilities` or similar.
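
For context, every override in the predicate factory quoted above funnels through `Evals::asBoolean`. A rough Python sketch of the assumed truthiness semantics (the string overload follows `Evals.asBoolean(@Nullable String)` quoted later in this digest; the numeric overloads are assumed here to treat positive values as true):

```python
def as_boolean_string(x):
    # Mirrors Evals.asBoolean(@Nullable String): null (or null-equivalent)
    # is false; otherwise Java's Boolean.parseBoolean ("true", case-insensitive).
    return x is not None and x.lower() == "true"

def as_boolean_number(x):
    # Assumed semantics for the long/float/double predicates:
    # positive values are truthy.
    return x > 0

print(as_boolean_string("TRUE"), as_boolean_string("1"))  # True False
print(as_boolean_number(1), as_boolean_number(0))         # True False
```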





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

[GitHub] [druid] gianm commented on pull request #10267: DruidInputSource: Fix issues in column projection, timestamp handling.

2021-01-08 Thread GitBox


gianm commented on pull request #10267:
URL: https://github.com/apache/druid/pull/10267#issuecomment-756633682


   Fixed up some merge conflicts.






-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553816206



##
File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier<Expr> expr, final FilterTuning filterTuning)
     this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called

Review comment:
   Yeah, I was planning on addressing this in a separate PR; the comments are partly there to help remind us to do it in case I didn't get to it immediately.








[GitHub] [druid] clintropolis commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553814648



##
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##
@@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities(
     originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType());
 
 if (type == ValueType.STRING) {
-  if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-    return strategyFactory.makeMultiValueDimensionProcessor(
-        effectiveCapabilites,
-        selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-    );
-  } else {
-    return strategyFactory.makeSingleValueDimensionProcessor(
-        effectiveCapabilites,
-        selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-    );
+  if (!forceSingleValue) {
+    // if column is not dictionary encoded (and not non-existent or complex), use object selector
+    if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+        effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment:
   By regular string selector do you mean dimension selector? The answer is yes, we _could_, but if the dictionary values are not unique, I came to the conclusion that they aren't especially useful, since in all the cases I can think of you can't use them as-is and you're going to have to look up the string values anyway (e.g. filter matching and vector group by).
   
   I was trying to provide an avenue to avoid those extra steps with expression virtual columns, and allow an object selector to be created directly, since the string is the usable part in this case and the dictionary ids just feel like an obstruction. But maybe this is overly broad, and there are cases I have missed where we would want it to act like a dictionary-encoded column for some reason?








[GitHub] [druid] gianm commented on pull request #10287: allow the LogUsedSegments duty to be skipped

2021-01-08 Thread GitBox


gianm commented on pull request #10287:
URL: https://github.com/apache/druid/pull/10287#issuecomment-756630739


   > circling back on this. my cluster with 161k segments takes ~20ms to 
execute `LogUsedSegments`. This duty should scale fairly linearly with number 
of segments so you'd have to 50x that number of segments to hit 1 second if 
that is true. I really look forward to when I can deploy #10603 in my larger 
production environment to confirm. I should be able to do this in mid-Jan.
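
The back-of-envelope scaling above checks out (a quick sketch, assuming the duty really is linear in segment count):

```python
segments = 161_000      # observed cluster size from the comment above
duration_ms = 20        # observed LogUsedSegments cost
target_ms = 1_000       # the 1-second mark mentioned above

scale = target_ms / duration_ms        # the "50x" factor
print(int(segments * scale))           # segments needed to hit ~1 second
```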
   
   You could also use `sjk` to grab a flame graph of a couple of coordinator 
run cycles (maybe let the tool run for 5 minutes). It would be illuminating as 
to exactly where the time is going. Some instructions here: 
https://support.imply.io/hc/en-us/articles/360033747953-Profiling-Druid-queries-using-flame-graphs
   
   (That's written with profiling queries in mind, but the technique would work 
just as well for profiling anything.)
   






[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r553811715



##
File path: core/src/test/java/org/apache/druid/math/expr/FunctionTest.java
##
@@ -519,6 +519,31 @@ public void testLeast()
 assertExpr("least(1, null, 'A')", "1");
   }
 
+  @Test
+  public void testBitwise()
+  {
+assertExpr("bitwiseAnd(3, 1)", 1L);
+assertExpr("bitwiseAnd(2, 1)", 0L);
+assertExpr("bitwiseOr(3, 1)", 3L);
+assertExpr("bitwiseOr(2, 1)", 3L);
+assertExpr("bitwiseXor(3, 1)", 2L);
+assertExpr("bitwiseXor(2, 1)", 3L);
+assertExpr("bitwiseShiftLeft(2, 1)", 4L);
+assertExpr("bitwiseShiftRight(2, 1)", 1L);
+assertExpr("bitwiseAnd(bitwiseComplement(1), 7)", 6L);
+assertExpr("bitwiseAnd('2', '1')", null);
+assertExpr("bitwiseAnd(2, '1')", 0L);
+
+assertExpr("bitwiseOr(2.345, 1)", 4612462889363109315L);

Review comment:
   more exhaustive coverage which should include these combinations is done 
in `VectorExprSanityTest`, where non-vectorized and vectorized evaluation 
results are asserted to be equal with a variety of combinations of inputs, but 
I can add explicit tests here since those don't necessarily confirm 
correctness, just self consistency between the two evaluation modes.
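
The non-obvious expected value for `bitwiseOr(2.345, 1)` above can be reproduced outside Druid; a quick sketch emulating Java's `Double.doubleToLongBits` with Python's `struct` module:

```python
import struct

def double_to_long_bits(x):
    # Reinterpret the IEEE 754 double as a signed 64-bit integer,
    # mirroring Java's Double.doubleToLongBits.
    return struct.unpack('>q', struct.pack('>d', x))[0]

# bitwiseOr(2.345, 1): the double is converted to its bit representation first
print(double_to_long_bits(2.345) | 1)  # 4612462889363109315
```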








[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r553810857



##
File path: docs/misc/math-expr.md
##
@@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function.
 |acos|acos(x) would return the arc cosine of x|
 |asin|asin(x) would return the arc sine of x|
 |atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation|
+|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double values will be converted to their bit representation|
+|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 floating-point "double" bits stored in a long into a double value if the input is a long, or copy the bits of a double value into a long if the input is a double.|

Review comment:
   will split into `bitwiseConvertDoubleToLongBits` and 
`bitwiseConvertLongBitsToDouble`
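
A sketch of what the proposed split might look like (hypothetical names from the comment above, with Python stand-ins for Java's `Double.doubleToLongBits` / `Double.longBitsToDouble`):

```python
import struct

def bitwise_convert_double_to_long_bits(x):
    # stand-in for Java's Double.doubleToLongBits
    return struct.unpack('>q', struct.pack('>d', x))[0]

def bitwise_convert_long_bits_to_double(b):
    # stand-in for Java's Double.longBitsToDouble
    return struct.unpack('>d', struct.pack('>q', b))[0]

# the two directions are inverses of each other
bits = bitwise_convert_double_to_long_bits(2.0)
print(bitwise_convert_long_bits_to_double(bits))  # 2.0
```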








[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r553810678



##
File path: docs/misc/math-expr.md
##
@@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for each function.
 |acos|acos(x) would return the arc cosine of x|
 |asin|asin(x) would return the arc sine of x|
 |atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values will be converted to their bit representation|

Review comment:
   Thinking back, I think I originally did this behavior before I added `bitwiseConvertDouble`, so it was a way to do bitwise operations on double values. After adding the explicit function, it isn't really necessary anymore, so I will revert to the behavior of casting and assuming long inputs.
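
Under the cast-and-assume-long behavior being reverted to, a double operand would be truncated to a long before the bitwise op (a sketch, not Druid code):

```python
# proposed semantics: cast double operands to long before bitwise ops
x = int(2.345)   # truncates toward zero -> 2
print(x | 1)     # 3
```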








[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions

2021-01-08 Thread GitBox


clintropolis commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r553810003



##
File path: core/src/main/java/org/apache/druid/math/expr/Evals.java
##
@@ -71,4 +71,14 @@ public static boolean asBoolean(@Nullable String x)
   {
 return !NullHandling.isNullOrEquivalent(x) && Boolean.parseBoolean(x);
   }
+
+  public static long doubleToLongBits(double x)
+  {
+return Double.doubleToLongBits(x);

Review comment:
   Trying to remember... I think I was just trying to be consistent with the other value conversion functions and put it here, but it could probably just call `Double.doubleToLongBits` directly, as it seems unlikely we would change the function.








[druid] branch master updated (08ab82f -> 26bcd47)

2021-01-08 Thread gian
This is an automated email from the ASF dual-hosted git repository.

gian pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 08ab82f  IncrementalIndex Tests and Benchmarks Parametrization (#10593)
 add 26bcd47  Thread-safety for ResponseContext.REGISTERED_KEYS (#9667)

No new revisions were added by this update.

Summary of changes:
 .../druid/query/context/ResponseContext.java   | 34 +++---
 1 file changed, 17 insertions(+), 17 deletions(-)





[GitHub] [druid] gianm merged pull request #9667: Thread-safety for ResponseContext.REGISTERED_KEYS

2021-01-08 Thread GitBox


gianm merged pull request #9667:
URL: https://github.com/apache/druid/pull/9667


   






[GitHub] [druid] gianm closed issue #9106: Unbalanced synchronization of ResponseContext.REGISTERED_KEYS

2021-01-08 Thread GitBox


gianm closed issue #9106:
URL: https://github.com/apache/druid/issues/9106


   






[GitHub] [druid] gianm commented on pull request #9667: Thread-safety for ResponseContext.REGISTERED_KEYS

2021-01-08 Thread GitBox


gianm commented on pull request #9667:
URL: https://github.com/apache/druid/pull/9667#issuecomment-756625948


   Thanks @alex-plekhanov!






[GitHub] [druid] stale[bot] commented on pull request #10043: Add support for REGEXP_REPLACE

2021-01-08 Thread GitBox


stale[bot] commented on pull request #10043:
URL: https://github.com/apache/druid/pull/10043#issuecomment-756624168


   This issue is no longer marked as stale.
   






[GitHub] [druid] gianm commented on pull request #10043: Add support for REGEXP_REPLACE

2021-01-08 Thread GitBox


gianm commented on pull request #10043:
URL: https://github.com/apache/druid/pull/10043#issuecomment-756624149


   > one more time please stalebot 🙂
   
   two more times? 🙂






[GitHub] [druid] gianm commented on a change in pull request #10613: expression filter support for vectorized query engines

2021-01-08 Thread GitBox


gianm commented on a change in pull request #10613:
URL: https://github.com/apache/druid/pull/10613#discussion_r553782719



##
File path: 
core/src/main/java/org/apache/druid/math/expr/BinaryLogicalOperatorExpr.java
##
@@ -74,7 +74,7 @@ public ExprType getOutputType(InputBindingInspector inspector)
   @Override
   public boolean canVectorize(InputBindingInspector inspector)
   {
-    return inspector.areNumeric(left, right) && inspector.canVectorize(left, right);

Review comment:
   This opens not just STRING but also LONG_ARRAY and DOUBLE_ARRAY. Is that 
intentional (will numeric arrays work)?

##
File path: core/src/main/java/org/apache/druid/math/expr/ExprTypeConversion.java
##
@@ -20,24 +20,29 @@
 package org.apache.druid.math.expr;
 
 import org.apache.druid.java.util.common.IAE;
+import org.apache.druid.java.util.common.Pair;
 
 import javax.annotation.Nullable;
 import java.util.List;
 
 public class ExprTypeConversion
 {
+  public static Pair coerceNull(Expr.InputBindingInspector inspector, Expr left, Expr right)

Review comment:
   This method deserves a javadoc.

##
File path: processing/src/main/java/org/apache/druid/segment/DimensionHandlerUtils.java
##
@@ -305,17 +305,29 @@ private static ColumnCapabilities getEffectiveCapabilities(
     originalCapabilities == null || ValueType.COMPLEX.equals(originalCapabilities.getType());
 
 if (type == ValueType.STRING) {
-  if (!forceSingleValue && effectiveCapabilites.hasMultipleValues().isMaybeTrue()) {
-    return strategyFactory.makeMultiValueDimensionProcessor(
-        effectiveCapabilites,
-        selectorFactory.makeMultiValueDimensionSelector(dimensionSpec)
-    );
-  } else {
-    return strategyFactory.makeSingleValueDimensionProcessor(
-        effectiveCapabilites,
-        selectorFactory.makeSingleValueDimensionSelector(dimensionSpec)
-    );
+  if (!forceSingleValue) {
+    // if column is not dictionary encoded (and not non-existent or complex), use object selector
+    if (effectiveCapabilites.isDictionaryEncoded().isFalse() ||
+        effectiveCapabilites.areDictionaryValuesUnique().isFalse()

Review comment:
   Why does it matter if the dictionary values are unique? (Even if they're 
not, can't we still use a regular string selector?)

##
File path: processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/vector/GroupByVectorColumnProcessorFactory.java
##
@@ -64,7 +65,16 @@ public GroupByVectorColumnSelector makeMultiValueDimensionProcessor(
 ValueType.STRING == capabilities.getType(),
 "groupBy dimension processors must be STRING typed"
 );
-throw new UnsupportedOperationException("Multi-value dimensions not yet 
implemented for vectorized groupBys");
+throw new UnsupportedOperationException("Multi-value dimensions are not 
yet implemented for vectorized groupBys");
+  }
+
+  @Override
+  public GroupByVectorColumnSelector makeObjectProcessor(
+  ColumnCapabilities capabilities,
+  VectorObjectSelector selector
+  )
+  {
+throw new UnsupportedOperationException("Object columns are not yet 
implemented for vectorized groupBys");

Review comment:
   "yet"? Will they ever be?

##
File path: processing/src/main/java/org/apache/druid/segment/filter/ExpressionFilter.java
##
@@ -55,6 +69,105 @@ public ExpressionFilter(final Supplier<Expr> expr, final FilterTuning filterTuning)
     this.filterTuning = filterTuning;
   }
 
+  @Override
+  public boolean canVectorizeMatcher(ColumnInspector inspector)
+  {
+return expr.get().canVectorize(inspector);
+  }
+
+  @Override
+  public VectorValueMatcher makeVectorMatcher(VectorColumnSelectorFactory factory)
+  {
+final Expr theExpr = expr.get();
+
+DruidPredicateFactory predicateFactory = new DruidPredicateFactory()
+{
+  @Override
+  public Predicate makeStringPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidLongPredicate makeLongPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidFloatPredicate makeFloatPredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  @Override
+  public DruidDoublePredicate makeDoublePredicate()
+  {
+return Evals::asBoolean;
+  }
+
+  // The hashcode and equals are to make SubclassesMustOverrideEqualsAndHashCodeTest stop complaining..
+  // DruidPredicateFactory doesn't really need equals or hashcode, in fact only the 'toString' method is called
+  // when testing equality of DimensionPredicateFilter, so it's the truly required method, but even that seems
+  // strange at best.
+  // Rather than tackle removing the annotation and reworking equality tests for now, will leave this to refactor
+  // as part of https://github.com/apache/druid/issues/8256 which suggests