[GitHub] gianm commented on a change in pull request #6065: Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask
gianm commented on a change in pull request #6065: Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask
URL: https://github.com/apache/incubator-druid/pull/6065#discussion_r205929239

## File path:
indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java

## @@ -416,6 +417,15 @@ public TaskStatus run(final TaskToolbox toolbox) throws Exception
       ((IngestSegmentFirehoseFactory) firehoseFactory).setTaskToolbox(toolbox);
     }
+    if (firehoseFactory instanceof CombiningFirehoseFactory) {

Review comment:
What if a Combining firehose has another Combining firehose within it? This should be recursive.

This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services

---
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org
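The recursive handling suggested here could look roughly like the following. This is a minimal, self-contained sketch with stand-in classes, not the actual Druid types; in particular, `getDelegateFactoryList` and `ToolboxPropagator` are assumed names for illustration.

```java
import java.util.List;

interface FirehoseFactory {}

// Stand-in for Druid's IngestSegmentFirehoseFactory: needs a toolbox before connect().
class IngestSegmentFirehoseFactory implements FirehoseFactory {
  Object taskToolbox;

  void setTaskToolbox(Object toolbox) {
    this.taskToolbox = toolbox;
  }
}

// Stand-in for Druid's CombiningFirehoseFactory: wraps delegate factories.
class CombiningFirehoseFactory implements FirehoseFactory {
  private final List<FirehoseFactory> delegates;

  CombiningFirehoseFactory(List<FirehoseFactory> delegates) {
    this.delegates = delegates;
  }

  List<FirehoseFactory> getDelegateFactoryList() {
    return delegates;
  }
}

class ToolboxPropagator {
  // Walk the factory tree so arbitrarily nested ingestSegment delegates
  // all receive the toolbox, per the review comment above.
  static void setToolboxRecursively(FirehoseFactory factory, Object toolbox) {
    if (factory instanceof IngestSegmentFirehoseFactory) {
      ((IngestSegmentFirehoseFactory) factory).setTaskToolbox(toolbox);
    } else if (factory instanceof CombiningFirehoseFactory) {
      for (FirehoseFactory delegate : ((CombiningFirehoseFactory) factory).getDelegateFactoryList()) {
        setToolboxRecursively(delegate, toolbox);
      }
    }
  }
}
```

With this shape, a combining firehose nested inside another combining firehose still gets its ingestSegment delegates wired up.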
[GitHub] gianm opened a new pull request #6067: FinalizingFieldAccessPostAggregator: Fix serde.
gianm opened a new pull request #6067: FinalizingFieldAccessPostAggregator: Fix serde.
URL: https://github.com/apache/incubator-druid/pull/6067

Fixes #6063.
[GitHub] b-slim closed pull request #4776: Fix ClassNotFoundException in druid-kerberos extension
b-slim closed pull request #4776: Fix ClassNotFoundException in druid-kerberos extension
URL: https://github.com/apache/incubator-druid/pull/4776

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

diff --git a/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java b/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
index d171821051a..904f71f81e0 100644
--- a/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
+++ b/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
@@ -98,6 +98,7 @@ public static void authenticateIfRequired(AuthenticationKerberosConfig config)
     String keytab = config.getKeytab();
     if (!Strings.isNullOrEmpty(principal) && !Strings.isNullOrEmpty(keytab)) {
       Configuration conf = new Configuration();
+      conf.setClassLoader(DruidKerberosModule.class.getClassLoader());
       conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, "kerberos");
       UserGroupInformation.setConfiguration(conf);
       try {
[GitHub] jon-wei opened a new pull request #6065: Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask
jon-wei opened a new pull request #6065: Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask
URL: https://github.com/apache/incubator-druid/pull/6065

If I define a CombiningFirehose with an IngestSegmentFirehose delegate in an IndexTask:

```
"firehose" : {
  "type": "combining",
  "delegates": [
    {
      "type": "ingestSegment",
      "dataSource" : "updates-tutorial",
      "interval" : "2018-01-01/2018-01-03"
    },
    {
      "type" : "local",
      "baseDir" : "quickstart/",
      "filter" : "updates-data3.json"
    }
  ]
},
```

the following null check in IngestSegmentFirehoseFactory.connect() is triggered:

```
@Override
public Firehose connect(InputRowParser inputRowParser, File temporaryDirectory) throws ParseException
{
  log.info("Connecting firehose: dataSource[%s], interval[%s]", dataSource, interval);

  Preconditions.checkNotNull(taskToolbox, "taskToolbox is not set");
```

This PR adjusts IndexTask so that it passes the taskToolbox to delegate IngestSegmentFirehoseFactory objects inside a CombiningFirehoseFactory.
[GitHub] gianm commented on issue #6063: FinalizingFieldAccessPostAggregator cannot be deserialized once it gets decorated
gianm commented on issue #6063: FinalizingFieldAccessPostAggregator cannot be deserialized once it gets decorated
URL: https://github.com/apache/incubator-druid/issues/6063#issuecomment-408575833

FinalizingFieldAccessPostAggregator is documented, so I think we should fix it. I'll do a patch.
[GitHub] josephglanville commented on issue #6066: Sorting rows when rollup is disabled
josephglanville commented on issue #6066: Sorting rows when rollup is disabled
URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-408593384

This is especially useful in cases where no metrics are specified and all rows are dimensions only, as leaving roll-up enabled in those cases emits a warning.
[GitHub] leventov opened a new pull request #6062: Fix a bug in GroupByQueryEngine
leventov opened a new pull request #6062: Fix a bug in GroupByQueryEngine
URL: https://github.com/apache/incubator-druid/pull/6062

It was potentially missing many unaggregated buffers.
[GitHub] leventov commented on issue #6027: Make Parser.parseToMap() to return a mutable Map
leventov commented on issue #6027: Make Parser.parseToMap() to return a mutable Map
URL: https://github.com/apache/incubator-druid/pull/6027#issuecomment-408565906

If Spark updates Kryo (BTW, I cannot even find which version it depends on), and Spark cluster users update Spark, this change is not needed. But that is not a practical path IMO.
[GitHub] jihoonson commented on issue #5566: Applying doubleSum over doubleFirst aggregations in nested groupBy fails
jihoonson commented on issue #5566: Applying doubleSum over doubleFirst aggregations in nested groupBy fails
URL: https://github.com/apache/incubator-druid/issues/5566#issuecomment-408567669

Hi all, `DoubleFirstAggregator` returns `SerializedPair`s, which are the intermediate format used when merging doubleFirst values from multiple historicals. So, to compute aggregations like `sum` over `doubleFirst`, you first need to _finalize_ the intermediate result into the final format, which contains only the double value. You can use the `expression` postAggregator to do this. Here is an example.

```json
{
  "queryType": "groupBy",
  "dataSource": {
    "type": "query",
    "query": {
      "queryType": "groupBy",
      "dataSource": "twitter",
      "aggregations": [
        { "type": "doubleFirst", "name": "tweets", "fieldName": "tweets" }
      ],
      "postAggregations": [
        { "type": "expression", "name": "tweets_first", "expression": "tweets" }
      ],
      "dimensions": [ "lang" ],
      "intervals": "2018-07-27/2018-07-28",
      "granularity": "hour"
    }
  },
  "intervals": "2018-07-27/2018-07-28",
  "granularity": "hour",
  "aggregations": [
    { "type": "doubleSum", "name": "value_num", "fieldName": "tweets_first" }
  ],
  "dimensions": []
}
```
[GitHub] leventov opened a new issue #6064: Duplicate DataSegmentTest
leventov opened a new issue #6064: Duplicate DataSegmentTest
URL: https://github.com/apache/incubator-druid/issues/6064

Seems that `io.druid.timeline.DataSegmentTest` is duplicated in the druid-api and druid-server modules.
[GitHub] hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section
hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410013932

@jihoonson, yes, I am interested in this. Do we have scripts ready for all these java commands, or for just one? And can you explain why an "extra parameter for configuration directory" is needed?
[GitHub] jihoonson commented on issue #6047: Minor change in the "Start up Druid services" Section
jihoonson commented on issue #6047: Minor change in the "Start up Druid services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410022159

@hpandeycodeit we have the scripts for all Druid components. Please check the `${DRUID_HOME}/bin` directory.

> And can you explain why an "extra parameter for configuration directory" is needed?

Ah, nvm. I thought those scripts didn't allow changing the configuration directory, and that they should, since the quickstart uses its own configuration files. But I was wrong. You can change the configuration directory path by setting the `DRUID_CONF_DIR` environment variable.
[GitHub] asdf2014 opened a new pull request #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
asdf2014 opened a new pull request #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090

Tries to fix issue [#6024](https://github.com/apache/incubator-druid/issues/6024).
[GitHub] gianm commented on issue #5938: URL encode datasources, task ids, authenticator names.
gianm commented on issue #5938: URL encode datasources, task ids, authenticator names.
URL: https://github.com/apache/incubator-druid/pull/5938#issuecomment-409604350

@himanshug, re:

> looks like most of the fixes are in the calls to makeWorkerUrl(..), is it possible to put the fix inside impl of that method instead ?

I made that change and re-pushed. I also consolidated the two makeWorkerURL methods into one helper method in TaskRunnerUtils.

@jihoonson, re:

> I think this is a very common mistake, so we need another way to prevent such mistakes. I suggest to add a sort of UrlBuilder which enforces encoding to input elements if needed.

I considered doing this, but then we have a new problem of how to get callers to use UrlBuilder rather than making their own URLs. So I ended up just adding a urlEncode method to StringUtils and calling that where appropriate. (Note: it isn't always appropriate, since sometimes the format parameter is a literal path string like `"foo/bar"`, and in that case we should _not_ url encode.)
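A helper of the kind described could be sketched as below. This is an assumption-laden illustration, not Druid's actual `StringUtils`: it simply wraps `java.net.URLEncoder` and converts the checked exception to an unchecked one, and it must only be applied to individual path or query elements, never to literal paths like `"foo/bar"`.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class StringUtils {
  private StringUtils() {}

  // Percent-encode a single URL element (e.g. a datasource or task id).
  // Callers must NOT apply this to literal path strings such as "foo/bar"
  // whose separators are meant to stay separators.
  static String urlEncode(String s) {
    try {
      return URLEncoder.encode(s, "UTF-8");
    } catch (UnsupportedEncodingException e) {
      // UTF-8 is guaranteed by the JVM spec, so this cannot happen.
      throw new RuntimeException(e);
    }
  }
}
```

Note that `URLEncoder` produces form encoding (spaces become `+`), which is one reason a dedicated helper, rather than ad-hoc calls at each site, is attractive: the quirks live in one place.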
[GitHub] josephglanville commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
josephglanville commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
URL: https://github.com/apache/incubator-druid/issues/5584#issuecomment-410030450

One thing that has become apparent is that `StringInputRowParser` is a bit of a bad citizen. It extends `ByteBufferInputRowParser`, aka `InputRowParser`. This is not a problem in and of itself, and it's completely reasonable to have an input row parser designed to read UTF-8 encoded strings from a `ByteBuffer`. The issue is that this isn't how it's used: it's primarily used via a method with the signature `InputRow parse(String)`. The fact that this takes a `String` rather than a `ByteBuffer` not only violates the assumption of the interface but also means that users of `StringInputRowParser` aren't using the `InputRowParser` interface at all, effectively making any code built on StringInputRowParser not very generic.

It seems to me that it would make sense for there to be a `StringInputRowParser` that actually is an `InputRowParser`, and a `StringDecoderInputRowParser` which inherits from `ByteBufferInputRowParser` and reads data from a `ByteBuffer`, decoding UTF-8.

I would also propose adding a `reset()` method to the `InputRowParser` interface, as it already exists on `Parser`. Currently there is `startFileFromBeginning()` on `StringInputRowParser` that calls the same function on the inner `Parser`. It seems this may be useful in general for resetting the state of stateful parsers when opening the next object/input stream.
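The proposed split might look something like this minimal sketch. All types here (`InputRow`, `InputRowParser`, `RawRow`) are simplified stand-ins for the Druid interfaces, and `StringDecoderInputRowParser` is the hypothetical name from the comment above:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

interface InputRow {}

// Simplified stand-in for Druid's generic parser interface.
interface InputRowParser<T> {
  InputRow parse(T input);
}

// Trivial row type for the sketch.
class RawRow implements InputRow {
  final String raw;

  RawRow(String raw) {
    this.raw = raw;
  }
}

// A parser that genuinely works on Strings, honoring its declared input type.
class StringRowParser implements InputRowParser<String> {
  @Override
  public InputRow parse(String line) {
    return new RawRow(line);
  }
}

// A separate ByteBuffer-based parser that decodes UTF-8 and delegates,
// keeping the ByteBuffer contract honest instead of mixing the two roles.
class StringDecoderInputRowParser implements InputRowParser<ByteBuffer> {
  private final StringRowParser delegate = new StringRowParser();

  @Override
  public InputRow parse(ByteBuffer buf) {
    return delegate.parse(StandardCharsets.UTF_8.decode(buf).toString());
  }
}
```

The point of the split is that byte-level decoding and string-level parsing become separately reusable, and neither class lies about its input type.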
[GitHub] hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section
hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410037251

@jihoonson, the scripts are there, but it doesn't look like they're working. For example:

$ ./broker.sh start
Started broker node (5282)
$ ./broker.sh status
STOPPED
[GitHub] clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns
clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns
URL: https://github.com/apache/incubator-druid/pull/6016#discussion_r207381285

## File path:
processing/src/main/java/io/druid/segment/data/ShapeShiftingColumnSerializer.java

## @@ -0,0 +1,298 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.segment.data;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import io.druid.java.util.common.io.smoosh.FileSmoosher;
+import io.druid.java.util.common.logger.Logger;
+import io.druid.segment.IndexSpec;
+import io.druid.segment.data.codecs.CompressedFormEncoder;
+import io.druid.segment.data.codecs.FormEncoder;
+import io.druid.segment.data.codecs.FormMetrics;
+import io.druid.segment.serde.Serializer;
+import io.druid.segment.writeout.SegmentWriteOutMedium;
+import io.druid.segment.writeout.WriteOutBytes;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.WritableByteChannel;
+import java.util.Map;
+import java.util.function.Function;
+
+/**
+ * Base serializer for {@link ShapeShiftingColumn} implementations, providing most common functionality such as
+ * headers, value-chunking, encoder selection, and writing out values.
+ *
+ * Encoding Selection:
+ * The intention of this base structure is that implementors of this class will analyze incoming values and aggregate
+ * facts about the data which matching {@link FormEncoder} implementations might find interesting, while storing raw,
+ * unencoded values in {@link ShapeShiftingColumnSerializer#currentChunk}. When the threshold of
+ * {@link ShapeShiftingColumnSerializer#valuesPerChunk} is reached, {@link ShapeShiftingColumnSerializer} will attempt
+ * to find the "best" encoding by first computing the encoded size with {@link FormEncoder#getEncodedSize} and then
+ * applying a modifier to scale this value in order to influence behavior when sizes are relatively close according to
+ * the chosen {@link IndexSpec.ShapeShiftOptimizationTarget}. This effectively sways the decision towards using
+ * encodings with faster decoding speed or smaller encoded size as appropriate. Note that very often the best encoding
+ * is unambiguous and these settings don't matter; the nuanced differences of behavior of
+ * {@link IndexSpec.ShapeShiftOptimizationTarget} mainly come into play when things are close.
+ *
+ * Implementors need only supply an initialize method to allocate storage for {@code }, an add value method to
+ * populate {@code }, a reset method to prepare {@code } for the next chunk after a flush, and
+ * of course, matching {@link FormEncoder} implementations to perform actual value encoding. Generic compression is
+ * available to {@link FormEncoder} implementations by implementing
+ * {@link io.druid.segment.data.codecs.CompressibleFormEncoder} and wrapping in a
+ * {@link io.druid.segment.data.codecs.CompressedFormEncoder} in the codec list passed to the serializer.
+ *
+ * layout:
+ * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | logValuesPerChunk (byte) | offsetsOutSize (int) | compositionSize (int) | composition | offsets | values |
+ *
+ * @param
+ * @param
+ */
+public abstract class ShapeShiftingColumnSerializer implements Serializer
+{
+  /**
+   * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | logValuesPerChunk (byte) | offsetsOutSize (int) | compositionSize (int) |
+   */
+  private static final int BASE_HEADER_BYTES = 1 + (3 * Integer.BYTES) + 1 + (2 * Integer.BYTES);
+
+  private static Logger log = new Logger(ShapeShiftingColumnSerializer.class);
+
+  protected final SegmentWriteOutMedium segmentWriteOutMedium;
+  protected final FormEncoder[] codecs;
+  protected final byte version;
+  protected final byte logValuesPerChunk;
+  protected final int valuesPerChunk;
+  protected final ByteBuffer intToBytesHelperBuffer;
+  protected final Map composition;
+  protected final IndexSpec.ShapeShiftOptimizationTarget optimizationTarget;
+  protected
[GitHub] gianm commented on issue #6017: Logging of invalid queries
gianm commented on issue #6017: Logging of invalid queries
URL: https://github.com/apache/incubator-druid/issues/6017#issuecomment-410080968

Sounds like a good idea -- the main challenge is probably that right now, the request logger expects to get a valid Query object, but we could add a method that accepts invalid queries as byte arrays or something like that.
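The direction floated above could be sketched like this. All names here (`RequestLogger`, `logInvalidQuery`, `InMemoryRequestLogger`) are hypothetical, purely to illustrate the shape of a logger that accepts unparseable queries as raw bytes alongside the existing valid-query path:

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface, not Druid's actual RequestLogger API.
interface RequestLogger {
  // Existing path: the query parsed and validated successfully.
  void logQuery(String queryId);

  // Proposed path: the request body could not be parsed into a Query,
  // so we log the raw bytes plus the parse error.
  void logInvalidQuery(byte[] rawQuery, String error);
}

// Simple in-memory implementation, useful for tests.
class InMemoryRequestLogger implements RequestLogger {
  final List<String> lines = new ArrayList<>();

  @Override
  public void logQuery(String queryId) {
    lines.add("valid: " + queryId);
  }

  @Override
  public void logInvalidQuery(byte[] rawQuery, String error) {
    // Record the raw payload so operators can see what clients actually sent.
    lines.add("invalid (" + error + "): " + new String(rawQuery, StandardCharsets.UTF_8));
  }
}
```

A real implementation would presumably truncate or sanitize the raw bytes before writing them to the request log.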
[GitHub] vogievetsky commented on issue #1835: Schema differences of GroupBy vs TopN/TimeSeries return data should be not be present
vogievetsky commented on issue #1835: Schema differences of GroupBy vs TopN/TimeSeries return data should be not be present
URL: https://github.com/apache/incubator-druid/issues/1835#issuecomment-410032064

I filed this issue back in the day, but since then groupBy v2 has drastically improved the situation if you are using Rune (Druid's JSON based language). If you want 100% standard results you should now use DSQL.

@gianm I think this issue can be closed.
[GitHub] pdeva closed issue #1598: upgrading from 0.6 to 0.7 will result in corruption of existing segment in realtime node
pdeva closed issue #1598: upgrading from 0.6 to 0.7 will result in corruption of existing segment in realtime node
URL: https://github.com/apache/incubator-druid/issues/1598
[GitHub] pdeva closed issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions at startup
pdeva closed issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions at startup
URL: https://github.com/apache/incubator-druid/issues/1816
[GitHub] vogievetsky commented on issue #1449: Allow sorting by timestamp
vogievetsky commented on issue #1449: Allow sorting by timestamp
URL: https://github.com/apache/incubator-druid/issues/1449#issuecomment-410036629

This is now possible with GroupBy v2 (and in Druid SQL). @gianm please confirm and close.
[GitHub] jihoonson commented on issue #6047: Minor change in the "Start up Druid services" Section
jihoonson commented on issue #6047: Minor change in the "Start up Druid services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410051112

It should work. Would you please check your broker logs?
[GitHub] gianm commented on issue #6097: Possible performance issue during DruidSchema metadata refresh
gianm commented on issue #6097: Possible performance issue during DruidSchema metadata refresh
URL: https://github.com/apache/incubator-druid/issues/6097#issuecomment-410063021

Hi @egor-ryashin,

There have been some improvements to the refreshing mechanism in later versions (try 0.12.1?). It doesn't constantly refresh metadata for historical segments, but it does refresh periodically for realtime segments, which is important because new columns can show up at any time. However, the metadata does take up space, so if you were running kind of close to heap max with `druid.sql.enable=false` before, then you'll need more heap in order to enable SQL.
[GitHub] vogievetsky commented on issue #1836: MapCache needs re-written
vogievetsky commented on issue #1836: MapCache needs re-written
URL: https://github.com/apache/incubator-druid/issues/1836#issuecomment-410031205

Did https://github.com/apache/incubator-druid/pull/3028 address this?
[GitHub] vogievetsky commented on issue #1711: null exception from filter with Query-Time Lookup
vogievetsky commented on issue #1711: null exception from filter with Query-Time Lookup
URL: https://github.com/apache/incubator-druid/issues/1711#issuecomment-410034795

Jonathan! Long time no see! I remember when this was filed... good times.

@fjy I believe this was resolved (as noted above) by #1578 and #1731 and can be closed.
[GitHub] vogievetsky commented on issue #1560: convertSpec tool completely broken
vogievetsky commented on issue #1560: convertSpec tool completely broken
URL: https://github.com/apache/incubator-druid/issues/1560#issuecomment-410036215

Is this still an issue?
[GitHub] vogievetsky commented on issue #6094: Introduce SystemSchema tables (#5989)
vogievetsky commented on issue #6094: Introduce SystemSchema tables (#5989)
URL: https://github.com/apache/incubator-druid/pull/6094#issuecomment-410038602

so excited about this one!
[GitHub] egor-ryashin opened a new issue #6097: Possible performance issue during DruidSchema metadata refresh
egor-ryashin opened a new issue #6097: Possible performance issue during DruidSchema metadata refresh
URL: https://github.com/apache/incubator-druid/issues/6097

`Druid Broker 0.11.0` with `druid.sql.enable=true` produces much more garbage than usual; profiling shows a lot of `Object[]` which are referenced by `TreeMap$Entry`, which eventually leads to `io.druid.sql.calcite.schema.DruidSchema#refreshSegments`. I speculate it constantly refreshes metadata for every segment. With `-Xmx5G` and 1mln segments it leads to an unresponsive service due to GC. Meanwhile it works fine with `druid.sql.enable=false`.
[GitHub] clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns
clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns URL: https://github.com/apache/incubator-druid/pull/6016#discussion_r207370256 ## File path: processing/src/main/java/io/druid/segment/data/ShapeShiftingColumnSerializer.java ## @@ -0,0 +1,298 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. 
+ */ + +package io.druid.segment.data; + +import com.google.common.base.Preconditions; +import com.google.common.collect.Maps; +import com.google.common.primitives.Ints; +import io.druid.java.util.common.io.smoosh.FileSmoosher; +import io.druid.java.util.common.logger.Logger; +import io.druid.segment.IndexSpec; +import io.druid.segment.data.codecs.CompressedFormEncoder; +import io.druid.segment.data.codecs.FormEncoder; +import io.druid.segment.data.codecs.FormMetrics; +import io.druid.segment.serde.Serializer; +import io.druid.segment.writeout.SegmentWriteOutMedium; +import io.druid.segment.writeout.WriteOutBytes; + +import javax.annotation.Nullable; +import java.io.IOException; +import java.nio.ByteBuffer; +import java.nio.ByteOrder; +import java.nio.channels.WritableByteChannel; +import java.util.Map; +import java.util.function.Function; + +/** + * Base serializer for {@link ShapeShiftingColumn} implementations, providing most common functionality such as headers, + * value-chunking, encoder selection, and writing out values. + * + * Encoding Selection: + * The intention of this base structure is that implementors of this class will analyze incoming values and aggregate + * facts about the data which matching {@link FormEncoder} implementations might find interesting, while storing raw, + * unencoded values in {@link ShapeShiftingColumnSerializer#currentChunk}. When the threshold of + * {@link ShapeShiftingColumnSerializer#valuesPerChunk} is reached, {@link ShapeShiftingColumnSerializer} will attempt + * to find the "best" encoding by first computing the encoded size with + * {@link FormEncoder#getEncodedSize} and then applying a modifier to scale this value in order to influence behavior + * when sizes are relatively close according to the chosen {@link IndexSpec.ShapeShiftOptimizationTarget}. This + * effectively sways the decision towards using encodings with faster decoding speed or smaller encoded size as + * appropriate. 
Note that very often the best encoding is unambiguous and these settings don't matter, the nuanced + * differences of behavior of {@link IndexSpec.ShapeShiftOptimizationTarget} mainly come into play when things are + * close. + * + * Implementors need only supply an initialize method to allocate storage for {@code }, an add value method to + * populate {@code }, a reset method to prepare {@code } for the next chunk after a flush, and + * of course, matching {@link FormEncoder} implementations to perform actual value encoding. Generic compression is + * available to {@link FormEncoder} implementations by implementing + * {@link io.druid.segment.data.codecs.CompressibleFormEncoder} and wrapping in a + * {@link io.druid.segment.data.codecs.CompressedFormEncoder} in the codec list passed to the serializer. + * + * layout: + * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | logValuesPerChunk (byte) | offsetsOutSize (int) | compositionSize (int) | composition | offsets | values | + * + * @param + * @param + */ +public abstract class ShapeShiftingColumnSerializer implements Serializer +{ + /** + * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | logValuesPerChunk (byte) | offsetsOutSize (int) | compositionSize (int) | + */ + private static final int BASE_HEADER_BYTES = 1 + (3 * Integer.BYTES) + 1 + (2 * Integer.BYTES); + + private static Logger log = new Logger(ShapeShiftingColumnSerializer.class); + + protected final SegmentWriteOutMedium segmentWriteOutMedium; + protected final FormEncoder[] codecs; + protected final byte version; + protected final byte logValuesPerChunk; + protected final int valuesPerChunk; + protected final ByteBuffer intToBytesHelperBuffer; + protected final Map composition; + protected final IndexSpec.ShapeShiftOptimizationTarget optimizationTarget; + protected
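The encoding-selection scheme described in this javadoc (compute each candidate's encoded size, scale it by a modifier reflecting decode speed, take the minimum) can be sketched roughly as follows. The `Candidate` type and modifier values here are illustrative only, not Druid's actual `FormEncoder` / `ShapeShiftOptimizationTarget` API:

```java
import java.util.Arrays;
import java.util.Comparator;

// Hypothetical sketch of "best encoding" selection: each candidate reports the size
// it would encode the current chunk to, which is scaled by a speed modifier before
// comparison so that slightly larger but faster-to-decode encodings can still win.
public class EncoderSelection
{
  static final class Candidate
  {
    final String name;
    final int encodedSize;      // bytes this encoding would produce for the chunk
    final double speedModifier; // > 1.0 penalizes slow-to-decode encodings

    Candidate(String name, int encodedSize, double speedModifier)
    {
      this.name = name;
      this.encodedSize = encodedSize;
      this.speedModifier = speedModifier;
    }
  }

  // choose the candidate minimizing encoded size scaled by the speed modifier
  public static String chooseBest(Candidate[] candidates)
  {
    return Arrays.stream(candidates)
                 .min(Comparator.comparingDouble(c -> c.encodedSize * c.speedModifier))
                 .orElseThrow(IllegalStateException::new)
                 .name;
  }

  public static void main(String[] args)
  {
    Candidate[] candidates = {
        new Candidate("constant", 8, 1.0),
        new Candidate("run-length", 400, 1.1),
        new Candidate("lz4-bytepack", 350, 1.4)
    };
    System.out.println(chooseBest(candidates)); // prints "constant": smallest and fastest
  }
}
```

As the javadoc notes, the modifiers only matter when candidate sizes are close; a chunk that is, say, constant will win outright regardless of the optimization target.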
[GitHub] hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section
hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410098043 @jihoonson, I am getting a "Not enough direct memory" error: ```
1) Not enough direct memory. Please adjust -XX:MaxDirectMemorySize, druid.processing.buffer.sizeBytes, druid.processing.numThreads, or druid.processing.numMergeBuffers: maxDirectMemory[4,294,967,296], memoryNeeded[5,368,709,120] = druid.processing.buffer.sizeBytes[536,870,912] * (druid.processing.numMergeBuffers[2] + druid.processing.numThreads[7] + 1)
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:110) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.DruidProcessingModule)
  at io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:110) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.DruidProcessingModule)
  while locating io.druid.collections.NonBlockingPool annotated with @io.druid.guice.annotations.Global()
    for the 2nd parameter of io.druid.query.groupby.GroupByQueryEngine.<init>(GroupByQueryEngine.java:81)
  at io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:88) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.QueryRunnerFactoryModule)
  while locating io.druid.query.groupby.GroupByQueryEngine
    for the 2nd parameter of io.druid.query.groupby.strategy.GroupByStrategyV1.<init>(GroupByStrategyV1.java:77)
  while locating io.druid.query.groupby.strategy.GroupByStrategyV1
    for the 2nd parameter of io.druid.query.groupby.strategy.GroupByStrategySelector.<init>(GroupByStrategySelector.java:43)
  while locating io.druid.query.groupby.strategy.GroupByStrategySelector
    for the 1st parameter of io.druid.query.groupby.GroupByQueryQueryToolChest.<init>(GroupByQueryQueryToolChest.java:104)
  at io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:101) (via modules: com.google.inject.util.Modules$OverrideModule -> com.google.inject.util.Modules$OverrideModule -> io.druid.guice.QueryRunnerFactoryModule)
```
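The arithmetic in the error above is spelled out in the message itself: needed direct memory is `druid.processing.buffer.sizeBytes * (druid.processing.numMergeBuffers + druid.processing.numThreads + 1)`. A minimal sketch of that check (class and method names are illustrative, not Druid's):

```java
// Sketch of the direct-memory sizing check described in the error message:
// memoryNeeded = buffer.sizeBytes * (numMergeBuffers + numThreads + 1)
public class DirectMemoryCheck
{
  public static long memoryNeeded(long bufferSizeBytes, int numMergeBuffers, int numThreads)
  {
    // one buffer per processing thread, plus the merge buffers, plus one spare
    return bufferSizeBytes * (numMergeBuffers + numThreads + 1);
  }

  public static void main(String[] args)
  {
    long needed = memoryNeeded(536_870_912L, 2, 7); // values from the error above
    long maxDirect = 4_294_967_296L;                // -XX:MaxDirectMemorySize=4g
    System.out.println("needed=" + needed + " available=" + maxDirect
                       + " ok=" + (needed <= maxDirect));
    // prints: needed=5368709120 available=4294967296 ok=false
  }
}
```

So with the reported settings, either raising `-XX:MaxDirectMemorySize` above ~5 GiB or lowering `buffer.sizeBytes` / `numThreads` / `numMergeBuffers` would make the check pass.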
[GitHub] josephglanville edited a comment on issue #5584: Decoupling FirehoseFactory and InputRowParser
josephglanville edited a comment on issue #5584: Decoupling FirehoseFactory and InputRowParser URL: https://github.com/apache/incubator-druid/issues/5584#issuecomment-410030450 One thing that has become apparent is that `StringInputRowParser` is a bit of a bad citizen. It extends `ByteBufferInputRowParser`, aka `InputRowParser`. This is not a problem in and of itself: it's completely reasonable to have an input row parser designed to read UTF-8 encoded strings from a `ByteBuffer`. The issue is that this isn't how it's used. It's primarily used via a function with the signature `InputRow parse(String)`, and the fact that this takes a `String` rather than a `ByteBuffer` not only violates the assumption of the interface, it also means that users of `StringInputRowParser` aren't using the `InputRowParser` interface at all, effectively making any code built on `StringInputRowParser` not very generic. It seems to me that it would make sense to have a `StringInputRowParser` that actually is an `InputRowParser` over strings, and a `StringDecoderInputRowParser` that inherits from `ByteBufferInputRowParser` and reads data from a `ByteBuffer`, decoding UTF-8. I would also propose adding a `reset()` to the `InputRowParser` interface, as it's already on `Parser`. Currently there is `startFileFromBeginning()` on `StringInputRowParser` that calls the same function on the inner `Parser`. It seems this may be useful in general for resetting the state of stateful parsers when opening the next object/input stream.
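The split proposed above could look roughly like the following sketch. Everything here is hypothetical (including the stand-in `parse` return type, since `InputRow` is not reproduced); it only illustrates the shape of a genuinely `String`-based parser plus a separate UTF-8-decoding `ByteBuffer` parser that delegates to it:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the proposed StringInputRowParser / StringDecoderInputRowParser split.
public class ParserSplitSketch
{
  interface InputRowParser<T>
  {
    List<String> parse(T input); // stand-in for InputRow, to keep the sketch self-contained

    default void reset() {}      // proposed addition for stateful parsers
  }

  // operates directly on Strings, honoring the interface it claims to implement
  static final class StringRowParser implements InputRowParser<String>
  {
    @Override
    public List<String> parse(String input)
    {
      return Arrays.asList(input.split(","));
    }
  }

  // decodes UTF-8 bytes, then delegates; this is the only place ByteBuffer appears
  static final class StringDecoderRowParser implements InputRowParser<ByteBuffer>
  {
    private final StringRowParser delegate = new StringRowParser();

    @Override
    public List<String> parse(ByteBuffer input)
    {
      return delegate.parse(StandardCharsets.UTF_8.decode(input).toString());
    }
  }

  public static void main(String[] args)
  {
    ByteBuffer buf = ByteBuffer.wrap("a,b,c".getBytes(StandardCharsets.UTF_8));
    System.out.println(new StringDecoderRowParser().parse(buf)); // prints [a, b, c]
  }
}
```

With this shape, callers that hold a `String` use the `String`-typed parser through its declared interface, instead of bypassing the `ByteBuffer` interface the current class nominally implements.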
[GitHub] vogievetsky commented on issue #6096: Windows Ubuntu Bash and Task Directory Name
vogievetsky commented on issue #6096: Windows Ubuntu Bash and Task Directory Name URL: https://github.com/apache/incubator-druid/issues/6096#issuecomment-410030657 I think this is a good point about directory names and should be considered for the project. @gianm, does it make sense to avoid contentious characters like `:`? For now, Rathish, I can tell you that Windows is solidly out of scope for Druid (and always has been). Please move to a proper Linux environment.
[GitHub] vogievetsky commented on issue #1692: Why Deep storage No data?but realtime node can find 3413610 num data
vogievetsky commented on issue #1692: Why Deep storage No data?but realtime node can find 3413610 num data URL: https://github.com/apache/incubator-druid/issues/1692#issuecomment-410035126 @zengzhihai110 can this issue be closed?
[GitHub] gianm commented on issue #6076: Mutual TLS support
gianm commented on issue #6076: Mutual TLS support URL: https://github.com/apache/incubator-druid/pull/6076#issuecomment-410063706 @jon-wei, thanks, I think it is really valuable especially as we add more features to the feature. (I heard you like features so we put features in your features?)
[GitHub] vogievetsky commented on issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions at startup
vogievetsky commented on issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions at startup URL: https://github.com/apache/incubator-druid/issues/1816#issuecomment-410033063 This issue is pretty old and does not look actionable (who even knows where to get Druid 0.8.1 these days). @fjy I suggest we close it.
[GitHub] gianm commented on issue #6066: Sorting rows when rollup is disabled
gianm commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410039394 IMO the speed hit on ingestion is acceptable. It is small, and performance is not really worse than the rollup case. Adding sorting will incur a performance cost no matter what: either it has to happen continuously or it has to happen in one big shot before persisting. Any idea where the speed hit on groupBys comes from? It seems strange that other query types wouldn't care but groupBy does. At any rate, I am not super worried about it, since in most clusters QueryableIndex storage/perf is a much more serious issue than IncrementalIndex query perf. If we end up incurring a small speed hit on IncrementalIndex queries in order to get better compression on QueryableIndexes, then that is a good tradeoff.
[GitHub] vogievetsky commented on issue #1204: Define granularities in UTC
vogievetsky commented on issue #1204: Define granularities in UTC URL: https://github.com/apache/incubator-druid/issues/1204#issuecomment-410039263 Is this still an issue after https://github.com/apache/incubator-druid/pull/4611 ?
[GitHub] jihoonson closed issue #5934: eccn
jihoonson closed issue #5934: eccn URL: https://github.com/apache/incubator-druid/issues/5934
[GitHub] gianm commented on issue #6066: Sorting rows when rollup is disabled
gianm commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410052884 > I'm guessing Druid does the rollup by sorting the dimensions in the order in the ingestion spec, which would indicate that dimension ordering would matter. Would ordering dims based on cardinality help further? I'm guessing low->high card would be better than high->low, but idk enough about LZ(4?) compression to have a good intuition on this. Yep, it does sort in the order that you provide them in the ingestion spec. LZ4 works by finding and eliminating redundancies in a sliding window, so it works best if you have a lot of local redundancy. CONCISE / Roaring also work better in that case (they can encode runs). And selecting values out of columns is faster if you are selecting rows that are close together. If you mostly care about size, I think you should do best starting with a lower cardinality column, especially one that a lot of other columns will have some kind of dependency on (like if you have a 'country' column and each region, city, user, etc will generally just be in one country). That way you're maximizing compression for as many columns as possible. If you care about retrieval speed too then it helps to think about locality: if you have a column you often filter on, you want it first/early in the sort order. Sometimes a column satisfies both of these, and in that case life is good (e.g. "tenant_id" for a multi-tenant-one-table design). In the case I mentioned (where we saw dramatically better compression after sorting) it was an e-commerce dataset, and we sorted rows first on product brand, then on product id. Btw, this line of thinking is most effective if your queryGranularity is coarse, since Druid always sorts time _first_ and then your other dimensions.
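The ordering gianm describes (time first, then low-cardinality, widely-correlated dimensions before high-cardinality ones) can be illustrated with a toy row sort; the `Row` shape and data are made up for the example:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch of dimension-order sorting for compression locality.
// Sorting on a low-cardinality column (country) before a high-cardinality one (city)
// groups equal values together, giving LZ4's sliding window long matches and letting
// run-friendly bitmap codecs (CONCISE / Roaring) encode runs.
public class DimensionOrderSketch
{
  static final class Row
  {
    final long timestamp;
    final String country; // low cardinality; many other columns depend on it
    final String city;    // high cardinality

    Row(long timestamp, String country, String city)
    {
      this.timestamp = timestamp;
      this.country = country;
      this.city = city;
    }
  }

  public static void sortForCompression(List<Row> rows)
  {
    rows.sort(
        Comparator.comparingLong((Row r) -> r.timestamp) // time always sorts first
                  .thenComparing((Row r) -> r.country)   // low cardinality early
                  .thenComparing((Row r) -> r.city)      // high cardinality later
    );
  }

  public static void main(String[] args)
  {
    List<Row> rows = new ArrayList<>();
    rows.add(new Row(0, "US", "NYC"));
    rows.add(new Row(0, "FR", "Paris"));
    rows.add(new Row(0, "US", "SF"));
    rows.add(new Row(0, "FR", "Lyon"));
    sortForCompression(rows);
    // equal-country rows are now adjacent
    System.out.println(rows.get(0).country + "/" + rows.get(0).city); // prints FR/Lyon
  }
}
```

As noted above, this only pays off when queryGranularity is coarse enough that many rows share a timestamp; otherwise the time-first sort dominates.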
[GitHub] vogievetsky commented on issue #1827: JsonConfigurator does not escape quoted options
vogievetsky commented on issue #1827: JsonConfigurator does not escape quoted options URL: https://github.com/apache/incubator-druid/issues/1827#issuecomment-410032588 Does https://github.com/apache/incubator-druid/pull/2033 address this?
[GitHub] mistercrunch commented on issue #6066: Sorting rows when rollup is disabled
mistercrunch commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410043357 I'm guessing Druid does the rollup by sorting the dimensions in the order in the ingestion spec, which would indicate that dimension ordering would matter. Would ordering dims based on cardinality help further? I'm guessing low->high card would be better than high->low, but idk enough about LZ(4?) compression to have a good intuition on this. I'm also unclear on whether low->high card dim ordering would make it harder/easier on the sorting algorithm. (?)
[GitHub] gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser URL: https://github.com/apache/incubator-druid/issues/5584#issuecomment-410045743 Yeah -- StringInputRowParser is a bit of a special snowflake. Maybe doing this feature is the motivation we need to rethink how that all works. To add to the points you raised, there's also this gross code in TransformSpec (which decorates InputRowParsers with expression-based transforms): ```
if (parser instanceof StringInputRowParser) {
  // Hack to support the fact that some callers use special methods in StringInputRowParser, such as
  // parse(String) and startFileFromBeginning.
  return (InputRowParser) new TransformingStringInputRowParser(
      parser.getParseSpec(),
      ((StringInputRowParser) parser).getEncoding(),
      this
  );
} else {
  return new TransformingInputRowParser<>(parser, this);
}
``` And there's also the warty fact that ParseSpec's `makeParser()` method is actually only used by StringInputRowParser (and a couple of extension parsers that work by converting inputs to JSON and then parsing them as strings). Other parsers call `getTimestampSpec` and `getDimensionsSpec`, but they don't call `makeParser`. Some parsers, however, can extract useful things from certain types of ParseSpecs. The way this is done is weird, leading to code like this in the Avro extension to yank out an optional FlattenSpec: ```
final JSONPathSpec flattenSpec;
if (parseSpec != null && (parseSpec instanceof AvroParseSpec)) {
  flattenSpec = ((AvroParseSpec) parseSpec).getFlattenSpec();
} else {
  flattenSpec = JSONPathSpec.DEFAULT;
}
``` The only reason StringInputRowParser doesn't have to do something like this is that the method _it_ needs (makeParser) has been granted special status as a member of the ParseSpec interface (even though many ParseSpecs don't implement it, they just throw exceptions). If we are wanting to be ambitious (I don't see why not!) then I think we can make nicer, clearer interfaces by rethinking how this all works. In a backwards compatible way, since we don't want all of our users to have to rewrite their ingestion specs (which will probably be the tricky part). @josephglanville what ambition level are you interested in having here?
[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
clintropolis commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410053497 @gianm Cool, I'll go ahead and make a PR as soon as I get some tests finished up. I'm not certain of the cause of the slower group-by performance; group by v2 doesn't seem to have any special additional interaction with the facts holder. Is this worth looking into further?
[GitHub] clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns
clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns URL: https://github.com/apache/incubator-druid/pull/6016#discussion_r207370004 ## File path: processing/src/main/java/io/druid/segment/data/codecs/ints/IntFormMetrics.java ## @@ -0,0 +1,166 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. The ASF licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package io.druid.segment.data.codecs.ints; + +import io.druid.segment.IndexSpec; +import io.druid.segment.data.codecs.FormMetrics; + +/** + * Aggregates statistics about blocks of integer values, such as total number of values processed, minimum and maximum + * values encountered, if the chunk is constant or all zeros, and various facts about data which is repeated more than + * twice ('runs') including number of distinct runs, longest run length, and total number of runs. This information is + * collected by {@link io.druid.segment.data.ShapeShiftingColumnarIntsSerializer} which processing row values, and is + * provided to {@link IntFormEncoder} implementations to do anything from estimate encoded size to influencing how + * {@link io.druid.segment.data.ShapeShiftingColumnarIntsSerializer} decides whether or not to employ that particular + * encoding. 
+ */ +public class IntFormMetrics extends FormMetrics +{ + private int minValue = Integer.MAX_VALUE; + private int maxValue = Integer.MIN_VALUE; + private int numRunValues = 0; + private int numDistinctRuns = 0; + private int longestRun; + private int currentRun; + private int previousValue; + private int numValues = 0; + private boolean isFirstValue = true; + Review comment: Javadocs were on getters, re-arranged code so reader will have to scan over those before getting to `processNextRow`
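As a rough illustration of the statistics this javadoc describes (not Druid's actual `IntFormMetrics` implementation), a scanner that tracks min/max and counts "runs" of a value repeated more than twice might look like:

```java
// Illustrative run-statistics scanner: tracks min/max over a stream of ints and, for
// values repeated more than twice, the number of distinct runs and the longest run.
public class RunMetricsSketch
{
  int minValue = Integer.MAX_VALUE;
  int maxValue = Integer.MIN_VALUE;
  int numDistinctRuns = 0;
  int longestRun = 0;
  private int currentRun = 0;
  private int previousValue;
  private boolean isFirstValue = true;

  public void processNextRow(int value)
  {
    minValue = Math.min(minValue, value);
    maxValue = Math.max(maxValue, value);
    if (!isFirstValue && value == previousValue) {
      currentRun++;
    } else {
      closeRun();
      currentRun = 1;
    }
    previousValue = value;
    isFirstValue = false;
  }

  // call once after the last value so the trailing run is counted
  public void complete()
  {
    closeRun();
  }

  // a run only counts once a value has repeated more than twice
  private void closeRun()
  {
    if (currentRun > 2) {
      numDistinctRuns++;
      longestRun = Math.max(longestRun, currentRun);
    }
  }

  public static void main(String[] args)
  {
    RunMetricsSketch metrics = new RunMetricsSketch();
    for (int v : new int[]{7, 7, 7, 7, 1, 2, 2, 9, 9, 9}) {
      metrics.processNextRow(v);
    }
    metrics.complete();
    System.out.println("runs=" + metrics.numDistinctRuns + " longest=" + metrics.longestRun
                       + " min=" + metrics.minValue + " max=" + metrics.maxValue);
    // prints: runs=2 longest=4 min=1 max=9
  }
}
```

An encoder such as a run-length codec could use these aggregates to estimate its encoded size, or to opt out entirely when there are no qualifying runs.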
[GitHub] hpandeycodeit edited a comment on issue #6047: Minor change in the "Start up Druid services" Section
hpandeycodeit edited a comment on issue #6047: Minor change in the "Start up Druid services" Section URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410037251 @jihoonson, the scripts are there, but it doesn't look like they're working. For example: ```
$ ./broker.sh start
Started broker node (5282)
$ ./broker.sh status
STOPPED
``` So it's not starting. Am I missing anything needed to start these? I just want to see how it works before updating the doc.
[GitHub] chengchengpei commented on a change in pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei commented on a change in pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072#discussion_r207408616 ## File path: extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java ## @@ -41,9 +42,11 @@ public DerivativeDataSourceMetadata( @JsonProperty("metrics") Set metrics ) { -this.baseDataSource = Preconditions.checkNotNull(baseDataSource, "baseDataSource cannot be null. This is not a valid DerivativeDataSourceMetadata."); this.dimensions = Preconditions.checkNotNull(dimensions, "dimensions cannot be null. This is not a valid DerivativeDataSourceMetadata."); this.metrics = Preconditions.checkNotNull(metrics, "metrics cannot be null. This is not a valid DerivativeDataSourceMetadata."); + +Preconditions.checkArgument(!Strings.isNullOrEmpty(baseDataSource), "baseDataSource cannot be null or empty. Please provide a baseDataSource."); +this.baseDataSource = baseDataSource; Review comment: @jihoonson fixed.
[GitHub] chengchengpei closed pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei closed pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

```diff
diff --git a/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java b/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
index ed102e354e0..3a82672e14d 100644
--- a/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
+++ b/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
@@ -23,6 +23,7 @@
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Sets;
+import com.google.common.base.Strings;
 import io.druid.indexing.overlord.DataSourceMetadata;

 import java.util.Objects;
@@ -41,7 +42,9 @@ public DerivativeDataSourceMetadata(
     @JsonProperty("metrics") Set<String> metrics
 )
 {
-    this.baseDataSource = Preconditions.checkNotNull(baseDataSource, "baseDataSource cannot be null. This is not a valid DerivativeDataSourceMetadata.");
+    Preconditions.checkArgument(!Strings.isNullOrEmpty(baseDataSource), "baseDataSource cannot be null or empty. Please provide a baseDataSource.");
+    this.baseDataSource = baseDataSource;
+
     this.dimensions = Preconditions.checkNotNull(dimensions, "dimensions cannot be null. This is not a valid DerivativeDataSourceMetadata.");
     this.metrics = Preconditions.checkNotNull(metrics, "metrics cannot be null. This is not a valid DerivativeDataSourceMetadata.");
 }
diff --git a/extensions-contrib/materialized-view-maintenance/src/test/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadataTest.java b/extensions-contrib/materialized-view-maintenance/src/test/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadataTest.java
new file mode 100644
index 000..d62ea385d31
--- /dev/null
+++ b/extensions-contrib/materialized-view-maintenance/src/test/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadataTest.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.indexing.materializedview;
+
+import com.google.common.collect.Sets;
+import org.hamcrest.CoreMatchers;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import java.util.Set;
+
+
+public class DerivativeDataSourceMetadataTest
+{
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void testEmptyBaseDataSource() throws Exception
+  {
+    expectedException.expect(CoreMatchers.instanceOf(IllegalArgumentException.class));
+    expectedException.expectMessage(
+        "baseDataSource cannot be null or empty. Please provide a baseDataSource."
+    );
+    String baseDataSource = "";
+    Set<String> dims = Sets.newHashSet("dim1", "dim2", "dim3");
+    Set<String> metrics = Sets.newHashSet("cost");
+    DerivativeDataSourceMetadata metadata = new DerivativeDataSourceMetadata(baseDataSource, dims, metrics);
+  }
+
+  @Test
+  public void testNullBaseDataSource() throws Exception
+  {
+    expectedException.expect(CoreMatchers.instanceOf(IllegalArgumentException.class));
+    expectedException.expectMessage(
+        "baseDataSource cannot be null or empty. Please provide a baseDataSource."
+    );
+    String baseDataSource = null;
+    Set<String> dims = Sets.newHashSet("dim1", "dim2", "dim3");
+    Set<String> metrics = Sets.newHashSet("cost");
+    DerivativeDataSourceMetadata metadata = new DerivativeDataSourceMetadata(baseDataSource, dims, metrics);
+  }
+}
```

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
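The constructor change above can be sketched in plain Java. This is a minimal stand-in, not the Druid class itself: `BaseDataSourceCheck` is a hypothetical name, and hand-rolled helpers replace Guava's `Strings.isNullOrEmpty` and `Preconditions.checkArgument` so the snippet stands alone.

```java
// Minimal sketch of the baseDataSource validation added in this PR.
// Plain-Java stand-ins are used for Guava's Strings/Preconditions.
class BaseDataSourceCheck
{
  static boolean isNullOrEmpty(String s)
  {
    return s == null || s.isEmpty();
  }

  // Mirrors Preconditions.checkArgument(!Strings.isNullOrEmpty(...), msg):
  // rejects both null and "" with the same IllegalArgumentException.
  static String requireNonEmpty(String baseDataSource)
  {
    if (isNullOrEmpty(baseDataSource)) {
      throw new IllegalArgumentException(
          "baseDataSource cannot be null or empty. Please provide a baseDataSource.");
    }
    return baseDataSource;
  }
}
```

The point of switching from `checkNotNull` to an argument check is that `""` passes a null check but is still not a usable data source name.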
[GitHub] clintropolis edited a comment on issue #6066: Sorting rows when rollup is disabled
clintropolis edited a comment on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410153475 I ran some additional benchmarks after realizing that the rows generated for the previous benchmarks offered no opportunity for actual rollup to occur (all segments were approximately the same size for the numbers above). Here are timeseries benches with moderate rollup opportunity:

```
Benchmark                                       (numSegments) (rollupSchema)    (rowsPerSegment) (schemaAndQuery) Mode Cnt      Score       Error Units
TimeseriesBenchmark.querySingleIncrementalIndex             1 no-rollup                       75          basic.A avgt  25 663840.128 ± 26363.127 us/op
TimeseriesBenchmark.querySingleIncrementalIndex             1 ordered-no-rollup               75          basic.A avgt  25 679784.179 ± 81577.842 us/op
TimeseriesBenchmark.querySingleIncrementalIndex             1 rollup                          75          basic.A avgt  25  62446.589 ±  2224.296 us/op

no-rollup: size [22387432] bytes.
ordered-no-rollup: size [18195470] bytes.
rollup: size [2206430] bytes.
```

and heavy rollup potential:

```
Benchmark                                       (numSegments) (rollupSchema)    (rowsPerSegment) (schemaAndQuery) Mode Cnt      Score       Error Units
TimeseriesBenchmark.querySingleIncrementalIndex             1 no-rollup                       75          basic.A avgt  25 653316.845 ± 31964.338 us/op
TimeseriesBenchmark.querySingleIncrementalIndex             1 ordered-no-rollup               75          basic.A avgt  25 769623.711 ± 12299.182 us/op
TimeseriesBenchmark.querySingleIncrementalIndex             1 rollup                          75          basic.A avgt  25   6545.777 ±   607.087 us/op

no-rollup: size [22383561] bytes.
ordered-no-rollup: size [16900327] bytes.
rollup: size [237206] bytes.
```

and TopN, moderate rollup:

```
Benchmark                                 (numSegments) (rollupSchema)    (rowsPerSegment) (schemaAndQuery) (threshold) Mode Cnt      Score      Error Units
TopNBenchmark.querySingleIncrementalIndex             1 no-rollup                       75          basic.A          10 avgt  25 893805.325 ±  9592.710 us/op
TopNBenchmark.querySingleIncrementalIndex             1 ordered-no-rollup               75          basic.A          10 avgt  25 898036.822 ±  8052.554 us/op
TopNBenchmark.querySingleIncrementalIndex             1 rollup                          75          basic.A          10 avgt  25  86100.936 ±  2844.073 us/op

no-rollup: size [22387432] bytes.
ordered-no-rollup: size [18195470] bytes.
rollup: size [2206430] bytes.
```

heavy rollup:

```
Benchmark                                 (numSegments) (rollupSchema)    (rowsPerSegment) (schemaAndQuery) (threshold) Mode Cnt      Score      Error Units
TopNBenchmark.querySingleIncrementalIndex             1 no-rollup                       75          basic.A          10 avgt  25 888967.034 ± 25098.293 us/op
TopNBenchmark.querySingleIncrementalIndex             1 ordered-no-rollup               75          basic.A          10 avgt  25 987568.305 ± 50955.718 us/op
TopNBenchmark.querySingleIncrementalIndex             1 rollup                          75          basic.A          10 avgt  25   8820.929 ±   699.516 us/op

no-rollup: size [22383561] bytes.
ordered-no-rollup: size [16900327] bytes.
rollup: size [237206] bytes.
```

It would appear that the performance difference is more notable when the `Deque`s are deeper, at least for topN and timeseries, since the previous benchmarks were basically comparing flat maps with the same number of keys and single-element `Deque`s. Size savings will likely vary quite wildly based on dimension order, correlated with how effective rollup would be if it were enabled at default millisecond granularity. In this case, with a few low-cardinality dimensions and 1-10k events per timestamp, sizes were 20-25% smaller.
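To make the size numbers above intuitive: rollup collapses rows that share a (truncated) timestamp and identical dimension values into one aggregated row. This is only a toy illustration of that idea, not Druid's actual `IncrementalIndex`; the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy illustration of rollup: rows with the same (timestamp, dimensions)
// key are merged by summing their metric, so the row count shrinks in
// proportion to how many duplicates the data contains.
class RollupSketch
{
  static Map<String, Long> rollup(String[][] rows)
  {
    // Each input row: {timestamp, dimensionValue, metricValue}
    Map<String, Long> rolled = new LinkedHashMap<>();
    for (String[] row : rows) {
      String key = row[0] + "|" + row[1];
      rolled.merge(key, Long.parseLong(row[2]), Long::sum);
    }
    return rolled;
  }
}
```

With heavy duplication (the "heavy rollup potential" schema above), most input rows merge into existing keys, which is why the rolled-up segment is an order of magnitude smaller.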
[GitHub] asdf2014 edited a comment on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
asdf2014 edited a comment on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory` URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410119637 Hi, @jihoonson . Yep, that's right. :+1: Perhaps something in this PR is still useful, for example, using `channel.isOpen()` to check that the channel is open before calling the `channel.close()` method, and using `ExpectedException` instead of `Assert.assertTrue`. Do you want me to close this, or change the title and keep those useful changes, or patch [these changes](https://github.com/apache/incubator-druid/compare/master...jihoonson:fix-netty-client?expand=1) into this PR? :sweat_smile:
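The `channel.isOpen()` guard mentioned above can be sketched generically. The `Chan` interface below is a hypothetical stand-in for Netty's `Channel` (the real API is not needed to show the idea), and `closeQuietly`/`FakeChan` are invented names for illustration.

```java
// Sketch of the guarded-close pattern: only close a channel that is
// still open, so a double close (or a close of an already-closed
// channel) becomes a no-op instead of an error.
class GuardedClose
{
  interface Chan
  {
    boolean isOpen();
    void close();
  }

  // Test double that records how many times close() was invoked.
  static class FakeChan implements Chan
  {
    boolean open = true;
    int closeCalls = 0;

    public boolean isOpen() { return open; }

    public void close()
    {
      closeCalls++;
      open = false;
    }
  }

  static void closeQuietly(Chan channel)
  {
    if (channel.isOpen()) {
      channel.close();
    }
  }
}
```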
[GitHub] chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072 https://github.com/apache/incubator-druid/issues/6071 The changes validate that baseDataSource is not an empty string, similar to https://github.com/apache/incubator-druid/pull/5785. Ready for review now. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment.
[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory` URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410121501 @jihoonson Okay, I will keep those changes and patch them in to get the job done. Thanks for your comments.
[GitHub] jacktomcat closed issue #6098: Inconsistencies in the result of the quantile aggregator
jacktomcat closed issue #6098: Inconsistencies in the result of the quantile aggregator URL: https://github.com/apache/incubator-druid/issues/6098
[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory` URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410165818 Hi, @jihoonson . After patching these changes, I found that the same problem still exists in `ChannelResourceFactory`; you can run the `JankyServersTest#testHttpsEchoServer` test case to reproduce it. So, I added another anonymous class extending `SimpleChannelUpstreamHandler` to fix it. PTAL.
[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory` URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410170672 BTW, when I tried to use `ExpectedException` instead of `Assert.assertTrue` in `JankyServersTest` for other test cases, I realized the `JankyServersTest#isChannelClosedException` situation was hard to convert to the `ExpectedException` style, so I created a subclass of `TypeSafeMatcher` to solve the problem.

```java
private static class CauseMatcher extends TypeSafeMatcher<Throwable>
{
  private final Class<? extends Throwable> expectedType;
  private final String expectedMessage;
  private final boolean isRegex;

  public CauseMatcher(Class<? extends Throwable> expectedType, String expectedMessage)
  {
    this.expectedType = expectedType;
    this.expectedMessage = expectedMessage;
    this.isRegex = false;
  }

  public CauseMatcher(Class<? extends Throwable> expectedType, String expectedMessage, boolean isRegex)
  {
    this.expectedType = expectedType;
    this.expectedMessage = expectedMessage;
    this.isRegex = isRegex;
  }

  @Override
  protected boolean matchesSafely(Throwable item)
  {
    if (item == null || item.getMessage() == null) {
      return false;
    }
    if (!expectedType.isAssignableFrom(item.getClass())) {
      return false;
    }
    if (isRegex) {
      return Pattern.compile(expectedMessage).matcher(item.getMessage()).find();
    } else {
      return item.getMessage().contains(expectedMessage);
    }
  }

  @Override
  public void describeTo(Description description)
  {
    description.appendText("expects type is ")
               .appendValue(expectedType)
               .appendText(" and message is ")
               .appendValue(expectedMessage);
  }
}
```

Then, the following code expresses the `JankyServersTest#isChannelClosedException` logic:

```java
expectedException.expectCause(
    anyOf(
        new CauseMatcher(ChannelException.class, "Faulty channel in resource pool"),
        new CauseMatcher(IOException.class, ".*Connection reset by peer.*", true)
    )
);
```

However, this change adds quite a few new lines of code. If it only solved this single situation it would be a high cost, but if it can be used as a common util class for other test cases, then it might be worth adding. What do you think? @jihoonson
[GitHub] dyanarose opened a new issue #6100: DOCS: ingestion spec documentation should be expanded
dyanarose opened a new issue #6100: DOCS: ingestion spec documentation should be expanded URL: https://github.com/apache/incubator-druid/issues/6100
- The dataSchema would be much clearer if an example of the raw data were included.
- The importance of the metric spec would be clearer if there were a dedicated section explaining how it is used when querying, along with example queries made possible by the example metrics spec (some information is spread across the design document and the schema-design document, but the metricSpec should have its own dedicated and complete documentation).

https://github.com/apache/incubator-druid/blob/master/docs/content/ingestion/index.md
[GitHub] asdf2014 commented on issue #6099: Inconsistencies in the result of the quantile aggregator
asdf2014 commented on issue #6099: Inconsistencies in the result of the quantile aggregator URL: https://github.com/apache/incubator-druid/issues/6099#issuecomment-410201965 Hi, @jacktomcat . This result is obtained because the quantile aggregator uses an approximate algorithm called a `sketch`. You can find it in the Druid [doc](http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html). A [paper](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) and a [blog](https://metamarkets.com/2013/histograms/) are also linked in the document for learning the algorithm in depth. BTW, there is a [table](https://datasketches.github.io/docs/Quantiles/QuantilesAccuracy.html) of Quantiles DoublesSketch size in bytes versus approximate error, which might help you.
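For contrast with the sketch-based estimate described above, an exact quantile over a small in-memory array can be computed directly (nearest-rank method shown here; sketches trade this exactness for bounded memory over large streams). `ExactQuantile` is an illustrative helper, not part of Druid.

```java
import java.util.Arrays;

// Exact quantile by the nearest-rank method, for contrast with the
// approximate (sketch-based) answer Druid's quantile aggregator returns.
class ExactQuantile
{
  static double quantile(double[] values, double q)
  {
    double[] sorted = values.clone();
    Arrays.sort(sorted);
    // Nearest-rank: index ceil(q * n) - 1, clamped to a valid index.
    int idx = (int) Math.ceil(q * sorted.length) - 1;
    if (idx < 0) {
      idx = 0;
    }
    return sorted[idx];
  }
}
```

An exact computation like this requires holding (or sorting) all values, which is exactly what sketches avoid; the linked accuracy table quantifies the error bound paid for that saving.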
[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled
clintropolis commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410183701 OK, I had to know, so I went ahead and ran benchmarks for the other approach, sorting at persist time. No rollup opportunity:

```
Benchmark                        (rollup)        (rowsPerSegment) (schema) Mode Cnt      Score        Error Units
IndexPersistBenchmark.persistV9  true                       75000    basic avgt  25 499315.212 ± 154036.971 us/op
IndexPersistBenchmark.persistV9  false                      75000    basic avgt  25 449792.742 ±  28218.504 us/op
IndexPersistBenchmark.persistV9  false (ordered)            75000    basic avgt  25 508051.563 ±  63033.662 us/op

all size: [3038874] bytes.
```

moderate rollup opportunity:

```
Benchmark                        (rollup)        (rowsPerSegment) (schema) Mode Cnt      Score       Error Units
IndexPersistBenchmark.persistV9  true                       75000    basic avgt  25 406840.576 ± 20732.769 us/op
IndexPersistBenchmark.persistV9  false                      75000    basic avgt  25 431725.214 ± 18793.693 us/op
IndexPersistBenchmark.persistV9  false (ordered)            75000    basic avgt  25 494056.572 ± 34396.770 us/op

rollup: size [2285574] bytes.
no-rollup: size [2741399] bytes.
ordered-no-rollup: size [2516639] bytes.
```

more rollup:

```
Benchmark                        (rollup)        (rowsPerSegment) (schema) Mode Cnt      Score       Error Units
IndexPersistBenchmark.persistV9  true                       75000    basic avgt  25 338251.339 ± 22031.319 us/op
IndexPersistBenchmark.persistV9  false                      75000    basic avgt  25 443272.327 ± 25099.425 us/op
IndexPersistBenchmark.persistV9  false (ordered)            75000    basic avgt  25 552234.263 ± 41889.207 us/op

rollup: size [1755456] bytes.
no-rollup: size [2741017] bytes.
ordered-no-rollup: size [2346649] bytes.
```

I don't have strong feelings about the best way to do this; the persist performance cost looks to be in the range of 15-20% slower here. Maybe it is better to sort at persist time so as not to risk impacting query performance?
[GitHub] asdf2014 commented on issue #6047: Minor change in the "Start up Druid services" Section
asdf2014 commented on issue #6047: Minor change in the "Start up Druid services" Section URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410184869 Hi, @hpandeycodeit . I also encountered this situation. There is a [formula](http://druid.io/docs/latest/configuration/broker.html#processing) for it in the Druid doc. In my experience with this problem, you may need to add the `-XX:MaxDirectMemorySize` option to both the `conf/druid/historical/jvm.config` and `conf/druid/broker/jvm.config` files. :sweat_smile:
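For concreteness, a `jvm.config` along these lines is what the comment is suggesting. The values below are illustrative only (not a recommendation); the linked docs size direct memory as roughly `druid.processing.buffer.sizeBytes * (druid.processing.numThreads + druid.processing.numMergeBuffers + 1)`, so compute your own figure from those settings.

```
# conf/druid/broker/jvm.config (similarly for historical/jvm.config) — illustrative values only
-server
-Xms8g
-Xmx8g
-XX:MaxDirectMemorySize=4096m
```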
[GitHub] aryaflow commented on issue #5221: Support Hadoop batch ingestion for druid-azure-extensions
aryaflow commented on issue #5221: Support Hadoop batch ingestion for druid-azure-extensions URL: https://github.com/apache/incubator-druid/pull/5221#issuecomment-410241234 @hoesler hadoop-azure is not added by default. But it's needed.
[GitHub] robertervin opened a new issue #6101: Druid InDimFilter Fails on Empty Values
robertervin opened a new issue #6101: Druid InDimFilter Fails on Empty Values URL: https://github.com/apache/incubator-druid/issues/6101

### Use Case

We dynamically build Druid queries by allowing the user to filter in the UI. Sometimes this results in the user filtering out all options (sometimes unintentionally), which is something we want to allow for.

### Problem

In https://github.com/apache/incubator-druid/blob/04ea3c9f8c1f5ea34b023217bd709509ace4d30d/processing/src/main/java/io/druid/query/filter/InDimFilter.java#L78 Druid fails when filtering on empty values, though I see no reason why it should, since the result should just be empty. E.g., in pseudocode:

```
mydatasource.filter(mycolumn in []) = []
```

Is there a better reason for this, or can I submit a PR to remove that restriction?
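The semantics the issue asks for — an `in` filter over an empty value set simply matches nothing — can be sketched as a predicate. `InFilterSketch` is a hypothetical stand-in, not Druid's actual `InDimFilter`.

```java
import java.util.Set;
import java.util.function.Predicate;

// Sketch of 'in' filter semantics: a row matches when its dimension value
// is in the given set. An empty set therefore matches no rows at all,
// which is the behavior the issue argues should be allowed rather than
// rejected at construction time.
class InFilterSketch
{
  static Predicate<String> inFilter(Set<String> values)
  {
    return values::contains;  // empty set => always false
  }
}
```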
[GitHub] aryaflow commented on issue #5221: Support Hadoop batch ingestion for druid-azure-extensions
aryaflow commented on issue #5221: Support Hadoop batch ingestion for druid-azure-extensions URL: https://github.com/apache/incubator-druid/pull/5221#issuecomment-410234843 Thanks for the feature. I tried this in a cluster and succeeded in making it work... but I had to add wasb to JobHelper.java as well:

```java
} else if ("wasbs".equals(type)) {
  segmentLocURI = URI.create(loadSpec.get("path").toString());
}
```
[GitHub] bohemia420 commented on issue #5150: Druid-parquet-extensions fails on timestamps (stored as INT96) in parquet files
bohemia420 commented on issue #5150: Druid-parquet-extensions fails on timestamps (stored as INT96) in parquet files URL: https://github.com/apache/incubator-druid/issues/5150#issuecomment-410239779 Is this resolved? @amalakar, how did you remove the avro converter? @gianm, how does one change the parquet reading strategy?
[GitHub] chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-410242447 @jihoonson NP. How can I become a committer?
[GitHub] aoeiuvb opened a new issue #6102: "Cannot have same delimiter and list delimiter of \u0001"
aoeiuvb opened a new issue #6102: "Cannot have same delimiter and list delimiter of \u0001" URL: https://github.com/apache/incubator-druid/issues/6102 When I used hadoop to ingest data, I used the DelimitedParser to parse the rows. The delimiter I set was **\u0001**, but the following error was reported: "**Cannot have the same delimiter and list delimiter of \u0001**". I looked at the source code of the DelimitedParser and found that in the constructor, the delimiter and listDelimiter are checked to see if they are equal, and if so, an error is reported. I did not set the value of listDelimiter, so it took its default value, which is also **\u0001**, and this error was reported. To work around this, I explicitly specified a different value for listDelimiter so that the program could pass. May I ask if there is any problem with the design of this piece? (screenshot: https://user-images.githubusercontent.com/15936294/43644856-be5ec84a-9762-11e8-8cb1-899a8bb02098.png)
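The clash can be reproduced in miniature: if the column delimiter equals the list (multi-value) delimiter, a multi-value field is indistinguishable from extra columns, which is presumably why the constructor rejects the combination. This is a toy version of that check (`DelimiterCheck` is an invented name, not the real parser):

```java
// Toy version of the delimiter check described above: the column
// delimiter and the list delimiter must differ, because otherwise
// a row like "a\u0001b" could mean either two columns or one
// multi-value column with two entries.
class DelimiterCheck
{
  static void validate(String delimiter, String listDelimiter)
  {
    if (delimiter.equals(listDelimiter)) {
      throw new IllegalArgumentException(
          "Cannot have same delimiter and list delimiter of [" + delimiter + "]");
    }
  }
}
```

So the workaround in the issue — explicitly choosing a listDelimiter that differs from the delimiter — is the intended way to satisfy the check.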
[GitHub] leventov commented on a change in pull request #5957: Renamed GenericColumnSerializer to ColumnSerializer; 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; Fixe
leventov commented on a change in pull request #5957: Renamed GenericColumnSerializer to ColumnSerializer; 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; Fixed a bug in SingleStringInputDimensionSelector; misc refinements URL: https://github.com/apache/incubator-druid/pull/5957#discussion_r207604065

## File path: benchmarks/src/main/java/io/druid/benchmark/package-info.java

```diff
@@ -17,17 +17,7 @@
  * under the License.
  */

-package io.druid.segment;
+@EverythingIsNonnullByDefault
+package io.druid.benchmark;

-import io.druid.guice.annotations.ExtensionPoint;
-import io.druid.segment.serde.Serializer;
-
-import java.io.IOException;
-
-@ExtensionPoint
-public interface GenericColumnSerializer extends Serializer
```

Review comment: The next release is a major release (0.12.x -> 0.13). I don't think we should fall into conservatism (not fixing things because users may depend on them) until Druid 1.0, and should clean up APIs eagerly before that.
[GitHub] himanshug commented on issue #6066: Sorting rows when rollup is disabled
himanshug commented on issue #6066: Sorting rows when rollup is disabled URL: https://github.com/apache/incubator-druid/issues/6066#issuecomment-410326897 @clintropolis if you do end up further benchmarking sorting at persist time, you could also consider reordering the dimensions from low to high cardinality, since the cardinalities are known at that point, to potentially improve segment size.
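The reordering suggestion amounts to sorting dimensions by ascending cardinality before persisting. A stdlib-only sketch (the class and method names are invented for illustration and are not Druid's persist code):

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class DimOrderSketch
{
  // Order dimension names by ascending cardinality. Low-cardinality
  // dimensions first tends to compress better, because sorting rows
  // then produces long runs of repeated values.
  public static List<String> orderByCardinality(Map<String, Integer> cardinalities)
  {
    return cardinalities.entrySet().stream()
        .sorted(Map.Entry.comparingByValue())
        .map(Map.Entry::getKey)
        .collect(Collectors.toList());
  }

  public static void main(String[] args)
  {
    Map<String, Integer> cards = Map.of("userId", 1_000_000, "country", 200, "isRobot", 2);
    System.out.println(orderByCardinality(cards)); // [isRobot, country, userId]
  }
}
```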
[GitHub] yuppie-flu opened a new pull request #6070: Use JUnit TemporaryFolder rule instead of system temp folder
yuppie-flu opened a new pull request #6070: Use JUnit TemporaryFolder rule instead of system temp folder URL: https://github.com/apache/incubator-druid/pull/6070 This should help to avoid the transient failures mentioned in #6013 ([comment](https://github.com/apache/incubator-druid/issues/6013#issuecomment-406821311)) when running tests in parallel. Otherwise several tests use the same /tmp working folder.
[GitHub] gianm commented on issue #5903: Druid allows adding empty string dataSources (it shouldn't)
gianm commented on issue #5903: Druid allows adding empty string dataSources (it shouldn't) URL: https://github.com/apache/incubator-druid/issues/5903#issuecomment-408942682 Fixed by #5785.
[GitHub] gianm closed pull request #6059: CompactionTask: Reject empty intervals on construction.
gianm closed pull request #6059: CompactionTask: Reject empty intervals on construction. URL: https://github.com/apache/incubator-druid/pull/6059 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/content/ingestion/tasks.md b/docs/content/ingestion/tasks.md index 7d63884e2ee..dace2978dd5 100644 --- a/docs/content/ingestion/tasks.md +++ b/docs/content/ingestion/tasks.md @@ -279,13 +279,16 @@ An example of compaction task is This compaction task reads _all segments_ of the interval `2017-01-01/2018-01-01` and results in new segments. Note that intervals of the input segments are merged into a single interval of `2017-01-01/2018-01-01` no matter what the segmentGranularity was. -To controll the number of result segments, you can set `targetPartitionSize` or `numShards`. See [indexTuningConfig](#tuningconfig) for more details. +To control the number of result segments, you can set `targetPartitionSize` or `numShards`. See [indexTuningConfig](#tuningconfig) for more details. To merge each day's worth of data into separate segments, you can submit multiple `compact` tasks, one for each day. They will run in parallel. A compaction task internally generates an `index` task spec for performing compaction work with some fixed parameters. For example, its `firehose` is always the [ingestSegmentSpec](./firehose.html), and `dimensionsSpec` and `metricsSpec` include all dimensions and metrics of the input segments by default. +Compaction tasks will exit with a failure status code, without doing anything, if the interval you specify has no +data segments loaded in it (or if the interval you specify is empty). + The output segment can have different metadata from the input segments unless all input segments have the same metadata. 
- Dimensions: since Druid supports schema change, the dimensions can be different across segments even if they are a part of the same dataSource. diff --git a/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java b/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java index 53855574db2..f81999f15f2 100644 --- a/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java +++ b/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java @@ -49,6 +49,7 @@ import io.druid.indexing.common.task.IndexTask.IndexIngestionSpec; import io.druid.indexing.common.task.IndexTask.IndexTuningConfig; import io.druid.indexing.firehose.IngestSegmentFirehoseFactory; +import io.druid.java.util.common.IAE; import io.druid.java.util.common.ISE; import io.druid.java.util.common.JodaUtils; import io.druid.java.util.common.Pair; @@ -136,6 +137,10 @@ public CompactionTask( Preconditions.checkArgument(interval != null || segments != null, "interval or segments should be specified"); Preconditions.checkArgument(interval == null || segments == null, "one of interval and segments should be null"); +if (interval != null && interval.toDurationMillis() == 0) { + throw new IAE("Interval[%s] is empty, must specify a nonempty interval", interval); +} + this.interval = interval; this.segments = segments; this.dimensionsSpec = dimensionsSpec; @@ -225,7 +230,7 @@ public TaskStatus run(final TaskToolbox toolbox) throws Exception } if (indexTaskSpec == null) { - log.warn("Failed to generate compaction spec"); + log.warn("Interval[%s] has no segments, nothing to do.", interval); return TaskStatus.failure(getId()); } else { final String json = jsonMapper.writerWithDefaultPrettyPrinter().writeValueAsString(indexTaskSpec); @@ -237,7 +242,7 @@ public TaskStatus run(final TaskToolbox toolbox) throws Exception /** * Generate {@link IndexIngestionSpec} from input segments. 
- + * * @return null if input segments don't exist. Otherwise, a generated ingestionSpec. */ @Nullable diff --git a/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java b/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java index a6d3c53060f..5f2cbad32bc 100644 --- a/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java +++ b/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java @@ -41,6 +41,7 @@ import io.druid.data.input.Rows; import io.druid.hll.HyperLogLogCollector; import io.druid.indexer.IngestionState; +import io.druid.indexer.TaskStatus; import io.druid.indexing.appenderator.ActionBasedSegmentAllocator; import io.druid.indexing.appenderator.ActionBasedUsedSegmentChecker; import io.druid.indexing.common.IngestionStatsAndErrorsTaskReport; @@ -48,7 +49,6 @@ import
[GitHub] gianm closed issue #5903: Druid allows adding empty string dataSources (it shouldn't)
gianm closed issue #5903: Druid allows adding empty string dataSources (it shouldn't) URL: https://github.com/apache/incubator-druid/issues/5903
[GitHub] chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-408943015 How do I add a WIP tag?
[GitHub] gianm commented on a change in pull request #6062: Fix a bug in GroupByQueryEngine
gianm commented on a change in pull request #6062: Fix a bug in GroupByQueryEngine URL: https://github.com/apache/incubator-druid/pull/6062#discussion_r206202261 ## File path: processing/src/main/java/io/druid/query/groupby/GroupByQueryEngine.java ## @@ -178,37 +178,33 @@ public int getNumRows() return positions; } -private List updateValues( -ByteBuffer key, -List dims -) +@Nullable +private List updateValues(ByteBuffer key, List dims) { if (dims.size() > 0) { -List retVal = null; -List unaggregatedBuffers = null; - final DimensionSelector dimSelector = dims.get(0); final IndexedInts row = dimSelector.getRow(); final int rowSize = row.size(); if (rowSize == 0) { ByteBuffer newKey = key.duplicate(); newKey.putInt(MISSING_VALUE); - unaggregatedBuffers = updateValues(newKey, dims.subList(1, dims.size())); + return updateValues(newKey, dims.subList(1, dims.size())); } else { + List retVal = null; for (int i = 0; i < rowSize; i++) { ByteBuffer newKey = key.duplicate(); int dimValue = row.get(i); newKey.putInt(dimValue); -unaggregatedBuffers = updateValues(newKey, dims.subList(1, dims.size())); - } -} -if (unaggregatedBuffers != null) { - if (retVal == null) { -retVal = Lists.newArrayList(); +List unaggregatedBuffers = updateValues(newKey, dims.subList(1, dims.size())); Review comment: > @jihoonson or @jon-wei is that correct? If yes, do you plan to remove groupBy v1? It is correct, groupBy v1 is legacy and should be removed in the future. There is one case I am aware of where it is 'better' than v2: if you have some growable aggregators (like some sketches), v1 will probably use less memory than v2. This is because v2 uses off-heap aggregators which allocate all the memory they need up-front, but v1 uses on-heap aggregators that can grow gradually. I don't have the issue number handy, but someone raised this in the past as a reason that they switched back from v2 to v1 (they were doing groupBy on a high-cardinality dimension with thetaSketch metrics). 
I think if off-heap aggregators could learn how to start small and then grow, v1 would be outpaced by v2 in all known cases and could be removed.
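The memory trade-off described in this comment can be illustrated with a stdlib-only Java sketch (the class and method names below are invented for illustration and are not Druid's aggregator API):

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class AggregatorSizing
{
  // "Off-heap" style: reserve the worst-case number of bytes up front
  // per grouping key, whether or not the aggregator ever grows that big.
  // (A heap ByteBuffer stands in for Druid's direct buffers here.)
  public static int offHeapBytes(int maxIntermediateSize)
  {
    ByteBuffer buf = ByteBuffer.allocate(maxIntermediateSize); // fixed size
    return buf.capacity();
  }

  // "On-heap" style: state grows gradually with the values actually seen,
  // so a growable sketch on a sparse key costs far less memory.
  public static int onHeapEntries(long[] values)
  {
    List<Long> state = new ArrayList<>(); // grows as needed
    for (long v : values) {
      state.add(v);
    }
    return state.size();
  }

  public static void main(String[] args)
  {
    // A sketch with a 64 KB worst case always costs 64 KB when
    // preallocated, while the growable state tracks the input size.
    System.out.println(offHeapBytes(64 * 1024)); // 65536
    System.out.println(onHeapEntries(new long[]{1, 2, 3})); // 3
  }
}
```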
[GitHub] gianm commented on issue #6069: BUG: intermediate persist directories are not cleaned up
gianm commented on issue #6069: BUG: intermediate persist directories are not cleaned up URL: https://github.com/apache/incubator-druid/issues/6069#issuecomment-408912020 Hi @varaga, The MiddleManager processes themselves are meant to clean up these directories when tasks exit. Are the MiddleManagers (or the machine itself) crashing before the cleanup can happen? Also, I _think_ they do not clean up 'stale' directories for unused tasks on startup. This would be a nice feature to add. Something like: remove all task working directories for tasks that are _not_ restored (i.e. in restore.json). Are you interested in contributing a patch for that?
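The proposed startup cleanup could be sketched roughly as follows (stdlib-only; `staleDirs` and the assumption that a task's working-directory name equals its task id are hypothetical, not Druid's actual MiddleManager logic):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

public class StaleTaskDirSketch
{
  // Return the task working directories that are safe to delete on
  // startup: those whose name is not among the task ids being restored
  // (in Druid, the restorable ids would come from restore.json).
  public static List<String> staleDirs(List<String> taskDirNames, Set<String> restoredTaskIds)
  {
    List<String> stale = new ArrayList<>();
    for (String dir : taskDirNames) {
      if (!restoredTaskIds.contains(dir)) {
        stale.add(dir);
      }
    }
    return stale;
  }

  public static void main(String[] args)
  {
    // task_b was not restored, so its directory is stale.
    System.out.println(staleDirs(List.of("task_a", "task_b"), Set.of("task_a"))); // [task_b]
  }
}
```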
[GitHub] chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072 https://github.com/apache/incubator-druid/issues/6071 the following two tests failed: Tests in error: TestAWSCredentialsProvider.testWithFileSessionCredentials:98 » SdkClient Unabl... TestAWSCredentialsProvider.testWithFixedAWSKeys:67 » SdkClient Unable to find ... working on this
[GitHub] gianm commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument
gianm commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206212482 ## File path: indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java ## @@ -74,7 +73,8 @@ toBeAccumulated.add(interval); } else { compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated)); - toBeAccumulated = Lists.newArrayList(interval); + toBeAccumulated.clear(); Review comment: Joda Interval objects are immutable so I think this point is moot. It doesn't matter if `JodaUtils.umbrellaInterval` returns a new Interval instance or not.
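The pattern being discussed — consuming the accumulated list in `umbrellaInterval` before clearing and reusing it — can be sketched with stdlib types (a hypothetical `span` over `long[]` ranges stands in for `JodaUtils.umbrellaInterval` over Joda `Interval`s):

```java
import java.util.ArrayList;
import java.util.List;

public class ReuseListSketch
{
  // Stand-in for JodaUtils.umbrellaInterval: computes a new covering
  // range from all inputs, so it keeps no reference to the passed list.
  public static long[] span(List<long[]> intervals)
  {
    long min = Long.MAX_VALUE;
    long max = Long.MIN_VALUE;
    for (long[] iv : intervals) {
      min = Math.min(min, iv[0]);
      max = Math.max(max, iv[1]);
    }
    return new long[]{min, max};
  }

  public static void main(String[] args)
  {
    List<long[]> toBeAccumulated = new ArrayList<>();
    List<long[]> compact = new ArrayList<>();

    toBeAccumulated.add(new long[]{0, 10});
    toBeAccumulated.add(new long[]{5, 20});

    // Safe: span() is fully evaluated before the list is cleared, so
    // clearing and reusing the list (instead of allocating a fresh one
    // per group) cannot corrupt the already-computed result.
    compact.add(span(toBeAccumulated));
    toBeAccumulated.clear();
    toBeAccumulated.add(new long[]{30, 40});

    System.out.println(compact.get(0)[0] + "/" + compact.get(0)[1]); // 0/20
  }
}
```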
[GitHub] gianm commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument
gianm commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206212950 ## File path: processing/src/main/java/io/druid/collections/spatial/Node.java ## @@ -56,7 +57,7 @@ public Node(float[] minCoordinates, float[] maxCoordinates, boolean isLeaf, Bitm public Node( float[] minCoordinates, float[] maxCoordinates, - List children, + @Nullable Node child, Review comment: On YAGNI vs BDUF: IMO, YAGNI is better for internal APIs (which this is) so it is appropriate to get rid of the list. BDUF has some value for external APIs that we can't change later without disrupting users. It can pay to try to think about how usage may evolve over time, before the APIs are introduced. So my 2¢ is that as an internal API, it's better for this one to offer no more than we need today.
[GitHub] gianm edited a comment on issue #6070: Use JUnit TemporaryFolder rule instead of system temp folder
gianm edited a comment on issue #6070: Use JUnit TemporaryFolder rule instead of system temp folder URL: https://github.com/apache/incubator-druid/pull/6070#issuecomment-408925522 Hi @yuppie-flu, there seems to be some problem with the new forbidden-api check: ``` [ERROR] Failed to execute goal de.thetaphi:forbiddenapis:2.3:check (validate) on project java-util: Parsing signatures failed: Class 'org.apache.commons.io.FileUtils' not found on classpath while parsing signature: org.apache.commons.io.FileUtils#getTempDirectory() -> [Help 1] ```
[GitHub] chengchengpei opened a new issue #6071: DerivativeDataSourceMetadata allows adding empty string baseDataSource
chengchengpei opened a new issue #6071: DerivativeDataSourceMetadata allows adding empty string baseDataSource URL: https://github.com/apache/incubator-druid/issues/6071 Similar to https://github.com/apache/incubator-druid/issues/5903, `baseDataSource` should not be an empty string. I am new to Druid but would like to contribute. I am working on this.
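A minimal sketch of the kind of constructor-time validation being proposed (stdlib-only approximation; the class name is invented, and Druid itself would likely use Guava's `Preconditions` rather than a hand-rolled check):

```java
public class DerivativeMetadataSketch
{
  private final String baseDataSource;

  public DerivativeMetadataSketch(String baseDataSource)
  {
    // Reject null or empty names up front so a bad spec fails fast,
    // instead of silently creating metadata with a blank dataSource.
    if (baseDataSource == null || baseDataSource.isEmpty()) {
      throw new IllegalArgumentException("baseDataSource cannot be null or empty");
    }
    this.baseDataSource = baseDataSource;
  }

  public String getBaseDataSource()
  {
    return baseDataSource;
  }
}
```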
[GitHub] gianm commented on issue #6036: use S3 as a backup storage for hdfs deep storage
gianm commented on issue #6036: use S3 as a backup storage for hdfs deep storage URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-408899333 >Agreed. Earlier I tried to load primary and secondary deep storage extensions inside composite-deep-storage module and found it difficult. But now I think maybe we can just let users config druid to load three extensions (primary + second + composite deep storage) if they want to use this feature. Then it would be straightforward to implement composite-deep-storage. That sounds great. It's sort of like how the composing emitter works: you need to load all emitter extensions that you want to compose. >Btw, not sure whether composite-deep-storage is the right name or not, since it's not writing to all components as composite-emitter. Maybe `chained-deep-storage` would be a good name for the extension.
[GitHub] gianm commented on issue #6058: Remove some unnecessary task storage internal APIs.
gianm commented on issue #6058: Remove some unnecessary task storage internal APIs. URL: https://github.com/apache/incubator-druid/pull/6058#issuecomment-408899571 @asdf2014 I will fix them and re-push. Anything that makes the CI failure message more useful sounds great.
[GitHub] jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries
jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries URL: https://github.com/apache/incubator-druid/pull/5857#discussion_r206301783 ## File path: processing/src/main/java/io/druid/query/aggregation/AggregatorFactory.java ## @@ -141,6 +142,17 @@ public AggregatorFactory getMergingFactory(AggregatorFactory other) throws Aggre */ public abstract int getMaxIntermediateSize(); + /** + * Return a potentially optimized form of this AggregatorFactory for per-segment queries. + * + * @param optimizationContext + * @return Review comment: nit: you can remove `@param` and `@return` here.
[GitHub] jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries
jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries URL: https://github.com/apache/incubator-druid/pull/5857#discussion_r206301554 ## File path: processing/src/main/java/io/druid/query/PerSegmentQueryOptimizationContext.java ## @@ -0,0 +1,37 @@ +/* + * Licensed to Metamarkets Group Inc. (Metamarkets) under one + * or more contributor license agreements. See the NOTICE file + * distributed with this work for additional information + * regarding copyright ownership. Metamarkets licenses this file + * to you under the Apache License, Version 2.0 (the + * "License"); you may not use this file except in compliance + * with the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, + * software distributed under the License is distributed on an + * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY + * KIND, either express or implied. See the License for the + * specific language governing permissions and limitations + * under the License. + */ + +package io.druid.query; + +public class PerSegmentQueryOptimizationContext Review comment: Would you add a javadoc here too? Also, can we add some more per-segment information in the future?
[GitHub] jihoonson commented on issue #6061: skip travis on doc only changes
jihoonson commented on issue #6061: skip travis on doc only changes URL: https://github.com/apache/incubator-druid/pull/6061#issuecomment-408976290 @clintropolis it sounds good.
[GitHub] jihoonson commented on issue #5492: Native parallel batch indexing without shuffle
jihoonson commented on issue #5492: Native parallel batch indexing without shuffle URL: https://github.com/apache/incubator-druid/pull/5492#issuecomment-409004658 @drcrallen do you have further comments?
[GitHub] jihoonson commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
jihoonson commented on issue #5584: Decoupling FirehoseFactory and InputRowParser URL: https://github.com/apache/incubator-druid/issues/5584#issuecomment-409010765 @josephglanville, thanks for the comment. It sounds good to me. I also agree with supporting Reader for realtime ingestion. Do you want to make a PR for this?
[GitHub] Caroline1000 opened a new pull request #6073: Query gran0730
Caroline1000 opened a new pull request #6073: Query gran0730 URL: https://github.com/apache/incubator-druid/pull/6073
[GitHub] jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument
jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206272902 ## File path: indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java ## @@ -74,7 +73,8 @@ toBeAccumulated.add(interval); } else { compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated)); - toBeAccumulated = Lists.newArrayList(interval); + toBeAccumulated.clear(); Review comment: Here is a simple implementation of `umbrellaInterval()` that could break this method. Note that every caller currently passes a list. This method can be changed to accept a list without further change. ```java public static Interval umbrellaInterval(List<Interval> intervals) { if (intervals.size() == 1) { return intervals.get(0); } ... ``` > Also, as per "Effective Java", the responsibility for making defensive copies is on the method (API) implementation side, not on the caller side. I basically agree on this. If you want to keep it as it is, please add a javadoc to `umbrellaInterval()` to make sure that it always returns a new instance.
[GitHub] jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument
jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206277090 ## File path: indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java ## @@ -74,7 +73,8 @@ toBeAccumulated.add(interval); } else { compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated)); - toBeAccumulated = Lists.newArrayList(interval); + toBeAccumulated.clear(); Review comment: Oh, you're correct. This looks fine.
[GitHub] jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries
jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries URL: https://github.com/apache/incubator-druid/pull/5857#discussion_r206304373 ## File path: processing/src/main/java/io/druid/query/aggregation/FilteredAggregatorFactory.java ## @@ -140,6 +144,61 @@ public int getMaxIntermediateSize() return delegate.getMaxIntermediateSize(); } + @Override + public AggregatorFactory optimizeForSegment(PerSegmentQueryOptimizationContext optimizationContext) + { +if (filter instanceof IntervalDimFilter) { Review comment: It looks like this optimization would work only when there is a single `IntervalDimFilter` here. But there would probably be more complicated filters using `AndDimFilter` or `OrDimFilter`. To support those, we need to check all filters. What do you think about adding a new interface like `optimizeForSegment` to `DimFilter`? Probably we also need to use the visitor pattern to fully optimize this. I'm not sure this should be a part of this PR. If you think it's not, would you please raise an issue about this and add a link here?
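The nesting concern can be illustrated with a simplified sketch of recursive traversal over a filter tree (the classes below are invented stand-ins, not Druid's actual `DimFilter` hierarchy):

```java
import java.util.List;

public class FilterWalkSketch
{
  public interface Filter {}

  public static final class IntervalFilter implements Filter
  {
    final long start;
    final long end;

    public IntervalFilter(long start, long end)
    {
      this.start = start;
      this.end = end;
    }
  }

  public static final class AndFilter implements Filter
  {
    final List<Filter> children;

    public AndFilter(List<Filter> children)
    {
      this.children = children;
    }
  }

  // Recursively count interval filters anywhere in the tree; a single
  // top-level instanceof check would miss the nested ones, which is
  // why the comment suggests a recursive/visitor-style approach.
  public static int countIntervalFilters(Filter f)
  {
    if (f instanceof IntervalFilter) {
      return 1;
    }
    if (f instanceof AndFilter) {
      int n = 0;
      for (Filter child : ((AndFilter) f).children) {
        n += countIntervalFilters(child);
      }
      return n;
    }
    return 0;
  }

  public static void main(String[] args)
  {
    Filter tree = new AndFilter(List.of(
        new IntervalFilter(0, 10),
        new AndFilter(List.of(new IntervalFilter(5, 15)))
    ));
    System.out.println(countIntervalFilters(tree)); // 2
  }
}
```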
[GitHub] chengchengpei opened a new pull request #6075: validate non-empty String baseDataSource in MaterializedViewSuperviso…
chengchengpei opened a new pull request #6075: validate non-empty String baseDataSource in MaterializedViewSuperviso… URL: https://github.com/apache/incubator-druid/pull/6075 re: https://github.com/apache/incubator-druid/issues/6074 I am working on this.
[GitHub] leventov commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument
leventov commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206274500 ## File path: indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java ## @@ -74,7 +73,8 @@ toBeAccumulated.add(interval); } else { compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated)); - toBeAccumulated = Lists.newArrayList(interval); + toBeAccumulated.clear(); Review comment: @jihoonson but `umbrellaInterval()` is called before the list is cleared. The version that you presented will work fine as well, unless there is *concurrent* modification of the passed list, which would be a total mess and would break `umbrellaInterval()` anyway.
[GitHub] jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument
jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206274914

## File path: processing/src/main/java/io/druid/collections/spatial/Node.java

## @@ -56,7 +57,7 @@ public Node(float[] minCoordinates, float[] maxCoordinates, boolean isLeaf, Bitm
   public Node(
       float[] minCoordinates,
       float[] maxCoordinates,
-      List children,
+      @Nullable Node child,

Review comment: I don't have a strong opinion here, but I think anyone unfamiliar with this part of the code would wonder why this `Node` accepts only a single child in its constructor, since they would expect a tree node to be able to have multiple children. I think BDUF and YAGNI are concepts that give some basic guidance, and we should apply them flexibly. Anyway, if you want to keep this, please add a comment explaining why it currently accepts a single child.
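A hypothetical sketch of the pattern under discussion: a constructor that accepts at most one child (because current call sites never pass more) while keeping a `List` internally so further children can still be added. The class below is illustrative and not Druid's `io.druid.collections.spatial.Node`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustrative tree node, not Druid's spatial Node.
public class TreeNode
{
  private final List<TreeNode> children;

  // Accepts at most one child (pass null for none). At the moment no caller
  // constructs a node with more than one child, hence the single-child
  // constructor; the internal List still allows more children to be added later.
  public TreeNode(TreeNode child)
  {
    this.children = new ArrayList<>();
    if (child != null) {
      children.add(child);
    }
  }

  public void addChild(TreeNode node)
  {
    children.add(node);
  }

  public List<TreeNode> getChildren()
  {
    return Collections.unmodifiableList(children);
  }
}
```

The explanatory comment on the constructor is exactly the kind of note requested above: it tells a reader why the API is narrower than the data structure it wraps.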
[GitHub] jihoonson commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
jihoonson commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-408966069 Hi @chengchengpei, thank you for the contribution! I've added the `WIP` tag.
[GitHub] jihoonson commented on a change in pull request #5789: Add stringLast and stringFirst aggregators extension
jihoonson commented on a change in pull request #5789: Add stringLast and stringFirst aggregators extension URL: https://github.com/apache/incubator-druid/pull/5789#discussion_r206319063

## File path: processing/src/main/java/io/druid/query/aggregation/first/StringFirstBufferAggregator.java

## @@ -0,0 +1,151 @@
+/*
+ * Licensed to Metamarkets Group Inc. (Metamarkets) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. Metamarkets licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.query.aggregation.first;
+
+import io.druid.java.util.common.StringUtils;
+import io.druid.query.aggregation.BufferAggregator;
+import io.druid.query.aggregation.SerializablePairLongString;
+import io.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import io.druid.segment.BaseLongColumnValueSelector;
+import io.druid.segment.BaseObjectColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+// TODO: Unit Test
+public class StringFirstBufferAggregator implements BufferAggregator
+{
+  private final BaseLongColumnValueSelector timeSelector;
+  private final BaseObjectColumnValueSelector valueSelector;
+  private final int maxStringBytes;
+
+  public StringFirstBufferAggregator(
+      BaseLongColumnValueSelector timeSelector,
+      BaseObjectColumnValueSelector valueSelector,
+      int maxStringBytes
+  )
+  {
+    this.timeSelector = timeSelector;
+    this.valueSelector = valueSelector;
+    this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+    buf.putLong(position, Long.MAX_VALUE);
+    buf.putInt(position + Long.BYTES, 0);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position)
+  {
+    ByteBuffer mutationBuffer = buf.duplicate();
+    mutationBuffer.position(position);
+
+    Object value = valueSelector.getObject();
+
+    long time = Long.MAX_VALUE;
+    String firstString = null;
+
+    if (value instanceof SerializablePairLongString) {
+      SerializablePairLongString serializablePair = (SerializablePairLongString) value;
+      time = serializablePair.lhs;
+      firstString = serializablePair.rhs;
+    } else if (value instanceof String) {
+      time = timeSelector.getLong();
+      firstString = (String) value;
+    } else if (value != null) {
+      throw new IllegalStateException(
+          "Try to aggregate unsupported class type ["
+          + value.getClass().getName()
+          + "]. Supported class types: String or SerializablePairLongString"
+      );
+    }
+
+    long lastTime = mutationBuffer.getLong(position);
+    if (firstString != null && time < lastTime) {

Review comment:
> In this case, you can't use filters. If you use filters you discard the row minute: 1, client: AAA, location: null, but you can't recover the previous value minute: 1, client: AAA, location: zoneC.

This is how the `having` filter works, not `filter`. The difference between them is that `filter` is applied _before_ aggregation while `having` is applied _after_ aggregation. So, in the above example, with the `not-null` filter, the new row `minute: 1.4, client: AAA, location: null` will be filtered out but other rows can still be consumed by aggregators. The result should be `minute: 1, client: AAA, location: zoneC`.
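The filter-versus-having distinction can be simulated in plain Java: a `filter` drops individual input rows before they ever reach the aggregator, so an earlier non-null row for the same key still wins. The `Row` type, the sample rows, and the earliest-wins collector below are illustrative stand-ins for Druid's stringFirst aggregation, and the rows are assumed to be ordered by `minute`.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FilterVsHaving
{
  record Row(double minute, String client, String location) {}

  // "filter" semantics: drop null-location rows BEFORE aggregating, then keep
  // the earliest surviving location per client (a stand-in for stringFirst).
  static Map<String, String> firstLocationWithFilter(List<Row> rows)
  {
    return rows.stream()
               .filter(row -> row.location() != null)     // applied pre-aggregation
               .collect(Collectors.toMap(
                   Row::client,
                   Row::location,
                   (earliest, later) -> earliest));       // earliest row wins
  }

  public static void main(String[] args)
  {
    List<Row> rows = List.of(
        new Row(1.0, "AAA", "zoneC"),
        new Row(1.4, "AAA", null)    // filtered out; does not erase zoneC
    );
    System.out.println(firstLocationWithFilter(rows)); // {AAA=zoneC}
  }
}
```

A `having`-style check, by contrast, would run after the per-client first value is computed, so it could only keep or discard the whole aggregated result, not individual input rows.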
[GitHub] chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…
chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad… URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-408974324 @jihoonson Hi, thanks. It seems that the failed tests are not related to my change; most of them failed because of timeouts. Is there any way to trigger another test run?