[GitHub] gianm commented on a change in pull request #6065: Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask

2018-07-27 Thread GitBox
gianm commented on a change in pull request #6065: Fix CombiningFirehoseFactory 
with IngestSegmentFirehoseFactory in IndexTask
URL: https://github.com/apache/incubator-druid/pull/6065#discussion_r205929239
 
 

 ##
 File path: 
indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java
 ##
 @@ -416,6 +417,15 @@ public TaskStatus run(final TaskToolbox toolbox) throws 
Exception
 ((IngestSegmentFirehoseFactory) 
firehoseFactory).setTaskToolbox(toolbox);
   }
 
+  if (firehoseFactory instanceof CombiningFirehoseFactory) {
 
 Review comment:
   What if a Combining firehose has another Combining firehose within it? This 
should be recursive.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm opened a new pull request #6067: FinalizingFieldAccessPostAggregator: Fix serde.

2018-07-27 Thread GitBox
gianm opened a new pull request #6067: FinalizingFieldAccessPostAggregator: Fix 
serde.
URL: https://github.com/apache/incubator-druid/pull/6067
 
 
   Fixes #6063.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] b-slim closed pull request #4776: Fix ClassNotFoundException in druid-kerberos extension

2018-07-27 Thread GitBox
b-slim closed pull request #4776: Fix ClassNotFoundException in druid-kerberos 
extension
URL: https://github.com/apache/incubator-druid/pull/4776
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
 
b/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
index d171821051a..904f71f81e0 100644
--- 
a/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
+++ 
b/extensions-core/druid-kerberos/src/main/java/io/druid/security/kerberos/DruidKerberosUtil.java
@@ -98,6 +98,7 @@ public static void 
authenticateIfRequired(AuthenticationKerberosConfig config)
 String keytab = config.getKeytab();
 if (!Strings.isNullOrEmpty(principal) && !Strings.isNullOrEmpty(keytab)) {
   Configuration conf = new Configuration();
+  conf.setClassLoader(DruidKerberosModule.class.getClassLoader());
   conf.set(CommonConfigurationKeysPublic.HADOOP_SECURITY_AUTHENTICATION, 
"kerberos");
   UserGroupInformation.setConfiguration(conf);
   try {


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jon-wei opened a new pull request #6065: Fix CombiningFirehoseFactory with IngestSegmentFirehoseFactory in IndexTask

2018-07-27 Thread GitBox
jon-wei opened a new pull request #6065: Fix CombiningFirehoseFactory with 
IngestSegmentFirehoseFactory in IndexTask
URL: https://github.com/apache/incubator-druid/pull/6065
 
 
   If I define a CombiningFirehose with a IngestSegmentFirehose delegate in an 
IndexTask:
   
   ```
   "firehose" : {
   "type": "combining",
   "delegates": [
 {
   "type": "ingestSegment",
   "dataSource"   : "updates-tutorial",
   "interval" : "2018-01-01/2018-01-03"
 },
 {
   "type" : "local",
   "baseDir" : "quickstart/",
   "filter" : "updates-data3.json"
 }
   ]
 },
   ```
   
   The following null check in IngestSegmentFirehoseFactory.connect() is 
triggered:
   
   ```
 @Override
 public Firehose connect(InputRowParser inputRowParser, File 
temporaryDirectory) throws ParseException
 {
   log.info("Connecting firehose: dataSource[%s], interval[%s]", 
dataSource, interval);
   
   Preconditions.checkNotNull(taskToolbox, "taskToolbox is not set");
   ```
   
   This PR adjusts IndexTask so that it passes the taskToolbox to delegate 
IngestSegmentFirehoseFactory objects inside a CombiningFirehoseFactory.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6063: FinalizingFieldAccessPostAggregator cannot be deserialized once it gets decorated

2018-07-27 Thread GitBox
gianm commented on issue #6063: FinalizingFieldAccessPostAggregator cannot be 
deserialized once it gets decorated
URL: 
https://github.com/apache/incubator-druid/issues/6063#issuecomment-408575833
 
 
   FinalizingFieldAccessPostAggregator is documented, so I think we should fix 
it. I'll do a patch.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] josephglanville commented on issue #6066: Sorting rows when rollup is disabled

2018-07-28 Thread GitBox
josephglanville commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-408593384
 
 
   This is especially useful in cases where no metrics are specified and all 
rows are dimensions only as leaving roll-up enabled in those cases emits a 
warning.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] leventov opened a new pull request #6062: Fix a bug in GroupByQueryEngine

2018-07-27 Thread GitBox
leventov opened a new pull request #6062: Fix a bug in GroupByQueryEngine
URL: https://github.com/apache/incubator-druid/pull/6062
 
 
   It was potentially missing many unaggregated buffers.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] leventov commented on issue #6027: Make Parser.parseToMap() to return a mutable Map

2018-07-27 Thread GitBox
leventov commented on issue #6027: Make Parser.parseToMap() to return a mutable 
Map
URL: https://github.com/apache/incubator-druid/pull/6027#issuecomment-408565906
 
 
   If Spark updates Kryo (BTW I cannot even find on what version it depends), 
and Spark cluster users update Spark, this change is not needed. But this is 
not a practical path IMO.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #5566: Applying doubleSum over doubleFirst aggregations in nested groupBy fails

2018-07-27 Thread GitBox
jihoonson commented on issue #5566: Applying doubleSum over doubleFirst 
aggregations in nested groupBy fails
URL: 
https://github.com/apache/incubator-druid/issues/5566#issuecomment-408567669
 
 
   Hi all,
   
   `DoubleFirstAggregator` returns `SerializedPair`s  which is the intermediate 
format to merging doubleFirst values from multiple historicals. So, to compute 
some aggregations like `sum` over `doubleFirst`, you first need to _finalize_ 
the intermediate result to be the final format which contains only the double 
value. You can use `expression` postAggregator to do this. Here is an example.
   
   ```json
   {
 "queryType": "groupBy",
 "dataSource": {
   "type": "query",
   "query": {
 "queryType": "groupBy",
 "dataSource": "twitter",
 "aggregations": [
   {
 "type": "doubleFirst",
 "name": "tweets",
 "fieldName": "tweets"
   }
 ],
 "postAggregations": [
   { "type" : "expression", "name": "tweets_first", "expression" : 
"tweets" }
 ],
 "dimensions": [
   "lang"
 ],
 "intervals": "2018-07-27/2018-07-28",
 "granularity": "hour"
   }
 },
 "intervals": "2018-07-27/2018-07-28",
 "granularity": "hour",
 "aggregations": [
   {
 "type": "doubleSum",
 "name": "value_num",
 "fieldName": "tweets_first"
   }
 ],
 "dimensions": []
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] leventov opened a new issue #6064: Duplicate DataSegmentTest

2018-07-27 Thread GitBox
leventov opened a new issue #6064: Duplicate DataSegmentTest
URL: https://github.com/apache/incubator-druid/issues/6064
 
 
   Seems that `io.druid.timeline.DataSegmentTest` is duplicated in druid-api 
and druid-server modules.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-02 Thread GitBox
hpandeycodeit commented on issue #6047:  Minor change in the "Start up Druid 
services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410013932
 
 
   @jihoonson, 
   Yes, I am interested in this. Do we have scripts ready for all these java 
commands or for just one? 
   And can you explain why an " extra parameter for configuration directory" is 
needed? 
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-02 Thread GitBox
jihoonson commented on issue #6047:  Minor change in the "Start up Druid 
services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410022159
 
 
   @hpandeycodeit we have the scripts for all Druid components. Please check 
`${DRUID_HOME}/bin` directory.
   
   > And can you explain why an " extra parameter for configuration directory" 
is needed?
   
   Ah, nvm. I thought those scripts doesn't allow to change configuration 
directory, and so they should be able to allow since the quickstart uses its 
own configuration files.
   But it was wrong. You can change the configuration directory path by setting 
the `DRUID_CONF_DIR` environment variable.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 opened a new pull request #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`

2018-08-01 Thread GitBox
asdf2014 opened a new pull request #6090: Fix missing exception handling as 
part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090
 
 
   Try to fix issues 
[#6024](https://github.com/apache/incubator-druid/issues/6024)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #5938: URL encode datasources, task ids, authenticator names.

2018-08-01 Thread GitBox
gianm commented on issue #5938: URL encode datasources, task ids, authenticator 
names.
URL: https://github.com/apache/incubator-druid/pull/5938#issuecomment-409604350
 
 
   @himanshug, re:
   
   > looks like most of the fixes are in the calls to makeWorkerUrl(..), is it 
possible to put the fix inside impl of that method instead ?
   
   I made that change and re-pushed. I also consolidated the two makeWorkerURL 
methods into one helper method in TaskRunnerUtils.
   
   @jihoonson, re:
   
   > I think this is a very common mistake, so we need another way to prevent 
such mistakes. I suggest to add a sort of UrlBuilder which enforces encoding to 
input elements if needed.
   
   I considered doing this, but then we have a new problem of how to get 
callers to use UrlBuilder rather than making their own URLs. So I ended up just 
adding a urlEncode method to StringUtils and calling that where appropriate. 
(Note: it isn't always appropriate, since sometimes the format parameter is a 
literal path string like `"foo/bar"` and in that case we should _not_ url 
encode.)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] josephglanville commented on issue #5584: Decoupling FirehoseFactory and InputRowParser

2018-08-02 Thread GitBox
josephglanville commented on issue #5584: Decoupling FirehoseFactory and 
InputRowParser
URL: 
https://github.com/apache/incubator-druid/issues/5584#issuecomment-410030450
 
 
   One thing that has become apparent is the `StringInputRowParser` is a bit of 
a bad citizen.
   It extends `ByteBufferInputRowParser` aka `InputRowParser`.
   
   This is not a problem in and of itself and it's completely reasonable to 
have a input row parser designed to read UTF-8 encoded strings from 
`ByteBuffer`.
   
   The issue is that isn't how it's used, it's primarily used via function with 
signature the `InputRow parse(String)`, the fact this is `String` rather than 
`ByteBuffer` not only violates the assumption of the interface but it also 
means that users of `StringInputRowParser` aren't using the `InputRowParser` 
interface at all, effectively making any code built on StringInputRowParser not 
very generic.
   
   It seems to me that it would make sense for there to be 
`StringInputRowParser` that is actually `InputRowParser` and 
`StringDecoderInputRowParser` which inherits from `ByteBufferInputRowParser` 
reads data from ByteBuffer decoding UTF-8.
   
   I would also move to put add a `reset()` to the `InputRowParser` interface 
as it's already on `Parser`. Currently there is `startFileFromBeginning()` on 
`StringInputRowParser` that calls the same function on the inner `Parser`. 
Seems as this may be useful in general for resetting the state of stateful 
parsers when opening the next object/input stream.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-02 Thread GitBox
hpandeycodeit commented on issue #6047:  Minor change in the "Start up Druid 
services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410037251
 
 
   @jihoonson ,
   The scripts are there but doesn't look like it's working: 
   
   For eg: 
   
   $./broker.sh start
   Started broker node (5282)
   $./broker.sh status
   STOPPED
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns

2018-08-02 Thread GitBox
clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' 
Columns
URL: https://github.com/apache/incubator-druid/pull/6016#discussion_r207381285
 
 

 ##
 File path: 
processing/src/main/java/io/druid/segment/data/ShapeShiftingColumnSerializer.java
 ##
 @@ -0,0 +1,298 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.segment.data;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import io.druid.java.util.common.io.smoosh.FileSmoosher;
+import io.druid.java.util.common.logger.Logger;
+import io.druid.segment.IndexSpec;
+import io.druid.segment.data.codecs.CompressedFormEncoder;
+import io.druid.segment.data.codecs.FormEncoder;
+import io.druid.segment.data.codecs.FormMetrics;
+import io.druid.segment.serde.Serializer;
+import io.druid.segment.writeout.SegmentWriteOutMedium;
+import io.druid.segment.writeout.WriteOutBytes;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.WritableByteChannel;
+import java.util.Map;
+import java.util.function.Function;
+
+/**
+ * Base serializer for {@link ShapeShiftingColumn} implementations, providing 
most common functionality such as headers,
+ * value-chunking, encoder selection, and writing out values.
+ *
+ * Encoding Selection:
+ * The intention of this base structure is that implementors of this class 
will analyze incoming values and aggregate
+ * facts about the data which matching {@link FormEncoder} implementations 
might find interesting, while storing raw,
+ * unencoded values in {@link ShapeShiftingColumnSerializer#currentChunk}. 
When the threshold of
+ * {@link ShapeShiftingColumnSerializer#valuesPerChunk} is reached, {@link 
ShapeShiftingColumnSerializer} will attempt
+ * to find the "best" encoding by first computing the encoded size with
+ * {@link FormEncoder#getEncodedSize} and then applying a modifier to scale 
this value in order to influence behavior
+ * when sizes are relatively close according to the chosen {@link 
IndexSpec.ShapeShiftOptimizationTarget}. This
+ * effectively sways the decision towards using encodings with faster decoding 
speed or smaller encoded size as
+ * appropriate. Note that very often the best encoding is unambiguous and 
these settings don't matter, the nuanced
+ * differences of behavior of {@link IndexSpec.ShapeShiftOptimizationTarget} 
mainly come into play when things are
+ * close.
+ *
+ * Implementors need only supply an initialize method to allocate storage for 
{@code }, an add value method to
+ * populate {@code }, a reset method to prepare {@code 
} for the next chunk after a flush, and
+ * of course, matching {@link FormEncoder} implementations to perform actual 
value encoding. Generic compression is
+ * available to {@link FormEncoder} implementations by implementing
+ * {@link io.druid.segment.data.codecs.CompressibleFormEncoder} and wrapping 
in a
+ * {@link io.druid.segment.data.codecs.CompressedFormEncoder} in the codec 
list passed to the serializer.
+ *
+ * layout:
+ * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | 
logValuesPerChunk (byte) | offsetsOutSize (int) |  compositionSize (int) | 
composition | offsets | values |
+ *
+ * @param 
+ * @param 
+ */
+public abstract class ShapeShiftingColumnSerializer implements Serializer
+{
+  /**
+   * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | 
logValuesPerChunk (byte) | offsetsOutSize (int) | compositionSize (int) |
+   */
+  private static final int BASE_HEADER_BYTES = 1 + (3 * Integer.BYTES) + 1 + 
(2 * Integer.BYTES);
+
+  private static Logger log = new Logger(ShapeShiftingColumnSerializer.class);
+
+  protected final SegmentWriteOutMedium segmentWriteOutMedium;
+  protected final FormEncoder[] codecs;
+  protected final byte version;
+  protected final byte logValuesPerChunk;
+  protected final int valuesPerChunk;
+  protected final ByteBuffer intToBytesHelperBuffer;
+  protected final Map composition;
+  protected final IndexSpec.ShapeShiftOptimizationTarget optimizationTarget;
+  protected 

[GitHub] gianm commented on issue #6017: Logging of invalid queries

2018-08-02 Thread GitBox
gianm commented on issue #6017: Logging of invalid queries
URL: 
https://github.com/apache/incubator-druid/issues/6017#issuecomment-410080968
 
 
   Sounds like a good idea -- the main challenge is probably that right now, 
the request logger expects to get a valid Query object, but we could add a 
method that accepts invalid queries as byte arrays or something like that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1835: Schema differences of GroupBy vs TopN/TimeSeries return data should be not be present

2018-08-02 Thread GitBox
vogievetsky commented on issue #1835: Schema differences of GroupBy vs 
TopN/TimeSeries return data should be not be present
URL: 
https://github.com/apache/incubator-druid/issues/1835#issuecomment-410032064
 
 
   I ed this issue back in the day but since then groupBy v2 drastically 
improved the situation if you are using Rune (Druid's JSON based language). If 
you want 100% standard results you should now use DSQL  
   @gianm I think this issue can be closed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] pdeva closed issue #1598: upgrading from 0.6 to 0.7 will result in corruption of existing segment in realtime node

2018-08-02 Thread GitBox
pdeva closed issue #1598: upgrading from 0.6 to 0.7  will result in corruption 
of existing segment in realtime node
URL: https://github.com/apache/incubator-druid/issues/1598
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] pdeva closed issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions at startup

2018-08-02 Thread GitBox
pdeva closed issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions 
at startup
URL: https://github.com/apache/incubator-druid/issues/1816
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1449: Allow sorting by timestamp

2018-08-02 Thread GitBox
vogievetsky commented on issue #1449: Allow sorting by timestamp
URL: 
https://github.com/apache/incubator-druid/issues/1449#issuecomment-410036629
 
 
   This is now possible with GroupBy v2 (and in Druid SQL)
   @gianm please confirm and close.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-02 Thread GitBox
jihoonson commented on issue #6047:  Minor change in the "Start up Druid 
services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410051112
 
 
   It should work. Would you please check your broker logs?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6097: Possible performance issue during DruidSchema metadata refresh

2018-08-02 Thread GitBox
gianm commented on issue #6097: Possible performance issue during DruidSchema 
metadata refresh
URL: 
https://github.com/apache/incubator-druid/issues/6097#issuecomment-410063021
 
 
   Hi @egor-ryashin,
   
   There have been some improvements to the refreshing mechanism in later 
versions (try 0.12.1?). It doesn't constantly refresh metadata for historical 
segments, but it does refresh periodically for realtime segments, which is 
important because new columns can show up at any time. However, the metadata 
does take up space, so if you were running kind of close to heap max with 
`druid.sql.enable=false` before then you'll need more heap in order to enable 
SQL.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1836: MapCache needs re-written

2018-08-02 Thread GitBox
vogievetsky commented on issue #1836: MapCache needs re-written
URL: 
https://github.com/apache/incubator-druid/issues/1836#issuecomment-410031205
 
 
   Did https://github.com/apache/incubator-druid/pull/3028 address this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1711: null exception from filter with Query-Time Lookup

2018-08-02 Thread GitBox
vogievetsky commented on issue #1711: null exception from filter with 
Query-Time Lookup
URL: 
https://github.com/apache/incubator-druid/issues/1711#issuecomment-410034795
 
 
   Jonathan! Long time no see! I remember when this was filed... good times.
   @fjy I believe this was resolved (as noted above) by #1578 and #1731 and can 
be closed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1560: convertSpec tool completely broken

2018-08-02 Thread GitBox
vogievetsky commented on issue #1560: convertSpec tool completely broken
URL: 
https://github.com/apache/incubator-druid/issues/1560#issuecomment-410036215
 
 
   Is this still an issue?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #6094: Introduce SystemSchema tables (#5989)

2018-08-02 Thread GitBox
vogievetsky commented on issue #6094: Introduce SystemSchema tables (#5989)
URL: https://github.com/apache/incubator-druid/pull/6094#issuecomment-410038602
 
 
   so excited about this one!


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] egor-ryashin opened a new issue #6097: Possible performance issue during DruidSchema metadata refresh

2018-08-02 Thread GitBox
egor-ryashin opened a new issue #6097: Possible performance issue during 
DruidSchema metadata refresh
URL: https://github.com/apache/incubator-druid/issues/6097
 
 
   `Druid Broker 0.11.0` with `druid.sql.enable=true` produces much more 
garbage than usual, profiling shows a lot of `Object[]` which are referenced by 
`TreeMap$Entry` which eventually leads to 
`io.druid.sql.calcite.schema.DruidSchema#refreshSegments`. I speculate it 
constantly refreshes metadata for every segment. With `-Xmx5G` and 1mln 
segments it leads to unresponsive service due to GC. Meanwhile it works fine 
with `druid.sql.enable=false`.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns

2018-08-02 Thread GitBox
clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' 
Columns
URL: https://github.com/apache/incubator-druid/pull/6016#discussion_r207370256
 
 

 ##
 File path: 
processing/src/main/java/io/druid/segment/data/ShapeShiftingColumnSerializer.java
 ##
 @@ -0,0 +1,298 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.segment.data;
+
+import com.google.common.base.Preconditions;
+import com.google.common.collect.Maps;
+import com.google.common.primitives.Ints;
+import io.druid.java.util.common.io.smoosh.FileSmoosher;
+import io.druid.java.util.common.logger.Logger;
+import io.druid.segment.IndexSpec;
+import io.druid.segment.data.codecs.CompressedFormEncoder;
+import io.druid.segment.data.codecs.FormEncoder;
+import io.druid.segment.data.codecs.FormMetrics;
+import io.druid.segment.serde.Serializer;
+import io.druid.segment.writeout.SegmentWriteOutMedium;
+import io.druid.segment.writeout.WriteOutBytes;
+
+import javax.annotation.Nullable;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+import java.nio.channels.WritableByteChannel;
+import java.util.Map;
+import java.util.function.Function;
+
+/**
+ * Base serializer for {@link ShapeShiftingColumn} implementations, providing 
most common functionality such as headers,
+ * value-chunking, encoder selection, and writing out values.
+ *
+ * Encoding Selection:
+ * The intention of this base structure is that implementors of this class 
will analyze incoming values and aggregate
+ * facts about the data which matching {@link FormEncoder} implementations 
might find interesting, while storing raw,
+ * unencoded values in {@link ShapeShiftingColumnSerializer#currentChunk}. 
When the threshold of
+ * {@link ShapeShiftingColumnSerializer#valuesPerChunk} is reached, {@link 
ShapeShiftingColumnSerializer} will attempt
+ * to find the "best" encoding by first computing the encoded size with
+ * {@link FormEncoder#getEncodedSize} and then applying a modifier to scale 
this value in order to influence behavior
+ * when sizes are relatively close according to the chosen {@link 
IndexSpec.ShapeShiftOptimizationTarget}. This
+ * effectively sways the decision towards using encodings with faster decoding 
speed or smaller encoded size as
+ * appropriate. Note that very often the best encoding is unambiguous and 
these settings don't matter, the nuanced
+ * differences of behavior of {@link IndexSpec.ShapeShiftOptimizationTarget} 
mainly come into play when things are
+ * close.
+ *
+ * Implementors need only supply an initialize method to allocate storage for 
{@code }, an add value method to
+ * populate {@code }, a reset method to prepare {@code 
} for the next chunk after a flush, and
+ * of course, matching {@link FormEncoder} implementations to perform actual 
value encoding. Generic compression is
+ * available to {@link FormEncoder} implementations by implementing
+ * {@link io.druid.segment.data.codecs.CompressibleFormEncoder} and wrapping 
in a
+ * {@link io.druid.segment.data.codecs.CompressedFormEncoder} in the codec 
list passed to the serializer.
+ *
+ * layout:
+ * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | 
logValuesPerChunk (byte) | offsetsOutSize (int) |  compositionSize (int) | 
composition | offsets | values |
+ *
+ * @param 
+ * @param 
+ */
+public abstract class ShapeShiftingColumnSerializer implements Serializer
+{
+  /**
+   * | version (byte) | headerSize (int) | numValues (int) | numChunks (int) | 
logValuesPerChunk (byte) | offsetsOutSize (int) | compositionSize (int) |
+   */
+  private static final int BASE_HEADER_BYTES = 1 + (3 * Integer.BYTES) + 1 + 
(2 * Integer.BYTES);
+
+  private static Logger log = new Logger(ShapeShiftingColumnSerializer.class);
+
+  protected final SegmentWriteOutMedium segmentWriteOutMedium;
+  protected final FormEncoder[] codecs;
+  protected final byte version;
+  protected final byte logValuesPerChunk;
+  protected final int valuesPerChunk;
+  protected final ByteBuffer intToBytesHelperBuffer;
+  protected final Map composition;
+  protected final IndexSpec.ShapeShiftOptimizationTarget optimizationTarget;
+  protected 

[GitHub] hpandeycodeit commented on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-02 Thread GitBox
hpandeycodeit commented on issue #6047:  Minor change in the "Start up Druid 
services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410098043
 
 
   @jihoonson, 
   
   I am getting "Not Enough direct memory" error: 
   
   ```
1) Not enough direct memory.  Please adjust -XX:MaxDirectMemorySize, 
druid.processing.buffer.sizeBytes, druid.processing.numThreads, or 
druid.processing.numMergeBuffers: maxDirectMemory[4,294,967,296], 
memoryNeeded[5,368,709,120] =
druid.processing.buffer.sizeBytes[536,870,912] * 
(druid.processing.numMergeBuffers[2] + druid.processing.numThreads[7] + 1)
 at 
io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:110)
 (via modules: com.google.inject.util.Modules$OverrideModule -> 
com.google.inject.util.Modules$OverrideModule -> io.druid.guice.Dru
   idProcessingModule)
 at 
io.druid.guice.DruidProcessingModule.getIntermediateResultsPool(DruidProcessingModule.java:110)
 (via modules: com.google.inject.util.Modules$OverrideModule -> 
com.google.inject.util.Modules$OverrideModule -> io.druid.guice.Dru
   idProcessingModule)
 while locating io.druid.collections.NonBlockingPool 
annotated with @io.druid.guice.annotations.Global()
   for the 2nd parameter of 
io.druid.query.groupby.GroupByQueryEngine.(GroupByQueryEngine.java:81)
 at 
io.druid.guice.QueryRunnerFactoryModule.configure(QueryRunnerFactoryModule.java:88)
 (via modules: com.google.inject.util.Modules$OverrideModule -> 
com.google.inject.util.Modules$OverrideModule -> io.druid.guice.QueryRunnerFact
   oryModule)
 while locating io.druid.query.groupby.GroupByQueryEngine
   for the 2nd parameter of 
io.druid.query.groupby.strategy.GroupByStrategyV1.(GroupByStrategyV1.java:77)
 while locating io.druid.query.groupby.strategy.GroupByStrategyV1
   for the 2nd parameter of 
io.druid.query.groupby.strategy.GroupByStrategySelector.(GroupByStrategySelector.java:43)
 while locating io.druid.query.groupby.strategy.GroupByStrategySelector
   for the 1st parameter of 
io.druid.query.groupby.GroupByQueryQueryToolChest.(GroupByQueryQueryToolChest.java:104)
 at 
io.druid.guice.QueryToolChestModule.configure(QueryToolChestModule.java:101) 
(via modules: com.google.inject.util.Modules$OverrideModule -> 
com.google.inject.util.Modules$OverrideModule -> 
io.druid.guice.QueryRunnerFactoryModu
   le)
```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] josephglanville edited a comment on issue #5584: Decoupling FirehoseFactory and InputRowParser

2018-08-02 Thread GitBox
josephglanville edited a comment on issue #5584: Decoupling FirehoseFactory and 
InputRowParser
URL: 
https://github.com/apache/incubator-druid/issues/5584#issuecomment-410030450
 
 
   One thing that has become apparent is the `StringInputRowParser` is a bit of 
a bad citizen.
   It extends `ByteBufferInputRowParser` aka `InputRowParser`.
   
   This is not a problem in and of itself and it's completely reasonable to 
have a input row parser designed to read UTF-8 encoded strings from 
`ByteBuffer`.
   
   The issue is that isn't how it's used, it's primarily used via function with 
signature the `InputRow parse(String)`, the fact this is `String` rather than 
`ByteBuffer` not only violates the assumption of the interface but it also 
means that users of `StringInputRowParser` aren't using the `InputRowParser` 
interface at all, effectively making any code built on `StringInputRowParser` 
not very generic.
   
   It seems to me that it would make sense for there to be 
`StringInputRowParser` that is actually `InputRowParser` and 
`StringDecoderInputRowParser` which inherits from `ByteBufferInputRowParser` 
reads data from ByteBuffer decoding UTF-8.
   
   I would also move to put add a `reset()` to the `InputRowParser` interface 
as it's already on `Parser`. Currently there is `startFileFromBeginning()` on 
`StringInputRowParser` that calls the same function on the inner `Parser`. 
Seems as this may be useful in general for resetting the state of stateful 
parsers when opening the next object/input stream.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #6096: Windows Ubuntu Bash and Task Directory Name

2018-08-02 Thread GitBox
vogievetsky commented on issue #6096: Windows Ubuntu Bash and Task Directory 
Name
URL: 
https://github.com/apache/incubator-druid/issues/6096#issuecomment-410030657
 
 
   I think it is a good point about directory names and should be considered 
for the project @gianm does it make sense to avoid contentious characters like 
`:`?
   For now, Rathish, I can tell you that Windows is solidly out of scope for 
Druid (and it has never been). Please move to a proper Linux environment.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1692: Why Deep storage No data?but realtime node can find 3413610 num data

2018-08-02 Thread GitBox
vogievetsky commented on issue #1692: Why Deep storage No data?but realtime 
node can find 3413610 num data
URL: 
https://github.com/apache/incubator-druid/issues/1692#issuecomment-410035126
 
 
   @zengzhihai110 can this issue be closed?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6076: Mutual TLS support

2018-08-02 Thread GitBox
gianm commented on issue #6076: Mutual TLS support
URL: https://github.com/apache/incubator-druid/pull/6076#issuecomment-410063706
 
 
   @jon-wei, thanks, I think it is really valuable especially as we add more 
features to the feature.
   
   (I heard you like features so we put features in your features?)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1816: Druid 0.8.1 coordinator throws a bunch of exceptions at startup

2018-08-02 Thread GitBox
vogievetsky commented on issue #1816: Druid 0.8.1 coordinator throws a bunch of 
exceptions at startup
URL: 
https://github.com/apache/incubator-druid/issues/1816#issuecomment-410033063
 
 
   This issue is super pretty old and does not look actionable (Who even knows 
where to get Druid 0.8.1 these days). @fjy I suggest we close it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6066: Sorting rows when rollup is disabled

2018-08-02 Thread GitBox
gianm commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410039394
 
 
   IMO the speed hit on ingestion is acceptable. It is small, and performance 
is not really worse than the rollup case. Adding sorting will incur a 
performance cost no matter what: either it has to happen continuously or it has 
to happen in one big shot before persisting.
   
   Any idea where the speed hit on groupBys comes from? Seems strange that 
other query types wouldn't care but groupBy does. At any rate, I am not super 
worried about it, since in most clusters QueryableIndex storage/perf is much 
more serious issue than IncrementalIndex query perf. If we end up incurring a 
small speed hit on IncrementalIndex queries in order to get better compression 
on QueryableIndexes, then that is a good tradeoff.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1204: Define granularities in UTC

2018-08-02 Thread GitBox
vogievetsky commented on issue #1204: Define granularities in UTC
URL: 
https://github.com/apache/incubator-druid/issues/1204#issuecomment-410039263
 
 
   Is this still an issue after 
https://github.com/apache/incubator-druid/pull/4611 ?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson closed issue #5934: eccn

2018-08-02 Thread GitBox
jihoonson closed issue #5934: eccn
URL: https://github.com/apache/incubator-druid/issues/5934
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6066: Sorting rows when rollup is disabled

2018-08-02 Thread GitBox
gianm commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410052884
 
 
   > I'm guessing Druid does the rollup by sorting the dimensions in the order 
in the ingestion spec, which would indicate that dimension ordering would 
matter. Would ordering dims by based on cardinality help further? I'm guessing 
low->high card would be better than high->low, but idk enough about LZ(4?) 
compression to have a good intuition on this.
   
   Yep it does sort in the order that you provide them in the ingestion spec. 
LZ4 works by finding and eliminating redundancies in a sliding window, so it 
works best if you have a lot of local redundancy. CONCISE / Roaring also work 
better in that case (they can encode runs). And selecting values out of columns 
is faster if you are selecting rows that are close together.
   
   If you mostly care about size, I think you should do best starting with a 
lower cardinality column, especially one that a lot of other columns will have 
some kind of dependency on (like if you have a 'country' column and each 
region, city, user, etc will generally just be in one country). That way you're 
maximizing compression for as many columns as possible. If you care about 
retrieval speed too then it helps to think about locality: if you have a column 
you often filter on, you want it first/early in the sort order. Sometimes a 
column satisfies both of these, and in that case life is good (e.g. "tenant_id" 
for a multi-tenant-one-table design).
   
   In the case I mentioned (where we saw dramatically better compression after 
sorting) it was an e-commerce dataset, and we sorted rows first on product 
brand, then on product id.
   
   Btw, this line of thinking is most effective if your queryGranularity is 
coarse, since Druid always sorts time _first_ and then your other dimensions.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] vogievetsky commented on issue #1827: JsonConfigurator does not escape quoted options

2018-08-02 Thread GitBox
vogievetsky commented on issue #1827: JsonConfigurator does not escape quoted 
options
URL: 
https://github.com/apache/incubator-druid/issues/1827#issuecomment-410032588
 
 
   Does https://github.com/apache/incubator-druid/pull/2033 address this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] mistercrunch commented on issue #6066: Sorting rows when rollup is disabled

2018-08-02 Thread GitBox
mistercrunch commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410043357
 
 
   I'm guessing Druid does the rollup by sorting the dimensions in the order in 
the ingestion spec, which would indicate that dimension ordering would matter. 
Would ordering dims by based on cardinality help further? I'm guessing 
low->high card would be better than high->low, but idk enough about LZ(4?) 
compression to have a good intuition on this.
   
   I'm also unclear on whether low->high card dim ordering would make it 
harder/easier on the sorting algorithm. (?)


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser

2018-08-02 Thread GitBox
gianm commented on issue #5584: Decoupling FirehoseFactory and InputRowParser
URL: 
https://github.com/apache/incubator-druid/issues/5584#issuecomment-410045743
 
 
   Yeah -- StringInputRowParser is a bit of a special snowflake. Maybe doing 
this feature is the motivation we need to rethink how that all works. To add to 
the points you raised, there's also this gross code in TransformSpec (which 
decorates InputRowParsers with expression-based transforms):
   
   ```
   if (parser instanceof StringInputRowParser) {
 // Hack to support the fact that some callers use special methods in 
StringInputRowParser, such as
 // parse(String) and startFileFromBeginning.
 return (InputRowParser) new TransformingStringInputRowParser(
 parser.getParseSpec(),
 ((StringInputRowParser) parser).getEncoding(),
 this
 );
   } else {
 return new TransformingInputRowParser<>(parser, this);
   }
   ```
   
   And there's also the warty fact that ParseSpec's `makeParser()` method is 
actually only used by StringInputRowParser (and a couple of extension parsers 
that work by converting inputs to JSON and then parsing them as strings). Other 
parsers call `getTimestampSpec` and `getDimensionsSpec`, but they don't call 
`makeParser`. Some parsers, however, can extract useful things from certain 
types of ParseSpecs. The way this is done is weird, leading to code like this 
in the Avro extension to yank out an optional FlattenSpec:
   
   ```
   final JSONPathSpec flattenSpec;
   if (parseSpec != null && (parseSpec instanceof AvroParseSpec)) {
 flattenSpec = ((AvroParseSpec) parseSpec).getFlattenSpec();
   } else {
 flattenSpec = JSONPathSpec.DEFAULT;
   }
   ```
   
   The only reason StringInputRowParser doesn't have to do something like this 
is that the method _it_ needs (makeParser) has been granted special status as a 
member of the ParseSpec interface (even though many ParseSpecs don't implement 
it, they just throw exceptions).
   
   If we are wanting to be ambitious (I don't see why not!) then I think we can 
make nicer, clearer interfaces by rethinking how this all works. In a backwards 
compatible way, since we don't want all of our users to have to rewrite their 
ingestion specs (which will probably be the tricky part).
   
   @josephglanville what ambition level are you interested in having here? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-02 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410053497
 
 
   @gianm Cool, I'll go ahead and make a PR as soon as I get some tests 
finished up. I'm not certain the cause of slower group-by performance, group by 
v2 doesn't seem to have any special additional interaction with facts holder. 
Is this worth looking into further?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' Columns

2018-08-02 Thread GitBox
clintropolis commented on a change in pull request #6016: Druid 'Shapeshifting' 
Columns
URL: https://github.com/apache/incubator-druid/pull/6016#discussion_r207370004
 
 

 ##
 File path: 
processing/src/main/java/io/druid/segment/data/codecs/ints/IntFormMetrics.java
 ##
 @@ -0,0 +1,166 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.segment.data.codecs.ints;
+
+import io.druid.segment.IndexSpec;
+import io.druid.segment.data.codecs.FormMetrics;
+
+/**
+ * Aggregates statistics about blocks of integer values, such as total number 
of values processed, minimum and maximum
+ * values encountered, if the chunk is constant or all zeros, and various 
facts about data which is repeated more than
+ * twice ('runs') including number of distinct runs, longest run length, and 
total number of runs. This information is
+ * collected by {@link 
io.druid.segment.data.ShapeShiftingColumnarIntsSerializer} which processing row 
values, and is
+ * provided to {@link IntFormEncoder} implementations to do anything from 
estimate encoded size to influencing how
+ * {@link io.druid.segment.data.ShapeShiftingColumnarIntsSerializer} decides 
whether or not to employ that particular
+ * encoding.
+ */
+public class IntFormMetrics extends FormMetrics
+{
+  private int minValue = Integer.MAX_VALUE;
+  private int maxValue = Integer.MIN_VALUE;
+  private int numRunValues = 0;
+  private int numDistinctRuns = 0;
+  private int longestRun;
+  private int currentRun;
+  private int previousValue;
+  private int numValues = 0;
+  private boolean isFirstValue = true;
+
 
 Review comment:
   Javadocs were on getters, re-arranged code so reader will have to scan over 
those before getting to `processNextRow`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] hpandeycodeit edited a comment on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-02 Thread GitBox
hpandeycodeit edited a comment on issue #6047:  Minor change in the "Start up 
Druid services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410037251
 
 
   @jihoonson ,
   The scripts are there but doesn't look like it's working: 
   
   For eg: 
   
   $./broker.sh start
   Started broker node (5282)
   $./broker.sh status
   STOPPED
   
   So it's not starting. Anything I am missing here to start these? I just want 
to see how it works before updating the doc. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei commented on a change in pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-08-02 Thread GitBox
chengchengpei commented on a change in pull request #6072: validate 
baseDataSource non-empty string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072#discussion_r207408616
 
 

 ##
 File path: 
extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
 ##
 @@ -41,9 +42,11 @@ public DerivativeDataSourceMetadata(
   @JsonProperty("metrics") Set metrics
   )
   {
-this.baseDataSource = Preconditions.checkNotNull(baseDataSource, 
"baseDataSource cannot be null. This is not a valid 
DerivativeDataSourceMetadata.");
 this.dimensions = Preconditions.checkNotNull(dimensions, "dimensions 
cannot be null. This is not a valid DerivativeDataSourceMetadata.");
 this.metrics = Preconditions.checkNotNull(metrics, "metrics cannot be 
null. This is not a valid DerivativeDataSourceMetadata.");
+
+Preconditions.checkArgument(!Strings.isNullOrEmpty(baseDataSource), 
"baseDataSource cannot be null or empty. Please provide a baseDataSource.");
+this.baseDataSource = baseDataSource;
 
 Review comment:
   @jihoonson fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei closed pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-08-02 Thread GitBox
chengchengpei closed pull request #6072: validate baseDataSource non-empty 
string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git 
a/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
 
b/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
index ed102e354e0..3a82672e14d 100644
--- 
a/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
+++ 
b/extensions-contrib/materialized-view-maintenance/src/main/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadata.java
@@ -23,6 +23,7 @@
 import com.fasterxml.jackson.annotation.JsonProperty;
 import com.google.common.base.Preconditions;
 import com.google.common.collect.Sets;
+import com.google.common.base.Strings;
 import io.druid.indexing.overlord.DataSourceMetadata;
 
 import java.util.Objects;
@@ -41,7 +42,9 @@ public DerivativeDataSourceMetadata(
   @JsonProperty("metrics") Set metrics
   )
   {
-this.baseDataSource = Preconditions.checkNotNull(baseDataSource, 
"baseDataSource cannot be null. This is not a valid 
DerivativeDataSourceMetadata.");
+Preconditions.checkArgument(!Strings.isNullOrEmpty(baseDataSource), 
"baseDataSource cannot be null or empty. Please provide a baseDataSource.");
+this.baseDataSource = baseDataSource;
+
 this.dimensions = Preconditions.checkNotNull(dimensions, "dimensions 
cannot be null. This is not a valid DerivativeDataSourceMetadata.");
 this.metrics = Preconditions.checkNotNull(metrics, "metrics cannot be 
null. This is not a valid DerivativeDataSourceMetadata.");
   }
diff --git 
a/extensions-contrib/materialized-view-maintenance/src/test/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadataTest.java
 
b/extensions-contrib/materialized-view-maintenance/src/test/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadataTest.java
new file mode 100644
index 000..d62ea385d31
--- /dev/null
+++ 
b/extensions-contrib/materialized-view-maintenance/src/test/java/io/druid/indexing/materializedview/DerivativeDataSourceMetadataTest.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.indexing.materializedview;
+
+import com.google.common.collect.Sets;
+import org.hamcrest.CoreMatchers;
+import org.junit.Rule;
+import org.junit.Test;
+import org.junit.rules.ExpectedException;
+
+import java.util.Set;
+
+
+public class DerivativeDataSourceMetadataTest
+{
+  @Rule
+  public ExpectedException expectedException = ExpectedException.none();
+
+  @Test
+  public void testEmptyBaseDataSource() throws Exception
+  {
+
expectedException.expect(CoreMatchers.instanceOf(IllegalArgumentException.class));
+expectedException.expectMessage(
+"baseDataSource cannot be null or empty. Please provide a 
baseDataSource."
+);
+String baseDataSource = "";
+Set dims = Sets.newHashSet("dim1", "dim2", "dim3");
+Set metrics = Sets.newHashSet("cost");
+DerivativeDataSourceMetadata metadata = new 
DerivativeDataSourceMetadata(baseDataSource, dims, metrics);
+  }
+
+  @Test
+  public void testNullBaseDataSource() throws Exception
+  {
+
expectedException.expect(CoreMatchers.instanceOf(IllegalArgumentException.class));
+expectedException.expectMessage(
+"baseDataSource cannot be null or empty. Please provide a 
baseDataSource."
+);
+String baseDataSource = null;
+Set dims = Sets.newHashSet("dim1", "dim2", "dim3");
+Set metrics = Sets.newHashSet("cost");
+DerivativeDataSourceMetadata metadata = new 
DerivativeDataSourceMetadata(baseDataSource, dims, metrics);
+  }
+}


 


This is an automated message from the 

[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-03 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410153475
 
 
   I ran some additional benchmarks after realizing that the generated rows 
from previous benchmarks were rows with no opportunity for actual rollup to 
occur (all segments were approximately the same size for the numbers above).
   
   Here are timeseries benches
   
   with moderate rollup opportunity:
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   Score   
Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25  663840.128 ± 
26363.127  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25  679784.179 ± 
81577.842  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1 
rollup75   basic.A  avgt   25   62446.589 ±  2224.296  
us/op
   
   no-rollup:  size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup: size [2206430] bytes.
   ```
   
   and heavy rollup potential:
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   Score   
Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25  653316.845 ± 
31964.338  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25  769623.711 ± 
12299.182  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1 
rollup75   basic.A  avgt   256545.777 ±   607.087  
us/op
   
   no-rollup:  size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup: size [237206] bytes.
   ```
   
   and TopN:
   moderate rollup:
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt   Score  
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.A   10  avgt   25  893805.325 ± 
9592.710  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.A   10  avgt   25  898036.822 ± 
8052.554  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.A   10  avgt   25   86100.936 ± 
2844.073  us/op
   
   no-rollup:  size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup: size [2206430] bytes.
   
   ```
   
   heavy rollup:
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt   Score   
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.A   10  avgt   25  888967.034 ± 
25098.293  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.A   10  avgt   25  987568.305 ± 
50955.718  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.A   10  avgt   258820.929 ±   
699.516  us/op
   
   no-rollup:  size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup: size [237206] bytes.
   ```
   
   It would appear that performance difference is more notable when the `Deque` 
are deeper, at least for topN and timeseries, since previous benchmarks were 
basically comparing flat maps with the same number of keys and single element 
`Deque`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`

2018-08-02 Thread GitBox
asdf2014 commented on issue #6090: Fix missing exception handling as part of 
`io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410119637
 
 
   Hi, @jihoonson . Yep, that's right. :+1: Perhaps something in this PR is 
still useful, for example, use `channel.isOpen()` to check that the channel is 
open before call `channel.close()` method and use `ExpectedException` instead 
of `Assert.assertTrue`. Do you want to close this, or change the title and keep 
those useful changes? :sweat_smile:


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis edited a comment on issue #6066: Sorting rows when rollup is disabled

2018-08-03 Thread GitBox
clintropolis edited a comment on issue #6066: Sorting rows when rollup is 
disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410153475
 
 
   I ran some additional benchmarks after realizing that the generated rows 
from previous benchmarks were rows with no opportunity for actual rollup to 
occur (all segments were approximately the same size for the numbers above).
   
   Here are timeseries benches
   
   with moderate rollup opportunity:
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   Score   
Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25  663840.128 ± 
26363.127  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25  679784.179 ± 
81577.842  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1 
rollup75   basic.A  avgt   25   62446.589 ±  2224.296  
us/op
   
   no-rollup:  size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup: size [2206430] bytes.
   ```
   
   and heavy rollup potential:
   ```
   Benchmark(numSegments) 
(rollupSchema)  (rowsPerSegment)  (schemaAndQuery)  Mode  Cnt   Score   
Error  Units
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
no-rollup75   basic.A  avgt   25  653316.845 ± 
31964.338  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1  
ordered-no-rollup75   basic.A  avgt   25  769623.711 ± 
12299.182  us/op
   TimeseriesBenchmark.querySingleIncrementalIndex  1 
rollup75   basic.A  avgt   256545.777 ±   607.087  
us/op
   
   no-rollup:  size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup: size [237206] bytes.
   ```
   
   and TopN:
   moderate rollup:
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt   Score  
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.A   10  avgt   25  893805.325 ± 
9592.710  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.A   10  avgt   25  898036.822 ± 
8052.554  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.A   10  avgt   25   86100.936 ± 
2844.073  us/op
   
   no-rollup:  size [22387432] bytes.
   ordered-no-rollup:  size [18195470] bytes.
   rollup: size [2206430] bytes.
   
   ```
   
   heavy rollup:
   ```
   Benchmark  (numSegments) (rollupSchema)  
(rowsPerSegment)  (schemaAndQuery)  (threshold)  Mode  Cnt   Score   
Error  Units
   TopNBenchmark.querySingleIncrementalIndex  1  no-rollup  
  75   basic.A   10  avgt   25  888967.034 ± 
25098.293  us/op
   TopNBenchmark.querySingleIncrementalIndex  1  ordered-no-rollup  
  75   basic.A   10  avgt   25  987568.305 ± 
50955.718  us/op
   TopNBenchmark.querySingleIncrementalIndex  1 rollup  
  75   basic.A   10  avgt   258820.929 ±   
699.516  us/op
   
   no-rollup:  size [22383561] bytes.
   ordered-no-rollup:  size [16900327] bytes.
   rollup: size [237206] bytes.
   ```
   
   It would appear that performance difference is more notable when the `Deque` 
are deeper, at least for topN and timeseries, since previous benchmarks were 
basically comparing flat maps with the same number of keys and single element 
`Deque`.
   
   Size savings will likely vary quite wildly based on dimension order and 
correlated to how effective rollup would be if were enabled at default 
millisecond granularity. In this case, with a few low cardinality dimensions 
and 1-10k events per timestamp sizes were 20-25% smaller.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 edited a comment on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`

2018-08-02 Thread GitBox
asdf2014 edited a comment on issue #6090: Fix missing exception handling as 
part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410119637
 
 
   Hi, @jihoonson . Yep, that's right. :+1: Perhaps something in this PR is 
still useful, for example, use `channel.isOpen()` to check that the channel is 
open before call `channel.close()` method and use `ExpectedException` instead 
of `Assert.assertTrue`. Do you want me to close this, or change the title and 
keep those useful changes, or patch [these 
changes](https://github.com/apache/incubator-druid/compare/master...jihoonson:fix-netty-client?expand=1)
 to this PR? :sweat_smile:


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-08-02 Thread GitBox
chengchengpei opened a new pull request #6072: validate baseDataSource 
non-empty string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072
 
 
   https://github.com/apache/incubator-druid/issues/6071
   
   The changes validate that baseDataSource is not a empty string. similar to 
https://github.com/apache/incubator-druid/pull/5785
   
   ready for review now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-08-02 Thread GitBox
chengchengpei opened a new pull request #6072: validate baseDataSource 
non-empty string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072
 
 
   https://github.com/apache/incubator-druid/issues/6071
   
   The changes validate that baseDataSource is not a empty string. similar to 
https://github.com/apache/incubator-druid/pull/5785
   
   ready for review now.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`

2018-08-02 Thread GitBox
asdf2014 commented on issue #6090: Fix missing exception handling as part of 
`io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410121501
 
 
   @jihoonson Okay, I keep those changes, and patch them to get the job done. 
Thanks for your comments.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jacktomcat closed issue #6098: Inconsistencies in the result of the quantile aggregator

2018-08-02 Thread GitBox
jacktomcat closed issue #6098: Inconsistencies in the result of the quantile 
aggregator
URL: https://github.com/apache/incubator-druid/issues/6098
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`

2018-08-03 Thread GitBox
asdf2014 commented on issue #6090: Fix missing exception handling as part of 
`io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410165818
 
 
   Hi, @jihoonson . After patching these changes, I found the same problem 
still exist in `ChannelResourceFactory`, you can run the  
`JankyServersTest#testHttpsEchoServer` test case to reproduce it. So, i added 
another anonymous class of `SimpleChannelUpstreamHandler` to fix it. PTAL.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 commented on issue #6090: Fix missing exception handling as part of `io.druid.java.util.http.client.netty.HttpClientPipelineFactory`

2018-08-03 Thread GitBox
asdf2014 commented on issue #6090: Fix missing exception handling as part of 
`io.druid.java.util.http.client.netty.HttpClientPipelineFactory`
URL: https://github.com/apache/incubator-druid/pull/6090#issuecomment-410170672
 
 
   BTW, when i tried to use `ExpectedException` instead of `Assert.assertTrue` 
in `JankyServersTest` for other test cases, i realized the 
`JankyServersTest#isChannelClosedException` situation was hard to convert to 
`ExpectedException` way, so i created a sub-calss of `TypeSafeMatcher` to 
resolve the problem.
   
   ```java
   private static class CauseMatcher extends TypeSafeMatcher
   {
 private final Class expectedType;
 private final String expectedMessage;
 private final boolean isRegex;
   
 public CauseMatcher(Class expectedType, String 
expectedMessage)
 {
   this.expectedType = expectedType;
   this.expectedMessage = expectedMessage;
   this.isRegex = false;
 }
   
 public CauseMatcher(Class expectedType, String 
expectedMessage, boolean isRegex)
 {
   this.expectedType = expectedType;
   this.expectedMessage = expectedMessage;
   this.isRegex = isRegex;
 }
   
 @Override
 protected boolean matchesSafely(Throwable item)
 {
   if (item == null || item.getClass() == null || item.getMessage() == 
null) {
 return false;
   }
   if (!item.getClass().isAssignableFrom(expectedType)) {
 return false;
   }
   if (isRegex) {
 return 
Pattern.compile(expectedMessage).matcher(item.getMessage()).find();
   } else {
 return item.getMessage().contains(expectedMessage);
   }
 }
   
 @Override
 public void describeTo(Description description)
 {
   description.appendText("expects type is ")
   .appendValue(expectedType)
   .appendText(" and message is ")
   .appendValue(expectedMessage);
 }
   }
   ```
   
   Then, use the following code to express the 
`JankyServersTest#isChannelClosedException` logic.
   
   ```java
   expectedException.expectCause(
   anyOf(
   new CauseMatcher(ChannelException.class, "Faulty channel in resource 
pool"),
   new CauseMatcher(IOException.class, ".*Connection reset by peer.*", 
true)
   )
   );
   ```
   
   However, this change will add too many new lines of code. If just to solve 
this single situation would be a huge cost, but it can be used as a common util 
class for other test cases, then it might be worth adding. What do you think? 
@jihoonson


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] dyanarose opened a new issue #6100: DOCS: ingestion spec documentation should be expanded

2018-08-03 Thread GitBox
dyanarose opened a new issue #6100: DOCS:  ingestion spec documentation should 
be expanded
URL: https://github.com/apache/incubator-druid/issues/6100
 
 
   
   - The dataSchema would be much clearer if an example of the raw data was 
included.
   - The importance of the metric spec would be clearer if there was a 
dedicated section explaining how it's used when querying along with example 
queries that are made possible by the example metrics spec (some information is 
spread out in the design document and the schema-design document, but the 
metricSpec should have it's own dedicated and complete documentation)
   
   
https://github.com/apache/incubator-druid/blob/master/docs/content/ingestion/index.md


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 commented on issue #6099: Inconsistencies in the result of the quantile aggregator

2018-08-03 Thread GitBox
asdf2014 commented on issue #6099: Inconsistencies in the result of the 
quantile aggregator
URL: 
https://github.com/apache/incubator-druid/issues/6099#issuecomment-410201965
 
 
   Hi, @jacktomcat . This result is obtained because the quantile aggregator 
uses an approximate algorithm called `sketch`. You can find it in the Druid 
[doc](http://druid.io/docs/latest/development/extensions-core/approximate-histograms.html).
 A [parper](http://jmlr.org/papers/volume11/ben-haim10a/ben-haim10a.pdf) and 
[blog](https://metamarkets.com/2013/histograms/) are also available in the 
document to learn the algorithm deeply. BTW, there is a 
[Table](https://datasketches.github.io/docs/Quantiles/QuantilesAccuracy.html) 
Guide for Quantiles DoublesSketch Size in Bytes and Approximate Error, which 
might help you.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] clintropolis commented on issue #6066: Sorting rows when rollup is disabled

2018-08-03 Thread GitBox
clintropolis commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410183701
 
 
   Ok, so I had to know, so I went ahead and did benchmarks if we do the other 
way and sort at persist time.
   
   no rollup opportunity:
   ```
   Benchmark(rollup)  (rowsPerSegment)  (schema)  Mode  
Cnt   ScoreError  Units
   IndexPersistBenchmark.persistV9  true 75000 basic  avgt  
 25  499315.212 ± 154036.971  us/op
   IndexPersistBenchmark.persistV9 false 75000 basic  avgt  
 25  449792.742 ± 28218.504  us/op
   IndexPersistBenchmark.persistV9 false (ordered)   75000 basic  avgt  
 25  508051.563 ±  63033.662  us/op
   
   all size: [3038874] bytes.
   ```
   moderate rollup opportunity:
   
   ```
   Benchmark(rollup)  (rowsPerSegment)  (schema)  Mode  
Cnt   Score   Error  Units
   IndexPersistBenchmark.persistV9  true 75000 basic  avgt  
 25  406840.576 ± 20732.769  us/op
   IndexPersistBenchmark.persistV9 false 75000 basic  avgt  
 25  431725.214 ± 18793.693  us/op
   IndexPersistBenchmark.persistV9 false (ordered)   75000 basic  avgt  
 25  494056.572 ± 34396.770  us/op
   
   rollup: size [2285574] bytes.
   no-rollup:  size [2741399] bytes.
   ordered-no-rollup:  size [2516639] bytes.
   ```
   more rollup:
   
   ```
   Benchmark(rollup)  (rowsPerSegment)  (schema)  Mode  
Cnt   Score   Error  Units
   IndexPersistBenchmark.persistV9  true 75000 basic  avgt  
 25  338251.339 ± 22031.319  us/op
   IndexPersistBenchmark.persistV9 false 75000 basic  avgt  
 25  443272.327 ± 25099.425  us/op
   IndexPersistBenchmark.persistV9 false (ordered)   75000 basic  avgt  
 25  552234.263 ± 41889.207  us/op
   
   rollup: size [1755456] bytes.
   no-rollup:  size [2741017] bytes.
   ordered-no-rollup:  size [2346649] bytes.
   ```
   
   I don't have strong feelings about the best way to do this, persist 
performance cost looks to be on the range of 15-20% slower here. Maybe better 
to sort at persist time to not risk impact to query performance?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] asdf2014 commented on issue #6047: Minor change in the "Start up Druid services" Section

2018-08-03 Thread GitBox
asdf2014 commented on issue #6047:  Minor change in the "Start up Druid 
services" Section
URL: https://github.com/apache/incubator-druid/pull/6047#issuecomment-410184869
 
 
   Hi, @hpandeycodeit . I also encountered this situation. There is 
[formula](http://druid.io/docs/latest/configuration/broker.html#processing) for 
it in the Druid doc. In my experience with this problem, you may need to add 
`-XX:MaxDirectMemorySize` option to both of those 
`conf/druid/historical/jvm.config` and `conf/druid/broker/jvm.config` files  
:sweat_smile:


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] aryaflow commented on issue #5221: Support Hadoop batch ingestion for druid-azure-extensions

2018-08-03 Thread GitBox
aryaflow commented on issue #5221: Support Hadoop batch ingestion for 
druid-azure-extensions
URL: https://github.com/apache/incubator-druid/pull/5221#issuecomment-410241234
 
 
   @hoesler hadoop-azure is not added by default. But it's needed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] robertervin opened a new issue #6101: Druid InDimFilter Fails on Empty Values

2018-08-03 Thread GitBox
robertervin opened a new issue #6101: Druid InDimFilter Fails on Empty Values
URL: https://github.com/apache/incubator-druid/issues/6101
 
 
   ### Use Case:
   We dynamically build Druid queries by allowing the user to filter the UI. 
Sometimes this results in the user filtering out all options (sometimes 
unintentionally), and is something we want to allow for.
   
   ### Problem
   In 
https://github.com/apache/incubator-druid/blob/04ea3c9f8c1f5ea34b023217bd709509ace4d30d/processing/src/main/java/io/druid/query/filter/InDimFilter.java#L78
   Druid fails when filtering on empty values, though I see no reason why it 
should since the result should just be empty. e.g. in pseudocode,
   
   ```
mydatasource.filter(mycolumn in []) = []
   ```
   
   Is there a better reason for this, or can I submit a PR to remove that 
restriction?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] aryaflow commented on issue #5221: Support Hadoop batch ingestion for druid-azure-extensions

2018-08-03 Thread GitBox
aryaflow commented on issue #5221: Support Hadoop batch ingestion for 
druid-azure-extensions
URL: https://github.com/apache/incubator-druid/pull/5221#issuecomment-410234843
 
 
   Thanks for the feature. I tried this in a cluster and succeeded to make it 
work... but I had to add wasb to JobHelper.java as well.
   `} else if ("wasbs".equals(type)) {
 segmentLocURI = URI.create(loadSpec.get("path").toString());
   }`


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] bohemia420 commented on issue #5150: Druid-parquet-extensions fails on timestamps (stored as INT96) in parquet files

2018-08-03 Thread GitBox
bohemia420 commented on issue #5150: Druid-parquet-extensions fails on 
timestamps (stored as INT96) in parquet files
URL: 
https://github.com/apache/incubator-druid/issues/5150#issuecomment-410239779
 
 
   is this resolved? @amalakar  how did u remove avro converter? @gianm  how 
does one change the parquet reading strategy?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-08-03 Thread GitBox
chengchengpei commented on issue #6072: validate baseDataSource non-empty 
string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-410242447
 
 
   @jihoonson NP. How can I become committer?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] aoeiuvb opened a new issue #6102: "Cannot have same delimiter and list delimiter of \u0001"

2018-08-03 Thread GitBox
aoeiuvb opened a new issue #6102: "Cannot have same delimiter and list 
delimiter of \u0001"
URL: https://github.com/apache/incubator-druid/issues/6102
 
 
   When I used hadoop to absorb data, I used DelimitedParser to parse the row. 
The delimiter I set was **\u0001**, but the following error was reported to 
me:"**Cannot have the same delimiter and list delimiter of \u0001**".
   
   I looked at the source code of the DelimitedParser and found that in the 
constructor, the delimiter and listDelimiter are checked to see if they are 
equal, and if so, an error will be reported.I did not set the value of 
listDelimiter, which is also the default value: **\u0001**, so this error was 
reported.To solve this problem, I specifically specified other values in the 
listDelimiter so that the program could pass.
   
   May I ask if there is any problem with the design of this piece?
   https://user-images.githubusercontent.com/15936294/43644856-be5ec84a-9762-11e8-8cb1-899a8bb02098.png;>
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] leventov commented on a change in pull request #5957: Renamed GenericColumnSerializer to ColumnSerializer; 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; Fixe

2018-08-03 Thread GitBox
leventov commented on a change in pull request #5957: Renamed 
GenericColumnSerializer to ColumnSerializer; 'Generic Column' -> 'Numeric 
Column'; Fixed a few resource leaks in processing; Fixed a bug in 
SingleStringInputDimensionSelector; misc refinements
URL: https://github.com/apache/incubator-druid/pull/5957#discussion_r207604065
 
 

 ##
 File path: benchmarks/src/main/java/io/druid/benchmark/package-info.java
 ##
 @@ -17,17 +17,7 @@
  * under the License.
  */
 
-package io.druid.segment;
+@EverythingIsNonnullByDefault
+package io.druid.benchmark;
 
-import io.druid.guice.annotations.ExtensionPoint;
-import io.druid.segment.serde.Serializer;
-
-import java.io.IOException;
-
-@ExtensionPoint
-public interface GenericColumnSerializer extends Serializer
 
 Review comment:
   Next release is a major release (0.12.x -> 0.13). I don't think we should 
fall into conservatism (not fixing things because users may depend on them) 
until Druid 1.0, and clean up APIs eagerly until before that. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] leventov commented on a change in pull request #5957: Renamed GenericColumnSerializer to ColumnSerializer; 'Generic Column' -> 'Numeric Column'; Fixed a few resource leaks in processing; Fixe

2018-08-03 Thread GitBox
leventov commented on a change in pull request #5957: Renamed 
GenericColumnSerializer to ColumnSerializer; 'Generic Column' -> 'Numeric 
Column'; Fixed a few resource leaks in processing; Fixed a bug in 
SingleStringInputDimensionSelector; misc refinements
URL: https://github.com/apache/incubator-druid/pull/5957#discussion_r207604065
 
 

 ##
 File path: benchmarks/src/main/java/io/druid/benchmark/package-info.java
 ##
 @@ -17,17 +17,7 @@
  * under the License.
  */
 
-package io.druid.segment;
+@EverythingIsNonnullByDefault
+package io.druid.benchmark;
 
-import io.druid.guice.annotations.ExtensionPoint;
-import io.druid.segment.serde.Serializer;
-
-import java.io.IOException;
-
-@ExtensionPoint
-public interface GenericColumnSerializer extends Serializer
 
 Review comment:
   Next release is a major release (0.12.x -> 0.13). I don't think we should 
fall into conservatism (not fixing things because users may depend on them) 
until Druid 1.0, and clean up APIs eagerly before that. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] himanshug commented on issue #6066: Sorting rows when rollup is disabled

2018-08-03 Thread GitBox
himanshug commented on issue #6066: Sorting rows when rollup is disabled
URL: 
https://github.com/apache/incubator-druid/issues/6066#issuecomment-410326897
 
 
   @clintropolis if you do end up further benchmarking sorting at persist time, 
then you can also consider reordering the dimensions from low to high 
cardinality as they are known at that point to potentially improve on size.
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] yuppie-flu opened a new pull request #6070: Use JUnit TemporaryFolder rule instead of system temp folder

2018-07-30 Thread GitBox
yuppie-flu opened a new pull request #6070: Use JUnit TemporaryFolder rule 
instead of system temp folder
URL: https://github.com/apache/incubator-druid/pull/6070
 
 
   This should help to avoid transient failures, mentioned in #6013 
([comment](https://github.com/apache/incubator-druid/issues/6013#issuecomment-406821311))
 when running tests in parallel. Otherwise several tests use the same /tmp 
working folder.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #5903: Druid allows adding empty string dataSources (it shouldn't)

2018-07-30 Thread GitBox
gianm commented on issue #5903: Druid allows adding empty string dataSources 
(it shouldn't)
URL: 
https://github.com/apache/incubator-druid/issues/5903#issuecomment-408942682
 
 
   Fixed by #5785.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm closed pull request #6059: CompactionTask: Reject empty intervals on construction.

2018-07-30 Thread GitBox
gianm closed pull request #6059: CompactionTask: Reject empty intervals on 
construction.
URL: https://github.com/apache/incubator-druid/pull/6059
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/docs/content/ingestion/tasks.md b/docs/content/ingestion/tasks.md
index 7d63884e2ee..dace2978dd5 100644
--- a/docs/content/ingestion/tasks.md
+++ b/docs/content/ingestion/tasks.md
@@ -279,13 +279,16 @@ An example of compaction task is
 
 This compaction task reads _all segments_ of the interval 
`2017-01-01/2018-01-01` and results in new segments.
 Note that intervals of the input segments are merged into a single interval of 
`2017-01-01/2018-01-01` no matter what the segmentGranularity was.
-To controll the number of result segments, you can set `targetPartitionSize` 
or `numShards`. See [indexTuningConfig](#tuningconfig) for more details.
+To control the number of result segments, you can set `targetPartitionSize` or 
`numShards`. See [indexTuningConfig](#tuningconfig) for more details.
 To merge each day's worth of data into separate segments, you can submit 
multiple `compact` tasks, one for each day. They will run in parallel.
 
 A compaction task internally generates an `index` task spec for performing 
compaction work with some fixed parameters.
 For example, its `firehose` is always the 
[ingestSegmentSpec](./firehose.html), and `dimensionsSpec` and `metricsSpec`
 include all dimensions and metrics of the input segments by default.
 
+Compaction tasks will exit with a failure status code, without doing anything, 
if the interval you specify has no
+data segments loaded in it (or if the interval you specify is empty).
+
 The output segment can have different metadata from the input segments unless 
all input segments have the same metadata.
 
 - Dimensions: since Druid supports schema change, the dimensions can be 
different across segments even if they are a part of the same dataSource.
diff --git 
a/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java
 
b/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java
index 53855574db2..f81999f15f2 100644
--- 
a/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java
+++ 
b/indexing-service/src/main/java/io/druid/indexing/common/task/CompactionTask.java
@@ -49,6 +49,7 @@
 import io.druid.indexing.common.task.IndexTask.IndexIngestionSpec;
 import io.druid.indexing.common.task.IndexTask.IndexTuningConfig;
 import io.druid.indexing.firehose.IngestSegmentFirehoseFactory;
+import io.druid.java.util.common.IAE;
 import io.druid.java.util.common.ISE;
 import io.druid.java.util.common.JodaUtils;
 import io.druid.java.util.common.Pair;
@@ -136,6 +137,10 @@ public CompactionTask(
 Preconditions.checkArgument(interval != null || segments != null, 
"interval or segments should be specified");
 Preconditions.checkArgument(interval == null || segments == null, "one of 
interval and segments should be null");
 
+if (interval != null && interval.toDurationMillis() == 0) {
+  throw new IAE("Interval[%s] is empty, must specify a nonempty interval", 
interval);
+}
+
 this.interval = interval;
 this.segments = segments;
 this.dimensionsSpec = dimensionsSpec;
@@ -225,7 +230,7 @@ public TaskStatus run(final TaskToolbox toolbox) throws 
Exception
 }
 
 if (indexTaskSpec == null) {
-  log.warn("Failed to generate compaction spec");
+  log.warn("Interval[%s] has no segments, nothing to do.", interval);
   return TaskStatus.failure(getId());
 } else {
   final String json = 
jsonMapper.writerWithDefaultPrettyPrinter().writeValueAsString(indexTaskSpec);
@@ -237,7 +242,7 @@ public TaskStatus run(final TaskToolbox toolbox) throws 
Exception
 
   /**
* Generate {@link IndexIngestionSpec} from input segments.
-
+   *
* @return null if input segments don't exist. Otherwise, a generated 
ingestionSpec.
*/
   @Nullable
diff --git 
a/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java 
b/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java
index a6d3c53060f..5f2cbad32bc 100644
--- 
a/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java
+++ 
b/indexing-service/src/main/java/io/druid/indexing/common/task/IndexTask.java
@@ -41,6 +41,7 @@
 import io.druid.data.input.Rows;
 import io.druid.hll.HyperLogLogCollector;
 import io.druid.indexer.IngestionState;
+import io.druid.indexer.TaskStatus;
 import io.druid.indexing.appenderator.ActionBasedSegmentAllocator;
 import io.druid.indexing.appenderator.ActionBasedUsedSegmentChecker;
 import io.druid.indexing.common.IngestionStatsAndErrorsTaskReport;
@@ -48,7 +49,6 @@
 import 

[GitHub] gianm closed issue #5903: Druid allows adding empty string dataSources (it shouldn't)

2018-07-30 Thread GitBox
gianm closed issue #5903: Druid allows adding empty string dataSources (it 
shouldn't)
URL: https://github.com/apache/incubator-druid/issues/5903
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-07-30 Thread GitBox
chengchengpei commented on issue #6072: validate baseDataSource non-empty 
string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-408943015
 
 
   how to add WIP tag?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on a change in pull request #6062: Fix a bug in GroupByQueryEngine

2018-07-30 Thread GitBox
gianm commented on a change in pull request #6062: Fix a bug in 
GroupByQueryEngine
URL: https://github.com/apache/incubator-druid/pull/6062#discussion_r206202261
 
 

 ##
 File path: 
processing/src/main/java/io/druid/query/groupby/GroupByQueryEngine.java
 ##
 @@ -178,37 +178,33 @@ public int getNumRows()
   return positions;
 }
 
-private List updateValues(
-ByteBuffer key,
-List dims
-)
+@Nullable
+private List updateValues(ByteBuffer key, 
List dims)
 {
   if (dims.size() > 0) {
-List retVal = null;
-List unaggregatedBuffers = null;
-
 final DimensionSelector dimSelector = dims.get(0);
 final IndexedInts row = dimSelector.getRow();
 final int rowSize = row.size();
 if (rowSize == 0) {
   ByteBuffer newKey = key.duplicate();
   newKey.putInt(MISSING_VALUE);
-  unaggregatedBuffers = updateValues(newKey, dims.subList(1, 
dims.size()));
+  return updateValues(newKey, dims.subList(1, dims.size()));
 } else {
+  List retVal = null;
   for (int i = 0; i < rowSize; i++) {
 ByteBuffer newKey = key.duplicate();
 int dimValue = row.get(i);
 newKey.putInt(dimValue);
-unaggregatedBuffers = updateValues(newKey, dims.subList(1, 
dims.size()));
-  }
-}
-if (unaggregatedBuffers != null) {
-  if (retVal == null) {
-retVal = Lists.newArrayList();
+List unaggregatedBuffers = updateValues(newKey, 
dims.subList(1, dims.size()));
 
 Review comment:
   > @jihoonson or @jon-wei is that correct? If yes, do you plan to remove 
groupBy v1?
   
   It is correct, groupBy v1 is legacy and should be removed in the future. 
There is one case I am aware of where it is 'better' than v2: if you have some 
growable aggregators (like some sketches), v1 will probably use less memory 
than v2. This is because v2 uses off-heap aggregators which allocate all the 
memory they need up-front, but v1 uses on-heap aggregators that can grow 
gradually. I don't have the issue number handy, but someone raised this in the 
past as a reason that they switched back from v2 to v1 (they were doing groupBy 
on a high-cardinality dimension with thetaSketch metrics).
   
   I think if off-heap aggregators could learn how to start small and then 
grow, then v1 will be really outpaced by v2 in all known cases and could be 
removed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6069: BUG: intermediate persist directories are not cleaned up

2018-07-30 Thread GitBox
gianm commented on issue #6069: BUG: intermediate persist directories are not 
cleaned up
URL: 
https://github.com/apache/incubator-druid/issues/6069#issuecomment-408912020
 
 
   Hi @varaga,
   
   The MiddleManager processes themselves are meant to clean up these 
directories when tasks exit. Are the MiddleManagers (or the machine itself) 
crashing before the cleanup can happen?
   
   Also, I _think_ they do not clean up 'stale' directories for unused tasks on 
startup. This would be a nice feature to add. Something like: remove all task 
working directories for tasks that are _not_ restored (i.e. in restore.json) 
Are you interested in contributing a patch for that?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei opened a new pull request #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-07-30 Thread GitBox
chengchengpei opened a new pull request #6072: validate baseDataSource 
non-empty string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072
 
 
   https://github.com/apache/incubator-druid/issues/6071
   
   the following two tests failed:
   
   Tests in error:
 TestAWSCredentialsProvider.testWithFileSessionCredentials:98 » SdkClient 
Unabl...
 TestAWSCredentialsProvider.testWithFixedAWSKeys:67 » SdkClient Unable to 
find ...
   
   working on this


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument

2018-07-30 Thread GitBox
gianm commented on a change in pull request #6068: Prohibit 
Lists.newArrayList() with a single argument
URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206212482
 
 

 ##
 File path: 
indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java
 ##
 @@ -74,7 +73,8 @@
   toBeAccumulated.add(interval);
 } else {
   compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated));
-  toBeAccumulated = Lists.newArrayList(interval);
+  toBeAccumulated.clear();
 
 Review comment:
   Joda Interval objects are immutable so I think this point is moot. It 
doesn't matter if `JodaUtils.umbrellaInterval` returns a new Interval instance 
or not.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument

2018-07-30 Thread GitBox
gianm commented on a change in pull request #6068: Prohibit 
Lists.newArrayList() with a single argument
URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206212950
 
 

 ##
 File path: processing/src/main/java/io/druid/collections/spatial/Node.java
 ##
 @@ -56,7 +57,7 @@ public Node(float[] minCoordinates, float[] maxCoordinates, 
boolean isLeaf, Bitm
   public Node(
   float[] minCoordinates,
   float[] maxCoordinates,
-  List children,
+  @Nullable Node child,
 
 Review comment:
   On YAGNI vs BDUF: IMO, YAGNI is better for internal APIs (which this is) so 
it is appropriate to get rid of the list. BDUF has some value for external APIs 
that we can't change later without disrupting users. It can pay to try to think 
about how usage may evolve over time, before the APIs are introduced.
   
   So my 2¢ is that as an internal API, it's better for this one to offer no 
more than we need today.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm edited a comment on issue #6070: Use JUnit TemporaryFolder rule instead of system temp folder

2018-07-30 Thread GitBox
gianm edited a comment on issue #6070: Use JUnit TemporaryFolder rule instead 
of system temp folder
URL: https://github.com/apache/incubator-druid/pull/6070#issuecomment-408925522
 
 
   Hi @yuppie-flu, there seems to be some problem with the new forbidden-api 
check:
   
   ```
   [ERROR] Failed to execute goal de.thetaphi:forbiddenapis:2.3:check 
(validate) on project java-util: Parsing signatures failed: Class 
'org.apache.commons.io.FileUtils' not found on classpath while parsing 
signature: org.apache.commons.io.FileUtils#getTempDirectory() -> [Help 1]
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei opened a new issue #6071: DerivativeDataSourceMetadata allows adding empty string baseDataSource

2018-07-30 Thread GitBox
chengchengpei opened a new issue #6071: DerivativeDataSourceMetadata allows 
adding empty string baseDataSource
URL: https://github.com/apache/incubator-druid/issues/6071
 
 
   similar to https://github.com/apache/incubator-druid/issues/5903. 
`baseDataSource` should not be empty string.
   
   I am new to druid, but would like to contribute to druid. I am working on 
this.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6036: use S3 as a backup storage for hdfs deep storage

2018-07-30 Thread GitBox
gianm commented on issue #6036: use S3 as a backup storage for hdfs deep storage
URL: https://github.com/apache/incubator-druid/pull/6036#issuecomment-408899333
 
 
   >Agreed. Earlier I tried to load primary and secondary deep storage 
extensions inside composite-deep-storage module and found it difficult. But now 
I think maybe we can just let users config druid to load three extensions 
(primary + second + composite deep storage) if they want to use this feature. 
Then it would be straightforward to implement composite-deep-storage.
   
   That sounds great. It's sort of like how the composing emitter works: you 
need to load all emitter extensions that you want to compose.
   
   >Btw, not sure whether composite-deep-storage is the right name or not, 
since it's not writing to all components as composite-emitter.
   
   Maybe `chained-deep-storage` would be a good name for the extension.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] gianm commented on issue #6058: Remove some unnecessary task storage internal APIs.

2018-07-30 Thread GitBox
gianm commented on issue #6058: Remove some unnecessary task storage internal 
APIs.
URL: https://github.com/apache/incubator-druid/pull/6058#issuecomment-408899571
 
 
   @asdf2014 I will fix them and re-push.
   
   Anything that makes the CI failure message more useful sounds great  


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #5857: Optimize filtered aggs 
with interval filters in per-segment queries
URL: https://github.com/apache/incubator-druid/pull/5857#discussion_r206301783
 
 

 ##
 File path: 
processing/src/main/java/io/druid/query/aggregation/AggregatorFactory.java
 ##
 @@ -141,6 +142,17 @@ public AggregatorFactory 
getMergingFactory(AggregatorFactory other) throws Aggre
*/
   public abstract int getMaxIntermediateSize();
 
+  /**
+   * Return a potentially optimized form of this AggregatorFactory for 
per-segment queries.
+   *
+   * @param optimizationContext
+   * @return
 
 Review comment:
   nit: you can remove `@param` and `@return` here.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #5857: Optimize filtered aggs 
with interval filters in per-segment queries
URL: https://github.com/apache/incubator-druid/pull/5857#discussion_r206301554
 
 

 ##
 File path: 
processing/src/main/java/io/druid/query/PerSegmentQueryOptimizationContext.java
 ##
 @@ -0,0 +1,37 @@
+/*
+ * Licensed to Metamarkets Group Inc. (Metamarkets) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. Metamarkets licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.query;
+
+public class PerSegmentQueryOptimizationContext
 
 Review comment:
   Would you add a javadoc here too? Also, can we add some more per-segment 
information in the future?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #6061: skip travis on doc only changes

2018-07-30 Thread GitBox
jihoonson commented on issue #6061: skip travis on doc only changes
URL: https://github.com/apache/incubator-druid/pull/6061#issuecomment-408976290
 
 
   @clintropolis it sounds good.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #5492: Native parallel batch indexing without shuffle

2018-07-30 Thread GitBox
jihoonson commented on issue #5492: Native parallel batch indexing without 
shuffle
URL: https://github.com/apache/incubator-druid/pull/5492#issuecomment-409004658
 
 
   @drcrallen do you have further comments?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #5584: Decoupling FirehoseFactory and InputRowParser

2018-07-30 Thread GitBox
jihoonson commented on issue #5584: Decoupling FirehoseFactory and 
InputRowParser
URL: 
https://github.com/apache/incubator-druid/issues/5584#issuecomment-409010765
 
 
   @josephglanville, thanks for the comment. It sounds good to me. I also agree 
to support Reader for realtime ingestion too. Do you want to make a PR for this?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] Caroline1000 opened a new pull request #6073: Query gran0730

2018-07-30 Thread GitBox
Caroline1000 opened a new pull request #6073: Query gran0730
URL: https://github.com/apache/incubator-druid/pull/6073
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #6068: Prohibit 
Lists.newArrayList() with a single argument
URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206272902
 
 

 ##
 File path: 
indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java
 ##
 @@ -74,7 +73,8 @@
   toBeAccumulated.add(interval);
 } else {
   compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated));
-  toBeAccumulated = Lists.newArrayList(interval);
+  toBeAccumulated.clear();
 
 Review comment:
   Here is a simple example of `umbrellaInterval()` to break this method. Note 
that every caller currently passes a list. This method can be changed to accept 
a list without further change.
   
   ```java
 public static Interval umbrellaInterval(List intervals)
 {
   if (intervals.size() == 1) {
 return intervals.get(0);
   }
   ...
   ```
   
   > Also, as per "Effective Java", the responsibility for making defensive 
copies is on the method (API) implementation side, not on the caller side.
   
   I basically agree on this. If you want to keep as it is, please add a 
javadoc to `umbrellaInterval()` to make sure that it always returns a new 
instance.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #6068: Prohibit 
Lists.newArrayList() with a single argument
URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206277090
 
 

 ##
 File path: 
indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java
 ##
 @@ -74,7 +73,8 @@
   toBeAccumulated.add(interval);
 } else {
   compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated));
-  toBeAccumulated = Lists.newArrayList(interval);
+  toBeAccumulated.clear();
 
 Review comment:
   Oh, you're correct. This looks fine.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #5857: Optimize filtered aggs with interval filters in per-segment queries

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #5857: Optimize filtered aggs 
with interval filters in per-segment queries
URL: https://github.com/apache/incubator-druid/pull/5857#discussion_r206304373
 
 

 ##
 File path: 
processing/src/main/java/io/druid/query/aggregation/FilteredAggregatorFactory.java
 ##
 @@ -140,6 +144,61 @@ public int getMaxIntermediateSize()
 return delegate.getMaxIntermediateSize();
   }
 
+  @Override
+  public AggregatorFactory 
optimizeForSegment(PerSegmentQueryOptimizationContext optimizationContext)
+  {
+if (filter instanceof IntervalDimFilter) {
 
 Review comment:
   It looks that this optimization would work only when there is a single 
`IntervalDimFilter` here. But, probably there would be more complicated filter 
using `AndDimFilter` or `OrDimFilter`. To support that, we need to check all 
filters. What do you think about adding a new interface like 
`optimizeForSegment` to `DimFilter`? Probably we also need to use the visitor 
pattern to fully optimize this.
   
   I'm not sure this should be a part of this PR. If you think it's not, would 
you please raise an issue about this and add a link here?


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei opened a new pull request #6075: validate non-empty String baseDataSource in MaterializedViewSuperviso…

2018-07-30 Thread GitBox
chengchengpei opened a new pull request #6075: validate non-empty String 
baseDataSource in MaterializedViewSuperviso…
URL: https://github.com/apache/incubator-druid/pull/6075
 
 
   re: https://github.com/apache/incubator-druid/issues/6074
   
   I am working on this.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] leventov commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument

2018-07-30 Thread GitBox
leventov commented on a change in pull request #6068: Prohibit 
Lists.newArrayList() with a single argument
URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206274500
 
 

 ##
 File path: 
indexing-service/src/main/java/io/druid/indexing/common/task/Tasks.java
 ##
 @@ -74,7 +73,8 @@
   toBeAccumulated.add(interval);
 } else {
   compactIntervals.add(JodaUtils.umbrellaInterval(toBeAccumulated));
-  toBeAccumulated = Lists.newArrayList(interval);
+  toBeAccumulated.clear();
 
 Review comment:
   @jihoonson but `umbrellaInterval()` is called before the list is cleared. 
The version that you presented will work fine as well, unless there is 
*concurrent* modification of the passed list, that is a total mess and would 
break `umbrellaInterval()` anyway.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #6068: Prohibit Lists.newArrayList() with a single argument

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #6068: Prohibit 
Lists.newArrayList() with a single argument
URL: https://github.com/apache/incubator-druid/pull/6068#discussion_r206274914
 
 

 ##
 File path: processing/src/main/java/io/druid/collections/spatial/Node.java
 ##
 @@ -56,7 +57,7 @@ public Node(float[] minCoordinates, float[] maxCoordinates, 
boolean isLeaf, Bitm
   public Node(
   float[] minCoordinates,
   float[] maxCoordinates,
-  List children,
+  @Nullable Node child,
 
 Review comment:
   I don't have a strong opinion here, but I think anyone would wonder why this 
`Node` accepts only a single child in its constructor if he is not familiar 
with this part because they would know what the tree is and usually a node can 
have multiple children. I think that BDUF and YAGNI are the concepts to give 
some basic guidance and we should be able to apply it flexibly. 
   
   Anyway, if you want to keep this, please add a comment why it currently 
accepts a single child.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-07-30 Thread GitBox
jihoonson commented on issue #6072: validate baseDataSource non-empty string in 
DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-408966069
 
 
   Hi @chengchengpei, thank you for the contribution! I've added `WIP` tag.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] jihoonson commented on a change in pull request #5789: Add stringLast and stringFirst aggregators extension

2018-07-30 Thread GitBox
jihoonson commented on a change in pull request #5789: Add stringLast and 
stringFirst aggregators extension
URL: https://github.com/apache/incubator-druid/pull/5789#discussion_r206319063
 
 

 ##
 File path: 
processing/src/main/java/io/druid/query/aggregation/first/StringFirstBufferAggregator.java
 ##
 @@ -0,0 +1,151 @@
+/*
+ * Licensed to Metamarkets Group Inc. (Metamarkets) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. Metamarkets licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package io.druid.query.aggregation.first;
+
+import io.druid.java.util.common.StringUtils;
+import io.druid.query.aggregation.BufferAggregator;
+import io.druid.query.aggregation.SerializablePairLongString;
+import io.druid.query.monomorphicprocessing.RuntimeShapeInspector;
+import io.druid.segment.BaseLongColumnValueSelector;
+import io.druid.segment.BaseObjectColumnValueSelector;
+
+import java.nio.ByteBuffer;
+
+// TODO: Unit Test
+public class StringFirstBufferAggregator implements BufferAggregator
+{
+  private final BaseLongColumnValueSelector timeSelector;
+  private final BaseObjectColumnValueSelector valueSelector;
+  private final int maxStringBytes;
+
+  public StringFirstBufferAggregator(
+  BaseLongColumnValueSelector timeSelector,
+  BaseObjectColumnValueSelector valueSelector,
+  int maxStringBytes
+  )
+  {
+this.timeSelector = timeSelector;
+this.valueSelector = valueSelector;
+this.maxStringBytes = maxStringBytes;
+  }
+
+  @Override
+  public void init(ByteBuffer buf, int position)
+  {
+buf.putLong(position, Long.MAX_VALUE);
+buf.putInt(position + Long.BYTES, 0);
+  }
+
+  @Override
+  public void aggregate(ByteBuffer buf, int position)
+  {
+ByteBuffer mutationBuffer = buf.duplicate();
+mutationBuffer.position(position);
+
+Object value = valueSelector.getObject();
+
+long time = Long.MAX_VALUE;
+String firstString = null;
+
+if (value instanceof SerializablePairLongString) {
+  SerializablePairLongString serializablePair = 
(SerializablePairLongString) value;
+  time = serializablePair.lhs;
+  firstString = serializablePair.rhs;
+} else if (value instanceof String) {
+  time = timeSelector.getLong();
+  firstString = (String) value;
+} else if (value != null) {
+  throw new IllegalStateException(
+  "Try to aggregate unsuported class type ["
+  + value.getClass().getName() +
+  "]. Supported class types: String or SerializablePairLongString"
+  );
+}
+
+long lastTime = mutationBuffer.getLong(position);
+if (firstString != null && time < lastTime) {
 
 Review comment:
   > In this case, you can't use filters. If you use filters you discard the 
row minute: 1, client: AAA, location: null, but you can't recover the previous 
value minute: 1, client: AAA, location: zoneC.
   
   This is how `having` filter works, not `filter`. The difference between them 
is, `filter` is applied _before_ aggregation while `having` is applied _after_ 
aggregation. So, in the above example, with the `not-null` filter, the new row 
`minute: 1.4, client: AAA, location: null` will be filtered out but other rows 
can be still consumed by aggregators. The result should be `minute: 1, client: 
AAA, location: zoneC`.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] chengchengpei commented on issue #6072: validate baseDataSource non-empty string in DerivativeDataSourceMetad…

2018-07-30 Thread GitBox
chengchengpei commented on issue #6072: validate baseDataSource non-empty 
string in DerivativeDataSourceMetad…
URL: https://github.com/apache/incubator-druid/pull/6072#issuecomment-408974324
 
 
   @jihoonson Hi, Thanks. It seems that the failed tests are not related to my 
change. most of them failed because of timeout. Is there any way to trigger 
another test?
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



  1   2   3   4   5   6   7   8   9   10   >