[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r556280161



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -232,6 +237,10 @@ public boolean isReady(TaskActionClient taskActionClient) 
throws Exception
 throw new UOE("partitionsSpec[%s] is not supported", 
tuningConfig.getPartitionsSpec().getClass().getName());
   }
 }
+InputSource inputSource = getIngestionSchema().getIOConfig().getNonNullInputSource(
+    getIngestionSchema().getDataSchema().getParser()
+);
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   This should be done in `ParallelIndexSupervisorTask.isReady()` as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r556275449



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
 throw new IAE("Cannot use firehose and inputFormat together. Try using 
inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   > Just to double check, if i move it to isReady the task will be 
accepted by the overlord and it will fail when the task is actually tried to be 
run. I was thinking that if someone is not allowed, we should block the 
ingestion at that time when the task is submitted at first place. Is that the 
behavior you are expecting ?
   
   @nishantmonu51 `Task.isReady()` is actually called as soon as the task is issued 
to the Overlord. When it returns true, the Overlord changes the task's state from 
`waiting` to `pending`, and tries to assign it to one of the MiddleManagers. So, I 
think it will be effectively the same as what you want to achieve.
   
   > isReady is also called multiple times, so not sure if that is any better 
in terms of number of checks,
   
   Good point. I think it won't be that expensive, but maybe we can check 
a `validated` state and avoid further checking if it is already validated?
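   The suggested `validated` flag could be a one-line guard. A minimal sketch (the class and method names here are illustrative, not the PR's actual code; in the PR's context the `Runnable` would wrap `inputSource.validateAllowDenyPrefixList(securityConfig)`):

```java
public class PrefixListValidator
{
  private boolean validated = false;

  // Runs the supplied validation at most once; later calls are no-ops.
  public void validateOnce(Runnable validation)
  {
    if (!validated) {
      validation.run();  // expected to throw if a URI prefix is denied
      validated = true;  // only reached when validation passes
    }
  }
}
```

   Because `isReady()` can be called repeatedly, the flag is only set after a successful check, so a denied prefix keeps failing on every call.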








[GitHub] [druid] jihoonson commented on a change in pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10746:
URL: https://github.com/apache/druid/pull/10746#discussion_r556271084



##
File path: server/src/main/java/org/apache/druid/server/QueryResource.java
##
@@ -463,36 +481,36 @@ Response ok(Object object) throws IOException
 
 Response gotError(Exception e) throws IOException
 {
-  return Response.serverError()
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e))
- )
- .build();
+  return buildNonOkResponse(
+  Status.INTERNAL_SERVER_ERROR.getStatusCode(),
+  QueryInterruptedException.wrapIfNeeded(e)
+  );
 }
 
 Response gotTimeout(QueryTimeoutException e) throws IOException
 {
-  return Response.status(QueryTimeoutException.STATUS_CODE)
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(e)
- )
- .build();
+  return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e);
 }
 
 Response gotLimited(QueryCapacityExceededException e) throws IOException
 {
-  return Response.status(QueryCapacityExceededException.STATUS_CODE)
- .entity(newOutputWriter(null, null, false).writeValueAsBytes(e))
- .build();
+  return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e);
 }
 
 Response gotUnsupported(QueryUnsupportedException e) throws IOException
 {
-  return Response.status(QueryUnsupportedException.STATUS_CODE)
+  return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e);
+}
+
+Response gotBadQuery(BadQueryException e) throws IOException
+{
+  return buildNonOkResponse(BadQueryException.STATUS_CODE, e);

Review comment:
   Thanks!

##
File path: 
processing/src/main/java/org/apache/druid/query/QueryUnsupportedException.java
##
@@ -24,21 +24,20 @@
 import com.fasterxml.jackson.annotation.JsonProperty;
 
 import javax.annotation.Nullable;
-import java.net.InetAddress;
 
 /**
  * This exception is for the query engine to surface when a query cannot be 
run. This can be due to the
  * following reasons: 1) The query is not supported yet. 2) The query is not 
something Druid would ever supports.
  * For these cases, the exact causes and details should also be documented in 
Druid user facing documents.
  *
- * As a {@link QueryException} it is expected to be serialied to a json 
response, but will be mapped to
- * {@link #STATUS_CODE} instead of the default HTTP 500 status.
+ * As a {@link QueryException} it is expected to be serialied to a json 
response with a proper HTTP error code

Review comment:
   Oops, thanks.








[GitHub] [druid] jihoonson commented on a change in pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10746:
URL: https://github.com/apache/druid/pull/10746#discussion_r556270879



##
File path: 
processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java
##
@@ -19,18 +19,33 @@
 
 package org.apache.druid.query;
 
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.druid.java.util.common.StringUtils;
 
 /**
  * Exception indicating that an operation failed because it exceeded some 
configured resource limit.
  *
- * This is used as a marker exception by {@link QueryInterruptedException} to 
report the "Resource limit exceeded"
- * error code.
+ * This is a {@link BadQueryException} because it will likely a user's 
misbehavior when this exception is thrown.
+ * Druid cluster operators set some set of resource limitations for some good 
reason. When a user query requires
+ * too many resources, it will likely mean that the user query should be 
modified to not use excessive resources.

Review comment:
   @a2l007 oh, you are correct about the exception type in 
`checkTotalBytesLimit()`. My point was that this exception doesn't seem to be 
propagated to `JsonParserIterator` anyway, because [`NettyHttpClient` doesn't do 
it](https://github.com/apache/druid/blob/master/core/src/main/java/org/apache/druid/java/util/http/client/NettyHttpClient.java#L263).
 I think `NettyHttpClient` should call `retVal.setException()` instead of 
`retVal.set(null)`, but I haven't studied the surrounding code enough to figure 
out why we are doing this. Anyway, it doesn't look harmful to change the exception 
type in this PR, so I changed it :slightly_smiling_face: 
   
   @jon-wei :+1: Updated the javadoc.








[GitHub] [druid] clintropolis commented on a change in pull request #10605: bitwise math function expressions

2021-01-12 Thread GitBox


clintropolis commented on a change in pull request #10605:
URL: https://github.com/apache/druid/pull/10605#discussion_r556267351



##
File path: docs/misc/math-expr.md
##
@@ -119,6 +119,13 @@ See javadoc of java.lang.Math for detailed explanation for 
each function.
 |acos|acos(x) would return the arc cosine of x|
 |asin|asin(x) would return the arc sine of x|
 |atan|atan(x) would return the arc tangent of x|
+|bitwiseAnd|bitwiseAnd(x,y) would return the result of x & y. Double values 
will be converted to their bit representation|
+|bitwiseComplement|bitwiseComplement(x) would return the result of ~x. Double 
values will be converted to their bit representation|
+|bitwiseConvertDouble|bitwiseConvertDouble(x) would convert the IEEE 754 
floating-point "double" bits stored in a long into a double value if the input 
is a long, or the copy bits of a double value into a long if the input is a 
double.|

Review comment:
   btw, testing 
`bitwiseConvertLongBitsToDouble(bitwiseConvertLongBitsToDouble(..))` uncovered 
an issue with the parser when trying to parse the output of `Expr.stringify` 
(unit tests cover this round-trip scenario, and when flatten is true it 
turns the expression into a constant): large doubles with exponents, e.g. `1E10`, 
could not be correctly parsed. So I expanded the grammar to allow them, 
[roughly according to these 
rules](https://docs.oracle.com/javase/8/docs/api/java/lang/Double.html#valueOf-java.lang.String-)
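   For reference, the two directions described in the `bitwiseConvertDouble` doc row correspond to the JDK's bit-conversion methods. A sketch of the semantics (not the expression implementation itself):

```java
public class BitwiseConvertDemo
{
  // Long input: reinterpret the stored bits as an IEEE 754 double.
  static double fromLongBits(long bits)
  {
    return Double.longBitsToDouble(bits);
  }

  // Double input: copy the value's bits into a long.
  static long toLongBits(double value)
  {
    return Double.doubleToLongBits(value);
  }
}
```

   Round-tripping a finite value is lossless, e.g. `fromLongBits(toLongBits(2.5)) == 2.5`, which is the round-trip scenario the unit tests exercise.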








[GitHub] [druid] jihoonson commented on a change in pull request #10743: fix limited queries with subtotals

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10743:
URL: https://github.com/apache/druid/pull/10743#discussion_r556259073



##
File path: 
processing/src/main/java/org/apache/druid/query/groupby/epinephelinae/ByteBufferMinMaxOffsetHeap.java
##
@@ -43,6 +43,7 @@
   private final LimitedBufferHashGrouper.BufferGrouperOffsetHeapIndexUpdater 
heapIndexUpdater;
 
   private int heapSize;
+  private int maxHeapSize;

Review comment:
   nit: this name sounds like a maximum limit on the heap size, while it actually 
remembers the maximum value the heap size has reached. Maybe some javadoc would help.








[GitHub] [druid] jon-wei commented on a change in pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


jon-wei commented on a change in pull request #10746:
URL: https://github.com/apache/druid/pull/10746#discussion_r556245770



##
File path: 
processing/src/main/java/org/apache/druid/query/QueryUnsupportedException.java
##
@@ -24,21 +24,20 @@
 import com.fasterxml.jackson.annotation.JsonProperty;
 
 import javax.annotation.Nullable;
-import java.net.InetAddress;
 
 /**
  * This exception is for the query engine to surface when a query cannot be 
run. This can be due to the
  * following reasons: 1) The query is not supported yet. 2) The query is not 
something Druid would ever supports.
  * For these cases, the exact causes and details should also be documented in 
Druid user facing documents.
  *
- * As a {@link QueryException} it is expected to be serialied to a json 
response, but will be mapped to
- * {@link #STATUS_CODE} instead of the default HTTP 500 status.
+ * As a {@link QueryException} it is expected to be serialied to a json 
response with a proper HTTP error code

Review comment:
   serialied -> serialized

##
File path: 
processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java
##
@@ -19,18 +19,33 @@
 
 package org.apache.druid.query;
 
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.druid.java.util.common.StringUtils;
 
 /**
  * Exception indicating that an operation failed because it exceeded some 
configured resource limit.
  *
- * This is used as a marker exception by {@link QueryInterruptedException} to 
report the "Resource limit exceeded"
- * error code.
+ * This is a {@link BadQueryException} because it will likely a user's 
misbehavior when this exception is thrown.
+ * Druid cluster operators set some set of resource limitations for some good 
reason. When a user query requires
+ * too many resources, it will likely mean that the user query should be 
modified to not use excessive resources.

Review comment:
   Suggest rewording to something like:
   
   "This is a {@link BadQueryException} because it likely indicates a user's 
misbehavior when this exception is thrown. The resource limitations set by 
Druid cluster operators are typically less flexible than the parameters of a 
user query, so when a user query requires too many resources, the likely remedy 
is that the user query should be modified to use fewer resources, or to reduce 
query volume."








[GitHub] [druid] a2l007 commented on a change in pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


a2l007 commented on a change in pull request #10746:
URL: https://github.com/apache/druid/pull/10746#discussion_r556218910



##
File path: 
processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java
##
@@ -19,18 +19,33 @@
 
 package org.apache.druid.query;
 
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.druid.java.util.common.StringUtils;
 
 /**
  * Exception indicating that an operation failed because it exceeded some 
configured resource limit.
  *
- * This is used as a marker exception by {@link QueryInterruptedException} to 
report the "Resource limit exceeded"
- * error code.
+ * This is a {@link BadQueryException} because it will likely a user's 
misbehavior when this exception is thrown.
+ * Druid cluster operators set some set of resource limitations for some good 
reason. When a user query requires
+ * too many resources, it will likely mean that the user query should be 
modified to not use excessive resources.

Review comment:
   If we change this, the exception type and code change from RE to 
ResourceLimitExceeded for queries exceeding maxScatterGatherBytes. What 
side effects could that cause? Sorry if I'm missing something :)

##
File path: server/src/main/java/org/apache/druid/server/QueryResource.java
##
@@ -463,36 +481,36 @@ Response ok(Object object) throws IOException
 
 Response gotError(Exception e) throws IOException
 {
-  return Response.serverError()
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e))
- )
- .build();
+  return buildNonOkResponse(
+  Status.INTERNAL_SERVER_ERROR.getStatusCode(),
+  QueryInterruptedException.wrapIfNeeded(e)
+  );
 }
 
 Response gotTimeout(QueryTimeoutException e) throws IOException
 {
-  return Response.status(QueryTimeoutException.STATUS_CODE)
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(e)
- )
- .build();
+  return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e);
 }
 
 Response gotLimited(QueryCapacityExceededException e) throws IOException
 {
-  return Response.status(QueryCapacityExceededException.STATUS_CODE)
- .entity(newOutputWriter(null, null, 
false).writeValueAsBytes(e))
- .build();
+  return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e);
 }
 
 Response gotUnsupported(QueryUnsupportedException e) throws IOException
 {
-  return Response.status(QueryUnsupportedException.STATUS_CODE)
+  return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e);
+}
+
+Response gotBadQuery(BadQueryException e) throws IOException
+{
+  return buildNonOkResponse(BadQueryException.STATUS_CODE, e);

Review comment:
   Thanks, it works fine now. I agree that we can avoid deeper tinkering with 
QueryException before the release, and it can be picked up post-release.








[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r556211781



##
File path: 
server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java
##
@@ -28,9 +29,17 @@
  */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment:
   Changed the default back to `BasicMonitorScheduler`. 








[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r556211566



##
File path: 
server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java
##
@@ -28,9 +29,17 @@
  */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment:
   > So, because the potential reward has a small impact, and the potential 
risk has a large impact, I think it's best to default to the old scheduler for 
another release or so. Just until such time as people have been able to do 
long-running tests in production and have found that there are no issues.
   
   This makes sense to me. I think we can do more extensive testing by 
ourselves instead of rushing to change the default.
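   With the default staying on `BasicMonitorScheduler`, operators who want to try the new scheduler would opt in via runtime properties. A hypothetical sketch, assuming the `@JsonProperty` field name maps to the usual `druid.monitoring.*` config prefix (not verified against the docs):

```properties
# Hypothetical runtime property; the exact key depends on how
# DruidMonitorSchedulerConfig is bound to runtime.properties.
druid.monitoring.schedulerClassName=org.apache.druid.java.util.metrics.ClockDriftSafeMonitorScheduler
```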








[GitHub] [druid] jihoonson commented on a change in pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10746:
URL: https://github.com/apache/druid/pull/10746#discussion_r556202316



##
File path: server/src/main/java/org/apache/druid/server/QueryResource.java
##
@@ -463,36 +481,36 @@ Response ok(Object object) throws IOException
 
 Response gotError(Exception e) throws IOException
 {
-  return Response.serverError()
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e))
- )
- .build();
+  return buildNonOkResponse(
+  Status.INTERNAL_SERVER_ERROR.getStatusCode(),
+  QueryInterruptedException.wrapIfNeeded(e)
+  );
 }
 
 Response gotTimeout(QueryTimeoutException e) throws IOException
 {
-  return Response.status(QueryTimeoutException.STATUS_CODE)
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(e)
- )
- .build();
+  return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e);
 }
 
 Response gotLimited(QueryCapacityExceededException e) throws IOException
 {
-  return Response.status(QueryCapacityExceededException.STATUS_CODE)
- .entity(newOutputWriter(null, null, false).writeValueAsBytes(e))
- .build();
+  return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e);
 }
 
 Response gotUnsupported(QueryUnsupportedException e) throws IOException
 {
-  return Response.status(QueryUnsupportedException.STATUS_CODE)
+  return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e);
+}
+
+Response gotBadQuery(BadQueryException e) throws IOException
+{
+  return buildNonOkResponse(BadQueryException.STATUS_CODE, e);

Review comment:
   Thanks for testing! I thought I had fixed all cases where exceptions are 
wrapped in `QueryInterruptedException`, but missed `JsonParserIterator`. My 
intention was to fix it in this PR, so I went ahead and fixed it. It turns out 
there is an issue where the type of the query exception is lost after JSON 
deserialization. I ended up with [this ugly switch 
clause](https://github.com/apache/druid/pull/10746/files#diff-28a786f3719500e89592ff9617a926736337b723e4245c410601669db0f29ee5R207-R259)
 to restore the exception type. A better approach is modifying `QueryException` 
to store the type using Jackson, but I wasn't 100% sure what the side effects of 
that would be either. The ugly switch clause seems at least safer than modifying 
`QueryException`. I think we can rework the serde system of `QueryException` 
after the release. What do you think?
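   The shape of such a switch can be sketched with a toy hierarchy (the class names below are stand-ins for Druid's `QueryException` subclasses, not the PR's actual serde code):

```java
public class ExceptionRestorer
{
  // Stand-ins for Druid's QueryException subclasses.
  static class TimeoutError extends RuntimeException
  {
    TimeoutError(String m) { super(m); }
  }

  static class ResourceLimitError extends RuntimeException
  {
    ResourceLimitError(String m) { super(m); }
  }

  // After JSON deserialization only the errorClass string survives, so map
  // it back to a concrete exception type; unknown classes stay generic.
  static RuntimeException restore(String errorClass, String message)
  {
    if (errorClass == null) {
      return new RuntimeException(message);
    }
    switch (errorClass) {
      case "TimeoutError":
        return new TimeoutError(message);
      case "ResourceLimitError":
        return new ResourceLimitError(message);
      default:
        return new RuntimeException(message);
    }
  }
}
```

   Storing the type with Jackson polymorphic typing would remove the switch, at the cost of changing the wire format, which is presumably why it was deferred.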

##
File path: 
processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java
##
@@ -19,18 +19,33 @@
 
 package org.apache.druid.query;
 
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.druid.java.util.common.StringUtils;
 
 /**
  * Exception indicating that an operation failed because it exceeded some 
configured resource limit.
  *
- * This is used as a marker exception by {@link QueryInterruptedException} to 
report the "Resource limit exceeded"
- * error code.
+ * This is a {@link BadQueryException} because it will likely a user's 
misbehavior when this exception is thrown.
+ * Druid cluster operators set some set of resource limitations for some good 
reason. When a user query requires
+ * too many resources, it will likely mean that the user query should be 
modified to not use excessive resources.

Review comment:
   Good catch :slightly_smiling_face: It should be, but I wasn't 100% sure 
whether it could cause any side effects if I did it. I left a comment about it 
[here](https://github.com/apache/druid/pull/10746/files#diff-28a786f3719500e89592ff9617a926736337b723e4245c410601669db0f29ee5R171).
 I will also make an issue to track it.

##
File path: 
processing/src/main/java/org/apache/druid/query/BadQueryException.java
##
@@ -20,12 +20,16 @@
 package org.apache.druid.query;
 
 /**
- * This exception is thrown when the requested operation cannot be completed 
due to a lack of available resources.
+ * An abstract class for all query exceptions that should return a bad request 
status code (404).

Review comment:
   Oops, yes it should be.









[GitHub] [druid] techdocsmith commented on a change in pull request #10748: reword for clarity

2021-01-12 Thread GitBox


techdocsmith commented on a change in pull request #10748:
URL: https://github.com/apache/druid/pull/10748#discussion_r556188376



##
File path: docs/ingestion/native-batch.md
##
@@ -192,7 +192,7 @@ that range if there's some stray data with unexpected 
timestamps.
 ||---|---|-|
 |type|The task type, this should always be `index_parallel`.|none|yes|
 |inputFormat|[`inputFormat`](./data-formats.md#input-format) to specify how to 
parse input data.|none|yes|
-|appendToExisting|Creates segments as additional shards of the latest version, 
effectively appending to the segment set instead of replacing it. The current 
limitation is that you can append to any datasources regardless of their 
original partitioning scheme, but the appended segments should be partitioned 
using the `dynamic` partitionsSpec.|false|no|
+|appendToExisting|Creates segments as additional shards of the latest version, 
effectively appending to the segment set instead of replacing it. This means 
that you can append new segments to any datasource regardless of its original 
partitioning scheme. However, you must use the `dynamic` partitioning type for 
the appended segments .|false|no|

Review comment:
   I am still a noob, but I tried setting the partitionsSpec to `hashed` and 
appendToExisting to `true`. The task succeeded, but when I checked the payload, 
it looks like `appendToExisting` was set to `false`:
   ```
   "ioConfig": {
     "type": "index_parallel",
     "inputSource": {
       "type": "http",
       "uris": [
         "https://static.imply.io/data/wikipedia.json.gz"
       ],
       "httpAuthenticationUsername": null,
       "httpAuthenticationPassword": null
     },
     "inputFormat": {
       "type": "json",
       "flattenSpec": {
         "useFieldDiscovery": true,
         "fields": []
       },
       "featureSpec": {}
     },
     "appendToExisting": false
   }
   ```
   I'll need to try again and follow up with someone with deeper insight 
to verify how it is supposed to work in this case.








[GitHub] [druid] jihoonson commented on pull request #10614: re-use expression vector evaluation results for the same offset in expression vector selectors

2021-01-12 Thread GitBox


jihoonson commented on pull request #10614:
URL: https://github.com/apache/druid/pull/10614#issuecomment-759125756


   LGTM. Thanks @clintropolis.






[druid] branch master updated (aecc9e5 -> 2fc2938)

2021-01-12 Thread suneet
This is an automated email from the ASF dual-hosted git repository.

suneet pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from aecc9e5  Remove legacy code from LogUsedSegments duty (#10287)
 add 2fc2938  Web console: fix bad results if there is not native bigint 
(#10741)

No new revisions were added by this update.

Summary of changes:
 licenses.yaml  | 34 ++
 web-console/package-lock.json  | 14 +-
 web-console/package.json   |  2 +-
 web-console/src/singletons/api.spec.ts | 28 
 4 files changed, 40 insertions(+), 38 deletions(-)





[GitHub] [druid] suneet-s commented on pull request #10741: Web console: fix bad results if there is not native bigint

2021-01-12 Thread GitBox


suneet-s commented on pull request #10741:
URL: https://github.com/apache/druid/pull/10741#issuecomment-759124141


   nice test!






[GitHub] [druid] suneet-s merged pull request #10741: Web console: fix bad results if there is not native bigint

2021-01-12 Thread GitBox


suneet-s merged pull request #10741:
URL: https://github.com/apache/druid/pull/10741


   






[GitHub] [druid] gianm commented on a change in pull request #10748: reword for clarity

2021-01-12 Thread GitBox


gianm commented on a change in pull request #10748:
URL: https://github.com/apache/druid/pull/10748#discussion_r556167558



##
File path: docs/ingestion/native-batch.md
##
@@ -192,7 +192,7 @@ that range if there's some stray data with unexpected 
timestamps.
 ||---|---|-|
 |type|The task type, this should always be `index_parallel`.|none|yes|
 |inputFormat|[`inputFormat`](./data-formats.md#input-format) to specify how to 
parse input data.|none|yes|
-|appendToExisting|Creates segments as additional shards of the latest version, 
effectively appending to the segment set instead of replacing it. The current 
limitation is that you can append to any datasources regardless of their 
original partitioning scheme, but the appended segments should be partitioned 
using the `dynamic` partitionsSpec.|false|no|
+|appendToExisting|Creates segments as additional shards of the latest version, 
effectively appending to the segment set instead of replacing it. This means 
that you can append new segments to any datasource regardless of its original 
partitioning scheme. However, you must use the `dynamic` partitioning type for 
the appended segments .|false|no|

Review comment:
   The doc should address what happens if you disregard this advice and set 
appendToExisting: true but don't use dynamic partitioning. (Do you get an 
error? Does something weird happen?)








[GitHub] [druid] techdocsmith opened a new pull request #10748: reword for clarity

2021-01-12 Thread GitBox


techdocsmith opened a new pull request #10748:
URL: https://github.com/apache/druid/pull/10748


   
   
   Fixes awkward wording around the behavior of `appendToExisting`.
   
   
   
   
   
   ### Description
   
   
   
   
   
   
   
     Removes ambiguity around the "current limitation". Describes the option 
in terms of the current product functionality.
   
   
   
   
   
   
   
   
   
   
   
   
   This PR has:
   - [x] been self-reviewed.
   
   
   
   
   
   
   
   






[druid] branch master updated (ca32652 -> aecc9e5)

2021-01-12 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from ca32652  Fix potential deadlock in batch ingestion (#10736)
 add aecc9e5  Remove legacy code from LogUsedSegments duty (#10287)

No new revisions were added by this update.

Summary of changes:
 .../apache/druid/server/coordinator/DruidCoordinator.java|  2 +-
 .../druid/server/coordinator/duty/LogUsedSegments.java   | 12 ++--
 2 files changed, 3 insertions(+), 11 deletions(-)





[GitHub] [druid] jihoonson merged pull request #10287: Remove legacy code from LogUsedSegments duty

2021-01-12 Thread GitBox


jihoonson merged pull request #10287:
URL: https://github.com/apache/druid/pull/10287


   






[druid] branch master updated (3984457 -> ca32652)

2021-01-12 Thread jihoonson
This is an automated email from the ASF dual-hosted git repository.

jihoonson pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git.


from 3984457  Add missing unit tests for segment loading in historicals 
(#10737)
 add ca32652  Fix potential deadlock in batch ingestion (#10736)

No new revisions were added by this update.

Summary of changes:
 .../common/task/AbstractBatchIndexTask.java|  12 +-
 .../druid/indexing/common/task/TaskLockHelper.java |  11 -
 .../druid/indexing/overlord/TaskLockbox.java   |  29 ++-
 .../apache/druid/indexing/overlord/TaskQueue.java  | 134 +-
 .../indexing/common/task/IngestionTestBase.java|  16 ++
 .../druid/indexing/overlord/TaskQueueTest.java | 278 +
 6 files changed, 397 insertions(+), 83 deletions(-)
 create mode 100644 
indexing-service/src/test/java/org/apache/druid/indexing/overlord/TaskQueueTest.java





[GitHub] [druid] jihoonson commented on pull request #10736: Fix potential deadlock in batch ingestion

2021-01-12 Thread GitBox


jihoonson commented on pull request #10736:
URL: https://github.com/apache/druid/pull/10736#issuecomment-758968538


   @loquisgon @maytasm thanks for the review!






[GitHub] [druid] jihoonson merged pull request #10736: Fix potential deadlock in batch ingestion

2021-01-12 Thread GitBox


jihoonson merged pull request #10736:
URL: https://github.com/apache/druid/pull/10736


   






[GitHub] [druid] a2l007 commented on a change in pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


a2l007 commented on a change in pull request #10746:
URL: https://github.com/apache/druid/pull/10746#discussion_r556043383



##
File path: 
processing/src/main/java/org/apache/druid/query/BadQueryException.java
##
@@ -20,12 +20,16 @@
 package org.apache.druid.query;
 
 /**
- * This exception is thrown when the requested operation cannot be completed 
due to a lack of available resources.
+ * An abstract class for all query exceptions that should return a bad request 
status code (404).

Review comment:
   Shouldn't it be 400 here?

##
File path: 
processing/src/main/java/org/apache/druid/query/ResourceLimitExceededException.java
##
@@ -19,18 +19,33 @@
 
 package org.apache.druid.query;
 
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonProperty;
 import org.apache.druid.java.util.common.StringUtils;
 
 /**
  * Exception indicating that an operation failed because it exceeded some 
configured resource limit.
  *
- * This is used as a marker exception by {@link QueryInterruptedException} to 
report the "Resource limit exceeded"
- * error code.
+ * This is a {@link BadQueryException} because it likely indicates a user 
error when this exception is thrown.
+ * Druid cluster operators set resource limitations for good reason. When a 
user query requires
+ * too many resources, it usually means the query should be modified to not 
use excessive resources.

Review comment:
   Based on the description, shouldn't [this 
exception](https://github.com/apache/druid/blob/master/server/src/main/java/org/apache/druid/client/DirectDruidClient.java#L447)
 also be categorized as ResourceLimitExceeded ?

##
File path: server/src/main/java/org/apache/druid/server/QueryResource.java
##
@@ -463,36 +481,36 @@ Response ok(Object object) throws IOException
 
 Response gotError(Exception e) throws IOException
 {
-  return Response.serverError()
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- 
.writeValueAsBytes(QueryInterruptedException.wrapIfNeeded(e))
- )
- .build();
+  return buildNonOkResponse(
+  Status.INTERNAL_SERVER_ERROR.getStatusCode(),
+  QueryInterruptedException.wrapIfNeeded(e)
+  );
 }
 
 Response gotTimeout(QueryTimeoutException e) throws IOException
 {
-  return Response.status(QueryTimeoutException.STATUS_CODE)
- .type(contentType)
- .entity(
- newOutputWriter(null, null, false)
- .writeValueAsBytes(e)
- )
- .build();
+  return buildNonOkResponse(QueryTimeoutException.STATUS_CODE, e);
 }
 
 Response gotLimited(QueryCapacityExceededException e) throws IOException
 {
-  return Response.status(QueryCapacityExceededException.STATUS_CODE)
- .entity(newOutputWriter(null, null, 
false).writeValueAsBytes(e))
- .build();
+  return buildNonOkResponse(QueryCapacityExceededException.STATUS_CODE, e);
 }
 
 Response gotUnsupported(QueryUnsupportedException e) throws IOException
 {
-  return Response.status(QueryUnsupportedException.STATUS_CODE)
+  return buildNonOkResponse(QueryUnsupportedException.STATUS_CODE, e);
+}
+
+Response gotBadQuery(BadQueryException e) throws IOException
+{
+  return buildNonOkResponse(BadQueryException.STATUS_CODE, e);

Review comment:
   I was curious whether this would catch ResourceLimitExceeded exceptions 
wrapped under `QueryInterruptedException` and so I ran the PR on my test 
cluster. It looks like such tests are still giving us 500 instead of 400. 
Related stack trace:
   
   ```
   HTTP/1.1 500 Internal Server Error
   < Content-Type: application/json
   < Content-Length: 372
   {
 "error" : "Resource limit exceeded",
"trace":
   org.apache.druid.query.QueryInterruptedException: Not enough aggregation 
buffer space to execute this query. Try increasing 
druid.processing.buffer.sizeBytes or enable disk spilling by setting 
druid.query.groupBy.maxOnDiskStorage to a positive number.
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
Method) ~[?:1.8.0_231]
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
 ~[?:1.8.0_231]
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
 ~[?:1.8.0_231]
at java.lang.reflect.Constructor.newInstance(Constructor.java:423) 
~[?:1.8.0_231]
at 
com.fasterxml.jackson.databind.introspect.AnnotatedConstructor.call(AnnotatedConstructor.java:124)
 ~[jackson-databind-2.10.5.1.jar:2.10.5.1]
at 

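One possible fix for the wrapped-exception case reported above is to walk the exception's cause chain before choosing a status code, so a `ResourceLimitExceededException` hiding inside a `QueryInterruptedException` still maps to 400. The sketch below uses hypothetical stand-in classes, not Druid's actual exception hierarchy:

```java
public class StatusMapper {
    // Hypothetical stand-ins for Druid's exception types (names assumed).
    static class QueryInterruptedException extends RuntimeException {
        QueryInterruptedException(Throwable cause) { super(cause); }
    }
    static class ResourceLimitExceededException extends RuntimeException {}

    // Walk the cause chain so wrapped "bad query" exceptions are not
    // misreported as server errors.
    static int statusFor(Throwable t) {
        for (Throwable cur = t; cur != null; cur = cur.getCause()) {
            if (cur instanceof ResourceLimitExceededException) {
                return 400; // bad request: caller exceeded a configured limit
            }
        }
        return 500; // genuinely unexpected server-side failure
    }

    public static void main(String[] args) {
        Throwable wrapped = new QueryInterruptedException(new ResourceLimitExceededException());
        System.out.println(statusFor(wrapped));
        System.out.println(statusFor(new RuntimeException()));
    }
}
```

With such unwrapping, the curl test in the comment above would receive 400 rather than 500 for the aggregation-buffer case.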
[GitHub] [druid] suneet-s commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


suneet-s commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r556068500



##
File path: 
server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java
##
@@ -28,9 +29,17 @@
  */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = 
ClockDriftSafeMonitorScheduler.class.getName();

Review comment:
   I've also thought about this a bunch and have changed my opinion on 
whether or not we should change the scheduler to a new dependency by default a 
few times.
   
   While changing to use the CronScheduler might fix a bug, it isn't clear 
whether any users have run into this in the field. I thought about documenting 
why a user would want to change the scheduler to CronScheduler instead of the 
older implementation, and I couldn't think of a good user facing reason to do 
so. So if we set the default to the old implementation, I don't think anyone 
would test it in production, so it would continue to live as dead code, and 
we'll have the same dilemma in the next release or 2 when we ask whether or not 
this has been run in production.
   
   Setting the default to the older implementation reduces the impact of any 
bug that might show up in long running tests (even though this library was 
specifically built to fix issues found with long running processes). The 
drawback here is finding a reason for some users to try this in production so 
that we can sunset the feature flag in a release or 2.
   
   Writing out this comment, I now think the more cautious approach - keeping 
the default the same - is better as it's hard to articulate the benefit for 
switching the scheduling and taking on the risk associated with changing the 
older behavior.








[GitHub] [druid] gianm commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


gianm commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r556054132



##
File path: 
server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java
##
@@ -28,9 +29,17 @@
  */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = 
ClockDriftSafeMonitorScheduler.class.getName();

Review comment:
   By the way, if anyone has been running the patch in production for a 
while already, now would be a good time to speak up. If we have already built 
up a good amount of confidence then I think it makes sense to default to the 
new one.








[GitHub] [druid] gianm commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


gianm commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r556053575



##
File path: 
server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java
##
@@ -28,9 +29,17 @@
  */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = 
ClockDriftSafeMonitorScheduler.class.getName();

Review comment:
   My 2¢: the best plan is to default to the old one, and then in the 
future (after some people have enabled the new one in production) we should 
switch to the new one, and remove the old one and remove the config entirely.
   
   Rationale:
   
   The new scheduler is designed to eliminate potential clock drift for 
monitors. This reward is real but is pretty small impact. I don't expect 
anything bad will happen if the schedule drifts a bit. The main risk of the new 
scheduler, I suppose, is that there's some case where it goes haywire, and 
either locks up completely or fires much more often than it should. I'm not 
sure how likely this is, but it's (a) hard to test for, (b) quite bad if it 
happens.
   
   So, because the potential reward has a small impact, and the potential risk 
has a large impact, I think it's best to default to the old scheduler for 
another release or so. Just until such time as people have been able to do 
long-running tests in production and have found that there are no issues.
   
   At any rate, it's good that this is undocumented, since it's an 
inside-baseball sort of config that we would only want to exist for a few 
releases.








[GitHub] [druid] suneet-s commented on a change in pull request #10743: fix limited queries with subtotals

2021-01-12 Thread GitBox


suneet-s commented on a change in pull request #10743:
URL: https://github.com/apache/druid/pull/10743#discussion_r556004791



##
File path: sql/src/test/java/org/apache/druid/sql/calcite/CalciteQueryTest.java
##
@@ -16281,6 +16281,71 @@ public void testTimeStampAddConversion() throws 
Exception
 );
   }
 
+  @Test
+  public void testGroupingSetsWithLimit() throws Exception

Review comment:
   Skimming through the code - I think we should add some tests to test the 
iterators built for a query with an `ORDER BY` on dimensions that are part of 
the grouping sets (like `dim2`) and for something that isn't (like `SUM(cnt)` 
or another dimension) 








[GitHub] [druid-website-src] druid-matt opened a new pull request #200: remove 1 meetup, add 2

2021-01-12 Thread GitBox


druid-matt opened a new pull request #200:
URL: https://github.com/apache/druid-website-src/pull/200


   






[GitHub] [druid] maytasm commented on pull request #10740: Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

2021-01-12 Thread GitBox


maytasm commented on pull request #10740:
URL: https://github.com/apache/druid/pull/10740#issuecomment-758863964


   > ```
   > [ERROR] Failures: 
   > [ERROR] 
org.apache.druid.segment.realtime.appenderator.AppenderatorTest.testMaxBytesInMemory(org.apache.druid.segment.realtime.appenderator.AppenderatorTest)
   > [ERROR]   Run 1: AppenderatorTest.testMaxBytesInMemory:204 expected:<0> 
but was:<364>
   > ```
   > 
   > Test failure looks legit
   
   fixed.






[GitHub] [druid] loquisgon commented on a change in pull request #10742: Granularity interval materialization

2021-01-12 Thread GitBox


loquisgon commented on a change in pull request #10742:
URL: https://github.com/apache/druid/pull/10742#discussion_r555991064



##
File path: 
server/src/main/java/org/apache/druid/segment/indexing/granularity/GranularitySpec.java
##
@@ -45,7 +46,7 @@
*
* @return set of all time groups

Review comment:
   done

##
File path: 
server/src/main/java/org/apache/druid/segment/indexing/granularity/UniformGranularitySpec.java
##
@@ -57,18 +60,17 @@ public UniformGranularitySpec(
 this.segmentGranularity = segmentGranularity == null ? 
DEFAULT_SEGMENT_GRANULARITY : segmentGranularity;
 
 if (inputIntervals != null) {
-  List<Interval> granularIntervals = new ArrayList<>();
-  for (Interval inputInterval : inputIntervals) {
-Iterables.addAll(granularIntervals, 
this.segmentGranularity.getIterable(inputInterval));
-  }
-  this.inputIntervals = ImmutableList.copyOf(inputIntervals);
-  this.wrappedSpec = new ArbitraryGranularitySpec(queryGranularity, 
rollup, granularIntervals);
+  // sort them

Review comment:
   done








[GitHub] [druid] suneet-s commented on pull request #10740: Fix byte calculation for maxBytesInMemory to take into account of Sink/Hydrant Object overhead

2021-01-12 Thread GitBox


suneet-s commented on pull request #10740:
URL: https://github.com/apache/druid/pull/10740#issuecomment-758803244


   ```
   [ERROR] Failures: 
   [ERROR] 
org.apache.druid.segment.realtime.appenderator.AppenderatorTest.testMaxBytesInMemory(org.apache.druid.segment.realtime.appenderator.AppenderatorTest)
   [ERROR]   Run 1: AppenderatorTest.testMaxBytesInMemory:204 expected:<0> but 
was:<364>
   ```
   
   Test failure looks legit






[GitHub] [druid] loquisgon commented on a change in pull request #10742: Granularity interval materialization

2021-01-12 Thread GitBox


loquisgon commented on a change in pull request #10742:
URL: https://github.com/apache/druid/pull/10742#discussion_r555931298



##
File path: 
server/src/main/java/org/apache/druid/segment/indexing/granularity/UniformGranularitySpec.java
##
@@ -57,18 +60,17 @@ public UniformGranularitySpec(
 this.segmentGranularity = segmentGranularity == null ? 
DEFAULT_SEGMENT_GRANULARITY : segmentGranularity;
 
 if (inputIntervals != null) {
-  List<Interval> granularIntervals = new ArrayList<>();
-  for (Interval inputInterval : inputIntervals) {
-Iterables.addAll(granularIntervals, 
this.segmentGranularity.getIterable(inputInterval));
-  }
-  this.inputIntervals = ImmutableList.copyOf(inputIntervals);
-  this.wrappedSpec = new ArbitraryGranularitySpec(queryGranularity, 
rollup, granularIntervals);
+  // sort them

Review comment:
   Contract indicates that inputIntervals must be left "as is"; will 
remove sorting in the next commit.
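
For reference, the interval materialization under discussion — expanding each input interval into segment-granularity buckets — can be sketched with `java.time` (Druid itself uses Joda-Time intervals and `Granularity.getIterable`; the names below are illustrative, not Druid's API):

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

public class GranularityDemo {
    // Expand [start, end) into DAY-granularity buckets, the way
    // UniformGranularitySpec materializes granular intervals from input intervals.
    static List<LocalDate[]> dayBuckets(LocalDate start, LocalDate end) {
        List<LocalDate[]> buckets = new ArrayList<>();
        for (LocalDate d = start; d.isBefore(end); d = d.plusDays(1)) {
            buckets.add(new LocalDate[]{d, d.plusDays(1)});
        }
        return buckets;
    }

    public static void main(String[] args) {
        List<LocalDate[]> b = dayBuckets(LocalDate.of(2021, 1, 1), LocalDate.of(2021, 1, 4));
        System.out.println(b.size());
        System.out.println(b.get(0)[0] + "/" + b.get(0)[1]);
    }
}
```

Materializing all buckets eagerly is what the PR avoids for very large input intervals; this sketch only shows the semantics being preserved.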








[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


nishantmonu51 commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r555825958



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
 throw new IAE("Cannot use firehose and inputFormat together. Try using 
inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   isReady is also called multiple times, so I'm not sure that is any better 
in terms of the number of checks.
   
   Just to double-check: if I move it to isReady, the task will be accepted by 
the overlord and will only fail when it actually tries to run. I was 
thinking that if someone is not allowed, we should block the ingestion 
when the task is first submitted. Is that the behavior you are 
expecting?
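
For readers following along, the URI allow/deny prefix check being added could look roughly like this (method, class, and list names here are assumptions for illustration, not the PR's actual code):

```java
import java.net.URI;
import java.util.List;

public class PrefixListDemo {
    // Hypothetical sketch of InputSourceSecurityConfig-style validation:
    // a deny match always blocks; an empty allow list permits everything;
    // otherwise the URI must match some allow prefix.
    static boolean isAllowed(URI uri, List<String> allowPrefixes, List<String> denyPrefixes) {
        String s = uri.toString();
        for (String deny : denyPrefixes) {
            if (s.startsWith(deny)) {
                return false; // deny wins over allow
            }
        }
        if (allowPrefixes.isEmpty()) {
            return true; // no allow list configured: everything passes
        }
        for (String allow : allowPrefixes) {
            if (s.startsWith(allow)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> allow = List.of("s3://safe-bucket/");
        List<String> deny = List.of("s3://safe-bucket/private/");
        System.out.println(isAllowed(URI.create("s3://safe-bucket/data/file.json"), allow, deny));
        System.out.println(isAllowed(URI.create("s3://safe-bucket/private/x"), allow, deny));
        System.out.println(isAllowed(URI.create("hdfs://other/x"), allow, deny));
    }
}
```

Running validation at submission time (as discussed above) would call a check like this before the overlord accepts the task.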
   
   
   








[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


nishantmonu51 commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r555815427



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##
@@ -153,6 +154,9 @@
   @JsonIgnore
   private final RetryPolicyFactory retryPolicyFactory;
 
+  @JsonIgnore
+  private final InputSourceSecurityConfig securityConfig;

Review comment:
   done. 








[GitHub] [druid] nishantmonu51 commented on pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


nishantmonu51 commented on pull request #10677:
URL: https://github.com/apache/druid/pull/10677#issuecomment-758695886


   @jihoonson thanks for the review, I have updated the PR. Yes, it would be nice 
to get this into the 0.21.0 release. 






[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


nishantmonu51 commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r555809503



##
File path: 
extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java
##
@@ -214,10 +215,23 @@ public boolean needsFormat()
   private void cachePathsIfNeeded() throws IOException
   {
 if (cachedPaths == null) {
-  cachedPaths = 
ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths"));
+  Collection<Path> paths = Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths");
+  cachedPaths = ImmutableList.copyOf(paths);
 }
   }
 
+  @Override
+  public void validateAllowDenyPrefixList(InputSourceSecurityConfig 
securityConfig)
+  {
+try {
+  cachePathsIfNeeded();

Review comment:
   done. 








[GitHub] [druid] nishantmonu51 commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


nishantmonu51 commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r555808142



##
File path: 
core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java
##
@@ -31,13 +31,16 @@
  */
 public abstract class AbstractInputSource implements InputSource
 {
+  private transient boolean validated = false;

Review comment:
   equals tests on base classes were failing without making it transient as 
they were not including this field in the comparison








[GitHub] [druid] licl2014 opened a new issue #10747: Hyperunique is giving different results on multiple execution of same query

2021-01-12 Thread GitBox


licl2014 opened a new issue #10747:
URL: https://github.com/apache/druid/issues/10747


   Hyperunique is giving different results on multiple execution of same query
   
   ### Affected Version
   
   0.12.3
   
   ### Description
   
   My query is as follows:
   ```
   {
   "queryType":"timeseries",
   "dataSource":{
   "type":"table",
   "name":"datasource"
   },
   "intervals":{
   "type":"LegacySegmentSpec",
   "intervals":[
   "2020-12-31T16:00:00.000Z/2021-01-01T16:00:00.000Z"
   ]
   },
   "descending":false,
   "virtualColumns":[
   
   ],
   "filter":null,
   "granularity":{
   "type":"period",
   "period":"P1D",
   "timeZone":"Asia/Shanghai",
   "origin":null
   },
   "aggregations":[
   {
   "type":"hyperUnique",
   "name":"hll_cd",
   "fieldName":"cd",
   "isInputHyperUnique":false,
   "round":false
   }
   ],
   "postAggregations":[
   
   ],
   "context":{
   "groupByStrategy":"v2",
   "skipEmptyBuckets":true,
   "timeout":3
   }
   }
   ```
   **Results**
   attempt1:
   ```
   [ {
 "timestamp" : "2021-01-01T00:00:00.000+08:00",
 "result" : {
   "hll_cd" : 3.0347768393719055E7
 }
   } ]
   ```
   
   attempt2:
   ```
   [ {
 "timestamp" : "2021-01-01T00:00:00.000+08:00",
 "result" : {
   "hll_cd" : 3.036636100442829E7
 }
   } ]
   ```
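
Context for readers: `hyperUnique` is a HyperLogLog-based approximate aggregator, so the reported count is an estimate with inherent relative error on the order of 1.04/√m for m registers; whether run-to-run variation like the above is expected depends on the merge path. A toy HLL (an illustration, not Druid's implementation) shows the approximation:

```java
public class HllDemo {
    static final int P = 12;          // 2^12 = 4096 registers
    static final int M = 1 << P;
    static final byte[] regs = new byte[M];

    // splitmix64 finalizer used as a 64-bit hash.
    static long hash(long x) {
        x += 0x9E3779B97F4A7C15L;
        x = (x ^ (x >>> 30)) * 0xBF58476D1CE4E5B9L;
        x = (x ^ (x >>> 27)) * 0x94D049BB133111EBL;
        return x ^ (x >>> 31);
    }

    static void add(long value) {
        long h = hash(value);
        int idx = (int) (h >>> (64 - P));            // top P bits pick the register
        long rest = h << P;                          // remaining bits
        int rank = Long.numberOfLeadingZeros(rest) + 1;
        if (rank > regs[idx]) {
            regs[idx] = (byte) rank;                 // keep the max rank seen
        }
    }

    static double estimate() {
        double alpha = 0.7213 / (1 + 1.079 / M);     // bias-correction constant
        double sum = 0;
        for (byte r : regs) {
            sum += Math.pow(2, -r);
        }
        return alpha * M * M / sum;                  // harmonic-mean estimator
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        for (int i = 0; i < n; i++) {
            add(i);
        }
        double relErr = Math.abs(estimate() - n) / n;
        System.out.println(relErr < 0.05);           // within ~5% of the true count
    }
}
```

The two attempts above differ by about 0.06%, well inside typical HLL error bounds, though identical inputs merged identically should produce identical estimates — which is what makes the report worth investigating.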






[GitHub] [druid] fabes99 commented on issue #8663: Kafka indexing service duplicate entry exception in druid_pendingSegments

2021-01-12 Thread GitBox


fabes99 commented on issue #8663:
URL: https://github.com/apache/druid/issues/8663#issuecomment-758589129


   Facing the same issue with version 0.18.1 as well.






[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r555600385



##
File path: 
core/src/main/java/org/apache/druid/java/util/metrics/ClockDriftSafeMonitorScheduler.java
##
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.java.util.metrics;
+
+import io.timeandspace.cronscheduler.CronScheduler;
+import io.timeandspace.cronscheduler.CronTask;
+import org.apache.druid.java.util.common.logger.Logger;
+import org.apache.druid.java.util.emitter.service.ServiceEmitter;
+
+import java.util.List;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Future;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicReference;
+
+/**
+ * A {@link MonitorScheduler} implementation based on {@link CronScheduler}.
+ */
+public class ClockDriftSafeMonitorScheduler extends MonitorScheduler
+{
+  private static final Logger LOG = new Logger(ClockDriftSafeMonitorScheduler.class);
+
+  private final CronScheduler monitorScheduler;
+  private final ExecutorService monitorRunner;
+
+  public ClockDriftSafeMonitorScheduler(
+  MonitorSchedulerConfig config,
+  ServiceEmitter emitter,
+  List<Monitor> monitors,
+  CronScheduler monitorScheduler,
+  ExecutorService monitorRunner
+  )
+  {
+super(config, emitter, monitors);
+this.monitorScheduler = monitorScheduler;
+this.monitorRunner = monitorRunner;
+  }
+
+  @Override
+  void startMonitor(final Monitor monitor)
+  {
+monitor.start();
+long rate = getConfig().getEmitterPeriod().getMillis();
+final AtomicReference<Future<?>> scheduleFutureReference = new AtomicReference<>();
+Future<?> scheduledFuture = monitorScheduler.scheduleAtFixedRate(
+rate,
+rate,
+TimeUnit.MILLISECONDS,
+new CronTask()
+{
+  private Future<?> scheduleFuture = null;

Review comment:
   Ah, forgot to clean them up before commit. Thanks :+1: 
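For context on the "clock drift safe" naming in the class above, here is a minimal JDK-only sketch (not the PR's code; the class `FixedRateTimes` and method `scheduledTimes` are hypothetical names) of the property a fixed-rate scheduler relies on: every run time is derived from the original start time plus a multiple of the period, so a slow task or a drifting reading of "now" does not shift later runs.

```java
import java.util.ArrayList;
import java.util.List;

public class FixedRateTimes {
    // Returns the first `count` scheduled run times for a fixed-rate schedule:
    // each run lands at startMillis + n * periodMillis, independent of how long
    // the previous run took.
    static List<Long> scheduledTimes(long startMillis, long periodMillis, int count) {
        List<Long> times = new ArrayList<>();
        for (int n = 0; n < count; n++) {
            times.add(startMillis + n * periodMillis);
        }
        return times;
    }

    public static void main(String[] args) {
        // With start=1000 ms and period=500 ms, runs land at 1000, 1500, 2000, 2500.
        System.out.println(scheduledTimes(1000L, 500L, 4));
    }
}
```

The CronScheduler library used by the PR applies the same idea while also compensating for wall-clock drift, which plain `ScheduledExecutorService` does not.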








[GitHub] [druid] jihoonson commented on a change in pull request #10732: Add a config for monitorScheduler type

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10732:
URL: https://github.com/apache/druid/pull/10732#discussion_r1



##
File path: server/src/main/java/org/apache/druid/server/metrics/DruidMonitorSchedulerConfig.java
##
@@ -28,9 +29,17 @@
  */
 public class DruidMonitorSchedulerConfig extends MonitorSchedulerConfig
 {
+  @JsonProperty
+  private String schedulerClassName = ClockDriftSafeMonitorScheduler.class.getName();

Review comment:
   Yes, I have done some testing before and will do more. 
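The config field above stores a fully qualified class name as a string. A minimal sketch of one plausible way such a value could be resolved at runtime follows; the names `SchedulerConfigSketch`, `SchedulerConfig`, and `resolveSchedulerClass` are hypothetical, and a JDK class stands in for the default since Druid's classes are not on the classpath here.

```java
public class SchedulerConfigSketch {
    static class SchedulerConfig {
        // Stand-in default; the PR's actual default is
        // ClockDriftSafeMonitorScheduler.class.getName().
        private String schedulerClassName = "java.util.concurrent.ScheduledThreadPoolExecutor";

        String getSchedulerClassName() { return schedulerClassName; }
    }

    // Resolves the configured class name reflectively, failing fast on typos.
    static Class<?> resolveSchedulerClass(SchedulerConfig config) {
        try {
            return Class.forName(config.getSchedulerClassName());
        } catch (ClassNotFoundException e) {
            throw new IllegalArgumentException(
                "Unknown scheduler class: " + config.getSchedulerClassName(), e);
        }
    }

    public static void main(String[] args) {
        Class<?> clazz = resolveSchedulerClass(new SchedulerConfig());
        System.out.println(clazz.getSimpleName()); // prints "ScheduledThreadPoolExecutor"
    }
}
```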








[GitHub] [druid] jihoonson commented on pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


jihoonson commented on pull request #10746:
URL: https://github.com/apache/druid/pull/10746#issuecomment-758500465


   Will update docs as a follow-up.






[GitHub] [druid] jihoonson opened a new pull request #10746: Tidy up HTTP status codes for query errors

2021-01-12 Thread GitBox


jihoonson opened a new pull request #10746:
URL: https://github.com/apache/druid/pull/10746


   ### Description
   
   Druid's querying system returns an HTTP status code on query failures. Currently, most query failures return an internal error (500), which can be misleading in some cases. This PR tidies that up and returns a proper status code. Here is a list of query errors, their current status codes, and the proposed changes.
   
    Query planning errors
   
   Exception | Description | Current code | Proposed new code
   --- | --- | --- | ---
   SqlParseException and ValidationException from Calcite | Query planning failed | 500 | **400**
   CannotPlanException from Calcite | Failed to find the best plan. Likely an internal error | 500 |
   RelConversionException from Calcite | Conversion from SQL to Druid failed | 500 |
   CannotBuildQueryException | Failed to create a Druid native query from DruidRel. Likely an internal error | 500 |
   Others | Unknown errors | 500 |
   
    Query execution errors
   
   Exception | Description | Current code | Proposed new code
   --- | --- | --- | ---
   QueryTimeoutException | Query execution didn't finish in timeout | 504 |
   Unauthorized exception | Authentication failed to execute the query | 401 |
   ForbiddenException | Authorization failed to execute the query | 403 |
   QueryCapacityExceededException | Query failed to schedule because of the lack of resources available at the time when the query was submitted | 429 |
   ResourceLimitExceededException | Query asked more resources than configured threshold | 500 | **400**
   InsufficientResourceException | Query failed to schedule because of lack of merge buffers available at the time when it was submitted | 500 | **429**, merged to QueryCapacityExceededException
   QueryUnsupportedException | Unsupported functionality | 400 | **501**
   Others | Unknown exceptions | 500 |
   
   `BadQueryException` is added to represent all kinds of query exceptions that should return 400.
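The proposed mapping can be sketched as a single dispatch from exception type to status code. This is an illustration only, not the PR's implementation: the nested exception classes below are stand-ins defined locally (Druid's real classes live in `org.apache.druid.query`), and `QueryStatusCodes`/`statusFor` are hypothetical names.

```java
public class QueryStatusCodes {
    // Stand-in hierarchy mirroring the tables above.
    static class BadQueryException extends RuntimeException {}
    static class SqlPlanningException extends BadQueryException {}           // parse/validation errors -> 400
    static class ResourceLimitExceededException extends BadQueryException {} // was 500, proposed 400
    static class QueryCapacityExceededException extends RuntimeException {}  // 429; also covers InsufficientResourceException
    static class QueryTimeoutException extends RuntimeException {}           // 504
    static class QueryUnsupportedException extends RuntimeException {}       // proposed 501

    static int statusFor(RuntimeException e) {
        if (e instanceof BadQueryException) return 400;
        if (e instanceof QueryCapacityExceededException) return 429;
        if (e instanceof QueryTimeoutException) return 504;
        if (e instanceof QueryUnsupportedException) return 501;
        return 500; // unknown errors stay internal
    }

    public static void main(String[] args) {
        System.out.println(statusFor(new ResourceLimitExceededException())); // 400
        System.out.println(statusFor(new QueryTimeoutException()));          // 504
    }
}
```

Grouping all 400-style failures under one base class keeps the HTTP-layer dispatch to a single `instanceof` check rather than one branch per exception.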
   
   
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   
   
   # Key changed/added classes in this PR
* `BadQueryException`
* `ResourceLimitExceededException`
   






[GitHub] [druid] kaijianding opened a new pull request #10745: can reset kafka Supervisor to offset by specified time

2021-01-12 Thread GitBox


kaijianding opened a new pull request #10745:
URL: https://github.com/apache/druid/pull/10745


   ### Description
   
   Currently, Druid can only reset to the earliest/latest offset. Sometimes users want to read from a specified time, like the start of today.
   
   A new optional param `timestamp` is added to the API `POST /druid/indexer/v1/supervisor/{supervisorId}/reset?timestamp={timestamp}`
   
   
   
   
   Add `Map getPositionFromTime(long offsetTime);` to get offsets from a given time.
   Only Kafka supports this feature.
   
   Add `OverrideNotice` to replace the whole DataSourceMetadata in the metastore and kill tasks so that tasks restart and read starting from the new DataSourceMetadata (the same logic as ResetNotice).
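Conceptually, "reset to offset by time" means: given a per-partition index of record timestamps to offsets, find the earliest offset whose timestamp is at or after the requested time. In the real PR that lookup is served by Kafka itself; the `TreeMap` index and the names `OffsetForTime`/`offsetAtOrAfter` below are stand-ins for illustration only.

```java
import java.util.Map;
import java.util.TreeMap;

public class OffsetForTime {
    // Returns the offset of the first record at or after timeMillis,
    // or null if no record in the index is that recent.
    static Long offsetAtOrAfter(TreeMap<Long, Long> timestampToOffset, long timeMillis) {
        Map.Entry<Long, Long> e = timestampToOffset.ceilingEntry(timeMillis);
        return e == null ? null : e.getValue();
    }

    public static void main(String[] args) {
        TreeMap<Long, Long> index = new TreeMap<>();
        index.put(1_000L, 10L);  // record at t=1000 ms sits at offset 10
        index.put(2_000L, 25L);
        index.put(3_000L, 40L);
        System.out.println(offsetAtOrAfter(index, 1_500L)); // 25
    }
}
```

A `null` result (no data after the requested time) is the case a supervisor would need to handle explicitly, e.g. by falling back to the latest offset.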
   
   
   
   
   
   
   
   
   
   
   
   
   This PR has:
   - [x] been self-reviewed.
      - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
   - [x] added documentation for new or modified features or behaviors.
   - [x] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
   - [ ] added integration tests.
   - [x] been tested in a test Druid cluster.
   
   
   
   # Key changed/added classes in this PR
* `KafkaRecordSupplier`
* `IndexerSQLMetadataStorageCoordinator`
* `SeekableStreamSupervisor.java`
   


