Re: [PR] only create `sqlSegmentsMetadataManager` once to speed up test (druid)

2024-02-02 Thread via GitHub


TestBoost commented on PR #15557:
URL: https://github.com/apache/druid/pull/15557#issuecomment-1925151449

   Thank you very much for pointing that out!
   
   We haven't found any other tests like this in the server module or in other Druid modules so far. However, there are other test classes where tests that share the same variables/resources could be split into separate test classes to make them faster; I'm not sure whether that would be suitable. I have also applied the changes you suggested in this pull request.
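   
   For reference, a minimal sketch of the pattern being discussed here (building an expensive object once per test class instead of once per test method). The class and field names are illustrative placeholders, not the actual code in this PR:
   
   ```java
   import org.junit.Assert;
   import org.junit.BeforeClass;
   import org.junit.Test;
   
   public class SharedFixtureSketchTest
   {
     // Hypothetical stand-in for an expensive object such as the sqlSegmentsMetadataManager.
     private static final class ExpensiveManager
     {
     }
   
     private static ExpensiveManager manager;
   
     @BeforeClass
     public static void setUpOnce()
     {
       // Built exactly once for the whole class rather than once per test method.
       manager = new ExpensiveManager();
     }
   
     @Test
     public void testReusesSharedManager()
     {
       // Every test method reuses the same instance, avoiding repeated setup cost.
       Assert.assertNotNull(manager);
     }
   }
   ```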


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



Re: [PR] only create used and unused segments once to make the test faster (druid)

2024-02-02 Thread via GitHub


TestBoost commented on PR #15533:
URL: https://github.com/apache/druid/pull/15533#issuecomment-1925146292

   Sorry for the late reply; I just came back from holiday. I tried to make the changes you suggested. However, I found that another test in this class, `testRetrieveUnusedSegmentsActionWithNowUsedLastUpdatedTime`, also needs `expectedUnusedSegments`. We would have to create `expectedUsedSegments` in `testRetrieveUsedSegmentsAction`, and create `expectedUnusedSegments` in both `testRetrieveUnusedSegmentsAction` and `testRetrieveUnusedSegmentsActionWithNowUsedLastUpdatedTime`. That makes the test even slower than the original one: the original test takes `3.583 s`, after applying this commit it takes `2.939 s`, and after making `expectedUsedSegments` and `expectedUnusedSegments` local variables it takes `4.761 s`.
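   
   To make the trade-off concrete, a rough sketch of the two approaches (illustrative names, not the actual test class): the shared fixtures are built once in `@BeforeClass`, while the slower variant rebuilds them inside each test method that needs them.
   
   ```java
   import org.junit.Assert;
   import org.junit.BeforeClass;
   import org.junit.Test;
   
   import java.util.ArrayList;
   import java.util.List;
   
   public class SegmentFixtureCostSketchTest
   {
     // Shared variant: built once and reused by every test method.
     private static List<String> expectedUsedSegments;
     private static List<String> expectedUnusedSegments;
   
     @BeforeClass
     public static void createFixturesOnce()
     {
       expectedUsedSegments = buildSegments("used");
       expectedUnusedSegments = buildSegments("unused");
     }
   
     @Test
     public void testUsedSegments()
     {
       // Reads the shared fixture; no per-test creation cost.
       Assert.assertEquals(3, expectedUsedSegments.size());
     }
   
     @Test
     public void testUnusedSegments()
     {
       // Local variant for comparison: rebuilding the fixture in every test that
       // needs it is what made the run slower in the timings quoted above.
       List<String> localUnusedSegments = buildSegments("unused");
       Assert.assertEquals(expectedUnusedSegments, localUnusedSegments);
     }
   
     private static List<String> buildSegments(String prefix)
     {
       // Placeholder for the comparatively expensive segment construction.
       List<String> segments = new ArrayList<>();
       for (int i = 0; i < 3; i++) {
         segments.add(prefix + "-segment-" + i);
       }
       return segments;
     }
   }
   ```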





Re: [PR] Sql Single Value Aggregator for scalar queries (druid)

2024-02-02 Thread via GitHub


somu-imply commented on code in PR #15700:
URL: https://github.com/apache/druid/pull/15700#discussion_r1476870634


##
processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java:
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * This AggregatorFactory is meant to wrap the subquery used as an expression 
into a single value
+ * and is expected to throw an exception when the subquery results in more 
than one row
+ *
+ * 
+ * This consumes columnType as well along with name and fieldName to pass it 
on to underlying
+ * {@link SingleValueBufferAggregator} to work with different ColumnTypes
+ */
+@JsonTypeName("singleValue")
+public class SingleValueAggregatorFactory extends AggregatorFactory
+{
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String name;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String fieldName;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final ColumnType columnType;
+
+  public static final int DEFAULT_MAX_STRING_SIZE = 1024;
+
+  @JsonCreator
+  public SingleValueAggregatorFactory(
+  @JsonProperty("name") String name,
+  @JsonProperty("fieldName") final String fieldName,
+  @JsonProperty("columnType") final ColumnType columnType
+  )
+  {
+Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator 
name");
+Preconditions.checkNotNull(fieldName, "Must have a valid, non-null 
fieldName");
+
+this.name = name;
+this.fieldName = fieldName;
+this.columnType = columnType;
+  }
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+return new SingleValueAggregator(selector);
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory 
metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+ColumnCapabilities columnCapabilities = 
metricFactory.getColumnCapabilities(fieldName);
+ColumnType columnType = new ColumnType(columnCapabilities.getType(), null, 
null);
+return new SingleValueBufferAggregator(selector, columnType);
+  }
+
+  @Override
+  public Comparator getComparator()
+  {
+throw DruidException.defensive("Single Value Aggregator would not have 
more than one row to compare");
+  }
+
+  @Override
+  @Nullable
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)

Review Comment:
   IMO this would never be called, as there's nothing to combine for this aggregator. This runs at the broker level on the result of the subquery.






Re: [PR] Sql Single Value Aggregator for scalar queries (druid)

2024-02-02 Thread via GitHub


somu-imply commented on code in PR #15700:
URL: https://github.com/apache/druid/pull/15700#discussion_r1476870634


##
processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java:
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * This AggregatorFactory is meant to wrap the subquery used as an expression 
into a single value
+ * and is expected to throw an exception when the subquery results in more 
than one row
+ *
+ * 
+ * This consumes columnType as well along with name and fieldName to pass it 
on to underlying
+ * {@link SingleValueBufferAggregator} to work with different ColumnTypes
+ */
+@JsonTypeName("singleValue")
+public class SingleValueAggregatorFactory extends AggregatorFactory
+{
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String name;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String fieldName;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final ColumnType columnType;
+
+  public static final int DEFAULT_MAX_STRING_SIZE = 1024;
+
+  @JsonCreator
+  public SingleValueAggregatorFactory(
+  @JsonProperty("name") String name,
+  @JsonProperty("fieldName") final String fieldName,
+  @JsonProperty("columnType") final ColumnType columnType
+  )
+  {
+Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator 
name");
+Preconditions.checkNotNull(fieldName, "Must have a valid, non-null 
fieldName");
+
+this.name = name;
+this.fieldName = fieldName;
+this.columnType = columnType;
+  }
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+return new SingleValueAggregator(selector);
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory 
metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+ColumnCapabilities columnCapabilities = 
metricFactory.getColumnCapabilities(fieldName);
+ColumnType columnType = new ColumnType(columnCapabilities.getType(), null, 
null);
+return new SingleValueBufferAggregator(selector, columnType);
+  }
+
+  @Override
+  public Comparator getComparator()
+  {
+throw DruidException.defensive("Single Value Aggregator would not have 
more than one row to compare");
+  }
+
+  @Override
+  @Nullable
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)

Review Comment:
   IMO this would never be called, as there's nothing to group. This runs at the broker level on the result of the subquery.






Re: [PR] Addressing shapeshifting issues with window functions (druid)

2024-02-02 Thread via GitHub


krishnat2 commented on PR #15807:
URL: https://github.com/apache/druid/pull/15807#issuecomment-1924945519

   Thanks to @somu-imply and @abhishekagarwal87 for a quick fix! Really 
appreciate it.





Re: [PR] Add configuration for max S3 connections (druid)

2024-02-02 Thread via GitHub


github-actions[bot] closed pull request #13697: Add configuration for max S3 
connections
URL: https://github.com/apache/druid/pull/13697





Re: [I] Race condition when killing kafka ingestion tasks with replication > 1 (druid)

2024-02-02 Thread via GitHub


github-actions[bot] commented on issue #13705:
URL: https://github.com/apache/druid/issues/13705#issuecomment-1924931106

   This issue has been marked as stale due to 280 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If this issue is 
still
   relevant, please simply write any comment. Even if closed, you can still 
revive the
   issue at any time or discuss it on the d...@druid.apache.org list.
   Thank you for your contributions.





Re: [I] when installing druid on k8s (druid)

2024-02-02 Thread via GitHub


github-actions[bot] commented on issue #13702:
URL: https://github.com/apache/druid/issues/13702#issuecomment-1924931078

   This issue has been marked as stale due to 280 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If this issue is 
still
   relevant, please simply write any comment. Even if closed, you can still 
revive the
   issue at any time or discuss it on the d...@druid.apache.org list.
   Thank you for your contributions.





Re: [I] [FR] Jupysql integration (druid)

2024-02-02 Thread via GitHub


github-actions[bot] commented on issue #13700:
URL: https://github.com/apache/druid/issues/13700#issuecomment-1924931060

   This issue has been marked as stale due to 280 days of inactivity.
   It will be closed in 4 weeks if no further activity occurs. If this issue is 
still
   relevant, please simply write any comment. Even if closed, you can still 
revive the
   issue at any time or discuss it on the d...@druid.apache.org list.
   Thank you for your contributions.





Re: [PR] Add configuration for max S3 connections (druid)

2024-02-02 Thread via GitHub


github-actions[bot] commented on PR #13697:
URL: https://github.com/apache/druid/pull/13697#issuecomment-1924931050

   This pull request/issue has been closed due to lack of activity. If you 
think that
   is incorrect, or the pull request requires review, you can revive the PR at 
any time.





[PR] Bump io.grpc:grpc-netty-shaded from 1.57.2 to 1.61.1 (druid)

2024-02-02 Thread via GitHub


dependabot[bot] opened a new pull request, #15829:
URL: https://github.com/apache/druid/pull/15829

   Bumps [io.grpc:grpc-netty-shaded](https://github.com/grpc/grpc-java) from 1.57.2 to 1.61.1.
   
   Release notes
   
   Sourced from [io.grpc:grpc-netty-shaded's releases](https://github.com/grpc/grpc-java/releases).
   
   v1.61.1
   
   Bug Fixes
   
   - xds: Fix a bug in WeightedRoundRobinLoadBalancer policy that could raise NullPointerException and further cause channel panic when picking a subchannel. This bug can only be triggered when connection can not be established and the channel reports TRANSIENT_FAILURE state. ([#10868](https://redirect.github.com/grpc/grpc-java/issues/10868))
   
   v1.61.0
   
   API Changes
   
   - Remove unused experimental API ManagedChannelBuilder.enableFullStreamDecompression ([#10744](https://redirect.github.com/grpc/grpc-java/issues/10744))
   - api: Deprecate LoadBalancer.EMPTY_PICKER added in 1.58.0 in favor of FixedResultPicker (860b5cb1f)
   
   New Features
   
   - binder: Experimental support for asynchronous security policies ([#10566](https://redirect.github.com/grpc/grpc-java/issues/10566))
   
   Improvements
   
   - core: reduce CompositeReadableBuffer allocation ([#3279](https://redirect.github.com/grpc/grpc-java/issues/3279))
   - core: Improve error message clarity when a channel leak is detected (201893f5e)
   - util: use shared index across round_robin pickers (dca89b25b). This makes its implementation more similar to weighted_round_robin.
   - xds: Implement ADS stream flow control mechanism ([#10674](https://redirect.github.com/grpc/grpc-java/issues/10674)). This limits the maximum memory consumed if the control plane sends updates more rapidly than they can be processed.
   
   Bug Fixes
   
   - core: Check outbound maximum message size for the compressed size in addition to the already-checked uncompressed size ([#10739](https://redirect.github.com/grpc/grpc-java/issues/10739)). Fixed the status code to be RESOURCE_EXHAUSTED instead of UNKNOWN.
   - util: Fix NPE when multiple addresses are in an address group for petiole load balancer policies ([#10769](https://redirect.github.com/grpc/grpc-java/issues/10769))
   - util: Disable publishing of fixtures (8ac43dd81). The Gradle test fixtures are for use by grpc-java's internal tests.
   - okhttp: Ignore known conscrypt socket close issue ([#10812](https://redirect.github.com/grpc/grpc-java/issues/10812)). This stops an exception from being thrown when a known Conscrypt synchronization issue happens.
   
   Dependencies
   
   - Drop support for Bazel 5 (55a9c012c). Bazel 7 is available, and Protobuf has already dropped support for Bazel 5.
   - Change many compile deps to runtime deps (d6830d7f9). This reduces the transitive classes leaked into the compile classpath. In particular, grpc-core (io.grpc.internal) will be less frequently included transitively at compile time.
   - Upgrade dependencies (c985797d9)
     - Protobuf to 3.25.1
     - auto-value-annotations to 1.10.4
     - error_prone_annotations to 2.23.0
     - proto-google-common-protos to 2.29.0
     - google-cloud-logging to 3.15.14
     - guava to 32.1.3-android
     - okio to 3.4.0
   
   Acknowledgements
   
   - [@Gordiychuk](https://github.com/Gordiychuk)
   - [@jroper](https://github.com/jroper)
   - [@jyane](https://github.com/jyane)
   - [@ulfjack](https://github.com/ulfjack)
   
   v1.60.2
   
   Bug Fixes
   
   - xds: Fix a bug in WeightedRoundRobinLoadBalancer policy that could raise NullPointerException and further cause channel panic when picking a subchannel. This bug can only be triggered when connection can not be established and the channel reports TRANSIENT_FAILURE state. ([#10868](https://redirect.github.com/grpc/grpc-java/issues/10868))
   
   v1.60.1
   
   Bug Fixes
   
   - util: Fix NPE when multiple addresses in an address group for petiole load balancer policies ([#10770](https://redirect.github.com/grpc/grpc-java/issues/10770))
   
   ... (truncated)
   
   Commits
   
   - [dfff9a9](https://github.com/grpc/grpc-java/commit/dfff9a9475a394c455742a9b45a69cf6d9640cf5) Bump version to 1.61.1
   - [df1bb36](https://github.com/grpc/grpc-java/commit/df1bb36ea4c3b51d23c78a6e64b41de9d57d81e8) Update README etc to reference 1.61.1
   - [1abdaf3](https://github.com/grpc/grpc-java/commit/1abdaf36e8c363b5f0d445b8c08f49cbb69dd55f) xds: fix NPE in wrr in TF state ([#10875](https://redirect.github.com/grpc/grpc-java/issues/10875))
   - [529d0ab](https://github.com/grpc/grpc-java/commit/529d0ab330395bf66a5e3942de314bedb95d8bd8) Bump version to 1.61.1-SNAPSHOT
   - [f06abeb](https://github.com/grpc/grpc-java/commit/f06abeb6fda2985c19e9792edb6f2b071b6fd8b5) Bump version to 1.61.0
   - [7700510](https://github.com/grpc/grpc-java/commit/77005107aaa1847cdbac0919d5e73752649c3485) Update README protoc references to 3.25.1
   - [c639b81](https://github.com/grpc/grpc-java/commit/c639b8161bd16d01ddec2a42ee3bf68c3a49e296) Update README etc to reference 1.61.0

Re: [PR] Bump io.grpc:grpc-netty-shaded from 1.57.2 to 1.61.0 (druid)

2024-02-02 Thread via GitHub


dependabot[bot] commented on PR #15678:
URL: https://github.com/apache/druid/pull/15678#issuecomment-1924929767

   Superseded by #15829.





(druid) branch dependabot/maven/io.grpc-grpc-netty-shaded-1.61.0 deleted (was bb43982ca22)

2024-02-02 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/io.grpc-grpc-netty-shaded-1.61.0
in repository https://gitbox.apache.org/repos/asf/druid.git


 was bb43982ca22 Bump io.grpc:grpc-netty-shaded from 1.57.2 to 1.61.0

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.





(druid) branch dependabot/maven/io.grpc-grpc-netty-shaded-1.61.1 created (now f0e4192a3ef)

2024-02-02 Thread github-bot
This is an automated email from the ASF dual-hosted git repository.

github-bot pushed a change to branch 
dependabot/maven/io.grpc-grpc-netty-shaded-1.61.1
in repository https://gitbox.apache.org/repos/asf/druid.git


  at f0e4192a3ef Bump io.grpc:grpc-netty-shaded from 1.57.2 to 1.61.1

No new revisions were added by this update.





Re: [PR] Bump io.grpc:grpc-netty-shaded from 1.57.2 to 1.61.0 (druid)

2024-02-02 Thread via GitHub


dependabot[bot] closed pull request #15678: Bump io.grpc:grpc-netty-shaded from 
1.57.2 to 1.61.0
URL: https://github.com/apache/druid/pull/15678





Re: [PR] Sql Single Value Aggregator for scalar queries (druid)

2024-02-02 Thread via GitHub


somu-imply commented on code in PR #15700:
URL: https://github.com/apache/druid/pull/15700#discussion_r1476870634


##
processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java:
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * This AggregatorFactory is meant to wrap the subquery used as an expression 
into a single value
+ * and is expected to throw an exception when the subquery results in more 
than one row
+ *
+ * 
+ * This consumes columnType as well along with name and fieldName to pass it 
on to underlying
+ * {@link SingleValueBufferAggregator} to work with different ColumnTypes
+ */
+@JsonTypeName("singleValue")
+public class SingleValueAggregatorFactory extends AggregatorFactory
+{
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String name;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String fieldName;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final ColumnType columnType;
+
+  public static final int DEFAULT_MAX_STRING_SIZE = 1024;
+
+  @JsonCreator
+  public SingleValueAggregatorFactory(
+  @JsonProperty("name") String name,
+  @JsonProperty("fieldName") final String fieldName,
+  @JsonProperty("columnType") final ColumnType columnType
+  )
+  {
+Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator 
name");
+Preconditions.checkNotNull(fieldName, "Must have a valid, non-null 
fieldName");
+
+this.name = name;
+this.fieldName = fieldName;
+this.columnType = columnType;
+  }
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+return new SingleValueAggregator(selector);
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory 
metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+ColumnCapabilities columnCapabilities = 
metricFactory.getColumnCapabilities(fieldName);
+ColumnType columnType = new ColumnType(columnCapabilities.getType(), null, 
null);
+return new SingleValueBufferAggregator(selector, columnType);
+  }
+
+  @Override
+  public Comparator getComparator()
+  {
+throw DruidException.defensive("Single Value Aggregator would not have 
more than one row to compare");
+  }
+
+  @Override
+  @Nullable
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)

Review Comment:
   IMO this would never be called, as there's nothing to group. This runs at the broker level on the result of the subquery. We should throw an exception in `getCombiningFactory` as well, because that is also not needed.
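   
   For illustration, the overrides being suggested could look roughly like this, reusing the `DruidException.defensive` style that `getComparator` already uses above (only the two relevant methods are shown; this is a sketch, not the code actually merged in the PR):
   
   ```java
   @Override
   @Nullable
   public Object combine(@Nullable Object lhs, @Nullable Object rhs)
   {
     // Never expected to run: the factory wraps a single-row subquery result,
     // so there is nothing to combine across rows.
     throw DruidException.defensive("Single value aggregator has nothing to combine");
   }
   
   @Override
   public AggregatorFactory getCombiningFactory()
   {
     // Same reasoning: a combining factory is not needed for a single-row result.
     throw DruidException.defensive("Single value aggregator does not need a combining factory");
   }
   ```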






Re: [PR] Extension to read and ingest Delta Lake tables (druid)

2024-02-02 Thread via GitHub


vkorukanti commented on code in PR #15755:
URL: https://github.com/apache/druid/pull/15755#discussion_r1476750537


##
extensions-contrib/druid-deltalake-extensions/src/main/java/org/apache/druid/delta/input/DeltaInputSourceReader.java:
##
@@ -0,0 +1,138 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.delta.input;
+
+import io.delta.kernel.data.FilteredColumnarBatch;
+import io.delta.kernel.data.Row;
+import org.apache.druid.data.input.InputRow;
+import org.apache.druid.data.input.InputRowListPlusRawValues;
+import org.apache.druid.data.input.InputRowSchema;
+import org.apache.druid.data.input.InputSourceReader;
+import org.apache.druid.data.input.InputStats;
+import org.apache.druid.java.util.common.parsers.CloseableIterator;
+
+import java.io.IOException;
+import java.util.NoSuchElementException;
+
+/**
+ * A reader for the Delta Lake input source. It initializes an iterator {@link 
DeltaInputSourceIterator}
+ * for a subset of Delta records given by {@link FilteredColumnarBatch} and 
schema {@link InputRowSchema}.
+ *
+ */
+public class DeltaInputSourceReader implements InputSourceReader
+{
+  private final io.delta.kernel.utils.CloseableIterator<FilteredColumnarBatch> filteredColumnarBatchCloseableIterator;
+  private final InputRowSchema inputRowSchema;
+
+  public DeltaInputSourceReader(
+  io.delta.kernel.utils.CloseableIterator<FilteredColumnarBatch> filteredColumnarBatchCloseableIterator,
+  InputRowSchema inputRowSchema
+  )
+  {
+this.filteredColumnarBatchCloseableIterator = 
filteredColumnarBatchCloseableIterator;
+this.inputRowSchema = inputRowSchema;
+  }
+
+  @Override
+  public CloseableIterator<InputRow> read()
+  {
+return new 
DeltaInputSourceIterator(filteredColumnarBatchCloseableIterator, 
inputRowSchema);
+  }
+
+  @Override
+  public CloseableIterator<InputRow> read(InputStats inputStats)
+  {
+return new 
DeltaInputSourceIterator(filteredColumnarBatchCloseableIterator, 
inputRowSchema);
+  }
+
+  @Override
+  public CloseableIterator<InputRowListPlusRawValues> sample()
+  {
+
+CloseableIterator<InputRow> inner = read();
+return new CloseableIterator<InputRowListPlusRawValues>()
+{
+  @Override
+  public void close() throws IOException
+  {
+inner.close();
+  }
+
+  @Override
+  public boolean hasNext()
+  {
+return inner.hasNext();
+  }
+
+  @Override
+  public InputRowListPlusRawValues next()
+  {
+DeltaInputRow deltaInputRow = (DeltaInputRow) inner.next();
+return InputRowListPlusRawValues.of(deltaInputRow, 
deltaInputRow.getRawRowAsMap());
+  }
+};
+  }
+
+  private static class DeltaInputSourceIterator implements CloseableIterator<InputRow>
+  {
+private final io.delta.kernel.utils.CloseableIterator<FilteredColumnarBatch> filteredColumnarBatchCloseableIterator;
+
+private io.delta.kernel.utils.CloseableIterator<Row> currentBatch = null;
+private final InputRowSchema inputRowSchema;
+
+public DeltaInputSourceIterator(
+io.delta.kernel.utils.CloseableIterator<FilteredColumnarBatch> filteredColumnarBatchCloseableIterator,
+InputRowSchema inputRowSchema
+)
+{
+  this.filteredColumnarBatchCloseableIterator = 
filteredColumnarBatchCloseableIterator;
+  this.inputRowSchema = inputRowSchema;
+}
+
+@Override
+public boolean hasNext()
+{
+  while (currentBatch == null || !currentBatch.hasNext()) {
+if (!filteredColumnarBatchCloseableIterator.hasNext()) {
+  return false; // No more batches or records to read!
+}
+currentBatch = filteredColumnarBatchCloseableIterator.next().getRows();

Review Comment:
   One question: do we want to check whether `currentBatch` can actually return `true` from `hasNext()`? It could contain zero rows due to the selection vector.
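   
   A condensed sketch of the control flow in question, using plain Java iterators instead of the kernel types (illustrative only): the loop keeps advancing whenever a batch yields no rows, e.g. because the selection vector filtered everything out.
   
   ```java
   import java.util.Iterator;
   import java.util.List;
   
   public class SkipEmptyBatchesSketch<T>
   {
     private final Iterator<List<T>> batches;
     private Iterator<T> currentBatch = null;
   
     public SkipEmptyBatchesSketch(Iterator<List<T>> batches)
     {
       this.batches = batches;
     }
   
     public boolean hasNext()
     {
       // A batch may contain zero rows; the loop advances past such batches
       // until a row is found or the batches are exhausted.
       while (currentBatch == null || !currentBatch.hasNext()) {
         if (!batches.hasNext()) {
           return false;
         }
         currentBatch = batches.next().iterator();
       }
       return true;
     }
   }
   ```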




[PR] Handling cast boolean to integer (druid)

2024-02-02 Thread via GitHub


somu-imply opened a new pull request, #15828:
URL: https://github.com/apache/druid/pull/15828

   Previously, queries like `SELECT CAST(TRUE as INTEGER)` would throw errors. This PR aims to support them by allowing lenient conformance.
   
   This PR has:
   
   - [ ] been self-reviewed.
  - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] a release note entry in the PR description.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/dev/license.md)
   - [ ] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   





Re: [PR] Addressing shapeshifting issues with window functions (druid)

2024-02-02 Thread via GitHub


somu-imply commented on PR #15807:
URL: https://github.com/apache/druid/pull/15807#issuecomment-1924571571

   Although the comment is there, I don't see any loss of generality in calling the super's default method for these Segments. Updated the occurrences.





Re: [PR] Updating and removing unnecessary white spaces on use case page [druid-website-src]

2024-02-02 Thread via GitHub


2bethere commented on PR #457:
URL: 
https://github.com/apache/druid-website-src/pull/457#issuecomment-1924285125

   Looks like this now (screenshot): https://github.com/apache/druid-website-src/assets/1326022/062c83c4-aa2e-4544-a743-45741e7b0e6e





Re: [PR] Updating and removing unnecessary white spaces on use case page [druid-website-src]

2024-02-02 Thread via GitHub


2bethere commented on PR #457:
URL: 
https://github.com/apache/druid-website-src/pull/457#issuecomment-1924268383

   Can do, will modify.





(druid) branch 29.0.0 updated (d27c38b59b1 -> f1c29f6904e)

2024-02-02 Thread abhishekrb
This is an automated email from the ASF dual-hosted git repository.

abhishekrb pushed a change to branch 29.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git


from d27c38b59b1 [Backport] docs: concurrent append copyedits (#15821)
 new b44bd34762a Add range filtering support for iceberg ingestion (#15782)
 new f1c29f6904e Update input-sources.md for fixing the warehouse path 
example under S3 (#15823)

The 2 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.


Summary of changes:
 docs/ingestion/input-sources.md| 13 +++-
 .../druid-iceberg-extensions/pom.xml   |  4 --
 .../apache/druid/iceberg/filter/IcebergFilter.java |  3 +-
 ...IntervalFilter.java => IcebergRangeFilter.java} | 69 +++---
 .../iceberg/filter/IcebergRangeFilterTest.java | 68 +
 website/.spelling  |  2 +
 6 files changed, 119 insertions(+), 40 deletions(-)
 copy 
extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/{IcebergIntervalFilter.java
 => IcebergRangeFilter.java} (51%)
 create mode 100644 
extensions-contrib/druid-iceberg-extensions/src/test/java/org/apache/druid/iceberg/filter/IcebergRangeFilterTest.java





(druid) 02/02: Update input-sources.md for fixing the warehouse path example under S3 (#15823)

2024-02-02 Thread abhishekrb
This is an automated email from the ASF dual-hosted git repository.

abhishekrb pushed a commit to branch 29.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git

commit f1c29f6904eb972379c2f3ecc611b4e2a87e7254
Author: Aru Raghuwanshi <35107623+aruraghuwan...@users.noreply.github.com>
AuthorDate: Thu Feb 1 23:32:05 2024 -0800

Update input-sources.md for fixing the warehouse path example under S3 
(#15823)
---
 docs/ingestion/input-sources.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/ingestion/input-sources.md b/docs/ingestion/input-sources.md
index 0adc001cd23..f34dbbfe713 100644
--- a/docs/ingestion/input-sources.md
+++ b/docs/ingestion/input-sources.md
@@ -889,7 +889,7 @@ The following is a sample spec for a S3 warehouse source:
 "namespace": "iceberg_namespace",
 "icebergCatalog": {
   "type": "hive",
-  "warehousePath": "hdfs://warehouse/path",
+  "warehousePath": "s3://warehouse/path",
   "catalogUri": "thrift://hive-metastore.x.com:8970",
   "catalogProperties": {
 "hive.metastore.connect.retries": "1",





(druid) 01/02: Add range filtering support for iceberg ingestion (#15782)

2024-02-02 Thread abhishekrb
This is an automated email from the ASF dual-hosted git repository.

abhishekrb pushed a commit to branch 29.0.0
in repository https://gitbox.apache.org/repos/asf/druid.git

commit b44bd34762a380fb04235e48600e8c265ebc5e65
Author: Atul Mohan 
AuthorDate: Thu Feb 1 23:32:30 2024 -0800

Add range filtering support for iceberg ingestion (#15782)

* Add range filtering support for iceberg ingestion

* Docs formatting

* Spelling
---
 docs/ingestion/input-sources.md| 11 +++
 .../druid-iceberg-extensions/pom.xml   |  4 -
 .../apache/druid/iceberg/filter/IcebergFilter.java |  3 +-
 .../druid/iceberg/filter/IcebergRangeFilter.java   | 93 ++
 .../iceberg/filter/IcebergRangeFilterTest.java | 68 
 website/.spelling  |  2 +
 6 files changed, 176 insertions(+), 5 deletions(-)

diff --git a/docs/ingestion/input-sources.md b/docs/ingestion/input-sources.md
index d4693e7925a..0adc001cd23 100644
--- a/docs/ingestion/input-sources.md
+++ b/docs/ingestion/input-sources.md
@@ -1013,6 +1013,17 @@ This input source provides the following filters: `and`, 
`equals`, `interval`, a
 |type|Set this value to `not`.|yes|
 |filter|The iceberg filter on which logical NOT is applied|yes|
 
+`range` Filter:
+
|Property|Description|Default|Required|
|--------|-----------|-------|--------|
|type|Set this value to `range`.|None|yes|
|filterColumn|The column name from the iceberg table schema based on which range filtering needs to happen.|None|yes|
|lower|Lower bound value to match.|None|no. At least one of `lower` or `upper` must not be null.|
|upper|Upper bound value to match.|None|no. At least one of `lower` or `upper` must not be null.|
|lowerOpen|Boolean indicating if lower bound is open in the interval of values defined by the range (">" instead of ">=").|false|no|
|upperOpen|Boolean indicating if upper bound is open on the interval of values defined by range ("<" instead of "<=").|false|no|
+
 ## Delta Lake input source
 
 :::info
diff --git a/extensions-contrib/druid-iceberg-extensions/pom.xml 
b/extensions-contrib/druid-iceberg-extensions/pom.xml
index 577d56978bc..d5be8235d4a 100644
--- a/extensions-contrib/druid-iceberg-extensions/pom.xml
+++ b/extensions-contrib/druid-iceberg-extensions/pom.xml
@@ -206,10 +206,6 @@
   org.slf4j
   slf4j-reload4j
 
-
-  com.google.re2j
-  re2j
-
 
   com.google.code.gson
   gson
diff --git 
a/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergFilter.java
 
b/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergFilter.java
index 10c07cdbe24..cff8797d60b 100644
--- 
a/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergFilter.java
+++ 
b/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergFilter.java
@@ -33,7 +33,8 @@ import org.apache.iceberg.expressions.Expression;
 @JsonSubTypes.Type(name = "equals", value = IcebergEqualsFilter.class),
 @JsonSubTypes.Type(name = "and", value = IcebergAndFilter.class),
 @JsonSubTypes.Type(name = "not", value = IcebergNotFilter.class),
-@JsonSubTypes.Type(name = "or", value = IcebergOrFilter.class)
+@JsonSubTypes.Type(name = "or", value = IcebergOrFilter.class),
+@JsonSubTypes.Type(name = "range", value = IcebergRangeFilter.class)
 })
 public interface IcebergFilter
 {
diff --git 
a/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergRangeFilter.java
 
b/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergRangeFilter.java
new file mode 100644
index 000..2e3f42dbba9
--- /dev/null
+++ 
b/extensions-contrib/druid-iceberg-extensions/src/main/java/org/apache/druid/iceberg/filter/IcebergRangeFilter.java
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.iceberg.filter;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import 

Re: [PR] [Backport] Add range filtering support for Iceberg ingestion and fixup typo in example (druid)

2024-02-02 Thread via GitHub


abhishekrb19 merged PR #15824:
URL: https://github.com/apache/druid/pull/15824





Re: [PR] CVE Fix: Update json-path version (druid)

2024-02-02 Thread via GitHub


xvrl commented on code in PR #15772:
URL: https://github.com/apache/druid/pull/15772#discussion_r1476301395


##
extensions-core/orc-extensions/src/test/java/org/apache/druid/data/input/orc/OrcReaderTest.java:
##
@@ -314,8 +314,8 @@ public void testJsonPathFunctions() throws IOException
 //deviation of [7,8,9] is 1/3, stddev is sqrt(1/3), approximately 
0.8165
 Assert.assertEquals(0.8165, 
Double.parseDouble(Iterables.getOnlyElement(row.getDimension("stddev"))), 
0.0001);
 
-//append is not supported
-Assert.assertEquals(Collections.emptyList(), 
row.getDimension("append"));
+//Support for append has been added in json-path-2.9.0
+//Assert.assertEquals(Collections.emptyList(), 
row.getDimension("append"));

Review Comment:
   Understood, but there are two issues here:
   1. As a principle, we don't leave commented-out code unless there's a very good reason.
   2. This is potentially a breaking change. Any json-path expression using append today would have silently been ignored, but would now cause an error.
   
   My suggestion would be to:
   1. update the test to check that we do get the exception (rough sketch below)
   2. make a note in the PR description that we should call this out in the release notes as a potential issue to watch out for when upgrading.
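   
   As an illustration of suggestion 1, the commented-out assertion could be replaced with something along these lines; the exact exception type thrown for `append` under json-path 2.9.0 is an assumption here, so the real test should assert whatever actually surfaces:
   
   ```java
   // Sketch only: assert that the `append` expression now fails loudly instead of
   // being silently ignored. Narrow Exception.class to the real type once known.
   Assert.assertThrows(Exception.class, () -> row.getDimension("append"));
   ```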






(druid) branch master updated: Add QueryLifecycle#authorize for grpc-query-extension (#15816)

2024-02-02 Thread abhishek
This is an automated email from the ASF dual-hosted git repository.

abhishek pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
 new de959e513d1 Add QueryLifecycle#authorize for grpc-query-extension 
(#15816)
de959e513d1 is described below

commit de959e513d120348dfbfcf3e13d265b200a7db6c
Author: Rishabh Singh <6513075+findingr...@users.noreply.github.com>
AuthorDate: Fri Feb 2 21:49:57 2024 +0530

Add QueryLifecycle#authorize for grpc-query-extension (#15816)

Proposal #13469

Original PR #14024

A new method is being added in QueryLifecycle class to authorise a query 
based on authentication result.
This method is required since we authenticate the query by intercepting it 
in the grpc extension and pass down the authentication result.
---
 .../org/apache/druid/server/QueryLifecycle.java| 31 +
 .../apache/druid/server/QueryLifecycleTest.java| 74 --
 2 files changed, 85 insertions(+), 20 deletions(-)

diff --git a/server/src/main/java/org/apache/druid/server/QueryLifecycle.java 
b/server/src/main/java/org/apache/druid/server/QueryLifecycle.java
index 0f46e5da4d9..e0bb9875240 100644
--- a/server/src/main/java/org/apache/druid/server/QueryLifecycle.java
+++ b/server/src/main/java/org/apache/druid/server/QueryLifecycle.java
@@ -242,6 +242,37 @@ public class QueryLifecycle
 );
   }
 
+  /**
+   * Authorize the query using the authentication result.
+   * Will return an Access object denoting whether the query is authorized or 
not.
+   * This method is to be used by the grpc-query-extension.
+   *
+   * @param authenticationResult authentication result indicating identity of 
the requester
+   * @return authorization result of requester
+   */
+  public Access authorize(AuthenticationResult authenticationResult)
+  {
+transition(State.INITIALIZED, State.AUTHORIZING);
+final Iterable<ResourceAction> resourcesToAuthorize = Iterables.concat(
+Iterables.transform(
+baseQuery.getDataSource().getTableNames(),
+AuthorizationUtils.DATASOURCE_READ_RA_GENERATOR
+),
+Iterables.transform(
+authConfig.contextKeysToAuthorize(userContextKeys),
+contextParam -> new ResourceAction(new Resource(contextParam, 
ResourceType.QUERY_CONTEXT), Action.WRITE)
+)
+);
+return doAuthorize(
+authenticationResult,
+AuthorizationUtils.authorizeAllResourceActions(
+authenticationResult,
+resourcesToAuthorize,
+authorizerMapper
+)
+);
+  }
+
   private void preAuthorized(final AuthenticationResult authenticationResult, 
final Access access)
   {
 // gotta transition those states, even if we are already authorized
diff --git 
a/server/src/test/java/org/apache/druid/server/QueryLifecycleTest.java 
b/server/src/test/java/org/apache/druid/server/QueryLifecycleTest.java
index 578661ea768..a2691297d0b 100644
--- a/server/src/test/java/org/apache/druid/server/QueryLifecycleTest.java
+++ b/server/src/test/java/org/apache/druid/server/QueryLifecycleTest.java
@@ -188,15 +188,15 @@ public class QueryLifecycleTest
 
EasyMock.expect(authenticationResult.getIdentity()).andReturn(IDENTITY).anyTimes();
 
EasyMock.expect(authenticationResult.getAuthorizerName()).andReturn(AUTHORIZER).anyTimes();
 EasyMock.expect(authorizer.authorize(authenticationResult, new 
Resource(DATASOURCE, ResourceType.DATASOURCE), Action.READ))
-.andReturn(Access.OK);
+.andReturn(Access.OK).times(2);
 EasyMock.expect(authorizer.authorize(authenticationResult, new 
Resource("foo", ResourceType.QUERY_CONTEXT), Action.WRITE))
-.andReturn(Access.OK);
+.andReturn(Access.OK).times(2);
 EasyMock.expect(authorizer.authorize(authenticationResult, new 
Resource("baz", ResourceType.QUERY_CONTEXT), Action.WRITE))
-.andReturn(Access.OK);
+.andReturn(Access.OK).times(2);
 
 EasyMock.expect(toolChestWarehouse.getToolChest(EasyMock.anyObject()))
 .andReturn(toolChest)
-.once();
+.times(2);
 
 replayAll();
 
@@ -223,6 +223,10 @@ public class QueryLifecycleTest
 );
 
 Assert.assertTrue(lifecycle.authorize(mockRequest()).isAllowed());
+
+lifecycle = createLifecycle(authConfig);
+lifecycle.initialize(query);
+Assert.assertTrue(lifecycle.authorize(authenticationResult).isAllowed());
   }
 
   @Test
@@ -232,13 +236,15 @@ public class QueryLifecycleTest
 
EasyMock.expect(authenticationResult.getIdentity()).andReturn(IDENTITY).anyTimes();
 
EasyMock.expect(authenticationResult.getAuthorizerName()).andReturn(AUTHORIZER).anyTimes();
 EasyMock.expect(authorizer.authorize(authenticationResult, new 
Resource(DATASOURCE, ResourceType.DATASOURCE), Action.READ))
-.andReturn(Access.OK);
+

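For illustration only, here is a rough sketch of how an extension that has already
authenticated a request might drive this new overload. Only QueryLifecycle,
initialize, authorize, Access, and AuthenticationResult come from the diff above;
the helper class and method names are hypothetical.

```java
import org.apache.druid.query.Query;
import org.apache.druid.server.QueryLifecycle;
import org.apache.druid.server.security.Access;
import org.apache.druid.server.security.AuthenticationResult;

// Hypothetical helper, not part of the PR: authorize a pre-authenticated query.
final class GrpcAuthorizeSketch
{
  private GrpcAuthorizeSketch() {}

  static void authorizeOrThrow(QueryLifecycle lifecycle, Query<?> query, AuthenticationResult authResult)
  {
    // Initialize the lifecycle with the query, then authorize using the
    // AuthenticationResult produced by the extension's own authentication step.
    lifecycle.initialize(query);
    Access access = lifecycle.authorize(authResult);
    if (!access.isAllowed()) {
      // A real extension would map this to an appropriate gRPC error status.
      throw new IllegalStateException("Query is not authorized");
    }
  }
}
```

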
Re: [PR] Add QueryLifecycle#authorize for grpc-query-extension (druid)

2024-02-02 Thread via GitHub


abhishekagarwal87 merged PR #15816:
URL: https://github.com/apache/druid/pull/15816





Re: [PR] [Backport] Strict window frame checks (druid)

2024-02-02 Thread via GitHub


kgyrtkirk closed pull request #15826: [Backport] Strict window frame checks
URL: https://github.com/apache/druid/pull/15826





[PR] [Backport] Strict window frame checks (druid)

2024-02-02 Thread via GitHub


kgyrtkirk opened a new pull request, #15827:
URL: https://github.com/apache/druid/pull/15827

   Backport of #15746 to 29.0.0.





[PR] [Backport] Strict window frame checks (druid)

2024-02-02 Thread via GitHub


kgyrtkirk opened a new pull request, #15826:
URL: https://github.com/apache/druid/pull/15826

   Backport of #15746 to 28.0.0.





(druid) branch master updated: Strict window frame checks (#15746)

2024-02-02 Thread abhishek
This is an automated email from the ASF dual-hosted git repository.

abhishek pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/druid.git


The following commit(s) were added to refs/heads/master by this push:
 new 8f5b7522c72 Strict window frame checks (#15746)
8f5b7522c72 is described below

commit 8f5b7522c72f33afdd36c7b70e4de85f455d65a1
Author: Zoltan Haindrich 
AuthorDate: Fri Feb 2 11:51:53 2024 +0100

Strict window frame checks (#15746)

* introduce checks to ensure that the window frame is supported
* added a check to ensure that no expressions are set as bounds
* added logic to detect following/following-like cases, described in "Window function fails to demarcate if 2 following are used" (#15739)
* currently RANGE frames are only supported correctly if both endpoints are unbounded or the current row ("Offset based window range support", #15767)
* added a windowingStrictValidation context key to provide a way to override the check
---
 .../java/org/apache/druid/query/QueryContext.java  | 10 +++
 .../java/org/apache/druid/query/QueryContexts.java |  3 +
 .../org/apache/druid/query/QueryContextsTest.java  |  9 ++
 .../sql/calcite/planner/DruidSqlValidator.java | 99 +-
 .../druid/sql/calcite/planner/PlannerContext.java  |  2 +-
 .../druid/sql/calcite/BaseCalciteQueryTest.java| 10 +++
 .../apache/druid/sql/calcite/CalciteQueryTest.java | 46 ++
 .../druid/sql/calcite/CalciteWindowQueryTest.java  |  7 +-
 .../druid/sql/calcite/DrillWindowQueryTest.java|  2 +-
 .../calcite/expression/ExpressionTestHelper.java   |  4 +-
 10 files changed, 185 insertions(+), 7 deletions(-)

diff --git a/processing/src/main/java/org/apache/druid/query/QueryContext.java 
b/processing/src/main/java/org/apache/druid/query/QueryContext.java
index 6fa73f0b65c..3364512052b 100644
--- a/processing/src/main/java/org/apache/druid/query/QueryContext.java
+++ b/processing/src/main/java/org/apache/druid/query/QueryContext.java
@@ -27,6 +27,7 @@ import org.apache.druid.query.QueryContexts.Vectorize;
 import org.apache.druid.segment.QueryableIndexStorageAdapter;
 
 import javax.annotation.Nullable;
+
 import java.io.IOException;
 import java.util.Collections;
 import java.util.Map;
@@ -582,6 +583,15 @@ public class QueryContext
 );
   }
 
+  public boolean isWindowingStrictValidation()
+  {
+return getBoolean(
+QueryContexts.WINDOWING_STRICT_VALIDATION,
+QueryContexts.DEFAULT_WINDOWING_STRICT_VALIDATION
+);
+  }
+
+
   public String getBrokerServiceName()
   {
 return getString(QueryContexts.BROKER_SERVICE_NAME);
diff --git a/processing/src/main/java/org/apache/druid/query/QueryContexts.java 
b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
index 7d39edfde66..0359b13e2f0 100644
--- a/processing/src/main/java/org/apache/druid/query/QueryContexts.java
+++ b/processing/src/main/java/org/apache/druid/query/QueryContexts.java
@@ -85,6 +85,8 @@ public class QueryContexts
   public static final String SERIALIZE_DATE_TIME_AS_LONG_INNER_KEY = 
"serializeDateTimeAsLongInner";
   public static final String UNCOVERED_INTERVALS_LIMIT_KEY = 
"uncoveredIntervalsLimit";
   public static final String MIN_TOP_N_THRESHOLD = "minTopNThreshold";
+  public static final String WINDOWING_STRICT_VALIDATION = 
"windowingStrictValidation";
+
 
   // SQL query context keys
   public static final String CTX_SQL_QUERY_ID = BaseQuery.SQL_QUERY_ID;
@@ -117,6 +119,7 @@ public class QueryContexts
   public static final boolean DEFAULT_ENABLE_DEBUG = false;
   public static final int DEFAULT_IN_SUB_QUERY_THRESHOLD = Integer.MAX_VALUE;
   public static final boolean DEFAULT_ENABLE_TIME_BOUNDARY_PLANNING = false;
+  public static final boolean DEFAULT_WINDOWING_STRICT_VALIDATION = true;
 
   @SuppressWarnings("unused") // Used by Jackson serialization
   public enum Vectorize
diff --git 
a/processing/src/test/java/org/apache/druid/query/QueryContextsTest.java 
b/processing/src/test/java/org/apache/druid/query/QueryContextsTest.java
index 38b5384ded9..09496ece9f2 100644
--- a/processing/src/test/java/org/apache/druid/query/QueryContextsTest.java
+++ b/processing/src/test/java/org/apache/druid/query/QueryContextsTest.java
@@ -151,6 +151,15 @@ public class QueryContextsTest
 );
   }
 
+  @Test
+  public void testDefaultWindowingStrictValidation()
+  {
+Assert.assertEquals(
+QueryContexts.DEFAULT_WINDOWING_STRICT_VALIDATION,
+QueryContext.empty().isWindowingStrictValidation()
+);
+  }
+
   @Test
   public void testGetEnableJoinLeftScanDirect()
   {
diff --git 
a/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java 
b/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
index 34f3c7410f3..390ddf96baf 100644
--- 
a/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java
+++ 
b/sql/src/main/java/org/apache/druid/sql/calcite/planner/DruidSqlValidator.java

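As a quick, hedged illustration of the new context key (grounded in the
QueryContext/QueryContexts changes above; the QueryContext.of factory is assumed to
exist as in recent Druid versions, and the demo class name is hypothetical):

```java
import com.google.common.collect.ImmutableMap;
import org.apache.druid.query.QueryContext;
import org.apache.druid.query.QueryContexts;

// Sketch only: strict validation defaults to true; setting the key to false
// overrides the new window-frame checks for a single query.
final class WindowingValidationSketch
{
  private WindowingValidationSketch() {}

  static void demo()
  {
    boolean byDefault = QueryContext.empty().isWindowingStrictValidation(); // true
    boolean overridden = QueryContext.of(
        ImmutableMap.<String, Object>of(QueryContexts.WINDOWING_STRICT_VALIDATION, false)
    ).isWindowingStrictValidation(); // false
    System.out.println(byDefault + " " + overridden);
  }
}
```

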
Re: [PR] Strict window frame checks (druid)

2024-02-02 Thread via GitHub


abhishekagarwal87 merged PR #15746:
URL: https://github.com/apache/druid/pull/15746





Re: [PR] fix: handle BOOKMARK events in kubernetes pod discovery (druid)

2024-02-02 Thread via GitHub


lkm commented on code in PR #15819:
URL: https://github.com/apache/druid/pull/15819#discussion_r1475854162


##
extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/DefaultK8sApiClient.java:
##
@@ -131,7 +131,7 @@ public boolean hasNext() throws SocketTimeoutException
   try {
 while (watch.hasNext()) {
   Watch.Response item = watch.next();
-  if (item != null && item.type != null) {
+  if (item != null && item.type != null && 
!item.type.equals(WatchResult.BOOKMARK)) {

Review Comment:
   We're actually not totally ignoring the result: we update the resourceVersion,
which helps reduce load on the Kubernetes API, so it does make sense to receive
these events. For details, see
https://kubernetes.io/docs/reference/using-api/api-concepts/#watch-bookmarks
   
   
https://github.com/apache/druid/blob/2e46a980245bc1f2694dd21af9425855e78632f9/extensions-core/kubernetes-extensions/src/main/java/org/apache/druid/k8s/discovery/K8sDruidNodeDiscoveryProvider.java#L282
   
   
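For context, a minimal sketch of that pattern (not Druid's code; it only assumes the
official Kubernetes Java client types `Watch`, `Watch.Response`, and `V1Pod`, and the
class and handler names here are hypothetical):

```java
import io.kubernetes.client.openapi.models.V1Pod;
import io.kubernetes.client.util.Watch;

// Illustrative only: consume BOOKMARK events to advance the resourceVersion so the
// next watch can resume cheaply, without treating them as real pod changes.
final class BookmarkAwareWatchLoop
{
  private String lastResourceVersion;

  void drain(Watch<V1Pod> watch)
  {
    for (Watch.Response<V1Pod> item : watch) {
      if (item == null || item.type == null) {
        continue;
      }
      if ("BOOKMARK".equals(item.type)) {
        // A bookmark carries only metadata.resourceVersion; record it and move on.
        lastResourceVersion = item.object.getMetadata().getResourceVersion();
        continue;
      }
      // ADDED / MODIFIED / DELETED events represent actual pod changes.
      lastResourceVersion = item.object.getMetadata().getResourceVersion();
      handlePodEvent(item.type, item.object);
    }
  }

  private void handlePodEvent(String type, V1Pod pod)
  {
    // Discovery-specific handling would go here.
  }

  String getLastResourceVersion()
  {
    return lastResourceVersion;
  }
}
```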






Re: [PR] Add export capabilities to MSQ with SQL syntax (druid)

2024-02-02 Thread via GitHub


adarshsanjeev commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1475852597


##
docs/multi-stage-query/reference.md:
##
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with 
EXTERN](concepts.md#read-external-data-with-extern).
 
+ `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be 
exported.
+This variation of EXTERN requires one argument, the details of the destination 
as specified below.
+This variation additionally requires an `AS` clause to specify the format of 
the exported rows.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` 
destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED 
BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number 
of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN()
+AS CSV
+SELECT
+  
+FROM 
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not 
yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any currently existing files at the specified 
directory, and creates new files with the results of the query.
+
+
+```sql
+REPLACE INTO
+  EXTERN()
+AS CSV
+OVERWRITE ALL
+SELECT
+  
+FROM 
+```
+
+Exporting is currently supported for Amazon S3 storage. This can be done 
passing the function `S3()` as an argument to the `EXTERN` function. The 
`druid-s3-extensions` should be loaded.

Review Comment:
   Going through the currently used syntax, it would make sense to use
`FN(x=>'a')` for export as well. I've also added a line to the docs about using
single quotes. The functions would not allow named parameters; I've added this
information to the docs as well.






Re: [PR] Add export capabilities to MSQ with SQL syntax (druid)

2024-02-02 Thread via GitHub


adarshsanjeev commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1475840275


##
docs/multi-stage-query/reference.md:
##
@@ -90,6 +93,66 @@ can precede the column list: `EXTEND (timestamp VARCHAR...)`.
 
 For more information, see [Read external data with 
EXTERN](concepts.md#read-external-data-with-extern).
 
+ `EXTERN` to export to a destination
+
+`EXTERN` can be used to specify a destination, where the data needs to be 
exported.
+This variation of EXTERN requires one argument, the details of the destination 
as specified below.
+This variation additionally requires an `AS` clause to specify the format of 
the exported rows.
+
+INSERT statements and REPLACE statements are both supported with an `EXTERN` 
destination.
+Only `CSV` format is supported at the moment.
+Please note that partitioning (`PARTITIONED BY`) and clustering (`CLUSTERED 
BY`) is not currently supported with export statements.
+
+Export statements support the context parameter `rowsPerPage` for the number 
of rows in each exported file. The default value
+is 100,000.
+
+INSERT statements append the results to the existing files at the destination.
+```sql
+INSERT INTO
+  EXTERN()
+AS CSV
+SELECT
+  
+FROM 
+```
+
+REPLACE statements have an additional OVERWRITE clause. As partitioning is not 
yet supported, only `OVERWRITE ALL`
+is allowed. REPLACE deletes any currently existing files at the specified 
directory, and creates new files with the results of the query.
+
+
+```sql
+REPLACE INTO
+  EXTERN()
+AS CSV
+OVERWRITE ALL
+SELECT
+  
+FROM 
+```
+
+Exporting is currently supported for Amazon S3 storage. This can be done 
passing the function `S3()` as an argument to the `EXTERN` function. The 
`druid-s3-extensions` should be loaded.
+
+```sql
+INSERT INTO
+  EXTERN(S3(bucket=<...>, prefix=<...>, tempDir=<...>))

Review Comment:
   Changed the example here






Re: [PR] Sql Single Value Aggregator for scalar queries (druid)

2024-02-02 Thread via GitHub


LakshSingla commented on code in PR #15700:
URL: https://github.com/apache/druid/pull/15700#discussion_r1475661119


##
processing/src/main/java/org/apache/druid/query/aggregation/SingleValueAggregatorFactory.java:
##
@@ -0,0 +1,206 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.query.aggregation;
+
+import com.fasterxml.jackson.annotation.JsonCreator;
+import com.fasterxml.jackson.annotation.JsonInclude;
+import com.fasterxml.jackson.annotation.JsonProperty;
+import com.fasterxml.jackson.annotation.JsonTypeName;
+import com.google.common.base.Preconditions;
+import org.apache.druid.error.DruidException;
+import org.apache.druid.segment.ColumnSelectorFactory;
+import org.apache.druid.segment.ColumnValueSelector;
+import org.apache.druid.segment.column.ColumnCapabilities;
+import org.apache.druid.segment.column.ColumnType;
+import org.apache.druid.segment.column.ValueType;
+
+import javax.annotation.Nullable;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.List;
+import java.util.Objects;
+
+/**
+ * This AggregatorFactory is meant to wrap the subquery used as an expression 
into a single value
+ * and is expected to throw an exception when the subquery results in more 
than one row
+ *
+ * 
+ * This consumes columnType as well along with name and fieldName to pass it 
on to underlying
+ * {@link SingleValueBufferAggregator} to work with different ColumnTypes
+ */
+@JsonTypeName("singleValue")
+public class SingleValueAggregatorFactory extends AggregatorFactory
+{
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String name;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final String fieldName;
+
+  @JsonProperty
+  @JsonInclude(JsonInclude.Include.NON_NULL)
+  private final ColumnType columnType;
+
+  public static final int DEFAULT_MAX_STRING_SIZE = 1024;
+
+  @JsonCreator
+  public SingleValueAggregatorFactory(
+  @JsonProperty("name") String name,
+  @JsonProperty("fieldName") final String fieldName,
+  @JsonProperty("columnType") final ColumnType columnType
+  )
+  {
+Preconditions.checkNotNull(name, "Must have a valid, non-null aggregator 
name");
+Preconditions.checkNotNull(fieldName, "Must have a valid, non-null 
fieldName");
+
+this.name = name;
+this.fieldName = fieldName;
+this.columnType = columnType;
+  }
+
+  @Override
+  public Aggregator factorize(ColumnSelectorFactory metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+return new SingleValueAggregator(selector);
+  }
+
+  @Override
+  public BufferAggregator factorizeBuffered(ColumnSelectorFactory 
metricFactory)
+  {
+ColumnValueSelector selector = 
metricFactory.makeColumnValueSelector(fieldName);
+ColumnCapabilities columnCapabilities = 
metricFactory.getColumnCapabilities(fieldName);
+ColumnType columnType = new ColumnType(columnCapabilities.getType(), null, 
null);
+return new SingleValueBufferAggregator(selector, columnType);
+  }
+
+  @Override
+  public Comparator getComparator()
+  {
+throw DruidException.defensive("Single Value Aggregator would not have 
more than one row to compare");
+  }
+
+  @Override
+  @Nullable
+  public Object combine(@Nullable Object lhs, @Nullable Object rhs)

Review Comment:
   What would happen if `lhs` represents something empty (i.e. it hasn't
encountered a single row) while `rhs` has encountered a single row? In that case
`rhs` should be returned. Similar questions apply when `lhs` represents a single
row while `rhs` is empty, and when both are empty.
   
   I am not sure whether this will ever be called when one of the sides represents
"something empty", hence the confusion.
   This is also at odds with `getCombiningFactory`, since that does not throw when
called and returns a correct implementation. I assumed that both should have the
same semantics. The calling patterns might be such that it's okay not to implement
.combine() and to throw an exception here; if so, I think there should be a
comment explaining why.
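
To make the asked-for semantics concrete, a purely illustrative reading (not the
PR's implementation) where null stands for "no row observed on this side" could
look like this:

```java
import javax.annotation.Nullable;

// Sketch only: return the non-empty side, and fail when both sides carry a value,
// since the aggregator is only valid when the subquery yields at most one row.
final class SingleValueCombineSketch
{
  private SingleValueCombineSketch() {}

  @Nullable
  static Object combine(@Nullable Object lhs, @Nullable Object rhs)
  {
    if (lhs == null) {
      return rhs; // empty lhs: keep whatever rhs observed (possibly nothing)
    }
    if (rhs == null) {
      return lhs; // empty rhs: keep lhs
    }
    // Both sides observed a row: the scalar subquery produced more than one row.
    throw new IllegalStateException("Scalar subquery returned more than one row");
  }
}
```

Note that this sketch conflates "no row" with a genuine NULL value, which is part of
why the semantics deserve a comment in the factory.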




[PR] [Backport] Resolve CVE issues (druid)

2024-02-02 Thread via GitHub


LakshSingla opened a new pull request, #15825:
URL: https://github.com/apache/druid/pull/15825

   Backport of #15814 to 29.0.0.





Re: [PR] Add export capabilities to MSQ with SQL syntax (druid)

2024-02-02 Thread via GitHub


adarshsanjeev commented on code in PR #15689:
URL: https://github.com/apache/druid/pull/15689#discussion_r1475764572


##
extensions-core/multi-stage-query/src/main/java/org/apache/druid/msq/exec/ControllerImpl.java:
##
@@ -1872,6 +1871,44 @@ private static QueryDefinition makeQueryDefinition(
   } else {
 return queryDef;
   }
+} else if (querySpec.getDestination() instanceof ExportMSQDestination) {
+  final ExportMSQDestination exportMSQDestination = (ExportMSQDestination) 
querySpec.getDestination();
+  final StorageConnectorProvider storageConnectorProvider = 
exportMSQDestination.getStorageConnectorProvider();
+
+  final ResultFormat resultFormat = exportMSQDestination.getResultFormat();
+
+  // If the statement is a 'REPLACE' statement, delete the existing files 
at the destination.
+  if (exportMSQDestination.getReplaceTimeChunks() != null) {
+if 
(Intervals.ONLY_ETERNITY.equals(exportMSQDestination.getReplaceTimeChunks())) {
+  StorageConnector storageConnector = storageConnectorProvider.get();
+  try {
+storageConnector.deleteRecursively("");

Review Comment:
   Instead of removing the delete call, we can add a directory like
`druid-export/` to the path and delete only that directory for REPLACE. This way,
we would not delete any other files unless the user stores them in a directory
with the same name.
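
A minimal sketch of that suggestion (assuming only the existing
StorageConnector#deleteRecursively API shown in the diff; the helper class, method,
and constant names are hypothetical):

```java
import org.apache.druid.storage.StorageConnector;

import java.io.IOException;

// Illustrative only: keep export files under a reserved sub-directory and, for
// REPLACE, clear just that sub-directory instead of the whole configured prefix.
final class ExportPathSketch
{
  private static final String EXPORT_DIR = "druid-export";

  private ExportPathSketch() {}

  static String exportFilePath(String fileName)
  {
    return EXPORT_DIR + "/" + fileName;
  }

  static void clearExportDirectoryForReplace(StorageConnector storageConnector) throws IOException
  {
    // Deletes only the reserved directory, so unrelated files at the destination survive.
    storageConnector.deleteRecursively(EXPORT_DIR);
  }
}
```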


