[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-12-15 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r770233437



##
File path: docs/configuration/index.md
##
@@ -1525,6 +1525,19 @@ The amount of direct memory needed by Druid is at least
 ensure at least this amount of direct memory is available by providing 
`-XX:MaxDirectMemorySize=` at the command
 line.
 
+ Indexer Security Configuration

Review comment:
   I find this section confusing because the `Indexer` here means the node 
type `Indexer not ingestion. Can you move the new docs to [Ingestion Security 
Configuration](https://github.com/apache/druid/blob/master/docs/configuration/index.md#ingestion-security-configuration)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-13 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r556771796



##
File path: 
extensions-core/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java
##
@@ -214,6 +216,78 @@ public void test_toString_returnsExpectedString()
 Assert.assertEquals("AzureInputSource{uris=[], 
prefixes=[azure://container/blob], objects=[]}", actualToString);
   }
 
+  @Test(expected = IllegalArgumentException.class)
+  public void test_Deny_All()

Review comment:
   Is this the same as the `testDenyAll()` test below?

##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamSamplerSpec.java
##
@@ -102,7 +106,7 @@ public SamplerResponse sample()
   "[spec.ioConfig.inputFormat] is required"
   );
 }
-
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   nit: this doesn't look harm because neither `RecordSupplierInputSource` 
nor `FirehoseFactoryToInputSourceAdaptor` do anything in 
`validateAllowDenyPrefixList()`. However, it looks confusing since 
`validateAllowDenyPrefixList()` is not called in `RecordSupplier`. I would 
suggest to delete this line and add it back when we truly support this security 
config for stream ingestion.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r556280161



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -232,6 +237,10 @@ public boolean isReady(TaskActionClient taskActionClient) 
throws Exception
 throw new UOE("partitionsSpec[%s] is not supported", 
tuningConfig.getPartitionsSpec().getClass().getName());
   }
 }
+InputSource inputSource = 
getIngestionSchema().getIOConfig().getNonNullInputSource(
+getIngestionSchema().getDataSchema().getParser()
+);
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   This should be done in `ParallelIndexSupervisorTask.isReady()` as well.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-12 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r556275449



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
 throw new IAE("Cannot use firehose and inputFormat together. Try using 
inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   > Just to double check, if i move it to isReady the task will be 
accepted by the overlord and it will fail when the task is actually tried to be 
run. I was thinking that if someone is not allowed, we should block the 
ingestion at that time when the task is submitted at first place. Is that the 
behavior you are expecting ?
   
   @nishantmonu51 `Task.isReady()` is actually called as soon as it is issued 
to the Overlord. When it returns true, the Overlord changes its state from 
`waiting` to `pending`, and tries to assign it to one of middleManagers. So, I 
think it will be effectively the same as what you want to achieve.
   
   > isReady is also called multiple times, so not sure if that is any better 
in terms of number of checks,
   
   Good point. I think it won't be not that expensive, but maybe we can check 
the `validated` state and avoid further checking if it is already validated?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-09 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r554239833



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##
@@ -153,6 +154,9 @@
   @JsonIgnore
   private final RetryPolicyFactory retryPolicyFactory;
 
+  @JsonIgnore
+  private final InputSourceSecurityConfig securityConfig;

Review comment:
   Why not `InputSourceSecurityConfig.ALLOW_ALL`? The system-wide 
securityConfig is ignored in DruidInputSource anyway.

##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
 throw new IAE("Cannot use firehose and inputFormat together. Try using 
inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   I guess you wanted to validate inputSource in the constructor to fail 
early. However, this seems possible to cause some issues because it will throw 
exception on validation failure, which in turn making creating an ioConfig 
instance failed. For example, suppose you had an ingestion spec stored in the 
metadata store. Then, for some reason, you wanted to update the security config 
to forbid some URIs which are in the ingestion spec in the metadata store. The 
update will break reading ingestion specs from metadata store which happens in 
the Overlord. It will also unnecessarily perform the duplicate check multiple 
times during ingestion. I think this should be called somewhere in the code 
path of ingestion, such as `Task.isReady()`.

##
File path: 
extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java
##
@@ -214,10 +215,23 @@ public boolean needsFormat()
   private void cachePathsIfNeeded() throws IOException
   {
 if (cachedPaths == null) {
-  cachedPaths = 
ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths"));
+  Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths");
+  cachedPaths = ImmutableList.copyOf(paths);
 }
   }
 
+  @Override
+  public void validateAllowDenyPrefixList(InputSourceSecurityConfig 
securityConfig)
+  {
+try {
+  cachePathsIfNeeded();

Review comment:
   I think you won't probably need to cache paths since this will 
unnecessarily populate the cache in the Overlord.

##
File path: 
core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java
##
@@ -31,13 +31,16 @@
  */
 public abstract class AbstractInputSource implements InputSource
 {
+  private transient boolean validated = false;

Review comment:
   Why `transient`? We don't use `Serializable`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org



[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList

2021-01-08 Thread GitBox


jihoonson commented on a change in pull request #10677:
URL: https://github.com/apache/druid/pull/10677#discussion_r554239833



##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java
##
@@ -153,6 +154,9 @@
   @JsonIgnore
   private final RetryPolicyFactory retryPolicyFactory;
 
+  @JsonIgnore
+  private final InputSourceSecurityConfig securityConfig;

Review comment:
   Why not `InputSourceSecurityConfig.ALLOW_ALL`? The system-wide 
securityConfig is ignored in DruidInputSource anyway.

##
File path: 
indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java
##
@@ -1044,17 +1048,21 @@ public IndexIOConfig(
   if (firehoseFactory != null && inputFormat != null) {
 throw new IAE("Cannot use firehose and inputFormat together. Try using 
inputSource instead of firehose.");
   }
+  if (inputSource != null) {
+inputSource.validateAllowDenyPrefixList(securityConfig);

Review comment:
   I guess you wanted to validate inputSource in the constructor to fail 
early. However, this seems possible to cause some issues because it will throw 
exception on validation failure, which in turn making creating an ioConfig 
instance failed. For example, suppose you had an ingestion spec stored in the 
metadata store. Then, for some reason, you wanted to update the security config 
to forbid some URIs which are in the ingestion spec in the metadata store. The 
update will break reading ingestion specs from metadata store which happens in 
the Overlord. It will also unnecessarily perform the duplicate check multiple 
times during ingestion. I think this should be called somewhere in the code 
path of ingestion, such as `Task.isReady()`.

##
File path: 
extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java
##
@@ -214,10 +215,23 @@ public boolean needsFormat()
   private void cachePathsIfNeeded() throws IOException
   {
 if (cachedPaths == null) {
-  cachedPaths = 
ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths"));
+  Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, 
configuration), "paths");
+  cachedPaths = ImmutableList.copyOf(paths);
 }
   }
 
+  @Override
+  public void validateAllowDenyPrefixList(InputSourceSecurityConfig 
securityConfig)
+  {
+try {
+  cachePathsIfNeeded();

Review comment:
   I think you won't probably need to cache paths since this will 
unnecessarily populate the cache in the Overlord.

##
File path: 
core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java
##
@@ -31,13 +31,16 @@
  */
 public abstract class AbstractInputSource implements InputSource
 {
+  private transient boolean validated = false;

Review comment:
   Why `transient`? We don't use `Serializable`.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org
For additional commands, e-mail: commits-h...@druid.apache.org