[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r770233437 ## File path: docs/configuration/index.md ## @@ -1525,6 +1525,19 @@ The amount of direct memory needed by Druid is at least ensure at least this amount of direct memory is available by providing `-XX:MaxDirectMemorySize=` at the command line. + Indexer Security Configuration Review comment: I find this section confusing because the `Indexer` here means the node type `Indexer not ingestion. Can you move the new docs to [Ingestion Security Configuration](https://github.com/apache/druid/blob/master/docs/configuration/index.md#ingestion-security-configuration)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r556771796 ## File path: extensions-core/azure-extensions/src/test/java/org/apache/druid/data/input/azure/AzureInputSourceTest.java ## @@ -214,6 +216,78 @@ public void test_toString_returnsExpectedString() Assert.assertEquals("AzureInputSource{uris=[], prefixes=[azure://container/blob], objects=[]}", actualToString); } + @Test(expected = IllegalArgumentException.class) + public void test_Deny_All() Review comment: Is this the same as the `testDenyAll()` test below? ## File path: indexing-service/src/main/java/org/apache/druid/indexing/seekablestream/SeekableStreamSamplerSpec.java ## @@ -102,7 +106,7 @@ public SamplerResponse sample() "[spec.ioConfig.inputFormat] is required" ); } - +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: nit: this doesn't look harm because neither `RecordSupplierInputSource` nor `FirehoseFactoryToInputSourceAdaptor` do anything in `validateAllowDenyPrefixList()`. However, it looks confusing since `validateAllowDenyPrefixList()` is not called in `RecordSupplier`. I would suggest to delete this line and add it back when we truly support this security config for stream ingestion. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r556280161 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -232,6 +237,10 @@ public boolean isReady(TaskActionClient taskActionClient) throws Exception throw new UOE("partitionsSpec[%s] is not supported", tuningConfig.getPartitionsSpec().getClass().getName()); } } +InputSource inputSource = getIngestionSchema().getIOConfig().getNonNullInputSource( +getIngestionSchema().getDataSchema().getParser() +); +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: This should be done in `ParallelIndexSupervisorTask.isReady()` as well. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r556275449 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -1044,17 +1048,21 @@ public IndexIOConfig( if (firehoseFactory != null && inputFormat != null) { throw new IAE("Cannot use firehose and inputFormat together. Try using inputSource instead of firehose."); } + if (inputSource != null) { +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: > Just to double check, if i move it to isReady the task will be accepted by the overlord and it will fail when the task is actually tried to be run. I was thinking that if someone is not allowed, we should block the ingestion at that time when the task is submitted at first place. Is that the behavior you are expecting ? @nishantmonu51 `Task.isReady()` is actually called as soon as it is issued to the Overlord. When it returns true, the Overlord changes its state from `waiting` to `pending`, and tries to assign it to one of middleManagers. So, I think it will be effectively the same as what you want to achieve. > isReady is also called multiple times, so not sure if that is any better in terms of number of checks, Good point. I think it won't be not that expensive, but maybe we can check the `validated` state and avoid further checking if it is already validated? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r554239833 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java ## @@ -153,6 +154,9 @@ @JsonIgnore private final RetryPolicyFactory retryPolicyFactory; + @JsonIgnore + private final InputSourceSecurityConfig securityConfig; Review comment: Why not `InputSourceSecurityConfig.ALLOW_ALL`? The system-wide securityConfig is ignored in DruidInputSource anyway. ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -1044,17 +1048,21 @@ public IndexIOConfig( if (firehoseFactory != null && inputFormat != null) { throw new IAE("Cannot use firehose and inputFormat together. Try using inputSource instead of firehose."); } + if (inputSource != null) { +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: I guess you wanted to validate inputSource in the constructor to fail early. However, this seems possible to cause some issues because it will throw exception on validation failure, which in turn making creating an ioConfig instance failed. For example, suppose you had an ingestion spec stored in the metadata store. Then, for some reason, you wanted to update the security config to forbid some URIs which are in the ingestion spec in the metadata store. The update will break reading ingestion specs from metadata store which happens in the Overlord. It will also unnecessarily perform the duplicate check multiple times during ingestion. I think this should be called somewhere in the code path of ingestion, such as `Task.isReady()`. ## File path: extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java ## @@ -214,10 +215,23 @@ public boolean needsFormat() private void cachePathsIfNeeded() throws IOException { if (cachedPaths == null) { - cachedPaths = ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths")); + Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths"); + cachedPaths = ImmutableList.copyOf(paths); } } + @Override + public void validateAllowDenyPrefixList(InputSourceSecurityConfig securityConfig) + { +try { + cachePathsIfNeeded(); Review comment: I think you won't probably need to cache paths since this will unnecessarily populate the cache in the Overlord. ## File path: core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java ## @@ -31,13 +31,16 @@ */ public abstract class AbstractInputSource implements InputSource { + private transient boolean validated = false; Review comment: Why `transient`? We don't use `Serializable`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org
[GitHub] [druid] jihoonson commented on a change in pull request #10677: Add URI based allowPrefexList and denyPrefixList
jihoonson commented on a change in pull request #10677: URL: https://github.com/apache/druid/pull/10677#discussion_r554239833 ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/CompactionTask.java ## @@ -153,6 +154,9 @@ @JsonIgnore private final RetryPolicyFactory retryPolicyFactory; + @JsonIgnore + private final InputSourceSecurityConfig securityConfig; Review comment: Why not `InputSourceSecurityConfig.ALLOW_ALL`? The system-wide securityConfig is ignored in DruidInputSource anyway. ## File path: indexing-service/src/main/java/org/apache/druid/indexing/common/task/IndexTask.java ## @@ -1044,17 +1048,21 @@ public IndexIOConfig( if (firehoseFactory != null && inputFormat != null) { throw new IAE("Cannot use firehose and inputFormat together. Try using inputSource instead of firehose."); } + if (inputSource != null) { +inputSource.validateAllowDenyPrefixList(securityConfig); Review comment: I guess you wanted to validate inputSource in the constructor to fail early. However, this seems possible to cause some issues because it will throw exception on validation failure, which in turn making creating an ioConfig instance failed. For example, suppose you had an ingestion spec stored in the metadata store. Then, for some reason, you wanted to update the security config to forbid some URIs which are in the ingestion spec in the metadata store. The update will break reading ingestion specs from metadata store which happens in the Overlord. It will also unnecessarily perform the duplicate check multiple times during ingestion. I think this should be called somewhere in the code path of ingestion, such as `Task.isReady()`. ## File path: extensions-core/hdfs-storage/src/main/java/org/apache/druid/inputsource/hdfs/HdfsInputSource.java ## @@ -214,10 +215,23 @@ public boolean needsFormat() private void cachePathsIfNeeded() throws IOException { if (cachedPaths == null) { - cachedPaths = ImmutableList.copyOf(Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths")); + Collection paths = Preconditions.checkNotNull(getPaths(inputPaths, configuration), "paths"); + cachedPaths = ImmutableList.copyOf(paths); } } + @Override + public void validateAllowDenyPrefixList(InputSourceSecurityConfig securityConfig) + { +try { + cachePathsIfNeeded(); Review comment: I think you won't probably need to cache paths since this will unnecessarily populate the cache in the Overlord. ## File path: core/src/main/java/org/apache/druid/data/input/AbstractInputSource.java ## @@ -31,13 +31,16 @@ */ public abstract class AbstractInputSource implements InputSource { + private transient boolean validated = false; Review comment: Why `transient`? We don't use `Serializable`. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: commits-unsubscr...@druid.apache.org For additional commands, e-mail: commits-h...@druid.apache.org