[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17347109#comment-17347109 ] Bill SAndman commented on NIFI-4360: I believe this Jira can be closed in light of NIFI-7259 , and associated JIRAS > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna >Priority: Major > Labels: adls, azure, hdfs > Original Estimate: 336h > Time Spent: 10m > Remaining Estimate: 335h 50m > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16628876#comment-16628876 ] ASF GitHub Bot commented on NIFI-4360: -- Github user mohitgargk commented on the issue: https://github.com/apache/nifi/pull/2158 Hi @milanchandna, was wondering if you are still working on this. I need this feature for my use case and was going the same path as you. Seems you are almost there. In case you are working on it, Can you please try syncing your fork and rebase your commits. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna >Priority: Major > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408541#comment-16408541 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on the issue: https://github.com/apache/nifi/pull/2158 @jzonthemtn @jzonthemtn @pvillard31 Please review. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna >Priority: Major > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16408514#comment-16408514 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on the issue: https://github.com/apache/nifi/pull/2158 Can I please get some feedback here. Thanks. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna >Priority: Major > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253324#comment-16253324 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on the issue: https://github.com/apache/nifi/pull/2158 @joewitt, @jzonthemtn @pvillard31 - Hey, did you guys got a chance to look at this? > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196630#comment-16196630 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r143404820 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ListADLSFile.java --- @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.DirectoryEntry; +import com.microsoft.azure.datalake.store.DirectoryEntryType; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.components.state.Scope; +import org.apache.nifi.components.state.StateMap; +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.*; +import java.util.concurrent.TimeUnit; + +import static org.apache.nifi.processors.adls.ADLSConstants.ACCOUNT_NAME; +import static org.apache.nifi.processors.adls.ADLSConstants.REL_SUCCESS; + +@TriggerSerially +@TriggerWhenEmpty +@InputRequirement(InputRequirement.Requirement.INPUT_FORBIDDEN) +@Tags({"hadoop", "ADLS", "get", "list", "ingest", "ingress", "source", "filesystem"}) +@CapabilityDescription("Retrieves a listing of files from ADLS. For each file that is " ++ "listed in ADLS, this processor creates a FlowFile that represents the ADLS file to be fetched in conjunction with FetchADLSFile. This Processor is " ++ "designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left " ++ "off without duplicating all of the data.") +@WritesAttributes({ +@WritesAttribute(attribute="filename", description="The name of the file that was read from ADLS"), +@WritesAttribute(attribute="path", description="The path is set to the path of the file on ADLS"), +@WritesAttribute(attribute="adls.owner", description="The user that owns the file"), +@WritesAttribute(attribute="adls.group", description="The group that owns the file"), +@WritesAttribute(attribute="adls.lastModified", description="The timestamp of when the file was last modified, as milliseconds since midnight Jan 1, 1970 UTC"), +@WritesAttribute(attribute="adls.length", description="The number of bytes in the file"), +@WritesAttribute(attribute="adls.expirytime", description="The number of bytes in the file"), +@WritesAttribute(attribute="adls.permissions", description="The permissions for the file in ADLS. This is formatted as 3 characters for the owner, " ++ "3 for the group, and 3 for other users. For example rw-rw-r--") +}) +@Stateful(scopes = Scope.CLUSTER, description = "After performing a listing of ADLS files, the latest timestamp of all the files transferred is stored. " ++ "This allows the Processor to list only files that have been added or modified after " ++ "this date the next time that the Processor is run, without having to
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196631#comment-16196631 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r143404854 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/PutADLSFile.java --- @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLFileOutputStream; +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.IfExists; +import com.microsoft.azure.datalake.store.acl.AclEntry; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.AllowableValue; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; +import org.apache.nifi.util.StopWatch; +import org.apache.nifi.util.StringUtils; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +import static org.apache.nifi.processors.adls.ADLSConstants.*; + +@InputRequirement(InputRequirement.Requirement.INPUT_REQUIRED) +@Tags({"hadoop", "ADLS", "put", "copy", "filesystem", "restricted"}) --- End diff -- done. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196627#comment-16196627 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r143404758 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ADLSConstants.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.processor.Relationship; +import org.apache.nifi.processor.util.StandardValidators; + + +public class ADLSConstants { + +public static final int CHUNK_SIZE_IN_BYTES = 400; + +public static final PropertyDescriptor PATH_NAME = new PropertyDescriptor.Builder() +.name("Path") +.description("Path for file in Azure Data Lake, e.g. /adlshome/") +.required(true) +.expressionLanguageSupported(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor RETRY_ON_FAIL = new PropertyDescriptor.Builder() +.name("Overwrite policy") +.description("How many times to retry if read fails per chunk read, defaults to 3") +.required(true) +//TODO add Integer validator --- End diff -- property wasn't being used anymore, removed it. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16196628#comment-16196628 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r143404785 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ListADLSFile.java --- @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.DirectoryEntry; +import com.microsoft.azure.datalake.store.DirectoryEntryType; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.components.state.Scope; +import org.apache.nifi.components.state.StateMap; +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.*; +import java.util.concurrent.TimeUnit; + +import static org.apache.nifi.processors.adls.ADLSConstants.ACCOUNT_NAME; +import static org.apache.nifi.processors.adls.ADLSConstants.REL_SUCCESS; + +@TriggerSerially +@TriggerWhenEmpty +@InputRequirement(InputRequirement.Requirement.INPUT_FORBIDDEN) +@Tags({"hadoop", "ADLS", "get", "list", "ingest", "ingress", "source", "filesystem"}) --- End diff -- done. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16187500#comment-16187500 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r142037960 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/PutADLSFile.java --- @@ -0,0 +1,282 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLFileOutputStream; +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.IfExists; +import com.microsoft.azure.datalake.store.acl.AclEntry; +import org.apache.nifi.annotation.behavior.InputRequirement; +import org.apache.nifi.annotation.behavior.ReadsAttribute; +import org.apache.nifi.annotation.behavior.ReadsAttributes; +import org.apache.nifi.annotation.behavior.WritesAttribute; +import org.apache.nifi.annotation.behavior.WritesAttributes; +import org.apache.nifi.annotation.behavior.Restricted; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.SeeAlso; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.AllowableValue; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; +import org.apache.nifi.util.StopWatch; +import org.apache.nifi.util.StringUtils; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +import static org.apache.nifi.processors.adls.ADLSConstants.ERR_ACL_ENTRY; +import static org.apache.nifi.processors.adls.ADLSConstants.ERR_FLOWFILE_CORE_ATTR_FILENAME; +import static org.apache.nifi.processors.adls.ADLSConstants.REL_FAILURE; +import static org.apache.nifi.processors.adls.ADLSConstants.FILE_NAME_ATTRIBUTE; +import static org.apache.nifi.processors.adls.ADLSConstants.CHUNK_SIZE_IN_BYTES; +import static org.apache.nifi.processors.adls.ADLSConstants.ADLS_FILE_PATH_ATTRIBUTE; +import static org.apache.nifi.processors.adls.ADLSConstants.REL_SUCCESS; + +@InputRequirement(InputRequirement.Requirement.INPUT_REQUIRED) +@Tags({"azure", "hadoop", "ADLS", "put", "egress", "copy", "filesystem", "restricted"}) +@CapabilityDescription("Write FlowFile data to Azure Data Lake Store (ADLS)") +@SeeAlso({ListADLSFile.class, FetchADLSFile.class}) +@ReadsAttributes({ +@ReadsAttribute(attribute = "filename", description = "The name of the file written to ADLS comes from the value of this attribute."), +}) +@WritesAttributes({ +@WritesAttribute(attribute = "filename", description = "The name of the file written to ADLS is stored in this attribute."), +@WritesAttribute(attribute = "absolute.adls.path", description = "The absolute path to the file on ADLS is stored in this attribute.") +}) +@Restricted("Provides operator the ability to write to any file that NiFi has access to in ADLS.") +public class PutADLSFile extends ADLSAbstractProcessor { + +private static final String PART_FILE_EXTENSION = ".nifipart"; +private static final String REPLACE_RESOLUTION = "replace"; +private static final
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176476#comment-16176476 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140503820 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/test/java/org/apache/nifi/processors/adls/TestPutADLSFile.java --- @@ -0,0 +1,522 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.ADLStoreOptions; +import okhttp3.mockwebserver.MockResponse; +import okhttp3.mockwebserver.MockWebServer; +import okhttp3.mockwebserver.QueueDispatcher; +import okhttp3.mockwebserver.RecordedRequest; +import org.apache.nifi.processor.Processor; +import org.apache.nifi.reporting.InitializationException; +import org.apache.nifi.util.TestRunner; +import org.apache.nifi.util.TestRunners; +import org.hamcrest.CoreMatchers; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.TimeUnit; + + +public class TestPutADLSFile { + +private MockWebServer server; +private ADLStoreClient client; +private Processor processor; +private TestRunner runner; +MapfileAttributes; + +@Before +public void init() throws IOException, InitializationException { +server = new MockWebServer(); +QueueDispatcher dispatcher = new QueueDispatcher(); +dispatcher.setFailFast(new MockResponse().setResponseCode(400)); +server.setDispatcher(dispatcher); +server.start(); +String accountFQDN = server.getHostName() + ":" + server.getPort(); +String dummyToken = "testDummyAadToken"; + +client = ADLStoreClient.createClient(accountFQDN, dummyToken); +client.setOptions(new ADLStoreOptions().setInsecureTransport()); + +processor = new PutADLSWithMockClient(client); + +runner = TestRunners.newTestRunner(processor); + +runner.setProperty(ADLSConstants.ACCOUNT_NAME, accountFQDN); +runner.setProperty(ADLSConstants.CLIENT_ID, "foobar"); +runner.setProperty(ADLSConstants.CLIENT_SECRET, "foobar"); +runner.setProperty(ADLSConstants.AUTH_TOKEN_ENDPOINT, "foobar"); +runner.setProperty(PutADLSFile.DIRECTORY, "/sample/"); +runner.setProperty(PutADLSFile.CONFLICT_RESOLUTION, PutADLSFile.FAIL_RESOLUTION_AV); +runner.removeProperty(PutADLSFile.UMASK); +runner.removeProperty(PutADLSFile.ACL); + +fileAttributes = new HashMap<>(); +fileAttributes.put("filename", "sample.txt"); +fileAttributes.put("path", "/root/"); +} + +@Test +public void testPutConflictFail() throws IOException, InterruptedException { + +//for create call +server.enqueue(new MockResponse().setResponseCode(200)); +//for append call(internal call while writing to stream) +server.enqueue(new MockResponse().setResponseCode(200)); +//for rename call(since its fail conflict) +server.enqueue(new MockResponse().setResponseCode(200).setBody("{\"boolean\":true}")); +//for delete call(removing temp file) +server.enqueue(new MockResponse().setResponseCode(200)); + +//String bodySimpleFileContent = new String(Files.readAllBytes(Paths.get(bodySimpleFileName))); --- End diff -- yes, removed. > Add support for Azure Data Lake Store
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176475#comment-16176475 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140503615 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/PutADLSFile.java --- @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLFileOutputStream; +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.IfExists; +import com.microsoft.azure.datalake.store.acl.AclEntry; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.AllowableValue; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; +import org.apache.nifi.util.StopWatch; +import org.apache.nifi.util.StringUtils; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +import static org.apache.nifi.processors.adls.ADLSConstants.*; + +@InputRequirement(InputRequirement.Requirement.INPUT_REQUIRED) +@Tags({"hadoop", "ADLS", "put", "copy", "filesystem", "restricted"}) +@CapabilityDescription("Write FlowFile data to Azure Data Lake Store (ADLS)") +@ReadsAttributes({ +@ReadsAttribute(attribute = "filename", description = "The name of the file written to ADLS comes from the value of this attribute."), +@ReadsAttribute(attribute = "filepath", description = "The relative path to the file on ADLS comes from the value of this attribute.") +}) +@WritesAttributes({ +@WritesAttribute(attribute = "filename", description = "The name of the file written to ADLS is stored in this attribute."), +@WritesAttribute(attribute = "absolute.adls.path", description = "The absolute path to the file on ADLS is stored in this attribute.") +}) +@Restricted("Provides operator the ability to write to any file that NiFi has access to in ADLS.") +public class PutADLSFile extends ADLSAbstractProcessor { + +private static final String PART_FILE_EXTENSION = ".nifipart"; +private static final String REPLACE_RESOLUTION = "replace"; +private static final String FAIL_RESOLUTION = "fail"; +private static final String APPEND_RESOLUTION = "append"; + +protected static final AllowableValue REPLACE_RESOLUTION_AV = new AllowableValue(REPLACE_RESOLUTION, +REPLACE_RESOLUTION, "Replaces the existing file if any."); +protected static final AllowableValue FAIL_RESOLUTION_AV = new AllowableValue(FAIL_RESOLUTION, FAIL_RESOLUTION, +"Penalizes the flow file and routes it to failure."); +protected static final AllowableValue APPEND_RESOLUTION_AV = new AllowableValue(APPEND_RESOLUTION, APPEND_RESOLUTION, +"Appends to the existing file if any, creates a new file otherwise."); + +// properties +protected static final PropertyDescriptor CONFLICT_RESOLUTION = new PropertyDescriptor.Builder() +
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176472#comment-16176472 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140503030 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ADLSConstants.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.processor.Relationship; +import org.apache.nifi.processor.util.StandardValidators; + + +public class ADLSConstants { + +public static final int CHUNK_SIZE_IN_BYTES = 400; + +public static final PropertyDescriptor PATH_NAME = new PropertyDescriptor.Builder() +.name("Path") +.description("Path for file in Azure Data Lake, e.g. /adlshome/") +.required(true) +.expressionLanguageSupported(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor RETRY_ON_FAIL = new PropertyDescriptor.Builder() +.name("Overwrite policy") +.description("How many times to retry if read fails per chunk read, defaults to 3") +.required(true) +//TODO add Integer validator +.defaultValue("3") +.build(); + +public static final PropertyDescriptor ACCOUNT_NAME = new PropertyDescriptor.Builder() +.name("Account FQDN") +.description("Azure account fully qualified domain name eg: accountname.azuredatalakestore.net") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor CLIENT_ID = new PropertyDescriptor.Builder() +.name("Client ID") +.description("Azure client ID") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor CLIENT_SECRET = new PropertyDescriptor.Builder() +.name("Client secret") +.description("Azure client secret") +.required(true) +.sensitive(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor AUTH_TOKEN_ENDPOINT = new PropertyDescriptor.Builder() +.name("Auth token endpoint") +.description("Azure client secret") --- End diff -- Removed it, wasn't being used. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176249#comment-16176249 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on the issue: https://github.com/apache/nifi/pull/2158 @milanchandna - you can go in ``nifi-nar-bundles/nifi-azure-bundle`` and build from this directory ``mvn clean install -Pcontrib-check`` > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176246#comment-16176246 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on the issue: https://github.com/apache/nifi/pull/2158 Hey @pvillard31 Thanks for comments, looking into them. Currently when I am running full build, I am getting failures in TestMinimalLockingWriteAheadLog. testRecoverFileThatHasTrailingNULBytesAndTruncation(org.wali.TestMinimalLockingWriteAheadLog) And because of this not able to reliably detect failures in my own module. Any suggestion to skip the above failures? -Milan. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176178#comment-16176178 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140449455 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/test/java/org/apache/nifi/processors/adls/TestPutADLSFile.java --- @@ -0,0 +1,522 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.ADLStoreOptions; +import okhttp3.mockwebserver.MockResponse; +import okhttp3.mockwebserver.MockWebServer; +import okhttp3.mockwebserver.QueueDispatcher; +import okhttp3.mockwebserver.RecordedRequest; +import org.apache.nifi.processor.Processor; +import org.apache.nifi.reporting.InitializationException; +import org.apache.nifi.util.TestRunner; +import org.apache.nifi.util.TestRunners; +import org.hamcrest.CoreMatchers; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.TimeUnit; + + +public class TestPutADLSFile { + +private MockWebServer server; +private ADLStoreClient client; +private Processor processor; +private TestRunner runner; +MapfileAttributes; + +@Before +public void init() throws IOException, InitializationException { +server = new MockWebServer(); +QueueDispatcher dispatcher = new QueueDispatcher(); +dispatcher.setFailFast(new MockResponse().setResponseCode(400)); +server.setDispatcher(dispatcher); +server.start(); +String accountFQDN = server.getHostName() + ":" + server.getPort(); +String dummyToken = "testDummyAadToken"; + +client = ADLStoreClient.createClient(accountFQDN, dummyToken); +client.setOptions(new ADLStoreOptions().setInsecureTransport()); + +processor = new PutADLSWithMockClient(client); + +runner = TestRunners.newTestRunner(processor); + +runner.setProperty(ADLSConstants.ACCOUNT_NAME, accountFQDN); +runner.setProperty(ADLSConstants.CLIENT_ID, "foobar"); +runner.setProperty(ADLSConstants.CLIENT_SECRET, "foobar"); +runner.setProperty(ADLSConstants.AUTH_TOKEN_ENDPOINT, "foobar"); +runner.setProperty(PutADLSFile.DIRECTORY, "/sample/"); +runner.setProperty(PutADLSFile.CONFLICT_RESOLUTION, PutADLSFile.FAIL_RESOLUTION_AV); +runner.removeProperty(PutADLSFile.UMASK); +runner.removeProperty(PutADLSFile.ACL); + +fileAttributes = new HashMap<>(); +fileAttributes.put("filename", "sample.txt"); +fileAttributes.put("path", "/root/"); +} + +@Test +public void testPutConflictFail() throws IOException, InterruptedException { + +//for create call +server.enqueue(new MockResponse().setResponseCode(200)); +//for append call(internal call while writing to stream) +server.enqueue(new MockResponse().setResponseCode(200)); +//for rename call(since its fail conflict) +server.enqueue(new MockResponse().setResponseCode(200).setBody("{\"boolean\":true}")); +//for delete call(removing temp file) +server.enqueue(new MockResponse().setResponseCode(200)); + +//String bodySimpleFileContent = new String(Files.readAllBytes(Paths.get(bodySimpleFileName))); --- End diff -- line to be removed? > Add support for Azure Data Lake
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176177#comment-16176177 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140449340 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/test/java/org/apache/nifi/processors/adls/TestPutADLSFile.java --- @@ -0,0 +1,522 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.ADLStoreOptions; +import okhttp3.mockwebserver.MockResponse; +import okhttp3.mockwebserver.MockWebServer; +import okhttp3.mockwebserver.QueueDispatcher; +import okhttp3.mockwebserver.RecordedRequest; +import org.apache.nifi.processor.Processor; +import org.apache.nifi.reporting.InitializationException; +import org.apache.nifi.util.TestRunner; +import org.apache.nifi.util.TestRunners; +import org.hamcrest.CoreMatchers; +import org.junit.Assert; +import org.junit.Before; +import org.junit.Test; + +import java.io.ByteArrayInputStream; +import java.io.IOException; +import java.io.InputStream; +import java.nio.file.Paths; +import java.util.HashMap; +import java.util.Map; +import java.util.concurrent.TimeUnit; + + +public class TestPutADLSFile { + +private MockWebServer server; +private ADLStoreClient client; +private Processor processor; +private TestRunner runner; +MapfileAttributes; + +@Before +public void init() throws IOException, InitializationException { +server = new MockWebServer(); +QueueDispatcher dispatcher = new QueueDispatcher(); +dispatcher.setFailFast(new MockResponse().setResponseCode(400)); +server.setDispatcher(dispatcher); +server.start(); +String accountFQDN = server.getHostName() + ":" + server.getPort(); +String dummyToken = "testDummyAadToken"; + +client = ADLStoreClient.createClient(accountFQDN, dummyToken); +client.setOptions(new ADLStoreOptions().setInsecureTransport()); + +processor = new PutADLSWithMockClient(client); + +runner = TestRunners.newTestRunner(processor); + +runner.setProperty(ADLSConstants.ACCOUNT_NAME, accountFQDN); +runner.setProperty(ADLSConstants.CLIENT_ID, "foobar"); +runner.setProperty(ADLSConstants.CLIENT_SECRET, "foobar"); +runner.setProperty(ADLSConstants.AUTH_TOKEN_ENDPOINT, "foobar"); +runner.setProperty(PutADLSFile.DIRECTORY, "/sample/"); +runner.setProperty(PutADLSFile.CONFLICT_RESOLUTION, PutADLSFile.FAIL_RESOLUTION_AV); +runner.removeProperty(PutADLSFile.UMASK); +runner.removeProperty(PutADLSFile.ACL); + +fileAttributes = new HashMap<>(); +fileAttributes.put("filename", "sample.txt"); +fileAttributes.put("path", "/root/"); +} + +@Test +public void testPutConflictFail() throws IOException, InterruptedException { + +//for create call +server.enqueue(new MockResponse().setResponseCode(200)); +//for append call(internal call while writing to stream) +server.enqueue(new MockResponse().setResponseCode(200)); +//for rename call(since its fail conflict) +server.enqueue(new MockResponse().setResponseCode(200).setBody("{\"boolean\":true}")); +//for delete call(removing temp file) +server.enqueue(new MockResponse().setResponseCode(200)); + +//String bodySimpleFileContent = new String(Files.readAllBytes(Paths.get(bodySimpleFileName))); +String bodySimpleFileContent = "sample content to be send to external adls
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176174#comment-16176174 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140448400 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ListADLSFile.java --- @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.DirectoryEntry; +import com.microsoft.azure.datalake.store.DirectoryEntryType; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.components.state.Scope; +import org.apache.nifi.components.state.StateMap; +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.*; +import java.util.concurrent.TimeUnit; + +import static org.apache.nifi.processors.adls.ADLSConstants.ACCOUNT_NAME; +import static org.apache.nifi.processors.adls.ADLSConstants.REL_SUCCESS; + +@TriggerSerially +@TriggerWhenEmpty +@InputRequirement(InputRequirement.Requirement.INPUT_FORBIDDEN) +@Tags({"hadoop", "ADLS", "get", "list", "ingest", "ingress", "source", "filesystem"}) +@CapabilityDescription("Retrieves a listing of files from ADLS. For each file that is " ++ "listed in ADLS, this processor creates a FlowFile that represents the ADLS file to be fetched in conjunction with FetchADLSFile. This Processor is " ++ "designed to run on Primary Node only in a cluster. If the primary node changes, the new Primary Node will pick up where the previous node left " ++ "off without duplicating all of the data.") +@WritesAttributes({ +@WritesAttribute(attribute="filename", description="The name of the file that was read from ADLS"), +@WritesAttribute(attribute="path", description="The path is set to the path of the file on ADLS"), +@WritesAttribute(attribute="adls.owner", description="The user that owns the file"), +@WritesAttribute(attribute="adls.group", description="The group that owns the file"), +@WritesAttribute(attribute="adls.lastModified", description="The timestamp of when the file was last modified, as milliseconds since midnight Jan 1, 1970 UTC"), +@WritesAttribute(attribute="adls.length", description="The number of bytes in the file"), +@WritesAttribute(attribute="adls.expirytime", description="The number of bytes in the file"), +@WritesAttribute(attribute="adls.permissions", description="The permissions for the file in ADLS. This is formatted as 3 characters for the owner, " ++ "3 for the group, and 3 for other users. For example rw-rw-r--") +}) +@Stateful(scopes = Scope.CLUSTER, description = "After performing a listing of ADLS files, the latest timestamp of all the files transferred is stored. " ++ "This allows the Processor to list only files that have been added or modified after " ++ "this date the next time that the Processor is run, without having to store
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176170#comment-16176170 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140446942 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ADLSConstants.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.processor.Relationship; +import org.apache.nifi.processor.util.StandardValidators; + + +public class ADLSConstants { + +public static final int CHUNK_SIZE_IN_BYTES = 400; + +public static final PropertyDescriptor PATH_NAME = new PropertyDescriptor.Builder() +.name("Path") --- End diff -- Could you use ``.name()`` and ``.displayName()`` ? Name cannot be changed once we release the processor because it'd break backward compatibility on existing workflow. But we can change the display name as it's only used in logs/UI. We usually use a script-friendly naming for the name field (no whitespace, lower case, etc). For instance: adls-overwrite-policy. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176171#comment-16176171 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140447052 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ADLSConstants.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.processor.Relationship; +import org.apache.nifi.processor.util.StandardValidators; + + +public class ADLSConstants { + +public static final int CHUNK_SIZE_IN_BYTES = 400; + +public static final PropertyDescriptor PATH_NAME = new PropertyDescriptor.Builder() +.name("Path") +.description("Path for file in Azure Data Lake, e.g. /adlshome/") +.required(true) +.expressionLanguageSupported(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor RETRY_ON_FAIL = new PropertyDescriptor.Builder() +.name("Overwrite policy") +.description("How many times to retry if read fails per chunk read, defaults to 3") +.required(true) +//TODO add Integer validator +.defaultValue("3") +.build(); + +public static final PropertyDescriptor ACCOUNT_NAME = new PropertyDescriptor.Builder() +.name("Account FQDN") +.description("Azure account fully qualified domain name eg: accountname.azuredatalakestore.net") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor CLIENT_ID = new PropertyDescriptor.Builder() +.name("Client ID") +.description("Azure client ID") +.required(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor CLIENT_SECRET = new PropertyDescriptor.Builder() +.name("Client secret") +.description("Azure client secret") +.required(true) +.sensitive(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor AUTH_TOKEN_ENDPOINT = new PropertyDescriptor.Builder() +.name("Auth token endpoint") +.description("Azure client secret") --- End diff -- copy/paste description > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176175#comment-16176175 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140447693 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ListADLSFile.java --- @@ -0,0 +1,289 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.DirectoryEntry; +import com.microsoft.azure.datalake.store.DirectoryEntryType; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.components.state.Scope; +import org.apache.nifi.components.state.StateMap; +import org.apache.nifi.distributed.cache.client.DistributedMapCacheClient; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.*; +import java.util.concurrent.TimeUnit; + +import static org.apache.nifi.processors.adls.ADLSConstants.ACCOUNT_NAME; +import static org.apache.nifi.processors.adls.ADLSConstants.REL_SUCCESS; + +@TriggerSerially +@TriggerWhenEmpty +@InputRequirement(InputRequirement.Requirement.INPUT_FORBIDDEN) +@Tags({"hadoop", "ADLS", "get", "list", "ingest", "ingress", "source", "filesystem"}) --- End diff -- Could you add "azure"? > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176176#comment-16176176 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140448507 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/PutADLSFile.java --- @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLFileOutputStream; +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.IfExists; +import com.microsoft.azure.datalake.store.acl.AclEntry; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.AllowableValue; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; +import org.apache.nifi.util.StopWatch; +import org.apache.nifi.util.StringUtils; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +import static org.apache.nifi.processors.adls.ADLSConstants.*; + +@InputRequirement(InputRequirement.Requirement.INPUT_REQUIRED) +@Tags({"hadoop", "ADLS", "put", "copy", "filesystem", "restricted"}) --- End diff -- Same comment here > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176172#comment-16176172 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140446472 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/ADLSConstants.java --- @@ -0,0 +1,90 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.processor.Relationship; +import org.apache.nifi.processor.util.StandardValidators; + + +public class ADLSConstants { + +public static final int CHUNK_SIZE_IN_BYTES = 400; + +public static final PropertyDescriptor PATH_NAME = new PropertyDescriptor.Builder() +.name("Path") +.description("Path for file in Azure Data Lake, e.g. /adlshome/") +.required(true) +.expressionLanguageSupported(true) +.addValidator(StandardValidators.NON_EMPTY_VALIDATOR) +.build(); + +public static final PropertyDescriptor RETRY_ON_FAIL = new PropertyDescriptor.Builder() +.name("Overwrite policy") +.description("How many times to retry if read fails per chunk read, defaults to 3") +.required(true) +//TODO add Integer validator --- End diff -- TODO? > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176173#comment-16176173 ] ASF GitHub Bot commented on NIFI-4360: -- Github user pvillard31 commented on a diff in the pull request: https://github.com/apache/nifi/pull/2158#discussion_r140449021 --- Diff: nifi-nar-bundles/nifi-azure-bundle/nifi-azure-processors/src/main/java/org/apache/nifi/processors/adls/PutADLSFile.java --- @@ -0,0 +1,267 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.nifi.processors.adls; + +import com.microsoft.azure.datalake.store.ADLFileOutputStream; +import com.microsoft.azure.datalake.store.ADLStoreClient; +import com.microsoft.azure.datalake.store.IfExists; +import com.microsoft.azure.datalake.store.acl.AclEntry; +import org.apache.nifi.annotation.behavior.*; +import org.apache.nifi.annotation.documentation.CapabilityDescription; +import org.apache.nifi.annotation.documentation.Tags; +import org.apache.nifi.components.AllowableValue; +import org.apache.nifi.components.PropertyDescriptor; +import org.apache.nifi.flowfile.FlowFile; +import org.apache.nifi.flowfile.attributes.CoreAttributes; +import org.apache.nifi.processor.ProcessContext; +import org.apache.nifi.processor.ProcessSession; +import org.apache.nifi.processor.ProcessorInitializationContext; +import org.apache.nifi.processor.exception.ProcessException; +import org.apache.nifi.processor.util.StandardValidators; +import org.apache.nifi.util.StopWatch; +import org.apache.nifi.util.StringUtils; + +import java.io.IOException; +import java.nio.file.Path; +import java.nio.file.Paths; +import java.util.ArrayList; +import java.util.Arrays; +import java.util.Collections; +import java.util.List; +import java.util.concurrent.TimeUnit; +import java.util.stream.Collectors; + +import static org.apache.nifi.processors.adls.ADLSConstants.*; + +@InputRequirement(InputRequirement.Requirement.INPUT_REQUIRED) +@Tags({"hadoop", "ADLS", "put", "copy", "filesystem", "restricted"}) +@CapabilityDescription("Write FlowFile data to Azure Data Lake Store (ADLS)") +@ReadsAttributes({ +@ReadsAttribute(attribute = "filename", description = "The name of the file written to ADLS comes from the value of this attribute."), +@ReadsAttribute(attribute = "filepath", description = "The relative path to the file on ADLS comes from the value of this attribute.") +}) +@WritesAttributes({ +@WritesAttribute(attribute = "filename", description = "The name of the file written to ADLS is stored in this attribute."), +@WritesAttribute(attribute = "absolute.adls.path", description = "The absolute path to the file on ADLS is stored in this attribute.") +}) +@Restricted("Provides operator the ability to write to any file that NiFi has access to in ADLS.") +public class PutADLSFile extends ADLSAbstractProcessor { + +private static final String PART_FILE_EXTENSION = ".nifipart"; +private static final String REPLACE_RESOLUTION = "replace"; +private static final String FAIL_RESOLUTION = "fail"; +private static final String APPEND_RESOLUTION = "append"; + +protected static final AllowableValue REPLACE_RESOLUTION_AV = new AllowableValue(REPLACE_RESOLUTION, +REPLACE_RESOLUTION, "Replaces the existing file if any."); +protected static final AllowableValue FAIL_RESOLUTION_AV = new AllowableValue(FAIL_RESOLUTION, FAIL_RESOLUTION, +"Penalizes the flow file and routes it to failure."); +protected static final AllowableValue APPEND_RESOLUTION_AV = new AllowableValue(APPEND_RESOLUTION, APPEND_RESOLUTION, +"Appends to the existing file if any, creates a new file otherwise."); + +// properties +protected static final PropertyDescriptor CONFLICT_RESOLUTION = new PropertyDescriptor.Builder() +
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16176143#comment-16176143 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on the issue: https://github.com/apache/nifi/pull/2158 Removed the test resource files(as there were license issues) and resolved their dependencies. And there were environment specific characters in test cases. Got rid of them. Please check now. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175433#comment-16175433 ] Joseph Witt commented on NIFI-4360: --- java version "1.8.0_141" Java(TM) SE Runtime Environment (build 1.8.0_141-b15) Java HotSpot(TM) 64-Bit Server VM (build 25.141-b15, mixed mode) Apache Maven 3.5.0 (ff8f5e7444045639af65f6095c62210b5713f426; 2017-04-03T15:39:06-04:00) Maven home: /Users/jwitt/Applications/apache-maven-3.5.0 Java version: 1.8.0_141, vendor: Oracle Corporation Java home: /Library/Java/JavaVirtualMachines/jdk1.8.0_141.jdk/Contents/Home/jre Default locale: en_US, platform encoding: UTF-8 OS name: "mac os x", version: "10.12.6", arch: "x86_64", family: "mac"' This is the environmental info for where I ran the build/tests. We need the tests to work on mac/win/lin or skip certain tests in environments they're not built for. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175431#comment-16175431 ] Joseph Witt commented on NIFI-4360: --- Understood on the test files being desirable. I'd like to avoid storing multi MB test file. Better to have it generated for the tests and destroyed. Even if the text seems really benign the rules for tracking the LICENSE and NOTICE also apply to test artifacts. These would need to be licensed in our source LICENSE and even if they're apache we'd cite them. It isn't clear that they are actually ASF licensed to me and in doing a quick google search they do not appear to be original works of this PR. So, lets just make this simple and not pull in test files from elsewhere. Thanks > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175423#comment-16175423 ] ASF GitHub Bot commented on NIFI-4360: -- Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2158 Lots of test failures. /code Tests run: 19, Failures: 3, Errors: 0, Skipped: 0, Time elapsed: 0.847 sec <<< FAILURE! - in org.apache.nifi.processors.adls.TestPutADLSFile testPutConflictReplace(org.apache.nifi.processors.adls.TestPutADLSFile) Time elapsed: 0.102 sec <<< FAILURE! java.lang.AssertionError: Expected: a string containing "%5Csample%5Csample.txt.nifipart" but: was "/webhdfs/v1/sample/sample.txt.nifipart?op=CREATE=DATA=true=true=3834110d-e3b1-41cc-9ff4-022da542ae4b=3834110d-e3b1-41cc-9ff4-022da542ae4b=2016-11-01" at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at org.junit.Assert.assertThat(Assert.java:956) at org.junit.Assert.assertThat(Assert.java:923) at org.apache.nifi.processors.adls.TestPutADLSFile.testPutConflictReplace(TestPutADLSFile.java:161) testPutConflictAppend(org.apache.nifi.processors.adls.TestPutADLSFile) Time elapsed: 0.01 sec <<< FAILURE! java.lang.AssertionError: Expected: a string containing "%5Csample%5Csample.txt.nifipart" but: was "/webhdfs/v1/sample/sample.txt.nifipart?op=CREATE=DATA=true=true=0c697d9a-fa07-47b7-adfd-877b90e15717=0c697d9a-fa07-47b7-adfd-877b90e15717=2016-11-01" at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at org.junit.Assert.assertThat(Assert.java:956) at org.junit.Assert.assertThat(Assert.java:923) at org.apache.nifi.processors.adls.TestPutADLSFile.testPutConflictAppend(TestPutADLSFile.java:215) testPutConflictFail(org.apache.nifi.processors.adls.TestPutADLSFile) Time elapsed: 0 sec <<< FAILURE! java.lang.AssertionError: Expected: a string containing "%5Csample%5Csample.txt.nifipart" but: was "/webhdfs/v1/sample/sample.txt.nifipart?op=CREATE=DATA=true=true=a6fe439c-b2f1-41c2-ba5d-e7d75836d53b=a6fe439c-b2f1-41c2-ba5d-e7d75836d53b=2016-11-01" at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:20) at org.junit.Assert.assertThat(Assert.java:956) at org.junit.Assert.assertThat(Assert.java:923) at org.apache.nifi.processors.adls.TestPutADLSFile.testPutConflictFail(TestPutADLSFile.java:108) Tests run: 7, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 0.249 sec - in org.apache.nifi.ranger.authorization.TestRangerBasePluginWithPolicies Running org.apache.nifi.ranger.authorization.TestRangerNiFiAuthorizer Results : Failed tests: TestListADLSFile.testFlowFileAttributes:233 expected:<[\]test> but was:<[/]test> TestPutADLSFile.testPutConflictAppend:215 Expected: a string containing "%5Csample%5Csample.txt.nifipart" but: was "/webhdfs/v1/sample/sample.txt.nifipart?op=CREATE=DATA=true=true=0c697d9a-fa07-47b7-adfd-877b90e15717=0c697d9a-fa07-47b7-adfd-877b90e15717=2016-11-01" TestPutADLSFile.testPutConflictFail:108 Expected: a string containing "%5Csample%5Csample.txt.nifipart" but: was "/webhdfs/v1/sample/sample.txt.nifipart?op=CREATE=DATA=true=true=a6fe439c-b2f1-41c2-ba5d-e7d75836d53b=a6fe439c-b2f1-41c2-ba5d-e7d75836d53b=2016-11-01" TestPutADLSFile.testPutConflictReplace:161 Expected: a string containing "%5Csample%5Csample.txt.nifipart" but: was "/webhdfs/v1/sample/sample.txt.nifipart?op=CREATE=DATA=true=true=3834110d-e3b1-41cc-9ff4-022da542ae4b=3834110d-e3b1-41cc-9ff4-022da542ae4b=2016-11-01" > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175402#comment-16175402 ] ASF GitHub Bot commented on NIFI-4360: -- Github user milanchandna commented on the issue: https://github.com/apache/nifi/pull/2158 Thanks @joewitt These test resource files contains Lorem Ipsum text. But anyways I will create my own if required but dont want to delete as they are required for an important test case. Thanks. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175379#comment-16175379 ] ASF GitHub Bot commented on NIFI-4360: -- Github user joewitt commented on the issue: https://github.com/apache/nifi/pull/2158 @milanchandna am checking into the dependency tree and L Looks good so far. However, please get rid of the davinci and davinci4MB files. The origin of them is unclear and it looks like it comes from common ADLS test files. We have to cite them as source material in such cases. We're better off just making up our own original test files OR removing them altogether. Thanks > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16175292#comment-16175292 ] Atul Sikaria commented on NIFI-4360: +1 for the ADLS interaction of the code. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (NIFI-4360) Add support for Azure Data Lake Store (ADLS)
[ https://issues.apache.org/jira/browse/NIFI-4360?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16169738#comment-16169738 ] ASF GitHub Bot commented on NIFI-4360: -- GitHub user milanchandna opened a pull request: https://github.com/apache/nifi/pull/2158 NIFI-4360 Adding support for ADLS Processors. Feature includes List, … …Get, Put processors. Till now ADLS interaction was possible using HDFS processors. Now users can ingress and egress their data directly using ADLS processors. It is much simpler and easy. And one lesser layer to go through as well. This will also help users who are not familiar with Hadoop configurations. Thank you for submitting a contribution to Apache NiFi. In order to streamline the review of the contribution we ask you to ensure the following steps have been taken: ### For all changes: - [x] Is there a JIRA ticket associated with this PR? Is it referenced in the commit message? - [x] Does your PR title start with NIFI- where is the JIRA number you are trying to resolve? Pay particular attention to the hyphen "-" character. - [x] Has your PR been rebased against the latest commit within the target branch (typically master)? - [x] Is your initial contribution a single, squashed commit? ### For code changes: - [ ] Have you ensured that the full suite of tests is executed via mvn -Pcontrib-check clean install at the root nifi folder? - There were failures in other modules testRecoverFileThatHasTrailingNULBytesAndTruncation(org.wali.TestMinimalLockingWriteAheadLog) - [x] Have you written or updated unit tests to verify your changes? - [x] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)? - [x] If applicable, have you updated the LICENSE file, including the main LICENSE file under nifi-assembly? - [ ] If applicable, have you updated the NOTICE file, including the main NOTICE file found under nifi-assembly? - [ ] If adding new Properties, have you added .displayName in addition to .name (programmatic access) for each of the new properties? ### For documentation related changes: - [x] Have you ensured that format looks appropriate for the output in which it is rendered? ### Note: Please ensure that once the PR is submitted, you check travis-ci for build issues and submit an update to your PR as soon as possible. You can merge this pull request into a Git repository by running: $ git pull https://github.com/milanchandna/nifi NIFI-4360 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/nifi/pull/2158.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2158 commit 4f449c55c99868bd8b96bc370186503f4928c11f Author: Milan ChandnaDate: 2017-09-18T07:00:17Z NIFI-4360 Adding support for ADLS Processors. Feature includes List, Get, Put processors. > Add support for Azure Data Lake Store (ADLS) > > > Key: NIFI-4360 > URL: https://issues.apache.org/jira/browse/NIFI-4360 > Project: Apache NiFi > Issue Type: New Feature > Components: Extensions >Reporter: Milan Chandna >Assignee: Milan Chandna > Labels: adls, azure, hdfs > Original Estimate: 336h > Remaining Estimate: 336h > > Currently ingress and egress in ADLS account is possible only using HDFS > processors. > Opening this feature to support separate processors for interaction with ADLS > accounts directly. > Benefits are many like > - simple configuration. > - Helping users not familiar with HDFS > - Helping users who currently are accessing ADLS accounts directly. > - using the ADLS SDK rather than HDFS client, one lesser layer to go through. > Can be achieved by adding separate ADLS processors. -- This message was sent by Atlassian JIRA (v6.4.14#64029)