Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]
pvillard31 closed pull request #8302: NIFI-12673 - ExtractKeywords processor URL: https://github.com/apache/nifi/pull/8302 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]
exceptionfactory commented on code in PR #8302: URL: https://github.com/apache/nifi/pull/8302#discussion_r1467668838 ## nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py: ## @@ -0,0 +1,95 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import yake + +from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult +from nifiapi.properties import PropertyDescriptor, StandardValidators, ExpressionLanguageScope + +class ExtractKeywords(FlowFileTransform): +class Java: +implements = ["org.apache.nifi.python.processor.FlowFileTransform"] + +class ProcessorDetails: +version = "2.0.0-SNAPSHOT" +description = """Parses the content of the incoming FlowFile to extract the keywords and add those as attributes +using the convention 'keyword.i' where i is a number depending on how many keywords are extracted. The attribute +'keywords.length' will be set with the number of extracted keywords. 
The attribute 'keyword.0' will contain the +list of all extracted keywords with the associated score.""" +tags = ["text", "keyword", "vector", "extract"] +dependencies = ['yake'] Review Comment: For reference, I think this highlights a reason for moving Python-based components to a separate repository, where there would be a clearer distinction between what is standard, and what is optional for project operation. https://lists.apache.org/thread/nok561sg1dzw3zrott06gkl34hdjxbb3
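For readers following the thread, the attribute convention in the quoted processor description can be sketched without the processor itself. This is only an illustration, not the code from the PR: the helper name `build_keyword_attributes` is hypothetical, the input is assumed to be a list of `(keyword, score)` pairs (the shape `yake` is understood to return), and the exact formatting of `keyword.0` and whether `keyword.i` carries the score are assumptions, since the description does not specify them.

```python
def build_keyword_attributes(scored_keywords):
    """Map (keyword, score) pairs to FlowFile-style string attributes.

    Hypothetical helper; the layout follows the processor description
    quoted above, with formatting details assumed for illustration.
    """
    attributes = {
        # Number of extracted keywords, stored as a string like any attribute.
        "keywords.length": str(len(scored_keywords)),
        # Assumed format: every keyword with its score in one attribute.
        "keyword.0": ", ".join(
            f"{keyword} ({score:.4f})" for keyword, score in scored_keywords
        ),
    }
    # Assumed: keyword.1 .. keyword.n each hold a single keyword.
    for index, (keyword, _score) in enumerate(scored_keywords, start=1):
        attributes[f"keyword.{index}"] = keyword
    return attributes


if __name__ == "__main__":
    sample = [("apache nifi", 0.0123), ("flowfile", 0.0456)]
    for name, value in build_keyword_attributes(sample).items():
        print(name, "=", value)
```

Keeping the mapping separate from the extraction step means the attribute layout can be checked without installing any keyword-extraction library.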
Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]
exceptionfactory commented on code in PR #8302: URL: https://github.com/apache/nifi/pull/8302#discussion_r1467655082 ## nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py: ## @@ -0,0 +1,95 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import yake + +from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult +from nifiapi.properties import PropertyDescriptor, StandardValidators, ExpressionLanguageScope + +class ExtractKeywords(FlowFileTransform): +class Java: +implements = ["org.apache.nifi.python.processor.FlowFileTransform"] + +class ProcessorDetails: +version = "2.0.0-SNAPSHOT" +description = """Parses the content of the incoming FlowFile to extract the keywords and add those as attributes +using the convention 'keyword.i' where i is a number depending on how many keywords are extracted. The attribute +'keywords.length' will be set with the number of extracted keywords. The attribute 'keyword.0' will contain the +list of all extracted keywords with the associated score.""" +tags = ["text", "keyword", "vector", "extract"] +dependencies = ['yake'] Review Comment: Thanks for the reply @pvillard31. 
Although we are not including the `yake` library as part of the release distribution, it is required to use this component. That puts it in a different category than something like JDBC drivers, which can have any type of license that the user can accept. The Apache Licensing guidance has a section on [relying on dependencies with prohibited licenses when they support an optional feature](https://www.apache.org/legal/resolved.html#optional): > Optional means that the component is not required for standard use of the product or for the product to achieve a desirable level of quality. > The question to ask yourself in this situation is: "Will the majority of users want to use my product without adding the optional components?" From a general reading, it sounds like that could be construed to allow this type of inclusion, but it still seems questionable. Particularly in light of the summary question, if a user were to install this component, it would not be clear that they were also installing a component licensed under GPLv3 unless they analyzed the dependencies after installation. Furthermore, while this might be acceptable for components with marginal use, the summary question also seems to discourage such inclusion based on the level of use. In other words, we would not want to use this as justification for something we expect to be widely used. This sounds like a question that may need further input from Apache, but based on this initial reading, I think we should draw a clear line here and avoid Category X dependencies, as they are required at runtime, even though we are not distributing them directly.
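As a rough illustration of the "analyzed the dependencies after installation" step mentioned above, the sketch below reads the license metadata of an installed Python distribution with the standard-library `importlib.metadata` module and flags strings that look like a GPL-family license. The function names and the marker list are hypothetical, the substring match is a crude heuristic rather than a licensing determination, and package metadata is self-declared and may be missing or wrong.

```python
from importlib import metadata

# Illustrative markers only; this is not the full Apache Category X list.
GPL_FAMILY_MARKERS = ("GPL", "GENERAL PUBLIC LICENSE")


def license_strings(dist_name):
    """Return license-related metadata strings for an installed distribution."""
    try:
        meta = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return []  # Not installed, so there is nothing to inspect.
    found = []
    for field in ("License", "License-Expression"):
        value = meta.get(field)
        if value:
            found.append(value)
    classifiers = meta.get_all("Classifier") or []
    found.extend(c for c in classifiers if c.startswith("License ::"))
    return found


def looks_gpl_family(strings):
    """Heuristic: does any string mention a GPL-family license?"""
    return any(
        marker in text.upper() for text in strings for marker in GPL_FAMILY_MARKERS
    )


if __name__ == "__main__":
    declared = license_strings("yake")  # Empty list if yake is not installed.
    print(declared, looks_gpl_family(declared))
```

A check like this only surfaces what the package declares about itself; it does not replace reviewing the actual license text against the Apache policy.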
Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]
pvillard31 commented on code in PR #8302: URL: https://github.com/apache/nifi/pull/8302#discussion_r1467295394 ## nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py: ## @@ -0,0 +1,95 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import yake + +from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult +from nifiapi.properties import PropertyDescriptor, StandardValidators, ExpressionLanguageScope + +class ExtractKeywords(FlowFileTransform): +class Java: +implements = ["org.apache.nifi.python.processor.FlowFileTransform"] + +class ProcessorDetails: +version = "2.0.0-SNAPSHOT" +description = """Parses the content of the incoming FlowFile to extract the keywords and add those as attributes +using the convention 'keyword.i' where i is a number depending on how many keywords are extracted. The attribute +'keywords.length' will be set with the number of extracted keywords. 
The attribute 'keyword.0' will contain the +list of all extracted keywords with the associated score.""" +tags = ["text", "keyword", "vector", "extract"] +dependencies = ['yake'] Review Comment: Thanks @exceptionfactory - I noticed this but since we're not including anything, I was under the impression that we would not have licensing issues to be concerned about with the Python stuff. Do we have a clear position on this? Also, I agree this library has not been maintained in a while but the capability used here is pretty limited. I'm ok to look at other options though.
Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]
exceptionfactory commented on code in PR #8302: URL: https://github.com/apache/nifi/pull/8302#discussion_r1467031775 ## nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py: ## @@ -0,0 +1,95 @@ +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor license agreements. See the NOTICE file distributed with +# this work for additional information regarding copyright ownership. +# The ASF licenses this file to You under the Apache License, Version 2.0 +# (the "License"); you may not use this file except in compliance with +# the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +import yake + +from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult +from nifiapi.properties import PropertyDescriptor, StandardValidators, ExpressionLanguageScope + +class ExtractKeywords(FlowFileTransform): +class Java: +implements = ["org.apache.nifi.python.processor.FlowFileTransform"] + +class ProcessorDetails: +version = "2.0.0-SNAPSHOT" +description = """Parses the content of the incoming FlowFile to extract the keywords and add those as attributes +using the convention 'keyword.i' where i is a number depending on how many keywords are extracted. The attribute +'keywords.length' will be set with the number of extracted keywords. The attribute 'keyword.0' will contain the +list of all extracted keywords with the associated score.""" +tags = ["text", "keyword", "vector", "extract"] +dependencies = ['yake'] Review Comment: This dependency does not appear to be updated since 2021. 
https://pypi.org/project/yake/#history It is also licensed under GPLv3, which is not compatible with ASLv2.
[PR] NIFI-12673 - ExtractKeywords processor [nifi]
pvillard31 opened a new pull request, #8302: URL: https://github.com/apache/nifi/pull/8302

# Summary

[NIFI-12673](https://issues.apache.org/jira/browse/NIFI-12673) - ExtractKeywords processor

# Tracking

Please complete the following tracking steps prior to pull request creation.

### Issue Tracking

- [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue created

### Pull Request Tracking

- [ ] Pull Request title starts with Apache NiFi Jira issue number, such as `NIFI-0`
- [ ] Pull Request commit message starts with Apache NiFi Jira issue number, as such `NIFI-0`

### Pull Request Formatting

- [ ] Pull Request based on current revision of the `main` branch
- [ ] Pull Request refers to a feature branch with one commit containing changes

# Verification

Please indicate the verification steps performed prior to pull request creation.

### Build

- [ ] Build completed using `mvn clean install -P contrib-check`
  - [ ] JDK 21

### Licensing

- [ ] New dependencies are compatible with the [Apache License 2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License Policy](https://www.apache.org/legal/resolved.html)
- [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` files

### Documentation

- [ ] Documentation formatting appears as expected in rendered files