Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]

2024-01-27 Thread via GitHub


pvillard31 closed pull request #8302: NIFI-12673 - ExtractKeywords processor
URL: https://github.com/apache/nifi/pull/8302


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@nifi.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]

2024-01-26 Thread via GitHub


exceptionfactory commented on code in PR #8302:
URL: https://github.com/apache/nifi/pull/8302#discussion_r1467668838


##
nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py:
##
@@ -0,0 +1,95 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import yake
+
+from nifiapi.flowfiletransform import FlowFileTransform, FlowFileTransformResult
+from nifiapi.properties import PropertyDescriptor, StandardValidators, ExpressionLanguageScope
+
+class ExtractKeywords(FlowFileTransform):
+    class Java:
+        implements = ["org.apache.nifi.python.processor.FlowFileTransform"]
+
+    class ProcessorDetails:
+        version = "2.0.0-SNAPSHOT"
+        description = """Parses the content of the incoming FlowFile to extract the keywords and add those as attributes
+using the convention 'keyword.i' where i is a number depending on how many keywords are extracted. The attribute
+'keywords.length' will be set with the number of extracted keywords. The attribute 'keyword.0' will contain the
+list of all extracted keywords with the associated score."""
+        tags = ["text", "keyword", "vector", "extract"]
+        dependencies = ['yake']

Review Comment:
   For reference, I think this highlights a reason for moving Python-based 
components to a separate repository, where there would be a clearer distinction 
between what is standard, and what is optional for project operation.
   
   https://lists.apache.org/thread/nok561sg1dzw3zrott06gkl34hdjxbb3






Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]

2024-01-26 Thread via GitHub


exceptionfactory commented on code in PR #8302:
URL: https://github.com/apache/nifi/pull/8302#discussion_r1467655082


##
nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py:
##
@@ -0,0 +1,95 @@
+dependencies = ['yake']

Review Comment:
   Thanks for the reply @pvillard31.
   
   Although we are not including the `yake` library as part of the release 
distribution, it is required to use this component. That puts it in a different 
category than something like JDBC drivers, which can have any type of license 
that the user can accept.
   
   The Apache Licensing guidance has a section on [relying on dependencies with 
prohibited licenses when they support an optional 
feature](https://www.apache.org/legal/resolved.html#optional):
   
   > Optional means that the component is not required for standard use of the 
product or for the product to achieve a desirable level of quality.
   
   > The question to ask yourself in this situation is: "Will the majority of 
users want to use my product without adding the optional components?"
   
   From a general reading, it sounds like that could be construed to allow this 
type of inclusion, but it still seems questionable. Particularly in light of 
the summary question, if a user were to install this component, it would not be 
clear that they were also installing a component licensed under GPLv3 unless 
they analyzed the dependencies after installation. Furthermore, while this 
might be acceptable for components with marginal use, the summary question also 
seems to discourage such inclusion based on the level of use. In other words, 
we would not want to use this as justification for something we expect to be 
widely used.
   
   This sounds like a question that may need further input from Apache, but 
based on this initial reading, I think we should draw a clear line here and 
avoid Category X dependencies that are required at runtime, even though we 
are not distributing them directly.
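
One way to make the licenses of installed Python dependencies visible after installation is to read the package metadata. The sketch below is an illustration only, not part of NiFi or this PR: `license_labels` and `flag_gpl_family` are hypothetical names, and the string match is a rough heuristic that depends on what each package declares in its `License` field and `License ::` classifiers.

```python
from importlib import metadata
from typing import Dict, List


def license_labels() -> Dict[str, List[str]]:
    """Collect the declared license labels of every installed distribution."""
    labels: Dict[str, List[str]] = {}
    for dist in metadata.distributions():
        meta = dist.metadata
        if meta is None or not meta.get("Name"):
            continue  # skip distributions with missing metadata
        classifiers = meta.get_all("Classifier") or []
        found = [c for c in classifiers if c.startswith("License ::")]
        declared = meta.get("License")
        if declared:
            found.append(declared)
        labels[meta.get("Name")] = found
    return labels


def flag_gpl_family(labels: Dict[str, List[str]]) -> List[str]:
    """Return names whose labels mention a GPL-family license (heuristic)."""
    markers = ("GPL", "General Public License")
    return sorted(
        name
        for name, found in labels.items()
        if any(marker in label for label in found for marker in markers)
    )


# Hand-written sample input, so the check does not depend on the environment.
sample = {
    "pkg-a": ["License :: OSI Approved :: GNU General Public License v3 (GPLv3)"],
    "pkg-b": ["License :: OSI Approved :: Apache Software License"],
}
flagged = flag_gpl_family(sample)
```

Calling `flag_gpl_family(license_labels())` inside the Python environment that hosts the processors would list candidates for manual review. A match, or the absence of one, is not a licensing determination.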
   






Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]

2024-01-25 Thread via GitHub


pvillard31 commented on code in PR #8302:
URL: https://github.com/apache/nifi/pull/8302#discussion_r1467295394


##
nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py:
##
@@ -0,0 +1,95 @@
+dependencies = ['yake']

Review Comment:
   Thanks @exceptionfactory - I noticed this, but since we're not bundling the 
library, I was under the impression that we would not have licensing issues to 
be concerned about with the Python components. Do we have a clear position on this?
   
   Also, I agree this library has not been maintained in a while, but the 
capability used here is pretty limited. I'm OK with looking at other options though.
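
For reference, the attribute convention from the processor description can be sketched without any third-party library. This is a minimal illustration under stated assumptions, not the code from the PR: `keywords_to_attributes` is a hypothetical helper, the input is assumed to be a list of `(keyword, score)` pairs from some extractor, and numbering individual keywords from 1 is a guess based on the description reserving `keyword.0` for the full list.

```python
from typing import Dict, List, Tuple


def keywords_to_attributes(scored: List[Tuple[str, float]]) -> Dict[str, str]:
    """Map (keyword, score) pairs to string attributes."""
    attributes = {
        # Number of extracted keywords, as described for 'keywords.length'.
        "keywords.length": str(len(scored)),
        # Assumption: 'keyword.0' holds the whole scored list rendered as text.
        "keyword.0": str(scored),
    }
    # Assumption: individual keywords are numbered starting at 1.
    for i, (keyword, _score) in enumerate(scored, start=1):
        attributes[f"keyword.{i}"] = keyword
    return attributes


example = keywords_to_attributes([("apache nifi", 0.02), ("flowfile", 0.15)])
```

Any extractor that returns scored keywords could feed this mapping, so the attribute layout does not depend on a particular library.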






Re: [PR] NIFI-12673 - ExtractKeywords processor [nifi]

2024-01-25 Thread via GitHub


exceptionfactory commented on code in PR #8302:
URL: https://github.com/apache/nifi/pull/8302#discussion_r1467031775


##
nifi-python-extensions/nifi-text-embeddings-module/src/main/python/ExtractKeywords.py:
##
@@ -0,0 +1,95 @@
+dependencies = ['yake']

Review Comment:
   This dependency does not appear to have been updated since 2021.
   
   https://pypi.org/project/yake/#history
   
   It is also licensed under GPLv3, which is not compatible with ASLv2.






[PR] NIFI-12673 - ExtractKeywords processor [nifi]

2024-01-25 Thread via GitHub


pvillard31 opened a new pull request, #8302:
URL: https://github.com/apache/nifi/pull/8302

   # Summary
   
   [NIFI-12673](https://issues.apache.org/jira/browse/NIFI-12673) - 
ExtractKeywords processor
   
   # Tracking
   
   Please complete the following tracking steps prior to pull request creation.
   
   ### Issue Tracking
   
   - [ ] [Apache NiFi Jira](https://issues.apache.org/jira/browse/NIFI) issue 
created
   
   ### Pull Request Tracking
   
   - [ ] Pull Request title starts with Apache NiFi Jira issue number, such as 
`NIFI-0`
   - [ ] Pull Request commit message starts with Apache NiFi Jira issue number, 
such as `NIFI-0`
   
   ### Pull Request Formatting
   
   - [ ] Pull Request based on current revision of the `main` branch
   - [ ] Pull Request refers to a feature branch with one commit containing 
changes
   
   # Verification
   
   Please indicate the verification steps performed prior to pull request 
creation.
   
   ### Build
   
   - [ ] Build completed using `mvn clean install -P contrib-check`
     - [ ] JDK 21
   
   ### Licensing
   
   - [ ] New dependencies are compatible with the [Apache License 
2.0](https://apache.org/licenses/LICENSE-2.0) according to the [License 
Policy](https://www.apache.org/legal/resolved.html)
   - [ ] New dependencies are documented in applicable `LICENSE` and `NOTICE` 
files
   
   ### Documentation
   
   - [ ] Documentation formatting appears as expected in rendered files
   

