[jira] [Commented] (NIFI-4550) Add an InferCharacterSet processor

[ https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17786400#comment-17786400 ]

ASF subversion and git services commented on NIFI-4550:
-------------------------------------------------------

Commit 6c333cdb7e29c53a79a49179862e60b8fe0a3697 in nifi's branch refs/heads/main from EndzeitBegins
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=6c333cdb7e ]

NIFI-1874 Added Character Set Detection to IdentifyMimeType

NIFI-4550 New Processor not required based on improvements to IdentifyMimeType

- Added mime.charset FlowFile attribute when not null for text MIME types

This closes #8011

Signed-off-by: David Handermann

> Add an InferCharacterSet processor
> ----------------------------------
>
>                 Key: NIFI-4550
>                 URL: https://issues.apache.org/jira/browse/NIFI-4550
>             Project: Apache NiFi
>          Issue Type: New Feature
>          Components: Extensions
>            Reporter: Matt Burgess
>            Assignee: endzeit
>            Priority: Minor
>          Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> Sometimes in a NiFi flow it is not known what character set an incoming flow
> file is using. This can make it difficult for downstream processing if the
> processors expect a particular charset (whether the user can configure it or
> not). There is a ConvertCharacterSet processor, but it expects an explicit
> value for Input Character Set, when this might not be known.
> I propose an InferCharacterSet processor, which would presumably use some
> license-friendly third-party library (there is a discussion
> [here|https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream])
> to guess the character set, perhaps adding it as an attribute for use
> downstream in ConvertCharacterSet.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

[jira] [Commented] (NIFI-4550) Add an InferCharacterSet processor

[ https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238500#comment-16238500 ]

Matt Burgess commented on NIFI-4550:
------------------------------------

Yes, it seems like we could kill both birds under 1874, although I’d recommend a (possibly optional) separate attribute for character set, to avoid needing to parse the mime.type to find it.

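
To illustrate the parsing that a separate character set attribute would avoid, here is a minimal sketch. It assumes, purely for illustration, that the character set would otherwise be carried as a `charset` parameter inside the mime.type value (for example `text/plain; charset=UTF-8`). The class `MimeTypeCharset` and its method `charsetOf` are hypothetical names and are not part of NiFi.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of NiFi: pulls the charset parameter out of a
// MIME type string such as "text/plain; charset=UTF-8".
final class MimeTypeCharset {
    private MimeTypeCharset() {}

    static Optional<String> charsetOf(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        // parts[0] is the type/subtype; the remaining parts are name=value parameters.
        // Simplification: a quoted value that itself contains ';' is not handled.
        String[] parts = mimeType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq < 0) {
                continue;
            }
            String name = param.substring(0, eq).trim().toLowerCase(Locale.ROOT);
            if (name.equals("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }
}
```

With a dedicated attribute, a downstream processor could read the value directly and none of this string handling would be needed.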
[jira] [Commented] (NIFI-4550) Add an InferCharacterSet processor

[ https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16238451#comment-16238451 ]

Michael Moser commented on NIFI-4550:
-------------------------------------

Perhaps somewhat related to NIFI-1874?