[jira] [Commented] (NIFI-4550) Add an InferCharacterSet processor

2023-11-15 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17786400#comment-17786400
 ] 

ASF subversion and git services commented on NIFI-4550:
---

Commit 6c333cdb7e29c53a79a49179862e60b8fe0a3697 in nifi's branch 
refs/heads/main from EndzeitBegins
[ https://gitbox.apache.org/repos/asf?p=nifi.git;h=6c333cdb7e ]

NIFI-1874 Added Character Set Detection to IdentifyMimeType

NIFI-4550 New Processor not required based on improvements to IdentifyMimeType

- Added mime.charset FlowFile attribute when not null for text MIME types

This closes #8011

Signed-off-by: David Handermann 


> Add an InferCharacterSet processor
> ----------------------------------
>
> Key: NIFI-4550
> URL: https://issues.apache.org/jira/browse/NIFI-4550
> Project: Apache NiFi
> Issue Type: New Feature
> Components: Extensions
> Reporter: Matt Burgess
> Assignee: endzeit
> Priority: Minor
> Time Spent: 2h 10m
> Remaining Estimate: 0h
>
> Sometimes in a NiFi flow it is not known what character set an incoming flow 
> file is using. This can make it difficult for downstream processing if the 
> processors expect a particular charset (whether the user can configure it or 
> not). There is a ConvertCharacterSet processor, but it expects an explicit 
> value for Input Character Set, when this might not be known.
> I propose an InferCharacterSet processor, which would presumably use some 
> license-friendly third-party library (there is a discussion 
> [here|https://stackoverflow.com/questions/499010/java-how-to-determine-the-correct-charset-encoding-of-a-stream])
>  to guess the character set, perhaps adding it as an attribute for use 
> downstream in ConvertCharacterSet.
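
To make the idea of "guessing" a character set concrete, here is a minimal Java sketch that tries a short list of candidate charsets and returns the first one that decodes the bytes without error. It uses only the JDK rather than a third-party detection library, and it is not the approach taken in the commit above, whose internals are not shown in this thread. The class name CharsetGuessSketch, the method guess, and the candidate list are all hypothetical choices made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch: strict trial decoding against a fixed candidate list.
public class CharsetGuessSketch {

    // Order matters: strict encodings first, permissive ones last.
    // ISO-8859-1 accepts every byte sequence, so it acts as a fallback.
    private static final List<Charset> CANDIDATES = List.of(
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            StandardCharsets.ISO_8859_1);

    /** Returns the first candidate charset that decodes the content without error. */
    public static Optional<Charset> guess(byte[] content) {
        for (Charset candidate : CANDIDATES) {
            try {
                candidate.newDecoder()
                        .onMalformedInput(CodingErrorAction.REPORT)
                        .onUnmappableCharacter(CodingErrorAction.REPORT)
                        .decode(ByteBuffer.wrap(content));
                return Optional.of(candidate);
            } catch (CharacterCodingException e) {
                // The bytes are not valid in this charset; try the next one.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // "Gruesse" with u-umlaut and sharp s, written with escapes so the
        // source file itself stays pure ASCII.
        String text = "Gr\u00fc\u00dfe";
        System.out.println(guess(text.getBytes(StandardCharsets.UTF_8)));
        System.out.println(guess(text.getBytes(StandardCharsets.ISO_8859_1)));
    }
}
```

Trial decoding can only rule charsets out; it cannot tell apart single-byte encodings that accept the same bytes, which is one reason the description suggests a dedicated detection library.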



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (NIFI-4550) Add an InferCharacterSet processor

2017-11-03 Thread Matt Burgess (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238500#comment-16238500
 ] 

Matt Burgess commented on NIFI-4550:


Yes, it seems like we could kill both birds under NIFI-1874, although I’d recommend a 
(possibly optional) separate attribute for character set, to avoid needing to 
parse the mime.type to find it.
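
For context on the parsing that a separate attribute would avoid, here is a minimal Java sketch that pulls a charset parameter out of a MIME type string such as "text/plain; charset=UTF-8". The class name MimeCharsetSketch and the method charsetParameter are hypothetical, and the sketch assumes a simple parameter syntax: it does not handle quoted values that contain semicolons.

```java
import java.util.Optional;

// Hypothetical sketch: extract the charset parameter from a MIME type string.
public class MimeCharsetSketch {

    /** Returns the value of the charset parameter, if the MIME type carries one. */
    public static Optional<String> charsetParameter(String mimeType) {
        if (mimeType == null) {
            return Optional.empty();
        }
        // parts[0] is the type/subtype; the remaining parts are parameters.
        String[] parts = mimeType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            int eq = param.indexOf('=');
            if (eq > 0 && param.substring(0, eq).trim().equalsIgnoreCase("charset")) {
                String value = param.substring(eq + 1).trim();
                // Strip one pair of surrounding double quotes, if present.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? Optional.empty() : Optional.of(value);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(charsetParameter("text/plain; charset=UTF-8"));
        System.out.println(charsetParameter("application/json"));
    }
}
```

With a dedicated attribute, a downstream processor can read the character set directly instead of repeating this kind of string handling in every flow.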



[jira] [Commented] (NIFI-4550) Add an InferCharacterSet processor

2017-11-03 Thread Michael Moser (JIRA)

[ 
https://issues.apache.org/jira/browse/NIFI-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16238451#comment-16238451
 ] 

Michael Moser commented on NIFI-4550:
-

Perhaps somewhat related to NIFI-1874?
