[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842745#comment-17842745
 ] 

Tim Allison commented on TIKA-4249:
---

Version numbers for the fix are noted above: 2.9.3 and 3.0.0 (probably 
3.0.0-BETA2 first?). We recently released 2.9.2. The crystal ball is murky on the 
timing of the next 2.x and 3.x releases.

> EML file is treating it as text file in 2.9.2 version
> -
>
> Key: TIKA-4249
> URL: https://issues.apache.org/jira/browse/TIKA-4249
> Project: Tika
>  Issue Type: Bug
>Reporter: Tika User
>Priority: Blocker
> Fix For: 3.0.0, 2.9.3
>
>
> We recently upgraded from 2.9.0 to 2.9.2. After the upgrade, we found that the 
> attached file is detected as a text file instead of an email file. Please look 
> into this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-01 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842716#comment-17842716
 ] 

Tika User edited comment on TIKA-4249 at 5/1/24 4:38 PM:
-

[~tallison] May I know when these changes will be available? I'd like to know the 
version number.


was (Author: vamsi452):
May I know when these changes will be available? I'd like to know the version number.



[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-01 Thread Tika User (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842716#comment-17842716
 ] 

Tika User commented on TIKA-4249:
-

May I know when these changes will be available? I'd like to know the version number.



[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-01 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842627#comment-17842627
 ] 

Hudson commented on TIKA-4249:
--

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk11 #1619 (See 
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk11/1619/])
TIKA-4249 -- allow utf8 bom to at start of rfc822 detection (#1739) (github: 
[https://github.com/apache/tika/commit/9f8a2f58b20c3c71df46531d28a3702f5146cf51])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
* (edit) tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
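The commit above lets RFC 822 detection tolerate a UTF-8 byte order mark (BOM, bytes EF BB BF) at the start of the file. As a rough, standalone illustration of why that matters, here is a minimal Java sketch under stated assumptions: the class BomTolerantRfc822Check and its short list of header prefixes are hypothetical and are not Tika code. Judging from the files the commit touches, the real fix is a declarative change to the magic patterns in tika-mimetypes.xml plus a test, not Java logic like this.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical sketch: tolerate an optional UTF-8 BOM before RFC 822 header magic.
// Not Tika's implementation; the header list is an illustrative subset only.
public class BomTolerantRfc822Check {

    private static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Illustrative (assumed) header names that often begin an .eml file.
    private static final List<String> HEADER_PREFIXES = List.of(
            "Received:", "Return-Path:", "From:", "Message-ID:", "Date:", "MIME-Version:");

    static boolean startsWithBom(byte[] head) {
        return head.length >= 3
                && head[0] == UTF8_BOM[0] && head[1] == UTF8_BOM[1] && head[2] == UTF8_BOM[2];
    }

    /** Returns true if the leading bytes look like the start of an RFC 822 message. */
    public static boolean looksLikeRfc822(byte[] head) {
        // Skip the BOM (EF BB BF) if present so the header match starts at the real text.
        int offset = startsWithBom(head) ? UTF8_BOM.length : 0;
        String start = new String(head, offset, head.length - offset, StandardCharsets.US_ASCII);
        for (String prefix : HEADER_PREFIXES) {
            // Header field names are case-insensitive, so compare ignoring case.
            if (start.regionMatches(true, 0, prefix, 0, prefix.length())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] plain = "Received: from mail.example.com\r\n".getBytes(StandardCharsets.US_ASCII);
        byte[] withBom = new byte[UTF8_BOM.length + plain.length];
        System.arraycopy(UTF8_BOM, 0, withBom, 0, UTF8_BOM.length);
        System.arraycopy(plain, 0, withBom, UTF8_BOM.length, plain.length);

        System.out.println(looksLikeRfc822(plain));   // true
        System.out.println(looksLikeRfc822(withBom)); // true
        System.out.println(looksLikeRfc822("Hello, world".getBytes(StandardCharsets.US_ASCII))); // false
    }
}
```

Without the BOM skip, the decoded text would begin with replacement characters and no header prefix would match, so a detector of this kind would fall back to a generic text type, which would be consistent with the symptom reported in this issue.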




[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-01 Thread Nicholas DiPiazza (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842622#comment-17842622
 ] 

Nicholas DiPiazza commented on TIKA-4243:
-

Kinda seems like it might belong in tika-config module 

> tika configuration overhaul
> ---
>
> Key: TIKA-4243
> URL: https://issues.apache.org/jira/browse/TIKA-4243
> Project: Tika
>  Issue Type: New Feature
>  Components: config
>Affects Versions: 3.0.0
>Reporter: Nicholas DiPiazza
>Priority: Major
>
> In 3.0.0, when dealing with Tika, it would greatly help to have a typed 
> configuration schema. 
> In 3.x, can we remove the old way of doing configs and replace it with JSON 
> Schema?
> JSON Schema can be converted to POJOs using a Maven plugin 
> [https://github.com/joelittlejohn/jsonschema2pojo]
> This automatically creates a Java POJO model we can use for the configs. 
> It would also allow the legacy tika-config XML to be read and converted to 
> the new POJOs easily using an XML mapper, so that users don't have to switch 
> to JSON configurations if they do not want to.
> When complete, configurations can be set as XML, JSON or YAML:
> * tika-config.xml
> * tika-config.json
> * tika-config.yaml
> Replace all instances of Tika config annotations that used the old syntax 
> with the POJO model deserialized from the XML/JSON/YAML.
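The proposal's step of reading legacy tika-config XML into typed POJOs could look roughly like the following Java sketch. It rests on assumptions: TikaConfigModel and fromLegacyXml are hypothetical names rather than Tika APIs, the model only captures <parser class="..."/> entries, and the JDK's built-in DOM parser stands in for the XML mapper the proposal mentions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical sketch of mapping a legacy tika-config.xml into a typed model.
// TikaConfigModel and fromLegacyXml are illustrative names, not Tika APIs.
public class LegacyConfigMapper {

    /** Hypothetical typed model; only <parser class="..."/> entries are captured. */
    public static final class TikaConfigModel {
        private final List<String> parserClasses;

        public TikaConfigModel(List<String> parserClasses) {
            this.parserClasses = List.copyOf(parserClasses);
        }

        public List<String> parserClasses() {
            return parserClasses;
        }
    }

    public static TikaConfigModel fromLegacyXml(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Enable the JDK's secure-processing limits before parsing.
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Collect the class attribute of every <parser> element in the document.
        List<String> classes = new ArrayList<>();
        NodeList parsers = doc.getElementsByTagName("parser");
        for (int i = 0; i < parsers.getLength(); i++) {
            classes.add(((Element) parsers.item(i)).getAttribute("class"));
        }
        return new TikaConfigModel(classes);
    }

    public static void main(String[] args) throws Exception {
        String legacy = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<properties>\n"
                + "  <parsers>\n"
                + "    <parser class=\"org.apache.tika.parser.DefaultParser\"/>\n"
                + "  </parsers>\n"
                + "</properties>\n";
        // prints [org.apache.tika.parser.DefaultParser]
        System.out.println(fromLegacyXml(legacy).parserClasses());
    }
}
```

In the proposal, a class like TikaConfigModel would be generated from a JSON Schema by jsonschema2pojo rather than written by hand, and the same model could then be populated from JSON or YAML as well.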





[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-05-01 Thread Nicholas DiPiazza (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842622#comment-17842622
 ] 

Nicholas DiPiazza edited comment on TIKA-4243 at 5/1/24 12:34 PM:
--

Kinda seems like it might belong in a new tika-config module 


was (Author: ndipiazza):
Kinda seems like it might belong in tika-config module 



[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842605#comment-17842605
 ] 

Tim Allison commented on TIKA-4243:
---

Do we put it in tika-serialization or a new module?



[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 3.9.2 version

2024-05-01 Thread Tim Allison (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842604#comment-17842604
 ] 

Tim Allison commented on TIKA-4249:
---

The example file shared was actually kind of weird. It looked like an mbox file 
but didn't have the "From " headers. It was just a concatenation of regular 
RFC 822 messages with newlines between them.
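The distinction drawn here can be sketched as a first-line check. This is a hypothetical helper written only for illustration (MailboxShapeSketch is not part of Tika), with a deliberately simplified header-name pattern: an mbox file opens each message with a "From " separator line (a space after "From", no colon), while a bare RFC 822 message opens directly with a header field such as "From:" or "Received:". A file of RFC 822 messages joined by blank lines therefore starts exactly like a single RFC 822 message.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Tika: classify mail text by its first line.
public class MailboxShapeSketch {

    public enum Shape { MBOX, RFC822, UNKNOWN }

    // Simplified "Name: value" header field; real RFC 822 field names allow more characters.
    private static final Pattern HEADER_LINE = Pattern.compile("[A-Za-z0-9-]+:.*");

    public static Shape classify(String text) {
        String firstLine = text.lines().findFirst().orElse("");
        if (firstLine.startsWith("From ")) {
            // mbox separator line: "From " + sender + date, with no colon after "From".
            return Shape.MBOX;
        }
        if (HEADER_LINE.matcher(firstLine).matches()) {
            // Starts directly with a header field: a bare RFC 822 message, or several
            // RFC 822 messages concatenated with blank lines between them.
            return Shape.RFC822;
        }
        return Shape.UNKNOWN;
    }

    public static void main(String[] args) {
        String mbox = "From alice@example.com Wed May  1 12:00:00 2024\n"
                + "From: alice@example.com\nSubject: one\n\nBody 1\n";
        String concatenated = "From: alice@example.com\nSubject: one\n\nBody 1\n\n"
                + "From: bob@example.com\nSubject: two\n\nBody 2\n";
        System.out.println(classify(mbox));         // MBOX
        System.out.println(classify(concatenated)); // RFC822
    }
}
```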

This is now fixed in 2.x and 3.x. Thank you for opening this issue [~Vamsi452]!

> EML file is treating it as text file in 3.9.2 version
> -
>
> Key: TIKA-4249
> URL: https://issues.apache.org/jira/browse/TIKA-4249
> Project: Tika
>  Issue Type: Bug
>Reporter: Tika User
>Priority: Blocker
> Fix For: 3.0.0, 2.9.3
>
>
> We recently upgraded from 2.9.0 to 2.9.2. After the upgrade, we found that the 
> attached file is detected as a text file instead of an email file. Please look 
> into this issue.





[jira] [Updated] (TIKA-4249) EML file is treating it as text file in 2.9.2 version

2024-05-01 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-4249:
--
Summary: EML file is treating it as text file in 2.9.2 version  (was: EML 
file is treating it as text file in 3.9.2 version)



[jira] [Resolved] (TIKA-4249) EML file is treating it as text file in 3.9.2 version

2024-05-01 Thread Tim Allison (Jira)


 [ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-4249.
---
Fix Version/s: 3.0.0
   2.9.3
   Resolution: Fixed



[jira] [Commented] (TIKA-4249) EML file is treating it as text file in 3.9.2 version

2024-05-01 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17842603#comment-17842603
 ] 

ASF GitHub Bot commented on TIKA-4249:
--

tballison merged PR #1739:
URL: https://github.com/apache/tika/pull/1739




> EML file is treating it as text file in 3.9.2 version
> -
>
> Key: TIKA-4249
> URL: https://issues.apache.org/jira/browse/TIKA-4249
> Project: Tika
>  Issue Type: Bug
>Reporter: Tika User
>Priority: Blocker
>
> We recently upgraded from 2.9.0 to 2.9.2. After the upgrade, we found that the 
> attached file is detected as a text file instead of an email file. Please look 
> into this issue.





Re: [PR] TIKA-4249 -- allow utf8 bom in rfc822 [tika]

2024-05-01 Thread via GitHub


tballison merged PR #1739:
URL: https://github.com/apache/tika/pull/1739


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@tika.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org