[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction

2016-09-20 Thread Akash Sudhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash Sudhakar updated TIKA-2077:
-
Priority: Major  (was: Minor)

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
> Key: TIKA-2077
> URL: https://issues.apache.org/jira/browse/TIKA-2077
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Akash Sudhakar
> Attachments: TestData.docx
>
>
> During docx file extraction using Tika 1.13, a special character is extracted
> as AAAAAAAA.
> How can this be avoided?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction

2016-09-12 Thread Akash Sudhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash Sudhakar updated TIKA-2077:
-
Attachment: TestData.docx

Attached test file.
Below is the code used.
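// Parse the document with type auto-detection; the body text is collected in handler.
BodyContentHandler handler = new BodyContentHandler();
AutoDetectParser parser = new AutoDetectParser();
Metadata metadata = new Metadata();
// try-with-resources closes the stream even if parsing fails
try (InputStream stream = new BufferedInputStream(new FileInputStream(file))) {
    parser.parse(stream, handler, metadata);
}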
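
To narrow down which character is affected, it may help to list the non-ASCII code points in the extracted text (available from handler.toString() after parsing). The following is only a diagnostic sketch using the JDK, not part of Tika; the class name CodePointDump, the method nonAscii, and the sample string are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class CodePointDump {
    /** Returns one "U+XXXX NAME" entry per non-ASCII code point, in order of appearance. */
    public static List<String> nonAscii(String text) {
        List<String> out = new ArrayList<>();
        text.codePoints()
            .filter(cp -> cp > 0x7F) // keep only code points outside ASCII
            .forEach(cp -> out.add(String.format("U+%04X %s", cp, Character.getName(cp))));
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical stand-in; with the code above this would be handler.toString().
        String extracted = "caf\u00E9 \u2022 item";
        nonAscii(extracted).forEach(System.out::println);
    }
}
```

Comparing this listing with the character shown in the original document should indicate whether the extracted text contains a different code point or whether the right code point is only being displayed or written with the wrong encoding.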

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
> Key: TIKA-2077
> URL: https://issues.apache.org/jira/browse/TIKA-2077
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Akash Sudhakar
> Priority: Minor
> Attachments: TestData.docx
>
>
> During docx file extraction using Tika 1.13, a special character is extracted
> as AAAAAAAA.
> How can this be avoided?





[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction

2016-09-12 Thread Akash Sudhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash Sudhakar updated TIKA-2077:
-
Affects Version/s: 1.13  (was: 1.10)

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
> Key: TIKA-2077
> URL: https://issues.apache.org/jira/browse/TIKA-2077
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Akash Sudhakar
> Priority: Minor
>
> During docx file extraction using Tika 1.13, a special character is extracted
> as AAAAAAAA.
> How can this be avoided?





[jira] [Updated] (TIKA-2077) Special character extracted as AAAAAAAA in docx file extraction

2016-09-12 Thread Akash Sudhakar (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-2077?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Akash Sudhakar updated TIKA-2077:
-
Description: 
During docx file extraction using Tika 1.13, a special character is extracted as
AAAAAAAA.
How can this be avoided?

> Special character extracted as AAAAAAAA in docx file extraction
> ---
>
> Key: TIKA-2077
> URL: https://issues.apache.org/jira/browse/TIKA-2077
> Project: Tika
> Issue Type: Bug
> Affects Versions: 1.13
> Reporter: Akash Sudhakar
> Priority: Minor
>
> During docx file extraction using Tika 1.13, a special character is extracted
> as AAAAAAAA.
> How can this be avoided?


