[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667869#comment-16667869
 ] 

Hudson commented on TIKA-2599:
--

UNSTABLE: Integrated in Jenkins build tika-branch-1x #120 (See 
[https://builds.apache.org/job/tika-branch-1x/120/])
TIKA-2599: Fixed closing of styles around Hyperlinks. Contributed by (dmeikle: 
[https://github.com/apache/tika/commit/eb53077d62ed31795e676b5bcdce01b8ad809c99])
* (add) 
tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
TIKA-2599: Fixed closing of styles around Hyperlinks. Contributed by (dmeikle: 
[https://github.com/apache/tika/commit/50a2a8f6391b87fa8f1b766143f2d759c99cae4b])
* (edit) CHANGES.txt


> Hyperlink surrounded by Italics not closed Properly
> ---
>
> Key: TIKA-2599
> URL: https://issues.apache.org/jira/browse/TIKA-2599
> Project: Tika
>  Issue Type: Bug
>  Components: parser
>Affects Versions: 1.14, 1.15, 1.16, 1.17
> Environment: Any
>Reporter: Ronan O'Sullivan
>Assignee: Dave Meikle
>Priority: Minor
> Fix For: 1.20
>
> Attachments: diff-TIKA-2599.txt, 
> testWord_italicsSurroundingHyperlink.doc
>
>
> If a Word document contains a hyperlink surrounded by italicized text, the 
> resulting xhtml is:
>  
> Italic Test before link  href="http://www.google.com"/>hyperlink italics 
> Italic text after hyperlink
>  
> The opening italics tag is not closed which is not valid XHTML.
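The claim that an unclosed italics tag makes the output invalid can be checked mechanically. The sketch below is only an illustration: the class name `WellFormedCheck` and the two sample strings are assumptions made for this example, not Tika code or Tika's actual output. It uses the JDK's built-in XML parser to test whether a fragment is well-formed.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of Tika: reports whether a string parses as well-formed XML.
class WellFormedCheck {

    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors and keeps the parser from logging to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Raised for mismatched or unclosed tags, among other well-formedness errors.
            return false;
        } catch (Exception e) {
            // Parser configuration or I/O problems are not expected for an in-memory string.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Illustrative fragments only; the real parser output is not reproduced here.
        String unclosedItalic = "<p><i>before <a href=\"http://www.google.com\">link</a> after</p>";
        String closedItalic = "<p><i>before </i><a href=\"http://www.google.com\"><i>link</i></a><i> after</i></p>";
        System.out.println(isWellFormed(unclosedItalic));
        System.out.println(isWellFormed(closedItalic));
    }
}
```

A check like this only covers well-formedness; it does not validate against an XHTML DTD or schema.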



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667863#comment-16667863
 ] 

Hudson commented on TIKA-2599:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1584 (See 
[https://builds.apache.org/job/Tika-trunk/1584/])
TIKA-2599: Fixed closing of styles around Hyperlinks. Contributed by (dmeikle: 
[https://github.com/apache/tika/commit/10a48b7a0077fbe627d3a0111f92910228d05d77])
* (add) 
tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java




[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667848#comment-16667848
 ] 

Hudson commented on TIKA-2599:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #338 (See 
[https://builds.apache.org/job/tika-2.x-windows/338/])
TIKA-2599: Fixed closing of styles around Hyperlinks. Contributed by (dmeikle: 
rev 10a48b7a0077fbe627d3a0111f92910228d05d77)
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
* (add) 
tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc




[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread Dave Meikle (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667823#comment-16667823
 ] 

Dave Meikle commented on TIKA-2599:
---

Committed to branch_1x in 324cbd2eb4d64f1e34aba9789ee8b06cbf4d991e and master in 
6ccedbadd4f79d7888eabfcd3a74ab85e168.

Thanks [~ronanos]!



[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667821#comment-16667821
 ] 

ASF GitHub Bot commented on TIKA-2599:
--

dameikle closed pull request #254: TIKA-2599: Fixed closing of styles around 
Hyperlinks. Contributed by Ronan O'Sullivan.
URL: https://github.com/apache/tika/pull/254
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
index 30bd4bb969..6f7d3785bd 100644
--- a/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
+++ b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
@@ -528,8 +528,8 @@ private int handleSpecialCharacterRuns(Paragraph p, int index, boolean skipStyli
 url = text.substring(start, end);
 }
 
-xhtml.startElement("a", "href", url);
 closeStyleElements(skipStyling, xhtml);
+xhtml.startElement("a", "href", url);
 for (CharacterRun cr : texts) {
 handleCharacterRun(cr, skipStyling, xhtml);
 }
diff --git a/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java b/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
index 7456ac409e..d2c38a42d5 100644
--- a/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
+++ b/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
@@ -560,6 +560,15 @@ public void testBoldHyperlink() throws Exception {
 assertContains("http://tika.apache.org/\;>hyper link; bold" , xml);
 }
 
+@Test
+public void testHyperlinkSurroundedByItalics() throws Exception {
+//TIKA-2599
+String xml = getXML("testWORD_italicsSurroundingHyperlink.doc").xml;
+xml = xml.replaceAll("\\s+", " ");
+assertContains("Italic Test before link http://www.google.com\;>" +
+"hyperlink italics Italic text after hyperlink", xml);
+}
+
 @Test
 public void testMacros() throws  Exception {
 
diff --git a/tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc b/tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc
new file mode 100644
index 00..24edb8f718
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc differ
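
The WordExtractor.java hunk only swaps two calls, so a simplified model may help show why the order matters. The sketch below is not Tika's implementation: `HyperlinkNestingSketch`, `render`, and `isProperlyNested` are hypothetical names, and the markup is a stand-in for what the handler would emit when an italic run is still open at the start of a hyperlink.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Simplified, hypothetical model of the call order changed in the diff; not Tika code.
class HyperlinkNestingSketch {

    /** Emits markup for an open italic run followed by a hyperlink. */
    static String render(boolean fixedOrder) {
        StringBuilder out = new StringBuilder();
        out.append("<p><i>before ");         // an italic element is open when the link begins
        if (fixedOrder) {
            out.append("</i>");              // models closeStyleElements(...) running first
            out.append("<a href=\"u\">");    // then startElement("a", "href", url)
        } else {
            out.append("<a href=\"u\">");    // old order: the anchor opens inside <i>
            out.append("</i>");              // so this end tag lands inside the anchor
        }
        out.append("link</a></p>");
        return out.toString();
    }

    /** Returns true when each end tag matches the most recently opened element. */
    static boolean isProperlyNested(String markup) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = Pattern.compile("<(/?)([A-Za-z]+)[^>]*>").matcher(markup);
        while (m.find()) {
            if (m.group().endsWith("/>")) {
                continue;                    // self-closing elements never stay open
            }
            if (m.group(1).isEmpty()) {
                open.push(m.group(2));       // start tag
            } else if (open.isEmpty() || !open.pop().equals(m.group(2))) {
                return false;                // end tag does not match the innermost element
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        System.out.println(render(false) + " nested=" + isProperlyNested(render(false)));
        System.out.println(render(true) + " nested=" + isProperlyNested(render(true)));
    }
}
```

In this model the old order produces an `</i>` end tag inside the still-open anchor, while the new order closes the style element before the anchor starts. The real extractor also reopens styles for the runs inside the link, which this sketch leaves out.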


 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667820#comment-16667820
 ] 

ASF GitHub Bot commented on TIKA-2599:
--

dameikle opened a new pull request #254: TIKA-2599: Fixed closing of styles 
around Hyperlinks. Contributed by Ronan O'Sullivan.
URL: https://github.com/apache/tika/pull/254
 
 
   




[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667793#comment-16667793
 ] 

ASF GitHub Bot commented on TIKA-2599:
--

dameikle closed pull request #253: TIKA-2599: Fixed closing of styles around 
Hyperlinks (by Ronan O'Sullivan)
URL: https://github.com/apache/tika/pull/253
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied
below (as it won't show otherwise due to GitHub magic):

diff --git a/CHANGES.txt b/CHANGES.txt
index 1f793d2f62..187531acf1 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -3,6 +3,9 @@ Release 1.20 - ???
* Use -javaHome or $JAVA_HOME (if they exist) when
  spawning child in tika-server's -spawnChild mode.
 
+   * Fixed closing of styles around Hyperlinks in Word Parser
+ Contributed by Ronan O'Sullivan (TIKA-2599).
+
 Release 1.19.1 - 10/4/2018
 
* Update PDFBox to 2.0.12, jempbox to 1.8.16
diff --git a/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
index 30bd4bb969..6f7d3785bd 100644
--- a/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
+++ b/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
@@ -528,8 +528,8 @@ private int handleSpecialCharacterRuns(Paragraph p, int index, boolean skipStyli
 url = text.substring(start, end);
 }
 
-xhtml.startElement("a", "href", url);
 closeStyleElements(skipStyling, xhtml);
+xhtml.startElement("a", "href", url);
 for (CharacterRun cr : texts) {
 handleCharacterRun(cr, skipStyling, xhtml);
 }
diff --git a/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java b/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
index 31bd8ba293..d7d6daee56 100644
--- a/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
+++ b/tika-parsers/src/test/java/org/apache/tika/parser/microsoft/WordParserTest.java
@@ -570,6 +570,15 @@ public void testBoldHyperlink() throws Exception {
 assertContains("http://tika.apache.org/\;>hyper link; bold" , xml);
 }
 
+@Test
+public void testHyperlinkSurroundedByItalics() throws Exception {
+//TIKA-2599
+String xml = getXML("testWORD_italicsSurroundingHyperlink.doc").xml;
+xml = xml.replaceAll("\\s+", " ");
+assertContains("Italic Test before link http://www.google.com\;>" +
+"hyperlink italics Italic text after hyperlink", xml);
+}
+
 @Test
 public void testMacros() throws  Exception {
 
diff --git a/tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc b/tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc
new file mode 100644
index 00..24edb8f718
Binary files /dev/null and b/tika-parsers/src/test/resources/test-documents/testWord_italicsSurroundingHyperlink.doc differ


 




[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-10-29 Thread ASF GitHub Bot (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16667789#comment-16667789
 ] 

ASF GitHub Bot commented on TIKA-2599:
--

dameikle opened a new pull request #253: TIKA-2599: Fixed closing of styles 
around Hyperlinks (by Ronan O'Sullivan)
URL: https://github.com/apache/tika/pull/253
 
 
   Contributed by Ronan O'Sullivan.




[jira] [Commented] (TIKA-2599) Hyperlink surrounded by Italics not closed Properly

2018-03-06 Thread Ronan O'Sullivan (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-2599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16388973#comment-16388973
 ] 

Ronan O'Sullivan commented on TIKA-2599:


Attaching a diff of the fix to this JIRA issue. I could not create a Review 
Board request because the git diff would not submit to Review Board.
