[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch Adds in EXTERNAL_LINK_URL type to distinguish the first toke

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch Handle anchors in links and check for various link condition

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-03 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch The first alphanum in internal and external link gets a posi

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-02 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch More legible unit test. Also now skips HTML tags from withi

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-02 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch More URL testing and fixes.

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-02 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch Should now handle external links with non-link text, i.e. [h

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-02 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Attachment: LUCENE-1103.patch Adds contrib/wikipedia. Updates the javadocs build and the

[jira] Updated: (LUCENE-1103) WikipediaTokenizer

2008-01-02 Thread Grant Ingersoll (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Grant Ingersoll updated LUCENE-1103: Fix Version/s: 2.3 Patch shortly. This will be all new code, other than minor changes to