[jira] Created: (LUCENE-1816) exampel code in overview.html uses deprecated syntax
exampel code in overview.html uses deprecated syntax Key: LUCENE-1816 URL: https://issues.apache.org/jira/browse/LUCENE-1816 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.9 Reporter: Daniel Naber Priority: Minor The examples should use non-deprecated syntax only. I'm attaching a patch, but other parts of that page might also be out of date; I haven't checked them yet. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org For additional commands, e-mail: java-dev-h...@lucene.apache.org
[jira] Updated: (LUCENE-1816) example code in overview.html uses deprecated syntax
[ https://issues.apache.org/jira/browse/LUCENE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1816: - Summary: example code in overview.html uses deprecated syntax (was: exampel code in overview.html uses deprecated syntax) example code in overview.html uses deprecated syntax Key: LUCENE-1816 URL: https://issues.apache.org/jira/browse/LUCENE-1816 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.9 Reporter: Daniel Naber Priority: Minor Attachments: overview.diff The examples should use non-deprecated syntax only. I'm attaching a patch, but other parts of that page might also be out of date; I haven't checked them yet.
[jira] Updated: (LUCENE-1816) exampel code in overview.html uses deprecated syntax
[ https://issues.apache.org/jira/browse/LUCENE-1816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1816: - Attachment: overview.diff exampel code in overview.html uses deprecated syntax Key: LUCENE-1816 URL: https://issues.apache.org/jira/browse/LUCENE-1816 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.9 Reporter: Daniel Naber Priority: Minor Attachments: overview.diff The examples should use non-deprecated syntax only. I'm attaching a patch, but other parts of that page might also be out of date; I haven't checked them yet.
[jira] Commented: (LUCENE-1472) DateTools.stringToDate() can cause lock contention under load
[ https://issues.apache.org/jira/browse/LUCENE-1472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12652085#action_12652085 ] Daniel Naber commented on LUCENE-1472: -- Could you try changing the code to create a new object every time and then run your load test again? We originally did that, but it was slower, at least according to this commit comment from two years ago: "Don't re-create SimpleDateFormat objects, use static ones instead. Gives about a 2x performance increase in a micro benchmark." DateTools.stringToDate() can cause lock contention under load - Key: LUCENE-1472 URL: https://issues.apache.org/jira/browse/LUCENE-1472 Project: Lucene - Java Issue Type: Improvement Components: Search Affects Versions: 2.3.2 Reporter: Mark Lassau Priority: Minor Load testing our application (the JIRA Issue Tracker) has shown that threads spend a lot of time blocked in DateTools.stringToDate(). The stringToDate() method uses a singleton SimpleDateFormat object to parse the dates. Each call to SimpleDateFormat.parse() is *synchronized* because SimpleDateFormat is not thread safe.
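A third option sits between the two discussed above (one shared, synchronized formatter versus a new formatter per call): keep one SimpleDateFormat per thread. The following is only a minimal sketch of that idea, not the DateTools code; the class name PerThreadDateParser, the pattern yyyyMMddHHmmss and the GMT time zone are assumptions made for illustration.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper, not part of Lucene: each thread lazily gets its own
// SimpleDateFormat, so parse() needs neither a lock nor a new object per call.
class PerThreadDateParser {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        new ThreadLocal<SimpleDateFormat>() {
            @Override
            protected SimpleDateFormat initialValue() {
                // Pattern and time zone are assumptions for this sketch.
                SimpleDateFormat f = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US);
                f.setTimeZone(TimeZone.getTimeZone("GMT"));
                return f;
            }
        };

    // Parses a timestamp such as "20081201093000"; parse failures are rethrown unchecked.
    static Date parse(String s) {
        try {
            return FORMAT.get().parse(s);
        } catch (ParseException e) {
            throw new IllegalArgumentException("unparseable date: " + s, e);
        }
    }

    public static void main(String[] args) {
        System.out.println(parse("20081201093000").getTime());
    }
}
```

Whether this is actually faster than either alternative would still have to be measured with the same load test; the per-thread instances also cost some memory in applications with many threads.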
[jira] Commented: (LUCENE-858) link from Lucene web page to API docs
[ https://issues.apache.org/jira/browse/LUCENE-858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12597847#action_12597847 ] Daniel Naber commented on LUCENE-858: - Otis, what I meant was a link directly from the main contents of the page to the API doc sub-page. I guess you refer to the navigation bar on the left, don't you? This was discussed on the mailing list at [http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200704.mbox/200704062300.11996%40danielnaber.de] link from Lucene web page to API docs - Key: LUCENE-858 URL: https://issues.apache.org/jira/browse/LUCENE-858 Project: Lucene - Java Issue Type: Improvement Reporter: Daniel Naber Assignee: Grant Ingersoll There should be a way to link from e.g. http://lucene.apache.org/java/docs/gettingstarted.html to the API docs, not just to the start page with the frame set but to a specific page, e.g. this: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/overview-summary.html#overview_description To make this work, a way to set a relative link is needed.
[jira] Closed: (LUCENE-1174) outdated information in Analyzer javadoc
[ https://issues.apache.org/jira/browse/LUCENE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-1174. Resolution: Fixed Fix Version/s: 2.4 committed outdated information in Analyzer javadoc Key: LUCENE-1174 URL: https://issues.apache.org/jira/browse/LUCENE-1174 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.3 Reporter: Daniel Naber Priority: Minor Fix For: 2.4 Attachments: analyzer-javadoc.diff I'm sure you'll find more ways to improve the javadoc, so feel free to change and extend my patch.
[jira] Updated: (LUCENE-1174) outdated information in Analyzer javadoc
[ https://issues.apache.org/jira/browse/LUCENE-1174?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1174: - Attachment: analyzer-javadoc.diff outdated information in Analyzer javadoc Key: LUCENE-1174 URL: https://issues.apache.org/jira/browse/LUCENE-1174 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.3 Reporter: Daniel Naber Priority: Minor Attachments: analyzer-javadoc.diff I'm sure you'll find more ways to improve the javadoc, so feel free to change and extend my patch.
[jira] Created: (LUCENE-1174) outdated information in Analyzer javadoc
outdated information in Analyzer javadoc Key: LUCENE-1174 URL: https://issues.apache.org/jira/browse/LUCENE-1174 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.3 Reporter: Daniel Naber Priority: Minor Attachments: analyzer-javadoc.diff I'm sure you'll find more ways to improve the javadoc, so feel free to change and extend my patch.
[jira] Commented: (LUCENE-1170) query with AND and OR not retrieving correct results
[ https://issues.apache.org/jira/browse/LUCENE-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12567154#action_12567154 ] Daniel Naber commented on LUCENE-1170: -- It's a known problem with QueryParser, see e.g. LUCENE-167. query with AND and OR not retrieving correct results Key: LUCENE-1170 URL: https://issues.apache.org/jira/browse/LUCENE-1170 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.3 Environment: linux and windows Reporter: Graham Maloon I was working with Lucene 1.4 and have now upgraded to 2.3.0, but I am still experiencing a problem with the QueryParser. I am passing the following queries:
big brother - works fine
big brother AND dubai - works fine
big brother AND football - works fine
big brother AND dubai OR football - returns extra documents which contain big brother but do not contain either dubai or football
big brother AND (dubai OR football) - gives the same as the one above
Am I doing something wrong?
[jira] Resolved: (LUCENE-1158) DateTools UTC/GMT mismatch
[ https://issues.apache.org/jira/browse/LUCENE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber resolved LUCENE-1158. -- Resolution: Fixed Fix Version/s: 2.4 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Patch applied. DateTools UTC/GMT mismatch -- Key: LUCENE-1158 URL: https://issues.apache.org/jira/browse/LUCENE-1158 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.3 Reporter: Daniel Naber Priority: Minor Fix For: 2.4 Attachments: datetools.diff Post from Antony Bowesman on java-user: - I just noticed that although the Javadocs for Lucene 2.2 state that the dates for DateTools use UTC as a timezone, they are actually using GMT. Should either the Javadocs be corrected or the code corrected to use UTC instead. - I'm attaching a patch that changes the javadoc and will commit it, unless someone knows a reason the javadoc is correct and the code should be changed to UTC. To my understanding, there's no significant difference between UTC and GMT.
[jira] Updated: (LUCENE-1158) DateTools UTC/GMT mismatch
[ https://issues.apache.org/jira/browse/LUCENE-1158?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1158: - Attachment: datetools.diff DateTools UTC/GMT mismatch -- Key: LUCENE-1158 URL: https://issues.apache.org/jira/browse/LUCENE-1158 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.3 Reporter: Daniel Naber Priority: Minor Attachments: datetools.diff Post from Antony Bowesman on java-user: - I just noticed that although the Javadocs for Lucene 2.2 state that the dates for DateTools use UTC as a timezone, they are actually using GMT. Should either the Javadocs be corrected or the code corrected to use UTC instead. - I'm attaching a patch that changes the javadoc and will commit it, unless someone knows a reason the javadoc is correct and the code should be changed to UTC. To my understanding, there's no significant difference between UTC and GMT.
[jira] Created: (LUCENE-1158) DateTools UTC/GMT mismatch
DateTools UTC/GMT mismatch -- Key: LUCENE-1158 URL: https://issues.apache.org/jira/browse/LUCENE-1158 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.3 Reporter: Daniel Naber Priority: Minor Attachments: datetools.diff Post from Antony Bowesman on java-user: - I just noticed that although the Javadocs for Lucene 2.2 state that the dates for DateTools use UTC as a timezone, they are actually using GMT. Should either the Javadocs be corrected or the code corrected to use UTC instead. - I'm attaching a patch that changes the javadoc and will commit it, unless someone knows a reason the javadoc is correct and the code should be changed to UTC. To my understanding, there's no significant difference between UTC and GMT.
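The claim that UTC and GMT do not differ in any way that matters here can be checked against java.util.TimeZone. This is a small sketch, not code from DateTools; the class name UtcVsGmtCheck and its helper are made up for illustration. It only tests that a zone ID has a zero raw offset and no daylight saving time, which is what matters when formatting dates.

```java
import java.util.TimeZone;

// Hypothetical check, not part of Lucene.
class UtcVsGmtCheck {
    // True when the zone has a zero raw offset and never observes daylight saving time.
    static boolean isZeroOffset(String zoneId) {
        TimeZone tz = TimeZone.getTimeZone(zoneId);
        return tz.getRawOffset() == 0 && !tz.useDaylightTime();
    }

    public static void main(String[] args) {
        System.out.println("GMT: " + isZeroOffset("GMT"));
        System.out.println("UTC: " + isZeroOffset("UTC"));
    }
}
```

If both checks pass, formatting the same instant with either zone yields the same string, which would make a javadoc-only fix sufficient.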
[jira] Commented: (LUCENE-1157) Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users)
[ https://issues.apache.org/jira/browse/LUCENE-1157?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12562935#action_12562935 ] Daniel Naber commented on LUCENE-1157: -- It would be nice to have this working with Javascript disabled, i.e. to have all items expanded by default in that case. This could be done by displaying all items by default and adding this code at the bottom: <SCRIPT> for (var i = 0; i < document.getElementsByTagName("ol").length; i++) { document.getElementsByTagName("ol")[i].style.display = "none"; } </SCRIPT> Not very clean, but I don't know a better solution for now. Formatable changes log (CHANGES.txt is easy to edit but not so friendly to read by Lucene users) - Key: LUCENE-1157 URL: https://issues.apache.org/jira/browse/LUCENE-1157 Project: Lucene - Java Issue Type: Improvement Components: Website Reporter: Doron Cohen Assignee: Doron Cohen Fix For: 2.4 Attachments: lucene-1157-take2.patch, lucene-1157.patch Background in http://www.nabble.com/formatable-changes-log-tt15078749.html
Re: [VOTE] Release Lucene 2.3.0 Take 2
On Tuesday, 22 January 2008, Michael Busch wrote: "just a reminder: this is a NEW vote. We canceled the first vote because with LUCENE-1144 an issue came up that is now fixed in the artifacts." I ran the test cases, indexed a small collection and tried to access it with Luke (my system is OpenSuse 10.3 and Java 1.6.0_03). Everything worked fine, so: +1 -- http://www.danielnaber.de
[jira] Created: (LUCENE-1144) NPE crash in case of out of memory
NPE crash in case of out of memory -- Key: LUCENE-1144 URL: https://issues.apache.org/jira/browse/LUCENE-1144 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3 Reporter: Daniel Naber The attached class makes Lucene crash with an NPE when starting it with -Xmx10M, although there's probably an OutOfMemory problem. The stacktrace:
Exception in thread "main" java.lang.NullPointerException
    at java.util.Arrays.fill(Unknown Source)
    at org.apache.lucene.index.DocumentsWriter$ByteBlockPool.reset(DocumentsWriter.java:2873)
    at org.apache.lucene.index.DocumentsWriter$ThreadState.resetPostings(DocumentsWriter.java:637)
    at org.apache.lucene.index.DocumentsWriter.resetPostingsData(DocumentsWriter.java:458)
    at org.apache.lucene.index.DocumentsWriter.abort(DocumentsWriter.java:423)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2433)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2397)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1445)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1424)
    at LuceneCrash.myrun(LuceneCrash.java:32)
    at LuceneCrash.main(LuceneCrash.java:19)
The documents are quite big (some hundred KB each); I cannot attach them, but I can send them via private mail if needed. The crash happens the first time reset() is called, after indexing 10 documents. I assume the bug is just that the error is misleading; there should probably be an OOM error instead.
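The trace shows Arrays.fill() receiving a null array during cleanup. One plausible way for that to happen, sketched below purely as an assumption, is that an allocation failed after a slot had been counted as used, so the later reset ran over a slot that was never filled. BlockPoolSketch and all of its members are hypothetical names and do not reproduce the DocumentsWriter code.

```java
import java.util.Arrays;

// Hypothetical pool, loosely modelled on the frames in the stack trace; not Lucene's code.
class BlockPoolSketch {
    private final byte[][] buffers = new byte[4][]; // slots stay null until allocated
    private int used = 0;

    // Simulates an allocation that counted the slot but never stored a buffer,
    // as could happen if "new byte[...]" threw OutOfMemoryError in between.
    void claimSlotWithoutBuffer() {
        used++;
    }

    // Unguarded cleanup: Arrays.fill on a null array throws NullPointerException.
    void resetUnguarded() {
        for (int i = 0; i < used; i++) {
            Arrays.fill(buffers[i], (byte) 0);
        }
        used = 0;
    }

    // Guarded cleanup: skips slots that were never filled.
    void resetGuarded() {
        for (int i = 0; i < used; i++) {
            if (buffers[i] != null) {
                Arrays.fill(buffers[i], (byte) 0);
            }
        }
        used = 0;
    }

    // Returns true if the unguarded cleanup fails with a NullPointerException.
    static boolean unguardedThrowsNpe() {
        BlockPoolSketch pool = new BlockPoolSketch();
        pool.claimSlotWithoutBuffer();
        try {
            pool.resetUnguarded();
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

In a situation like this, the NullPointerException thrown by the cleanup path replaces the original error, which would explain why no OutOfMemoryError is reported.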
[jira] Updated: (LUCENE-1144) NPE crash in case of out of memory
[ https://issues.apache.org/jira/browse/LUCENE-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1144: - Attachment: LuceneCrash.java NPE crash in case of out of memory -- Key: LUCENE-1144 URL: https://issues.apache.org/jira/browse/LUCENE-1144 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3 Reporter: Daniel Naber Attachments: LuceneCrash.java The attached class makes Lucene crash with an NPE when starting it with -Xmx10M, although there's probably an OutOfMemory problem. The stacktrace:
Exception in thread "main" java.lang.NullPointerException
    at java.util.Arrays.fill(Unknown Source)
    at org.apache.lucene.index.DocumentsWriter$ByteBlockPool.reset(DocumentsWriter.java:2873)
    at org.apache.lucene.index.DocumentsWriter$ThreadState.resetPostings(DocumentsWriter.java:637)
    at org.apache.lucene.index.DocumentsWriter.resetPostingsData(DocumentsWriter.java:458)
    at org.apache.lucene.index.DocumentsWriter.abort(DocumentsWriter.java:423)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2433)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2397)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1445)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1424)
    at LuceneCrash.myrun(LuceneCrash.java:32)
    at LuceneCrash.main(LuceneCrash.java:19)
The documents are quite big (some hundred KB each); I cannot attach them, but I can send them via private mail if needed. The crash happens the first time reset() is called, after indexing 10 documents. I assume the bug is just that the error is misleading; there should probably be an OOM error instead.
[jira] Commented: (LUCENE-1144) NPE crash in case of out of memory
[ https://issues.apache.org/jira/browse/LUCENE-1144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560835#action_12560835 ] Daniel Naber commented on LUCENE-1144: -- Yes, I get the correct exception now with your patch. Thanks!
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.index.DocumentsWriter.recyclePostings(DocumentsWriter.java:3033)
    at org.apache.lucene.index.DocumentsWriter.access$0(DocumentsWriter.java:3028)
    at org.apache.lucene.index.DocumentsWriter$ThreadState$FieldData.resetPostingArrays(DocumentsWriter.java:1333)
    at org.apache.lucene.index.DocumentsWriter$ThreadState.resetPostings(DocumentsWriter.java:644)
    at org.apache.lucene.index.DocumentsWriter.resetPostingsData(DocumentsWriter.java:458)
    at org.apache.lucene.index.DocumentsWriter.abort(DocumentsWriter.java:423)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:2433)
    at org.apache.lucene.index.DocumentsWriter.addDocument(DocumentsWriter.java:2397)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1445)
    at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1424)
    at LuceneCrash.myrun(LuceneCrash.java:35)
    at LuceneCrash.main(LuceneCrash.java:19)
NPE crash in case of out of memory -- Key: LUCENE-1144 URL: https://issues.apache.org/jira/browse/LUCENE-1144 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: 2.3 Reporter: Daniel Naber Attachments: LUCENE-1144.patch, LuceneCrash.java The attached class makes Lucene crash with an NPE when starting it with -Xmx10M, although there's probably an OutOfMemory problem.
[jira] Closed: (LUCENE-1113) fix for Document.getBoost() documentation
[ https://issues.apache.org/jira/browse/LUCENE-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-1113. Resolution: Fixed Fix Version/s: 2.3 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Thanks, I've committed your text. fix for Document.getBoost() documentation - Key: LUCENE-1113 URL: https://issues.apache.org/jira/browse/LUCENE-1113 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.2 Reporter: Daniel Naber Priority: Minor Fix For: 2.3 Attachments: document-getboost.diff The attached patch fixes the javadoc to make clear that getBoost() will not return a useful value in most cases. I will commit this unless someone has a better wording or a real fix.
[jira] Created: (LUCENE-1113) fix for Document.getBoost() documentation
fix for Document.getBoost() documentation - Key: LUCENE-1113 URL: https://issues.apache.org/jira/browse/LUCENE-1113 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.2 Reporter: Daniel Naber Priority: Minor Attachments: document-getboost.diff The attached patch fixes the javadoc to make clear that getBoost() will not return a useful value in most cases. I will commit this unless someone has a better wording or a real fix.
[jira] Updated: (LUCENE-1113) fix for Document.getBoost() documentation
[ https://issues.apache.org/jira/browse/LUCENE-1113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1113: - Attachment: document-getboost.diff fix for Document.getBoost() documentation - Key: LUCENE-1113 URL: https://issues.apache.org/jira/browse/LUCENE-1113 Project: Lucene - Java Issue Type: Bug Components: Javadocs Affects Versions: 2.2 Reporter: Daniel Naber Priority: Minor Attachments: document-getboost.diff The attached patch fixes the javadoc to make clear that getBoost() will not return a useful value in most cases. I will commit this unless someone has a better wording or a real fix.
[jira] Commented: (LUCENE-770) CfsExtractor tool
[ https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12554405 ] Daniel Naber commented on LUCENE-770: - Otis, I've used it just once and noticed the problem. I'm not sure how to fix it. I could of course just change the javadoc, but telling people to use a hex editor to change some files isn't really a nice solution. CfsExtractor tool - Key: LUCENE-770 URL: https://issues.apache.org/jira/browse/LUCENE-770 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.1 Reporter: Otis Gospodnetic Priority: Minor Attachments: LUCENE-770.patch A tool for extracting the content of a CFS file, in order to go from a compound index to a multi-file index. This may be handy for people who want to go back to multi-file index format now that field norms are in a single file - LUCENE-756. Most of this code already existed and was hiding in IndexReader.main. I'll commit tomorrow, unless I hear otherwise. I think I should also remove IndexReader.main then. Ja?
Re: [Lucene-java Wiki] Update of PoweredBy by PietSchmidt
On Tuesday, 18 December 2007, Chris Hostetter wrote: "not every one is allowed to link back to Lucene ... but i have been thinking that we could start making it a policy that if you want to put a link to your site on the wiki, you need to have two URLs: a URL showing Lucene in use, and a URL where you talk about the code you implemented or the hardware you run on (which can easily be a blog post or mailing list archive link) ... that would hopefully weed out some of the fly by linkers" I'm now adding a text to the PoweredBy Wiki page; feel free to adapt it. Regards Daniel -- http://www.danielnaber.de
[jira] Commented: (LUCENE-770) CfsExtractor tool
[ https://issues.apache.org/jira/browse/LUCENE-770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12553778 ] Daniel Naber commented on LUCENE-770: - I think there's a small issue which is also in IndexReader.main: the javadoc claims that you need to copy the segments files to make the extracted index work, but that's not enough, you will also need to modify the segments file because it contains the information whether the index is in compound format or not. CfsExtractor tool - Key: LUCENE-770 URL: https://issues.apache.org/jira/browse/LUCENE-770 Project: Lucene - Java Issue Type: New Feature Components: Index Affects Versions: 2.1 Reporter: Otis Gospodnetic Priority: Minor Attachments: LUCENE-770.patch A tool for extracting the content of a CFS file, in order to go from a compound index to a multi-file index. This may be handy for people who want to go back to multi-file index format now that field norms are in a single file - LUCENE-756. Most of this code already existed and was hiding in IndexReader.main. I'll commit tomorrow, unless I hear otherwise. I think I should also remove IndexReader.main then. Ja?
Re: [Lucene-java Wiki] Update of PoweredBy by PietSchmidt
On Monday, 17 December 2007, Apache Wiki wrote: "+ * [http://frauen-kennenlernen.com/ Frauen kennenlernen] - Search engine using Lucene" I don't claim that this is spam, but more and more of the Wiki PoweredBy links look like someone just wants a link from the Lucene project, probably to boost their Google ranking. We cannot tell whether these people really use Lucene at all, or if they use some blogging software which in turn uses Lucene (in that case it wouldn't make sense to link them from our page either). My suggestion would be that we only accept links if people use Lucene directly (not via software that has a Lucene-based search anyway) and that they put a link to Lucene on their imprint/contact page or on the search result page. On the other hand, while the page above is harmless, I guess it's not necessarily something Apache Lucene needs to be associated with. Any suggestions? Regards Daniel -- http://www.danielnaber.de
Re: Performance Improvement for Search using PriorityQueue
On Monday, 10 December 2007, Michael Busch wrote: "Reboot your machine ;-) That's what I usually do - if there's another way I'd like to know as well!" On Linux (kernel 2.6.16 and later), call: sync ; echo 3 > /proc/sys/vm/drop_caches Regards Daniel -- http://www.danielnaber.de
[jira] Created: (LUCENE-1084) increase default maxFieldLength?
increase default maxFieldLength? Key: LUCENE-1084 URL: https://issues.apache.org/jira/browse/LUCENE-1084 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.2 Reporter: Daniel Naber To my understanding, Lucene 2.3 will easily index large documents. So shouldn't we get rid of the 10,000 default limit for the field length? 10,000 isn't that much and as Lucene doesn't have any error logging by default, this is a common problem for users that is difficult to debug if you don't know where to look. A better new default might be Integer.MAX_VALUE.
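Why the limit is hard to debug can be shown with a toy model: terms beyond the limit are dropped without any error, so a search for a word late in a long document simply finds nothing. The sketch below is plain Java and does not call Lucene; FieldLengthSketch and keptTerms are hypothetical names, and the only figure taken from the report is the limit of 10,000.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of a field length limit; not Lucene's indexing code.
class FieldLengthSketch {
    // Keeps at most maxFieldLength terms and silently drops the rest.
    static List<String> keptTerms(List<String> tokens, int maxFieldLength) {
        int n = Math.min(tokens.size(), maxFieldLength);
        return new ArrayList<String>(tokens.subList(0, n));
    }

    public static void main(String[] args) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < 10000; i++) {
            tokens.add("filler");
        }
        tokens.add("needle"); // term number 10,001

        // With the limit from the report the last term is lost without any error.
        System.out.println(keptTerms(tokens, 10000).contains("needle"));
        // With Integer.MAX_VALUE as the limit it is kept.
        System.out.println(keptTerms(tokens, Integer.MAX_VALUE).contains("needle"));
    }
}
```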
[jira] Commented: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546478 ] Daniel Naber commented on LUCENE-588: - The problem is that the WildcardQuery itself doesn't have a concept of escaped characters. The escape characters are removed in QueryParser. This means t?\?t will arrive as t??t in WildcardQuery and the second question mark is also interpreted as a wildcard. Escaped wildcard character in wildcard term not handled correctly - Key: LUCENE-588 URL: https://issues.apache.org/jira/browse/LUCENE-588 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.0.0 Environment: Windows XP SP2 Reporter: Sunil Kamath If an escaped wildcard character is specified in a wildcard query, it is treated as a wildcard instead of a literal. e.g., t\??t is converted by the QueryParser to t??t - the escape character is discarded.
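The effect described in the comment can be reproduced with a few lines of plain Java. This is a simplified sketch and not the QueryParser or WildcardQuery implementation; EscapeSketch, discardEscapes and matches are hypothetical names, and the matcher only supports the single-character wildcard.

```java
// Simplified model of the two steps; not Lucene's code.
class EscapeSketch {
    // Step 1 (parser side): drop each backslash and keep the character after it.
    static String discardEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '\\' && i + 1 < s.length()) {
                sb.append(s.charAt(++i));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Step 2 (query side): every '?' matches exactly one arbitrary character.
    static boolean matches(String pattern, String text) {
        if (pattern.length() != text.length()) {
            return false;
        }
        for (int i = 0; i < pattern.length(); i++) {
            char p = pattern.charAt(i);
            if (p != '?' && p != text.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String parsed = discardEscapes("t?\\?t"); // the user typed t?\?t
        System.out.println(parsed);
        // The user wanted a literal '?' in third position, yet "taat" matches.
        System.out.println(matches(parsed, "taat"));
    }
}
```

Once the backslash is gone, the matcher has no way to tell the two question marks apart, so a fix would need the escape information to survive until the wildcard is evaluated.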
[jira] Issue Comment Edited: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546478 ] [EMAIL PROTECTED] edited comment on LUCENE-588 at 11/28/07 3:27 PM: --- The problem is that the WildcardQuery itself doesn't have a concept of escaped characters. The escape characters are removed in QueryParser. This mean t?\\?t will arrive as t??t in WildcardQuery and the second question mark is also interpreted as a wildcard. was (Author: [EMAIL PROTECTED]): The problem is that the WildcardQuery itself doesn't have a concept of escaped characters. The escape characters are removed in QueryParser. This mean t?\?t will arrive as t??t in WildcardQuery and the second question mark is also interpreted as a wildcard. Escaped wildcard character in wildcard term not handled correctly - Key: LUCENE-588 URL: https://issues.apache.org/jira/browse/LUCENE-588 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.0.0 Environment: Windows XP SP2 Reporter: Sunil Kamath If an escaped wildcard character is specified in a wildcard query, it is treated as a wildcard instead of a literal. e.g., t\??t is converted by the QueryParser to t??t - the escape character is discarded.
[jira] Commented: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546479 ] Daniel Naber commented on LUCENE-588: - Also, the original report and my comment look confusing because Jira removes the backslash. Imagine a backslash in front of *one* of the question marks. Escaped wildcard character in wildcard term not handled correctly - Key: LUCENE-588 URL: https://issues.apache.org/jira/browse/LUCENE-588 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.0.0 Environment: Windows XP SP2 Reporter: Sunil Kamath If an escaped wildcard character is specified in a wildcard query, it is treated as a wildcard instead of a literal. e.g., t\??t is converted by the QueryParser to t??t - the escape character is discarded. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Issue Comment Edited: (LUCENE-588) Escaped wildcard character in wildcard term not handled correctly
[ https://issues.apache.org/jira/browse/LUCENE-588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12546478 ] [EMAIL PROTECTED] edited comment on LUCENE-588 at 11/28/07 3:27 PM: --- The problem is that the WildcardQuery itself doesn't have a concept of escaped characters. The escape characters are removed in QueryParser. This means t?\?t will arrive as t??t in WildcardQuery and the second question mark is also interpreted as a wildcard. was (Author: [EMAIL PROTECTED]): The problem is that the WildcardQuery itself doesn't have a concept of escaped characters. The escape characters are removed in QueryParser. This means t?\\?t will arrive as t??t in WildcardQuery and the second question mark is also interpreted as a wildcard. Escaped wildcard character in wildcard term not handled correctly - Key: LUCENE-588 URL: https://issues.apache.org/jira/browse/LUCENE-588 Project: Lucene - Java Issue Type: Bug Components: QueryParser Affects Versions: 2.0.0 Environment: Windows XP SP2 Reporter: Sunil Kamath If an escaped wildcard character is specified in a wildcard query, it is treated as a wildcard instead of a literal. e.g., t\??t is converted by the QueryParser to t??t - the escape character is discarded.
[jira] Closed: (LUCENE-1045) SortField.AUTO doesn't work with long
[ https://issues.apache.org/jira/browse/LUCENE-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-1045. Resolution: Fixed Fix Version/s: 2.3 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) patch applied SortField.AUTO doesn't work with long - Key: LUCENE-1045 URL: https://issues.apache.org/jira/browse/LUCENE-1045 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.2 Reporter: Daniel Naber Priority: Minor Fix For: 2.3 Attachments: auto-long-sorting.diff, TestDateSort.java This is actually the same as LUCENE-463 but I cannot find a way to re-open that issue. I'm attaching a test case by dragon-fly999 at hotmail com that shows the problem and a patch that seems to fix it. The problem is that a long (as used for dates) cannot be parsed as an integer, and the next step is then to parse it as a float, which works but which is not correct. With the patch the following parsers are used in this order: int, long, float. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
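The parser order named in the description (int, long, float) can be illustrated with a small stand-alone sketch. This is not the patched Lucene code: `AutoTypeDemo`, `Kind` and `detect` are hypothetical names, and the final fallback to a string type is an assumption of the sketch.

```java
/** Hypothetical stand-alone demo of the int, long, float detection order. */
class AutoTypeDemo {

    enum Kind { INT, LONG, FLOAT, STRING }

    /** Tries the parsers in the order int, long, float; STRING is this sketch's assumed fallback. */
    static Kind detect(String value) {
        try {
            Integer.parseInt(value);
            return Kind.INT;
        } catch (NumberFormatException ignored) {
            // not an int, try the next parser
        }
        try {
            Long.parseLong(value);
            return Kind.LONG;
        } catch (NumberFormatException ignored) {
            // not a long, try the next parser
        }
        try {
            Float.parseFloat(value);
            return Kind.FLOAT;
        } catch (NumberFormatException ignored) {
            // not a number at all
        }
        return Kind.STRING;
    }

    public static void main(String[] args) {
        // A millisecond timestamp does not fit into an int, so it is detected as LONG.
        System.out.println(detect("1195684800000"));

        // Without the long step the value would be parsed as a float, which "works"
        // but loses precision: two different timestamps become the same float.
        float a = Float.parseFloat("1195684800000");
        float b = Float.parseFloat("1195684800001");
        System.out.println(a == b);
    }
}
```

The second print shows why parsing dates as floats "works but is not correct": neighbouring long values collapse to one float and can no longer be ordered.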
[jira] Updated: (LUCENE-1046) Dead code in SpellChecker.java (branch never executes)
[ https://issues.apache.org/jira/browse/LUCENE-1046?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1046: - Attachment: LUCENE-1046.diff Thanks for your report, could you try out this patch? Dead code in SpellChecker.java (branch never executes) -- Key: LUCENE-1046 URL: https://issues.apache.org/jira/browse/LUCENE-1046 Project: Lucene - Java Issue Type: Bug Components: contrib/* Affects Versions: 2.2 Reporter: Joe Priority: Minor Attachments: LUCENE-1046.diff SpellChecker contains the following lines of code:

    final int goalFreq = (morePopular && ir != null) ? ir.docFreq(new Term(field, word)) : 0;
    // if the word exists in the real index and we don't care for word frequency, return the word itself
    if (!morePopular && goalFreq > 0) {
        return new String[] { word };
    }

The branch will never execute: the only way for goalFreq to be greater than zero is if morePopular is true, but if morePopular is true, the expression in the if statement evaluates to false.
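The reporter's argument can be checked by brute force. The sketch below assumes that the operators missing from the quoted snippet are `&&` and `>`; `DeadBranchDemo` and `branchTaken` are hypothetical names, and the index reader is reduced to a boolean plus a document frequency.

```java
/** Hypothetical stand-alone check of the condition quoted in the report. */
class DeadBranchDemo {

    /** Returns true if the early-return branch would be taken for the given inputs. */
    static boolean branchTaken(boolean morePopular, boolean hasReader, int docFreq) {
        // Same shape as the quoted code, with "ir != null" replaced by hasReader.
        int goalFreq = (morePopular && hasReader) ? docFreq : 0;
        return !morePopular && goalFreq > 0;
    }

    public static void main(String[] args) {
        boolean everTaken = false;
        for (boolean morePopular : new boolean[] {false, true}) {
            for (boolean hasReader : new boolean[] {false, true}) {
                for (int docFreq : new int[] {0, 1, 100}) {
                    everTaken |= branchTaken(morePopular, hasReader, docFreq);
                }
            }
        }
        // goalFreq > 0 requires morePopular, while the branch requires !morePopular.
        System.out.println("branch ever taken: " + everTaken);
    }
}
```

Under that assumption no combination of inputs reaches the branch, which matches the report.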
[jira] Commented: (LUCENE-920) IndexModifier has incomplete Javadocs
[ https://issues.apache.org/jira/browse/LUCENE-920?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12545634 ] Daniel Naber commented on LUCENE-920: - I think this bug can be closed, as IndexModifier is deprecated. IndexModifier has incomplete Javadocs - Key: LUCENE-920 URL: https://issues.apache.org/jira/browse/LUCENE-920 Project: Lucene - Java Issue Type: Wish Components: Javadocs Reporter: Michael Busch Priority: Trivial A lot of public and protected members of org.apache.lucene.index.IndexModifier don't have javadocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1066) better explain output
[ https://issues.apache.org/jira/browse/LUCENE-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1066: - Attachment: explain-output.diff better explain output - Key: LUCENE-1066 URL: https://issues.apache.org/jira/browse/LUCENE-1066 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3 Reporter: Daniel Naber Attachments: explain-output.diff Very simple patch that slightly improves output of idf: show both docFreq and numDocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Closed: (LUCENE-1066) better explain output
[ https://issues.apache.org/jira/browse/LUCENE-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-1066. Resolution: Fixed Fix Version/s: 2.3 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) applied better explain output - Key: LUCENE-1066 URL: https://issues.apache.org/jira/browse/LUCENE-1066 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3 Reporter: Daniel Naber Priority: Trivial Fix For: 2.3 Attachments: explain-output.diff Very simple patch that slightly improves output of idf: show both docFreq and numDocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-1066) better explain output
[ https://issues.apache.org/jira/browse/LUCENE-1066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-1066: - Priority: Trivial (was: Major) Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) better explain output - Key: LUCENE-1066 URL: https://issues.apache.org/jira/browse/LUCENE-1066 Project: Lucene - Java Issue Type: Improvement Components: Query/Scoring Affects Versions: 2.3 Reporter: Daniel Naber Priority: Trivial Fix For: 2.3 Attachments: explain-output.diff Very simple patch that slightly improves output of idf: show both docFreq and numDocs. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-997) Add search timeout support to Lucene
[ https://issues.apache.org/jira/browse/LUCENE-997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12527605 ] Daniel Naber commented on LUCENE-997: - Thanks for the patch. I didn't have a very close look, just one small thing: it's probably not a good idea to catch and ignore the InterruptedException. See http://www-128.ibm.com/developerworks/java/library/j-jtp05236.html Add search timeout support to Lucene Key: LUCENE-997 URL: https://issues.apache.org/jira/browse/LUCENE-997 Project: Lucene - Java Issue Type: New Feature Reporter: Sean Timm Priority: Minor Attachments: LuceneTimeoutTest.java, timeout.patch This patch is based on Nutch-308. This patch adds support for a maximum search time limit. After this time is exceeded, the search thread is stopped, partial results (if any) are returned and the total number of results is estimated. This patch tries to minimize the overhead related to time-keeping by using a version of a safe unsynchronized timer. This was also discussed in an e-mail thread. http://www.nabble.com/search-timeout-tf3410206.html#a9501029
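One common alternative to swallowing the exception is to restore the thread's interrupt status so that callers can still see it. The sketch below is not taken from timeout.patch; `InterruptDemo` and `sleepQuietly` are hypothetical names used only for illustration.

```java
/** Hypothetical stand-alone demo; not code from the attached patch. */
class InterruptDemo {

    /**
     * Sleeps for up to the given time. Returns false if the sleep was interrupted,
     * and in that case sets the interrupt flag again instead of discarding it.
     */
    static boolean sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
            return true;
        } catch (InterruptedException e) {
            // Thread.sleep cleared the flag when it threw; put it back for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        Thread.currentThread().interrupt();       // simulate a pending interrupt
        boolean completed = sleepQuietly(10);
        System.out.println("completed: " + completed);
        // Thread.interrupted() reads and clears the flag.
        System.out.println("interrupt still visible: " + Thread.interrupted());
    }
}
```

With an empty catch block the second line would report that the interrupt was lost, and code further up the call stack could not react to it.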
Re: Lucene 2.2.0 release available
On Wednesday 20 June 2007 03:01, Yonik Seeley wrote: FYI, The announcement has not made it to the http://lucene.apache.org/ page. I just committed this. It should be viewable in about an hour. The links to the new features don't work for me; I always end up on the API overview page. Shouldn't the links be e.g. http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/document/Field.html instead of http://lucene.apache.org/java/2_2_0/api/index.html?org/apache/lucene/document/Field.html ? Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-759) Add n-gram tokenizers to contrib/analyzers
[ https://issues.apache.org/jira/browse/LUCENE-759?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500811 ] Daniel Naber commented on LUCENE-759: - Can this issue be closed or is there anything still open? Add n-gram tokenizers to contrib/analyzers -- Key: LUCENE-759 URL: https://issues.apache.org/jira/browse/LUCENE-759 Project: Lucene - Java Issue Type: Improvement Components: Analysis Reporter: Otis Gospodnetic Assignee: Otis Gospodnetic Priority: Minor Fix For: 2.2 Attachments: LUCENE-759-filters.patch, LUCENE-759.patch, LUCENE-759.patch, LUCENE-759.patch It would be nice to have some n-gram-capable tokenizers in contrib/analyzers. Patch coming shortly. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration
[ https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500863 ] Daniel Naber commented on LUCENE-763: - Thanks, Steven. Your javadoc changes have also been committed now. LuceneDictionary skips first word in enumeration Key: LUCENE-763 URL: https://issues.apache.org/jira/browse/LUCENE-763 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.0.0 Environment: Windows Sun JRE 1.4.2_10_b03 Reporter: Dan Ertman Fix For: 2.2 Attachments: LuceneDictionary.java, TestLuceneDictionary.java The current code for LuceneDictionary will always skip the first word of the TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - its first call is to TermEnum.next, which moves it past the first term (line 76). To see this problem cause a failure, add this test to TestSpellChecker: similar = spellChecker.suggestSimilar("eihgt", 2); assertEquals(1, similar.length); assertEquals(similar[0], "eight"); Because "eight" is the first word in the index, it will fail.
[jira] Closed: (LUCENE-763) LuceneDictionary skips first word in enumeration
[ https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-763. --- Resolution: Fixed Fix Version/s: 2.2 Thanks, patch applied. LuceneDictionary skips first word in enumeration Key: LUCENE-763 URL: https://issues.apache.org/jira/browse/LUCENE-763 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.0.0 Environment: Windows Sun JRE 1.4.2_10_b03 Reporter: Dan Ertman Fix For: 2.2 Attachments: LuceneDictionary.java, TestLuceneDictionary.java The current code for LuceneDictionary will always skip the first word of the TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - its first call is to TermEnum.next, which moves it past the first term (line 76). To see this problem cause a failure, add this test to TestSpellChecker: similar = spellChecker.suggestSimilar("eihgt", 2); assertEquals(1, similar.length); assertEquals(similar[0], "eight"); Because "eight" is the first word in the index, it will fail.
[jira] Commented: (LUCENE-763) LuceneDictionary skips first word in enumeration
[ https://issues.apache.org/jira/browse/LUCENE-763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12500230 ] Daniel Naber commented on LUCENE-763: - Thanks for your patch. I think there's a problem with the iterator which might not occur often, but it should be fixed nonetheless: calling next() only has an effect if hasNext() has been called before. You can see that by commenting out assertTrue("Second element doesn't exist.", it.hasNext()); in the test case: the test will then fail, although, to my understanding, hasNext() should have no side effects. Could you change your patch accordingly? LuceneDictionary skips first word in enumeration Key: LUCENE-763 URL: https://issues.apache.org/jira/browse/LUCENE-763 Project: Lucene - Java Issue Type: Bug Components: Other Affects Versions: 2.0.0 Environment: Windows Sun JRE 1.4.2_10_b03 Reporter: Dan Ertman Attachments: LuceneDictionary.java, TestLuceneDictionary.java The current code for LuceneDictionary will always skip the first word of the TermEnum. The reason is that it doesn't initially retrieve TermEnum.term - its first call is to TermEnum.next, which moves it past the first term (line 76). To see this problem cause a failure, add this test to TestSpellChecker: similar = spellChecker.suggestSimilar("eihgt", 2); assertEquals(1, similar.length); assertEquals(similar[0], "eight"); Because "eight" is the first word in the index, it will fail.
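One way to get an iterator whose next() works without a preceding hasNext(), and whose hasNext() can be called any number of times, is a one-element lookahead. This is a sketch of that general technique, not the code that was committed: `ArrayCursor` is a hypothetical forward-only cursor that, unlike the TermEnum described in the report, starts before its first element.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical stand-alone demo; not the LuceneDictionary implementation. */
class LookaheadDemo {

    /** Forward-only cursor over an array; starts before the first element (assumption of this sketch). */
    static final class ArrayCursor {
        private final String[] terms;
        private int pos = -1;

        ArrayCursor(String... terms) {
            this.terms = terms;
        }

        /** Advances to the next element; returns false when there is none. */
        boolean next() {
            return ++pos < terms.length;
        }

        /** Returns the element at the current position. */
        String term() {
            return terms[pos];
        }
    }

    /** Iterator adapter that advances the cursor in exactly one place, fetch(). */
    static final class CursorIterator implements Iterator<String> {
        private final ArrayCursor cursor;
        private String lookahead;  // upcoming element, or null once the cursor is exhausted
        private boolean fetched;   // true while lookahead is up to date

        CursorIterator(ArrayCursor cursor) {
            this.cursor = cursor;
        }

        private void fetch() {
            if (!fetched) {
                lookahead = cursor.next() ? cursor.term() : null;
                fetched = true;
            }
        }

        public boolean hasNext() {
            fetch();               // repeated calls do not advance the cursor again
            return lookahead != null;
        }

        public String next() {
            fetch();               // works even if hasNext() was never called
            if (lookahead == null) {
                throw new NoSuchElementException();
            }
            fetched = false;       // the element is consumed; fetch a new one next time
            return lookahead;
        }

        public void remove() {
            throw new UnsupportedOperationException();
        }
    }

    public static void main(String[] args) {
        Iterator<String> it = new CursorIterator(new ArrayCursor("eight", "nine"));
        System.out.println(it.next()); // no hasNext() call beforehand
        System.out.println(it.next());
        System.out.println(it.hasNext());
    }
}
```

Because the cursor is advanced only inside fetch(), the order and number of hasNext() calls cannot change which elements next() returns.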
[jira] Closed: (LUCENE-886) spellchecker cleanup
[ https://issues.apache.org/jira/browse/LUCENE-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-886. --- Resolution: Fixed Fix Version/s: 2.2 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) committed. spellchecker cleanup Key: LUCENE-886 URL: https://issues.apache.org/jira/browse/LUCENE-886 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.1 Reporter: Daniel Naber Fix For: 2.2 Attachments: spellchecker-cleanup.diff Some cleanup, attached here so it can be tracked if necessary: javadoc improvements; don't print exceptions to stderr but re-throw them; new constructor for a new test case. I will commit this soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Closed: (LUCENE-882) Spellchecker doesn't need to store ngrams
[ https://issues.apache.org/jira/browse/LUCENE-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-882. --- Resolution: Fixed Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) patch applied Spellchecker doesn't need to store ngrams - Key: LUCENE-882 URL: https://issues.apache.org/jira/browse/LUCENE-882 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.1 Reporter: Daniel Naber Attachments: lucene-spellchecker.diff The spellchecker in contrib stores the ngrams although this doesn't seem to be necessary. This patch changes that; I will commit it unless someone objects. This improves indexing speed and index size. Some numbers on a small test I did: Input of the original index: 2200 text files, index size 5.3 MB, indexing took 17 seconds. Spell index before patch: about 60,000 documents, index size 13 MB, indexing took 62 seconds. Spell index after patch: about 60,000 documents, index size 6.3 MB, indexing took 52 seconds. BTW, the test case fails even before this patch. I'll probably submit another issue about how to fix that.
[jira] Closed: (LUCENE-883) make spell checker test case work again
[ https://issues.apache.org/jira/browse/LUCENE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-883. --- Resolution: Fixed Fix Version/s: 2.2 Lucene Fields: [New, Patch Available] (was: [Patch Available, New]) Patch applied. make spell checker test case work again --- Key: LUCENE-883 URL: https://issues.apache.org/jira/browse/LUCENE-883 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.1 Reporter: Daniel Naber Fix For: 2.2 Attachments: lucene-spellchecker-2.diff See the attached patch, which makes the spellchecker test case work again. The problem without the patch is that consecutive calls to indexDictionary() will create a spelling index with duplicate words. Does anybody see a problem with this patch? I see that the spellchecker code is now used in Solr, isn't it? I didn't have time to test this patch inside Solr. Also see http://issues.apache.org/jira/browse/LUCENE-632, but the null check is included in this patch so the NPE described there cannot happen anymore.
[jira] Updated: (LUCENE-403) Alternate Lucene Query Highlighter
[ https://issues.apache.org/jira/browse/LUCENE-403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-403: Assignee: (was: Lucene Developers) Summary: Alternate Lucene Query Highlighter (was: Alternate Lucene Query Parser) fix title Alternate Lucene Query Highlighter -- Key: LUCENE-403 URL: https://issues.apache.org/jira/browse/LUCENE-403 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 1.4 Environment: Operating System: All Platform: All Reporter: David Bohl Priority: Minor Attachments: HighlighterTest.java, HighlighterTest.java, QueryHighlighter.java, QueryHighlighter.java, QueryHighlighter.java, QuerySpansExtractor.java I created a lucene query highlighter (borrowing some code from the one in the sandbox) that my company is using. It better handles phrase queries, doesn't break HTML entities, and has the ability to either highlight terms in an entire document or to highlight fragments from the document. I would like to make it available to anyone who wants it. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-886) spellchecker cleanup
spellchecker cleanup Key: LUCENE-886 URL: https://issues.apache.org/jira/browse/LUCENE-886 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.1 Reporter: Daniel Naber Attachments: spellchecker-cleanup.diff Some cleanup, attached here so it can be tracked if necessary: javadoc improvements; don't print exceptions to stderr but re-throw them; new constructor for a new test case. I will commit this soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-886) spellchecker cleanup
[ https://issues.apache.org/jira/browse/LUCENE-886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-886: Attachment: spellchecker-cleanup.diff cleanup patch spellchecker cleanup Key: LUCENE-886 URL: https://issues.apache.org/jira/browse/LUCENE-886 Project: Lucene - Java Issue Type: Improvement Affects Versions: 2.1 Reporter: Daniel Naber Attachments: spellchecker-cleanup.diff Some cleanup, attached here so it can be tracked if necessary: javadoc improvements; don't print exceptions to stderr but re-throw them; new constructor for a new test case. I will commit this soon. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-883) make spell checker test case work again
[ https://issues.apache.org/jira/browse/LUCENE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12497014 ] Daniel Naber commented on LUCENE-883: - Yes, the exist() method checks whether the reader is null and re-opens it if necessary, so reader = null is needed. make spell checker test case work again --- Key: LUCENE-883 URL: https://issues.apache.org/jira/browse/LUCENE-883 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.1 Reporter: Daniel Naber Attachments: lucene-spellchecker-2.diff See the attached patch, which makes the spellchecker test case work again. The problem without the patch is that consecutive calls to indexDictionary() will create a spelling index with duplicate words. Does anybody see a problem with this patch? I see that the spellchecker code is now used in Solr, isn't it? I didn't have time to test this patch inside Solr. Also see http://issues.apache.org/jira/browse/LUCENE-632, but the null check is included in this patch so the NPE described there cannot happen anymore.
[jira] Closed: (LUCENE-884) Query Syntax page does not make it clear that wildcard searches are not allowed in Phrase Queries
[ https://issues.apache.org/jira/browse/LUCENE-884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-884. --- Resolution: Fixed Thanks, this is fixed now (website should update soon). Query Syntax page does not make it clear that wildcard searches are not allowed in Phrase Queries - Key: LUCENE-884 URL: https://issues.apache.org/jira/browse/LUCENE-884 Project: Lucene - Java Issue Type: Improvement Components: Website Affects Versions: 2.0.1 Reporter: Paul Taylor The queryparsersyntax page, which is where I expect most novices (such as myself) start with Lucene, seems to indicate that wildcards can be used in phrase terms. Quoting: 'Terms: A query is broken up into terms and operators. There are two types of terms: Single Terms and Phrases. A Single Term is a single word such as "test" or "hello". A Phrase is a group of words surrounded by double quotes such as "hello dolly". Wildcard Searches Lucene supports single and multiple character wildcard searches. To perform a multiple character wildcard search use the * symbol. Multiple character wildcard searches looks for 0 or more characters. For example, to search for test, tests or tester, you can use the search: test* You can also use the wildcard searches in the middle of a term.' There is nothing in the section on Wildcard Searches to indicate that they can be performed only on Single Terms, not on Phrases. Chris argues 'that there is nothing in the description of a Phrase to indicate that it can be anything other than what it says, a group of words surrounded by double quotes .. at no point does it suggest that other types of queries or syntax can be used inside the quotes. likewise the discussion of Wildcards makes no mention of phrases to suggest that wildcard characters can be used in a phrase.' But I don't accept this, because there is nothing in the description of a Single Term to indicate that it can use wildcards either.
Wildcards are only mentioned in the Wildcard section, and there it says they can be used in a term; it does not restrict the type of term. I propose a simple solution. Modify: 'Lucene supports single and multiple character wildcard searches.' to 'Lucene supports single and multiple character wildcard searches within single terms.' (Chris asked for a patch, but I'm not sure how to do this, and the change is simple enough.)
Re: Recreating a document from its index
On Thursday 17 May 2007 00:58, Stefano Fornari wrote: I have a question to which I could not answer reading the documentation and searching the mailing list archive: This actually belongs more to the user list... try Luke and click on the Reconstruct Edit button, then on the Tokenized tab. This will show you what can be recreated. This depends on the stopwords and the other normalizations made by the Analyzer. Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-882) Spellchecker doesn't need to store ngrams
Spellchecker doesn't need to store ngrams - Key: LUCENE-882 URL: https://issues.apache.org/jira/browse/LUCENE-882 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.1 Reporter: Daniel Naber Attachments: lucene-spellchecker.diff The spellchecker in contrib stores the ngrams although this doesn't seem to be necessary. This patch changes that; I will commit it unless someone objects. This improves indexing speed and index size. Some numbers on a small test I did: Input of the original index: 2200 text files, index size 5.3 MB, indexing took 17 seconds. Spell index before patch: about 60,000 documents, index size 13 MB, indexing took 62 seconds. Spell index after patch: about 60,000 documents, index size 6.3 MB, indexing took 52 seconds. BTW, the test case fails even before this patch. I'll probably submit another issue about how to fix that.
[jira] Updated: (LUCENE-882) Spellchecker doesn't need to store ngrams
[ https://issues.apache.org/jira/browse/LUCENE-882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-882: Attachment: lucene-spellchecker.diff don't store but only index ngrams Spellchecker doesn't need to store ngrams - Key: LUCENE-882 URL: https://issues.apache.org/jira/browse/LUCENE-882 Project: Lucene - Java Issue Type: Improvement Components: Other Affects Versions: 2.1 Reporter: Daniel Naber Attachments: lucene-spellchecker.diff The spellchecker in contrib stores the ngrams although this doesn't seem to be necessary. This patch changes that; I will commit it unless someone objects. This improves indexing speed and index size. Some numbers on a small test I did: Input of the original index: 2200 text files, index size 5.3 MB, indexing took 17 seconds. Spell index before patch: about 60,000 documents, index size 13 MB, indexing took 62 seconds. Spell index after patch: about 60,000 documents, index size 6.3 MB, indexing took 52 seconds. BTW, the test case fails even before this patch. I'll probably submit another issue about how to fix that.
[jira] Created: (LUCENE-883) make spell checker test case work again
make spell checker test case work again --- Key: LUCENE-883 URL: https://issues.apache.org/jira/browse/LUCENE-883 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.1 Reporter: Daniel Naber Attachments: lucene-spellchecker-2.diff See the attached patch, which makes the spellchecker test case work again. The problem without the patch is that consecutive calls to indexDictionary() will create a spelling index with duplicate words. Does anybody see a problem with this patch? I see that the spellchecker code is now used in Solr, isn't it? I didn't have time to test this patch inside Solr. Also see http://issues.apache.org/jira/browse/LUCENE-632, but the null check is included in this patch so the NPE described there cannot happen anymore.
[jira] Updated: (LUCENE-883) make spell checker test case work again
[ https://issues.apache.org/jira/browse/LUCENE-883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-883: Attachment: lucene-spellchecker-2.diff patch to make test case work again make spell checker test case work again --- Key: LUCENE-883 URL: https://issues.apache.org/jira/browse/LUCENE-883 Project: Lucene - Java Issue Type: Bug Affects Versions: 2.1 Reporter: Daniel Naber Attachments: lucene-spellchecker-2.diff See the attached patch, which makes the spellchecker test case work again. The problem without the patch is that consecutive calls to indexDictionary() will create a spelling index with duplicate words. Does anybody see a problem with this patch? I see that the spellchecker code is now used in Solr, isn't it? I didn't have time to test this patch inside Solr. Also see http://issues.apache.org/jira/browse/LUCENE-632, but the null check is included in this patch so the NPE described there cannot happen anymore.
[jira] Closed: (LUCENE-523) FSDirectory.openFile(String) causes ClassCastException
[ https://issues.apache.org/jira/browse/LUCENE-523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-523. --- Resolution: Fixed openFile had been deprecated in Lucene 1.9 and then later removed, so I'm closing this issue. FSDirectory.openFile(String) causes ClassCastException -- Key: LUCENE-523 URL: https://issues.apache.org/jira/browse/LUCENE-523 Project: Lucene - Java Issue Type: Bug Components: Store Affects Versions: 1.9, 2.0.0 Environment: Lucene 1.9.1 Reporter: Eric Isakson When you call FSDirectory.openFile(String) you get a ClassCastException since FSIndexInput is not an org.apache.lucene.store.InputStream The workaround is to reimplement using openInput(String). I personally don't need this to be fixed but wanted to document it here in case anyone else runs into this for any reason. The reason I'm calling this is that I have a requirement on my project to create read only indexes and name the index segments consistently from one build to the next. So, after creating and optimizing the index, I rename the files and rewrite the segments file. It would be nice if I had an API that would allow me to say I only want one segment and I want its name to be 'foo'. For instance IndexWriter.optimize(String segmentName) -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online. - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-523) FSDirectory.openFile(String) causes ClassCastException
[ https://issues.apache.org/jira/browse/LUCENE-523?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495163 ] Daniel Naber commented on LUCENE-523: - The issue at Jackrabbit is closed, so I guess this can be closed too? I'll do so unless someone objects.
[jira] Created: (LUCENE-858) link from Lucene web page to API docs
link from Lucene web page to API docs - Key: LUCENE-858 URL: https://issues.apache.org/jira/browse/LUCENE-858 Project: Lucene - Java Issue Type: Improvement Reporter: Daniel Naber Assigned To: Grant Ingersoll There should be a way to link from e.g. http://lucene.apache.org/java/docs/gettingstarted.html to the API docs, but not just to the start page with the frame set but to a specific page, e.g. this: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/overview-summary.html#overview_description To make this work a way to set a relative link is needed.
Re: linking the API docs
On Saturday 07 April 2007 00:42, Chris Hostetter wrote:
> : I think you can put in the link, just use relative link like in the
> : site.xml.
>
> using a relative link is *key* ... it ensures not only that the static files build by the nightly build work, but also that the docs distributed with each release contain good local pointers.

I'm not familiar with Forrest, could you help me set the link? The pages to be linked are these:
http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/overview-summary.html#overview_description
http://lucene.apache.org/java/2_1_0/api/overview-summary.html#overview_description
(etc)
Note that this is not the API docs page (which contains the frameset) but a content page plus an anchor. So I cannot use <a href="ext:javadocs">, but <a href="ext:javadocs/overview-summary.html#overview_description"> doesn't work either.

Regards Daniel -- http://www.danielnaber.de
linking the API docs
Hi, we have a short but (I think) useful snippet of example code in our API docs: http://lucene.zones.apache.org:8080/hudson/job/Lucene-Nightly/javadoc/overview-summary.html#overview_description We also have the Getting started section on the web site, which only refers to the demo and doesn't offer code examples: http://lucene.apache.org/java/docs/gettingstarted.html I'd like to link from the Getting Started to the API example. Is it okay to just put the above link (lucene.zones.apache.org) in the file or isn't that supposed to be stable? If that's not okay, the best thing might be to move the code example to the Getting Started. Regards Daniel -- http://www.danielnaber.de
[jira] Commented: (LUCENE-841) Replace UTF8 characters in stemmer code with integer values.
[ https://issues.apache.org/jira/browse/LUCENE-841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482914 ] Daniel Naber commented on LUCENE-841: - Which environments still don't handle UTF-8? Using anything that escapes the real characters will make the code difficult to read. Replace UTF8 characters in stemmer code with integer values. Key: LUCENE-841 URL: https://issues.apache.org/jira/browse/LUCENE-841 Project: Lucene - Java Issue Type: Improvement Components: Analysis Reporter: Karl Wettin Priority: Critical BrazilianStemmer, GermanStemmer, FrenchStemmer and DutchStemmer all contain UTF characters in the Java code. Not all environments handle that. It really ought to be integer values instead. I'll come up with a patch sooner or later.
[jira] Created: (LUCENE-795) deprecate Directory.renameFile()
deprecate Directory.renameFile() Key: LUCENE-795 URL: https://issues.apache.org/jira/browse/LUCENE-795 Project: Lucene - Java Issue Type: Bug Components: Store Affects Versions: 2.0.0 Reporter: Daniel Naber Priority: Minor Fix For: 2.1 Copied from my mailing list post so this issue can be tracked (if necessary). I will commit a patch. I see that Directory.renameFile() isn't used anymore. I assume it has only been public for technical reasons, not because we expect this to be used from outside of Lucene? Should we deprecate this method? Its implementation e.g. in FSDirectory looks a bit scary anyway (the comment correctly says "This is not atomic" while the abstract class says "This replacement should be atomic").
[jira] Closed: (LUCENE-795) deprecate Directory.renameFile()
[ https://issues.apache.org/jira/browse/LUCENE-795?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber closed LUCENE-795. --- Resolution: Fixed Committed.
deprecate Directory.renameFile()?
Hi, I see that Directory.renameFile() isn't used anymore. I assume it has only been public for technical reasons, not because we expect this to be used from outside of Lucene? Should we deprecate this method? Its implementation e.g. in FSDirectory looks a bit scary anyway (the comment correctly says "This is not atomic" while the abstract class says "This replacement should be atomic"). Regards Daniel -- http://www.danielnaber.de
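To make the atomic versus non-atomic distinction from the renameFile() discussion concrete, here is a small sketch using java.nio.file, an API that was added to Java well after this thread was written. It is not the FSDirectory implementation; the class name AtomicRename and its fallback policy are assumptions chosen for illustration.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Illustrative helper only; not part of Lucene. */
final class AtomicRename {
    /**
     * Renames a file, asking the file system for an atomic move first.
     * Returns true if the move was atomic, false if the non-atomic
     * fallback (replace existing target) had to be used.
     */
    static boolean rename(Path from, Path to) throws IOException {
        try {
            // Readers see either the old state or the new state, never a partial one.
            Files.move(from, to, StandardCopyOption.ATOMIC_MOVE);
            return true;
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback: this move may be observed half-done by a concurrent reader.
            Files.move(from, to, StandardCopyOption.REPLACE_EXISTING);
            return false;
        }
    }
}
```

Note that with ATOMIC_MOVE it is implementation specific whether an existing target file is replaced or the call fails, so callers that rely on replacement still need to check the behaviour of their platform.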
[jira] Updated: (LUCENE-781) NPE in MultiReader.isCurrent() and getVersion()
[ https://issues.apache.org/jira/browse/LUCENE-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-781: Attachment: multireader.diff updated patch NPE in MultiReader.isCurrent() and getVersion() --- Key: LUCENE-781 URL: https://issues.apache.org/jira/browse/LUCENE-781 Project: Lucene - Java Issue Type: Bug Components: Index Reporter: Daniel Naber Attachments: multireader.diff, multireader.diff, multireader_test.diff, multireader_test.diff I'm attaching a fix for the NPE in MultiReader.isCurrent() plus a testcase. For getVersion(), we should throw a better exception than an NPE. I will commit unless someone objects or has a better idea.
[jira] Updated: (LUCENE-781) NPE in MultiReader.isCurrent() and getVersion()
[ https://issues.apache.org/jira/browse/LUCENE-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-781: Attachment: multireader_test.diff updated patch
[jira] Updated: (LUCENE-781) NPe in MultiReader.isCurrent() and getVersion()
[ https://issues.apache.org/jira/browse/LUCENE-781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Daniel Naber updated LUCENE-781: Attachment: multireader.diff
Re: Lucene Scalability Question
On Monday 08 January 2007 20:33, Ali Salehi wrote:
> 1. The search time for simple queries such as precision:\+0002 is really high (4-10 seconds). I want to know if this search time is normal
> 2. The search gives TooManyClauses exception when I'm searching for a data item with the queries similar to the one below :

Please see the FAQ at http://wiki.apache.org/jakarta-lucene/LuceneFAQ, in particular the entries "Why am I getting a TooManyClauses exception?" and "How do I speed up searching?". If that doesn't help, please re-post your question on the user list.

Regards Daniel -- http://www.danielnaber.de
[jira] Commented: (LUCENE-765) Index package level javadocs needs content
[ https://issues.apache.org/jira/browse/LUCENE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462203 ] Daniel Naber commented on LUCENE-765: - Some of this is already here: http://lucene.apache.org/java/docs/api/overview-summary.html#overview_description Index package level javadocs needs content -- Key: LUCENE-765 URL: https://issues.apache.org/jira/browse/LUCENE-765 Project: Lucene - Java Issue Type: Task Components: Javadocs Reporter: Grant Ingersoll Priority: Minor The org.apache.lucene.index package level javadocs are sorely lacking. They should be updated to give a summary of the important classes, how indexing works, etc. Maybe give an overview of how the different writers coordinate. Links to file formats, information on the posting algorithm, etc. would be helpful. See the search package javadocs as a sample of the kind of info that could go here. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: access policy for Java Open Review Project
On Wednesday 27 December 2006 01:38, Erik Hatcher wrote: I'd be surprised if anyone uses Lucli, given the limited utility it has versus using Luke. It's actually very useful if you only have ssh access to a machine that has no X11 running. I just fixed the small bug found by this review. Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Reopened: (LUCENE-707) Lucene Java Site docs
[ http://issues.apache.org/jira/browse/LUCENE-707?page=all ] Daniel Naber reopened LUCENE-707: - The link to the image (asf-logo.gif) in the upper left corner is broken (mhh, same problem at Nutch site). Lucene Java Site docs - Key: LUCENE-707 URL: http://issues.apache.org/jira/browse/LUCENE-707 Project: Lucene - Java Issue Type: Improvement Components: Website Environment: N/A Reporter: Grant Ingersoll Assigned To: Grant Ingersoll Priority: Minor It would be really nice if the Java site docs were consistent with the rest of the Lucene family (namely, with navigation tabs, etc.) so that one can easily go between Nutch, Hadoop, etc.
[jira] Commented: (LUCENE-732) Use DateTools instead of deprecated DateField in QueryParser
[ http://issues.apache.org/jira/browse/LUCENE-732?page=comments#action_12454449 ] Daniel Naber commented on LUCENE-732: - I'm not sure if most people use DateTools already, as it has just been added in Lucene 1.9. Maybe you could consider an option (yes, yet another option isn't nice, I know)? Otherwise we need to properly document how to continue using DateField, i.e. by extending QueryParser and overriding this method, I guess. Use DateTools instead of deprecated DateField in QueryParser Key: LUCENE-732 URL: http://issues.apache.org/jira/browse/LUCENE-732 Project: Lucene - Java Issue Type: Improvement Components: QueryParser Reporter: Michael Busch Assigned To: Michael Busch Priority: Minor Attachments: queryparser_datetools.patch The QueryParser currently uses the deprecated class DateField to create RangeQueries with date values. However, most users probably use DateTools to store date values in their indexes, because this is the recommended way since DateField has been deprecated. In that case RangeQueries with date values produced by the QueryParser won't work with those indexes. This patch replaces the use of DateField in QueryParser by DateTools. Because DateTools can produce date values with different resolutions, this patch adds the following methods to QueryParser:

    /**
     * Sets the default date resolution used by RangeQueries for fields for which no
     * specific date resolution has been set. Field specific resolutions can be set
     * with {@link #setDateResolution(String, DateTools.Resolution)}.
     *
     * @param dateResolution the default date resolution to set
     */
    public void setDateResolution(DateTools.Resolution dateResolution);

    /**
     * Sets the date resolution used by RangeQueries for a specific field.
     *
     * @param fieldName field for which the date resolution is to be set
     * @param dateResolution date resolution to set
     */
    public void setDateResolution(String fieldName, DateTools.Resolution dateResolution);

(I also added the corresponding getter methods.) Now the user can set a default date resolution used for all fields or, with the second method, field specific date resolutions. The initial default resolution, which is used if the user does not set a different resolution, is DateTools.Resolution.DAY. Please let me know if you think we should use a different resolution as default. I extended TestQueryParser to test this new feature. All unit tests pass.
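The interplay between a default resolution and field specific resolutions can be sketched in plain Java. This is not the code from queryparser_datetools.patch: the class DateResolutionSketch, its Resolution enum and the term formats (for example yyyyMMdd for DAY, in UTC) are assumptions made for illustration, loosely modelled on how DateTools is described in the issue.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.HashMap;
import java.util.Map;

/** Illustrative only; mirrors the two setters from the issue without using Lucene. */
final class DateResolutionSketch {
    /** Hypothetical resolutions; each one truncates the date to a sortable string. */
    enum Resolution {
        YEAR("yyyy"), MONTH("yyyyMM"), DAY("yyyyMMdd"), HOUR("yyyyMMddHH");

        final DateTimeFormatter formatter;

        Resolution(String pattern) {
            this.formatter = DateTimeFormatter.ofPattern(pattern).withZone(ZoneOffset.UTC);
        }
    }

    // DAY is the initial default, as stated in the issue description.
    private Resolution defaultResolution = Resolution.DAY;
    private final Map<String, Resolution> fieldResolutions = new HashMap<>();

    /** Sets the default resolution for fields without a specific one. */
    void setDateResolution(Resolution dateResolution) {
        this.defaultResolution = dateResolution;
    }

    /** Sets the resolution for one field. */
    void setDateResolution(String fieldName, Resolution dateResolution) {
        fieldResolutions.put(fieldName, dateResolution);
    }

    /** Returns the field specific resolution, falling back to the default. */
    Resolution getDateResolution(String fieldName) {
        return fieldResolutions.getOrDefault(fieldName, defaultResolution);
    }

    /** Converts a date to the term string used for range bounds on that field. */
    String toTerm(String fieldName, Instant date) {
        return getDateResolution(fieldName).formatter.format(date);
    }
}
```

The point of the design is that range bounds are only comparable with indexed values if both were produced with the same resolution, which is why a per-field override is needed next to the global default.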
[jira] Resolved: (LUCENE-722) DEFAULT spelled DEFALT in MoreLikeThis.java
[ http://issues.apache.org/jira/browse/LUCENE-722?page=all ] Daniel Naber resolved LUCENE-722. - Resolution: Fixed Okay, unless there's a third version of that file it's fixed now :-) DEFAULT spelled DEFALT in MoreLikeThis.java --- Key: LUCENE-722 URL: http://issues.apache.org/jira/browse/LUCENE-722 Project: Lucene - Java Issue Type: Bug Components: Search Affects Versions: 2.0.0 Environment: all Reporter: Andi Vajda Priority: Minor Fix For: 2.1 DEFAULT is spelled DEFALT in contrib/queries/src/java/org/apache/lucene/search/similar/MoreLikeThis.java -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Closed: (LUCENE-656) FieldsInfo uses deprecated API
[ http://issues.apache.org/jira/browse/LUCENE-656?page=all ] Daniel Naber closed LUCENE-656. --- Resolution: Fixed Thanks, patch is committed. FieldsInfo uses deprecated API -- Key: LUCENE-656 URL: http://issues.apache.org/jira/browse/LUCENE-656 Project: Lucene - Java Issue Type: Improvement Components: Index Affects Versions: 2.0.1 Reporter: Simon Willnauer Priority: Minor Attachments: FieldsInfo.diff The class FieldsInfo.java uses deprecated API in method public void add(Document doc). I used the replacement and created the patch, see attachment.
[jira] Closed: (LUCENE-649) Fixed Spelling mailinglist.xml
[ http://issues.apache.org/jira/browse/LUCENE-649?page=all ] Daniel Naber closed LUCENE-649. --- Resolution: Fixed Thanks, committed. Fixed Spelling mailinglist.xml -- Key: LUCENE-649 URL: http://issues.apache.org/jira/browse/LUCENE-649 Project: Lucene - Java Issue Type: Improvement Components: Website Affects Versions: 2.0.1 Reporter: Simon Willnauer Priority: Trivial Attachments: mailinglist_xml.diff Just fixed some spelling in the mailinglist.xml in /java/trunk/xdocs -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources
[ http://issues.apache.org/jira/browse/LUCENE-388?page=comments#action_12427967 ] Daniel Naber commented on LUCENE-388: - Hi Yonik, I just tested the patch: sorry, but the problem is the same as before: I get an OutOfMemoryError using settings that worked without the patch. That doesn't mean that the patch is wrong of course, but as we're after performance improvements it wouldn't make sense to compare it to the old version which uses less memory. [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources Key: LUCENE-388 URL: http://issues.apache.org/jira/browse/LUCENE-388 Project: Lucene - Java Issue Type: Bug Components: Index Affects Versions: CVS Nightly - Specify date in submission Environment: Operating System: Mac OS X 10.3 Platform: Macintosh Reporter: Paul Smith Assigned To: Yonik Seeley Attachments: IndexWriter.patch, log-compound.txt, log.optimized.deep.txt, log.optimized.txt, Lucene Performance Test - with without hack.xls, lucene.34930.patch, yonik_indexwriter.diff Note: I believe this to be the same situation with 1.4.3 as with SVN HEAD. Analysis using the hprof utility during index creation with many documents shows that the CPU spends a large portion of its time in IndexWriter.maybeMergeSegments(), which seems to be a 'waste' compared with other valuable CPU intensive operations such as tokenization etc. Using the following test snippet to retrieve some rows from the db and create an index:

    Analyzer a = new StandardAnalyzer();
    writer = new IndexWriter(indexDir, a, true);
    writer.setMergeFactor(1000);
    writer.setMaxBufferedDocs(1);
    writer.setUseCompoundFile(false);
    connection = DriverManager.getConnection(
        "jdbc:inetdae7:tower.aconex.com?database=somedb", "secret", "squirrel");
    String sql = "select userid, userfirstname, userlastname, email from userx";
    LOG.info("sql=" + sql);
    Statement statement = connection.createStatement();
    statement.setFetchSize(5000);
    LOG.info("Executing sql");
    ResultSet rs = statement.executeQuery(sql);
    LOG.info("ResultSet retrieved");
    int row = 0;
    LOG.info("Indexing users");
    long begin = System.currentTimeMillis();
    while (rs.next()) {
        int userid = rs.getInt(1);
        String firstname = rs.getString(2);
        String lastname = rs.getString(3);
        String email = rs.getString(4);
        String fullName = firstname + " " + lastname;
        Document doc = new Document();
        doc.add(Field.Keyword("userid", userid + ""));
        doc.add(Field.Keyword("firstname", firstname.toLowerCase()));
        doc.add(Field.Keyword("lastname", lastname.toLowerCase()));
        doc.add(Field.Text("name", fullName.toLowerCase()));
        doc.add(Field.Keyword("email", email.toLowerCase()));
        writer.addDocument(doc);
        row++;
        if ((row % 100) == 0) {
            LOG.info(row + " indexed");
        }
    }
    double end = System.currentTimeMillis();
    double diff = (end - begin) / 1000;
    double rate = row / diff;
    LOG.info("rate: " + rate);

On my 1.5GHz PowerBook with 1.5GB RAM and a 5400 RPM drive, my CPU is maxed out, and I end up getting a rate of indexing between 490-515 documents/second run over 10 times in succession. By applying a simple patch to IndexWriter (see attached shortly), which defers the calling of maybeMergeSegments() so that it is only called every 2000 times (an arbitrary figure), I appear to get a new rate of between 945-970 documents/second. Using Luke to look inside each index created between these 2 there does not appear to be any difference. Same number of Documents, same number of Terms. I'm not suggesting one should apply this patch, I'm just highlighting the difference in performance that this sort of change gives you. We are about to use Lucene to index 4 million construction document records, and so speeding up the indexing process is in our best interest! :) If one considers the amount of CPU time spent in maybeMergeSegments over the initial index creation of 4 million documents, I think one could see how it would be ideal to try to speed this area up (at least move the bottleneck to IO). I would appreciate anyone taking a moment to comment on this.
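The deferral described by the reporter (run the expensive merge check only every N added documents instead of after every document) can be sketched as a simple counter. This is a paraphrase of the idea, not the attached IndexWriter.patch; the class DeferredMergeCheck is hypothetical and its maybeMergeSegments() is a stub that only counts invocations.

```java
/** Hypothetical sketch of deferring an expensive per-document check. */
final class DeferredMergeCheck {
    private final int interval;      // run the check once per this many documents
    private int sinceLastCheck = 0;  // documents added since the last check
    private int checksRun = 0;       // how often the expensive check actually ran

    DeferredMergeCheck(int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        this.interval = interval;
    }

    /** Called once for every added document. */
    void onDocumentAdded() {
        if (++sinceLastCheck >= interval) {
            sinceLastCheck = 0;
            maybeMergeSegments();
        }
    }

    /** Stand-in for the expensive check; here it only counts invocations. */
    private void maybeMergeSegments() {
        checksRun++;
    }

    int checksRun() {
        return checksRun;
    }
}
```

The trade-off is the one raised in the comment above: fewer checks mean more buffered, unmerged state between checks, so CPU time is saved at the risk of higher memory usage.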
[jira] Reopened: (LUCENE-388) [PATCH] IndexWriter.maybeMergeSegments() takes lots of CPU resources
[ http://issues.apache.org/jira/browse/LUCENE-388?page=all ] Daniel Naber reopened LUCENE-388: - Something is wrong with this patch (as it has been applied) as it increases memory usage. Indexing files with the IndexFiles demo worked before using writer.setMaxBufferedDocs(50) and a tight JVM memory setting (-Xmx2M), now it fails with an OutOfMemoryError.
Re: svn commit: r428998 - in /lucene/java/trunk: CHANGES.txt src/java/org/apache/lucene/analysis/StopAnalyzer.java src/test/org/apache/lucene/analysis/TestStandardAnalyzer.java
On Saturday 05 August 2006 22:31, Yonik Seeley wrote:
> Stop words and stemming always make literal searching less precise, with the general benefit of greater matching power (more general) and smaller index size.

That's why I gave the "t-online" example: it makes the search result look incorrect but hardly helps reduce index size. "t" and "s" were probably added so that "don't" doesn't get indexed as "don", "t", but this doesn't happen anyway, as the StandardTokenizer keeps "don't" as a single token. "'s" is cut off in StandardFilter. In general, this is only a default list and people will need to adapt it anyway. So we should only add the words which are probably stopwords for most users.

Regards Daniel -- http://www.danielnaber.de
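The effect discussed here can be reproduced with a toy analyzer. The sketch below is not StandardTokenizer or StopAnalyzer: it assumes a simplistic tokenizer that splits on anything other than letters and apostrophes, so "t-online" is split at the hyphen by assumption. The class StopWordSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy analyzer for illustration; the real Lucene tokenizers behave differently in detail. */
final class StopWordSketch {
    /** Lowercases, splits on non-letter/non-apostrophe characters, drops stop words. */
    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            if (!token.isEmpty() && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }
}
```

With "t" in the stop list, a search for "t-online" degrades to a search for "online", while "don't" stays a single token and never produces a stray "t" in the first place.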
Re: StopAnalyzer in results.jsp?
On Thursday 03 August 2006 19:31, Michael McCandless wrote:
> But, in the process, I came across this inconsistency: for the Web application demo, the indexing (done by IndexHTML.java) uses the StandardAnalyzer but the searcher (in results.jsp) uses the StopAnalyzer. Shouldn't they be the same? Shouldn't we change results.jsp to use StandardAnalyzer?

Yes, you're right. Thanks for spotting it and for your (upcoming) fixes.

Regards Daniel -- http://www.danielnaber.de
[jira] Resolved: (LUCENE-646) [PATCH] fix various small issues with the getting started demo pages
[ http://issues.apache.org/jira/browse/LUCENE-646?page=all ]
Daniel Naber resolved LUCENE-646.
Resolution: Fixed
Thanks, the patch has been committed and the changes should soon be visible on the web pages.
[PATCH] fix various small issues with the getting started demo pages
Key: LUCENE-646
URL: http://issues.apache.org/jira/browse/LUCENE-646
Project: Lucene - Java
Issue Type: Improvement
Components: Website
Affects Versions: 2.0.0
Reporter: Michael McCandless
Priority: Minor
Attachments: gettingstarted.Aug3.patch
This patch contains numerous small fixes for the getting started pages on the Lucene Java web site. Here are the rough fixes:
* To results.jsp:
  - changed StopAnalyzer -> StandardAnalyzer
  - changed references of url to path (field url is never set and was therefore always null)
  - remove prefix of ../webapps from path so clicking through works
* Fixed typos, grammar and other cosmetic things.
* Modernized some things that have changed with time (names of JAR files, which languages have analyzers, etc.)
* Added outbound links to Javadocs, Wiki, Lucene static web site, external sites, when appropriate.
* Removed exact version of Tomcat for the demo web app (I think all recent versions of Tomcat will work as described)
* Other small changes...
Net/net I think this is an improved version of what's available on the site today.
[jira] Resolved: (LUCENE-641) maxFieldLength actual limit is 1 greater than expected value.
[ http://issues.apache.org/jira/browse/LUCENE-641?page=all ]
Daniel Naber resolved LUCENE-641.
Resolution: Fixed
Thanks for the report, this has now been fixed in SVN trunk.
maxFieldLength actual limit is 1 greater than expected value.
Key: LUCENE-641
URL: http://issues.apache.org/jira/browse/LUCENE-641
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 2.0.0
Environment: JSE 5.0
Reporter: Topbit Du
Priority: Minor
    // Prepare document.
    Document document = new Document();
    document.add(new Field("name", "pattern oriented software architecture", Store.NO, Index.TOKENIZED, TermVector.WITH_POSITIONS_OFFSETS));
    // Set max field length to 2.
    indexWriter.setMaxFieldLength(2);
    // Add document into index.
    indexWriter.addDocument(document, new StandardAnalyzer());
    // Create a query.
    QueryParser queryParser = new QueryParser("name", new StandardAnalyzer());
    Query query = queryParser.parse("software");
    // Search for the 3rd term.
    Hits hits = indexSearcher.search(query);
    Assert.assertEquals(0, hits.length());
    // Failed: actual hits.length() == 1, but expected 0.
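The off-by-one in the report above can be illustrated without Lucene. The following is only a minimal sketch of one way such a bug can arise, not Lucene's actual indexing code: the class and method names (`MaxFieldLengthSketch`, `indexedTermsBuggy`, `indexedTermsFixed`) are hypothetical, and whitespace splitting stands in for the analyzer. It shows how a limit check made after a term has already been added lets `maxFieldLength + 1` terms through.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; none of these names exist in Lucene.
public class MaxFieldLengthSketch {

    // Checks the limit AFTER adding the term, so one term too many is kept.
    static List<String> indexedTermsBuggy(String text, int maxFieldLength) {
        List<String> indexed = new ArrayList<>();
        int length = 0;
        for (String term : text.split("\\s+")) {
            indexed.add(term);
            if (++length > maxFieldLength) {
                break;
            }
        }
        return indexed;
    }

    // Checks the limit BEFORE adding the term, so at most maxFieldLength terms are kept.
    static List<String> indexedTermsFixed(String text, int maxFieldLength) {
        List<String> indexed = new ArrayList<>();
        int length = 0;
        for (String term : text.split("\\s+")) {
            if (length >= maxFieldLength) {
                break;
            }
            indexed.add(term);
            length++;
        }
        return indexed;
    }

    public static void main(String[] args) {
        String text = "pattern oriented software architecture";
        System.out.println(indexedTermsBuggy(text, 2)); // third term "software" is still kept
        System.out.println(indexedTermsFixed(text, 2)); // only the first two terms are kept
    }
}
```

With a limit of 2, the first variant still keeps "software", which matches the unexpected hit described in the report.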
[jira] Commented: (LUCENE-603) index optimize problem
[ http://issues.apache.org/jira/browse/LUCENE-603?page=comments#action_12424388 ]
Daniel Naber commented on LUCENE-603:
Is there any chance you could provide a test case that demonstrates this problem?
index optimize problem
Key: LUCENE-603
URL: http://issues.apache.org/jira/browse/LUCENE-603
Project: Lucene - Java
Issue Type: Bug
Components: Index
Affects Versions: 1.9
Environment: CentOS 4.0, Lucene 1.9, Eclipse 3.1
Reporter: Dedian Guo
I have a function which loops to index batches of documents; after each batch, IndexWriter.optimize is applied. After several iterations (not sure how many, but probably many), the following exception was thrown:
Exception in thread "Thread-0" java.lang.IllegalStateException: docs out of order
    at org.apache.lucene.index.SegmentMerger.appendPostings(SegmentMerger.java:335)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfo(SegmentMerger.java:298)
    at org.apache.lucene.index.SegmentMerger.mergeTermInfos(SegmentMerger.java:272)
    at org.apache.lucene.index.SegmentMerger.mergeTerms(SegmentMerger.java:236)
    at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:89)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:681)
    at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:658)
    at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:517)
[jira] Closed: (LUCENE-638) Can't put non-index files (e.g. CVS, SVN directories) in a Lucene index directory
[ http://issues.apache.org/jira/browse/LUCENE-638?page=all ]
Daniel Naber closed LUCENE-638.
Resolution: Fixed
Thanks, this has now been fixed in trunk.
Can't put non-index files (e.g. CVS, SVN directories) in a Lucene index directory
Key: LUCENE-638
URL: http://issues.apache.org/jira/browse/LUCENE-638
Project: Lucene - Java
Issue Type: Bug
Reporter: Eleanor Joslin
Priority: Minor
Attachments: LuceneTest.java
Lucene won't tolerate foreign files in its index directories. This makes it impossible to keep an index in a CVS or Subversion repository. For instance, this exception appears when creating a RAMDirectory from a java.io.File that contains a subdirectory called .svn.
java.io.FileNotFoundException: /home/local/ejj/ic/.caches/.search/.index/.svn (Is a directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
    at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
    at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:61)
    at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:86)
[jira] Commented: (LUCENE-638) Can't put non-index files (e.g. CVS, SVN directories) in a Lucene index directory
[ http://issues.apache.org/jira/browse/LUCENE-638?page=comments#action_12423893 ]
Daniel Naber commented on LUCENE-638:
What exactly does your code look like? Something else must be wrong because I use an index that's committed to CVS without problems (using Lucene 2.0).
Can't put non-index files (e.g. CVS, SVN directories) in a Lucene index directory
Key: LUCENE-638
URL: http://issues.apache.org/jira/browse/LUCENE-638
Project: Lucene - Java
Issue Type: Bug
Reporter: Eleanor Joslin
Priority: Minor
Lucene won't tolerate foreign files in its index directories. This makes it impossible to keep an index in a CVS or Subversion repository. For instance, this exception appears when creating a RAMDirectory from a java.io.File that contains a subdirectory called .svn.
java.io.FileNotFoundException: /home/local/ejj/ic/.caches/.search/.index/.svn (Is a directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:212)
    at org.apache.lucene.store.FSIndexInput$Descriptor.<init>(FSDirectory.java:425)
    at org.apache.lucene.store.FSIndexInput.<init>(FSDirectory.java:434)
    at org.apache.lucene.store.FSDirectory.openInput(FSDirectory.java:324)
    at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:61)
    at org.apache.lucene.store.RAMDirectory.<init>(RAMDirectory.java:86)
[jira] Closed: (LUCENE-634) QueryParser is not applicable for the arguments (String, String, Analyzer) error in results.jsp when executing search in the browser (demo from Lucene 2.0)
[ http://issues.apache.org/jira/browse/LUCENE-634?page=all ]
Daniel Naber closed LUCENE-634.
Fix Version/s: 2.0.1
Resolution: Fixed
This has been fixed after 2.0.
QueryParser is not applicable for the arguments (String, String, Analyzer) error in results.jsp when executing search in the browser (demo from Lucene 2.0)
Key: LUCENE-634
URL: http://issues.apache.org/jira/browse/LUCENE-634
Project: Lucene - Java
Issue Type: Bug
Affects Versions: 2.0.0
Environment: Windows XP, Tomcat 5.5
Reporter: Aliaksandr Birukou
Fix For: 2.0.1
When executing a search in the browser (as described in demo3.html of the Lucene demo) I get an error, because the demo uses a method (QueryParser's parse with three arguments) which has been deleted (it was deprecated). I checked the demo from Lucene 1.4-final with Lucene 1.4-final; it works, because at that time the method was still there. But the demo from Lucene 2.0 does not work with Lucene 2.0. The error stack is here:
HTTP Status 500 -
type Exception report
message
description The server encountered an internal error () that prevented it from fulfilling this request.
exception
org.apache.jasper.JasperException: Unable to compile class for JSP
An error occurred at line: 60 in the jsp file: /results.jsp
Generated servlet error: The method parse(String) in the type QueryParser is not applicable for the arguments (String, String, Analyzer)
    org.apache.jasper.servlet.JspServletWrapper.handleJspException(JspServletWrapper.java:510)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:375)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
root cause
org.apache.jasper.JasperException: Unable to compile class for JSP
An error occurred at line: 60 in the jsp file: /results.jsp
Generated servlet error: The method parse(String) in the type QueryParser is not applicable for the arguments (String, String, Analyzer)
    org.apache.jasper.compiler.DefaultErrorHandler.javacError(DefaultErrorHandler.java:84)
    org.apache.jasper.compiler.ErrorDispatcher.javacError(ErrorDispatcher.java:328)
    org.apache.jasper.compiler.JDTCompiler.generateClass(JDTCompiler.java:409)
    org.apache.jasper.compiler.Compiler.compile(Compiler.java:297)
    org.apache.jasper.compiler.Compiler.compile(Compiler.java:276)
    org.apache.jasper.compiler.Compiler.compile(Compiler.java:264)
    org.apache.jasper.JspCompilationContext.compile(JspCompilationContext.java:563)
    org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:303)
    org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:314)
    org.apache.jasper.servlet.JspServlet.service(JspServlet.java:264)
    javax.servlet.http.HttpServlet.service(HttpServlet.java:802)
note The full stack trace of the root cause is available in the Apache Tomcat/5.5.15 logs.
ant javacc-QueryParser
Hi, as I cannot get ant javacc-QueryParser working I manually applied the changes from my latest commit to QueryParser.java. The change was very simple so I think this should be okay. Maybe someone can run ant javacc-QueryParser just to be sure. Regards Daniel -- http://www.danielnaber.de - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
Re: ant javacc-QueryParser
On Sunday 23 July 2006 15:02, Simon Willnauer wrote:
> Did you set the property in your common-build.xml?
Yes, but I always get "Could not create task or type of type: javacc". I use a javacc that I downloaded and installed (i.e. unpacked) manually.
Regards
Daniel
--
http://www.danielnaber.de
Re: svn commit: r424449 - /lucene/java/trunk/src/java/org/apache/lucene/document/DateTools.java
On Saturday 22 July 2006 07:58, Chris Hostetter wrote:
> however i'm not sure if the performance benefits of the static instances Daniel mentioned in his commit will exist in a multithreaded app (the synchronization costs may outweigh the instantiation costs)
I created a micro benchmark with 2 to 4 threads and the new version was faster by a factor of at least 2.
Regards
Daniel
--
http://www.danielnaber.de
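The trade-off discussed in this exchange (one shared, synchronized formatter versus a new formatter per call) can be sketched as follows. This is an illustration only, not the code committed to DateTools: the class name `DateFormatSketch`, both method names, the `yyyyMMdd` pattern and the GMT time zone are assumptions chosen for the example. `SimpleDateFormat` is not thread-safe, which is why the shared instance is guarded by a lock.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical illustration; not the DateTools implementation.
public class DateFormatSketch {

    private static final TimeZone GMT = TimeZone.getTimeZone("GMT");

    // One shared instance, created once and reused by all threads.
    private static final SimpleDateFormat SHARED = newFormat();

    private static SimpleDateFormat newFormat() {
        SimpleDateFormat format = new SimpleDateFormat("yyyyMMdd", Locale.US);
        format.setTimeZone(GMT);
        return format;
    }

    // Variant 1: pay for synchronization, avoid repeated instantiation.
    static String formatShared(Date date) {
        synchronized (SHARED) {
            return SHARED.format(date);
        }
    }

    // Variant 2: no lock contention, but a new formatter on every call.
    static String formatPerCall(Date date) {
        return newFormat().format(date);
    }

    public static void main(String[] args) {
        Date epoch = new Date(0L);
        System.out.println(formatShared(epoch));
        System.out.println(formatPerCall(epoch));
    }
}
```

Both variants produce the same string; which one is faster under contention depends on the JVM and the number of threads, so it has to be measured, as was done with the micro benchmark mentioned above.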
[jira] Closed: (LUCENE-630) results.jsp in luceneweb.war uses unknown parse-Method
[ http://issues.apache.org/jira/browse/LUCENE-630?page=all ]
Daniel Naber closed LUCENE-630.
Fix Version/s: 2.0.1
Resolution: Fixed
This was fixed some time ago (after the 2.0 release).
results.jsp in luceneweb.war uses unknown parse-Method
Key: LUCENE-630
URL: http://issues.apache.org/jira/browse/LUCENE-630
Project: Lucene - Java
Issue Type: Bug
Components: Examples
Affects Versions: 2.0.0
Environment: Windows XP Pro and Linux (Ubuntu 6.06 LTS), Tomcat 5.5, Sun Java 1.5_07
Reporter: Philip Reimer
Priority: Trivial
Fix For: 2.0.1
results.jsp in the luceneweb.war demo throws a JasperException:
org.apache.jasper.JasperException: Unable to compile class for JSP
An error occurred at line: 60 in the jsp file: /results.jsp
Generated servlet error: The method parse(String) in the type QueryParser is not applicable for the arguments (String, String, Analyzer)
I think the code in line 81 of results.jsp should maybe look like the following:
    QueryParser qp = new QueryParser("contents", analyzer);
    query = qp.parse(queryString);
[jira] Closed: (LUCENE-101) Selecting a language-specific analyzer according to a locale.
[ http://issues.apache.org/jira/browse/LUCENE-101?page=all ]
Daniel Naber closed LUCENE-101.
Resolution: Fixed
Closing, the code changes the original report talks about don't seem to be needed anymore today.
Selecting a language-specific analyzer according to a locale.
Key: LUCENE-101
URL: http://issues.apache.org/jira/browse/LUCENE-101
Project: Lucene - Java
Issue Type: Improvement
Components: Analysis
Affects Versions: unspecified
Environment: Operating System: other Platform: Other
Reporter: Eric Isakson
Priority: Minor
Moved from todo.xml: Now we rewrite parts of Lucene code in order to use another analyzer. It will be useful to select analyzer without touching code. This was originally requested by Kazuhiro Kazama ([EMAIL PROTECTED]) in http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene- [EMAIL PROTECTED]msgId=338928
Not sure if this was completed to Kazuhiro Kazama's satisfaction in the current CVS. We can certainly choose which analyzer to use for a given IndexWriter and QueryParser. It sounded like he was asking for something like a factory that would create an analyzer based on a locale, but unless I don't understand things quite right, searching an index with any analyzer that you didn't create the index with is bound to cause you to have false hits in your results. Perhaps this is fixed or no action should be taken. Can someone with a better understanding of the request comment on this one or close it out?
[jira] Resolved: (LUCENE-608) deprecate Document.fields(), add getFields()
[ http://issues.apache.org/jira/browse/LUCENE-608?page=all ] Daniel Naber resolved LUCENE-608: - Resolution: Fixed The patch has been committed. deprecate Document.fields(), add getFields() Key: LUCENE-608 URL: http://issues.apache.org/jira/browse/LUCENE-608 Project: Lucene - Java Type: Improvement Components: Other Versions: 2.0.0 Reporter: Daniel Naber Fix For: 2.1 Attachments: document.diff A simple API improvement that I'm going to commit if nobody objects. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Created: (LUCENE-608) deprecate Document.fields(), add getFields()
deprecate Document.fields(), add getFields() Key: LUCENE-608 URL: http://issues.apache.org/jira/browse/LUCENE-608 Project: Lucene - Java Type: Improvement Components: Other Versions: 2.0.0 Reporter: Daniel Naber Fix For: 2.1 Attachments: document.diff A simple API improvement that I'm going to commit if nobody objects. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-608) deprecate Document.fields(), add getFields()
[ http://issues.apache.org/jira/browse/LUCENE-608?page=all ] Daniel Naber updated LUCENE-608: Attachment: document.diff deprecate Document.fields(), add getFields() Key: LUCENE-608 URL: http://issues.apache.org/jira/browse/LUCENE-608 Project: Lucene - Java Type: Improvement Components: Other Versions: 2.0.0 Reporter: Daniel Naber Fix For: 2.1 Attachments: document.diff A simple API improvement that I'm going to commit if nobody objects. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Updated: (LUCENE-590) Demo HTML parser gives incorrect summaries when title is repeated as a heading
[ http://issues.apache.org/jira/browse/LUCENE-590?page=all ]
Daniel Naber updated LUCENE-590:
Priority: Minor (was: Major)
decrease priority (affects demo only)
Demo HTML parser gives incorrect summaries when title is repeated as a heading
Key: LUCENE-590
URL: http://issues.apache.org/jira/browse/LUCENE-590
Project: Lucene - Java
Type: Bug
Components: Examples
Versions: 2.0.0
Reporter: Curtis d'Entremont
Priority: Minor
If you have an html document where the title is repeated as a heading at the top of the document, the HTMLParser will return the title as the summary, ignoring everything else that was added to the summary. Instead, it should keep the rest of the summary and chop off the title part at the beginning (essentially the opposite). I don't see any benefit to repeating the title in the summary for any case.
In HTMLParser.jj's getSummary():
    String sum = summary.toString().trim();
    String tit = getTitle();
    if (sum.startsWith(tit) || sum.equals(""))
      return tit;
    else
      return sum;
change it to: (* denotes a line that has changed)
    String sum = summary.toString().trim();
    String tit = getTitle();
  * if (sum.startsWith(tit)) // don't repeat title in summary
  *   return sum.substring(tit.length()).trim();
    else
      return sum;
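The difference between the current and the proposed getSummary() logic in the report above can be tried in isolation. The following is a minimal sketch under assumptions: the class `SummarySketch` and its two static methods are hypothetical stand-ins that take the summary text and title as parameters, whereas the real method lives in the generated HTMLParser and reads them from parser state.

```java
// Hypothetical stand-alone version of the two getSummary() variants.
public class SummarySketch {

    // Behaviour described as current: if the summary starts with the title
    // (or is empty), only the title is returned and the rest is lost.
    static String summaryCurrent(String summary, String title) {
        String sum = summary.trim();
        if (sum.startsWith(title) || sum.equals("")) {
            return title;
        }
        return sum;
    }

    // Behaviour proposed in the report: strip the leading title and keep the rest.
    static String summaryProposed(String summary, String title) {
        String sum = summary.trim();
        if (sum.startsWith(title)) {
            return sum.substring(title.length()).trim();
        }
        return sum;
    }

    public static void main(String[] args) {
        String title = "My Page";
        String summary = "My Page  Some body text.";
        System.out.println(summaryCurrent(summary, title));
        System.out.println(summaryProposed(summary, title));
    }
}
```

Note one consequence of the proposed variant: an empty summary now yields an empty string rather than the title, because the `sum.equals("")` branch is gone.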
[jira] Updated: (LUCENE-525) A standard Lucene install that works for simple web sites
[ http://issues.apache.org/jira/browse/LUCENE-525?page=all ] Daniel Naber updated LUCENE-525: Priority: Minor (was: Major) decrease priority A standard Lucene install that works for simple web sites - Key: LUCENE-525 URL: http://issues.apache.org/jira/browse/LUCENE-525 Project: Lucene - Java Type: New Feature Environment: web site Reporter: Dave Yost Priority: Minor I'm new to Lucene. I would like to be able to download a blob, install it, set a few settings, preferably in a GUI, and be on the air with search enabled on my static web site. What I find on the Examples page is nothing like this. It is a collection of stuff that leads me to believe that I'll have to become expert in all sorts of Lucene arcana before I can get to my goal. -- This message is automatically generated by JIRA. - If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa - For more information on JIRA, see: http://www.atlassian.com/software/jira - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-562) Allow Unstored AND Unindexed Fields as in 1.4
[ http://issues.apache.org/jira/browse/LUCENE-562?page=comments#action_12416415 ]
Daniel Naber commented on LUCENE-562:
I think this should be closed as "won't fix". You could either write your own wrapper class or just use an indexed or stored field that later gets removed. The stored/indexed value should only have an effect once the document is added to the index.
Allow Unstored AND Unindexed Fields as in 1.4
Key: LUCENE-562
URL: http://issues.apache.org/jira/browse/LUCENE-562
Project: Lucene - Java
Type: Bug
Versions: 1.9
Reporter: Sam Hough
Priority: Minor
In 1.4 it was possible to have a field that was neither indexed nor stored. This was useful for passing information that Lucene should ignore but that layers on top of it should pick up. This saves the need for an extra class to wrap a Lucene Document. Sorry it has taken me two years to spot the change: http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/src/java/org/apache/lucene/document/Field.java?rev=150206&r1=149967&r2=150206&diff_format=h
I have to admit that this really isn't a Lucene bug, but the 1.4 behaviour was really handy, like XML processing instructions.
[jira] Commented: (LUCENE-559) Turkish Analyzer for Lucene
[ http://issues.apache.org/jira/browse/LUCENE-559?page=comments#action_12416407 ]
Daniel Naber commented on LUCENE-559:
Thanks for your contribution. Could you write some unit tests for your classes, similar to the existing tests for other languages?
Turkish Analyzer for Lucene
Key: LUCENE-559
URL: http://issues.apache.org/jira/browse/LUCENE-559
Project: Lucene - Java
Type: Improvement
Components: Analysis
Reporter: Emre Bayram
Attachments: TurkishAnalyzer.java, TurkishAnalyzer.java, TurkishStemFilter.java, TurkishStemFilter.java, TurkishStemmer.java, TurkishStemmer.java
I have developed an Analyzer for Turkish, based on the German and Brazilian language analyzers. This Turkish Analyzer supports the iso-8859-9 (Turkish) character set and has a good set of stop words. I hope it can help Turkish developers who use Lucene (I searched for many hours for a Turkish analyzer for Lucene but couldn't find one, so I wrote it and am sending it here).
[jira] Commented: (LUCENE-101) Selecting a language-specific analyzer according to a locale.
[ http://issues.apache.org/jira/browse/LUCENE-101?page=comments#action_12416410 ]
Daniel Naber commented on LUCENE-101:
The URL from the original report doesn't work anymore, I think it refers to this post: http://mail-archives.apache.org/mod_mbox/lucene-java-dev/200205.mbox/20020522.153124.14421363.kazama%40ingrid.org
I guess this report can be closed?
Selecting a language-specific analyzer according to a locale.
Key: LUCENE-101
URL: http://issues.apache.org/jira/browse/LUCENE-101
Project: Lucene - Java
Type: Improvement
Components: Analysis
Versions: unspecified
Environment: Operating System: other Platform: Other
Reporter: Eric Isakson
Priority: Minor
Moved from todo.xml: Now we rewrite parts of Lucene code in order to use another analyzer. It will be useful to select analyzer without touching code. This was originally requested by Kazuhiro Kazama ([EMAIL PROTECTED]) in http://nagoya.apache.org/eyebrowse/ReadMsg?listName=lucene- [EMAIL PROTECTED]msgId=338928
Not sure if this was completed to Kazuhiro Kazama's satisfaction in the current CVS. We can certainly choose which analyzer to use for a given IndexWriter and QueryParser. It sounded like he was asking for something like a factory that would create an analyzer based on a locale, but unless I don't understand things quite right, searching an index with any analyzer that you didn't create the index with is bound to cause you to have false hits in your results. Perhaps this is fixed or no action should be taken. Can someone with a better understanding of the request comment on this one or close it out?
[jira] Updated: (LUCENE-259) HTML Parser doesn't decode character references in attributes
[ http://issues.apache.org/jira/browse/LUCENE-259?page=all ]
Daniel Naber updated LUCENE-259:
Bugzilla Id: (was: 30621)
Assign To: (was: Lucene Developers)
Priority: Minor (was: Major)
Decrease priority because this affects the demo only.
HTML Parser doesn't decode character references in attributes
Key: LUCENE-259
URL: http://issues.apache.org/jira/browse/LUCENE-259
Project: Lucene - Java
Type: Bug
Components: Examples
Versions: 1.4
Environment: Operating System: All Platform: All
Reporter: Dave Sparks
Priority: Minor
The HTML Parser includes the values of certain attributes in the summary, the metaTags and the output stream. Character references in the attribute values are not decoded. Specifically:
1. The value of the alt= attribute of an <img ...> tag is included in the summary and the output stream. This value is case-significant, and may include character references. The character references are not decoded.
2. The value of the content= attribute of a <meta ...> tag is included in the metaTags if the tag also has a name= or http-equiv= attribute. This value is case-significant, and may include character references. The character references are not decoded, and the value is downcased (since the fix to bug #27423).
I've patched our version of the parser to decode the character references, by adding a decodeAll method to Entities to parse a String for character references and return a String where the references have been replaced by the corresponding characters (or the original String, if no change is needed). This method is called to decode alt= attributes and content= attributes. I've removed the .toLowerCase() on the content= value. I'm not really happy with this fix, as it seems to me to be wrong to parse a value which was previously parsed as a single token; there ought to be a way to get it right the first time.
I've left the name= and http-equiv= values alone. It's not entirely clear (to me) whether character references are allowed, and it would be perverse to use them here. I also appreciate the convenience of having a single combined namespace, with downcased names, even though this is technically wrong.
[jira] Closed: (LUCENE-587) Explanation.toHtml outputs invalid HTML
[ http://issues.apache.org/jira/browse/LUCENE-587?page=all ]
Daniel Naber closed LUCENE-587:
Resolution: Fixed
Sorry, I must have looked at the wrong output. You're right, it seems to be okay now.
Explanation.toHtml outputs invalid HTML
Key: LUCENE-587
URL: http://issues.apache.org/jira/browse/LUCENE-587
Project: Lucene - Java
Type: Bug
Components: Search
Versions: 2.0.0
Reporter: Trejkaz
Assignee: Hoss Man
If you want an HTML representation of an Explanation, you might call the toHtml() method. However, the output of this method looks like the following:
    <ul>
      <li>some value = some description</li>
      <ul>
        <li>some nested value = some description</li>
      </ul>
    </ul>
As it is illegal in HTML to nest a UL directly inside a UL, this method will always output unparseable HTML if there are nested explanations. What Lucene probably means to output is the following, which is valid HTML:
    <ul>
      <li>some value = some description
        <ul>
          <li>some nested value = some description</li>
        </ul>
      </li>
    </ul>
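The valid nesting described in the report (a nested list placed inside its parent list item) can be sketched with a small stand-alone tree. This is not Lucene's Explanation class: `ExplanationHtmlSketch`, its `text` field and its `add` method are hypothetical names used only for illustration, and the text is emitted without HTML escaping to keep the example short.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a nested explanation; not Lucene's Explanation.
public class ExplanationHtmlSketch {

    final String text;
    final List<ExplanationHtmlSketch> details = new ArrayList<>();

    ExplanationHtmlSketch(String text) {
        this.text = text;
    }

    ExplanationHtmlSketch add(ExplanationHtmlSketch detail) {
        details.add(detail);
        return this;
    }

    // Renders the tree as one outer list containing this node as its single item.
    String toHtml() {
        StringBuilder sb = new StringBuilder();
        sb.append("<ul>\n");
        appendItem(sb);
        sb.append("</ul>\n");
        return sb.toString();
    }

    // The nested <ul> is written BEFORE the closing </li>, so it is a child
    // of the list item rather than a direct child of the outer <ul>.
    private void appendItem(StringBuilder sb) {
        sb.append("<li>").append(text); // no escaping in this sketch
        if (!details.isEmpty()) {
            sb.append("\n<ul>\n");
            for (ExplanationHtmlSketch detail : details) {
                detail.appendItem(sb);
            }
            sb.append("</ul>\n");
        }
        sb.append("</li>\n");
    }

    public static void main(String[] args) {
        ExplanationHtmlSketch root = new ExplanationHtmlSketch("some value = some description")
                .add(new ExplanationHtmlSketch("some nested value = some description"));
        System.out.print(root.toHtml());
    }
}
```

The only design point is the order of the closing tags: closing the item after its nested list is what keeps every `<ul>` a child of an `<li>`.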