Re: Summer of Code idea for lucene
We have almost finished implementing BM25 on top of the Lucene structure, but we need help to finish the query parser and other details. If you or somebody else would like to help, we can send you the code so that you can help us implement the query parser and prepare the code for the sandbox. If people are interested, I can make a web page for the project and put our implementation up for download.

Is anybody interested?

jose

--
José Ramón Pérez Agüera
Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid

On Sat, Mar 15, 2008 at 5:32 AM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> If no one objects (I don't think it's too late)
>
> would you mind a GSOC project to implement BM25 relevancy/scoring?

- To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]
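For readers new to the topic, here is a minimal sketch of the textbook BM25 term score that the proposal refers to. It is not the implementation mentioned in this message, which the thread does not show. The class name `Bm25Sketch`, the method names, the parameter defaults k1 = 1.2 and b = 0.75, and the "1 +" form of the IDF are all assumptions chosen for illustration.

```java
// Hypothetical illustration of the textbook BM25 formula; not code from this thread.
final class Bm25Sketch {
    // Common textbook defaults (an assumption, not values taken from the thread).
    static final double K1 = 1.2;
    static final double B = 0.75;

    // One common IDF variant; the "1 +" keeps the value positive for frequent terms.
    static double idf(long docCount, long docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Score contribution of a single query term for a single document.
    static double termScore(double tf, long docFreq, long docCount,
                            double docLen, double avgDocLen) {
        // Length normalisation: documents longer than average are penalised.
        double norm = K1 * (1.0 - B + B * docLen / avgDocLen);
        return idf(docCount, docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        // A term occurring in 10 of 1000 documents, scored for two document lengths.
        System.out.println(termScore(3.0, 10, 1000, 100.0, 100.0));
        System.out.println(termScore(3.0, 10, 1000, 400.0, 100.0));
    }
}
```

A document score is the sum of `termScore` over the query terms. The score saturates as the term frequency grows, which is the main behavioural difference from a plain tf-idf weight.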
Re: [jira] Assigned: (LUCENE-1202) Clover setup currently has some problems
OK. I am trying it out. Please disregard any error messages in the meantime.

-Grant

On Mar 14, 2008, at 11:20 PM, Hoss Man (JIRA) wrote:

[ https://issues.apache.org/jira/browse/LUCENE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hoss Man reassigned LUCENE-1202:

Assignee: Grant Ingersoll

I was hoping seeing it again would jog your memory : )

i committed the changes to the build files. if the hudson problem was related to the classpath for clover, this may magically solve that problem -- if not, just make sure whatever directory clover is in gets added to the CLASSPATH before running ant.

Committed revision 637344.

assigning to you to track the hudson config fiddling

Clover setup currently has some problems

Key: LUCENE-1202
URL: https://issues.apache.org/jira/browse/LUCENE-1202
Project: Lucene - Java
Issue Type: Bug
Reporter: Hoss Man
Assignee: Grant Ingersoll
Attachments: LUCENE-1202.db-contrib-instrumentation.patch, LUCENE-1202.patch

(tracking as a bug before it gets lost in email... http://www.nabble.com/Clover-reports-missing-from-hudson--to15510616.html#a15510616 )

The clover setup for Lucene currently has some problems, 3 i think...

1) instrumentation fails on contrib/db/ because it contains java packages that the ASF Clover license doesn't allow instrumentation of. i have a patch for this.

2) running instrumented contrib tests for other contribs produces strange errors...

{{monospaced}}
[junit] Testsuite: org.apache.lucene.analysis.el.GreekAnalyzerTest
[junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.126 sec
[junit]
[junit] - Standard Error -
[junit] [CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have Clover in the runtime classpath? (class java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
[junit] - ---
[junit] Testcase: testAnalyzer(org.apache.lucene.analysis.el.GreekAnalyzerTest): Caused an ERROR
[junit] com_cenqua_clover/g
[junit] java.lang.NoClassDefFoundError: com_cenqua_clover/g
[junit]     at org.apache.lucene.analysis.el.GreekAnalyzer.(GreekAnalyzer.java:157)
[junit]     at org.apache.lucene.analysis.el.GreekAnalyzerTest.testAnalyzer(GreekAnalyzerTest.java:60)
[junit]
[junit]
[junit] Test org.apache.lucene.analysis.el.GreekAnalyzerTest FAILED
{{monospaced}}

...i'm not sure what's going on here. the error seems to happen both when trying to run clover on just a single contrib and when doing the full build ... i suspect there is an issue with the way the batchtests fork off, but I can't see why it would only happen to contribs (the regular tests fork as well)

3) according to Grant...

{{quote}}
...There is also a bit of a change on Hudson during the migration to the new servers that needs to be ironed out.
{{quote}}

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Build failed in Hudson: Lucene-trunk #404
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/404/changes

--
started
Building remotely on lucene.zones.apache.org
FATAL: remote file operation failed
hudson.util.IOException2: remote file operation failed
    at hudson.FilePath.act(FilePath.java:304)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:346)
    at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:299)
    at hudson.model.AbstractProject.checkout(AbstractProject.java:564)
    at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:258)
    at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:221)
    at hudson.model.Run.run(Run.java:659)
    at hudson.model.Build.run(Build.java:101)
    at hudson.model.ResourceController.execute(ResourceController.java:70)
    at hudson.model.Executor.run(Executor.java:71)
Caused by: java.io.IOException: Unable to delete http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/docs/api/org/apache/lucene
    at hudson.Util.deleteFile(Util.java:171)
    at hudson.Util.deleteRecursive(Util.java:178)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.Util.deleteRecursive(Util.java:177)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.Util.deleteRecursive(Util.java:177)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.Util.deleteRecursive(Util.java:177)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.Util.deleteRecursive(Util.java:177)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.Util.deleteRecursive(Util.java:177)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.Util.deleteRecursive(Util.java:177)
    at hudson.Util.deleteContentsRecursive(Util.java:142)
    at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:392)
    at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:352)
    at hudson.FilePath$FileCallableWrapper.call(FilePath.java:1091)
    at hudson.remoting.UserRequest.perform(UserRequest.java:69)
    at hudson.remoting.UserRequest.perform(UserRequest.java:23)
    at hudson.remoting.Request$2.run(Request.java:200)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
    at java.lang.Thread.run(Thread.java:619)
[jira] Updated: (LUCENE-1236) EdgeNGram* documentation improvement
[ https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll updated LUCENE-1236:

Comment: was deleted

> EdgeNGram* documentation improvement
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Priority: Trivial
> Attachments: EdgeNGram.patch
>
> To clarify what "edge" means, I added some description. That edge means the
> beginning edge of a term or ending edge of a term.
[jira] Commented: (LUCENE-1236) EdgeNGram* documentation improvement
[ https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579065#action_12579065 ]

Grant Ingersoll commented on LUCENE-1236:

Hi Hiroaki,

Thanks for the patch. I will apply the doc changes, but please don't combine other functionality into a patch (my guess is you still had some of the other NGram patches applied).

Thanks,
Grant

> EdgeNGram* documentation improvement
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Priority: Trivial
> Attachments: EdgeNGram.patch
>
> To clarify what "edge" means, I added some description. That edge means the
> beginning edge of a term or ending edge of a term.
[jira] Assigned: (LUCENE-1236) EdgeNGram* documentation improvement
[ https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll reassigned LUCENE-1236:

Assignee: Grant Ingersoll

> EdgeNGram* documentation improvement
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Assignee: Grant Ingersoll
> Priority: Trivial
> Attachments: EdgeNGram.patch
>
> To clarify what "edge" means, I added some description. That edge means the
> beginning edge of a term or ending edge of a term.
[jira] Resolved: (LUCENE-1236) EdgeNGram* documentation improvement
[ https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Grant Ingersoll resolved LUCENE-1236.

Resolution: Fixed
Lucene Fields: [New, Patch Available] (was: [Patch Available, New])

Committed, minus the >1024 clause. Also removed existing author tags.

> EdgeNGram* documentation improvement
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Assignee: Grant Ingersoll
> Priority: Trivial
> Attachments: EdgeNGram.patch
>
> To clarify what "edge" means, I added some description. That edge means the
> beginning edge of a term or ending edge of a term.
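To make the "edge" wording in the issue concrete, here is a small stand-alone sketch that produces the n-grams anchored at the beginning or at the ending edge of a term. It does not use the Lucene classes discussed in the issue; `EdgeNGramSketch`, `Side`, and `edgeNGrams` are hypothetical names for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration; it does not call the Lucene EdgeNGram* classes.
final class EdgeNGramSketch {
    enum Side { FRONT, BACK }

    // Returns the n-grams of length minGram..maxGram anchored at one edge of the term.
    static List<String> edgeNGrams(String term, Side side, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int n = minGram; n <= upper; n++) {
            grams.add(side == Side.FRONT
                    ? term.substring(0, n)                   // beginning edge
                    : term.substring(term.length() - n));    // ending edge
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(edgeNGrams("lucene", Side.FRONT, 1, 3)); // [l, lu, luc]
        System.out.println(edgeNGrams("lucene", Side.BACK, 1, 3));  // [e, ne, ene]
    }
}
```

Every gram touches one end of the term, in contrast to a plain n-gram filter, which also emits grams from the middle.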
[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579069#action_12579069 ]

Grant Ingersoll commented on LUCENE-1224:

Please add unit tests to the patch demonstrating the issue.

> NGramTokenFilter creates bad TokenStream
>
> Key: LUCENE-1224
> URL: https://issues.apache.org/jira/browse/LUCENE-1224
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Assignee: Grant Ingersoll
> Priority: Critical
> Attachments: NGramTokenFilter.patch, NGramTokenFilter.patch
>
> With the current trunk NGramTokenFilter(min=2,max=4), I index the string "abcdef",
> but I can't query it with "abc". If I query with "ab", I get a hit.
> The reason is that the NGramTokenFilter generates a badly ordered TokenStream.
> A query depends on the Token order in the TokenStream; how stemming or
> phrases should be analyzed is based on that order (Token.positionIncrement).
> With the current filter, the query string "abc" is tokenized to: ab bc abc
> meaning "query a string that has ab bc abc in this order".
> The expected filter will generate: ab abc(positionIncrement=0) bc
> meaning "query a string that has (ab|abc) bc in this order".
> I'd like to submit a patch for this issue. :-)
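The two orderings described in the issue can be reproduced with a small stand-alone sketch. It does not use the Lucene API or the attached patch; `NGramPositionSketch`, `Gram`, `bySize`, and `byStart` are hypothetical names, and the only assumption is that the current filter emits grams size by size with an increment of 1 each, as the issue's "ab bc abc" output suggests.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two token orders; not the Lucene filter or the patch.
final class NGramPositionSketch {
    static final class Gram {
        final String text;
        final int positionIncrement;
        Gram(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }
        @Override public String toString() { return text + "/" + positionIncrement; }
    }

    // Order reported in the issue: all grams of one size, then the next size,
    // each advancing the position by 1.
    static List<Gram> bySize(String term, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= term.length(); start++) {
                out.add(new Gram(term.substring(start, start + n), 1));
            }
        }
        return out;
    }

    // Order proposed in the issue: grams grouped by start offset; only the first
    // gram at each offset advances the position, the rest share it (increment 0).
    static List<Gram> byStart(String term, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start < term.length(); start++) {
            boolean first = true;
            for (int n = minGram; n <= maxGram && start + n <= term.length(); n++) {
                out.add(new Gram(term.substring(start, start + n), first ? 1 : 0));
                first = false;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bySize("abc", 2, 4));  // [ab/1, bc/1, abc/1]
        System.out.println(byStart("abc", 2, 4)); // [ab/1, abc/0, bc/1]
    }
}
```

With the second ordering, "ab" and "abc" occupy the same position, so a phrase-style query built from these tokens reads as "(ab|abc) followed by bc", which is the behaviour the reporter expects.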
[jira] Commented: (LUCENE-1227) NGramTokenizer to handle more than 1024 chars
[ https://issues.apache.org/jira/browse/LUCENE-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579068#action_12579068 ]

Grant Ingersoll commented on LUCENE-1227:

Hi Hiroaki,

Thanks for the patch. Can you add unit tests for your patch?

> NGramTokenizer to handle more than 1024 chars
>
> Key: LUCENE-1227
> URL: https://issues.apache.org/jira/browse/LUCENE-1227
> Project: Lucene - Java
> Issue Type: Improvement
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Assignee: Grant Ingersoll
> Priority: Minor
> Attachments: NGramTokenizer.patch, NGramTokenizer.patch
>
> The current NGramTokenizer can't handle a character stream that is longer than
> 1024 chars. This is too short for non-whitespace-separated languages.
> I created a patch for this issue.
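A limit of this kind typically appears when input is read with a single call into a fixed-size buffer. The thread does not confirm that this is the cause in NGramTokenizer, so treat the following as a sketch under that assumption. `ReadFullySketch`, `readOnce`, and `readFully` are hypothetical names, and only standard `java.io` calls are used.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration; not the NGramTokenizer code or the attached patch.
final class ReadFullySketch {
    // Reads at most one buffer's worth of input; anything beyond 1024 chars is lost.
    static String readOnce(Reader in) {
        try {
            char[] buf = new char[1024];
            int n = in.read(buf);
            return n < 0 ? "" : new String(buf, 0, n);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Keeps reading until the reader reports end of stream (-1).
    static String readFully(Reader in) {
        try {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 3000; i++) {
            text.append('x');
        }
        System.out.println(readOnce(new StringReader(text.toString())).length());
        System.out.println(readFully(new StringReader(text.toString())).length());
    }
}
```

Buffering the whole input is the simplest fix to illustrate; a tokenizer that must handle very large streams could instead refill the buffer incrementally while emitting grams.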
[jira] Commented: (LUCENE-1225) NGramTokenizer creates bad TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579070#action_12579070 ]

Grant Ingersoll commented on LUCENE-1225:

Please add unit tests. Also, while not required, you do have several patches w/ the same name. I find it useful to name my patches after the JIRA issue, something like LUCENE-1225.patch. Thanks!

> NGramTokenizer creates bad TokenStream
>
> Key: LUCENE-1225
> URL: https://issues.apache.org/jira/browse/LUCENE-1225
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Priority: Critical
> Attachments: NGramTokenizer.patch
>
> The issue is much the same as
> https://issues.apache.org/jira/browse/LUCENE-1224
Hudson build is back to normal: Lucene-trunk #405
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/405/changes
[jira] Updated: (LUCENE-1225) NGramTokenizer creates bad TokenStream
[ https://issues.apache.org/jira/browse/LUCENE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hiroaki Kawai updated LUCENE-1225:

Attachment: LUCENE-1225.patch

I modified the unit tests to work in a more appropriate way and added a test that indexes and queries. I had to fix my patch again; the fix is included in LUCENE-1225.patch. :-p

Thank you.

> NGramTokenizer creates bad TokenStream
>
> Key: LUCENE-1225
> URL: https://issues.apache.org/jira/browse/LUCENE-1225
> Project: Lucene - Java
> Issue Type: Bug
> Components: contrib/*
> Reporter: Hiroaki Kawai
> Priority: Critical
> Attachments: LUCENE-1225.patch, NGramTokenizer.patch
>
> The issue is much the same as
> https://issues.apache.org/jira/browse/LUCENE-1224