Re: Summer of Code idea for lucene

2008-03-15 Thread José Ramón Pérez Agüera
We have almost finished implementing BM25 using the Lucene structure, but we need
help to finish the query parser and other details. If you or somebody else wants to help,
we can send you the code, and you can help us implement the query
parser and prepare the code for the sandbox.

If there are people interested, I can make a web page for the project
and put our implementation up for download.

Is anybody interested?

jose

-- 
José Ramón Pérez Agüera

Dept. de Ingeniería del Software e Inteligencia Artificial
Despacho 411 tlf. 913947599
Facultad de Informática
Universidad Complutense de Madrid

On Sat, Mar 15, 2008 at 5:32 AM, Ian Holsman <[EMAIL PROTECTED]> wrote:
> If no one objects (I don't think it's too late)
>
>  would you mind a GSOC project to implement BM25 relevancy/scoring?
>
>
>  -
>  To unsubscribe, e-mail: [EMAIL PROTECTED]
>  For additional commands, e-mail: [EMAIL PROTECTED]
>
>

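For readers unfamiliar with BM25, here is a minimal sketch of the standard Okapi BM25 per-term weight in plain Java. It is not the implementation José's group describes, and it does not use Lucene's API. The class name, the defaults k1 = 1.2 and b = 0.75, and the choice of the log(1 + ...) IDF variant are all assumptions.

```java
// Hypothetical illustration of the Okapi BM25 term weight; not Lucene code.
class Bm25Sketch {
    // Commonly cited default parameters; real deployments tune these.
    static final double K1 = 1.2;
    static final double B = 0.75;

    // IDF variant log(1 + (N - n + 0.5) / (n + 0.5)), which stays positive
    // even for terms that occur in more than half of the documents.
    static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    // Contribution of one query term to one document's score.
    // tf: term frequency in the document; docLen: document length in terms;
    // avgDocLen: average document length in the collection.
    static double termScore(double tf, double docLen, double avgDocLen,
                            long numDocs, long docFreq) {
        double lengthNorm = 1.0 - B + B * (docLen / avgDocLen);
        double tfPart = tf * (K1 + 1.0) / (tf + K1 * lengthNorm);
        return idf(numDocs, docFreq) * tfPart;
    }

    public static void main(String[] args) {
        // Same term statistics, different document lengths:
        // the shorter document gets the higher score.
        System.out.println(termScore(3, 50, 100, 1000, 10));
        System.out.println(termScore(3, 200, 100, 1000, 10));
    }
}
```

A document's BM25 score for a query is the sum of termScore over the query terms it contains. The b parameter controls how strongly longer-than-average documents are penalized, and k1 controls how quickly repeated occurrences of a term stop adding to the score.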



Re: [jira] Assigned: (LUCENE-1202) Clover setup currently has some problems

2008-03-15 Thread Grant Ingersoll
OK.  I am trying it out.  Please disregard any error messages in the  
meantime.


-Grant

On Mar 14, 2008, at 11:20 PM, Hoss Man (JIRA) wrote:



[ https://issues.apache.org/jira/browse/LUCENE-1202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel 
 ]


Hoss Man reassigned LUCENE-1202:


   Assignee: Grant Ingersoll


I was hoping that seeing it again would jog your memory :)

I committed the changes to the build files. If the Hudson problem
was related to the classpath for Clover, this may magically solve
it; if not, just make sure whatever directory Clover is
in gets added to the CLASSPATH before running ant.
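A minimal diagnostic sketch, assuming a hypothetical helper class ClasspathCheck (not part of Lucene, Ant, or Clover): it asks the current JVM whether it can load the Clover class named in the "[CLOVER] FATAL ERROR" log line of this issue, and prints java.class.path. It only says something about the contrib failures if it runs inside the same forked test JVM.

```java
// Hypothetical diagnostic helper; checks class visibility in the running JVM.
class ClasspathCheck {
    // Returns true if the named class can be found by this class's loader.
    // initialize=false avoids running the class's static initializers.
    static boolean isLoadable(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: class absent from the classpath.
            // LinkageError: class file found but could not be linked.
            return false;
        }
    }

    public static void main(String[] args) {
        // Class name taken from the Clover error message in this issue.
        String clover = "com_cenqua_clover.CloverVersionInfo";
        System.out.println(clover + " loadable: " + isLoadable(clover));
        System.out.println("java.class.path = " + System.getProperty("java.class.path"));
    }
}
```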


Committed revision 637344.

Assigning to you to track the Hudson config fiddling.



Clover setup currently has some problems


   Key: LUCENE-1202
   URL: https://issues.apache.org/jira/browse/LUCENE-1202
   Project: Lucene - Java
Issue Type: Bug
  Reporter: Hoss Man
  Assignee: Grant Ingersoll
   Attachments: LUCENE-1202.db-contrib-instrumentation.patch, LUCENE-1202.patch



(Tracking as a bug before it gets lost in email:
http://www.nabble.com/Clover-reports-missing-from-hudson--to15510616.html#a15510616 )
The Clover setup for Lucene currently has some problems, three I think:
1) Instrumentation fails on contrib/db/ because it contains Java
packages that the ASF Clover license doesn't allow us to instrument.
I have a patch for this.
2) Running instrumented contrib tests for other contribs produces
strange errors:

{{monospaced}}
   [junit] Testsuite: org.apache.lucene.analysis.el.GreekAnalyzerTest
   [junit] Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0.126 sec
   [junit]
   [junit] - Standard Error -
   [junit] [CLOVER] FATAL ERROR: Clover could not be initialised. Are you sure you have Clover in the runtime classpath? (class java.lang.NoClassDefFoundError:com_cenqua_clover/CloverVersionInfo)
   [junit] -  ---
   [junit] Testcase: testAnalyzer(org.apache.lucene.analysis.el.GreekAnalyzerTest): Caused an ERROR
   [junit] com_cenqua_clover/g
   [junit] java.lang.NoClassDefFoundError: com_cenqua_clover/g
   [junit] at org.apache.lucene.analysis.el.GreekAnalyzer.(GreekAnalyzer.java:157)
   [junit] at org.apache.lucene.analysis.el.GreekAnalyzerTest.testAnalyzer(GreekAnalyzerTest.java:60)
   [junit]
   [junit]
   [junit] Test org.apache.lucene.analysis.el.GreekAnalyzerTest FAILED
{{monospaced}}
...I'm not sure what's going on here. The error seems to happen
both when trying to run Clover on just a single contrib and when doing
the full build. I suspect there is an issue with the way the batchtests
fork off, but I can't see why it would only happen to contribs (the
regular tests fork as well).
3) According to Grant...
{{quote}}
...There is also a bit of a change on Hudson during the migration
to the new servers that needs to be ironed out.
{{quote}}


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



--
Grant Ingersoll
http://www.lucenebootcamp.com
Next Training: April 7, 2008 at ApacheCon Europe in Amsterdam

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ









Build failed in Hudson: Lucene-trunk #404

2008-03-15 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/404/changes

--
started
Building remotely on lucene.zones.apache.org
FATAL: remote file operation failed
hudson.util.IOException2: remote file operation failed
at hudson.FilePath.act(FilePath.java:304)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:346)
at hudson.scm.SubversionSCM.checkout(SubversionSCM.java:299)
at hudson.model.AbstractProject.checkout(AbstractProject.java:564)
at hudson.model.AbstractBuild$AbstractRunner.checkout(AbstractBuild.java:258)
at hudson.model.AbstractBuild$AbstractRunner.run(AbstractBuild.java:221)
at hudson.model.Run.run(Run.java:659)
at hudson.model.Build.run(Build.java:101)
at hudson.model.ResourceController.execute(ResourceController.java:70)
at hudson.model.Executor.run(Executor.java:71)
Caused by: java.io.IOException: Unable to delete http://hudson.zones.apache.org/hudson/job/Lucene-trunk/ws/trunk/build/docs/api/org/apache/lucene
at hudson.Util.deleteFile(Util.java:171)
at hudson.Util.deleteRecursive(Util.java:178)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.Util.deleteRecursive(Util.java:177)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.Util.deleteRecursive(Util.java:177)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.Util.deleteRecursive(Util.java:177)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.Util.deleteRecursive(Util.java:177)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.Util.deleteRecursive(Util.java:177)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.Util.deleteRecursive(Util.java:177)
at hudson.Util.deleteContentsRecursive(Util.java:142)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:392)
at hudson.scm.SubversionSCM$CheckOutTask.invoke(SubversionSCM.java:352)
at hudson.FilePath$FileCallableWrapper.call(FilePath.java:1091)
at hudson.remoting.UserRequest.perform(UserRequest.java:69)
at hudson.remoting.UserRequest.perform(UserRequest.java:23)
at hudson.remoting.Request$2.run(Request.java:200)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
at java.util.concurrent.FutureTask.run(FutureTask.java:138)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:885)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:907)
at java.lang.Thread.run(Thread.java:619)





[jira] Updated: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll updated LUCENE-1236:


Comment: was deleted

> EdgeNGram* documentation improvement
> 
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Priority: Trivial
> Attachments: EdgeNGram.patch
>
>
> To clarify what "edge" means, I added some description: "edge" means either the
> beginning edge or the ending edge of a term.
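To make the two edges concrete, here is a plain-Java illustration of edge n-grams taken from the beginning ("front") edge and from the ending ("back") edge of a term. It does not use the EdgeNGram* classes' API; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of front-edge vs. back-edge n-grams; not Lucene code.
class EdgeNGramSketch {
    // Grams anchored at the beginning edge of the term: a, ab, abc, ...
    static List<String> frontEdge(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        for (int n = minGram; n <= Math.min(maxGram, term.length()); n++) {
            grams.add(term.substring(0, n));
        }
        return grams;
    }

    // Grams anchored at the ending edge of the term: e, de, cde, ...
    static List<String> backEdge(String term, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        int len = term.length();
        for (int n = minGram; n <= Math.min(maxGram, len); n++) {
            grams.add(term.substring(len - n));
        }
        return grams;
    }

    public static void main(String[] args) {
        System.out.println(frontEdge("abcde", 1, 3)); // [a, ab, abc]
        System.out.println(backEdge("abcde", 1, 3));  // [e, de, cde]
    }
}
```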




[jira] Commented: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579065#action_12579065
 ] 

Grant Ingersoll commented on LUCENE-1236:
-

Hi Hiroaki,

Thanks for the patch.  I will apply the doc changes, but please don't combine
other functionality into a patch (my guess is that you still had some of the other
NGram patches applied).

Thanks,
Grant





[jira] Assigned: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll reassigned LUCENE-1236:
---

Assignee: Grant Ingersoll

> EdgeNGram* documentation improvement
> 
>
> Key: LUCENE-1236
> URL: https://issues.apache.org/jira/browse/LUCENE-1236
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Assignee: Grant Ingersoll
>Priority: Trivial
> Attachments: EdgeNGram.patch
>
>
> To clarify what "edge" means, I added some description: "edge" means either the
> beginning edge or the ending edge of a term.




[jira] Resolved: (LUCENE-1236) EdgeNGram* documentation improvement

2008-03-15 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved LUCENE-1236.
-

   Resolution: Fixed
Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])

Committed, minus the >1024 clause. Also removed the existing author tags.





[jira] Commented: (LUCENE-1224) NGramTokenFilter creates bad TokenStream

2008-03-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579069#action_12579069
 ] 

Grant Ingersoll commented on LUCENE-1224:
-

Please add unit tests to the patch demonstrating the issue.

> NGramTokenFilter creates bad TokenStream
> 
>
> Key: LUCENE-1224
> URL: https://issues.apache.org/jira/browse/LUCENE-1224
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Assignee: Grant Ingersoll
>Priority: Critical
> Attachments: NGramTokenFilter.patch, NGramTokenFilter.patch
>
>
> With the current trunk NGramTokenFilter(min=2,max=4), I index the string "abcdef"
> into an index, but I can't query it with "abc". If I query with "ab", I do
> get a hit.
> The reason is that NGramTokenFilter generates a badly ordered TokenStream.
> Querying depends on the Token order in the TokenStream: how stemming or a
> phrase is analyzed depends on that order (Token.positionIncrement).
> With the current filter, the query string "abc" is tokenized to: ab bc abc,
> meaning "query a string that has ab bc abc in this order".
> The expected filter would generate: ab abc(positionIncrement=0) bc,
> meaning "query a string that has (ab|abc) bc in this order".
> I'd like to submit a patch for this issue. :-)
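The ordering difference described in this report can be reproduced with a small plain-Java sketch. It does not use Lucene's Token or NGramTokenFilter API: Gram is a hypothetical stand-in holding only the gram text and its position increment, and the two methods correspond to the "current" (size-major) and "expected" (offset-major) orders as the report describes them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of n-gram emission order; not Lucene code.
class NGramOrderSketch {

    // Stand-in for a token: gram text plus its position increment.
    static final class Gram {
        final String text;
        final int positionIncrement;

        Gram(String text, int positionIncrement) {
            this.text = text;
            this.positionIncrement = positionIncrement;
        }

        @Override
        public String toString() {
            return positionIncrement == 1 ? text : text + "(positionIncrement=" + positionIncrement + ")";
        }
    }

    // Size-major order (the report's "current" behaviour):
    // every gram of the smallest size first, then the next size, and so on.
    static List<Gram> sizeMajor(String term, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int n = minGram; n <= maxGram; n++) {
            for (int start = 0; start + n <= term.length(); start++) {
                out.add(new Gram(term.substring(start, start + n), 1));
            }
        }
        return out;
    }

    // Offset-major order (the report's "expected" behaviour): grams sharing a
    // start offset are emitted together, and only the first advances the position.
    static List<Gram> offsetMajor(String term, int minGram, int maxGram) {
        List<Gram> out = new ArrayList<>();
        for (int start = 0; start + minGram <= term.length(); start++) {
            for (int n = minGram; n <= maxGram && start + n <= term.length(); n++) {
                out.add(new Gram(term.substring(start, start + n), n == minGram ? 1 : 0));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sizeMajor("abc", 2, 4));   // [ab, bc, abc]
        System.out.println(offsetMajor("abc", 2, 4)); // [ab, abc(positionIncrement=0), bc]
    }
}
```

With size-major output, a positional query for "abc" asks for ab, bc and abc at consecutive positions, which the indexed "abcdef" grams do not supply in that order; stacking abc on ab's position removes that mismatch.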




[jira] Commented: (LUCENE-1227) NGramTokenizer to handle more than 1024 chars

2008-03-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579068#action_12579068
 ] 

Grant Ingersoll commented on LUCENE-1227:
-

Hi Hiroaki,

Thanks for the patch.  Can you add unit tests for your patch?

> NGramTokenizer to handle more than 1024 chars
> -
>
> Key: LUCENE-1227
> URL: https://issues.apache.org/jira/browse/LUCENE-1227
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Assignee: Grant Ingersoll
>Priority: Minor
> Attachments: NGramTokenizer.patch, NGramTokenizer.patch
>
>
> The current NGramTokenizer can't handle a character stream longer than 1024
> characters. This is too short for languages that are not whitespace-separated.
> I created a patch for this issue.
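A minimal sketch of the general fix, assuming the 1024-character limit comes from filling a single fixed-size buffer with one read() call (the thread does not say exactly where the limit lives). Reader.read(char[], int, int) may return fewer characters than requested and returns -1 only at end of stream, so reading everything requires a loop. The class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

// Hypothetical helper; not the code in the attached NGramTokenizer patches.
class ReadAllSketch {
    // Reads the entire Reader, looping until read() signals end of stream,
    // instead of relying on one read() into a fixed 1024-char buffer.
    static String readFully(Reader input) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int n;
        while ((n = input.read(buf, 0, buf.length)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        StringBuilder text = new StringBuilder();
        for (int i = 0; i < 3000; i++) {
            text.append((char) ('a' + i % 26));
        }
        String s = readFully(new StringReader(text.toString()));
        System.out.println(s.length()); // prints 3000
    }
}
```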




[jira] Commented: (LUCENE-1225) NGramTokenizer creates bad TokenStream

2008-03-15 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12579070#action_12579070
 ] 

Grant Ingersoll commented on LUCENE-1225:
-

Please add unit tests.  Also, while not required, you do have several patches 
w/ the same name.  I find it useful to name my patches after the JIRA issue, 
something like LUCENE-1225.patch.

Thanks!

> NGramTokenizer creates bad TokenStream
> --
>
> Key: LUCENE-1225
> URL: https://issues.apache.org/jira/browse/LUCENE-1225
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Priority: Critical
> Attachments: NGramTokenizer.patch
>
>
> The issue is much the same as
> https://issues.apache.org/jira/browse/LUCENE-1224




Hudson build is back to normal: Lucene-trunk #405

2008-03-15 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Lucene-trunk/405/changes






[jira] Updated: (LUCENE-1225) NGramTokenizer creates bad TokenStream

2008-03-15 Thread Hiroaki Kawai (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hiroaki Kawai updated LUCENE-1225:
--

Attachment: LUCENE-1225.patch

Modified the unit tests to work in a more appropriate way, and added a test that
indexes and queries.

I had to fix my patch again; the fix is included in LUCENE-1225.patch. :-p
Thank you.

> NGramTokenizer creates bad TokenStream
> --
>
> Key: LUCENE-1225
> URL: https://issues.apache.org/jira/browse/LUCENE-1225
> Project: Lucene - Java
>  Issue Type: Bug
>  Components: contrib/*
>Reporter: Hiroaki Kawai
>Priority: Critical
> Attachments: LUCENE-1225.patch, NGramTokenizer.patch
>
>
> The issue is much the same as
> https://issues.apache.org/jira/browse/LUCENE-1224
