Re: [jira] Closed: (LUCENE-708) Setup nightly build website links and docs

2007-01-04 Thread Slava Imeshev
Grant,

Parabuild is currently running an integration build and a daily build at 

 http://parabuild.viewtier.com:8080/parabuild/index.htm?displaygroupid=5

Viewtier intends to keep supporting Lucene.

Regards,

Slava Imeshev

--- "Grant Ingersoll (JIRA)" <[EMAIL PROTECTED]> wrote:

> 
> [ https://issues.apache.org/jira/browse/LUCENE-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
> 
> Grant Ingersoll closed LUCENE-708.
> --
> 
> 
> > Setup nightly build website links and docs
> > --
> >
> > Key: LUCENE-708
> > URL: https://issues.apache.org/jira/browse/LUCENE-708
> > Project: Lucene - Java
> >  Issue Type: Improvement
> >  Components: Website
> >Reporter: Grant Ingersoll
> > Assigned To: Grant Ingersoll
> >Priority: Minor
> >
> > Per discussion on the mailing list, we are going to set up a Nightly Build 
> > link on the website linking to the docs (and javadocs) generated by the 
> > nightly build process.  The build process may need to be modified to 
> > complete this task.
> > Going forward, the main website will, for the most part, only be updated 
> > per release (I imagine exceptions will be made for News items and at each 
> > committer's discretion).  The Javadocs linked to from the main website 
> > will always be for the latest release.
> 
> -- 
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
> https://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira
> 
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
> 







[jira] Commented: (LUCENE-765) Index package level javadocs needs content

2007-01-04 Thread Daniel Naber (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462203
 ] 

Daniel Naber commented on LUCENE-765:
-

Some of this is already here:
http://lucene.apache.org/java/docs/api/overview-summary.html#overview_description

> Index package level javadocs needs content
> --
>
> Key: LUCENE-765
> URL: https://issues.apache.org/jira/browse/LUCENE-765
> Project: Lucene - Java
>  Issue Type: Task
>  Components: Javadocs
>Reporter: Grant Ingersoll
>Priority: Minor
>
> The org.apache.lucene.index package level javadocs are sorely lacking.  They 
> should be updated to give a summary of the important classes, how indexing 
> works, etc.  Maybe give an overview of how the different writers coordinate.  
> Links to file formats, information on the posting algorithm, etc. would be 
> helpful.
> See the search package javadocs as a sample of the kind of info that could go 
> here.




[jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes

2007-01-04 Thread Marvin Humphrey (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462285
 ] 

Marvin Humphrey commented on LUCENE-510:


Grant... At the moment I am completely consumed by the task of getting a devel 
release of KinoSearch version 0.20 out the door.  Once that is taken care of, I 
will be glad to update this patch, and to explore how to compensate for the 
performance hit it causes.

Chuck... If bytecount-based strings are adopted, standard UTF-8 probably comes 
along for the ride.  There's actually a 1-2% performance gain to be had using 
standard over modified because of simplified conditionals.  What holds us back 
is backwards compatibility -- but we'll have wrecked backwards compat with the 
bytecounts.  However, I no longer have a strong objection to using Modified 
UTF-8 (for Lucene, that is -- Modified UTF-8 would be a deal-breaker for Lucy), 
so if somewhere along the way we find a compelling reason to stick with 
modified UTF-8, so be it.

If bytecount-based strings get adopted, it will be because they hold up on 
their own merits.  They're required for the KinoSearch merge model; once KS 0.20 is 
out, I'll port the new benchmarking stuff, we can study the numbers, and assess 
whether the significant effort needed to pry that algo into Lucene would be 
worthwhile.

Yonik... yes, I agree.  Even better for indexing time, leave postings in 
serialized form for the entire indexing session.  :)
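
The difference between standard and Modified UTF-8 mentioned above can be seen with the JDK alone. The sketch below is only an illustration: it uses java.io.DataOutputStream.writeUTF (which writes a two-byte length followed by Modified UTF-8) and String.getBytes with StandardCharsets.UTF_8, not Lucene's or KinoSearch's own encoders, and the class name Utf8Variants is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

final class Utf8Variants {
    /** Number of payload bytes the JDK's Modified UTF-8 encoder produces for s. */
    public static int modifiedUtf8Length(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.size() - 2; // writeUTF prepends a 2-byte length; exclude it
    }

    /** Number of bytes standard UTF-8 needs for s. */
    public static int standardUtf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    private static void show(String label, String s) {
        System.out.println(label + ": modified=" + modifiedUtf8Length(s)
                + " standard=" + standardUtf8Length(s));
    }

    public static void main(String[] args) {
        show("ASCII", "abc");                   // 3 bytes in both encodings
        show("U+0000", "\0");                   // modified uses 2 bytes, standard 1
        show("U+1D11E", "\uD834\uDD1E");        // modified uses 6 bytes (two encoded surrogates), standard 4
    }
}
```

The two encodings differ only for U+0000 and for characters outside the Basic Multilingual Plane, so ASCII-heavy text yields the same bytes either way.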

> IndexOutput.writeString() should write length in bytes
> --
>
> Key: LUCENE-510
> URL: https://issues.apache.org/jira/browse/LUCENE-510
> Project: Lucene - Java
>  Issue Type: Improvement
>  Components: Store
>Affects Versions: 2.1
>Reporter: Doug Cutting
> Assigned To: Grant Ingersoll
> Fix For: 2.1
>
> Attachments: SortExternal.java, strings.diff, TestSortExternal.java
>
>
> We should change the format of strings written to indexes so that the length 
> of the string is in bytes, not Java characters.  This issue has been 
> discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change.  At least 
> the format number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until 
> after 2.0 is released, to minimize incompatible changes between 1.9 and 2.0 
> (other than removal of deprecated features).
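
To illustrate the distinction the issue draws between a length in Java characters and a length in bytes, here is a minimal sketch. It is not Lucene's IndexOutput code and does not reproduce Lucene's on-disk length encoding: the class ByteCountStrings, its method names, and the fixed 4-byte length prefix are all assumptions made for illustration. The point it shows is that a byte-count prefix lets a reader skip a string without decoding it, whereas a character-count prefix forces the reader to decode every character to find where the string ends.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class ByteCountStrings {
    /** Encodes s as a 4-byte big-endian byte count followed by its UTF-8 bytes. */
    public static byte[] write(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + utf8.length).putInt(utf8.length).put(utf8).array();
    }

    /** Decodes one string previously produced by write(). */
    public static String read(ByteBuffer in) {
        byte[] utf8 = new byte[in.getInt()];
        in.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Skips one string without decoding it; the byte count makes this a pointer bump. */
    public static void skip(ByteBuffer in) {
        int byteCount = in.getInt();
        in.position(in.position() + byteCount);
    }

    public static void main(String[] args) {
        String s = "na\u00EFve \u20AC"; // 7 Java chars, 10 UTF-8 bytes
        System.out.println("chars: " + s.length());
        System.out.println("bytes: " + s.getBytes(StandardCharsets.UTF_8).length);

        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.put(write(s)).put(write("next"));
        buf.flip();
        skip(buf);                      // jump over the first string
        System.out.println(read(buf));  // prints "next"
    }
}
```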




[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-04 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462287
 ] 

Doron Cohen commented on LUCENE-675:


Grant, thanks for trying this out - I will update the patch shortly. 
I am using this for benchmarking, and it is quite easy to add new stuff. In fact 
I added some stuff lately but did not update it here because I wasn't sure 
whether others were interested. 
I will verify what I have against svn head and post it here as an updated patch.
Regards,
Doron

> Lucene benchmark: objective performance test for Lucene
> ---
>
> Key: LUCENE-675
> URL: https://issues.apache.org/jira/browse/LUCENE-675
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
> Assigned To: Grant Ingersoll
>Priority: Minor
> Attachments: benchmark.byTask.patch, benchmark.patch, 
> BenchmarkingIndexer.pm, extract_reuters.plx, LuceneBenchmark.java, 
> LuceneIndexer.java, taskBenchmark.zip, timedata.zip, tiny.alg, tiny.properties
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
>  I propose to use this corpus as a base for benchmarks. The benchmarking 
> suite could automatically retrieve it from known locations, and cache it 
> locally.
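
The "retrieve from a known location and cache locally" idea proposed above could look roughly like the sketch below. This is not taken from any of the attached patches; the class CorpusCache and the method fetchIfMissing are hypothetical names, and the sketch assumes the corpus is a single file reachable by URL.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class CorpusCache {
    /** Returns cacheFile, downloading it from source only if it is not already present. */
    public static Path fetchIfMissing(URI source, Path cacheFile) {
        if (Files.exists(cacheFile)) {
            return cacheFile; // cache hit: nothing is downloaded
        }
        try {
            Path parent = cacheFile.toAbsolutePath().getParent();
            Files.createDirectories(parent);
            // Download to a temporary sibling first so that an interrupted
            // transfer never leaves a partial file under the final name.
            Path partial = Files.createTempFile(parent, "corpus", ".part");
            try (InputStream in = source.toURL().openStream()) {
                Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            } catch (IOException e) {
                Files.deleteIfExists(partial);
                throw e;
            }
            return Files.move(partial, cacheFile, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A benchmark driver would call fetchIfMissing once with one of the corpus URLs listed in the issue and a path under its working directory, so that only the first run pays for the download.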




[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-04 Thread Doron Cohen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doron Cohen updated LUCENE-675:
---

Attachment: byTask.2.patch.txt

> Lucene benchmark: objective performance test for Lucene
> ---
>
> Key: LUCENE-675
> URL: https://issues.apache.org/jira/browse/LUCENE-675
> Project: Lucene - Java
>  Issue Type: Improvement
>Reporter: Andrzej Bialecki 
> Assigned To: Grant Ingersoll
>Priority: Minor
> Attachments: benchmark.byTask.patch, benchmark.patch, 
> BenchmarkingIndexer.pm, byTask.2.patch.txt, extract_reuters.plx, 
> LuceneBenchmark.java, LuceneIndexer.java, taskBenchmark.zip, timedata.zip, 
> tiny.alg, tiny.properties
>
>
> We need an objective way to measure the performance of Lucene, both indexing 
> and querying, on a known corpus. This issue is intended to collect comments 
> and patches implementing a suite of such benchmarking tests.
> Regarding the corpus: one of the widely used and freely available corpora is 
> the original Reuters collection, available from 
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz 
> or 
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
>  I propose to use this corpus as a base for benchmarks. The benchmarking 
> suite could automatically retrieve it from known locations, and cache it 
> locally.




[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene

2007-01-04 Thread Doron Cohen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462402
 ] 

Doron Cohen commented on LUCENE-675:


This update of the byTask package includes:
- the ability to tailor a perf test programmatically (without an .alg file).
- maintaining both the "algorithm" and the run-properties in a single .alg file, 
  which is easier to maintain in my opinion.
- some code cleanup.
- build.xml now has a single "task related" target: run-task. An ant property 
  is used to invoke other .alg files.
- documentation updated (package docs under byTask).

To apply the patch from the trunk dir:   patch -p0 -i 
To test it, cd to contrib/benchmark and type:  ant run-task

Grant, I noticed that the patch file contains EOL characters - Unix/DOS thing I 
guess.
But 'patch' works cleanly for me either with these characters or without them, 
so I am leaving these characters there.
I hope this patch applies cleanly for you.


