Re: [jira] Closed: (LUCENE-708) Setup nightly build website links and docs
Grant,

Parabuild is currently running an integration build and a daily build at
http://parabuild.viewtier.com:8080/parabuild/index.htm?displaygroupid=5

Viewtier intends to keep supporting Lucene.

Regards,
Slava Imeshev

--- "Grant Ingersoll (JIRA)" <[EMAIL PROTECTED]> wrote:

> [ https://issues.apache.org/jira/browse/LUCENE-708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
>
> Grant Ingersoll closed LUCENE-708.
> ---------------------------------
>
> > Setup nightly build website links and docs
> > ------------------------------------------
> >
> >          Key: LUCENE-708
> >          URL: https://issues.apache.org/jira/browse/LUCENE-708
> >      Project: Lucene - Java
> >   Issue Type: Improvement
> >   Components: Website
> >     Reporter: Grant Ingersoll
> >  Assigned To: Grant Ingersoll
> >     Priority: Minor
> >
> > Per discussion on the mailing list, we are going to set up a Nightly
> > Build link on the website linking to the docs (and javadocs) generated
> > by the nightly build process. The build process may need to be modified
> > to complete this task.
> > Going forward, the main website will, for the most part, only be updated
> > per release (I imagine exceptions will be made for News items and per
> > committer's discretion). The Javadocs linked to from the main website
> > will always be for the latest release.
>
> --
> This message is automatically generated by JIRA.
> -
> If you think it was sent incorrectly contact one of the administrators:
> https://issues.apache.org/jira/secure/Administrators.jspa
> -
> For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
[jira] Commented: (LUCENE-765) Index package level javadocs needs content
[ https://issues.apache.org/jira/browse/LUCENE-765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462203 ]

Daniel Naber commented on LUCENE-765:
-------------------------------------

Some of this is already here:
http://lucene.apache.org/java/docs/api/overview-summary.html#overview_description

> Index package level javadocs needs content
> ------------------------------------------
>
>          Key: LUCENE-765
>          URL: https://issues.apache.org/jira/browse/LUCENE-765
>      Project: Lucene - Java
>   Issue Type: Task
>   Components: Javadocs
>     Reporter: Grant Ingersoll
>     Priority: Minor
>
> The org.apache.lucene.index package level javadocs are sorely lacking.
> They should be updated to give a summary of the important classes, how
> indexing works, etc. Maybe give an overview of how the different writers
> coordinate. Links to file formats, information on the posting algorithm,
> etc. would be helpful.
> See the search package javadocs as a sample of the kind of info that
> could go here.
[jira] Commented: (LUCENE-510) IndexOutput.writeString() should write length in bytes
[ https://issues.apache.org/jira/browse/LUCENE-510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462285 ]

Marvin Humphrey commented on LUCENE-510:
----------------------------------------

Grant... At the moment I am completely consumed by the task of getting a
devel release of KinoSearch version 0.20 out the door. Once that is taken
care of, I will be glad to update this patch, and to explore how to
compensate for the performance hit it causes.

Chuck... If bytecount-based strings are adopted, standard UTF-8 probably
comes along for the ride. There's actually a 1-2% performance gain to be
had using standard over modified because of simplified conditionals. What
holds us back is backwards compatibility -- but we'll have wrecked
backwards compat with the bytecounts.

However, I no longer have a strong objection to using Modified UTF-8 (for
Lucene, that is -- Modified UTF-8 would be a deal-breaker for Lucy), so if
somewhere along the way we find a compelling reason to stick with modified
UTF-8, so be it.

If bytecount-based strings get adopted, it will be because they hold up on
their own merits. They're required for the KinoSearch merge model; once
KS 0.20 is out, I'll port the new benchmarking stuff, we can study the
numbers, and assess whether the significant effort needed to pry that algo
into Lucene would be worthwhile.

Yonik... yes, I agree. Even better for indexing time, leave postings in
serialized form for the entire indexing session. :)

> IndexOutput.writeString() should write length in bytes
> ------------------------------------------------------
>
>               Key: LUCENE-510
>               URL: https://issues.apache.org/jira/browse/LUCENE-510
>           Project: Lucene - Java
>        Issue Type: Improvement
>        Components: Store
>  Affects Versions: 2.1
>          Reporter: Doug Cutting
>       Assigned To: Grant Ingersoll
>           Fix For: 2.1
>       Attachments: SortExternal.java, strings.diff, TestSortExternal.java
>
> We should change the format of strings written to indexes so that the
> length of the string is in bytes, not Java characters. This issue has
> been discussed at:
> http://www.mail-archive.com/java-dev@lucene.apache.org/msg01970.html
> We must increment the file format number to indicate this change. At
> least the format number in the segments file should change.
> I'm targeting this for 2.1, i.e., we shouldn't commit it to trunk until
> after 2.0 is released, to minimize incompatible changes between 1.9 and
> 2.0 (other than removal of deprecated features).
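The distinction this issue turns on (a length counted in Java chars versus a length counted in bytes, and standard versus modified UTF-8) can be illustrated with a minimal sketch. It assumes that the JDK's DataOutputStream.writeUTF is an acceptable stand-in for the modified UTF-8 discussed in the thread; the class StringLengthDemo and its method names are hypothetical and are not Lucene APIs.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Lucene.
final class StringLengthDemo {

    /** Length in Java chars (UTF-16 code units), as reported by String.length(). */
    static int charCount(String s) {
        return s.length();
    }

    /** Length in bytes of the standard UTF-8 encoding. */
    static int standardUtf8Bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Length in bytes of the JDK's modified UTF-8 (assumed stand-in for the thread's usage). */
    static int modifiedUtf8Bytes(String s) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // writeUTF prepends a two-byte length field; exclude it from the count.
        return buffer.size() - 2;
    }

    public static void main(String[] args) {
        // Plain ASCII, a two-byte character, NUL, and a supplementary character (U+1D11E).
        String[] samples = { "lucene", "caf\u00e9", "\u0000", "\uD834\uDD1E" };
        for (String s : samples) {
            System.out.println("chars=" + charCount(s)
                    + " utf8=" + standardUtf8Bytes(s)
                    + " modified=" + modifiedUtf8Bytes(s));
        }
    }
}
```

The three counts agree for plain ASCII and diverge elsewhere: "café" is 4 chars but 5 bytes in either encoding, NUL is 1 byte in standard UTF-8 but 2 in modified UTF-8, and a supplementary character is 2 chars, 4 bytes in standard UTF-8 and 6 bytes in modified UTF-8. A reader that is told the length in bytes can skip a string without decoding it, which a char count does not allow.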
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462287 ]

Doron Cohen commented on LUCENE-675:
------------------------------------

Grant, thanks for trying this out - I will update the patch shortly.

I am using this for benchmarking - it is quite easy to add new stuff - and
in fact I added some stuff lately but did not update here because I wasn't
sure if others are interested.

I will verify what I have with svn head and pack it here as an updated
patch.

Regards,
Doron

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>          Key: LUCENE-675
>          URL: https://issues.apache.org/jira/browse/LUCENE-675
>      Project: Lucene - Java
>   Issue Type: Improvement
>     Reporter: Andrzej Bialecki
>  Assigned To: Grant Ingersoll
>     Priority: Minor
>  Attachments: benchmark.byTask.patch, benchmark.patch,
>               BenchmarkingIndexer.pm, extract_reuters.plx,
>               LuceneBenchmark.java, LuceneIndexer.java, taskBenchmark.zip,
>               timedata.zip, tiny.alg, tiny.properties
>
> We need an objective way to measure the performance of Lucene, both
> indexing and querying, on a known corpus. This issue is intended to
> collect comments and patches implementing a suite of such benchmarking
> tests.
> Regarding the corpus: one of the widely used and freely available corpora
> is the original Reuters collection, available from
> http://www-2.cs.cmu.edu/afs/cs.cmu.edu/project/theo-20/www/data/news20.tar.gz
> or
> http://people.csail.mit.edu/u/j/jrennie/public_html/20Newsgroups/20news-18828.tar.gz.
> I propose to use this corpus as a base for benchmarks. The benchmarking
> suite could automatically retrieve it from known locations, and cache it
> locally.
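The "retrieve it from known locations, and cache it locally" idea in the issue description could look roughly like the following. This is only a sketch under stated assumptions: CorpusCache and fetch are hypothetical names, not part of the attached patches, and the cache key is assumed to be simply the last path segment of the source URI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical helper for illustration only; not taken from the benchmark patches.
final class CorpusCache {

    /**
     * Returns the locally cached copy of the archive at {@code source},
     * downloading it into {@code cacheDir} first if it is not there yet.
     */
    static Path fetch(URI source, Path cacheDir) {
        String path = source.getPath();
        if (path == null || path.endsWith("/")) {
            throw new IllegalArgumentException("URI does not name a file: " + source);
        }
        // Assumption: the file name alone is a good enough cache key.
        String name = path.substring(path.lastIndexOf('/') + 1);
        Path target = cacheDir.resolve(name);
        if (Files.exists(target)) {
            return target; // cache hit: no network access needed
        }
        try {
            Files.createDirectories(cacheDir);
            // Download to a temporary file so an interrupted transfer
            // never leaves a truncated archive under the final name.
            Path partial = Files.createTempFile(cacheDir, name, ".part");
            try (InputStream in = source.toURL().openStream()) {
                Files.copy(in, partial, StandardCopyOption.REPLACE_EXISTING);
            }
            Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A benchmark run would call fetch once with one of the corpus URLs above and a cache directory of its choosing; later runs return the cached file without touching the network, which keeps repeated timing runs independent of download speed.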
[jira] Updated: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-675:
------------------------------

    Attachment: byTask.2.patch.txt

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>          Key: LUCENE-675
>          URL: https://issues.apache.org/jira/browse/LUCENE-675
>      Project: Lucene - Java
>   Issue Type: Improvement
>     Reporter: Andrzej Bialecki
>  Assigned To: Grant Ingersoll
>     Priority: Minor
>  Attachments: benchmark.byTask.patch, benchmark.patch,
>               BenchmarkingIndexer.pm, byTask.2.patch.txt,
>               extract_reuters.plx, LuceneBenchmark.java,
>               LuceneIndexer.java, taskBenchmark.zip, timedata.zip,
>               tiny.alg, tiny.properties
[jira] Commented: (LUCENE-675) Lucene benchmark: objective performance test for Lucene
[ https://issues.apache.org/jira/browse/LUCENE-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12462402 ]

Doron Cohen commented on LUCENE-675:
------------------------------------

This update of the byTask package includes:
- allowing a perf test to be tailored "programmatically" (without an .alg
  file).
- maintaining both the "algorithm" and the run-properties in a single .alg
  file - this is easier to maintain in my opinion.
- some code cleanup.
- build.xml has a single "task related" target now: run-task. An ant
  property is used to invoke other .alg files.
- documentation updated (package docs under byTask).

To apply the patch from the trunk dir: patch -p0 -i
To test it, cd to contrib/benchmark and type: ant run-task

Grant, I noticed that the patch file contains EOL characters - Unix/DOS
thing I guess. But 'patch' works cleanly for me either with these
characters or without them, so I am leaving these characters there. I hope
this patch applies cleanly for you.

> Lucene benchmark: objective performance test for Lucene
> -------------------------------------------------------
>
>          Key: LUCENE-675
>          URL: https://issues.apache.org/jira/browse/LUCENE-675
>      Project: Lucene - Java
>   Issue Type: Improvement
>     Reporter: Andrzej Bialecki
>  Assigned To: Grant Ingersoll
>     Priority: Minor
>  Attachments: benchmark.byTask.patch, benchmark.patch,
>               BenchmarkingIndexer.pm, byTask.2.patch.txt,
>               extract_reuters.plx, LuceneBenchmark.java,
>               LuceneIndexer.java, taskBenchmark.zip, timedata.zip,
>               tiny.alg, tiny.properties
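For the Unix/DOS line-ending question raised in the comment, one way to see what a patch file actually contains is to count CRLF and bare LF terminators. The sketch below assumes that counting terminators is enough to diagnose the issue; LineEndings is a hypothetical helper and is not part of the patch.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper for illustration only; not part of the byTask patch.
final class LineEndings {

    /** Returns a two-element array: { number of CRLF endings, number of bare LF endings }. */
    static int[] count(byte[] data) {
        int crlf = 0;
        int bareLf = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == '\n') {
                if (i > 0 && data[i - 1] == '\r') {
                    crlf++;   // DOS-style line ending
                } else {
                    bareLf++; // Unix-style line ending
                }
            }
        }
        return new int[] { crlf, bareLf };
    }

    public static void main(String[] args) throws IOException {
        // Usage: java LineEndings <path-to-file>
        int[] counts = count(Files.readAllBytes(Paths.get(args[0])));
        System.out.println("CRLF: " + counts[0] + ", bare LF: " + counts[1]);
    }
}
```

A file in which both counts are non-zero mixes the two conventions, which is the usual sign that it was edited on one platform and generated on another.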