Dear Lucene developers,
I'd be interested in doing some benchmarking on (at least) Lucene,
Egothor and MG4J. There is no actual data around on publicly available
collections, and it would be nice to have some more objective data on
efficiency for a significantly large collection.
We have GOV2 (25M
Hi,
On 5/29/06, Sebastiano Vigna <[EMAIL PROTECTED]> wrote:
Dear Lucene developers,
I'd be interested in doing some benchmarking on (at least) Lucene,
Egothor and MG4J. There is no actual data around on publicly available
collections, and it would be nice to have some more objective data on
effi
Hi,
We have been doing such a benchmark over all TREC collections and TREC
queries. Our participation in TREC in recent years gives us the opportunity
to work on the collections. Lucene is one of the systems that we look at.
The measurements are based on two functionalities: indexing and querying.
W
[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12413695 ]
Arthit Suriyawongkul commented on LUCENE-503:
-
related projects/implementations:
SansarnLook
based on Lucene, with additional ThaiAnalyzer
ref: http://sansarn.c
Thanks for the reply, but I couldn't get your point. Could you elaborate on it
further?
For instance, we have
FirstName (= Martin), LastName (= Spaniol), Company (= Mark Co.) and we search
for "Mar*", which will be found in FirstName and Company. So how can I
retrieve this info that it is foun
That would be great to see!
There are a million enhancements and ideas that could come up as a result of
this comparison. For example, I would not be surprised to see mg4j "perfect
skipping" become an interesting optimization for Lucene, Trie based Lexicon
could make some regex queries signi
On Mon, 2006-05-29 at 17:33 +0800, Dave Kor wrote:
> I was wondering if you have seen the TREC 2004 paper by Giuseppe
> Attardi, Andrea Esuli and Chirag Pate from the University of Pisa,
> Italy, titled "Using Clustering and Blade Clusters in the TeraByte
> task"? http://trec.nist.gov/pubs/trec13/
On 5/29/06, Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
Regarding the part where you describe how indexing/fields could be configured
via XML descriptors, you may want to have a look at
lucene/java/trunk/contrib/xml-query-parser .
And Solr's schema.xml :-)
http://svn.apache.org/viewvc/incuba
Otis wrote:
Short answer: no. Damn are those scoring classes hard to follow...
I have looked at these classes several times, including stepping through
them and I still find them confusing. Perhaps someone (Doug? pretty
please?) could illuminate us?
I think the part I always found tr
This is great guys, I will definitely have a look at it. :) Thanks!!
I'm thinking of configuring the indexer via xml as well. As the atom
format allows foreign namespaces the indexing component has to be very
flexible. I might configure all the elements in the atom namespace
globally and will off
Boy, I'd sure like to see at least one bug-fix release for 2.0
maintain java 1.4 compatibility. Would that be 2.1?
Bill
> This sounds reasonable to me. I feel bad about Andi and PyLucene, but it
> sounds like GCJ(X) will soon be up-to-date (the link Andi sent was from early
> February). Disc
I guess this discussion isn't over...
I would like to know if anybody would feel uncomfortable with a
1.5-dependent contrib project like the GData Server?
I'm not sure whether it is worth thinking about a 2.0 / 2.1 (tiger)
branch. That would be a lot more work but far less fight ;)
simon
On 5/2
Could be 2.0.*. I think that is what Hoss was saying, too.
Otis
----- Original Message -----
From: Bill Janssen <[EMAIL PROTECTED]>
To: java-dev@lucene.apache.org; Otis Gospodnetic <[EMAIL PROTECTED]>
Sent: Monday, May 29, 2006 11:17:43 AM
Subject: Re: Lucene and Java 1.5
Boy, I'd sure like to
Dave Kor wrote:
Hi,
On 5/29/06, Sebastiano Vigna <[EMAIL PROTECTED]> wrote:
Dear Lucene developers,
I'd be interested in doing some benchmarking on (at least) Lucene,
Egothor and MG4J. There is no actual data around on publicly available
collections, and it would be nice to have some more objec
Hi,
----- Original Message -----
From: Andrzej Bialecki <[EMAIL PROTECTED]>
Dave Kor wrote:
> Hi,
>
> On 5/29/06, Sebastiano Vigna <[EMAIL PROTECTED]> wrote:
>> Dear Lucene developers,
>> I'd be interested in doing some benchmarking on (at least) Lucene,
>> Egothor and MG4J. There is no actual dat
Hi, Noon,
Sorry I did not initially understand the detailed problem you have.
This sounds like a prefix match problem. You can create an index for each field
and then do a prefix match on these fields.
By the way, I think your question could be better served by posting to the
lucene user group.
Ch
Otis Gospodnetic wrote:
OG: But Andrzej, you already wrote that indexing benchmark tool (which we never
put anywhere in SVN, I'm afraid) that works on some freely available Reuters
corpus, I believe. Why couldn't that be adapted for testing Lucene, Egothor,
and MG4J?
Hmm, yes, indeed I h
On May 29, 2006, at 10:34 AM, Andrzej Bialecki wrote:
It could use the Reuters corpus
Has anyone used existing categorization data associated with the
Reuters corpus to build a benchmarker that measured IR precision and/
or recall?
Marvin Humphrey
Rectangular Research
http://www.rectang
To weigh in with my take, all environments I develop and deploy to
are at JDK/JRE 1.5. Solr is exclusively for 1.5+ and it has top
billing in my architecture. GData server at 1.5 is perfectly fine by
me. I'd use it, and I'm very interested in Solr collaboration as well.
Erik
On May 29
Marvin Humphrey wrote:
On May 29, 2006, at 10:34 AM, Andrzej Bialecki wrote:
It could use the Reuters corpus
Has anyone used existing categorization data associated with the
Reuters corpus to build a benchmarker that measured IR precision
and/or recall?
That would be RCV1 or RCV2, right
On May 29, 2006, at 10:58 AM, Andrzej Bialecki wrote:
Has anyone used existing categorization data associated with the
Reuters corpus to build a benchmarker that measured IR precision
and/or recall?
That would be RCV1 or RCV2, right? AFAIK the Reuters-21578 has no
such information ... Th
So I guess that would be totally alright to build the gdata server 1.5
dependent.
does anyone feel comfortable with that?
simon
On 5/29/06, Erik Hatcher <[EMAIL PROTECTED]> wrote:
To weigh in with my take, all environments I develop and deploy to
are at JDK/JRE 1.5. Solr is exclusively for 1.5
Hello everyone,
today I reconsidered the internal representation of the feed /
entries. I had a closer look at the Google Data Client Api which is
supposed to be the other end to the server.
This API is distributed under the Apache License, i.e. open source. It
already provides the Object representatio
: Boy, I'd sure like to see at least one bug-fix release for 2.0
: maintain java 1.4 compatibility. Would that be 2.1?
: Could be 2.0.*. I think that is what Hoss was saying, too.
Yes, that was my point ... as far as i can tell, Lucene bug fix releases
have historically been at the "third leve
Sebastiano Vigna wrote on 05/28/2006 10:39 PM:
> but we will certainly need
> some help to configure Lucene so that it works at its best.
>
> We would like to measure indexing time and query answer time
>
I'm not sure what form you would like that help to take, but here are a
couple high-level
[ http://issues.apache.org/jira/browse/LUCENE-503?page=all ]
Samphan Raruenrom updated LUCENE-503:
-
Attachment: TestThaiAnalyzer.java
Add TestThaiAnalyzer junit test, modified from TestFrenchAnalyzer. The Thai
words are picked so that changing the d
[ http://issues.apache.org/jira/browse/LUCENE-503?page=comments#action_12413756 ]
Samphan Raruenrom commented on LUCENE-503:
--
All the code has been tested with Lucene 2.0.0.
Thanks Art for the info/URL. I've never known about Pichai's work before I
My concern is really with the use of GCJ with Lucene. I'd hate to see
Lucene core releases that couldn't be used with the latest "stable"
release of GCJ. Unfortunately, it's very hard to know what that
means. What's the latest version of GCJ? What Java language features
are supported in it? It
On Mon, 2006-05-29 at 14:35 -1000, Chuck Williams wrote:
> I'm not sure what form you would like that help to take, but here are a
> couple high-level points imho:
Help in configuring Lucene so that it uses all resources available, and
so that the results returned are identical to all other engin