On 31/10/2011 21:42, Petite Abeille wrote:
On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote:
similarity-preserving hash function was calculated on each sentence, and the
hash was added as a field. The property of the hash was that similar documents
(sentences) would produce a similar hash
On Oct 31, 2011, at 9:32 PM, Andrzej Bialecki wrote:
> similarity-preserving hash function was calculated on each sentence, and the
> hash was added as a field. The property of the hash was that similar
> documents (sentences) would produce a similar hash, with only some bit-level
> perturbati
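For readers unfamiliar with the idea, here is a minimal sketch of one well-known similarity-preserving hash, SimHash. The thread does not say which hash function was actually used, so this is a swapped-in illustration rather than the method described above. The class name `SimHashSketch`, the whitespace tokenization and the FNV-1a token hash are all assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;

public class SimHashSketch {
    // 64-bit FNV-1a hash of a token (an assumed choice; any well-mixed 64-bit hash would do).
    static long fnv1a64(String s) {
        long h = 0xcbf29ce484222325L;
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // SimHash: every token votes +1 or -1 on each of the 64 bit positions;
    // the sign of the total vote decides the corresponding bit of the fingerprint.
    static long simHash(String sentence) {
        int[] votes = new int[64];
        for (String token : sentence.toLowerCase().split("\\s+")) {
            if (token.isEmpty()) {
                continue;
            }
            long h = fnv1a64(token);
            for (int bit = 0; bit < 64; bit++) {
                votes[bit] += ((h >>> bit) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int bit = 0; bit < 64; bit++) {
            if (votes[bit] > 0) {
                fingerprint |= (1L << bit);
            }
        }
        return fingerprint;
    }

    // Number of differing bits between two fingerprints.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        long a = simHash("the quick brown fox jumps over the lazy dog");
        long b = simHash("the quick brown fox jumped over the lazy dog");
        System.out.println("Hamming distance: " + hammingDistance(a, b));
    }
}
```

Sentences that share most of their tokens tend to produce fingerprints that differ in only a few bits, so the fingerprint can be stored as a field and compared by Hamming distance.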
On 22/10/2011 11:11, Grant Ingersoll wrote:
Hi All,
I'm giving a talk at ApacheCon titled "Bet you didn't know Lucene can..."
(http://na11.apachecon.com/talks/18396). It's based on my observation that, over the
years, a number of us in the community have done some pretty cool things using Luc
m also using public domain Wikipedia data so can release the code and data
> somewhere if that's of interest.
>
> Cheers
> Mark
>
>
>
> - Original Message -
> From: Dawid Weiss
> To: java-user@lucene.apache.org
> Cc:
> Sent: Tuesday, 25 October 2011,
pache.org
Cc:
Sent: Tuesday, 25 October 2011, 23:17
Subject: Re: Bet you didn't know Lucene can...
> Lucene started out at an avg 3ms but subsequent runs took it down
> dramatically due to OS file caching. The all-in-memory hashset implementation
> clearly did not demonstrate th
> Lucene started out at an avg 3ms but subsequent runs took it down
> dramatically due to OS file caching. The all-in-memory hashset implementation
> clearly did not demonstrate the same speed ups between runs.
I'm not saying the benchmark was wrong or anything, but this is
surprising. I mean, the
> Avg lookup time slightly less than a HashSet? Interesting.
Yep, HashSet comparison was a surprise to me too. I threw it in as a datapoint
for what I thought would be the fastest option on the example dataset but
clearly not a long-term answer to my problem as it costs so much in RAM.
Lucene s
Avg lookup time slightly less than a HashSet? Interesting. Is the code
to these benchmarks available somewhere?
Dawid
On Tue, Oct 25, 2011 at 9:57 PM, Grant Ingersoll wrote:
>
> On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>
using Lucene that don't fit under the core premise of full te
On Oct 25, 2011, at 11:26 AM, mark harwood wrote:
>>> using Lucene that don't fit under the core premise of full text search
>
> I've had several use cases over the years that use features peculiar to
> Lucene but here's a very simple one I came across today that illustrates its
> raw index l
At the group where I worked at UVa once upon a time, a coworker built Juxta,
this way cool tool to diff multiple versions of a document visually with heat
maps and "difference"-o-meters, and it leverages Lucene analyzers to extract
words and positions and such.
You can find it here: http://www.
>>using Lucene that don't fit under the core premise of full text search
I've had several use cases over the years that use features peculiar to Lucene
but here's a very simple one I came across today that illustrates its raw index
lookup capability:
I needed a fast, scalable and persistent "S
Hi Grant,
In Carrot2 (and Carrot Search's commercial products) we're not using
Lucene as an indexing/ search service directly, but we are re-using a
lot of internal infrastructure (like analyzers, ported snowball
stemmers and other segmentation stuff). We also plan on using the new
language identi
Using Lucene as a recommendation engine.
On Sat, Oct 22, 2011 at 6:33 PM, Grant Ingersoll wrote:
>
> On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote:
>
>> Hi Grant,
>>
>> Not sure if this qualifies as a "bet you didn't know", but one could use
>> Lucene term vectors to construct document vectors for
On Oct 22, 2011, at 6:03 PM, Sujit Pal wrote:
> Hi Grant,
>
> Not sure if this qualifies as a "bet you didn't know", but one could use
> Lucene term vectors to construct document vectors for similarity,
> clustering and classification tasks. I found this out recently (although
> I am probably no
Hi Grant,
These are two cases from work I've done that I can think of:
- use Lucene to match products in a database with eBay auctions; the title
of the auction is used as the query to Lucene.
- use a servlet filter and Lucene to map well-formed URLs into a website
to its individual (product) page
Hi Grant,
Not sure if this qualifies as a "bet you didn't know", but one could use
Lucene term vectors to construct document vectors for similarity,
clustering and classification tasks. I found this out recently (although
I am probably not the first one), and I think this could be quite
useful.
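A minimal sketch of the idea, assuming the term frequencies have already been read out of the index: each document becomes a term-to-count map, and two documents are compared by cosine similarity. The class `TermVectorCosine` and its methods are hypothetical names for this example. The whitespace tokenizer only stands in for whatever analyzer produced the term vectors, and no Lucene API is called here.

```java
import java.util.HashMap;
import java.util.Map;

public class TermVectorCosine {
    // Builds a term -> frequency map from lower-cased, whitespace-separated tokens.
    // With Lucene, these counts would come from a document's stored term vector instead.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase().split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product over shared terms, divided by both vector lengths.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += (double) e.getValue() * other;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        return (normA == 0.0 || normB == 0.0) ? 0.0 : dot / (normA * normB);
    }

    public static void main(String[] args) {
        Map<String, Integer> d1 = termFrequencies("lucene term vectors for clustering");
        Map<String, Integer> d2 = termFrequencies("lucene term vectors for classification");
        System.out.println("cosine similarity: " + cosine(d1, d2));
    }
}
```

The same document vectors could feed a clustering or classification step. Raw counts are used here for brevity, where a real setup would more likely weight terms, for example by TF-IDF.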
-
Grant,
for years the ActiveMath learning environment has been using Lucene as a storage engine.
At the time (~2004), it was by far the best storage engine available in a
pure-Java world.
Even now it is still perfect in terms of performance.
We had an issue with the separate versions where the stored-fields w