Thank you.
Glen
On Sat, 6 Aug 2022 at 23:46, Tomoko Uchida
wrote:
> Hi Glen,
> I verified your Jira/GitHub usernames and added a mapping.
>
> https://github.com/apache/lucene-jira-archive/commit/ae78d583b40f5bafa1f8ee09854294732dbf530b
>
> Tomoko
>
>
> 20
jira: gnewton
github: gnewton (github.com/gnewton)
Thanks,
Glen
On Sat, 6 Aug 2022 at 14:11, Tomoko Uchida
wrote:
> Hi everyone.
>
> I wanted to let you know that we'll extend the deadline until the date the
> migration is started (the date is not fixed yet).
> Please let us know your Ji
-date though.
>
> Shai
>
> On Thu, Nov 10, 2016 at 4:40 PM Glen Newton wrote:
>
> > I am looking for documentation on Lucene faceting. The most recent
> > documentation I can find is for 4.0.0 here:
> >
> > http://lucene.apache.org/core/4_0_0/facet/org
I am looking for documentation on Lucene faceting. The most recent
documentation I can find is for 4.0.0 here:
http://lucene.apache.org/core/4_0_0/facet/org/apache/lucene/facet/doc-files/userguide.html
Is there more recent documentation for 6.3.0? Or 6.x?
Thanks,
Glen
> load a single document (or a fixed number of them) for every step. In
> the case you call loadAll() there is a problem with memory.
>
>
>
>
> 2016-08-19 15:39 GMT+02:00, Glen Newton :
> > Making docid an int64 is a non-trivial undertaking, and this work needs
> to
> &
Making docid an int64 is a non-trivial undertaking, and this work needs to
be compared against the use cases and how compelling they are.
That said, in the lifetime of most software projects a decision is made to
break backward compatibility to move the project forward.
When/if moving to int64 hap
Or maybe it is time Lucene re-examined this limit.
There are use cases out there where >2^31 does make sense in a single index
(huge number of tiny docs).
Also, I think the underlying hardware and the JDK have advanced to make
this more defensible.
Constructively,
Glen
On Thu, Aug 18, 2016 at
Query would look like if it allowed a 'toQuery'
> capability and returned data from both sides of the join.
>
> 3. If you can denormalize your data into hierarchies, then you could
> use index-time joining (BlockJoin) for better performance and easier
> collecting of your gro
Anyone?
On Thu, Dec 11, 2014 at 2:53 PM, Glen Newton wrote:
> Is there any reason JoinUtil (below) does not have a 'Query toQuery'
> available? I was wanting to filter on the 'to' side as well. I feel I
> am missing something here.
>
> To make sure this is not
Is there any reason JoinUtil (below) does not have a 'Query toQuery'
available? I was wanting to filter on the 'to' side as well. I feel I
am missing something here.
To make sure this is not an XY problem, here is my use case:
I have a many-to-many relationship. The left, join, and right 'table'
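A minimal sketch of the JoinUtil API in question, plus one possible workaround for filtering the 'to' side (the workaround is my assumption, not something JoinUtil itself provides); the field names, searchers and ScoreMode choice are illustrative only:

import java.io.IOException;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.search.join.JoinUtil;
import org.apache.lucene.search.join.ScoreMode;

// The existing API: only the "from" side takes a Query.
Query joinQuery = JoinUtil.createJoinQuery(
        "fromId",                                        // fromField
        true,                                            // multipleValuesPerDocument
        "toId",                                          // toField
        new TermQuery(new Term("category", "books")),    // fromQuery: filters the "from" side
        fromSearcher,                                    // IndexSearcher over the "from" documents
        ScoreMode.None);                                 // throws IOException

// Workaround (assumption): AND the join query with a query on the "to" side's own
// fields when executing it against the "to" searcher.
BooleanQuery.Builder toSideFiltered = new BooleanQuery.Builder();
toSideFiltered.add(joinQuery, BooleanClause.Occur.MUST);
toSideFiltered.add(new TermQuery(new Term("status", "active")), BooleanClause.Occur.MUST);
TopDocs hits = toSearcher.search(toSideFiltered.build(), 10);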
Hi Koji,
Semantic vectors is here: http://code.google.com/p/semanticvectors/
It is a project that has been around for a number of years and used by many
people (including me
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
).
If you could compare and contrast word2vec
You should consider making each _line_ of the log file a (Lucene)
document (assuming it is a log-per-line log file)
-Glen
On Fri, Feb 14, 2014 at 4:12 PM, John Cecere wrote:
> I'm not sure in today's world I would call 2GB 'immense' or 'enormous'. At
> any rate, I don't have control over the siz
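A minimal sketch of the line-per-document suggestion above, assuming a current Lucene API (FSDirectory/IndexWriterConfig/TextField); the paths and field names are made up:

import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.FSDirectory;

// One Lucene document per log line; the line number makes it easy to jump back
// into the original file from a hit.
try (IndexWriter writer = new IndexWriter(FSDirectory.open(Paths.get("logindex")),
                                          new IndexWriterConfig(new StandardAnalyzer()));
     BufferedReader in = Files.newBufferedReader(Paths.get("server.log"), StandardCharsets.UTF_8)) {
    String line;
    int lineNo = 0;
    while ((line = in.readLine()) != null) {
        Document doc = new Document();
        doc.add(new StringField("lineNo", Integer.toString(++lineNo), Field.Store.YES));
        doc.add(new TextField("line", line, Field.Store.YES));
        writer.addDocument(doc);
    }
}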
Thanks :-)
On Fri, May 3, 2013 at 2:31 PM, Alan Woodward wrote:
> Hi Glen,
>
> You want the SynonymFilter:
> http://lucene.apache.org/core/4_2_0/analyzers-common/org/apache/lucene/analysis/synonym/SynonymFilter.html
>
> Alan Woodward
> www.flax.co.uk
>
>
> On 3 M
Hello,
I know I've seen it go by on this list and elsewhere, but cannot seem
to find it: can someone point me to the best way to do term expansions
at indexing time.
That is, when the sentence is: "This foo is in my way"
And I somewhere: foo=bar|yak
Lucene indexes something like:
"This foo|bar|
I am in the process of upgrading LuSql from 2.x to 4.x and I am first
going to 3.6 as the jump to 4.x was too big.
I would suggest this to you. I think it is less work.
Of course I am also able to offer LuSql to 3.6 users, so this is
slightly different from your case.
-Glen
On Wed, Jan 9, 2013 a
adding an annotation to text.
>
>
> On 12/13/2012 01:54 PM, Glen Newton wrote:
>>
>> It is not clear this is exactly what is needed/being discussed.
>>
>> From the issue:
>> "We are also planning a Tokenizer/TokenFilter that can put parts of
>> speec
It is not clear this is exactly what is needed/being discussed.
From the issue:
"We are also planning a Tokenizer/TokenFilter that can put parts of
speech as either payloads (PartOfSpeechAttribute?) on a token or at
the same position."
This adds it to a token, not a span. 'same position' does no
>Unfortunately, Lucene doesn't properly index
spans (it records the start position but not the end position), so
that limits what kind of matching you can do at search time.
If this could be fixed (i.e. indexing the _end_ of a span) I think all
the things that I want to do, and the things that can
+10
These are the kind of things you can do in GATE[1] using annotations[2].
A VERY useful feature.
-Glen
[1]http://gate.ac.uk
[2]http://gate.ac.uk/wiki/jape-repository/annotations.html
On Wed, Dec 12, 2012 at 3:02 PM, Wu, Stephen T., Ph.D.
wrote:
>>> Is there any (preliminary) code checked in
Yes, very interested.
--> Quick scan: very cool work! +10 :-)
Thanks,
Glen Newton
On Wed, Sep 26, 2012 at 9:59 AM, Carsten Schnober
wrote:
> Hi,
> in case someone is interested in an application of the Lucene indexing
> engine in the field of corpus linguistics rather than
Storing content in large indexes can significantly add to index time.
The model of indexing fields only in Lucene and storing just a key,
with the content itself kept in some other container (DBMS, NoSQL,
etc.) using the key as the lookup, is almost a necessity for this use case
unless you have a complet
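A minimal sketch of the index-the-fields, store-only-a-key model described above, assuming a current Lucene API; the field names and the external store are illustrative:

import org.apache.lucene.document.*;

// The searchable text is indexed but not stored; only the key is stored, and the
// full content lives in an external store (DBMS, NoSQL, ...) under the same key.
Document doc = new Document();
doc.add(new StringField("id", externalKey, Field.Store.YES));   // stored: the lookup key
doc.add(new TextField("body", fullText, Field.Store.NO));       // indexed only, never stored
writer.addDocument(doc);
// At search time, read "id" from the hit and fetch the content from the external store.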
od is
> incrementToken, I have no idea what to do in it.
>
> Regards,
>
> Prakash Bande
> Director - Hyperworks Enterprise Software
> Altair Eng. Inc.
> Troy MI
> Ph: 248-614-2400 ext 489
> Cell: 248-404-0292
>
> -Original Message-
> From: Glen Ne
I'd suggest writing a perl script or
insert-favourite-scripting-language-here script to pre-filter this
content out of the files before it gets to Lucene/Solr
Or you could just grep for "Data" and "Description" (or is
'Description' multi-line)?
-Glen Newto
Do the check _before_ indexing.
Use https://code.google.com/p/language-detection/ to verify the
language of the text document before you put it in the index.
-Glen Newton
http://zzzoot.blogspot.com/
On Mon, Feb 27, 2012 at 10:53 AM, Ilya Zavorin wrote:
> Suppose I have a bunch of t
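A sketch of the check-before-indexing step, assuming the language-detection library's API (DetectorFactory/Detector) as I recall it from the project page; treat the method names as assumptions and verify against the project docs:

import com.cybozu.labs.langdetect.Detector;
import com.cybozu.labs.langdetect.DetectorFactory;

// Load the language profiles shipped with the library once, up front.
DetectorFactory.loadProfile("profiles");

// For each candidate document, detect the language and skip (or route) non-matches
// before the text ever reaches the IndexWriter.
Detector detector = DetectorFactory.create();
detector.append(text);
String lang = detector.detect();     // e.g. "en", "de", ...
if ("en".equals(lang)) {
    writer.addDocument(doc);
} else {
    // log and skip, or send to a per-language index
}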
"Caste" --> Castle
https://bitbucket.org/acunu
http://support.acunu.com/entries/20216797-castle-build-instructions
It looks very promising.
It is a kernel module and I'm not sure it can run in user space, which
I'd prefer.
-Glen Newton
On Sat, Sep 3, 2011 at 9:21 PM,
AIX allows different malloc policies to be used in the underlying
system calls. Consider using the WATSON (!) malloc policy. p.134,136
and
http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.aix.genprogc/doc/genprogc/sys_mem_alloc.htm
Finally (or before doing all of
So to use Lucene-speak, each sentence is a document.
I don't know how you are indexing and what code you are using (and
what hardware, etc.), but if you are not already, you should consider
multi-threading the indexing, which should give you a significant
indexing performance boost.
-Glen
On Fri
Could you elaborate what you want to do with the index of large
documents? Do you want to search at the document or sentence level?
This can drive how to index this content.
-Glen
On Fri, Jul 22, 2011 at 10:52 AM, starz10de wrote:
> Hi,
>
> I have one text file that contains 60 000 sentences. Is
gmail interprets the closing asterisk as part of the URL, for all
three URLs --> 404s
You might want to add a space before the '*'...
-glen
On Thu, Jul 7, 2011 at 2:17 PM, Abhishek Rakshit wrote:
> Hey folks,
>
> We received great feedback on the Lucene Architecture site that we have been
> buil
-threaded-query-lucene.html
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
Glen Newton
On Tue, Jan 25, 2011 at 11:31 AM, Siraj Haider wrote:
> Hello there,
> I was looking for best practices for indexing/searching on a
> multi-processor/core machine but
Where do you get your Lucene/Solr downloads from?
[x] ASF Mirrors (linked in our release announcements or via the Lucene website)
[] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
-Glen Newton
he Lucene list.
If you have any questions, please contact me.
Thanks,
Glen Newton
http://zzzoot.blogspot.com
--> Old LuSql benchmarks:
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
On Thu, Dec 16, 2010 at 12:04 PM, Dyer, James wrote:
> We have ~50 lon
Does anyone know what technology they are using: http://www.indextank.com/
Is it Lucene under the hood?
Thanks, and apologies for cross-posting.
-Glen
http://zzzoot.blogspot.com
the ClueWeb collection
http://trec.nist.gov/pubs/trec18/papers/arsc.WEB.pdf
Expanding Queries Using Multiple Resources
http://staff.science.uva.nl/~mdr/Publications/Files/trec2006-proceedings-genomics.pdf
-Glen Newton
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html
http
Hi Luan,
Could you tell us the name and/or URL of this plugin so that the list
might know about it?
Thanks,
Glen
On 10 August 2010 12:21, Luan Cestari wrote:
>
> We would like to say thanks for the replies.
>
> We found a plugin in Nutch (the Creative Commons plugin) that does like Otis
> said.
, in a Solr context.
http://wiki.apache.org/solr/DataImportHandler
Thanks,
-Glen Newton
LuSql author
http://zzzoot.blogspot.com/
On 23 July 2010 15:46, manjula wijewickrema wrote:
> Hi,
>
> Normally, when I am building my index directory for indexed documents, I
> used to keep my i
There are a number of strategies, on the Java or OS side of things:
- Use huge pages[1], especially on 64-bit systems with lots of RAM. For
long-running, large-memory (and GC-busy) applications, this has achieved
significant improvements, like 300% on EJBs. See [2],[3],[4]. For a great
article introducing and benc
Hello Uwe.
That will teach me for not keeping up with the versions! :-)
So it is up to the application to keep track of what it used for compression.
Understandable.
Thanks!
Glen
On 27 February 2010 10:17, Uwe Schindler wrote:
> Hi Glen,
>
>
>> Pluggable compression allowing for alternatives to
Pluggable compression allowing for alternatives to gzip for text
compression for storing.
Specifically I am interested in bzip2[1] as implemented in Apache
Commons Compress[2].
While bzip2 compression is considerably slower than gzip (although
decompression is not too much slower than gzip) it comp
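A sketch of what alternative compression can look like at the application level today, using Commons Compress bzip2 and a binary stored field; as noted above, the application has to track which codec it used, and the field names here are illustrative:

import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import org.apache.commons.compress.compressors.bzip2.BZip2CompressorOutputStream;
import org.apache.lucene.document.*;

// Compress the text with bzip2 and store the bytes; index the uncompressed text separately.
ByteArrayOutputStream bytes = new ByteArrayOutputStream();
try (BZip2CompressorOutputStream bz = new BZip2CompressorOutputStream(bytes)) {
    bz.write(fullText.getBytes(StandardCharsets.UTF_8));
}
Document doc = new Document();
doc.add(new StoredField("bodyCompressed", bytes.toByteArray()));   // stored, not indexed
doc.add(new StoredField("bodyCodec", "bzip2"));                    // application-level codec bookkeeping
doc.add(new TextField("body", fullText, Field.Store.NO));          // searchable, not stored
writer.addDocument(doc);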
Documents cannot be re-used in v3.0?
http://wiki.apache.org/lucene-java/ImproveIndexingSpeed
-glen
http://zzzoot.blogspot.com/
On 2 February 2010 02:55, Simon Willnauer
wrote:
> Ganesh,
>
> do you reuse your Document instances in any way or do you create new
> docs for each add?
>
> simon
>
> O
en looking at their index with
> Luke. :)
> Otis
> --
> Sematext is hiring -- http://sematext.com/about/jobs.html?mls
> Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR
>
>
>
> ----- Original Message
>> From: Glen Newton
>> To: java-user@
s.apache.org/jira/browse/LUCENE-652
> https://issues.apache.org/jira/browse/LUCENE-1960
>
> Glen Newton wrote:
>> Could someone send me where the rationale for the removal of
>> COMPRESSED fields is? I've looked at
>> http://people.apache.org/~uschindler/staging-area/luce
Could someone send me where the rationale for the removal of
COMPRESSED fields is? I've looked at
http://people.apache.org/~uschindler/staging-area/lucene-3.0.0-rc1/changes/Changes.html#3.0.0.changes_in_runtime_behavior
but it is a little light on the 'why' of this change.
My fault - of course - f
You might try re-implementing, using ThreadPoolExecutor
http://java.sun.com/j2se/1.5.0/docs/api/java/util/concurrent/ThreadPoolExecutor.html
glen
2009/11/10 Jamie Band :
> Hi There
>
> Our app spends alot of time waiting for Lucene to finish writing to the
> index. I'd like to minimize this. If y
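A minimal sketch of the ThreadPoolExecutor suggestion, assuming a shared IndexWriter (which is thread-safe) and a hypothetical buildDocument() helper:

import java.io.File;
import java.io.IOException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hand the indexing work to a bounded pool so the rest of the application is not
// blocked waiting on Lucene; all tasks share one IndexWriter.
ExecutorService pool = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
for (final File f : filesToIndex) {                     // filesToIndex: whatever feeds your indexer
    pool.submit(new Runnable() {
        public void run() {
            try {
                writer.addDocument(buildDocument(f));   // buildDocument(): assumed helper
            } catch (IOException e) {
                // log and carry on (or re-queue)
            }
        }
    });
}
pool.shutdown();
pool.awaitTermination(1, TimeUnit.HOURS);
writer.commit();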
This is basically what LuSql does. The speed-ups ("8h to 30 min")
are similar: usually about an order of magnitude.
Oh, the comments suggesting most of the interaction is with the
database? The answer is: it depends.
With large Lucene documents: Lucene is the limiting factor (worsen
LuSql
Disclosure: I am the author of LuSql.
-Glen Newton
http://zzzoot.blogspot.com/
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/Glen_Newton
2009/10/22 Paul Taylor :
> I'm building a lucene index from a database, creating 1 about 1 million
> documents, unsuprisingly
I appreciate your explanation, but I think that the use case I
described merits a deeper exploration:
Scenario 1: 16 threads indexing; queue size = 1000; present api; need to store
In this scenario, there are always 1000 Strings with all the contents
of their respective files.
Averaging 50k per do
h and/or tests if
> you have them.
>
> Cheers,
> Anthony
>
> On Mon, Sep 14, 2009 at 1:03 PM, Glen Newton wrote:
>> Hi,
>>
>> In 2.4.1, Field has 2 constructors that involve a Reader:
>> public Field(String name,
>> Reader
String name,
Reader reader,
Field.Store store,
Field.Index index,
Field.TermVector termVector)
Constructively,
Glen Newton
http://zzzoot.blogspot.com/
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index
..@lucene.apache.org
> [mailto:java-user-return-42272-paul_murdoch=emainc@lucene.apache.org] On
> Behalf Of Glen Newton
> Sent: Friday, September 11, 2009 9:53 AM
> To: java-user@lucene.apache.org
> Subject: Re: Indexing large files? - No answers yet...
>
> In this
In this project:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
I concatenate all the text of all of the articles of a single journal into
a single text file.
This can create a text file that is 500MB in size.
Lucene is OK at indexing files this size (in parallel even),
You are optimizing before the threads are finished adding to the index.
I think this should work:
IndexWriter writer = new IndexWriter("D:\\index", new StandardAnalyzer(), true);
File file = new File(args[0]);
Thread t1 = new Thread(new IndexFiles(writer, file));
Thread t2 = new Thread(new IndexFiles(writer, file));
t1.start(); t2.start();
t1.join(); t2.join();     // wait for both indexing threads to finish adding documents
writer.optimize();        // optimize only after all adds are done
writer.close();
tion using only the
full-text (no metadata).
For more info & howto:
http://zzzoot.blogspot.com/2009/07/project-torngat-building-large-scale.html
Glen Newton
e you include Lucene v2.3 in your
> code...does it work correctly with indexes created on v2.4 as well?
> - Greg
>
>
> On Mon, Apr 13, 2009 at 6:49 PM, Glen Newton wrote:
>
>> As the creator of LuSql
>> [http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
As the creator of LuSql
[http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql]
I would have hoped for a more creative (and more different) name.
:-)
-glen
2009/4/13 jonathan esposito :
> I created a command-line tool in Java that allows the user to execute
> sql-like commands again
Another solution is to have your application on the AppEngine, but the
index is on another machine. Then the application 'proxies' the
requests to the machine that has the index, which is using Solr
[http://lucene.apache.org/solr/] or some other way to expose to the
index to the web.
Yes, this mea
Dear Shashi,
It should work now.
A temporary failure: our apologies.
thanks,
Glen
2009/4/2 Shashi Kant :
> Hi all, I have been trying to get the latest version of LuSQL from the
> NRC.ca website but get 404s on the download links. I have written to the
> webmaster, but anyone have the jar handy
You might try looking in a list that talks about recommender systems.
Google hits:
- http://en.wikipedia.org/wiki/Recommendation_system
- ACM Recommender Systems 2009 http://recsys.acm.org/
- A Guide to Recommender Systems
http://www.readwriteweb.com/archives/recommender_systems.php
2009/3/17 Aaro
and your colleagues do not have infinite social
capital, and hopefully you will have no reason to be forced to spend
this capital in such an unfortunate manner in the future. :-)
sincerely,
Glen Newton
2009/3/5 Yonik Seeley :
> This morning, an apparently over-zealous marketing firm, on behalf
I would suggest you try LuSql, which was designed specifically to
index relational databases into Lucene.
It has an extensive user manual/tutorial which has some complex
examples involving multi-joins and sub-queries.
I am the author of LuSql.
LuSql home page:
http://lab.cisti-icist.nrc-cnrc.gc.c
onventional
> processors execute an idle loop when there is no work to do, so
> CPI may be artificially low, especially when the system is
> somewhat idle. The UltraSPARC T1 and T2 "park" idle threads,
> consuming no energy, when there is no work to do, so CPI may
> be arti
Could you give some configuration details:
- Solaris version
- Java VM version, heap size, and any other flags
- disk setup
You should also consider using huge pages (see
http://zzzoot.blogspot.com/2009/02/java-mysql-increased-performance-with.html)
I will also be posting performance gains using
V1 of a project of mine, Ungava[1], which uses Lucene to index
research articles and library catalog metadata, also uses Project
Simile's Metaphor and Timeline. I have some simple examples using
them:
Here is the search for "cell" in articles:
http://lab.cisti-icist.nrc-cnrc.gc.ca/ungava/Search?
Congrats & good-luck on this new endeavour!
-Glen :-)
2009/1/26 Grant Ingersoll :
> Hi Lucene and Solr users,
>
> As some of you may know, Yonik, Erik, Sami, Mark and I teamed up with
> Marc Krellenstein to create a company to provide commercial
> support (with SLAs), training, value-add compone
There is a discussion here:
http://www.terracotta.org/web/display/orgsite/Lucene+Integration
Also of interest: "Katta - distribute lucene indexes in a grid"
http://katta.wiki.sourceforge.net/
-glen
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
http://zzzoot.blo
> I'm not sure if it's a better idea to use something like Solr or start from
> scratch and customize the application as I move forward. What do you think
LuSql might be appropriate for your needs:
"LuSql is a high-performance, simple tool for indexing data held in a
DBMS into a Lucene index. It c
- Fast Similarity Search in Large Dictionaries. http://fastss.csg.uzh.ch/
- Paper: Fast Similarity Search in Large Dictionaries.
http://fastss.csg.uzh.ch/ifi-2007.02.pdf
- FastSimilarSearch.java http://fastss.csg.uzh.ch/FastSimilarSearch.java
- Paper: Fast Similarity Search in Peer-to-Peer Networks
Oops. Thanks! :-)
2008/12/10 Gary Moore <[EMAIL PROTECTED]>:
> svn co https://bobo-browse.svn.sourceforge.net/svnroot/bobo-browse/trunk
> bobo-browse
> -Gary
> Glen Newton wrote:
>>
>> I don't think this is an Open Source project: I couldn't find any
>
I don't think this is an Open Source project: I couldn't find any
source on the site and the only download is a jar with .class files...
-glen
2008/12/10 John Wang <[EMAIL PROTECTED]>:
> www.browseengine.com
> -John
>
> On Wed, Dec 10, 2008 at 10:55 AM, Glen Newt
From what I understand:
faceted browse is a taxonomy of depth = 1.
A taxonomy in general has an arbitrary depth:
Example: Biological taxonomy:
Kingdom Animalia
  Phylum Acanthocephala
    Class Archiacanthocephala
  Phylum Annelida
Kingdom Fungi
  Phylum Ascomycota
    Class Ascomycetes
oblems, generally you don't
> want concurrent writes.
>
> -John
>
> On Thu, Dec 4, 2008 at 2:44 PM, Glen Newton <[EMAIL PROTECTED]> wrote:
>
>> Am I missing something here?
>>
>> Why not use:
>> IndexWriter writer = new IndexWriter(NIOFSDi
s more on how to use NIOFSDirectory class. I am hoping for a simply
>> answer,
>> > is what I am doing (setting the class name statically on system property)
>> > the right way?
>> >
>> > -John
>> >
>> > On Thu, Dec 4, 2008
Sorry, what version are we talking about? :-)
thanks,
Glen
2008/12/4 Yonik Seeley <[EMAIL PROTECTED]>:
> On Thu, Dec 4, 2008 at 4:11 PM, John Wang <[EMAIL PROTECTED]> wrote:
>> Hi guys:
>>We did some profiling and benchmarking:
>>
>>The thread contention on FSDIrectory is gone, and fo
Hi Magnus,
Could you post the OS, version, RAM size, swapsize, Java VM version,
hardware, #cores, VM command line parameters, etc? This can be very
relevant.
Have you tried other garbage collectors and/or tuning as described in
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html?
Let's say I have 8 indexes on a 4 core system and I want to merge them
(inside a single vm instance).
Is it better to do a single merge of all 8, or to merge in pairs in
parallel threads until there is only a single index left? I guess the
question involves how multi-threaded merging is and if it
I have some simple indexing benchmarks comparing Lucene 2.3.1 with 2.4:
http://zzzoot.blogspot.com/2008/11/lucene-231-vs-24-benchmarks-using-lusql.html
In the next couple of days I will be running benchmarks comparing
Solr's DataImportHandler/JdbcDataSource indexing performance with
LuSql and wil
g an
86GB Lucene index in ~13 hours.
http://lab.cisti-icist.nrc-cnrc.gc.ca/cistilabswiki/index.php/LuSql
Glen Newton
Thanks! :-)
2008/11/6 Michael McCandless <[EMAIL PROTECTED]>:
>
> The field never changes across all docs? If so, this will work fine.
>
> Mike
>
> Glen Newton wrote:
>
>> I have a use case where I want all of my documents to have - in
>> addition to the
I have a use case where I want all of my documents to have - in
addition to their other fields - a single field=value.
An example use is where I have multiple Lucene indexes that I search
in parallel, but still need to distinguish them.
Index 1: All documents have: source="a1"
Index 2: All documen
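A minimal sketch of the constant per-index field, assuming a current Lucene API (StringField); the field name and values come from the example above:

import org.apache.lucene.document.*;

// Added to every document written to index 1; index 2 uses "a2", and so on.
// At search time the stored "source" value says which index a hit came from.
doc.add(new StringField("source", "a1", Field.Store.YES));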
Yes, the problem goes away when I do the following:
synchronized (doc) {
    doc.add(field);
}
Thanks.
[I'll use a Lock to do this properly]
-glen
2008/10/31 Yonik Seeley <[EMAIL PROTECTED]>:
> On Fri, Oct 31, 2008 at 11:53 AM, Glen Newton <[EMAIL PROTECTED]> wrote:
>>
Hello,
I am using Lucene 2.3.1.
I have concurrent threads adding Fields to the same Document, but
getting some odd behaviour.
Before going into too much depth, is Document thread-safe?
thanks,
Glen
http://zzzoot.blogspot.com/
2008/10/23 Michael McCandless <[EMAIL PROTECTED]>:
>
> Mark Miller wrote:
>
>> Glen Newton wrote:
>>>
>>> 2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
>>>
>>>> It sounds like you might have some thread synchronization issues outside
2008/10/23 Mark Miller <[EMAIL PROTECTED]>:
> It sounds like you might have some thread synchronization issues outside of
> Lucene. To simplify things a bit, you might try just using one IndexWriter.
> If I remember right, the IndexWriter is now pretty efficient, and there
> isn't much need to inde
You might want to look at my indexing of 6.4 million PDF articles,
full-text and metadata. It resulted in an 83GB index taking 20.5 hours
to run. It uses multiple writers and is massively multithreaded.
More info here:
http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html
Che
See also:
http://zzzoot.blogspot.com/2007/10/drill-clouds-for-search-refinement-id.html
and
http://zzzoot.blogspot.com/2007/10/tag-cloud-inspired-html-select-lists.html
-glen
2008/10/16 Glen Newton <[EMAIL PROTECTED]>:
> Yes, tag clouds.
>
> I've implemented them using
lts they got back. Sort of
> like latent relationships.
>
> Does that help?
>
> I thought this could be done using term frequency vectors in Lucene, but
> I've never used TFV's before. And can then be limited to just a set of
> results.
>
> HTH,
> D
Sorry, could you explain what you mean by a "link map over lucene results"?
thanks,
-glen
2008/10/16 Darren Govoni <[EMAIL PROTECTED]>:
> Hi,
> Has anyone created a link map over lucene results or know of a link
> describing the process? If not, I would like to build one to contribute.
>
> Also,
IndexWriter is thread-safe and has been for a while
(http://www.mail-archive.com/[EMAIL PROTECTED]/msg00157.html)
so you don't have to worry about that.
As reported in my blog in April
(http://zzzoot.blogspot.com/2008/04/lucene-indexing-performance-benchmarks.html)
but perhaps not explicitly enoug
> I think it is not a good idea to use Lucene as storage, it is just an index.
I strongly disagree with this position.
To qualify my disagreement: yes, you should not use Lucene as your
primary storage for your data in your organization.
But, for a particular application, taking content from your pri
There are a number of ways to do this. Here is one:
Lose the parentid field (unless you have other reasons to keep it).
Add a field fullName, and a field called depth :
doc1
  fullName: state
  depth: 0
doc2
  fullName: state/department
  depth: 1
doc3
  fullName: state/department/Boston
  depth: 2
doc4
  ful
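A sketch of indexing and querying the fullName/depth scheme above, assuming a current Lucene API; with this layout, "direct children of state" is a prefix query on fullName plus a depth constraint:

import org.apache.lucene.document.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;

// Index side: one document per node, carrying its full path and its depth.
Document node = new Document();
node.add(new StringField("fullName", "state/department/Boston", Field.Store.YES));
node.add(new StringField("depth", "2", Field.Store.YES));
writer.addDocument(node);

// Query side: direct children of "state" = everything under the "state/" prefix at depth 1.
BooleanQuery.Builder children = new BooleanQuery.Builder();
children.add(new PrefixQuery(new Term("fullName", "state/")), BooleanClause.Occur.MUST);
children.add(new TermQuery(new Term("depth", "1")), BooleanClause.Occur.MUST);
TopDocs hits = searcher.search(children.build(), 10);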
A subset of your questions are answered (or at least examined) in my
postings on multi-thread queries on a multiple-core single system:
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html
http://zzzoot.blogspot.com/2008/06/lucene-concurrent-search-performance.html
-Glen
200
Use Carrot2:
http://project.carrot2.org/
For Lucene + Carrot2:
http://project.carrot2.org/faq.html#lucene-integration
-glen
2008/7/7 Ariel <[EMAIL PROTECTED]>:
> Hi everybody:
> Do you have Idea how to make how to make documents clustering and topic
> classification using lucene ??? Is there a
Lutan,
Yes, no problem. I am away at a conference next week but plan to
release the code the following week. Is this OK for you?
thanks,
Glen
2008/6/13 lutan <[EMAIL PROTECTED]>:
>
> TO: Glen Newton Could I get your test code or code architecture for study.
> I ha
en the performance will slowly deteriorate with more
> readers/searchers let's see it!
I'm running it & will post when it is done.
thanks,
Glen :-)
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Messag
I have extended my evaluation (previous evaluation:
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html)
to include, in addition to an increasing number of threads performing
concurrent queries, 1, 2, 4 and 8 IndexReaders.
The results can be found here:
http://zzzoot.blogspot.com/2008/0
Lucene Database Search in 3 minutes:
> http://wiki.dbsight.com/index.php?title=Create_Lucene_Database_Search_in_3_minutes
> DBSight customer, a shopping comparison site, (anonymous per request) got
> 2.6 Million Euro funding!
>
> On Mon, Jun 9, 2008 at 3:51 PM, Glen Newton <[EMAIL PROT
I have, with the gnuplot
scripts that I have. Let me finish off what I am doing for my work and
I will clean things up a bit and write a little documentation.
-Glen
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
> - Original Message
>&g
A number of people have asked about query benchmarks.
I have posted benchmarks for concurrent query requests for Lucene
2.3.1 on my blog, where I look at 1 - 4096 concurrent requests:
http://zzzoot.blogspot.com/2008/06/simultaneous-threaded-query-lucene.html
I hope you find this useful.
thanks
want. And it
> works for both indexing and querying out-of-the-box.
>
> Best
> Erick
>
> On Thu, Jun 5, 2008 at 12:14 PM, Glen Newton <[EMAIL PROTECTED]> wrote:
>
>> I would like to be able to get multi-language support within a single
>> index.
>>
d
to make these sorts of manipulations to the nature of the segments
files easier for mere mortal developers? :-)
Is this something that is already being talked about / looked into /
being implemented? :-)
thanks,
Glen Newton
http://zzzoot.blogspot.com/
--