Re: Lucene 4.0 Index Format Finalization Timetable

2011-12-08 Thread Mark Miller
While we are in constant sync due to the merge, Lucene would still be updated multiple times before a Solr 4 release, and that could happen at any time - so it's really not any different. On Wednesday, December 7, 2011, Jamie Johnson wrote: > Yeah, biggest issue for us is we're using t

Re: "read past EOF" when merge

2012-11-03 Thread Mark Miller
Can you file a JIRA Markus? This is probably related to the new code that uses Directory for replication. - Mark On Nov 2, 2012, at 6:53 AM, Markus Jelsma wrote: > Hi, > > For what it's worth, we have seen similar issues with Lucene/Solr from this > week's trunk. The issue manifests itself w

Re: Lucene 4.1 tentative release

2012-12-12 Thread Mark Miller
We are hoping for 4.1 very soon! With the holidays it will be difficult to say - but 4.1 talk has been going on for some time now. It's really a matter of wrapping up some short term work and getting some guys to do the release work. I don't think anyone can give you a date, but it's certainly in

Re: Luke?

2013-03-15 Thread Mark Miller
If anyone is able to donate some effort, a nice future scenario could be that Luke comes fully up to date with every Lucene release: https://issues.apache.org/jira/browse/LUCENE-2562 - Mark On Mar 15, 2013, at 5:58 AM, Eric Charles wrote: > For the record, I happily use Luke (with Lucene 4.1)

[ANNOUNCE] Apache Lucene 4.2.1 released

2013-04-03 Thread Mark Miller
April 2013, Apache Lucene™ 4.2.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.2.1. Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable for nearly any application that requires full-te

[ANN] Lucene/Solr Meetup in NYC on May 11th

2010-05-08 Thread Mark Miller
If you haven't heard, there is a Lucene/Solr meetup in New York next week: http://www.meetup.com/NYC-Apache-Lucene-Solr-Meetup/calendar/13325754/ The scheduled talks are (in addition to lightning talks): Solr 1.5 and Beyond: Yonik Seeley, author of Solr, co-founder, Lucid Imagination Topics w

Re: NumericField API

2010-06-01 Thread Mark Miller
On 6/1/10 9:34 AM, Mindaugas Žakšauskas wrote: It's just an early observation as historically Lucene has been doing an amazing job in terms of API stability. Yes it has :) Get ready for even more change in that area though :) -- - Mark http://www.lucidimagination.com ---

[ANN] Free technical webinar: Mastering the Lucene Index: Wednesday, August 11, 2010 11:00 AM PST / 2:00 PM EST / 20:00 CET

2010-08-09 Thread Mark Miller
Hey all - apologies for the quick cross post - just to let you know, Andrzej is giving a free webinar this Wednesday. His presentations are always fantastic, so check it out: Lucid Imagination Presents a free technical webinar: Mastering the Lucene Index Wednesday, August 11, 2010 11:00 AM PST / 2:00 P

Re: Difference between regular Highlighter and Fast Vector Highlighter ?

2011-04-11 Thread Mark Miller
er if you do. FVH: works with fewer query types and requires that you store term vectors - but scales better than the std Highlighter to very large documents - Mark Miller lucidimagination.com Lucene/Solr User Conference May 25-26, San Francisco www.lucenerevolution.org On Apr 1, 2011, at 8:

Re: NRT consistency

2011-04-11 Thread Mark Miller
T-consistency-tp2801878p2801878.html > Sent from the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java

Re: NRT consistency

2011-04-11 Thread Mark Miller
rom the Lucene - Java Users mailing list archive at Nabble.com. > > - > To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org > For additional commands, e-mail: java-user-h...@lucene.apache.org > - Mark Mill

Re: NRT consistency

2011-04-11 Thread Mark Miller
- Amazon Dynamo uses vector clocks for this. > > Otis > > Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch > Lucene ecosystem search :: http://search-lucene.com/ > > > > - Original Message >> From: Mark Miller >> To: java-user@lucene.

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-06 Thread Mark Miller
icitly set it higher than 0 for now. Feel free to create a JIRA issue and we can give it its own default greater than 0. - Mark Miller lucidimagination.com On Jul 6, 2011, at 5:34 PM, Jahangir Anwari wrote: > I have a CustomHighlighter that extends the SolrHighlighter and overrides &

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-07 Thread Mark Miller
sp); } else if (query instanceof TermQuery) { - extractWeightedTerms(terms, query); + extractWeightedSpanTerms(terms, new SpanTermQuery(((TermQuery)query).getTerm())); } else if (query instanceof SpanQuery) { extractWeightedSpanTerms(terms, (SpanQuery) query);

Re: Extracting span terms using WeightedSpanTermExtractor

2011-07-08 Thread Mark Miller
On Jul 8, 2011, at 5:43 AM, Jahangir Anwari wrote: > I don't think this is the best > solution, am open to other alternatives. Could also make it static public where it is? Either way. - Mark Miller lucidimag

[Announce] Lucene-Eurocon Call for Participation Closes Friday, JULY 15

2011-07-12 Thread Mark Miller
e Lucene EuroCon 2011 is presented by Lucid Imagination, the commercial entity for Apache Solr/Lucene Open Source Search; proceeds of the conference benefit The Apache Software Foundation. "Lucene" and "Apache Solr" are trademarks of the Apache Software Foundation. - Mark

Re: Questions on index Writer

2011-07-16 Thread Mark Miller
My advice: Don't close the IndexWriter - just call commit. Don't worry about forcing merges - let them happen as they do when you call commit. If you are going to use the IndexWriter again, you generally do not want to close it. Calling commit is the preferred option. - M

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
nd I think the limitation that I ate was that the word could belong to both its true sentence and the one after it. - Mark Miller lucidimagination.com - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache

Re: Search within a sentence (revisited)

2011-07-20 Thread Mark Miller
On Jul 20, 2011, at 7:44 PM, Mark Miller wrote: > > On Jul 20, 2011, at 11:27 AM, Peter Keegan wrote: > >> Mark Miller's 'SpanWithinQuery' patch >> seems to have the same issue. > > If I remember right (It's been more than a couple years),

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
.length, 1); > > clauses[1] = makeSpanTermQuery("3"); > allKeywords = new SpanNearQuery(clauses, Integer.MAX_VALUE, false); // > SpanAndQuery equivalent > query = new SpanWithinQuery(allKeywords, endSentence, 0); > System.out.println("query: "+query); > hits =

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
t there. > > Peter > > On Thu, Jul 21, 2011 at 3:07 PM, Mark Miller wrote: > >> Hey Peter, >> >> Getting sucked back into Spans... >> >> That test should pass now - I uploaded a new patch to >> https://issues.apache.org/jira/browse/LUCENE-777

Re: Search within a sentence (revisited)

2011-07-21 Thread Mark Miller
I just uploaded a patch for 3X that will work for 3.2. On Jul 21, 2011, at 4:25 PM, Mark Miller wrote: > Yeah, it's off trunk - I'll submit a 3X patch in a bit - just have to change > that to an IndexReader I believe. > > - Mark > > On Jul 21, 2011, at 4:01 PM, Pe

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
Thanks Peter - if you supply the unit tests, I'm happy to work on the fixes. I can likely look at this later today. - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:14 AM, Peter Keegan wrote: > Hi Mark, > > Sorry to bug you again, but there's another case that

Re: Search within a sentence (revisited)

2011-07-25 Thread Mark Miller
y use even more tests before feeling too confident here… I've attached a patch for 3X with the new test and fix (changed that include back to exclude). - Mark Miller lucidimagination.com On Jul 25, 2011, at 10:29 AM, Mark Miller wrote: > Thanks Peter - if you supply the unit tests, I'

Re: Search within a sentence (revisited)

2011-07-26 Thread Mark Miller
case tests like I likely should try if I was going to commit this thing. - Mark Miller lucidimagination.com On Jul 26, 2011, at 8:56 AM, Peter Keegan wrote: > Thanks Mark! The new patch is working fine with the tests and a few more. If > you have particular test cases in mind, I'd

Re: implicit closing of an IndexWriter

2011-07-26 Thread Mark Miller
On Jul 26, 2011, at 9:52 AM, Clemens Wyss wrote: > Side note: I am using threads when writing and these threads are (by design) > interrupted (from time to time) Perhaps you are seeing this: https://issues.apache.org/jira/browse/LUCENE-2239 - Mark Miller lucidimaginati

Re: optimize with num segments > 1 index keeps growing

2011-09-12 Thread Mark Miller
> we should correct the javadocs for expungeDeletes here I think: so > that its more consistent with the javadocs for optimize? > > "Requests an expunge operation..." ? > +1 - it's a documentation bug now. - Mark Miller lu

Re: ElasticSearch

2011-11-17 Thread Mark Miller
The XML query parser can map to Lucene one to one as well - hasn't seemed to pick up enough steam to be included with Solr yet, but there has been some commotion so it's likely to go in at some point. Not enough demand yet I guess. https://issues.apache.org/jira/browse/SOLR-839 XML Query Parser Sup

Re: Regarding Compression Tool

2013-09-16 Thread Mark Miller
Have you considered storing your indexes server-side? I haven't used compression but usually the trade-off of compression is CPU usage which will also be a drain on battery life. Or maybe consider how important the highlighter is to your users - is it worth the trade-off of either disk space or bat

[ANNOUNCE] Apache Lucene 4.5.1 released.

2013-10-24 Thread Mark Miller
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 October 2013, Apache Lucene™ 4.5.1 available The Lucene PMC is pleased to announce the release of Apache Lucene 4.5.1 Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is a technology suitable

[ANNOUNCE] Apache Lucene 4.10.3 released

2014-12-29 Thread Mark Miller
case, please try another mirror. This also goes for Maven access. Happy Holidays, Mark Miller http://www.about.me/markrmiller - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java

Re: Lucene in action

2023-06-10 Thread Mark Miller
Nature abhors being anything but an author by name on a second tech book. The ruse is up after one when you have the inputs crystalized and the hourly wage in hand. Hard to find anything but executive producers after that. I’d shoot for a persuasive crowdfunding attempt.

Re: Analyzer at Query time

2008-08-28 Thread Mark Miller
Dino Korah wrote: Hi All, If I am to completely avoid the query parser and use the BooleanQuery along with TermQuery, RangeQuery, PrefixQuery, PhraseQuery, etc, does the search words still get to the Analyzer, before actually doing the real search? Many thanks, Dino Answer: no The Q

Re: phrases and slop

2008-08-28 Thread Mark Miller
Andy Goodell wrote: I thought I understood phrases and slop until one of my coworkers brought by the following example For a document that contains "quick brown fox" "quick brown fox"~0 "quick fox brown"~2 "fox quick brown"~3 all match. I would have expected "fox quick brown" to require a 4 i

Re: Performance, yet again

2008-09-02 Thread Mark Miller
Andre Rubin wrote: Hi all, Most of our queries are very simple, of the type: Query query = new PrefixQuery(new Term(LABEL_FIELD, prefix)); Hits hits = searcher.search(query, new Sort(new SortField(LABEL_FIELD))) You might want to check out Solr's ConstantScorePrefixQuery and compare performa

Re: Performance, yet again

2008-09-02 Thread Mark Miller
Andre Rubin wrote: On Tue, Sep 2, 2008 at 10:16 AM, Mark Miller <[EMAIL PROTECTED]> wrote: Andre Rubin wrote: Hi all, Most of our queries are very simple, of the type: Query query = new PrefixQuery(new Term(LABEL_FIELD, prefix)); Hits hits = searcher.search(query, new So

Re: Lucene Memory Leak

2008-09-02 Thread Mark Miller
You should really close the IndexSearcher rather than the directory. Andy33 wrote: I have a memory leak in my lucene search code. I am able to run a few queries fine, but I eventually run out of memory. Please note that I do close and set to null the ivIndexSearcher object elsewhere. Here is the

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-04 Thread Mark Miller
Sounds like it's more in line with what you are looking for. If I remember correctly, the phrase query factors in the edit distance in scoring, but the SpanNearQuery will just use the combined idf for each of the terms in it, so distance shouldn't matter with spans (I'm sure Paul will correct me

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Mark Miller
Paul Elschot wrote: On Thursday 04 September 2008 20:39:13 Mark Miller wrote: Sounds like it's more in line with what you are looking for. If I remember correctly, the phrase query factors in the edit distance in scoring, but the SpanNearQuery will just use the combined idf for each of the

Re: PhraseQuery issues - differences with SpanNearQuery

2008-09-05 Thread Mark Miller
SpanScorer will use the similarity slop factor for each matching span size to adjust the effective frequency. Regards, Paul Elschot You have pointed this out to me before. One day I will remember. Every time I look things over again I miss it, and I couldn't find that email in the archive

Re: Frequently updated fields

2008-09-12 Thread Mark Miller
You might check out the tagindex issue in JIRA as well. Haven't looked at it myself, but I believe it's supposed to be an option for this. Gerardo Segura wrote: I think the important question is: in general how to cope with frequently changing fields. Karl Wettin wrote: Hi Wojciech, can you

Re: StandardAnalyzer exclude numbers

2008-09-22 Thread Mark Miller
[EMAIL PROTECTED] wrote: Hello Is it possible to exclude numbers using StandardAnalyzer just like SimpleAnalyzer? - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED] Its possible bu

Re: StandardAnalyzer exclude numbers

2008-09-22 Thread Mark Miller
e a token filter? > > On Mon, Sep 22, 2008 at 8:36 PM, Mark Miller <[EMAIL PROTECTED]> wrote: > > >> [EMAIL PROTECTED] wrote: >> >> >>> Hello >>> >>> Is it possible to e

Re: sharing SearchIndexer

2008-09-25 Thread Mark Miller
simon litwan wrote: hi all i tried to reuse the IndexSearcher among all of the threads that are doing searches as described in (http://wiki.apache.org/lucene-java/LuceneFAQ#head-48921635adf2c968f7936dc07d51dfb40d638b82) this works fine. but our application does continuous indexing. so the

Re: QueryParser

2008-10-18 Thread Mark Miller
Right, just don't share the same instance across threads. - Mark On Oct 18, 2008, at 3:11 PM, "Rafael Almeida" <[EMAIL PROTECTED]> wrote: On queryparser's documentation says: "Note that QueryParser is not thread-safe." it only means that the same instance of QueryParser can't be used by mu
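One common way to follow the advice above is to give each thread its own parser instance. The sketch below illustrates that pattern only: `StatefulParser` is a hypothetical stand-in written for this example, not Lucene's `QueryParser`, and the `field:text` output format is likewise made up.

```java
import java.util.Locale;

// Sketch only: StatefulParser is a hypothetical stand-in, not Lucene's QueryParser.
final class PerThreadParserSketch {

    /** Hypothetical parser that reuses mutable state, so one instance is not thread-safe. */
    static final class StatefulParser {
        private final StringBuilder scratch = new StringBuilder();

        String parse(String field, String text) {
            scratch.setLength(0); // shared buffer: concurrent callers would corrupt it
            scratch.append(field).append(':').append(text.trim().toLowerCase(Locale.ROOT));
            return scratch.toString();
        }
    }

    // One parser per thread; instances are never shared across threads.
    private static final ThreadLocal<StatefulParser> PARSER =
            ThreadLocal.withInitial(StatefulParser::new);

    static String parse(String field, String text) {
        return PARSER.get().parse(field, text);
    }

    public static void main(String[] args) throws InterruptedException {
        Runnable task = () -> System.out.println(
                Thread.currentThread().getName() + " -> " + parse("body", " Quick Fox "));
        Thread a = new Thread(task, "searcher-1");
        Thread b = new Thread(task, "searcher-2");
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

Creating a fresh parser per request works just as well when construction is cheap; the thread-local variant only saves repeated allocation.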

Re: Hiring etiquette

2008-10-19 Thread Mark Miller
Richard Marr wrote: Hi all, Is there a mailing-list-appropriate way to hire coders with Lucene experience? I don't want to just spam the list because I don't want to crap where I live. I'm a programmer not a recruiter if that makes any difference. Cheers, Rich

Re: Multi -threaded indexing of large number of PDF documents

2008-10-23 Thread Mark Miller
It sounds like you might have some thread synchronization issues outside of Lucene. To simplify things a bit, you might try just using one IndexWriter. If I remember right, the IndexWriter is now pretty efficient, and there isn't much need to index to smaller indexes and then merge. There is a

Re: Multi -threaded indexing of large number of PDF documents

2008-10-23 Thread Mark Miller
Glen Newton wrote: 2008/10/23 Mark Miller <[EMAIL PROTECTED]>: It sounds like you might have some thread synchronization issues outside of Lucene. To simplify things a bit, you might try just using one IndexWriter. If I remember right, the IndexWriter is now pretty efficient, and there

Re: Change the merge factor for an existing index?

2008-10-28 Thread Mark Miller
Just change it. Merges will start obeying the new merge factor seamlessly. - Mark On Oct 27, 2008, at 1:07 PM, Tom Saulpaugh <[EMAIL PROTECTED]> wrote: Hello, We are currently using lucene v2.1 and we are planning to upgrade to lucene v2.4. Can we change the merge factor for an existi

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Mark Miller
How many fields are you sorting on? Lots of unique terms in those fields? - Mark On Oct 29, 2008, at 6:03 PM, "Todd Benge" <[EMAIL PROTECTED]> wrote: Hi, I'm the lead engineer for search on a large website using lucene for search. We're indexing about 300M documents in ~ 100 indices.

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-29 Thread Mark Miller
The term, terminfo, indexreader internals stuff is prob on the low end compared to the size of your field caches (needed for sorting). If you are sorting by String I think the space needed is 32 bits x number of docs + an array to hold all of the unique terms. So checking 300 million docs (I kn
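The estimate in the message above (32 bits per document plus an array of the unique terms) can be worked through as plain arithmetic. This is a back-of-the-envelope sketch, not Lucene code: the class name, the unique-term count, and the average bytes per term are all hypothetical values chosen for illustration.

```java
// Back-of-the-envelope estimate only; not Lucene code.
final class StringSortCacheEstimate {

    // From the message above: one 32-bit entry per document.
    static final long BYTES_PER_DOC_ENTRY = 4L;

    /**
     * @param numDocs         documents in the index
     * @param uniqueTerms     distinct values in the sort field (hypothetical input)
     * @param avgBytesPerTerm assumed average in-memory size of one term (hypothetical input)
     */
    static long estimateBytes(long numDocs, long uniqueTerms, long avgBytesPerTerm) {
        long perDocArray = numDocs * BYTES_PER_DOC_ENTRY;
        long termArray = uniqueTerms * avgBytesPerTerm;
        return perDocArray + termArray;
    }

    public static void main(String[] args) {
        // 300 million documents, per-document array alone.
        System.out.println(estimateBytes(300_000_000L, 0L, 0L) + " bytes");
        // Same index with 1 million hypothetical unique terms at 40 bytes each.
        System.out.println(estimateBytes(300_000_000L, 1_000_000L, 40L) + " bytes");
    }
}
```

The figure applies per sort field, so sorting on several String fields multiplies it accordingly.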

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-30 Thread Mark Miller
adoop, nutch, solr & terracotta for possibilities such as index sharding. Has anyone implemented a solution using hadoop or terracotta for a large scale system? Just wondering the pro's / con's of the various approaches. Thanks, Todd On Wed, Oct 29, 2008 at 6:07 PM, Mark Miller &l

Re: Document marked as deleted

2008-10-30 Thread Mark Miller
John G wrote: I have an index with a particular document marked as deleted. If I use the search method that returns TopDocs and that deleted document satisfies the search criteria, will it be included in the returned TopDocs object even though it has been marked as deleted? Thanks in advance. J

Re: OutOfMemory Problems Lucene 2.4 / Tomcat

2008-10-31 Thread Mark Miller
20 fields on a huge index? Wow - not sure there is a ton you can do with that...anyone have any suggestions for that one? Distributed should help I suppose, but that's a lot of sort fields for a large index. If LUCENE-831 ever gets off the ground you will be able to change the cache used, and p

Re: Performance of never optimizing

2008-11-03 Thread Mark Miller
Am I missing your benchmark algorithm somewhere? We need it. Something doesn't make sense. - Mark Justus Pendleton wrote: Howdy, I have a couple of questions regarding some Lucene benchmarking and what the results mean[3]. (Skip to the numbered list at the end if you don't want to read the

Re: Performance of never optimizing

2008-11-03 Thread Mark Miller
will ensure you are reusing the same reader for each search. Hope to analyze further soon. - Mark Justus Pendleton wrote: On 03/11/2008, at 11:07 PM, Mark Miller wrote: Am I missing your benchmark algorithm somewhere? We need it. Something doesn't make sense. I thought I had includ

Re: searchable archives

2008-11-07 Thread Mark Miller
Or nabble or markmail - Mark On Nov 7, 2008, at 3:33 PM, Dragon Fly <[EMAIL PROTECTED]> wrote: http://www.gossamer-threads.com/lists/lucene/java-user/ Date: Fri, 7 Nov 2008 14:27:38 -0700 From: [EMAIL PROTECTED] To: java-user@lucene.apache.org Subject: searchable archives Hey, Is thi

Re: Multisearcher

2008-11-08 Thread Mark Miller
Not out of the box, but it's fairly trivial to copy MultiSearcher and modify it so that a different query goes to each subsearcher. - Mark On Nov 8, 2008, at 5:45 AM, "Shishir Jain" <[EMAIL PROTECTED]> wrote: Hi, Doc1: Field1, Field2 Doc2: Field1, Field2 If I create Index such that Fie

Re: ScoreDoc

2008-11-09 Thread Mark Miller
people are interested in rather than all matching docs. Sorry for the confusion there - need to double check what I write... Mark Miller wrote: There is definitely some stale javadoc in Lucene here and there. All of what you're talking about has been shaken up recently with the deprecation of Hits

Re: Highlighter and Phrase Queries

2008-11-10 Thread Mark Miller
Check out the SpanScorer. - Mark On Nov 10, 2008, at 8:25 AM, "Sertic Mirko, Bedag" <[EMAIL PROTECTED] > wrote: [EMAIL PROTECTED] I am searching for a solution to make the Highlighter run properly in combination with phrase queries. I want to highlight text with a phrase query like "w

Re: Boosting results

2008-11-10 Thread Mark Miller
Michael McCandless wrote: But: it's slow to load a field for the first time. LUCENE-1231 (column-stride fields) aims to greatly speed up the load time. Test it out though. In some recent testing I was doing it was *way* faster than I thought it would be based on what I had been reading. Of c

Re: AW: Highlighter and Phrase Queries

2008-11-10 Thread Mark Miller
, it works just like the non phrase/span aware Highlighter. - Mark Sertic Mirko, Bedag wrote: Hi Thank you for your response. Are there examples available? Regards Mirko -Original Message- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, 10 November 2008 14:45

Re: AW: AW: Highlighter and Phrase Queries

2008-11-10 Thread Mark Miller
- From: Mark Miller [mailto:[EMAIL PROTECTED] Sent: Monday, 10 November 2008 15:38 To: java-user@lucene.apache.org Subject: Re: AW: Highlighter and Phrase Queries Check out the unit tests for the highlighter and there are a bunch of examples. It's pretty much the same as using the standard

Re: ScoreDoc

2008-11-09 Thread Mark Miller
There is definitely some stale javadoc in Lucene here and there. All of what you're talking about has been shaken up recently with the deprecation of Hits. Hits used to pretty much be considered the non-expert API, but it's been tossed in favor of the TopDocs APIs. The HitCollector stuff has been

Re: IndexSearcher and multi-threaded performance

2008-11-11 Thread Mark Miller
Nice! An 8 core machine with a test ready to go! How about trying the read only mode that was added to 2.4 on your IndexReader? And if you are on Unix and could try trunk and use the new NIOFSDirectory implementation...that would be awesome. Those two additions are our current hope for

Re: IndexSearcher and multi-threaded performance

2008-11-11 Thread Mark Miller
And if you are on Unix and could try trunk and use the new NIOFSDirectory implementation...that would be awesome. Woah...that made 2.4 too. A 2.4 release will allow both optimizations. Many thanks! - To unsubscribe, e-m

Re: IndexSearcher and multi-threaded performance

2008-11-11 Thread Mark Miller
an FSDirectory. That's a good point, and points out a bug in Solr trunk for me. Frankly I don't see how it's done. There is no code I can see/find to use it rather than FSDirectory. Still assuming there must be a way, but I don't see it... - Mark Any ideas? Cheers, Dmitri On Tue, Nov

Re: IndexSearcher and multi-threaded performance

2008-11-11 Thread Mark Miller
Mark Miller wrote: That's a good point, and points out a bug in Solr trunk for me. Frankly I don't see how it's done. There is no code I can see/find to use it rather than FSDirectory. Still assuming there must be a way, but I don't see it... Ah - brain freeze. What else is new :) Y

Re: IndexSearcher and multi-threaded performance

2008-11-12 Thread Mark Miller
r? Or something? Mike Mark Miller wrote: Mark Miller wrote: That's a good point, and points out a bug in Solr trunk for me. Frankly I don't see how it's done. There is no code I can see/find to use it rather than FSDirectory. Still assuming there must be a way, but I don't see it

Re: IndexSearcher and multi-threaded performance

2008-11-12 Thread Mark Miller
I'm thinking about it, so if someone else doesn't get something together before I have some free time... It's just not clear to me at the moment how best to do it. Michael McCandless wrote: Any takers for pulling a patch together...? Mike Mark Miller wrote: +1 - Mark On No

Re: Lucene implementation/performance question

2008-11-12 Thread Mark Miller
If you're new to Lucene, this might be a little much (and maybe I don't fully understand the problem), but you might try: Add the attributes to the words in a payload with a PayloadAnalyzer. Do searching as normal. Use the new PayloadSpanUtil class to get the payloads for the matching words. (T

Re: Lucene implementation/performance question

2008-11-12 Thread Mark Miller
the class fully working. That said, if it can give me serious speed improvements it's definitely worth considering. - Greg On Wed, Nov 12, 2008 at 12:01 PM, Mark Miller <[EMAIL PROTECTED]> wrote: If you're new to Lucene, this might be a little much (and maybe I don't fully understand

Re: Lucene implementation/performance question

2008-11-12 Thread Mark Miller
10 at a time. Depends on your use case if it's feasible or not though. Most find it efficient enough to do highlighting with, so I'm assuming it should be good enough here. Thanks again for your help on this one. - Greg On Wed, Nov 12, 2008 at 12:52 PM, Mark Miller <[EMAIL PROTECTED]> w

Re: LUCENE-831 (complete cache overhaul) -> mem use

2008-11-14 Thread Mark Miller
It's hard to predict the future of LUCENE-831. I would bet that it will end up in Lucene at some point in one form or another, but it's hard to say if that form will be what's in the available patches (I'm a contrib committer so I won't have any real say in that, so take that prediction with a gra

Re: LUCENE-831 (complete cache overhaul) -> mem use

2008-11-15 Thread Mark Miller
Like I said, it's pretty easy to add this, but it's also going to suck. Kind of exposes the fact that it's missing the right extensibility at the moment. Things are still a bit ugly overall. You're going to need new CacheKeys for the data types you want to support. A CacheKey builds and provides a

Re: InstantiatedIndex help

2008-11-16 Thread Mark Miller
Check out the docs at: http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html There is a performance graph there to check out. The code should be fairly straightforward - you can make an InstantiatedIndex that's empty, or seed it with an IndexReader. Then you can make an Inst

Re: InstantiatedIndex help

2008-11-16 Thread Mark Miller
tedIndex(reader) ireader = iindex.indexReaderFactory() isearcher = IndexSearcher(ireader) Kind of round about way to get an InstantiatedIndex I guess,but maybe there's a briefer way? Thank you. Darren On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote: Check out the docs at: http://lu

Re: Spread of lucene score

2008-11-19 Thread Mark Miller
excitingComm2 wrote: Hi everybody, as far as I know the lucene score is an arbitrary number between 0.0 and 1.0. Is this correct, that the scores in my resultset are always normalised to this spread or is it possible to get higher scores? Regards, John W. Hits is the class that did the norma

Re: Lucene implementation/performance question

2008-11-20 Thread Mark Miller
Yeah, discussion came up on order and I believe we punted - it's up to you to track order and sort at the moment. I think that was to prevent those that didn't need it from paying the sort cost, but I have to go find that discussion again (maybe it's in the issue?) I'll look at the whole idea agai

Re: # of fields, performance

2008-12-02 Thread Mark Miller
There is not much impact as long as you turn off Norms for the majority of them. - Mark On Dec 2, 2008, at 8:47 AM, Darren Govoni <[EMAIL PROTECTED]> wrote: Hi, I saw this question asked before without a clear answer. Pardons if I missed it in the archive elsewhere. Is there a serious deg

Re: lucene nicking my memory ?

2008-12-03 Thread Mark Miller
Careful here. Not only do you need to pass -server, but you need the ability to use it :) It will silently not work if it's not there I believe. Oddly, the JRE doesn't seem to come with the server hotspot implementation. The JDK always does appear to. Probably varies by OS to some degree. Some

Re: NPE inside org.apache.lucene.index.SegmentReader.getNorms

2008-12-03 Thread Mark Miller
Sounds familiar. This may actually be in JIRA already. - Mark On Dec 3, 2008, at 6:25 PM, "Teruhiko Kurosaka" <[EMAIL PROTECTED]> wrote: Mike, You are right. There was an error on my part. I think I was, in effect, making a SpanNearQuery object of: new SpanNearQuery(new SpanQuery[0], 0,

Re: Open IndexReader read-only

2008-12-08 Thread Mark Miller
Chris Bamford wrote: So does that mean if you don't explicitly open an IndexReader, the IndexSearcher will do it for you? Or what? Right. The IndexReader takes a Directory, and the IndexSearcher takes an IndexReader - there are sugar constructors though - An IndexSearcher will also accept

Re: Fragment Highlighter Phrase?

2008-12-08 Thread Mark Miller
Ian Vink wrote: Is there a way to get phrases counted in the list of fragments that come back from Highlighter.GetBestFragments() in general. It seems to only take words into account. Ian Not sure I fully understand, but have you tried the SpanScorer? It allows the Highlighter to work with

Re: Open IndexReader read-only

2008-12-08 Thread Mark Miller
any of my own logic... Is there a suitable subclass I can use? The documented ones - FilterIndexReader, InstantiatedIndexReader, MultiReader, ParallelReader - all seem too complicated for what I need. My only requirement is to open it read-only! Am I missing something? Mark Miller wrote

Re: Open IndexReader read-only

2008-12-08 Thread Mark Miller
ory that takes a String, make a Directory, and use it to make an IndexReader that you build the IndexSearcher with. If it's using a Directory, use that directory to make the IndexReader that is used for your IndexSearcher. Thanks for your continued help with this :-) Chris Mark Miller wrote: L

Re: Has anyone written SpanFuzzyQuery?

2008-12-09 Thread Mark Miller
http://issues.apache.org/jira/browse/LUCENE-522 note the bugs mentioned at the bottom. - Mark - To unsubscribe, e-mail: [EMAIL PROTECTED] For additional commands, e-mail: [EMAIL PROTECTED]

Re: GWT port of Lucene's QueryParser

2008-12-11 Thread Mark Miller
Paul Libbrecht wrote: Hello again list, has anyone tried to port or simply run the QueryParser of Lucene to GWT? It would look like a very nice thing to do to provide direct rendering of the query interpretation (it could be made into a whole editor probably, e.g. removing or selecting parts

Re: Field.omitTF

2008-12-18 Thread Mark Miller
Drops positions as well. - Mark On Dec 18, 2008, at 4:57 PM, "John Wang" wrote: Hi: In lucene 2.4, when Field.omitTF() is called, payload is disabled as well. Is this intentional? My understanding is payload is independent from the term frequencies. Thanks -John

Re: Field.omitTF

2008-12-18 Thread Mark Miller
for this field. */ void setOmitTf(boolean omitTf); - Mark John Wang wrote: Thanks Mark!I don't think it is documented (at least the ones I've read), should this be considered as a bug or ... ? Thanks -John On Thu, Dec 18, 2008 at 2:05 PM, Mark Miller wrote: Drops positi

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Mark Miller
Well look at the issues and see for yourself :) It's a subjective call I think. Here's my take: There are not going to be too many sweeping changes in the next release. There are tons of little bug fixes and improvements, but not a lot of the bullet point type stuff that you mention in your wish

Re: Approximate release date for Lucene 2.9

2008-12-18 Thread Mark Miller
Mark Miller wrote: TrieRangeQuery has been added to contrib. Super awesome, super efficient, large scale sorting. Sorry. It's way past my bedtime. Large scale numerical range searching. Sorting on the brain. - To

Re: Approximate release date for Lucene 2.9

2008-12-19 Thread Mark Miller
e. My understanding is certainly less than yours though :) - Mark Michael McCandless wrote: The new extensible TokenStream API (based on AttributeSource) is also in 2.9. Mike Mark Miller wrote: Well look at the issues and see for yourself :) It's a subjective call I think. Here's my take: Ther

Re: Optimize and Out Of Memory Errors

2008-12-23 Thread Mark Miller
Lebiram wrote: Also, what are norms Norms are a byte value per field stored in the index that is factored into the score. They are used for length normalization (shorter documents = more important) and index time boosting. If you want either of those, you need norms. When norms are loaded up into a
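To get a feel for why norms matter for memory, here is a rough sizing sketch. It assumes, beyond what the truncated message states, that a loaded norm occupies one byte per document for every field that has norms enabled; the class name and the example numbers are hypothetical.

```java
// Rough sizing sketch; assumes one byte per document per field with norms enabled.
final class NormsMemoryEstimate {

    /**
     * @param maxDoc          number of documents in the index
     * @param fieldsWithNorms number of fields that keep norms (hypothetical input)
     */
    static long estimateBytes(long maxDoc, int fieldsWithNorms) {
        return maxDoc * fieldsWithNorms; // one byte each under the stated assumption
    }

    public static void main(String[] args) {
        // Hypothetical index: 10 million documents, 20 fields with norms.
        System.out.println(estimateBytes(10_000_000L, 20) + " bytes");
    }
}
```

Under that assumption, turning norms off for fields that need neither length normalization nor index-time boosts reduces the total proportionally.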

Re: Optimize and Out Of Memory Errors

2008-12-23 Thread Mark Miller
Mark Miller wrote: Lebiram wrote: Also, what are norms Norms are a byte value per field stored in the index that is factored into the score. They are used for length normalization (shorter documents = more important) and index time boosting. If you want either of those, you need norms. When norms

Re: Optimize and Out Of Memory Errors

2008-12-24 Thread Mark Miller
n norms data in scoring somehow? I'm just stumped as to how Luke is able to do a search (with limit) on the docs but in my code it just dies with OutOfMemory errors. How does Luke not allocate these norms? ________ From: Mark Miller To: java-user@lucene.apac

Re: about TopFieldDocs

2009-01-05 Thread Mark Miller
Erick Erickson wrote: > The number of documents > is irrelevant here, what is relevant is the number of > distinct terms in your "fieldName" field. > Depending on the size of your index, the number of docs will matter though. You have to store the unique terms in a String[] array, but you also s

Re: ANNOUNCE: Welcome Patrick O'Leary as Contrib Committer

2009-01-16 Thread Mark Miller
Welcome Patrick! +1 for LocalLucene. patrick o'leary wrote: Thanks Folks I'm in the business well over a decade now; Started my career in my country of origin in Ireland, and have since lived & worked in UK and the US. I've also traveled extensively establishing development groups in remote of

Re: term offsets info seems to be wrong...

2009-01-16 Thread Mark Miller
Okay, Koji, hopefully I'll have more luck suggesting this this time. Have you tried http://issues.apache.org/jira/browse/LUCENE-1448 yet? I am not sure if it's in a state where it can be applied, but I hope that covers your issue. On Fri, Jan 16, 2009 at 7:15 PM, Koji Sekiguchi wrote: > Hello, > > I'm writi

Re: Group by in Lucene ?

2009-01-28 Thread Mark Miller
Group-by in Lucene/Solr has not been solved in a great general way yet to my knowledge. Ideally, we would want a solution that does not need to fit into memory. However, you need the value of the field for each document to do the grouping. As you are finding, this is not cheap to get. Currentl
