Re: ComplexPhraseQuery problems with simple phrases

2010-02-19 Thread Mark Harwood
This is because phrases are expected to contain >1 clause and the ComplexPhraseQueryParser was expecting a BooleanQuery from the base class which is used to hold the elements in the phrase. In this single-clause scenario I guess we could silently hide the error and return whatever single query

Re: Lucene Filter

2010-03-03 Thread mark harwood
>>Document doc = searcher.doc(i); Isn't "i" your results order number as opposed to a Lucene document id? I.e. the first result could be document id 3,232,432 but you are asking for doc #1. You need to get the doc id out of topDocs. Cheers, Mark - Original Message From: Dyutiman To:
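The distinction between a hit's rank and its document id can be sketched without Lucene on the classpath. This is only an illustration: `Hit` is a hypothetical stand-in for Lucene's `ScoreDoc`, and `HitDocIds` is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a search hit; Lucene's ScoreDoc likewise carries a doc id and a score. */
final class Hit {
    final int doc;      // index-wide document id
    final float score;  // relevance score of this hit

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class HitDocIds {
    /** Collects the document ids of the hits, in rank order. */
    static List<Integer> docIds(Hit[] hits) {
        List<Integer> ids = new ArrayList<>();
        for (int rank = 0; rank < hits.length; rank++) {
            // 'rank' is only the position in the result list;
            // the id to load the stored document with is hits[rank].doc.
            ids.add(hits[rank].doc);
        }
        return ids;
    }
}
```

With the real API the equivalent lookup would be along the lines of `searcher.doc(topDocs.scoreDocs[i].doc)` rather than `searcher.doc(i)`.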

Re: Incremental Field Updates

2010-03-27 Thread Mark Harwood
Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. That aside, the biggest concern for me here is the impact that this is likely to have on search -

Re: Incremental Field Updates

2010-03-28 Thread Mark Harwood
Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you explain your reasoning a bit more? If you want to update a do

Re: Incremental Field Updates

2010-03-29 Thread Mark Harwood
On 29 Mar 2010, at 07:45, Earwin Burrfoot wrote: Of course introducing the idea of updates also introduces the notion of a primary key and there's probably an entirely separate discussion to be had around user-supplied vs Lucene-generated keys. Not sure I see that need. Can you explain your re

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
>I can delete by lucene-generated docId. Which users used to have to find by first coding a primary-key-term search. Delete by term removed this step to make life easier. >If someone needs this, it can be built over lucene, without >introducing it as a core feature and needlessly complicating

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
>Variant d) sounds most logical? And enables all sorts of fun stuff. So the duplicate-key docs can have different values for initial-insert fields but partial updates will cause sharing of a common field value? And subsequent same-key doc inserts do or don't share these previous "partial-update

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
>Who ever said that some_condition should point to a unique document? My assumption was, for now, we were still talking about the simpler case of updating a single document. If we extend the discussion to support set-based updates it's worth considering the common requirements for updating set

Re: Incremental Field Updates

2010-03-29 Thread mark harwood
r.updateDocument(term,doc) method for inserts. Cheers, Mark From: Grant Ingersoll To: java-dev@lucene.apache.org Sent: Mon, 29 March, 2010 13:11:56 Subject: Re: Incremental Field Updates On Mar 29, 2010, at 2:26 AM, Mark Harwood wrote:

Re: Proposed Lucene modification - FieldCollector

2005-03-09 Thread mark harwood
>>To get complete statistics like >>above, you currently have to iterate through the result >> set and pull each Document from the Hits. Not necessarily true. You can use TermVectors or an indexed field eg "doctype" to derive this stuff without stored fields. Here's an example of how I've done it

Re: Proposed Lucene modification - FieldCollector

2005-03-10 Thread mark harwood
>>To get complete statistics like >>above, you currently have to iterate through the result >> set and pull each Document from the Hits. Not necessarily true. You can use TermVectors or an indexed field eg "doctype" to derive this stuff without stored fields. Here's an example of how I've done it

Re: Initially creating index throws out of memory

2005-04-11 Thread mark harwood
By default Lucene does not have a setting that allows you to control memory usage directly in terms of bytes of RAM. It does offer IndexWriter.setMaxBufferedDocs which dictates how many documents are accumulated in RAM (which is obviously fast) before the RAM is flushed to disk. Setting this value
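A toy model may make the point concrete: the limit counts documents, not bytes, so RAM use still depends on how large each document is. `BufferedDocWriter` below is a hypothetical class written for this illustration, not Lucene's `IndexWriter`.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy writer that buffers documents in RAM and flushes after a fixed document count. */
final class BufferedDocWriter {
    private final int maxBufferedDocs;
    private final List<String> ramBuffer = new ArrayList<>();
    private int flushCount = 0;

    BufferedDocWriter(int maxBufferedDocs) {
        if (maxBufferedDocs < 1) {
            throw new IllegalArgumentException("maxBufferedDocs must be at least 1");
        }
        this.maxBufferedDocs = maxBufferedDocs;
    }

    /** Buffers one document and flushes once the document-count limit is reached. */
    void addDocument(String doc) {
        ramBuffer.add(doc);
        if (ramBuffer.size() >= maxBufferedDocs) {
            flush();
        }
    }

    /** Empties the RAM buffer; a real writer would write the buffered docs to disk here. */
    void flush() {
        if (ramBuffer.isEmpty()) {
            return;
        }
        ramBuffer.clear();
        flushCount++;
    }

    int flushCount() {
        return flushCount;
    }

    int bufferedDocs() {
        return ramBuffer.size();
    }
}
```

A larger limit means fewer flushes but more documents held in memory at once, which is the trade-off described above.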

Re: ParallelReader

2005-04-29 Thread mark harwood
An equivalent Parallelizer for IndexWriter would be a useful addition to keep the two indexes in synch. Hiding the details of which lucene index document data is retrieved from gives us some added flexibility in storage options but I've been thinking of a more general-purpose layer of abstraction

2nd call - [Vote] Wolfgang Hoschek for committer

2005-07-11 Thread mark harwood
Responses were light last time around: I'd like to propose Wolfgang Hoschek should be given commit rights to maintain his MemoryIndex contribution.

Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/

2005-11-15 Thread mark harwood
> That would use more memory, but still permit ranked > searches. Worth it? Not sure. I expect FuzzyQuery results would suffer if the edit distance could no longer be factored in. At least there's a quality threshold to limit the more tenuous matches but all matches below the threshold would be

Re: Performance issues with ConjunctionScorer

2005-11-22 Thread mark harwood
The Highlighter in the lucene "contrib" section has a class called TokenSources which tries to find the best way of getting a TokenStream. It can build a TokenStream from either: a) an Analyzer b) TermPositionVector (if the field was created with one in the index) You may find that using TermPosit

Re: How build something like smart tags

2005-11-25 Thread mark harwood
See IBM's UIMA project or "Gate" for Entity extraction tools. Cheers Mark ps this is a java-user question, not a java-dev topic. --- "Mario Alejandro M." <[EMAIL PROTECTED]> wrote: > I'm building a search engine couple with database > info... I wanna to detect > things like phone-numbers, adr

Re: open source YourKit licence

2005-11-30 Thread mark harwood
> a) do any other committers want a license, and I'd appreciate a license. > b) would we be willing to put their logo somewhere > in exchange? That seems a fair exchange provided that we 1) find the product useful and 2) that it doesn't contravene any apache directives about use of their infrast

"Advanced" query language

2005-12-02 Thread mark harwood
There seems to be a growing gap between Lucene functionality and the query language offered by QueryParser (eg no support for regex queries, span queries, "more like this", filter queries, minNumShouldMatch etc etc). Closing this gap is hard when: a) The availability of Javacc+Lucene skills is a b

Re: "Advanced" query language

2005-12-05 Thread mark harwood
> is a more clear syntax have: > > QUERY title,date,size, content WHERE (title LIKE > 'foo*' OR size>=0) Let's not forget that unlike most query languages which are boolean (things either match or they don't) Lucene has many facilities for influencing the degree to which matches occur. A lot of

Re: "Advanced" query language

2005-12-06 Thread mark harwood
> but we should also allow for the client to push the > analysis > responsibility to the server: Yet another variation we could support is to use the existing QueryParser server-side for handling user-typed input. On the client user input is unparsed and combined with the lower-level constraints

Re: "Advanced" query language

2005-12-15 Thread mark harwood
> While SAX is fast, I've found callback interfaces > more difficult to > deal with while generating nested object graphs... > it normally > requires one to maintain state in stack(s). I've gone to some trouble to avoid the effects of this on the programming model. Stack management is handled by t

Re: "Advanced" query language

2005-12-16 Thread mark harwood
I don't think DOM and RAM is necessarily an issue. The object construction process accesses the content in the same order that a SAX based path takes so that just seems an appropriate approach. There is no need to leap around the structure in any other way from what I can see, which is where DOM w

Re: "Advanced" query language

2005-12-20 Thread mark harwood
>However the moment you are promoting INTEROPERABILITY with other >search/retrieval systems by XMLizing the query input and the >result output, like Mark is, then it makes sense to adhere to >standards I think this is hijacking my original intentions to some extent. I may be accused of being shor

Re: "Advanced" query language

2005-12-22 Thread mark harwood
Hi Chris, Thanks for taking the time to review this. > 1) I aplaud the plugable nature of your solution. That's definitely a worthwhile objective. > 2) Digging into what was involved in writting an > ObjectBuilder, I found... > don't really feel like > the API has a very clean seperation from SAX

Re: "Advanced" query language

2005-12-22 Thread mark harwood
Sorry, slip of keyboard meant I posted last message mid-edit. Hi Chris, Thanks for taking the time to review this. > 1) I aplaud the plugable nature of your solution. I think that's definitely a worthwhile objective. > 2) Digging into what was involved in writting an > ObjectBuilder, I found...

Re: "Advanced" query language

2005-12-23 Thread mark harwood
I suspect it's a little too ambitious to provide a unifying common abstraction which wraps event based *and* "pull" parser approaches. I'm personally happier to stick with one approach, preferably with an existing, standardized interface which lets me switch implementations. I didn't really want

Re: Search agents

2006-01-04 Thread mark harwood
Yes, I've found MemoryIndex to be very fast for this kind of thing. This contribution can be used to further optimize and shortlist the queries to be run against the new document sat in MemoryIndex.

Re: "Advanced" query language

2006-01-04 Thread mark harwood
This example code looks interesting. If I understand correctly using this approach requires that builders like the "q" QueryObjectBuilder instance must be explicitly registered with each and every builder that consumes its type of output eg BQOB and FQOB. An alternative would be to register "q" jus

Preventing "killer" queries

2006-02-07 Thread mark harwood
I've just been doing some benchmarking on a reasonably large-scale system (38 million docs) and ran into an issue where certain *very* common terms would dramatically slow query responses. Some terms were abnormally common because I had constructed the index by taking several copies and merging th

Re: Preventing "killer" queries

2006-02-08 Thread mark harwood
Thanks for the comments, Chris/Doug. Chris, although I suggested it initially, I'm now a little uncomfortable in controlling this issue with a static variable in TermQuery because it doesn't let me have different settings for different queries, indexes or fields. Doug, I'd ideally like to optimize

Re: Re Indexing

2006-02-23 Thread mark harwood
The approach I am currently using is (pseudo code): select count(*) from docs where date_modified > lastIndexRunDate if ((countChangedOrNew/reader.numDocs) >50%) { //quicker to rebuild the whole index wipeIndex; Select * from docs for (each record)
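The rebuild-or-update decision in the pseudo code can be sketched as a small helper. The class and method names are hypothetical, and the 50% threshold is taken from the message. One detail worth noting if this is written in Java: dividing two integer counts truncates to zero, so the ratio has to be computed in floating point.

```java
/** Decides between a full index rebuild and an incremental update. */
final class ReindexPolicy {
    /**
     * @param changedOrNew number of records modified since the last indexing run
     * @param totalDocs    number of documents currently in the index
     * @return true when more than half the index would be touched,
     *         in which case rebuilding from scratch is assumed to be quicker
     */
    static boolean shouldRebuild(long changedOrNew, long totalDocs) {
        if (totalDocs <= 0) {
            return true; // empty index: nothing to update incrementally
        }
        // Cast first: changedOrNew / totalDocs on longs would truncate to 0.
        return (double) changedOrNew / totalDocs > 0.5;
    }
}
```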

XML Query Parser - next steps

2006-02-24 Thread mark harwood
Before I commit this stuff to contrib I wanted to sound out dev members on directions for this code. We currently have an extensible parser with composable "builder" modules. These builders currently only have a role in life which involves parsing particular XML chunks and instantiating the relate

Re: Developper Question - Highlighting

2006-03-30 Thread mark harwood
Please post "how do I?" questions to the Java-user group. The dev list is for people maintaining the core Lucene code. >>because lucene does not store the text contents >>in index It does if you want it to. See the Field.Store.YES property when adding new docs. The Highlighter class in the cont

Re: Lazy Field Loading

2006-03-31 Thread mark harwood
I'd prefer option 4. Users should expect to provide some form of guidance to the engine about how they are going to access the data if it is expected to be retrieved efficiently. Preferably this choice of field loading policy should NOT be "baked in" at index time because index access patterns ca

Re: Lazy Field Loading

2006-03-31 Thread mark harwood
> I don't think option 3 is baked in at indexing time. Sorry, I misread it. Yes, that is another option. So if options 3 and 4 are about search-time selection (based on size and fieldname respectively) can they be generalized into a more wide-reaching retrieval API? You can imagine a high-level

Query.extractTerms - a poor introspection API?

2006-04-06 Thread mark harwood
Having switched the highlighter over from lots of Query-specific code to using the generic Query.extractTerms API I realize I have both gained something (support for all query types) and lost something (detailed boost info for each term in the tree eg Fuzzy spelling variants). The boost info was us

Re: Query.extractTerms - a poor introspection API?

2006-04-06 Thread mark harwood
> It's still the case that you often need to know what > type of query the > parent is. For highlighting purposes I typically don't need/want to concern myself too much with precisely interpreting the specifics of all Query logic: * For Boolean queries the "mustNot" terms typically don't appear in

Re: SentenceHighlighter

2006-04-19 Thread mark harwood
If you are wanting to select highlights from a document where only whole sentences are the fragments selected you will need to implement a custom Fragmenter class. This will need to look for sentence boundaries eg a "." followed by whitespace only, then a word with an uppercase first character. I
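The boundary rule described here (a "." followed only by whitespace, then a word starting with an uppercase character) can be tried out in isolation with a regular expression. This is a minimal sketch of the rule only; `SentenceSplitter` is a hypothetical name and the class does not implement the highlighter's Fragmenter interface.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

/** Splits text at a "." followed by whitespace and then an uppercase letter. */
final class SentenceSplitter {
    // Lookbehind keeps the "." with the preceding sentence;
    // lookahead keeps the uppercase letter with the following one.
    private static final Pattern BOUNDARY =
            Pattern.compile("(?<=\\.)\\s+(?=\\p{Lu})");

    static List<String> split(String text) {
        return Arrays.asList(BOUNDARY.split(text.trim()));
    }
}
```

The rule is deliberately simple: an abbreviation such as "e.g." followed by a lowercase word is not treated as a boundary, but one followed by a capitalised name would be.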

Re: trivial util to Visualize BitSets (Query results actually)

2006-05-31 Thread mark harwood
I added something similar to Luke but without the colour intensity - I may add your code in to do this. Another Luke plugin I have visualizes "vocabulary growth" for a field as a chart over time. This is useful to see if a field is "matured" or is still accumulating new terms. A Zipf term distribut

RE: Luke - in need of maintainer

2006-06-01 Thread mark harwood
I can pick this up, but I don't think I've got much more bandwidth than Andrzej to work on it. I certainly don't have the time now for a port to an Apache-friendly GUI framework but ultimately I think Luke should end up under the "contrib" section where it can be managed and benefit from the atten

Re: Edit-distance strategy

2006-06-08 Thread mark harwood
FWIW, I integrated sourceforge's "SecondString" algos (http://secondstring.sourceforge.net/javadoc) and others using a callout interface which boiled down to: float getDifference(String a, String b) This seemed to be the cleanest lowest-common-denominator standard for plugging in string co
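A callout interface of that shape might look like the sketch below. Only the method signature comes from the message; the interface name, the implementing class and the choice of plain Levenshtein distance as the measure are assumptions made for illustration.

```java
/** Pluggable string comparison: 0 means identical, larger means more different. */
interface StringDifference {
    float getDifference(String a, String b);
}

/** Example plug-in: the number of single-character edits separating the two strings. */
final class EditDistanceDifference implements StringDifference {
    @Override
    public float getDifference(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // turning "" into the first j chars of b takes j inserts
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i; // turning the first i chars of a into "" takes i deletes
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // reuse the two rows instead of a full matrix
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }
}
```

Any other measure can be dropped in behind the same interface, which is the appeal of keeping the contract this small.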

RangeQuery - rewrite to a RangeFilter in a ConstantScoreQuery?

2006-09-25 Thread mark harwood
Given the trouble people routinely get themselves into using RangeQuery would it make sense to change the "rewrite" method to generate a ConstantScoreQuery wrapping a RangeFilter? The only disadvantages I can see would be: 1) Scoring would change - some users may find their apps produce differe

Re: "xml" query parser, except with JSON

2007-01-30 Thread mark harwood
Hi Erik, I've not done much with JSON but it looks like it might be an interesting approach. >>Is it possible to introduce a new serialization format without rewriting the >>whole parser? Unfortunately, the XML DOM permeates its way throughout the existing parser design. Some initial observ

Re: MatchAllDocs in BooleanQuery.rewrite

2007-02-02 Thread mark harwood
>>are there any legitimate usecases for calling rewrite other then when a >>Searcher is about to execute the query? When using the highlighter it is recommended to use a rewritten query e.g. to get all the variations for a fuzzy query. However I don't think there should be a problem with the aut

Exposing a public Filter getFilter() method in ConstantScoreQuery

2007-02-13 Thread mark harwood
Any objections to me adding this read-only method to ConstantScoreQuery? I need to discover RangeFilters etc wrapped in ConstantScoreQuerys as part of a generic query optimiser/analyser. Cheers, Mark

Exposing RangeFilter.getFieldName() etc

2007-02-14 Thread mark harwood
Yonik, thanks for the ConstantScoreQuery.getFilter() addition yesterday. Following the same principle of "enabling query inspection", any objections to exposing read-only access to the criteria for a RangeFilter? I'm happy to make the change but possibly unable to access SVN in time if a 2.1 r

Re: Lius into apache incubator

2007-02-28 Thread mark harwood
Hi Rida, I've been talking with Jukka Zitting (involved in Nutch) about parsing/Tika and we started to sketch out some project objectives on the Wiki over there which may be of interest: http://code.google.com/p/tika/w/list I recently did a round-up of the main open source projects which mainta

Re: LIA2 on l.a.o/java OK?

2009-02-20 Thread mark harwood
I'm OK with LIA2 on the front page - as Erik suggests it does help lend credibility to a project. I encounter organisations who are nervous about buying into an open-source solution and having books up there on the home page immediately helps establish the following: 1) The APIs are stable en

Re: Welcome Uwe Schindler as Lucene committer!

2009-05-18 Thread mark harwood
Welcome, Uwe. Great work on the Trie piece - now if you could just settle the "Tree" vs "Try" pronunciation dilemma . :) - Original Message From: Mark Miller To: java-dev@lucene.apache.org Sent: Monday, 18 May, 2009 17:46:51 Subject: Re: Welcome Uwe Schindler as Lucene committ

Re: Lucene's default settings & back compatibility

2009-05-19 Thread mark harwood
>When you create IndexReader, IndexWriter and others, you must pass in a >Settings > instance. I think this would also help solve the steady growth of constructor variations (18 in 2.4's IndexWriter vs 3 in Lucene 1.9). - Original Message From: Otis Gospodnetic To: java-dev@luce

Re: WebLuke - include Jetty in Lucene binary distribution?

2009-06-08 Thread mark harwood
Hi John/Grant. I haven't done any more in developing WebLuke - although still use it regularly. As Grant suggests there was an unease (mine) about bloating the Lucene distribution size with GWT dependencies so it wasn't rolled into contrib. However I guess I'm comfortable if no one else is conce

Re: [jira] Commented: (LUCENE-1685) Make the Highlighter use SpanScorer by default

2009-06-11 Thread Mark Harwood
+1 On 11 Jun 2009, at 21:32, Michael McCandless (JIRA) wrote: [ https://issues.apache.org/jira/browse/LUCENE-1685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12718629#action_12718629 ] Michael McCandless commented on LUCENE-1685: -

Re: Improving TimeLimitedCollector

2009-06-24 Thread Mark Harwood
I think the Collector approach makes the most sense to me, since it's the only object I fully control in the search process. I cannot control Query implementations, and I cannot control the decisions made by IndexSearcher. But I can always wrap someone else's Collector with TLC and pass it

Re: Improving TimeLimitedCollector

2009-06-26 Thread Mark Harwood
Going back to my post re TimeLimitedIndexReaders - here's an incomplete but functional prototype: http://www.inperspective.com/lucene/TimeLimitedIndexReader.java http://www.inperspective.com/lucene/TestTimeLimitedIndexReader.java The principle is that all reader accesses check a volatile vari
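The principle of checking a volatile variable on every access can be shown in a few lines. The linked prototype is not reproduced here, so everything below (class names, the exception type, the `int[]` postings) is a hypothetical reduction of the idea rather than the actual code.

```java
/** Shared flag that a monitoring thread flips once the time budget is spent. */
final class TimeoutGuard {
    private volatile boolean timedOut = false;

    /** Called by the monitoring thread. */
    void expire() {
        timedOut = true;
    }

    /** Called at the start of every guarded read; a volatile read is cheap. */
    void check() {
        if (timedOut) {
            throw new IllegalStateException("search time limit exceeded");
        }
    }
}

/** Toy postings list whose every access consults the guard first. */
final class GuardedPostings {
    private final int[] docIds;
    private final TimeoutGuard guard;
    private int pos = 0;

    GuardedPostings(int[] docIds, TimeoutGuard guard) {
        this.docIds = docIds;
        this.guard = guard;
    }

    /** Returns the next doc id, or -1 when the list is exhausted. */
    int next() {
        guard.check();
        return pos < docIds.length ? docIds[pos++] : -1;
    }
}
```

Because the check sits in the reader rather than in a Collector, a runaway query can be aborted even while it is still enumerating terms and has not yet collected a single hit.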

Re: Improving TimeLimitedCollector

2009-06-27 Thread Mark Harwood
Aside, how about using a PQ for the threads' times, or a TreeMap. That will save looping over the collection to find the next candidate. Just an implementation detail though. Shai On Sat, Jun 27, 2009 at 3:31 AM, Mark Harwood wrote: Going back to my post re TimeLimitedIndexR
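The priority-queue suggestion could look roughly like this. It is a sketch under assumptions: activities are identified by a plain string, deadlines are absolute millisecond values, and the class is not thread-safe as written, so callers would need their own locking.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps timed activities ordered by deadline so the next one to expire is always at the head. */
final class DeadlineQueue {
    private static final class Entry {
        final String activity;
        final long deadlineMillis;

        Entry(String activity, long deadlineMillis) {
            this.activity = activity;
            this.deadlineMillis = deadlineMillis;
        }
    }

    private final PriorityQueue<Entry> queue =
            new PriorityQueue<>(Comparator.comparingLong((Entry e) -> e.deadlineMillis));

    void add(String activity, long deadlineMillis) {
        queue.add(new Entry(activity, deadlineMillis));
    }

    /** Name of the activity with the earliest deadline, or null if none is registered. */
    String nextToExpire() {
        Entry head = queue.peek();
        return head == null ? null : head.activity;
    }

    /** Removes and returns every activity whose deadline is at or before 'nowMillis'. */
    List<String> expire(long nowMillis) {
        List<String> expired = new ArrayList<>();
        while (!queue.isEmpty() && queue.peek().deadlineMillis <= nowMillis) {
            expired.add(queue.poll().activity);
        }
        return expired;
    }
}
```

Finding the next candidate is then a constant-time peek instead of a loop over all registered threads; inserts and removals cost O(log n).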

Re: Improving TimeLimitedCollector

2009-06-27 Thread Mark Harwood
Odd. I see you're responding to a message from Shai I didn't get. Some mail being dropped somewhere along the line.. Why don't you use Thread.interrupt(), .isInterrupted() ? Not sure where exactly you mean for that? I'm not sure I understand that - how can a thread run >1 activity simult

Re: FuzzyLikeThis query and exact matches

2009-08-27 Thread Mark Harwood
Despite making IDF a constant the edit distance should remain a factor in the rankings so I would have thought this would give you what you need. Can you supply a more detailed example? Either print the rewritten query or use the explain function Cheers Mark On 27 Aug 2009, at 13:22, Ber

Re: FuzzyLikeThis query and exact matches

2009-08-27 Thread Mark Harwood
I think those boosts shown are reflecting the edit distance. What we can't see from this is that the Similarity class used in execution is using the same IDF for all terms. The other factors at play will be the term frequency in the doc, its length and any doc boost. I don't have access to the c

Re: [jira] Commented: (LUCENE-1781) Large distances in Spatial go beyond Prime MEridian

2009-09-11 Thread mark harwood
>> It seems like something higher up must accept two rects and OR them together >> during the searching? That's the way I've done it before. It's like the old "Asteroids" arcade game where as the ship drifts off-screen stage right it is simultaneously emerging back from stage-left. - Or
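The "two rects ORed together" idea can be sketched for the longitude axis alone. The assumptions here are mine: longitudes lie in [-180, 180], and a box whose west edge is numerically greater than its east edge is taken to wrap around the ±180 line. `LonRange` is a hypothetical helper, not part of the spatial contrib.

```java
import java.util.ArrayList;
import java.util.List;

/** A closed longitude interval that does not wrap. */
final class LonRange {
    final double min;
    final double max;

    LonRange(double min, double max) {
        this.min = min;
        this.max = max;
    }

    /** Splits a possibly wrapping west..east span into one or two plain ranges. */
    static List<LonRange> split(double west, double east) {
        List<LonRange> out = new ArrayList<>();
        if (west <= east) {
            out.add(new LonRange(west, east));
        } else {
            // Like the ship leaving stage right and re-entering stage left.
            out.add(new LonRange(west, 180.0));
            out.add(new LonRange(-180.0, east));
        }
        return out;
    }

    /** True if 'lon' falls in either half of the span: the OR of the two ranges. */
    static boolean matches(double lon, double west, double east) {
        for (LonRange r : split(west, east)) {
            if (lon >= r.min && lon <= r.max) {
                return true;
            }
        }
        return false;
    }
}
```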

Re: WebLuke - include Jetty in Lucene binary distribution?

2008-04-25 Thread mark harwood
>>Why don't use ivy or maven for that? That would resurrect the Ant vs Maven debate around build systems. Not having used Maven I don't feel qualified to comment. Stefan, the Winstone server appears to be LGPL not Apache which also adds some complexity. The GWT compiler is the main cause of the

Re: Moving SweetSpotSimilarity out of contrib

2008-09-03 Thread mark harwood
Not tried SweetSpot so can't comment on worthiness of moving to core but agree with the principle that we can't let the hassles of a company's "due diligence" testing dictate the shape of core vs contrib. For anyone concerned with the overhead of doing these checks a company/product of potentia

Re: Can I filter the results returned by IndexReader.terms(term)?

2008-09-03 Thread mark harwood
One way is to read TermDocs for each candidate term and see if they are in your filter - but that sounds like a lot of disk IO to me when responding to individual user keystrokes. You can use "skip" to avoid reading all term docs when you know what is in the filter but it all seems a bit costly.

Re: Realtime Search for Social Networks Collaboration

2008-09-07 Thread mark harwood
Interesting discussion. >>I think we should seriously look at joining efforts with open-source Database >>engine projects I posted some initial dabblings here with a couple of the databases on your list: http://markmail.org/message/3bu5klzzc5i6uhl7 but this is not really a scalable solution

Re: Extending query parser with MinShouldMatch syntax

2008-09-13 Thread Mark Harwood
You might want to try the XML query parser in contrib. I deliberately created this to allow remote clients to have full control over lucene (filters, caching etc) without trying to bloat the standard query parser with special characters. On 13 Sep 2008, at 18:26, "Shai Erera" <[EMAIL PROTECTE

Re: RMI, Searchable and RemoteSearchable

2008-09-26 Thread mark harwood
>>since not many people, I think, even use the RMI stuff I certainly binned RMI in my distributed work. It just would not reliably stop/restart cleanly in my experience - despite following all the RMI guidelines for clean shutdowns. I'd happily see all RMI dependencies banished from core. Che

Re: [VOTE] Release Lucene 2.4.0

2008-10-03 Thread mark harwood
Hi Mike, Given the repackaging any chance you can sneak in 2 contrib fixes I added recently? Null pointer introduced to clients dropping in 2.4 upgrade - http://svn.apache.org/viewvc?view=rev&revision=700815 Bug in fuzzy matching - http://svn.apache.org/viewvc?

Re: [VOTE] Release Lucene 2.4.0

2008-10-07 Thread mark harwood
> >>> Let's start a new VOTE to release these artifacts (derived from >>> svn rev 701445) as Lucene 2.4.0: >>> >>> http://people.apache.org/~mikemccand/staging-area/lucene2.4take3 >>> >>> Here's my vote: +1. >>> >>> Mike >

Re: Adding dependency to servlet-api

2008-11-05 Thread mark harwood
Just checked Solr (forgot about that obvious precedent!) and they have it in trunk/lib and an entry in trunk/notice.txt which reads: " Includes software from other Apache Software Foundation projects, including, but not limited to: - Apache Tomcat (lib/servlet-api-2.4.jar)

Re: Adding dependency to servlet-api

2008-11-05 Thread mark harwood
ed it often, as it would be great to provide the breadth of query types that your parser can create. Erik On Nov 5, 2008, at 4:16 AM, mark harwood wrote: > Just checked Solr (forgot about that obvious precedent!) and they have it in > trunk/lib and an entry in trunk/notice.txt

Re: [jira] Created: (LUCENE-1486) Wildcards, ORs etc inside Phrase queries

2008-12-11 Thread mark harwood
I'm not sure I see an easy translation of "copyright !mycompany" into SpanQueries which is how all the other queries are being converted. SpanNotQuery isn't applicable here because that only tests spans don't overlap. Yonik's approach looks good. - Original Message From: Yonik Seeley

Highlighting - catering for all query types

2009-10-19 Thread mark harwood
I've been putting together some code to support highlighting of opaque query clauses (cached filters, trie range, spatial etc etc) which shows some promise. This is not intended as a replacement for the existing highlighter(s) which deal with free-text but is instead concentrating on the hard-to

Re: Is this correct: term.field() == fieldName ?

2007-03-21 Thread mark harwood
>>Is it correct to compare using '==' or equals should be used instead? In this context it is OK. Term fieldnames are deliberately interned using String.intern() so this equality test can be used. The intention is to make comparisons faster. Cheers, Mark - Original Message From: dmitr
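Why interning makes `==` safe can be checked with plain Java strings, independent of Lucene. `InternDemo` is a hypothetical class written for this illustration.

```java
/** Shows when reference comparison of strings is reliable. */
final class InternDemo {
    /** Reference comparison: true only if both variables point at the same object. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    /** After interning, equal contents always share one canonical object. */
    static boolean sameInterned(String a, String b) {
        return a.intern() == b.intern();
    }
}
```

A string built at runtime, such as `new String("title")`, is a distinct object from the literal `"title"`, so `==` fails until both sides are interned. That is why the shortcut is only valid where every field name is known to have passed through `String.intern()`; elsewhere `equals` remains the safe choice.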

Re: How to handle servlet-api.jar in build?

2007-06-12 Thread mark harwood
Thanks for the pointers Paul. >>I just don't think you can 'package' up a distribution that includes these >>jars in your distribution. Clearly the binary distribution need not bundle servlet-api.jar - a demo.war file is all that is needed. However, is the source distribution exempt from this r

Re: The JDK 1.5 Can o' Worms

2007-07-25 Thread mark harwood
>>Mostly, though, I think it gives Lucene Java the feel that we are behind. >>Isn't 1.6 the actual official release at this point? I wouldn't say "behind", just concerned about enabling Lucene for all - in the same way popular websites might choose broad accessibility over using the latest AJ

Re: Fwd: Decouple Filter from BitSet: API change and xml query parser

2007-08-10 Thread mark harwood
ava-dev@lucene.apache.org Sent: Friday, 10 August, 2007 5:31:02 PM Subject: Re: Fwd: Decouple Filter from BitSet: API change and xml query parser On Friday 10 August 2007 13:12, mark harwood wrote: > >>Could someone give me a clue as to why the test case TestRemoteCachingWrapperFilter fails

Re: Fwd: Decouple Filter from BitSet: API change and xml query parser

2007-08-10 Thread mark harwood
ader); {return null;} abstract public Matcher getMatcher(IndexReader); } Finally, are DocIdSet and DocIdSetIterator currently part of Lucene? I don't know how to go about these. Regards, Paul Elschot -- Forwarded Message -- Subject: [jira] Commented: (LUCE

Web-based Luke

2007-11-12 Thread mark harwood
I'm putting together a Google Web Toolkit-based version of Luke: http://www.inperspective.com/lucene/Luke.war ( Just add your version of lucene core jar to WEB-INF/lib subdirectory and you should have the basis of a web-enabled Luke.) The intention behind this is to port Luke to a wholly Apach

Re: Web-based Luke

2007-11-14 Thread mark harwood
>>This is neat, Mark! Thanks - GWT rocks. >>Then it became clear to me that it's actually the _remote_ filesystem one is looking at (the server's). Yes, that's a potentially worrying security issue that needs locking down carefully. I think one mode of operation should be that Luke Server is st

Re: WebLuke - include Jetty in Lucene binary distribution?

2007-12-10 Thread mark harwood
>>I don't know that we have ever checked in IDE settings GWT development is much easier with the IDE and there is a fair amount of manual setup required without the settings to run the "hosted" development environment. Hosted development is the key productivity benefit and allows debugging in J

Re: JBoss Cache as a store

2008-01-29 Thread mark harwood
Hi Manik, >>> Is there a set of tests in the Lucene sources I could use to test the "JBCDirectory", as I call it? You would probably need to adapt existing Junit tests in contrib/benchmark and src/test for performance and functionality testing, respectively. They use the

Out of memory - CachingWrappperFilter and multiple threads

2008-02-18 Thread mark harwood
I'm chasing down a bug in my application where multiple threads were reading and caching the same filter (same very common term, big index) and caused an Out of Memory exception when I would expect there to be plenty of memory to spare. There's a number of layers to this app to investigate (I was us

Re: Out of memory - CachingWrappperFilter and multiple threads

2008-02-18 Thread mark harwood
or, and then synchronize on the cache while even while calling filter.bits(reader). This is safe when the cache is private. Regards, Paul Elschot Op Monday 18 February 2008 13:50:16 schreef mark harwood: > I'm chasing down a bug in my application where multiple threads were >
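Holding the cache lock while the value is computed might be sketched as follows. The class is hypothetical and generic rather than Lucene's CachingWrapperFilter; the `loader` function stands in for the expensive `filter.bits(reader)` call.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Cache that computes each entry at most once, even under concurrent requests. */
final class SynchronizedBitsCache<K> {
    // Private map: no outside code can take this lock, so holding it during
    // the computation cannot deadlock with callers.
    private final Map<K, BitSet> cache = new HashMap<>();
    private int computations = 0;

    BitSet get(K key, Function<K, BitSet> loader) {
        synchronized (cache) {
            BitSet bits = cache.get(key);
            if (bits == null) {
                // Only one thread at a time reaches this point, so two threads
                // cannot both build the same large BitSet.
                bits = loader.apply(key);
                computations++;
                cache.put(key, bits);
            }
            return bits;
        }
    }

    /** Number of times the loader actually ran. */
    int computations() {
        synchronized (cache) {
            return computations;
        }
    }
}
```

The cost of this design is that all loads are serialised, including those for different keys, in exchange for never holding duplicate copies of the same bit set in memory.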

Re: Out of memory - CachingWrappperFilter and multiple threads

2008-02-18 Thread mark harwood
<[EMAIL PROTECTED]> To: java-dev@lucene.apache.org Sent: Monday, 18 February, 2008 4:38:10 PM Subject: Re: Out of memory - CachingWrappperFilter and multiple threads Op Monday 18 February 2008 17:08:50 schreef mark harwood: > >>Even though you only have two threads... how many di

[jira] Updated: (LUCENE-725) NovelAnalyzer - wraps your choice of Lucene Analyzer and filters out all "boilerplate" text

2010-02-08 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-725: Attachment: NovelAnalyzer.java Updated for new 3.0 APIs > NovelAnalyzer - wraps your choice

[jira] Updated: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-10 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-1720: - Attachment: ActivityTimeMonitor.java TestTimeLimitedIndexReader.java

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-11 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832444#action_12832444 ] Mark Harwood commented on LUCENE-1720: -- Thanks for the updates, Shai. Agree

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-11 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832470#action_12832470 ] Mark Harwood commented on LUCENE-1720: -- The change to ATM isn't that big

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-11 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832483#action_12832483 ] Mark Harwood commented on LUCENE-1720: -- Agreed, might be useful to provide boo

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-11 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832500#action_12832500 ] Mark Harwood commented on LUCENE-1720: -- I'll pick this up > TimeLimite

[jira] Updated: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-11 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-1720: - Attachment: Lucene-1720.patch Moved ATM to o.a.l.util package Added isProjectedToTimeout method

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-11 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832721#action_12832721 ] Mark Harwood commented on LUCENE-1720: -- bq. When's this ready to test with

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-12 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12832987#action_12832987 ] Mark Harwood commented on LUCENE-1720: -- bq. I also want to a

[jira] Updated: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-12 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Harwood updated LUCENE-1720: - Attachment: Lucene-1720.patch Updated patch with TestTimeLimitingIndexReader and changes to

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-12 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833013#action_12833013 ] Mark Harwood commented on LUCENE-1720: -- bq. I think we should add some se

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833822#action_12833822 ] Mark Harwood commented on LUCENE-329: - The problem with ignoring IDF completel

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-15 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833833#action_12833833 ] Mark Harwood commented on LUCENE-1720: -- bq. Anyway, I'm putting that asid

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833840#action_12833840 ] Mark Harwood commented on LUCENE-329: - My "best-practice" suggestion i

[jira] Commented: (LUCENE-1720) TimeLimitedIndexReader and associated utility class

2010-02-15 Thread Mark Harwood (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-1720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12833863#action_12833863 ] Mark Harwood commented on LUCENE-1720: -- bq. BTW found and fixed a bu
