from:"Jason Rutherglen"

Re: [jira] [Commented] (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2011-03-31 Thread Jason Rutherglen

Dr On Mar 31, 2011 9:44 AM, Simon Willnauer (JIRA) j...@apache.org wrote: [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013974#comment-13013974 ] Simon Willnauer commented on LUCENE-2573:

Setting the max number of merge threads across IndexWriters

2011-04-14 Thread Jason Rutherglen

Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple IndexWriters running in the same process, this disallows one from managing the aggregate number of merge threads executing at any given time. I
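One way to picture the aggregate limit described in this message is a single permit pool shared by every merge scheduler in the process. The sketch below is only an illustration under that assumption: `GlobalMergeThrottle`, `runMerge`, and `availableSlots` are hypothetical names, not Lucene API, and the thread does not say the final design looked like this.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical helper (not part of Lucene): one instance is shared by all merge schedulers in a process. */
final class GlobalMergeThrottle {
    private final Semaphore permits;

    GlobalMergeThrottle(int maxMergeThreads) {
        // One permit per merge that may run concurrently, process-wide.
        this.permits = new Semaphore(maxMergeThreads);
    }

    /** Blocks until a process-wide merge slot is free, runs the merge, then frees the slot. */
    void runMerge(Runnable merge) {
        permits.acquireUninterruptibly();
        try {
            merge.run();
        } finally {
            permits.release();
        }
    }

    /** Number of merge slots that are currently unused. */
    int availableSlots() {
        return permits.availablePermits();
    }
}
```

Each IndexWriter's scheduler would call `runMerge` on the same shared instance, so the cap applies to the total number of merges rather than to each writer separately.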

Re: Setting the max number of merge threads across IndexWriters

2011-04-14 Thread Jason Rutherglen

Willnauer simon.willna...@googlemail.com wrote: On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Today the ConcurrentMergeScheduler allows setting the max thread count and is bound to a single IndexWriter. However in the [common] case of multiple

Re: Setting the max number of merge threads across IndexWriters

2011-04-15 Thread Jason Rutherglen

for this? simon On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think the proposal involved using a ThreadPoolExecutor, which seemed to not quite work as well as what we have. I think it'll be easier to simply pass a global context that keeps

Re: modularization discussion

2011-05-05 Thread Jason Rutherglen

+1 to Mike's proposal here. Each of these could easily be patches/issues. The top ones would probably be the basics, eg, faceting and schemas. As the easiest short term solution for allowing other systems to use Solr or its features, it would be great if a 'committer' responded to SOLR-1431.

Field should accept BytesRef?

2011-05-15 Thread Jason Rutherglen

In the Field object a text value must be of type string, however I think we can allow a BytesRef to be passed in? - To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org For additional commands, e-mail:

Re: Field should accept BytesRef?

2011-05-16 Thread Jason Rutherglen

: Sunday, May 15, 2011 6:22 PM To: dev@lucene.apache.org Subject: Re: Field should accept BytesRef? On Sun, May 15, 2011 at 12:05 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: In the Field object a text value must be of type string, however I think we can allow a BytesRef

Re: FST and FieldCache?

2011-05-19 Thread Jason Rutherglen

maybe that's because we have one huge monolithic implementation Doesn't the DocValues branch solve this? Also, instead of trying to implement clever ways of compressing strings in the field cache, which probably won't bear fruit, I'd prefer to look at [eventually] MMap'ing (using DV) the field

Re: FST and FieldCache?

2011-05-19 Thread Jason Rutherglen

This is more about compressing strings in TermsIndex, I think. Ah, because they're sorted. I think if the string lookup cost degrades then it's not worth it? That's something that needs to be tested in the MMap case as well, eg, are ByteBuffers somehow slowing down everything by a factor of

Re: FST and FieldCache?

2011-05-19 Thread Jason Rutherglen

slowly. If the user wishes to improve performance it's easy enough to add more hardware. On Thu, May 19, 2011 at 6:40 AM, Michael McCandless luc...@mikemccandless.com wrote: On Thu, May 19, 2011 at 9:22 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: maybe thats because we have one

Re: FST and FieldCache?

2011-05-19 Thread Jason Rutherglen

wrote: On Thu, May 19, 2011 at 10:09 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: When you mmap them you let the OS decide when to swap stuff out which means you pick up potentially high query latency waiting for these pages to swap back in Right, however if one is using let's say SSDs

Per-segment faceting?

2011-05-30 Thread Jason Rutherglen

What state is this in? Cheers.

Re: [jira] [Commented] (SOLR-2193) Re-architect Update Handler

2011-05-31 Thread Jason Rutherglen

of Jason's concerns here (to the dev list)... we should stick to technical feedback on the issue: On Mon, May 30, 2011 at 11:54 PM, Jason Rutherglen (JIRA) j...@apache.org wrote: It's been clear for quite a while that you folks at Lucid are trying to protect your golden goose, eg, Solr from

Re: apache lucene , katta, hadoop synchronization issue

2011-06-02 Thread Jason Rutherglen

how can we synchronize our Katta nodes indexes sync with our database updates and deletes operations Katta doesn't provide NRT or incremental indexing because it's a write once architecture. Though you can verify on the Katta mailing list. On Thu, Jun 2, 2011 at 4:53 AM, Ghulam Mustafa

Storing and loading the FST directly from disk

2011-06-02 Thread Jason Rutherglen

Is it possible to iterate over the FST while it's still on disk? If not is that type of functionality planned?

Re: Storing and loading the FST directly from disk

2011-06-02 Thread Jason Rutherglen

seek instead). On Thu, Jun 2, 2011 at 10:54 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Is it possible to iterate over the FST while it's still on disk? If not is that type of functionality planned

Re: Storing and loading the FST directly from disk

2011-06-02 Thread Jason Rutherglen

. Loading the keys into heap with the values still on disk/system IO cache probably is non-optimal. On Thu, Jun 2, 2011 at 8:45 PM, Robert Muir rcm...@gmail.com wrote: On Thu, Jun 2, 2011 at 11:39 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: MMAP changes nothing: as its not a sequential

Re: Storing and loading the FST directly from disk

2011-06-02 Thread Jason Rutherglen

if you want to use it for this purpose, you don't need an FST, you can just use an NFA/DFA of prefixes instead, as you only need to answer accept/reject. Right, however if we already have the FST, then if it supports accept/reject efficiently, we'd simply reuse it (thereby removing the bloom

Re: Storing and loading the FST directly from disk

2011-06-02 Thread Jason Rutherglen

PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: because of how the datastructure is laid out. if you use mmap, instead of lots of seeks, it will simply be lots of page faults instead. Ok, right, however assuming the pages are in RAM it could/should be ok. Eg, in HBase we'd want

Re: Storing and loading the FST directly from disk

2011-06-03 Thread Jason Rutherglen

IMO the byte[] representation is so compact that it doesn't really matter if you use FS Cache memory or JVM memory so I'd rather go for jvm memory here too. In principle I agree, however in practice HBase users will likely have a lot of keys per region and server. If we stored every single

Re: Storing and loading the FST directly from disk

2011-06-03 Thread Jason Rutherglen

(hint: try to provide a representation that will share as many suffixes and prefixes as possible since these conflate into a single path, no matter how many sequences you have) It'll just be user-created keys, which will be sorted at least, and probably will be highly likely to share large

Re: Storing and loading the FST directly from disk

2011-06-03 Thread Jason Rutherglen

here you should rather store pointers to another file and mmap that file. Keep your FST as lean and compact as possible and make sure it's in memory. The compression should do a good job for you here! Wow you're right, the FST size in RAM with 50 mil date 1 ms incremented keys is less than 1K.

Distributed search capability

2011-06-09 Thread Jason Rutherglen

Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks!

Re: Distributed search capability

2011-06-09 Thread Jason Rutherglen

queries / filters didn't work with it. simon On Thu, Jun 9, 2011 at 7:29 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Hi, I am wondering what happened to the distributed search capability of Lucene? Thanks

Re: Distributed search capability

2011-06-09 Thread Jason Rutherglen

, what would be different? On Thu, Jun 9, 2011 at 4:07 PM, Andrzej Bialecki a...@getopt.org wrote: On 6/10/11 12:10 AM, Jason Rutherglen wrote: Right, if that's not around, one needs to use multi searcher, that's gone too? Yes, and rightfully so - it didn't handle properly some query types, so

Re: Distributed search capability

2011-06-10 Thread Jason Rutherglen

it easy to do grouping across shards. Mike McCandless http://blog.mikemccandless.com On Fri, Jun 10, 2011 at 12:25 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: its fine to have some higher-level class to support this crap, but it shouldnt be some transparent searcher. I'll create

Re: Distributed search capability

2011-06-10 Thread Jason Rutherglen

Out of curiosity, how is DF handled with the new automaton [regex] queries? On Fri, Jun 10, 2011 at 10:48 AM, Andrzej Bialecki a...@getopt.org wrote: On 6/10/11 6:27 PM, Michael McCandless wrote: I'm actually working on something like this, basically a utility method to merge N TopDocs into

Lucene Facet path

2011-06-13 Thread Jason Rutherglen

Are we going the direction of creating full facet features outside of Solr? Eg, we have UIF extrapolated out, we can probably make a module for bit set intersections as well. In the process the faceting will go per-segment.

Re: Lucene Facet path

2011-06-13 Thread Jason Rutherglen

...@googlemail.com wrote: I believe people are already looking into that but I am not sure. sounds reasonable to me but I think its going to be lots of work simon On Mon, Jun 13, 2011 at 11:34 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Are we going the direction of creating full

Re: Lucene Facet path

2011-06-13 Thread Jason Rutherglen

for field facet that work per-segment, but I think in the end we would want all facet types and methods to work on a per-segment basis. Martijn On 13 June 2011 23:47, Jason Rutherglen jason.rutherg...@gmail.com wrote: I think it's a better approach than rewriting Solr's internals. Eg, small

New facet module

2011-07-08 Thread Jason Rutherglen

Is it faceting per-segment?

Re: New facet module

2011-07-08 Thread Jason Rutherglen

to improve indexing or search perf (hopefully both). Shai On Saturday, July 9, 2011, Jason Rutherglen jason.rutherg...@gmail.com wrote: Is it faceting per-segment?

Re: New facet module

2011-07-11 Thread Jason Rutherglen

the bitset and FieldCache ones. Shai On Saturday, July 9, 2011, Jason Rutherglen jason.rutherg...@gmail.com wrote: The taxonomy is global to the index, but I think it will be interesting to explore per-segment taxonomy, and how it can be used to improve indexing or search perf (hopefully both

Lucene 4.x release

2011-08-15 Thread Jason Rutherglen

We should release Lucene 4.x soon. What else is hyper critical for the initial release?

Re: Lucene 4.x release

2011-08-16 Thread Jason Rutherglen

I didn't know the bulk API was so important. Which bulk API (eg the postings one or the terms dict)? On Mon, Aug 15, 2011 at 11:17 PM, Robert Muir rcm...@gmail.com wrote: On Mon, Aug 15, 2011 at 10:49 PM, Mark Miller markrmil...@gmail.com wrote: Just throwing this out there, but: I think it

Re: Lucene 4.x release

2011-08-16 Thread Jason Rutherglen

-deployment / all kinds of new stuff for people to digest and break. On Tue, Aug 16, 2011 at 8:54 PM, Robert Muir rcm...@gmail.com wrote: the postings api. On Tue, Aug 16, 2011 at 8:24 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I didn't know the bulk API was so important. Which bulk API

Separate issue for appendable field caches / doc values

2011-08-22 Thread Jason Rutherglen

LUCENE-2312 needs appendable field caches. I can include this functionality into LUCENE-2312, or separate it out into a separate issue / patch. However it would only be useful for RT / LUCENE-2312. Also, I'm not sure how this functionality relates to doc values. If we used doc values, then we

Re: Separate issue for appendable field caches / doc values

2011-08-22 Thread Jason Rutherglen

appending impl...? But then... FC still returns fixed arrays so you can't append until we fix that? Mike McCandless http://blog.mikemccandless.com On Mon, Aug 22, 2011 at 1:13 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: LUCENE-2312 needs appendable field caches. I can include

Re: Separate issue for appendable field caches / doc values

2011-08-23 Thread Jason Rutherglen

(perDocValues method) so the RT reader Ok, it's not clear when / how DVs are used instead of field caches, and why their access isn't merged together? On Tue, Aug 23, 2011 at 12:30 PM, Michael McCandless luc...@mikemccandless.com wrote: On Tue, Aug 23, 2011 at 12:09 AM, Jason Rutherglen

DirectoryReader package protected?

2011-09-06 Thread Jason Rutherglen

I was browsing code, and noticed DirectoryReader is package protected. Why is this? Ie, SegmentReader is not.

Re: Regarding Transaction logging

2011-09-08 Thread Jason Rutherglen

The delete by query is solved by recording the primary / UID of the document(s) deleted. It's only expensive if the transaction log implementation is not designed properly. :) On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer simon.willna...@googlemail.com wrote: hey folks, we already have

Re: Regarding Transaction logging

2011-09-08 Thread Jason Rutherglen

This isn't a new problem. Databases have been around for what, 30+ years? On Thu, Sep 8, 2011 at 11:01 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The delete by query is solved by recording

Re: [jira] [Commented] (SOLR-2242) Get distinct count of names for a facet field

2012-06-09 Thread Jason Rutherglen

Bill, which patch is working for you? It is difficult to follow! :) On Sat, Jun 9, 2012 at 1:02 AM, William Bell billnb...@gmail.com wrote: I am not sure what the issue is. This is working for me... On Fri, Jun 8, 2012 at 8:35 AM, Jason Rutherglen (JIRA) j...@apache.org wrote

Re: Count of keys of an FST

2012-06-27 Thread Jason Rutherglen

On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: The FST class has a number of methods that return counts, which one returns the total number of keys that have been encoded into the FST

Re: Count of keys of an FST

2012-06-28 Thread Jason Rutherglen

with keys, but it's kind of ugly. Please check the fst header though -- I'm not sure, maybe Mike wrote it so that the node count/ keys count is in there. Dawid On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Sounds like I should just count

Re: Count of keys of an FST

2012-06-28 Thread Jason Rutherglen

, 2012 at 3:32 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I looked at the sources and didn't see a key count. Thanks Dawid and Mike. On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless luc...@mikemccandless.com wrote: I believe node and arc count are stored, but not key

SVN and Lucene 2.9.1

2010-04-23 Thread Jason Rutherglen

I'm browsing: http://lucene.apache.org/java/docs/developer-resources.html and there's http://svn.apache.org/repos/asf/lucene/dev/trunk however just beneath in the http://svn.apache.org/repos/asf/lucene/dev/branches/ directory there's nothing. Where's Lucene 2.9.1 source?

Re: Benchmarking Solr indexing using Lucene Benchmark?

2010-06-14 Thread Jason Rutherglen

Tom, This was discussed a while back, however I don't believe anything was committed. I think there's a fair bit of work involved in that the Lucene benchmark config would not be usable, or rather, it would need to simply point to a Solr solrconfig.xml file. Other than that, the resulting

Sequence IDs for NRT deletes

2010-07-20 Thread Jason Rutherglen

Michael B and I have been discussing the per segment doc writers and RT patches/branch. A small improvement we can add to trunk from this is the sequence IDs for deletes, which would improve the existing NRT system by avoiding the cloning of bit vectors. Implementing segment deleted docs via
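The idea of sequence IDs for deletes can be illustrated roughly as follows. This is a sketch under stated assumptions, not the patch under discussion: `SequenceDeletes` and its methods are hypothetical names, sequence IDs are assumed to be increasing `long` values smaller than `Long.MAX_VALUE`, and thread safety is not addressed. Each document records the sequence ID at which it was deleted, and a reader opened at sequence `S` treats a document as deleted only if that recorded ID is at most `S`, so readers can share one array instead of cloning a bit vector per reopen.

```java
import java.util.Arrays;

/** Hypothetical sketch: per-document delete sequence IDs shared by all readers of a segment. */
final class SequenceDeletes {
    // Sentinel meaning "never deleted"; real sequence IDs are assumed to be smaller.
    private static final long NOT_DELETED = Long.MAX_VALUE;

    // deletedAt[docID] holds the sequence ID of the earliest delete of that document.
    private final long[] deletedAt;

    SequenceDeletes(int maxDoc) {
        deletedAt = new long[maxDoc];
        Arrays.fill(deletedAt, NOT_DELETED);
    }

    /** Records a delete; if the document was already deleted, the earlier sequence ID is kept. */
    void delete(int docID, long sequenceID) {
        if (sequenceID < deletedAt[docID]) {
            deletedAt[docID] = sequenceID;
        }
    }

    /** A reader opened at readerSequenceID sees only deletes made at or before that point. */
    boolean isDeleted(int docID, long readerSequenceID) {
        return deletedAt[docID] <= readerSequenceID;
    }
}
```

The trade-off is RAM: a `long[]` costs 8 bytes per document versus 1 bit for a bit vector, which is the cost the narrower array types mentioned later in the thread aim to reduce.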

Re: Sequence IDs for NRT deletes

2010-07-20 Thread Jason Rutherglen

). Maybe the deletes impl should be pluggable and apps can pick... Mike On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Michael B and I have been discussing the per segment doc writers and RT patches/branch. A small improvement we can add to trunk from

Re: Sequence IDs for NRT deletes

2010-07-20 Thread Jason Rutherglen

up to 10X per second), this may not be a good tradeoff (ie they are willing to spend more time in the reopen if it reduces RAM footprint). Maybe the deletes impl should be pluggable and apps can pick... Mike On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote

Re: Sequence IDs for NRT deletes

2010-07-20 Thread Jason Rutherglen

who may want to use this option. Have you tested? The test would be a basic benchmark of queries against BV vs. an int[] of deletes? On Tue, Jul 20, 2010 at 12:17 PM, Michael McCandless luc...@mikemccandless.com wrote: On Tue, Jul 20, 2010 at 1:44 PM, Jason Rutherglen jason.rutherg

Re: Sequence IDs for NRT deletes

2010-07-21 Thread Jason Rutherglen

long[] is probably safe Yeah it's safe for most things... short[] That could be a much better option for minimizing RAM usage, and then implement wraparound. On Wed, Jul 21, 2010 at 3:12 AM, Michael McCandless luc...@mikemccandless.com wrote: On Tue, Jul 20, 2010 at 4:21 PM, Jason

Re: Question about string retrieval with FieldCache in trunk

2010-08-18 Thread Jason Rutherglen

Karl, I believe one may pass an empty BytesRef in, and the values will be set within the getTerm method. On Wed, Aug 18, 2010 at 8:18 AM, karl.wri...@nokia.com wrote: Exactly. getTerms() returns a DocTerms, which has this: /** The BytesRef argument must not be null; the method *

Re: Ant command for installing Lucene and Solr Maven dependencies locally?

2012-10-29 Thread Jason Rutherglen

Any way to make it skip tests? On Mon, Oct 29, 2012 at 12:55 PM, Tommaso Teofili tommaso.teof...@gmail.com wrote: 'ant run-maven-build' should do the trick. Tommaso 2012/10/29 Jason Rutherglen jason.rutherg...@gmail.com I have used 'ant generate-maven-artifacts' to generate the Maven

SolrJ commit still has flush parameter

2011-12-29 Thread Jason Rutherglen

SolrJ commit still has the flush parameter, it should be removed, and softcommit should be added.

Re: Plans to add functions to results of groups

2012-03-07 Thread Jason Rutherglen

. Martijn On 7 March 2012 07:03, Jason Rutherglen jason.rutherg...@gmail.com wrote: Are there plans to add the ability to apply functions (eg, sum, average, distinct, or custom functions) to group'd documents. Such that the document list per group is not returned, instead the result

Re: discussion about release frequency.

2010-09-18 Thread Jason Rutherglen

I had not been faced with the scrofulous horror of Maven Nice... Is this phrase copyrighted or can I use it extenuously without paying royalties (eg, open sourced). :) In other words, I could not agree more. On Sat, Sep 18, 2010 at 1:19 AM, Lance Norskog goks...@gmail.com wrote: +1 on the

Re: discussion about release frequency.

2010-09-18 Thread Jason Rutherglen

the maven stuff in 3.x/trunk is actually pretty good I've heard that about every release of Maven, and any time I've tried to use it, it doesn't quite work as expected, and given what it does should be fairly trivial, the fact that there are bugs/issues, and it's been released to me has meant I

Solr 3.1

2010-09-18 Thread Jason Rutherglen

I'm a little bit confused about the difference between the Lucene/Solr 3.x branch and Lucene/Solr trunk. Is there a page on the Lucene Apache site yet for Lucene/Solr 3.x, linked to from the main page?

Unsynced flush in IW get reader

2010-11-04 Thread Jason Rutherglen

I'm curious why the flush call in IW getReader isn't synced? The main work of flush is synced, ie, the doFlush method. Then we're syncing yet again to call applyDeletes, redundantly because deletes were previously flushed in the flush call. I'm guessing we're trying to gain some concurrency

Re: Unsynced flush in IW get reader

2010-11-04 Thread Jason Rutherglen

it may be because CMS waits, if there are too many merges already running, and we don't want it to wait holding IW's monitor lock. Maybe try making it sync'd and see if any tests deadlock? Mike On Thu, Nov 4, 2010 at 1:54 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I'm curious

Lucene / Solr 4.x release

2011-10-02 Thread Jason Rutherglen

I asked this a little while ago, and figured I'd ask again. It seemed like the important remaining issue was the bulk postings iterator? Is that still true? Thanks!

Re: Lucene / Solr 4.x release

2011-10-03 Thread Jason Rutherglen

Will the bulk postings only help PFOR? On Sun, Oct 2, 2011 at 2:28 PM, Uwe Schindler u...@thetaphi.de wrote: ...And flexible stored fields + TV, so the file format is complete flexible. Uwe -- Uwe Schindler H.-H.-Meier-Allee 63, 28213 Bremen http://www.thetaphi.de Jason Rutherglen

PagedBytes additional method

2011-10-06 Thread Jason Rutherglen

PagedBytes is great! Even better would be a couple of additional methods, one to write it out to an IndexOutput and the other for the total bytes used.

Re: PagedBytes additional method

2011-10-06 Thread Jason Rutherglen

I try not to without having a patch somewhat prepared! On Thu, Oct 6, 2011 at 11:38 AM, Simon Willnauer simon.willna...@googlemail.com wrote: why don't you open an issue for this? thanks, simon On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: PagedBytes

Perhaps more efficient byte[] comparisons

2011-10-31 Thread Jason Rutherglen

...benchmarks show it as being 2x more CPU-efficient than the equivalent pure-Java implementation... https://issues.apache.org/jira/browse/HADOOP-7761

Re: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-07 Thread Jason Rutherglen

SOLR-1447 added this functionality. On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote: Lucene has this method to set the maximum size of a segment when merging: LogByteSizeMergePolicy.setMaxMergeMB

Re: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-17 Thread Jason Rutherglen

- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Tuesday, December 07, 2010 10:48 AM To: dev@lucene.apache.org Subject: Re: Is it possible to set the merge policy setMaxMergeMB from Solr SOLR-1447 added this functionality. On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom

Re: Is it possible to set the merge policy setMaxMergeMB from Solr

2010-12-17 Thread Jason Rutherglen

Probably best to add something here as it currently has nothing regarding merge policies and has a long standing TODO on the indexDefaults. http://wiki.apache.org/solr/SolrConfigXml On Fri, Dec 17, 2010 at 7:22 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I worked on the patch and I

SegmentInfo clone

2011-01-12 Thread Jason Rutherglen

Is it intentional that SegmentInfo.segmentCodecs isn't cloned? When SI is cloned, then sizeInBytes fails with an NPE.

Re: SegmentInfo clone

2011-01-12 Thread Jason Rutherglen

Sorry, that's incorrect, SegmentInfo.files is NPE'ing on segmentCodecs because it's never set (in trunk). On Wed, Jan 12, 2011 at 10:59 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Is it intentional that SegmentInfo.segmentCodecs isn't cloned? When SI is cloned, then sizeInBytes

Re: SegmentInfo clone

2011-01-12 Thread Jason Rutherglen

it is set on DocumentsWriter#flush though Thanks! I just skip segmentCodecs if it's null, for now. On Wed, Jan 12, 2011 at 11:05 AM, Simon Willnauer simon.willna...@googlemail.com wrote: On Wed, Jan 12, 2011 at 8:03 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Sorry, that's

Re: Release schedule Lucene 4?

2011-01-16 Thread Jason Rutherglen

But: they don't yet support updating the values (the goal is to allow this, eventually). This is just the first step. No? Hmm... I thought that was a main part of the functionality? On Sun, Jan 16, 2011 at 6:07 AM, Michael McCandless luc...@mikemccandless.com wrote: Actually docvalues is

Re: Release schedule Lucene 4?

2011-01-17 Thread Jason Rutherglen

be a stacked approach, where the orig full array remains and we write sparse deltas (pairs of docID + new value) What is the lookup cost using this method? On Mon, Jan 17, 2011 at 3:24 AM, Michael McCandless luc...@mikemccandless.com wrote: On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
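As a rough way to reason about the lookup-cost question, the stacked layout can be sketched as a base array plus a sorted list of (docID, new value) pairs. Everything here is an assumption for illustration: `StackedValues` is a hypothetical class, the deltas are assumed to be kept sorted by docID, and the actual design discussed in the thread may differ.

```java
import java.util.Arrays;

/** Hypothetical sketch: an untouched base array plus a sorted sparse layer of (docID, newValue) updates. */
final class StackedValues {
    private final long[] base;        // original full array, one value per docID
    private final int[] deltaDocs;    // docIDs that were updated, sorted ascending
    private final long[] deltaValues; // deltaValues[i] is the new value for deltaDocs[i]

    StackedValues(long[] base, int[] deltaDocs, long[] deltaValues) {
        this.base = base;
        this.deltaDocs = deltaDocs;
        this.deltaValues = deltaValues;
    }

    /** Binary search over the d deltas (O(log d)), falling back to the base array (O(1)). */
    long get(int docID) {
        int i = Arrays.binarySearch(deltaDocs, docID);
        return i >= 0 ? deltaValues[i] : base[docID];
    }
}
```

Under these assumptions every lookup pays an extra O(log d) search for d updated documents, compared with a plain array read when no deltas exist.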

Re: Release schedule Lucene 4?

2011-01-17 Thread Jason Rutherglen

...@mikemccandless.com wrote: On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: But: they don't yet support updating the values (the goal is to allow this, eventually). This is just the first step. No? Hmm... I thought that was a main part of the functionality

FST for Solr Autosuggest?

2011-02-18 Thread Jason Rutherglen

Can we use LUCENE-2792's FST for the Solr autosuggest functionality?

Maven artifacts not working?

2012-03-20 Thread Jason Rutherglen

This link seems to not work: https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts

Re: Maven artifacts not working?

2012-03-20 Thread Jason Rutherglen

/NightlyBuilds, dev-tools/maven/README.maven in your local working copy, and https://issues.apache.org/jira/browse/LUCENE-3825. Stevev -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Tuesday, March 20, 2012 10:46 AM To: dev@lucene.apache.org Subject

Error when running 'ant generate-maven-artifacts'

2012-04-04 Thread Jason Rutherglen

I am getting the following error when running 'ant generate-maven-artifacts': Buildfile: /Users/jasonrutherglen/src/LUCENE-TRUNK/build.xml generate-maven-artifacts: filter-pom-templates: [copy] Copying 42 files to /Users/jasonrutherglen/src/LUCENE-TRUNK/lucene/build/poms

Re: Error when running 'ant generate-maven-artifacts'

2012-04-04 Thread Jason Rutherglen

-28213 Bremen http://www.thetaphi.de eMail: u...@thetaphi.de -Original Message- From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com] Sent: Wednesday, April 04, 2012 3:58 PM To: dev@lucene.apache.org Subject: Error when running 'ant generate-maven-artifacts' I am getting

Re: Merge IO throttling

2012-04-05 Thread Jason Rutherglen

Thanks Mike. On Thu, Apr 5, 2012 at 11:55 AM, Michael McCandless luc...@mikemccandless.com wrote: Yes, in trunk: FSDirectory.setMaxMergeWriteMBPerSec. Mike McCandless http://blog.mikemccandless.com On Thu, Apr 5, 2012 at 11:54 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Has

Re: GData, updateable IndexSearcher

2006-04-26 Thread jason rutherglen

: Doug Cutting [EMAIL PROTECTED] To: solr-dev@lucene.apache.org Sent: Wednesday, April 26, 2006 11:27:44 AM Subject: Re: GData, updateable IndexSearcher jason rutherglen wrote: Interesting, does this mean there is a plan for incrementally updateable IndexSearchers to become part of Lucene

Re: GData, updateable IndexSearcher

2006-04-27 Thread jason rutherglen

Subject: Re: GData, updateable IndexSearcher jason rutherglen wrote: I was thinking you implied that you knew of someone who had customized their own, but it was a closed source solution. And if so then you would know how that project fared. I don't recall the details, but I know folks

Re: GData, updateable IndexSearcher

2006-05-01 Thread jason rutherglen

Can you post your code? - Original Message From: Robert Engels [EMAIL PROTECTED] To: java-dev@lucene.apache.org; jason rutherglen [EMAIL PROTECTED] Sent: Monday, May 1, 2006 11:33:06 AM Subject: RE: GData, updateable IndexSearcher fyi, using my reopen() implementation (which rereads

Re: GData, updateable IndexSearcher

2006-05-01 Thread jason rutherglen

Thanks for the code and performance metric Robert. Have you had any issues with the deleted segments as Doug has been describing? - Original Message From: Robert Engels [EMAIL PROTECTED] To: java-dev@lucene.apache.org; jason rutherglen [EMAIL PROTECTED] Sent: Monday, May 1, 2006 11:49

Re: GData Server - Lucene storage

2006-06-02 Thread jason rutherglen

Yonik, It might be interesting to merge using BDB into Solr, as an option to provide better realtime updates. Perhaps the replication could be used as well in place of rsync? I don't have any experience with BDB replication, anyone have thoughts on the matter? Jason - Original Message

Re: GData Server - Lucene storage

2006-06-02 Thread jason rutherglen

Is it possible to turn off directory locking with BDB? How is the performance compared to regular FSDirectory for queries? - Original Message From: Andi Vajda [EMAIL PROTECTED] To: java-dev@lucene.apache.org; jason rutherglen [EMAIL PROTECTED] Sent: Friday, June 2, 2006 10:52:27 AM

LUCENE-528 and 565

2006-08-15 Thread jason rutherglen

What about using this http://issues.apache.org/jira/browse/LUCENE-528 to solve this http://issues.apache.org/jira/browse/LUCENE-565 Where the batching is performed in another index that is then merged into the existing one. This is something I have been looking for. Is 528 ok to use?

IndexReader.reopen discussion

2006-08-15 Thread jason rutherglen

There was this discussion regarding adding a reopen method to IndexReader however it seems to have dropped off the map. Robert Engels submitted some code however it was not a patch. http://www.gossamer-threads.com/lists/lucene/java-dev/34898?search_string=reopen I would submit something but

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-08-22 Thread jason rutherglen

Yes I am including this patch as it is very useful for increasing the efficiency of updates as you described. I will be conducting more tests and will post any results. Yes a patch for IndexWriter will be useful so that the entirety of this build will work. Thanks! - Original Message

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-08-29 Thread jason rutherglen

The documents reached disk as a close was performed on the NewIndexModifier and the index size grows, seems like the deleteable files registers the documents as deleted though, so a search returns nothing and an optimize deletes all of the documents. Maybe the new documents have the same docid

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

2006-09-06 Thread jason rutherglen

Sounds interesting Marvin, I would be willing to test out what you create. I am working on trying to create a rapidly updating index and it sounds like this may help that. I've noticed even using a ramdisk that the whole merging process is quite slow. Maybe also because of the locking that

IndexReader.reopen FieldCache

2006-09-07 Thread jason rutherglen

Robert Engels, I implemented the reopen code you posted, works well, thanks. One thing I am curious about, are you able to reuse the FieldCache? From what I am seeing, it is being rebuilt after a commit, which makes the next query slow. Any ideas on this? Thanks, Jason

Project Ocean

2008-05-02 Thread Jason Rutherglen

Project Ocean is designed to provide realtime search capabilities. It is pre-alpha with an Apache license. Developers and corporate sponsors welcome. The genesis of the idea was at a social networking company using Solr who wanted realtime instead of batch updates. At first I tried making Solr

Re: Project Ocean

2008-05-02 Thread Jason Rutherglen

this as a Lucene and/or Solr contrib or patch, whichever is appropriate? Otherwise, this risks being a fork that doesn't get enough users/developers and eventually dies off. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jason Rutherglen

Tag Index

2008-05-22 Thread Jason Rutherglen

In reference to LUCENE-1292, am thinking the structure of the tag index term dictionary file can reuse Lucene .tii and .tis code also storing a number per term, and use http://dsiutils.dsi.unimi.it/docs/it/unimi/dsi/io/InputBitStream.html to store the docs associated with a term between the terms.

Re: Tag Index

2008-05-23 Thread Jason Rutherglen

Actually, will use MultiLevelSkipListReader for termdocs. On Thu, May 22, 2008 at 11:05 PM, Jason Rutherglen [EMAIL PROTECTED] wrote: In reference to LUCENE-1292, am thinking the structure of the tag index term dictionary file can reuse Lucene .tii and .tis code also storing a number per

Re: Tag Index

2008-05-24 Thread Jason Rutherglen

-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Jason Rutherglen [EMAIL PROTECTED] To: java-dev@lucene.apache.org Sent: Thursday, May 22, 2008 11:05:35 PM Subject: Tag Index In reference to LUCENE-1292, am thinking the structure

Synchronization bottlenecks

2008-06-12 Thread Jason Rutherglen

I have seen this discussed before but with no conclusion. It is safe to say that SegmentReader.isDeleted is a synchronization bottleneck. When using a single IndexReader per query for a highly concurrent application such as a web application, with the index entirely in the system cache, the


1 - 100 of 1725 matches
