On Mar 31, 2011 9:44 AM, Simon Willnauer (JIRA) j...@apache.org wrote:
[https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13013974#comment-13013974]
Simon Willnauer commented on LUCENE-2573:
Today the ConcurrentMergeScheduler allows setting the max thread
count and is bound to a single IndexWriter.
However in the [common] case of multiple IndexWriters running in
the same process, this disallows one from managing the aggregate
number of merge threads executing at any given time.
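A minimal sketch of the aggregate limit described above, assuming every IndexWriter's merge scheduler is handed the same throttle object and runs each merge through it. `SharedMergeThrottle` is hypothetical: it is not part of Lucene's ConcurrentMergeScheduler API, and how a real scheduler would be wired to it is an assumption.

```java
import java.util.concurrent.Semaphore;

/**
 * Hypothetical sketch (not Lucene API): one process-wide permit pool shared
 * by every IndexWriter's merge scheduler, so the total number of concurrently
 * running merges is bounded no matter how many writers exist.
 */
final class SharedMergeThrottle {
    private final Semaphore permits;

    SharedMergeThrottle(int maxMergeThreads) {
        // Fair semaphore: blocked callers of runMerge are served in FIFO order.
        this.permits = new Semaphore(maxMergeThreads, true);
    }

    /** Blocks until a global permit is free, then runs the merge. */
    void runMerge(Runnable merge) throws InterruptedException {
        permits.acquire();
        try {
            merge.run();
        } finally {
            permits.release();
        }
    }

    /** Non-blocking variant: returns false if the global limit is reached. */
    boolean tryRunMerge(Runnable merge) {
        if (!permits.tryAcquire()) {
            return false;
        }
        try {
            merge.run();
            return true;
        } finally {
            permits.release();
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }
}
```

Because the Semaphore is shared, any number of writers built with the same 4-permit throttle still run at most 4 merges at once; `tryRunMerge` lets a caller defer a merge instead of blocking.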
I
Willnauer
simon.willna...@googlemail.com wrote:
On Thu, Apr 14, 2011 at 5:20 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Today the ConcurrentMergeScheduler allows setting the max thread
count and is bound to a single IndexWriter.
However in the [common] case of multiple
for this?
simon
On Thu, Apr 14, 2011 at 19:40, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I think the proposal involved using a ThreadPoolExecutor, which seemed
to not quite work as well as what we have. I think it'll be easier to
simply pass a global context that keeps
+1 to Mike's proposal here. Each of these could easily be
patches/issues. The top ones would probably be the basics, eg,
faceting and schemas.
As the easiest short term solution for allowing other systems to use
Solr or its features, it would be great if a 'committer' responded to
SOLR-1431.
In the Field object a text value must be of type string, however I
think we can allow a BytesRef to be passed in?
-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail:
Sent: Sunday, May 15, 2011 6:22 PM
To: dev@lucene.apache.org
Subject: Re: Field should accept BytesRef?
On Sun, May 15, 2011 at 12:05 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
In the Field object a text value must be of type string, however I
think we can allow a BytesRef
maybe that's because we have one huge monolithic implementation
Doesn't the DocValues branch solve this?
Also, instead of trying to implement clever ways of compressing
strings in the field cache, which probably won't bear fruit, I'd
prefer to look at [eventually] MMap'ing (using DV) the field
This is more about compressing strings in TermsIndex, I think.
Ah, because they're sorted. I think if the string lookup cost
degrades then it's not worth it? That's something that needs to be
tested in the MMap case as well, eg, are ByteBuffers somehow slowing
down everything by a factor of
slowly. If the user wishes to improve
performance it's easy enough to add more hardware.
On Thu, May 19, 2011 at 6:40 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Thu, May 19, 2011 at 9:22 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
maybe that's because we have one
wrote:
On Thu, May 19, 2011 at 10:09 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
When
you mmap them you let the OS decide when to swap stuff out, which means
you potentially pick up high query latency waiting for these pages to
swap back in
Right, however if one is using, let's say, SSDs
What state is this in? Cheers.
of Jason's concerns here (to the dev list)... we should
stick to technical feedback on the issue:
On Mon, May 30, 2011 at 11:54 PM, Jason Rutherglen (JIRA)
j...@apache.org wrote:
It's been clear for quite a while that you folks at Lucid are trying to
protect your golden goose, eg, Solr from
how can we keep our Katta nodes' indexes in sync with our database update
and delete operations
Katta doesn't provide NRT or incremental indexing because it's a write
once architecture. Though you can verify on the Katta mailing list.
On Thu, Jun 2, 2011 at 4:53 AM, Ghulam Mustafa
Is it possible to iterate over the FST while it's still on disk? If
not is that type of functionality planned?
seek instead).
On Thu, Jun 2, 2011 at 10:54 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Is it possible to iterate over the FST while it's still on disk? If
not is that type of functionality planned
. Loading the keys into heap with the values
still on disk/system IO cache probably is non-optimal.
On Thu, Jun 2, 2011 at 8:45 PM, Robert Muir rcm...@gmail.com wrote:
On Thu, Jun 2, 2011 at 11:39 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
MMAP changes nothing: as it's not a sequential
if you want to use it for this purpose, you don't need an FST, you can
just use an NFA/DFA of prefixes instead, as you only need to answer
accept/reject.
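To illustrate the accept/reject point, a sketch of a trie over keys, which is a tree-shaped DFA with no outputs attached. `PrefixAcceptor` is a hypothetical name; unlike an FST it does not share suffixes, so it uses more memory, but it answers the same exact-match and prefix questions.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch: a trie (tree-shaped DFA) that only answers
 * accept/reject. No outputs are stored and suffixes are not shared.
 */
final class PrefixAcceptor {
    private static final class State {
        final Map<Character, State> next = new HashMap<>();
        boolean accepting;
    }

    private final State start = new State();

    void add(String key) {
        State s = start;
        for (int i = 0; i < key.length(); i++) {
            s = s.next.computeIfAbsent(key.charAt(i), c -> new State());
        }
        s.accepting = true;
    }

    /** Exact-match accept/reject. */
    boolean accepts(String key) {
        State s = walk(key);
        return s != null && s.accepting;
    }

    /** True if at least one added key starts with the given prefix. */
    boolean acceptsPrefix(String prefix) {
        return walk(prefix) != null;
    }

    /** Follows transitions; returns null as soon as one is missing. */
    private State walk(String input) {
        State s = start;
        for (int i = 0; i < input.length() && s != null; i++) {
            s = s.next.get(input.charAt(i));
        }
        return s;
    }
}
```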
Right, however if we already have the FST, then if it supports
accept/reject efficiently, we'd simply reuse it (thereby removing the
bloom
PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
because of how the datastructure is laid out. if you use mmap, instead
of lots of seeks, it will simply be lots of page faults instead.
Ok, right, however assuming the pages are in RAM it could/should be
ok. Eg, in HBase we'd want
IMO the byte[] representation is so compact
that it doesn't really matter if you use FS Cache memory or JVM memory
so I'd rather go for jvm memory here too.
In principle I agree, however in practice HBase users will likely have
a lot of keys per region and server. If we stored every single
(hint: try to provide a representation
that will share as many suffixes and prefixes as possible since these
conflate into a single path, no matter how many sequences you have)
It'll just be user-created keys, which will at least be sorted, and
which will very likely share large
here you should rather store pointers to another file and mmap that
file. Keep your FST as lean and compact as possible and make sure it's
in memory. The compression should do a good job for you here!
Wow, you're right: the FST size in RAM with 50 million date keys, each
incremented by 1 ms, is less than 1K.
Hi,
I am wondering what happened to the distributed search capability of Lucene?
Thanks!
queries / filters didn't work with it.
simon
On Thu, Jun 9, 2011 at 7:29 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Hi,
I am wondering what happened to the distributed search capability of
Lucene?
Thanks
, what would be different?
On Thu, Jun 9, 2011 at 4:07 PM, Andrzej Bialecki a...@getopt.org wrote:
On 6/10/11 12:10 AM, Jason Rutherglen wrote:
Right, if that's not around, one needs to use multi searcher, that's
gone too?
Yes, and rightfully so - it didn't properly handle some query types, so
it easy to do grouping across shards.
Mike McCandless
http://blog.mikemccandless.com
On Fri, Jun 10, 2011 at 12:25 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
it's fine to have some higher-level class to support this crap, but it
shouldn't be some transparent searcher.
I'll create
Out of curiosity, how is DF handled with the new automaton [regex] queries?
On Fri, Jun 10, 2011 at 10:48 AM, Andrzej Bialecki a...@getopt.org wrote:
On 6/10/11 6:27 PM, Michael McCandless wrote:
I'm actually working on something like this, basically a utility
method to merge N TopDocs into
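A rough sketch of what merging N per-shard TopDocs into one ranked list could look like, using stand-in types rather than Lucene's TopDocs/ScoreDoc (no Lucene merge API is assumed). Each shard's hits are assumed to be sorted by descending score already, so a priority queue holding one cursor per shard yields the global top N in O(S + N log S) for S shards. `TopHitsMerger` and `Hit` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: k-way merge of per-shard hit lists. */
final class TopHitsMerger {
    /** Stand-in for a scored hit; not Lucene's ScoreDoc. */
    static final class Hit {
        final int shard;
        final int doc; // shard-local doc ID
        final float score;

        Hit(int shard, int doc, float score) {
            this.shard = shard;
            this.doc = doc;
            this.score = score;
        }
    }

    /** Each shard list must already be sorted by descending score. */
    static List<Hit> merge(List<List<Hit>> shards, int topN) {
        // Queue entries are {shardIndex, positionInShard}; best hit on top.
        PriorityQueue<int[]> pq = new PriorityQueue<>((a, b) -> {
            float sa = shards.get(a[0]).get(a[1]).score;
            float sb = shards.get(b[0]).get(b[1]).score;
            int c = Float.compare(sb, sa); // higher score first
            return c != 0 ? c : Integer.compare(a[0], b[0]); // ties: lower shard index
        });
        for (int s = 0; s < shards.size(); s++) {
            if (!shards.get(s).isEmpty()) {
                pq.add(new int[] {s, 0});
            }
        }
        List<Hit> merged = new ArrayList<>();
        while (merged.size() < topN && !pq.isEmpty()) {
            int[] top = pq.poll();
            merged.add(shards.get(top[0]).get(top[1]));
            if (top[1] + 1 < shards.get(top[0]).size()) {
                pq.add(new int[] {top[0], top[1] + 1}); // advance within that shard
            }
        }
        return merged;
    }
}
```

Keeping the shard index on each hit matters because doc IDs are only unique within a shard.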
Are we going the direction of creating full facet features outside of
Solr? Eg, we have UIF extrapolated out, we can probably make a module
for bit set intersections as well. In the process the faceting will
go per-segment.
...@googlemail.com wrote:
I believe people are already looking into that but I am not sure.
sounds reasonable to me but I think it's going to be lots of work
simon
On Mon, Jun 13, 2011 at 11:34 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Are we going the direction of creating full
for field facet that work
per-segment,
but I think in the end we would want all facet types and methods to work on
a per-segment basis.
Martijn
On 13 June 2011 23:47, Jason Rutherglen jason.rutherg...@gmail.com wrote:
I think it's a better approach than rewriting Solr's internals. Eg,
small
Is it faceting per-segment?
to
improve indexing or search perf (hopefully both).
Shai
On Saturday, July 9, 2011, Jason Rutherglen jason.rutherg...@gmail.com
wrote:
Is it faceting per-segment?
the bitset and FieldCache ones.
Shai
On Saturday, July 9, 2011, Jason Rutherglen jason.rutherg...@gmail.com
wrote:
The taxonomy is global to the index, but I think it will be
interesting to explore per-segment taxonomy, and how it can be used to
improve indexing or search perf (hopefully both
We should release Lucene 4.x soon. What else is hyper critical for
the initial release?
I didn't know the bulk API was so important. Which bulk API (eg the
postings one or the terms dict)?
On Mon, Aug 15, 2011 at 11:17 PM, Robert Muir rcm...@gmail.com wrote:
On Mon, Aug 15, 2011 at 10:49 PM, Mark Miller markrmil...@gmail.com wrote:
Just throwing this out there, but:
I think it
-deployment / all kinds of new stuff for people to digest
and break.
On Tue, Aug 16, 2011 at 8:54 PM, Robert Muir rcm...@gmail.com wrote:
the postings api.
On Tue, Aug 16, 2011 at 8:24 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I didn't know the bulk API was so important. Which bulk API
LUCENE-2312 needs appendable field caches. I can include this
functionality in LUCENE-2312, or split it out into a separate
issue / patch.
However it would only be useful for RT / LUCENE-2312. Also, I'm not
sure how this functionality relates to doc values. If we used doc
values, then we
appending impl...?
But then... FC still returns fixed arrays so you can't append until we fix
that?
Mike McCandless
http://blog.mikemccandless.com
On Mon, Aug 22, 2011 at 1:13 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
LUCENE-2312 needs appendable field caches. I can include
(perDocValues method) so the RT reader
Ok, it's not clear when / how DVs are used instead of field caches,
and why their access isn't merged together?
On Tue, Aug 23, 2011 at 12:30 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Tue, Aug 23, 2011 at 12:09 AM, Jason Rutherglen
I was browsing code, and noticed DirectoryReader is package protected.
Why is this? Ie, SegmentReader is not.
The delete by query is solved by recording the primary / UID of the
document(s) deleted. It's only expensive if the transaction log
implementation is not designed properly. :)
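A minimal sketch of that idea, assuming a toy in-memory index keyed by UID and a `Predicate` over the document text standing in for the query. `UidTransactionLog` and its methods are hypothetical. The delete-by-query is resolved to concrete UIDs when it runs, and only per-UID deletes are logged, so replaying the log never re-evaluates the query.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

/** Hypothetical sketch: a transaction log that records deletes by UID only. */
final class UidTransactionLog {
    enum Op { ADD, DELETE }

    static final class Entry {
        final Op op;
        final String uid;
        final String body; // null for deletes

        Entry(Op op, String uid, String body) {
            this.op = op;
            this.uid = uid;
            this.body = body;
        }
    }

    private final Map<String, String> index = new LinkedHashMap<>(); // stand-in for the live index
    private final List<Entry> log = new ArrayList<>();

    void add(String uid, String body) {
        log.add(new Entry(Op.ADD, uid, body));
        index.put(uid, body);
    }

    /** Resolves the query to UIDs first, then logs and applies per-UID deletes. */
    int deleteByQuery(Predicate<String> query) {
        List<String> matched = new ArrayList<>();
        for (Map.Entry<String, String> e : index.entrySet()) {
            if (query.test(e.getValue())) {
                matched.add(e.getKey());
            }
        }
        for (String uid : matched) {
            log.add(new Entry(Op.DELETE, uid, null));
            index.remove(uid);
        }
        return matched.size();
    }

    /** Rebuilds index state purely from logged entries (e.g. after a crash). */
    static Map<String, String> replay(List<Entry> entries) {
        Map<String, String> rebuilt = new LinkedHashMap<>();
        for (Entry e : entries) {
            if (e.op == Op.ADD) {
                rebuilt.put(e.uid, e.body);
            } else {
                rebuilt.remove(e.uid);
            }
        }
        return rebuilt;
    }

    List<Entry> entries() { return Collections.unmodifiableList(log); }

    Map<String, String> liveIndex() { return Collections.unmodifiableMap(index); }
}
```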
On Thu, Sep 8, 2011 at 5:35 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
hey folks,
we already have
This isn't a new problem. Databases have been around for what, 30+ years?
On Thu, Sep 8, 2011 at 11:01 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
On Thu, Sep 8, 2011 at 4:21 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
The delete by query is solved by recording
Bill, which patch is working for you? It is difficult to follow! :)
On Sat, Jun 9, 2012 at 1:02 AM, William Bell billnb...@gmail.com wrote:
I am not sure what the issue is.
This is working for me...
On Fri, Jun 8, 2012 at 8:35 AM, Jason Rutherglen (JIRA) j...@apache.org
wrote
On Wed, Jun 27, 2012 at 10:36 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
The FST class has a number of methods that return counts; which one
returns the total number of keys that have been encoded into the FST?
with keys, but it's
kind of ugly.
Please check the fst header though -- I'm not sure, maybe Mike wrote
it so that the node count/ keys count is in there.
Dawid
On Wed, Jun 27, 2012 at 10:50 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Sounds like I should just count
, 2012 at 3:32 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I looked at the sources and didn't see a key count.
Thanks Dawid and Mike.
On Thu, Jun 28, 2012 at 6:37 AM, Michael McCandless
luc...@mikemccandless.com wrote:
I believe node and arc count are stored, but not key
I'm browsing: http://lucene.apache.org/java/docs/developer-resources.html
and there's http://svn.apache.org/repos/asf/lucene/dev/trunk however
just beneath in the
http://svn.apache.org/repos/asf/lucene/dev/branches/ directory there's
nothing. Where's Lucene 2.9.1 source?
Tom,
This was discussed a while back, however I don't believe
anything was committed. I think there's a fair bit of work
involved in that the Lucene benchmark config would not be
usable, or rather, it would need to simply point to a Solr
solrconfig.xml file. Other than that, the resulting
Michael B and I have been discussing the per segment doc writers
and RT patches/branch. A small improvement we can add to trunk
from this is the sequence IDs for deletes, which would improve
the existing NRT system by avoiding the cloning of bit vectors.
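One way to read the sequence-ID idea, sketched with a hypothetical `SequencedDeletes` class (not Lucene code): each document records the sequence ID at which it was deleted, and an NRT reader is just a captured sequence number, so reopening does not clone a bit vector. The trade-off is RAM: 32 bits per document here instead of one bit.

```java
import java.util.concurrent.atomic.AtomicIntegerArray;

/** Hypothetical sketch: per-document delete sequence IDs shared by all snapshots. */
final class SequencedDeletes {
    private static final int NOT_DELETED = 0;
    private final AtomicIntegerArray deletedAtSeq;
    private int currentSeq = 0; // guarded by "this"

    SequencedDeletes(int maxDoc) {
        deletedAtSeq = new AtomicIntegerArray(maxDoc);
    }

    /** Deletes a doc and returns the sequence ID assigned to this operation. */
    synchronized int delete(int doc) {
        int seq = ++currentSeq; // int overflow after 2^31 deletes is not handled here
        // Only the first delete of a doc is recorded; later ones are no-ops.
        deletedAtSeq.compareAndSet(doc, NOT_DELETED, seq);
        return seq;
    }

    synchronized int currentSeq() {
        return currentSeq;
    }

    /** A point-in-time view: nothing is copied, only a sequence ID is captured. */
    Snapshot snapshot() {
        return new Snapshot(currentSeq());
    }

    final class Snapshot {
        private final int seq;

        Snapshot(int seq) {
            this.seq = seq;
        }

        /** Deleted in this view only if the delete happened at or before seq. */
        boolean isDeleted(int doc) {
            int d = deletedAtSeq.get(doc);
            return d != NOT_DELETED && d <= seq;
        }
    }
}
```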
Implementing segment deleted docs via
). Maybe the deletes
impl should be pluggable and apps can pick...
Mike
On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Michael B and I have been discussing the per segment doc writers and
RT patches/branch. A small improvement we can add to trunk from
up to 10X per second), this may not be a good tradeoff (ie they
are willing to spend more time in the reopen if it reduces RAM
footprint). Maybe the deletes impl should be pluggable and apps can
pick...
Mike
On Tue, Jul 20, 2010 at 12:33 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote
who may want to use this option.
Have you tested?
The test would be a basic benchmark of queries against BV vs. an int[]
of deletes?
On Tue, Jul 20, 2010 at 12:17 PM, Michael McCandless
luc...@mikemccandless.com wrote:
On Tue, Jul 20, 2010 at 1:44 PM, Jason Rutherglen
jason.rutherg
long[] is probably safe
Yeah it's safe for most things...
short[]
That could be a much better option for minimizing RAM usage, and then
implement wraparound.
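A sketch of the short[] wraparound suggestion, assuming sequence IDs are compared with serial-number arithmetic in the style of RFC 1982. `ShortSeq` is a hypothetical helper; the comparison is only valid while all live IDs lie within 2^15 - 1 of each other, so a real implementation would need to retire or rewrite old entries before that window is exceeded (not shown).

```java
/** Hypothetical sketch: 16-bit wrapping sequence IDs with serial-number comparison. */
final class ShortSeq {
    private ShortSeq() {}

    /** Next sequence ID; wraps from 32767 to -32768. */
    static short next(short seq) {
        return (short) (seq + 1);
    }

    /** True if a was assigned at or before b, assuming |a - b| < 2^15. */
    static boolean atOrBefore(short a, short b) {
        // The int difference is truncated to 16 bits, so wraparound cancels out.
        return (short) (b - a) >= 0;
    }
}
```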
On Wed, Jul 21, 2010 at 3:12 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Tue, Jul 20, 2010 at 4:21 PM, Jason
Karl,
I believe one may pass an empty BytesRef in, and the values will be
set within the getTerm method.
On Wed, Aug 18, 2010 at 8:18 AM, karl.wri...@nokia.com wrote:
Exactly. getTerms() returns a DocTerms, which has this:
/** The BytesRef argument must not be null; the method
*
Any way to make it skip tests?
On Mon, Oct 29, 2012 at 12:55 PM, Tommaso Teofili
tommaso.teof...@gmail.com wrote:
'ant run-maven-build' should do the trick.
Tommaso
2012/10/29 Jason Rutherglen jason.rutherg...@gmail.com
I have used 'ant generate-maven-artifacts' to generate the Maven
SolrJ commit still has the flush parameter; it should be removed, and
softcommit should be added.
Martijn
On 7 March 2012 07:03, Jason Rutherglen jason.rutherg...@gmail.com wrote:
Are there plans to add the ability to apply functions (eg, sum,
average, distinct, or custom functions) to grouped documents, such
that the document list per group is not returned; instead the result
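For illustration, a sketch of per-group aggregation in place of per-group document lists, assuming a collector calls `collect` once per matching document with its group key, a numeric value, and a field whose distinct values are counted. `GroupAggregator` is hypothetical and not the API of Lucene's grouping module; custom functions would be additional accumulator fields.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Hypothetical sketch: fold documents into per-group aggregates, no doc lists. */
final class GroupAggregator {
    static final class Stats {
        long count;
        double sum;
        final Set<String> distinct = new HashSet<>();

        double average() {
            return count == 0 ? 0.0 : sum / count;
        }
    }

    private final Map<String, Stats> groups = new TreeMap<>(); // sorted by group key

    /** Called once per matching document, e.g. from a search collector. */
    void collect(String groupKey, double value, String distinctField) {
        Stats s = groups.computeIfAbsent(groupKey, k -> new Stats());
        s.count++;
        s.sum += value;
        s.distinct.add(distinctField);
    }

    Map<String, Stats> results() {
        return groups;
    }
}
```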
I had not been faced with the scrofulous horror of Maven
Nice... Is this phrase copyrighted or can I use it extenuously without
paying royalties (eg, open sourced). :)
In other words, I could not agree more.
On Sat, Sep 18, 2010 at 1:19 AM, Lance Norskog goks...@gmail.com wrote:
+1 on the
the maven stuff in 3.x/trunk is actually pretty good
I've heard that about every release of Maven, and any time I've tried
to use it, it doesn't quite work as expected, and given that what it does
should be fairly trivial, the fact that there are bugs/issues, and it's
been released to me has meant I
I'm a little bit confused about the difference between the Lucene/Solr
3.x branch and Lucene/Solr trunk. Is there a page on the Lucene
Apache site yet for Lucene/Solr 3.x, linked to from the main page?
I'm curious why the flush call in IW getReader isn't synced? The main
work of flush is synced, ie, the doFlush method. Then we're syncing
yet again to call applyDeletes, redundantly because deletes were
previously flushed in the flush call. I'm guessing we're trying to
gain some concurrency
it may be because CMS
waits, if there are too many merges already running, and we don't want
it to wait holding IW's monitor lock.
Maybe try making it sync'd and see if any tests deadlock?
Mike
On Thu, Nov 4, 2010 at 1:54 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I'm curious
I asked this a little while ago, and figured I'd ask again. It seemed
like the important remaining issue was the bulk postings iterator? Is
that still true? Thanks!
Will the bulk postings only help PFOR?
On Sun, Oct 2, 2011 at 2:28 PM, Uwe Schindler u...@thetaphi.de wrote:
...And flexible stored fields + TV, so the file format is completely flexible.
Uwe
--
Uwe Schindler
H.-H.-Meier-Allee 63, 28213 Bremen
http://www.thetaphi.de
Jason Rutherglen
PagedBytes is great! Even better would be a couple of additional
methods, one to write it out to an IndexOutput and the other for the
total bytes used.
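A sketch of the two requested extras on a simplified, hypothetical `SimplePagedBytes` class; this is not Lucene's PagedBytes, and `java.io.OutputStream` stands in for Lucene's IndexOutput. It separates bytes actually used from bytes allocated, since the last page is usually only partly filled.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: append-only paged bytes with size accounting and write-out. */
final class SimplePagedBytes {
    private final int pageSize;
    private final List<byte[]> pages = new ArrayList<>();
    private int upto; // bytes used in the last page
    private long total;

    SimplePagedBytes(int pageBits) {
        this.pageSize = 1 << pageBits;
        this.upto = pageSize; // forces a page allocation on the first append
    }

    void append(byte[] src, int off, int len) {
        while (len > 0) {
            if (upto == pageSize) {
                pages.add(new byte[pageSize]);
                upto = 0;
            }
            int n = Math.min(len, pageSize - upto);
            System.arraycopy(src, off, pages.get(pages.size() - 1), upto, n);
            upto += n;
            off += n;
            len -= n;
            total += n;
        }
    }

    /** Bytes actually appended, excluding unused capacity in the last page. */
    long totalBytes() { return total; }

    /** Bytes reserved by allocated pages. */
    long allocatedBytes() { return (long) pages.size() * pageSize; }

    /** Streams only the used bytes, page by page. */
    void writeTo(OutputStream out) throws IOException {
        for (int i = 0; i < pages.size(); i++) {
            int len = (i == pages.size() - 1) ? upto : pageSize;
            out.write(pages.get(i), 0, len);
        }
    }

    byte[] toByteArray() {
        ByteArrayOutputStream bos = new ByteArrayOutputStream((int) total);
        try {
            writeTo(bos);
        } catch (IOException e) { // ByteArrayOutputStream does not actually throw
            throw new UncheckedIOException(e);
        }
        return bos.toByteArray();
    }
}
```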
I try not to without having a patch somewhat prepared!
On Thu, Oct 6, 2011 at 11:38 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
why don't you open an issue for this?
thanks,
simon
On Thu, Oct 6, 2011 at 5:33 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
PagedBytes
...benchmarks show it as being 2x more CPU-efficient than the
equivalent pure-Java implementation...
https://issues.apache.org/jira/browse/HADOOP-7761
SOLR-1447 added this functionality.
On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom tburt...@umich.edu wrote:
Lucene has this method to set the maximum size of a segment when merging:
LogByteSizeMergePolicy.setMaxMergeMB
-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Tuesday, December 07, 2010 10:48 AM
To: dev@lucene.apache.org
Subject: Re: Is it possible to set the merge policy setMaxMergeMB from Solr
SOLR-1447 added this functionality.
On Mon, Dec 6, 2010 at 2:34 PM, Burton-West, Tom
Probably best to add something here as it currently has nothing
regarding merge policies and has a long-standing TODO on the
indexDefaults. http://wiki.apache.org/solr/SolrConfigXml
On Fri, Dec 17, 2010 at 7:22 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
I worked on the patch and I
Is it intentional that SegmentInfo.segmentCodecs isn't cloned? When
SI is cloned, then sizeInBytes fails with an NPE.
Sorry, that's incorrect, SegmentInfo.files is NPE'ing on segmentCodecs
because it's never set (in trunk).
On Wed, Jan 12, 2011 at 10:59 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Is it intentional that SegmentInfo.segmentCodecs isn't cloned? When
SI is cloned, then sizeInBytes
it is set on DocumentsWriter#flush though
Thanks! I just skip segmentCodecs if it's null, for now.
On Wed, Jan 12, 2011 at 11:05 AM, Simon Willnauer
simon.willna...@googlemail.com wrote:
On Wed, Jan 12, 2011 at 8:03 PM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Sorry, that's
But: they don't yet support updating the values (the goal is to allow
this, eventually). This is just the first step.
No? Hmm... I thought that was a main part of the functionality?
On Sun, Jan 16, 2011 at 6:07 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Actually docvalues is
be a stacked approach, where the orig full array remains
and we write sparse deltas (pairs of docID + new value)
What is the lookup cost using this method?
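A sketch of the stacked approach, with a hypothetical `StackedValues` class: a dense base array plus sparse update layers, each a sorted docID array with parallel values. A lookup probes layers newest-first with a binary search and falls back to the base array, so it costs O(L log d) for L layers of at most d updates, compared with O(1) for a plain array; folding layers back into the base would keep L small (not shown).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: base values plus stacked sparse (docID, value) deltas. */
final class StackedValues {
    private final long[] base;
    private final List<int[]> layerDocs = new ArrayList<>();
    private final List<long[]> layerValues = new ArrayList<>();

    StackedValues(long[] base) {
        this.base = base;
    }

    /** docIds must be sorted ascending and unique within one layer. */
    void pushLayer(int[] docIds, long[] values) {
        layerDocs.add(docIds);
        layerValues.add(values);
    }

    long get(int doc) {
        for (int i = layerDocs.size() - 1; i >= 0; i--) { // newest layer wins
            int idx = Arrays.binarySearch(layerDocs.get(i), doc);
            if (idx >= 0) {
                return layerValues.get(i)[idx];
            }
        }
        return base[doc];
    }
}
```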
On Mon, Jan 17, 2011 at 3:24 AM, Michael McCandless
luc...@mikemccandless.com wrote:
On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
...@mikemccandless.com wrote:
On Sun, Jan 16, 2011 at 11:35 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
But: they don't yet support updating the values (the goal is to allow
this, eventually). This is just the first step.
No? Hmm... I thought that was a main part of the functionality
Can we use LUCENE-2792's FST for the Solr autosuggest functionality?
This link seems to not work:
https://builds.apache.org/job/Lucene-Solr-Maven-trunk/lastSuccessfulBuild/artifact/maven_artifacts
/NightlyBuilds,
dev-tools/maven/README.maven in your local working copy, and
https://issues.apache.org/jira/browse/LUCENE-3825.
Steve
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Tuesday, March 20, 2012 10:46 AM
To: dev@lucene.apache.org
Subject
I am getting the following error when running 'ant generate-maven-artifacts':
Buildfile: /Users/jasonrutherglen/src/LUCENE-TRUNK/build.xml
generate-maven-artifacts:
filter-pom-templates:
[copy] Copying 42 files to
/Users/jasonrutherglen/src/LUCENE-TRUNK/lucene/build/poms
-Original Message-
From: Jason Rutherglen [mailto:jason.rutherg...@gmail.com]
Sent: Wednesday, April 04, 2012 3:58 PM
To: dev@lucene.apache.org
Subject: Error when running 'ant generate-maven-artifacts'
I am getting
Thanks Mike.
On Thu, Apr 5, 2012 at 11:55 AM, Michael McCandless
luc...@mikemccandless.com wrote:
Yes, in trunk: FSDirectory.setMaxMergeWriteMBPerSec.
Mike McCandless
http://blog.mikemccandless.com
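The arithmetic behind a max-merge-write-MB/sec setting can be sketched as follows. `WriteRateLimiter` is hypothetical and not how FSDirectory.setMaxMergeWriteMBPerSec is implemented; it only shows how reported bytes translate into a pause so the average write rate stays at or below the target. Times are passed in explicitly to keep it deterministic.

```java
/** Hypothetical sketch: converts bytes written into a required pause. */
final class WriteRateLimiter {
    private final double bytesPerSec;
    private long nextFreeNanos; // time by which all reported bytes fit the budget

    WriteRateLimiter(double mbPerSec, long startNanos) {
        this.bytesPerSec = mbPerSec * 1024 * 1024;
        this.nextFreeNanos = startNanos;
    }

    /** Returns how many nanoseconds the caller should pause after writing bytes. */
    long pauseNanos(long bytes, long nowNanos) {
        long costNanos = (long) (bytes * 1_000_000_000.0 / bytesPerSec);
        // Idle time is not banked: if the writer was idle, the budget restarts at now.
        nextFreeNanos = Math.max(nextFreeNanos, nowNanos) + costNanos;
        return Math.max(0L, nextFreeNanos - nowNanos);
    }
}
```

A merge thread would call `pauseNanos(bytesJustWritten, System.nanoTime())` after each buffer flush and sleep for the returned duration.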
On Thu, Apr 5, 2012 at 11:54 AM, Jason Rutherglen
jason.rutherg...@gmail.com wrote:
Has
From: Doug Cutting [EMAIL PROTECTED]
To: solr-dev@lucene.apache.org
Sent: Wednesday, April 26, 2006 11:27:44 AM
Subject: Re: GData, updateable IndexSearcher
jason rutherglen wrote:
Interesting, does this mean there is a plan for incrementally updateable
IndexSearchers to become part of Lucene
Subject: Re: GData, updateable IndexSearcher
jason rutherglen wrote:
I was thinking you implied that you knew of someone who had customized their
own, but it was a closed source solution. And if so then you would know how
that project fared.
I don't recall the details, but I know folks
Can you post your code?
- Original Message
From: Robert Engels [EMAIL PROTECTED]
To: java-dev@lucene.apache.org; jason rutherglen [EMAIL PROTECTED]
Sent: Monday, May 1, 2006 11:33:06 AM
Subject: RE: GData, updateable IndexSearcher
fyi, using my reopen() implementation (which rereads
Thanks for the code and performance metric Robert. Have you had any issues
with the deleted segments as Doug has been describing?
- Original Message
From: Robert Engels [EMAIL PROTECTED]
To: java-dev@lucene.apache.org; jason rutherglen [EMAIL PROTECTED]
Sent: Monday, May 1, 2006 11:49
Yonik,
It might be interesting to merge using BDB into Solr, as an option to provide
better realtime updates. Perhaps the replication could be used as well in
place of rsync? I don't have any experience with BDB replication, anyone have
thoughts on the matter?
Jason
- Original Message
Is it possible to turn off directory locking with BDB? How is the performance
compared to regular FSDirectory for queries?
- Original Message
From: Andi Vajda [EMAIL PROTECTED]
To: java-dev@lucene.apache.org; jason rutherglen [EMAIL PROTECTED]
Sent: Friday, June 2, 2006 10:52:27 AM
What about using this http://issues.apache.org/jira/browse/LUCENE-528 to solve
this http://issues.apache.org/jira/browse/LUCENE-565 Where the batching is
performed in another index that is then merged into the existing one. This is
something I have been looking for. Is 528 ok to use?
There was this discussion regarding adding a reopen method to IndexReader
however it seems to have dropped off the map. Robert Engels submitted some
code however it was not a patch.
http://www.gossamer-threads.com/lists/lucene/java-dev/34898?search_string=reopen
I would submit something but
Yes I am including this patch as it is very useful for increasing the
efficiency of updates as you described. I will be conducting more tests and
will post any results. Yes a patch for IndexWriter will be useful so that the
entirety of this build will work. Thanks!
- Original Message
The documents reached disk as a close was performed on the NewIndexModifier, and
the index size grows; it seems like the deletable files register the documents as
deleted, though, so a search returns nothing and an optimize deletes all of the
documents. Maybe the new documents have the same docid
Sounds interesting Marvin, I would be willing to test out what you create. I
am working on creating a rapidly updating index and it sounds like this
may help with that. I've noticed that even using a ramdisk, the whole merging
process is quite slow. Maybe also because of the locking that
Robert Engels,
I implemented the reopen code you posted, works well, thanks. One thing I am
curious about, are you able to reuse the FieldCache? From what I am seeing, it
is being rebuilt after a commit, which makes the next query slow. Any ideas on
this?
Thanks,
Jason
Project Ocean is designed to provide realtime search capabilities. It is
pre-alpha with an Apache license. Developers and corporate sponsors
welcome.
The genesis of the idea was at a social networking company using Solr who
wanted realtime instead of batch updates. At first I tried making Solr
this as a Lucene and/or Solr contrib or patch, whichever
is appropriate?
Otherwise, this risks being a fork that doesn't get enough
users/developers and eventually dies off.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jason Rutherglen
In reference to LUCENE-1292, am thinking the structure of the tag index term
dictionary file can reuse Lucene .tii and .tis code also storing a number
per term, and use
http://dsiutils.dsi.unimi.it/docs/it/unimi/dsi/io/InputBitStream.html to
store the docs associated with a term between the terms.
Actually, will use MultiLevelSkipListReader for termdocs.
On Thu, May 22, 2008 at 11:05 PM, Jason Rutherglen
[EMAIL PROTECTED] wrote:
In reference to LUCENE-1292, am thinking the structure of the tag index
term dictionary file can reuse Lucene .tii and .tis code also storing a
number per
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Jason Rutherglen [EMAIL PROTECTED]
To: java-dev@lucene.apache.org
Sent: Thursday, May 22, 2008 11:05:35 PM
Subject: Tag Index
In reference to LUCENE-1292, am thinking the structure
I have seen this discussed before but with no conclusion. It is safe to say
that SegmentReader.isDeleted is a synchronization bottleneck. When using a
single IndexReader per query in a highly concurrent application such as a web
application, with the index entirely in the system cache, the