> Short answer is that no, there isn't an aggregate
> function. And you shouldn't even try
If that is the case, why does a 'stats' component exist for Solr with
the SUM function built in?
http://wiki.apache.org/solr/StatsComponent
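For reference, a minimal SolrJ sketch of invoking the stats component; the server URL and the 'price' field are illustrative, and this is equivalent to requesting /select?q=*:*&rows=0&stats=true&stats.field=price:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class StatsSumExample {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
          SolrQuery query = new SolrQuery("*:*");
          query.setRows(0);                  // only the stats are needed, not the docs
          query.set("stats", true);          // enable the StatsComponent
          query.set("stats.field", "price"); // field to compute sum/min/max over
          QueryResponse rsp = server.query(query);
          System.out.println("sum: " + rsp.getFieldStatsInfo().get("price").getSum());
      }
  }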
On Thu, Jan 5, 2012 at 1:37 PM, Erick Erickson wrote:
> You will
> SUM, stats would do it.
>
> Erick
>
> On Thu, Jan 5, 2012 at 7:23 PM, Jason Rutherglen
> wrote:
>>> Short answer is that no, there isn't an aggregate
>>> function. And you shouldn't even try
>>
>> If that is the case why does a 'st
If you want the index to be stored completely in RAM, there is the
ByteBuffer directory [1], though I do not see the point of putting an
index in RAM: it will be cached in RAM by the OS IO cache regardless.
1.
https://github.com/elasticsearch/elasticsearch/blob/master/src/main/java/org/ap
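For comparison, the stock Lucene equivalent is RAMDirectory, which copies an on-disk index into the Java heap; a minimal sketch assuming the 3.x API (the path is illustrative):

  import java.io.File;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.store.FSDirectory;
  import org.apache.lucene.store.RAMDirectory;

  public class RamIndexExample {
      public static void main(String[] args) throws Exception {
          // Copies the index into heap-resident buffers. Note this uses
          // Java heap, whereas the OS IO cache holds the same data for free.
          FSDirectory diskDir = FSDirectory.open(new File("/path/to/index"));
          RAMDirectory ramDir = new RAMDirectory(diskDir);
          IndexReader reader = IndexReader.open(ramDir);
          System.out.println("docs: " + reader.maxDoc());
          reader.close();
      }
  }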
t. Is that right?
>
> What about the ByteBufferDirectory? Can this specific directory utilize the
> 2GB memory I grant to the app?
>
> On Mon, Jun 4, 2012 at 10:58 PM, Jason Rutherglen <
> jason.rutherg...@gmail.com> wrote:
>
>> If you want the index to be stored
Cloud
* Hadoop integration
Thanks,
Jason Rutherglen, Jack Krupansky, and Ryan Tabora
http://shop.oreilly.com/product/0636920028765.do
This is more of a unix-related question than a Lucene-specific one;
however, because Lucene is being used, I'm asking here, as perhaps
other people have run into a similar issue.
On Amazon EC2, merge, read, and write operations are possibly
blocking due to underlying IO. Is there a tool that you have
use
Grant,
I can probably do the 3 billion document one from Prague, or a
realtime search one... I spaced on submitting for ApacheCon.
Are there cool places in the Carolinas to hang?
Cheers bro,
Jason
On Tue, Jun 22, 2010 at 10:51 AM, Grant Ingersoll
wrote:
> Lucene Revolution Call For Particip
Let's say the segment infos file is missing, and I'm aware of
CheckIndex, however is there a tool to recreate a segment infos file?
egment is given the same name as the first segment that
> shares it. However, unfortunately, because of merging, it's possible
> that this mapping is not easy (maybe not possible, depending on the
> merge policy...) to reconstruct. I think this'll be the hardest part
> :)
>
&
In a word, no. You'd need to customize the Lucene source to accomplish this.
On Wed, Nov 10, 2010 at 1:02 PM, Burton-West, Tom wrote:
> Hello all,
>
> We have an extremely large number of terms in our indexes. I want to be able
> to extract a sample of the terms, say something like every 128th
Yeah that's customizing the Lucene source. :) I should have gone into
more detail, I will next time.
On Wed, Nov 10, 2010 at 2:10 PM, Michael McCandless
wrote:
> Actually, the .tii file pre-flex (3.x) is nearly identical to the .tis
> file, just that it only contains every 128th term.
>
> If you
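For what it's worth, a minimal sketch of that kind of sampling with the pre-flex (3.x) TermEnum API; the 128 stride mirrors the .tii index interval mentioned above, and the path argument is illustrative:

  import java.io.File;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.Term;
  import org.apache.lucene.index.TermEnum;
  import org.apache.lucene.store.FSDirectory;

  public class TermSampler {
      public static void main(String[] args) throws Exception {
          IndexReader reader = IndexReader.open(FSDirectory.open(new File(args[0])));
          TermEnum terms = reader.terms(); // positioned before the first term
          int i = 0;
          while (terms.next()) {
              if (i++ % 128 == 0) {        // emit every 128th term
                  Term t = terms.term();
                  System.out.println(t.field() + ":" + t.text());
              }
          }
          terms.close();
          reader.close();
      }
  }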
I'm curious if there's a new way (using flex or term states) to store
IDs alongside a document and retrieve the IDs of the top N results?
The goal would be to minimize HD seeks, and not use field caches
(because they consume too much heap space) or the doc stores (which
require two seeks). One pos
s branch)
>
> -Yonik
> http://lucidimagination.com
>
>
> On Wed, Feb 2, 2011 at 1:03 PM, Jason Rutherglen wrote:
>
>> I'm curious if there's a new way (using flex or term states) to store
>> IDs alongside a document and retrieve the IDs of the top N resul
> there is a entire RAM resident part and a Iterator API that reads /
> streams data directly from disk.
> look at DocValuesEnum vs, Source
Nice, thanks!
On Thu, Feb 3, 2011 at 12:20 AM, Simon Willnauer
wrote:
> On Thu, Feb 3, 2011 at 3:23 AM, Jason Rutherglen
> wrote:
>>
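For context, here is roughly what this looks like with the per-document values API as it later stabilized in Lucene 4.x; the 'uid' field name is illustrative:

  import org.apache.lucene.analysis.core.WhitespaceAnalyzer;
  import org.apache.lucene.document.Document;
  import org.apache.lucene.document.NumericDocValuesField;
  import org.apache.lucene.index.AtomicReader;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.index.IndexWriterConfig;
  import org.apache.lucene.index.NumericDocValues;
  import org.apache.lucene.store.RAMDirectory;
  import org.apache.lucene.util.Version;

  public class DocValuesUidExample {
      public static void main(String[] args) throws Exception {
          RAMDirectory dir = new RAMDirectory();
          IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
                  Version.LUCENE_45, new WhitespaceAnalyzer(Version.LUCENE_45)));
          Document doc = new Document();
          doc.add(new NumericDocValuesField("uid", 12345L)); // column-stride ID
          writer.addDocument(doc);
          writer.close();

          // Read the ID back without the field cache's heap cost or the
          // doc stores' extra seek; values are resolved per segment.
          DirectoryReader reader = DirectoryReader.open(dir);
          AtomicReader leaf = reader.leaves().get(0).reader();
          NumericDocValues uids = leaf.getNumericDocValues("uid");
          System.out.println(uids.get(0)); // -> 12345
          reader.close();
      }
  }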
This could be a rhetorical question. The way to find the last/max
term that is unique per document is to use TermsEnum to seek to the
first term of a field, then seek to docFreq-1 for the last ord and
get the term. Or is there a better/faster way?
that supports ord (eg FixedGap).
>
> Mike
>
> On Fri, Feb 18, 2011 at 9:24 PM, Jason Rutherglen
> wrote:
>> This could be a rhetorical question. The way to find the last/max
>> term that is a unique per document is to use TermsEnum to seek to the
>> first term of a
rd. How
would I seek to the last term in the index using VarGaps? Or do I
need to interact directly with the FST class (and if so I'm not sure
what to do there either).
Thanks Mike.
On Sun, Feb 20, 2011 at 2:51 PM, Michael McCandless
wrote:
> On Sat, Feb 19, 2011 at 8:42 AM, Jason Rutherg
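Tying the thread together, a minimal sketch of the ord-based seek with the 4.x API, valid only for a terms dictionary that supports ords (e.g. FixedGap); note that ords are segment-local:

  import org.apache.lucene.index.AtomicReader;
  import org.apache.lucene.index.DirectoryReader;
  import org.apache.lucene.index.Terms;
  import org.apache.lucene.index.TermsEnum;
  import org.apache.lucene.util.BytesRef;

  // Returns the last (max) term of a field within one segment.
  // seekExact(ord) throws on codecs whose terms dict lacks ord support.
  static BytesRef lastTerm(DirectoryReader reader, String field) throws Exception {
      AtomicReader leaf = reader.leaves().get(0).reader();
      Terms terms = leaf.terms(field);
      TermsEnum te = terms.iterator(null);
      te.seekExact(terms.size() - 1); // the last ord is size-1
      return BytesRef.deepCopyOf(te.term());
  }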
ordered IDs stored in the index, so that
remaining documents (that, let's say, were left in RAM prior to process
termination) can be indexed. It's an inferred transaction checkpoint.
On Mon, Feb 21, 2011 at 5:31 AM, Michael McCandless
wrote:
> On Sun, Feb 20, 2011 at 8:47 PM, Jason Rutherglen
&
ConcurrentMergeScheduler is tied to a specific IndexWriter; however,
in an environment with multiple writers (such as Solr's multiple cores,
and other similar scenarios) we'd have a CMS per IW. I think this
effectively disables CMS's max-thread merge throttling feature?
I'm seeing an error when using the misc Append codec.
java.lang.AssertionError
at
org.apache.lucene.store.ByteArrayDataInput.readBytes(ByteArrayDataInput.java:107)
at
org.apache.lucene.index.codecs.BlockTermsReader$FieldReader$SegmentTermsEnum._next(BlockTermsReader.java:661)
at
org.apache.luce
I think Solr has a HashDocSet implementation?
On Tue, Apr 5, 2011 at 3:19 AM, Michael McCandless
wrote:
> Can we simply factor out (poach!) those useful-sounding classes from
> Nutch into Lucene?
>
> Mike
>
> http://blog.mikemccandless.com
>
> On Tue, Apr 5, 2011 at 2:24 AM, Antony Bowesman
> w
Is http://code.google.com/a/apache-extras.org/p/luceneutil/ designed
to replace or augment the contrib benchmark? For example it looks
like SearchPerfTest would be useful for executing queries over a
pre-built index. Though there's no indexing tool in the code tree?
> I don't think we'd do the post-filtering solution, but instead maybe
> resolve the deletes "live" and store them in a transactional data
I think Michael B. aptly described the sequence ID approach for 'live' deletes?
On Mon, Jun 13, 2011 at 3:00 PM, Michael McCandless
wrote:
> Yes, adding dele
> deletions made by readers merely mark it for
> deletion, and once a doc has been marked for deletions it is deleted for all
> intents and purposes, right?
There's the point-in-timeness of a reader to consider.
> Does the N in NRT represent only the cost of reopening a searcher?
Aptly put, and
> even high complexity as ES supports lucene-like query nesting via JSON
That sounds interesting. Where is it described in the ES docs? Thanks.
On Wed, Nov 16, 2011 at 1:36 PM, Peter Karich wrote:
> Hi,
>
> its not really fair to compare NRT of Solr to ElasticSearch.
> ElasticSearch provides
The docs are slim on examples.
On Wed, Nov 16, 2011 at 3:35 PM, Peter Karich wrote:
>
>>> even high complexity as ES supports lucene-like query nesting via JSON
>> That sounds interesting. Where is it described in the ES docs? Thanks.
>
> "Think of the Query DSL as an AST of queries"
> http://w
Even though the NumericRangeQuery.new* methods do not support
BigInteger, the underlying recursive algorithm supports numbers of
any size.
Has this been explored?
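For reference, a sketch of the existing factories, which top out at long precision (field name and bounds are illustrative); there is no newBigIntegerRange today, which is the gap being raised:

  import org.apache.lucene.search.NumericRangeQuery;

  // Existing factories cover int/long/float/double only.
  NumericRangeQuery<Long> q = NumericRangeQuery.newLongRange(
          "timestamp",   // field indexed as a NumericField
          0L, 1000000L,  // min, max
          true, true);   // both endpoints inclusive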
te:
>
> Yes, I think you pinpointed what I see over and over with Solr. The two
> desires pull in opposite directions. I think Jason Rutherglen is very keen
> to start talking about Lucene clusters and index replication in such clusters
> without using the classic master/slave appr
Hello all,
I don't mean this to sound like a solicitation. I've been working on
realtime search and created some Lucene patches etc. I am wondering
if there are social networks (or anyone else) out there who would be
interested in collaborating with Apache on realtime search to get it
to the poi
for social networks interested in realtime
search to get involved, as it may be something that is difficult for
one company to have enough resources to implement to a production
level. I think this is where open source collaboration is
particularly useful.
Cheers,
Jason Rutherglen
[EMAIL PROTECTED]
On W
ections. and before a
> indexwrite/delete i would sync the cache with index.
>
> I am waiting for lucene 2.4 to proceed. (query by delete)
>
> Best.
>
> On Wed, Sep 3, 2008 at 10:20 PM, Jason Rutherglen <
> [EMAIL PROTECTED]> wrote:
>
>> Hello all,
>>
>&g
In Ocean I had to use a transaction log and execute everything that
way, like SQL database replication, then let each node handle its own
merging process. Syncing the indexes is used to get a new node up to
speed, otherwise it's avoided for the reasons mentioned in the
previous email.
On Fri, Se
Hi Jang,
I've been working on Tag Index to address this issue. It seems like a
popular feature and I have not had time to fully implement it yet.
See http://issues.apache.org/jira/browse/LUCENE-1292. To be technical, it
handles UN_TOKENIZED fields (did this name change now?) and some
specialized thing
Hi Jang,
Yes, and I have not completed it either... Perhaps when I do you can use it.
Best regards,
Jason
On Tue, Sep 9, 2008 at 9:20 PM, 장용석 <[EMAIL PROTECTED]> wrote:
> Thanks for your helps.
> I have about 40 documents in my index and it is constant update (price
> or name.. etc).
> I wil
Yes, Tag Index will work. I have not had time to complete it; however,
if you are interested in working on it, please feel free to contact me.
On Fri, Sep 12, 2008 at 3:48 PM, Mark Miller <[EMAIL PROTECTED]> wrote:
> You might check out the tagindex issue in jira as well. Havn't looked at it
> myself
It would be good to allow users to use their own Filter subclasses in
SOLR. This will help with RMI-based implementations that use SOLR,
and will allow all of the open source Filter work to be used in SOLR,
without needing to recreate it with DocSets.
2008/9/14 Gerardo Segura <[EMAIL PROTECTED]>:
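To illustrate, with the DocIdSet API a custom Filter is just a bitset producer; a minimal sketch (the even-docid condition is a placeholder):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.DocIdSet;
  import org.apache.lucene.search.Filter;
  import org.apache.lucene.util.OpenBitSet;

  // OpenBitSet extends DocIdSet, so it can be returned directly.
  public class EvenDocsFilter extends Filter {
      @Override
      public DocIdSet getDocIdSet(IndexReader reader) throws IOException {
          OpenBitSet bits = new OpenBitSet(reader.maxDoc());
          for (int doc = 0; doc < reader.maxDoc(); doc++) {
              if (doc % 2 == 0) { // placeholder match condition
                  bits.set(doc);
              }
          }
          return bits;
      }
  }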
I am updating it to work with trunk.
On Mon, Sep 15, 2008 at 2:11 PM, Otis Gospodnetic
<[EMAIL PROTECTED]> wrote:
> Yes, probably out of sync with the 2.3.2 code. Have you tried applying it to
> the trunk?
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> - Or
Hi Wojciech,
The code isn't ready; it is a major project, and I am trying to also
complete the realtime indexing patches and look for a job. I believe
that the tag indexing stuff is of interest to many people so if there
is someone who can pay to get it completed feel free to contact me as
I am av
Hi Wojciech,
Integration with SOLR would be ideal. However, that would
take more time. It depends on the exact features. There is at least
one patch to IndexWriter. The merging is the part that needs to be
synchronized and this is where I am hesitant because Ocean/realtime
search performs merge
It would be a good feature in Lucene to be able to sort, or perhaps
store, the postings in term-frequency order. Thoughts?
On Wed, Sep 17, 2008 at 9:33 AM, Andrzej Bialecki <[EMAIL PROTECTED]> wrote:
> Renaud Delbru wrote:
>>
>> Hi all,
>>
>> I am wondering if Lucene implements the query op
Mike,
As part of my goal of trying to use Lucene as a primary storage
mechanism (perhaps not the best idea), what do you think is the best
way to handle storing data in Lucene and preventing corrupted data the
way something like an SQL database handles corrupted data? Or is
there simply no good way
2008 at 12:13 PM, Michael McCandless
<[EMAIL PROTECTED]> wrote:
>
> Corrupted data in what sense?
>
> EG if you don't trust your IO system to store data properly?
>
> Mike
>
> Jason Rutherglen wrote:
>
>> Mike,
>>
>> As part of my goal of tryi
What is that?
On Mon, Sep 29, 2008 at 8:51 AM, Cam Bazz <[EMAIL PROTECTED]> wrote:
> Has anyone tried to implement a triplet store with lucene?
>
> Best,
> -C.B.
>
Seeing the following issue between Lucene 2.3 and 2.4. A 2.3 serialized Term
object cannot be deserialized by 2.4. I would guess it has something to do
with a different Java compiler being used for the Lucene 2.4 build,
as serialVersionUID is not defined in the Term class. Fixing the issue is
crit
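The conventional fix is to pin the ID explicitly; a hypothetical sketch of such a patch to Term:

  // Declaring serialVersionUID keeps serialized forms compatible across
  // compilers/releases instead of relying on the JVM's computed default.
  public final class Term implements java.io.Serializable {
      private static final long serialVersionUID = 1L; // pins the wire format
      // ... existing fields and methods unchanged ...
  }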
It would be nice to have a pluggable solution for deleted docs in IndexReader
that accepts a Filter, and have BitVector implement Filter. This way I do
not have to implement IndexReader.clone.
On Mon, Dec 1, 2008 at 5:04 PM, Michael McCandless <
[EMAIL PROTECTED]> wrote:
>
> So in your UI, you'd
ssign one ourselves, and then we have to remember to change
> it if we ever make a big enough change to Term, to allow serialize in
> one version of Lucene & deserialize in another.
>
> Mike
>
>
> Jason Rutherglen wrote:
>
> Seeing the following issue between Lucene 2.
<
[EMAIL PROTECTED]> wrote:
>
> Jason Rutherglen wrote:
>
> if you don't set serialVersionUID yourself, then java assigns a
>>>
>> rather volatile one for you
>>
>> True however the Java specification defines how the serialVersionUID
>> shoul
I prefer Externalizable as well, as it makes serialization faster. Perhaps
also for Query and its subclasses to start? I had code to do this for
Analyzer as well which could be useful, perhaps a different patch though.
On Tue, Dec 2, 2008 at 2:22 AM, Michael McCandless <
[EMAIL PROTECTED]> wrote
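A minimal sketch of the Externalizable pattern being suggested; the class and fields here are illustrative, not the actual patch:

  import java.io.Externalizable;
  import java.io.IOException;
  import java.io.ObjectInput;
  import java.io.ObjectOutput;

  // Externalizable skips reflective field discovery, so it is typically
  // faster than default serialization and fixes the wire format explicitly.
  public class TermDTO implements Externalizable {
      private String field;
      private String text;

      public TermDTO() {} // a public no-arg constructor is required

      @Override
      public void writeExternal(ObjectOutput out) throws IOException {
          out.writeUTF(field);
          out.writeUTF(text);
      }

      @Override
      public void readExternal(ObjectInput in) throws IOException {
          field = in.readUTF();
          text = in.readUTF();
      }
  }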
The field cache is completely reloaded. LUCENE-831 solves this by merging
the field caches of the segments. For realtime search systems, merging the
field caches is not desirable though.
On Thu, Dec 4, 2008 at 6:45 PM, John Wang <[EMAIL PROTECTED]> wrote:
> Glad to be of help.
> Understand that
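A rough sketch of the per-segment idea behind LUCENE-831, in terms of the 2.9-era API it led to: key the cache on each segment reader so a reopen only loads arrays for new segments ('price' is an illustrative field):

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.FieldCache;

  // Cache entries are keyed per sub-reader, so unchanged segments
  // hit the existing arrays after a reopen.
  static void warmPerSegment(IndexReader reader) throws IOException {
      for (IndexReader segment : reader.getSequentialSubReaders()) {
          int[] values = FieldCache.DEFAULT.getInts(segment, "price");
          // docids into 'values' are segment-relative
      }
  }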
Hi M.S.,
Do you think it would be cool to have some faceting built into Lucene at
some point?
-J
On Tue, Dec 9, 2008 at 10:11 PM, Michael Stoppelman <[EMAIL PROTECTED]> wrote:
> Yeah looks similar to what we've implemented for ourselves (although I
> haven't looked at the implementation). We've
Hello,
I'm interested in getting FastSSFuzzy into Lucene, perhaps as a contrib
module. One question is how much the index would grow. We've got a list of
people's names we want to do spellchecking on for example.
-J
I downloaded trunk via SVN. Went to trunk/contrib/benchmark. Executed ant
enwiki. I'm not sure what else needs to be done. Received this error:
enwiki:
[echo] Working Directory:
/Users/jrutherg/dev/lucenetrunk/trunk/contrib/benchmark/work
[java] Running algorithm from:
/Users/jrutherg
1, 2009 at 5:03 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
>
> You should download Wikipedia's XML file manually yourself, uncompress it,
> and then edit docs.file in that alg to point to it.
>
> Mike
>
>
> Jason Rutherglen wrote:
>
>
pendix.
>>
>> But you are close... oh, it actually looks like the output file
>> (work/enwiki.txt) could not be written. Does that directory (../work)
>> exist? (I think build.xml should have created it).
>>
>> Mike
>>
>> Jason Rutherglen
Google uses dedicated highlighting servers. Maybe this architecture would
work for you.
On Mon, Feb 2, 2009 at 11:24 PM, Michael Stoppelman wrote:
> Hi all,
>
> My search backends are only able to eek out 13-15 qps even with the entire
> index in memory (this makes it very expensive to scale). A
http://en.wikipedia.org/wiki/Google_platform
Document server summarization
On Thu, Feb 5, 2009 at 12:57 PM, Michael Stoppelman wrote:
> On Thu, Feb 5, 2009 at 12:47 PM, Michael Stoppelman wrote:
>
> >
> >
> > On Thu, Feb 5, 2009 at 9:05 AM, Jason Rutherglen <
While indexing using
contrib/org.apache.lucene.benchmark.byTask.feeds.EnwikiDocMaker, an
assertion error is thrown from TermsHashPerField.comparePostings(RawPostingList p1,
RawPostingList p2). A Payload is added to the document representing a UID.
Only 1-2 out of 1 million documents indexed generate th
I'm overriding MergePolicy, which is public; however, SegmentInfos is
package-protected, which means the MergePolicy subclass must be in the
org.apache.lucene.index package. Can we make SegmentInfos public?
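Until then, the usual workaround is to declare the subclass inside Lucene's own package; a sketch against the 2.4-era API (the class name is hypothetical):

  // Living in this package grants access to package-protected types
  // such as SegmentInfos.
  package org.apache.lucene.index;

  public class MySegmentAwareMergePolicy extends LogByteSizeMergePolicy {
      // ... overrides may now reference SegmentInfos directly ...
  }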
> > H.
> >
> > Jason is this easily/compactly repeated? EG, try to index the N docs
> > before that one.
> >
> > If you remove the SinglePayloadTokenStream field, does the exception
> > still happen?
> >
> > Mike
> >
> > Jas
12:25 PM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> H.
>
> Jason is this easily/compactly repeated? EG, try to index the N docs
> before that one.
>
> If you remove the SinglePayloadTokenStream field, does the exception
> still happen?
>
> Mike
> It looks like you are reusing a Field (the f.setValue(...) calls); are
> you sure you're not changing a Document/Field while another thread is
> adding it to the index?
>
> If you can post the full code, then I can try to run it on my
> wikipedia dump locally.
>
> Mi
LuceneError, when executed, should reproduce the failure. The
contrib/benchmark libraries are required. MultiThreadDocAdd is a
multithreaded indexing utility class.
On Wed, Mar 25, 2009 at 1:06 PM, Jason Rutherglen <
jason.rutherg...@gmail.com> wrote:
> Each document is being created in
e segments with enough deletes need to be merged
away in 1-2 hours, meaning optimizing may not be best as it requires later
large merges. Also, an interleaving system that does not perform merges if a
flush is occurring could be useful for minimizing disk thrashing.
On Wed, Mar 25, 2009 at 3:39 PM, J
John,
We looked at implementing delete-by-doc-id for LUCENE-1516; however, it
seemed to be something that, if enough people wanted it, we could
implement as a later patch.
The implementation involves maintaining a genealogy of SegmentReaders within
IndexWriter so that deletes to a reader that has
Hi Shay,
I think IndexWriter.getReader from LUCENE-1516 in trunk is what
you're talking about? It pools readers internally, so there's no
need to call IndexReader.reopen; one simply calls IW.getReader
to get new readers containing recent updates.
-J
BTW I replied to the message on java-u...@lucen
Hi Dan,
You are looking to throttle the merging? I'd recommend setting
ConcurrentMergeScheduler.setMaxThreadCount(1). This way IW.addDocument
doesn't wait while a merge occurs (as it would with SerialMergeScheduler);
however, it should not use as much CPU, as only one merge will occur at a time.
In regards to
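Concretely, a minimal sketch of that setup ('writer' is an open IndexWriter):

  import java.io.IOException;
  import org.apache.lucene.index.ConcurrentMergeScheduler;
  import org.apache.lucene.index.IndexWriter;

  // One background merge thread: addDocument never blocks on merges
  // (unlike SerialMergeScheduler), while merge CPU/IO stays bounded.
  static void throttleMerges(IndexWriter writer) throws IOException {
      ConcurrentMergeScheduler cms = new ConcurrentMergeScheduler();
      cms.setMaxThreadCount(1); // at most one merge runs at a time
      writer.setMergeScheduler(cms);
  }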
On the topic of user groups, is there a Bay Area Lucene users group?
> LUCENE-1458 (flexible indexing) has these improvements,
Mike, can you explain how it's different? I looked through the code once
but yeah, it's in with a lot of other changes.
On Wed, Jun 10, 2009 at 5:40 AM, Michael McCandless <
luc...@mikemccandless.com> wrote:
> This (very large number of
d
>terms, and is slurped into the arrays on init.
>
> This is a sizable RAM savings over what's done now because you save 2
> objects, 3 pointers, 2 longs, 2 ints (I think), per indexed term.
>
> Mike
>
> On Wed, Jun 10, 2009 at 2:02 PM, Jason
> Rutherglen wrote:
&
> As I understand it, the user won't see any changes to the
> index until a new Searcher is created.
Correct.
> How much memory will caching the searcher cost? Are there
> other tradeoff's I need to consider?
If you're updating the index frequently (every N seconds) and
the searcher/reader is closed
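A minimal sketch of the periodic-reopen pattern under discussion, assuming the reopen() API; 'reader' and 'searcher' are fields of a hypothetical holder class:

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.search.IndexSearcher;

  // Call every N seconds. reopen() shares unchanged segments with the
  // old reader, so only new/changed segments are actually loaded.
  synchronized void maybeReopen() throws IOException {
      IndexReader newReader = reader.reopen();
      if (newReader != reader) {
          IndexReader old = reader;
          reader = newReader;
          searcher = new IndexSearcher(reader);
          old.close(); // only safe once in-flight searches have finished
      }
  }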
On the topic of RAM consumption, it seems like field caches
could return estimated RAM usage (given they're arrays of
standard Java types)? There are methods of calculating per
platform (I believe relatively accurately).
On Fri, Jun 19, 2009 at 12:11 PM, Michael McCandless <
luc...@mikemccandless.co
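For instance, a back-of-envelope estimator, assuming the cache entry is one primitive array slot per document (String fields need more accounting):

  // Rough heap estimate for a numeric field cache entry.
  static long estimateFieldCacheBytes(int maxDoc, int bytesPerValue) {
      long arrayHeader = 16; // JVM-dependent object/array overhead
      return arrayHeader + (long) maxDoc * bytesPerValue;
  }
  // e.g. 10M docs of ints: estimateFieldCacheBytes(10000000, 4) ~ 40 MB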
This requires tracking the genealogy of docids as they are merged inside
IndexWriter. It's doable, so if you're particularly interested feel free to
open a jira issue.
On Sun, Jun 28, 2009 at 2:21 AM, Shay Banon wrote:
>
> Hi,
>
> I have a case where deleting documents by doc id make sense (I
Ah ok, I was thinking we'd wait for the new flex indexing patch.
I had started working along these lines before and will take it
on as a project (which is, I believe, reducing the memory
consumption of the term dictionary).
I plan to segue it into the tag index at some point.
On Tue, Jul 7, 2009 at
Just wondering if it works and if it's a good fit for autosuggest?
Do we think that we'll be able to support indexing stop words
using PFOR (with relaxation of the compression to gain
performance)? Today it seems like the best approach to indexing
stop words is to use shingles. However, this blows up the term
dict because shingles concatenate words into phrases.
On
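For reference, the shingle approach mentioned here is just a TokenFilter chain; a minimal sketch with the contrib ShingleFilter (3.x-era API):

  import java.io.Reader;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.shingle.ShingleFilter;
  import org.apache.lucene.analysis.standard.StandardTokenizer;
  import org.apache.lucene.util.Version;

  // Emits unigrams plus two-word shingles ("to be", "be or", ...),
  // preserving stop-word phrases at the cost of a larger term dict.
  static TokenStream shingles(Reader text) {
      TokenStream ts = new StandardTokenizer(Version.LUCENE_30, text);
      return new ShingleFilter(ts, 2); // max shingle size = 2
  }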
be honest, I do not know is anyone today runs high volume search from disk
> (maybe SSD), even than, significant portion has to be in RAM...
>
> One day we could throw many CPUs at Query... but this is not an easy one...
>
>
>
>
>
> - Original Message
>> F
http://arstechnica.com/hardware/news/2009/07/intels-new-34nm-ssds-cut-prices-by-60-percent-boost-speed.ars
For me the price on the 80GB is now within reason for a $1300
SuperMicro quad-core 12GB RAM type of server.
In trying to calculate the cost of various slop settings for phrase
queries, what's the time complexity? O(n) or O(n^2)?
Micah,
If you can post some of your code, it may be easier to identify the
problem you're experiencing.
-J
On Tue, Aug 18, 2009 at 9:55 AM, Micah Jaffe wrote:
> Hi, thanks for the response! The (custom) searchers that are falling out of
> cache are indeed calling close on their IndexReader in f
Take a look at contrib/spatial.
On Fri, Aug 21, 2009 at 7:00 AM, javaguy44 wrote:
>
> Hi,
>
> I'm currently looking at sorting in lucene, and to get started I took a look
> at the distance sorting example from the Lucene in Action book.
>
> Working through the test DistanceSortingTest, I've notice
even hits.
>
> Is there no way to limit the sorting to only the documents that were found
> in the query?
>
> Thanks
>
>
>
> Jason Rutherglen-2 wrote:
>>
>> Take a look at contrib/spatial.
>>
>> On Fri, Aug 21, 2009 at 7:00 AM, javaguy44 wrot
Daniel,
You may want to look at SOLR-1375 which enables ID checking
using a BloomFilter (with a specified error rate of false
positives). Otherwise for what you're trying to do, you'd need
to create a hash map?
-J
On Thu, Aug 13, 2009 at 7:33 AM, Daniel Shane wrote:
> Hi all!
>
> I'm currently ru
While indexing with the latest nightly build of Solr on Amazon EC2, the
following JVM bug has occurred twice on two different servers.
Should I post the log to a Jira issue?
java version "1.6.0_07"
Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed
> - Mark
>
> http://www.lucidimagination.com
>
>
>
> Jason Rutherglen wrote:
>> While indexing with the latest nightly build of Solr on Amazon EC2 the
>> following JVM bug has occurred twice on two different servers.
>>
>> Post the log to a Jira issue?
>>
I think CSF hasn't been implemented because it's only marginally
useful yet requires fairly significant rewrites of core code
(e.g. SegmentMerger), so no one's picked it up, including myself.
An interim solution that fulfills the same function (quickly
loading field cache values) using what works rel
I'm seeing a strange exception when indexing using the latest Solr rev on EC2.
org.apache.solr.client.solrj.SolrServerException:
org.apache.solr.client.solrj.SolrServerException:
java.lang.RuntimeException: after flush: fdx size mismatch: 468 docs
vs 298404 length in bytes of _0.fdx
at
or
he fdx file
> size is 3748 (= 4 + 468*8), yet the file size is far larger than that
> (298404).
>
> How repeatable is it? Can you turn on infoStream, get the exception
> to happen, then post the resulting output?
>
> Mike
>
> On Thu, Sep 10, 2009 at 7:19 PM, Jason Ruther
It depends on whether or not the commit completes before the
reopen. Lucene 2.9 adds an IndexWriter.getReader method that
will always return with the latest modifications to your index.
So if you're adding many documents, you can at any time call
IW.getReader and you will be able to search the cha
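A minimal sketch of that flow with the 2.9 API:

  import java.io.IOException;
  import org.apache.lucene.index.IndexReader;
  import org.apache.lucene.index.IndexWriter;
  import org.apache.lucene.search.IndexSearcher;

  // getReader() flushes pending docs and returns a point-in-time
  // reader that sees them, without waiting for a full commit.
  static void searchLatest(IndexWriter writer) throws IOException {
      IndexReader reader = writer.getReader();
      try {
          IndexSearcher searcher = new IndexSearcher(reader);
          // ... run queries over the just-added documents ...
      } finally {
          reader.close(); // each reader from getReader() must be closed
      }
  }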
I'm not sure I understand the question. You're trying to reopen
the segments that you've replicated and you're wondering what's
changed in Lucene?
On Mon, Oct 5, 2009 at 5:30 PM, Nigel wrote:
> Anyone have any ideas here? I imagine a lot of other people will have a
> similar question when trying
Chris,
It sounds like you're on the right track. Have you looked at
Solr, which uses the rsync/Java replication method you mentioned?
Replication and near realtime in Solr aren't quite there yet;
however, it wouldn't be too hard to add it.
-J
On Tue, Oct 6, 2009 at 3:57 PM, Chris Were wrote:
> Hi
Maarten,
Depending on the hardware available, you can use a Hadoop cluster
to reindex more quickly. With Amazon EC2 one can spin up several
nodes, reindex, then tear them down when they're no longer
needed. Also, you can simply update the existing documents in the
index in place, though you'd need t
We have a way to merge indexes together with IW.addIndexes,
however not the opposite: splitting up an index with multiple
segments. I think I can simply manufacture a new SegmentInfos in
a new directory, copy over the segments files from those
segments, delete the copied segments from the source, and
Out of curiosity, and perhaps for practical purposes, how does one
handle mixed-language documents? I suppose one could extract the
words of a particular language and place them in a language-specific field?
Are there libraries to perform this (yet)?
On Thu, Oct 8, 2009 at 6:32 AM, Christian Reuschling
Eric,
Katta doesn't require HDFS, which would be slow to search on,
though Katta can be used to copy indexes out of HDFS onto local
servers. The best bet is hardware that uses SSDs because merges
and update latency will greatly decrease and there won't be a
synchronous IO issue as there is with har
on it.
-J
On Thu, Oct 8, 2009 at 8:18 PM, Jake Mannix wrote:
> Jason,
>
> On Thu, Oct 8, 2009 at 7:56 PM, Jason Rutherglen wrote:
>
>> Today near realtime search (with or without SSDs) comes at a
>> price, that is reduced indexing speed due to continued in RAM
>&g
variety of configurations. The best way to go about
>> this is to post benchmarks that others may run in their
>> environment which can then be tweaked for their unique edge
>> cases. I wish I had more time to work on it.
>>
>> -J
>>
>> On Thu, Oct 8, 2009
ust plain
> disappointing.*
>
> Thanks Jake for the clarification, and Eric, let me know if you to
> know more in detail with how we are dealing with realtime indexing/search
> with Zoie here at linkedin in a production environment powering a real
> internet company with real
Hi Cedric,
There is a wiki page on NRT at:
http://wiki.apache.org/lucene-java/NearRealtimeSearch
Feel free to ask questions if there's not enough information.
-J
On Mon, Oct 12, 2009 at 2:24 AM, melix wrote:
>
> Hi,
>
> I'm going to replace an old reader/writer synchronization mechanism we had
If there's a bug you're seeing, it's helpful to open an issue and post
code reproducing it.
On Wed, Nov 11, 2009 at 3:41 AM, Albert Juhe wrote:
>
> I think that this is the best way to proceed.
>
> thank you Mike
>
>
>
> Michael McCandless-2 wrote:
>>
>> Can you narrow the leak down to a small se
Is there a setting to fix this?
[junit] Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
[junit] at java.util.Arrays.copyOf(Arrays.java:2882)
[junit] at
java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:100)
[junit] at
java.lang