Hi Folks,
I just tried to index a data set that was probably 2x as large as the previous
one I'd been using with the same code. The indexing completed fine, although
it was slower than I would have liked. ;-) But the following problem occurs
when I try to use FieldCache to look up an indexed
Not good indeed.
Synched to trunk, blew away old indexes, reindexed, same behavior. So I think
we've got a problem, Houston. ;-)
Karl
-Original Message-
From: ext Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, October 27, 2010 11:08 AM
To: dev@lucene.apache.org
It's on an internal Nokia machine, unfortunately, so the only way I can
transfer it out is with my credentials, or by email, which is definitely not
going to work ;-). But if you can provide me with an account on a machine I'd
be transferring it to, I may be able to scp it from here.
Karl
-
Talked with IT here - they don't recommend external transfers of this size. So
I think we'd best try the "instrument and repeat" approach instead.
Karl
-Original Message-
From: ext karl.wri...@nokia.com [mailto:karl.wri...@nokia.com]
Sent: Thursday, October 28, 2010 8:16 AM
To: dev@lu
Yep, that fixed it. ;-)
Everything seems happy now.
Karl
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley
Sent: Thursday, October 28, 2010 10:17 AM
To: dev@lucene.apache.org
Subject: Re: ArrayIndexOutOfBounds exception using FieldCache
On
The internet is not the bottleneck ;-). It's the intranet here. Index is 14GB.
Besides, it looks like Yonik found the problem.
Karl
-Original Message-
From: ext Walter Underwood [mailto:wun...@wunderwood.org]
Sent: Thursday, October 28, 2010 11:00 AM
To: dev@lucene.apache.org
Subject:
Glad to be of service. ;-)
Karl
-Original Message-
From: ext Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Thursday, October 28, 2010 11:48 AM
To: dev@lucene.apache.org; simon.willna...@gmail.com
Subject: Re: ArrayIndexOutOfBounds exception using FieldCache
On Thu, Oct 28,
In database queries, it is often useful to treat an empty value specially, and
be able to search explicitly for records that have (for instance) no field X,
or no value for field X. I can't regurgitate offhand all the precise
situations that I've used this and claim that they would apply to a s
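For reference, the usual Solr idiom for "records with no value for field X" is a negated open-ended range query; the field name below is illustrative, not from the original message:

```
q=*:* -field_x:[* TO *]
```

The `*:*` clause matters: a purely negative query matches nothing on its own, so you subtract the "has any value" set from the match-all set.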
Solr trunk seems to have compilation errors:
[javac]
C:\wip\solr-dym\lucene_solr_trunk\solr\src\java\org\apache\solr\handler\component\ResponseBuilder.java:124:
cannot find symbol
[javac] symbol : variable debug
[javac] location: class org.apache.solr.handler.component.ResponseBuild
Never mind - this was due to a local change in my work area.
Karl
_
From: Wright Karl (Nokia-MS/Boston)
Sent: Friday, November 05, 2010 3:51 PM
To: 'dev@lucene.apache.org'
Subject: Compilation errors
Solr trunk seems to have compilation errors:
[j
Is this something ManifoldCF needs to do also?
Karl
-Original Message-
From: ext Grant Ingersoll [mailto:gsing...@apache.org]
Sent: Tuesday, November 09, 2010 3:34 PM
To: dev@lucene.apache.org
Subject: Re: svn commit: r1032995 - in
/lucene/dev/trunk/solr/src/site/src/documentation/conten
Folks,
I had an interesting conversation with Simon a few weeks back. It occurred to
me that it might be possible to build an automata that handles stemming and
pluralization on searches. Just a thought...
Karl
This list might be interested to know that the current Solr LICENSE and NOTICE
file contents are not Apache standard. The ManifoldCF project based its
LICENSE and NOTICE files on the Solr ones and got the following icy reception
in the incubator:
>>
The NOTICE file is still incorrect and i
From svn, Yonik seems to be the go-to guy for LICENSE and NOTICE stuff.
>Yonik, do you remember why the HSQLDB and Jetty notice text was included in
>Solr's NOTICE.txt? The incubator won't release ManifoldCF until we answer
>this question. ;-)
Karl
F
>>
Nope - wasn't me that added the license stuff into NOTICE.txt ;-)
But, including Jetty's NOTICE seems appropriate for our NOTICE. It's
just the license parts of the HSQLDB and SLF4J that should be moved to
LICENSE.txt
<<
The NOTICE text is actually different from the LICENSE text for
Everyone should (carefully) read the Apache License 2.0 section 4(d). It turns
out that Apache has a somewhat unusual definition for the term "derivative
work". It has to be something you actually modified, not just include. So the
incubator approach seems correct; neither the HSQLDB notice n
There's a fixed-size thread pool involved in doing the indexing, whose size
depends on the machine parameters.
Karl
-Original Message-
From: ext Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Wednesday, October 03, 2012 10:43 AM
To: Wright Karl (Nokia-LC/Boston)
Subject
Threads are managed via an executor service and form a fixed-size thread pool,
of size 16 on this machine.
There are not a lot of fields in the schema (a half dozen). We do use
PerFieldAnalyzerWrapper.
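For reference, the fixed-size pool of 16 described above is just a standard `ExecutorService`; this is a minimal stdlib sketch (the submitted tasks are stand-ins for real indexing work):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class IndexingPool {
    public static void main(String[] args) throws Exception {
        // Fixed-size pool, as described: 16 worker threads on that machine
        ExecutorService pool = Executors.newFixedThreadPool(16);
        List<Future<Integer>> futures = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            final int id = i;
            // submit() here stands in for submitting a real indexing task
            futures.add(pool.submit(() -> id));
        }
        int completed = 0;
        for (Future<Integer> f : futures) {
            f.get();          // block until each task finishes
            completed++;
        }
        pool.shutdown();
        System.out.println("completed=" + completed);
    }
}
```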
I'm still grappling with the mat reports; it's possible of course that we're
holding onto so
Mystery resolved; the problem was due to an ever-increasing record size, which
was in turn due to a record structure that was never being cleared. This
caused it to appear as if the total allocation of structures used for analysis
was steadily growing. But the number of such entities did NOT g
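The bug pattern described, a per-record structure that is reused but never reset, is easy to reproduce in miniature (all names here are invented):

```java
import java.util.ArrayList;
import java.util.List;

public class GrowingRecord {
    public static void main(String[] args) {
        List<String> record = new ArrayList<>(); // reused across records
        for (int i = 0; i < 1000; i++) {
            record.add("field" + i); // bug: grows forever without record.clear()
        }
        System.out.println("leaky size=" + record.size()); // looks like a leak
        record.clear();              // the fix: reset the structure per record
        System.out.println("after clear=" + record.size());
    }
}
```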
Hi folks,
I'm sorely puzzled by the fact that my QParser implementation ceased to work
after the latest Solr/Lucene trunk update. My previous update was about ten
days ago, right after Mike made his index changes.
The symptom is that, although the query parser is correctly called, and seems
t
Another data point: the standard query parser actually ALSO fails when you do
anything other than a *:* query. When you specify a field name, it returns
zero results:
root@duck93:/data/solr-dym/solr-dym# curl
"http://localhost:8983/solr/nose/standard?q=value_0:a*";
[XML response omitted: zero results for value_0:a*]
But:
root@
This turns out to have indeed been due to a recent, but un-announced, index
format change. A rebuilt index worked properly.
Thanks!
Karl
From: ext karl.wri...@nokia.com [karl.wri...@nokia.com]
Sent: Monday, January 17, 2011 10:53 AM
To: dev@lucene.apache
I tried commenting out the final OR term, and that excluded all records that
were out-of-language as expected. It's just the boost that doesn't seem to
work.
Exploring the explain is challenging because of its size, but there are NO
boosts recorded of the size I am using (10.0). Here's the ba
The original query is fine, and has the boost as expected:
((+language:eng +(
CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667)
CutoffQueryWrapper((+othervalue_0:bunker~0.8332333
+value_0:hill)^0.5714286)
CutoffQueryWrapper((+value_0:bunker~0.8332333
+otherval
So I think I understand where the blank values and repeats come from. Those
are the expansions of fuzzy queries against fields that have no matches
whatsoever for the fuzzy values in question. So those are indeed OK.
I guess then that the problem is that the scoring explanation makes no sense.
Found the cause of the zero querynorms, and fixed it. But the results are
still not as I would expect. The first result has language=ger but scores
higher than the second result which has language=eng. And yet, my query is
boosting like this:
Boolean
OR Boolean (boost = 100.0)
AND (langua
Turns out that I inadvertently reverted one of Simon's changes to
CutoffQueryWrapper, which explains the second effect. So all is now well.
Thanks for your assistance!
Karl
From: Wright Karl (Nokia-MS/Boston)
Sent: Thursday, January 20, 2011 9:44 PM
To:
This is a query that wraps another query, which limits the number of results
returned from it to some specific number. It seems very helpful for the
situation where you have a lot of clauses in a query and each of them is
expected to be small, but there is a chance of having one clause return l
A nice idea. I've always wondered about this, because for me "summer" and
"code" do not go together very well. ;-)
Karl
-Original Message-
From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Monday, January 24, 2011 3:30 PM
To: dev@lucene.apache.org
Subject: Lucene &
I have an interesting scoring problem, which I can't seem to get around.
The problem is best stated as follows:
(1)My schema has several independent fields, e.g. "value_0", "value_1", ...
"value_6".
(2)Every document has all of these fields set, with a-priori field norm
values. Where
Interesting datapoint: After the reindexing, the following query returns the
right results in the right order:
(+value_3:Lexington~0.877 +value_1:Massachusetts~0.877 +*:*^0.0 +*:*^0.0
+*:*^0.0)
(+value_3:Lexington~0.877 +value_1:Massachusetts~0.877 +value_4:_empty_
+value_5:_empty_ +value_6:_em
I took my own suggestion and used the DisjunctionMaxQuery. This solved the
problem.
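For context, DisjunctionMaxQuery combines per-field scores by taking the maximum plus a tie-breaker fraction of the remaining scores, instead of summing them the way a BooleanQuery OR does. A tiny plain-Java sketch of that combination rule (the field scores are invented):

```java
import java.util.Locale;

public class DisMaxScore {
    // DisjunctionMaxQuery's score combination:
    //   max(sub-scores) + tieBreaker * (sum of the remaining sub-scores)
    static double disMax(double[] scores, double tieBreaker) {
        double max = 0.0, sum = 0.0;
        for (double s : scores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    public static void main(String[] args) {
        double[] fieldScores = {0.4, 0.9, 0.3}; // hypothetical per-field scores
        // tieBreaker 0.0: pure max, no credit for matching extra fields
        System.out.println(String.format(Locale.ROOT, "%.2f", disMax(fieldScores, 0.0)));
        // tieBreaker 0.1: small bonus for the non-best fields
        System.out.println(String.format(Locale.ROOT, "%.2f", disMax(fieldScores, 0.1)));
    }
}
```

Because the best field dominates, one strong field match can no longer be outscored by several weak ones summed together.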
Karl
From: Wright Karl (Nokia-MS/Boston)
Sent: Wednesday, January 26, 2011 6:40 PM
To: Wright Karl (Nokia-MS/Boston); 'dev@lucene.apache.org'
Cc: 'simon.willna...@gmail.com'
Subject: RE: Scoring woes?
Interestin
All that the patch contributes is the infrastructure needed to allow multiple
queries. It's structured so that the results from one query are available to
construct the query for the next. The patch does not contribute a multi-query
query parser, or means of merging the results into a final re
Hi Grant,
This is a great post.
I'm not a committer for Lucene or Solr, but I'm seriously thinking that much of
what Lucene/Solr does right should be considered by the project I AM a
committer for: ManifoldCF.
Key things I would add based on experience with commercial software development:
(A
Congratulations, Jan!
Karl
-Original Message-
From: ext Mark Miller [mailto:markrmil...@gmail.com]
Sent: Monday, June 13, 2011 10:43 AM
To: dev@lucene.apache.org
Subject: Welcome Jan Høydahl as Lucene/Solr committer
I'm happy to announce that the Lucene/Solr PMC has voted in Jan Høydahl
Hi folks,
How hard would it be to get a link to ManifoldCF from the Solr site's
related-link section? I'm seeing a lot of people who know Solr but have no
idea ManifoldCF even exists, and I'd like to find some way to correct that
problem.
Karl
I created a ticket for it - SOLR-2602. I'll attach a patch shortly.
Karl
-Original Message-
From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com]
Sent: Thursday, June 16, 2011 2:00 PM
To: dev@lucene.apache.org
Subject: Re: Related project link to ManifoldCF from Solr site?
a
Hi folks,
I'm trying to update to the latest trunk, and there have been changes to the
Solr updater that I don't understand how to use. For instance, the following
code:
CommitUpdateCommand commit = new CommitUpdateCommand(this.request,optimize);
... now requires an array of IndexReader ob
Hi folks,
I'm trying to turn SOLR-1895 into a real contrib module but I'm having some
trouble with the ant build for it. Specifically, the module needs the lucene
contrib jar lucene-queries.jar, but I don't know the right way to indicate that
in my new solr/contrib/auth/build.xml file. Does a
Thanks for the reply!
Unfortunately, there must be something more to it. This is what I have:
>>
Solr Integration with ManifoldCF, for repository document authorization
<<
The lucene-libs directory is not even create
common.compile-core:
[javac] Compiling 1 source file to C:\wip\solr\trunk\solr\build\contrib\solr
-auth\classes\java
[javac] C:\wip\solr\trunk\solr\contrib\auth\src\java\org\apache\solr\auth\Ma
nifoldCFSecurityFilter.java:163: cannot find symbol
[javac] symbol : class BooleanFilter
You’re right – the package moved since this was originally developed. An awful
lot of stuff has, in fact, moved. ;-)
That made the difference in finding that class – now I’ve got to chase down a
few others and I should be set.
Karl
From: ext Steven A Rowe [mailto:sar...@syr.edu]
Sent: Friday,
I think your expectation for s-d13 may be incorrect. If you use AD as a model,
you are effectively applying share security that has no allow sids but some
deny sids. With AD you would not get this doc either.
-Original Message
-
From: ext Koji Sekiguchi (JIRA)
Sent: 17/09/2011, 11:49
This works fine for a SearchComponent, but if I try this for a QParserPlugin I
get the following:
[junit] org.apache.solr.common.SolrException: Invalid 'Aware' object:
org.apache.solr.mcf.ManifoldCFQParserPlugin@18941f7 --
org.apache.solr.util.plugin.SolrCoreAware must be an instance of:
[
I created a ticket for this: SOLR-3015. I hope there's a simple solution and I
can just close it, but if not I will experiment and try to produce a patch.
Karl
From: Wright Karl (Nokia-LC/Boston)
Sent: Monday, January 02, 2012 11:02 AM
To: dev@lucene.apa
"SolrCoreAware" and "CloseHook" are related in that you need a SolrCore object
in order to call SolrCore.addCloseHook(). Indeed, the javadoc for the
CloseHook interface states that the expected way you are supposed to use this
in a plugin is via something like this:
public void inform(SolrCore
Thanks, Erik, this is not ideal but it will work for my purposes. But it seems
a shame that the whole SolrCoreAware setup as it was designed turned out to be
so problematic.
Karl
-Original Message-
From: ext Erik Hatcher [mailto:erik.hatc...@gmail.com]
Sent: Wednesday, January 11, 201
Having some interest in this issue, may I suggest setting a branch date? On
the agreed-upon date, a branch is made. After that date, commits go to trunk
and (maybe) are pulled up into the 4.0 branch. If the date is oh, say, 1 week
away, people can plan accordingly to yield a relatively stable
Is there a preferred time/manner for a Solr component (e.g. a SearchComponent)
to clean up resources that have been allocated during the time of its
existence, other than via a finalizer? There seems to be nothing for this in
the NamedListInitializedPlugin interface, and yet if you allocate a r
Hi all,
I received a report of a problem with posting data to Solr. The post method is
a multi-part form, so if you inspect it, it looks something like this:
>>
boundary---
Content-Disposition: form-data; name=metadata_attribute_name
Content-Type: text; charset=utf-8
abc;def;ghi
---bou
I'll need to ask the reporter for more details since it appears the answer is
not simple. It may even be an app server issue.
Thanks
Karl
Sent from my Windows Phone
-Original Message-
From: ext Chris Hostetter
Sent: 7/12/2012 8:29 PM
To: dev@lucene.apache.org
Subject: Re: Solr posting
Hoss,
Here are the details:
(1) The actual metadata posted is a string of the form "12345;#string". There
is only one value posted for the metadata field, but Solr complains that
we're trying to apply multiple values to a single-valued field and does not
index the document, unless the ";"
I'm sorry the info has been dribbling in slowly; it's all now summarized in
CONNECTORS-491. Now that I've confirmed that this even occurs for them without
the ";" (unlike what I was originally told) it is clear it is a config related
issue. I have urged them to look to this list for further he
Hi guys,
The 4.0.0 lucene-spatial maven dependency on spatial4j is UNVERSIONED. But the
two spatial4j versions in play (0.2 and 0.3) are significantly different. We
have code developed for lucene-spatial 4.0.0 beta which doesn't seem to compile
with either spatial4j version. What was the int
Hi David,
We found the version in the grandparent pom, so that's ok. The build issue
against 0.2 was due to other changes in Lucene 4.0.0-BETA vs. Lucene 4.0.0.
I am willing to assist to some extent with spatial4j, if that is yours. It
changed significantly from 0.2 to 0.3, and not just in th
I am told that SSDs are spec'd for only 70 full writes before they get an
error. The error block is set aside, but eventually something critical gets
hit. So you should probably expect this to happen again.
Karl
-Original Message-
From: ext Uwe Schindler [mailto:u...@thetaphi.d
" Only 70 full writes seems a little bit low for an SSD."
That's what I thought. I was astounded to learn that that is in fact correct
(at least for some of the drives we are using here). Automatic recovery is how
the SSD copes with this failure rate.
But it is entirely possible that the caus
Mike, I'm talking about a 1TB SSD option for some hardware we are buying. If
you are really curious, I can ask the people who are doing the project for the
model and specs.
Karl
-Original Message-
From: ext Michael McCandless [mailto:luc...@mikemccandless.com]
Sent: Monday, August 19,
Right, that's what I said. And one write means writing the *whole* disk. So
Mike and I may *both* be right. ;-)
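The arithmetic, with illustrative numbers (the 1 TB capacity comes up elsewhere in the thread; the daily write volume is my assumption):

```java
public class SsdEndurance {
    public static void main(String[] args) {
        long capacityGB = 1000;      // hypothetical 1 TB drive
        long ratedFullWrites = 70;   // rated full-drive writes before errors
        long enduranceGB = capacityGB * ratedFullWrites; // total GB writable
        long dailyIndexGB = 100;     // assumed daily index write volume
        long daysToWearOut = enduranceGB / dailyIndexGB;
        System.out.println("enduranceGB=" + enduranceGB + " days=" + daysToWearOut);
    }
}
```

So "only 70 writes" is less alarming than it sounds: it is 70 full-capacity writes, which at steady indexing load still amounts to a couple of years.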
Karl
-Original Message-
From: ext Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Monday, August 19, 2013 1:07 PM
To: dev@lucene.apache.org
Subject: RE: Lucene tests killed on
Hi all,
For the ManifoldCF project, we have an output connector for Solr, and we'd like
to port it to use SolrJ instead of homegrown code. However, I cannot find any
mention anywhere of whether anyone has tried to maintain compatibility between
later versions of SolrJ (e.g. 4.0.0) and previous
Thanks for the reply. The ticket in question is CONNECTORS-594, if you would
like to just comment there.
Karl
Sent from my Windows Phone
From: ext Ryan McKinley
Sent: 12/28/2012 4:03 PM
To: solr-...@lucene.apache.org
Subject: Re: Is there documentation anywhere
Hi all,
I'm researching the ticket CONNECTORS-513. In this ticket we seem to have
different behavior between Solr 3.x and Solr 4.x as far as Tika content
extraction is concerned. The differences seem to be related to the content
type that is posted to Solr, and can be demonstrated with cURL.
A quick update - it appears that cURL is providing a Content-Type header in the
content part of its multipart post, and is using the file extension to come up
with "text/plain". Changing the file name causes cURL to change this
content-type to "application/octet-stream". But the questions stil
Hi All (and especially Robert),
Lucene NumericDocValues seems to operate slower than we would expect. In our
application, we're using it for storing coordinate values, which we retrieve to
compute a distance. While doing timings trying to determine the impact of
including a sqrt in the calcul
That is, both the x & y go into the same byte[] chunk. I've done this for a Solr
integration in https://issues.apache.org/jira/browse/SOLR-5170
~ David
karl.wright-2 wrote
> Hi All (and especially Robert),
>
> Lucene NumericDocValues seems to operate slower than we would ex
Wow, Hoss, this post was so long ago I barely remember writing it. ;-)
The problem we were having is not that the content type is not set in SolrJ -
it's that SolrCell does not discover it as it did when we used multipart posts
and ran with Solr 3.6. We still aren't sure where the change is tha
Congratulations, Uwe!
Karl
Sent from my Windows Phone
From: ext Koji Sekiguchi
Sent: 11/14/2013 6:35 PM
To: dev@lucene.apache.org
Subject: Re: [VOTE] Lucene / Solr 4.6.0
Congrats Uwe! :)
koji
(13/11/15 5:11), Uwe Schindler wrote:
> The PMC Chair is going to mar
Hi folks,
Maybe this is documented somewhere, and someone can point me at it. For the
ManifoldCF Solr plugins, we supply a SearchComponent, which wraps the supplied
query in order to perform authorization restrictions on returned documents.
The component only fires if the SHARDS parameter is
As an interested party, and deeply involved in another related Apache project,
I have to say that there is a huge benefit for all Apache projects to use
common source control. If we were starting over, or if svn was going to die
forever, it might be a different story - but given that svn is ali
It also doesn't deal with a major difference between git and svn - in svn,
directories are first-class objects, and in git they aren't (they are created
as needed). So when you try using gitsvn you almost always wind up with
directories you want to remove but can't.
Karl
From: ext Michael Del
FYI
From: Wright Karl (Nokia-S/Cambridge)
Sent: Tuesday, April 20, 2010 8:16 AM
To: 'dominique.bej...@eolya.fr'
Cc: 'solr-...@apache.org'; 'connectors-...@incubator.apache.org';
'connectors-u...@incubator.apache.org'
Subject: RE: Solr and LCF security at query tim
SOLR-1872 looks exactly like what I was envisioning, from the search query
perspective, although instead of the acl xml file you specify LCF stipulates
you would dynamically query the lcf-authority-service servlet for the access
tokens themselves. That would get you support for AD, Documentum,
Hi Peter,
I'm the principal committer for LCF, but I don't know as much about Solr as I
ought to, so it sounds like a potentially productive collaboration.
LCF does exactly what you are looking for - the only issue at all is that you
need to fetch a URL from a webapp to get what you are looking
Hi Peter,
I just committed the promised changes to the LCF Solr output connector.
ACL metadata will now be posted to the Solr Http interface along with the
document as the two following fields:
__ACCESS_TOKEN__document
__DENY_TOKEN__document
There will, of course, potentially be multiple value
Looking around for non-Apache, Java-only solutions to the AD authentication
problem, it seems to me that what we mainly have available is JAAS plus the
following JAAS login module:
com.sun.security.auth.module.Krb5LoginModule
... which should permit AD authentication to take place, if properly
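For illustration, a minimal JAAS configuration entry for that login module might look like this; the entry name and option values are placeholders, not from the original message:

```
ADLogin {
    com.sun.security.auth.module.Krb5LoginModule required
        useTicketCache=false
        doNotPrompt=false;
};
```

The application would then authenticate via `new LoginContext("ADLogin", callbackHandler).login()`, with the Kerberos realm and KDC supplied through the usual `java.security.krb5.realm`/`java.security.krb5.kdc` system properties.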
Hi Peter,
>>
For general Solr access control, there's two layers of security that need to be
addressed:
1. Authentication - make sure the incoming query is from a valid user, and
the passed-in credentials (hash, certificate etc.) are correct
2. Query filtering - potentially reduce the nu
Hi Peter,
I finally had a moment to review the SOLR 1872 and SOLR 1834 contributions in
detail, and have a couple of SOLR-related questions.
Both contributions rely on a SearchComponent to work their magic. However, it
also appears that each modifies the user query in a different way. 1834 us
Ok, not hearing back from Peter, I've done some Solr research and written some
code that might work. The approach I've taken is most similar to SOLR 1834,
other than the LCF-centric logic. Hopefully there will be a chance to try this
out in a full end-to-end way on the weekend, after which I
Hi Solr-knowledgeable folks,
The LCF Solr SearchComponent plugin I'm developing doesn't quite work. The
query I'm trying to do is:
-(allow_token_document:*) and -(deny_token_document:*) and
The result I'm seeing is that everything in the user's search matches, unlike
what I see in the admin
Turns out that, for the standard requestHandler, running this SearchComponent
first causes its rewritten query to be lost. Running last fixed the problem.
(I'd *love* to know why that would be necessary.)
But I'd still like comment as to whether the WildcardFilter construct is
expected to be
Adding to the getFilters() list seems reasonable - although, to be fair, my
code does seem to work as intended when the component is added "last". I'll do
some experimentation and see which model works most consistently.
TermRangeQuery doesn't seem to map readily to the functionality
That's certainly an option, and I had thought of it already, but the downside
is that you won't be able to search for documents that *aren't* indexed via LCF
under that model. Which is why I wanted to try to make the other approach fly.
FWIW, I was also told by a colleague that, because this is
I tried the getFilters() approach. It turned out I also needed to create a
list and do setFilters() if getFilters() returns null, but that was easily
remedied. When this is done, it once again works fine if the component is
added "last". But if it is added "first", we now get a stack trace fr
Turns out that FilteredQuery is what is causing the issue in this case. I
removed FilteredQuery, and instead constructed the search using Query objects
instead of Filter objects, and everything is happy now.
Karl
From: Wright Karl (Nokia-S/Cambridge)
Se
Hi Peter,
I'm more than happy to hear your customer's requirements, so no problem there.
It does seem to me that they are a bit different than what I've seen. I think
there is plenty of room for different flavors of solution, so please by all
means go ahead and propose your take on it!
Karl
Putting access control lookup at search-result time has the following benefits:
- It sees changes right away, when the underlying repository changes
Here are the drawbacks, as far as I can see:
- There's a significant extra load on the repository, because every search
result has to be checked a
If we aren't talking about a repository of some kind, then we aren't talking
about using LCF. If your design point is about applying security to NFS via an
acl-xml file, your uploaded contribution will do that just fine (although I
think you might want to use Filters in some places you are curr
Hi Peter,
You should be able to use LCF authorities for your purposes. I'm less clear
about what you mean by the "interface into decoupled acl storage". Existing
repository connectors are not aware of any decoupled storage, and if you were
to adopt the LCF model in its entirety, you've defeate
How low-tech do you want to go?
For example, you can run solr under an entirely different instance of tomcat,
listening on a different port. You can configure (via server.xml) the instance
to only accept connections from the local machine. Your application, which is
happily running on a diffe
>>
Can you explain this localhost restriction thing? If I restrict it to localhost
only would users on the internet still be able to access the solr instance?
Would the application have to make the request and pass back the results to the
external user?
<<
Hi Matt,
This connection bind
That's not what I am talking about at all.
Look inside your tomcat instance's server.xml file. There's a <Connector>
tag in there somewhere. You just add an "address" attribute to it.
Note the "address" attribute. That's the one that causes local binding.
Karl
-Original Message-
From: ext Matthew Mau
Nobody in their right mind can disagree with (1). I should also point out that
writing a custom analyzer is a very typical activity (as is a custom scorer),
so this should be made as straightforward as is possible.
Karl
-Original Message-
From: ext Robert Muir [mailto:rcm...@gmail.com
Hi all,
It seems to me that the "commit" logic in the Solr updateRequestHandler (or
wherever the logic is actually located) conflates two different semantics. One
semantic is what you need to do to make the index process perform well. The
other semantic is guaranteed atomicity of document rec
Hi Mark,
Unfortunately, indexing performance *is* of concern, otherwise I'd already be
committing on every post.
If your guess is correct, you are basically saying that adding a document to an
index in Solr/Lucene is just as fast as writing that file directly to the disk.
Because, obviously,
The reason for this is simple. LCF keeps track of which documents it has
handed off to Solr, and has a fairly involved mechanism for making sure that
every document LCF *thinks* got there, actually does. It even uses a mechanism
akin to a 2-phase commit to make sure that its internal records a
Hi Simon,
I think you are on the right track.
I believe it is not even possible to write a middleware-style layer that stores
documents and performs periodic commits on its own, because the update request
handler never ACKs individual documents on a commit, but merely everything it
has seen si
I created SOLR-1924. Let me know if it's clear enough, or if you'd like me to
modify the ticket in any way.
Thanks,
Karl
From: ext Mark Miller [markrmil...@gmail.com]
Sent: Tuesday, May 25, 2010 5:20 AM
To: dev@lucene.apache.org
Subject: Re: Solr updateReq
I don't understand the -dev requirement either, but for maven the jar suffix
names do matter, and that's where -SNAPSHOT comes in (it represents a nightly
build in Maven naming parlance).
I suspect that since some people will want Lucene without Solr, it is probably
going to be necessary to cre
Hi folks,
This morning I was experimenting with using multiple threads while indexing
some 20,000,000 records worth of content. In fact, my test spun up some 50
threads, and happily chugged away for a couple of hours before I saw the
following output from my test code:
>>
Http protocol er