ArrayIndexOutOfBounds exception using FieldCache

2010-10-27 Thread karl.wright
Hi Folks, I just tried to index a data set that was probably 2x as large as the previous one I'd been using with the same code. The indexing completed fine, although it was slower than I would have liked. ;-) But the following problem occurs when I try to use FieldCache to look up an indexed

RE: ArrayIndexOutOfBounds exception using FieldCache

2010-10-28 Thread karl.wright
Not good indeed. Synched to trunk, blew away old indexes, reindexed, same behavior. So I think we've got a problem, Houston. ;-) Karl -Original Message- From: ext Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, October 27, 2010 11:08 AM To: dev@lucene.apache.org

RE: ArrayIndexOutOfBounds exception using FieldCache

2010-10-28 Thread karl.wright
It's on an internal Nokia machine, unfortunately, so the only way I can transfer it out is with my credentials, or by email, which is definitely not going to work ;-). But if you can provide me with an account on a machine I'd be transferring it to, I may be able to scp it from here. Karl -

RE: ArrayIndexOutOfBounds exception using FieldCache

2010-10-28 Thread karl.wright
Talked with IT here - they don't recommend external transfers of this size. So I think we'd best try the "instrument and repeat" approach instead. Karl -Original Message- From: ext karl.wri...@nokia.com [mailto:karl.wri...@nokia.com] Sent: Thursday, October 28, 2010 8:16 AM To: dev@lu

RE: ArrayIndexOutOfBounds exception using FieldCache

2010-10-28 Thread karl.wright
Yep, that fixed it. ;-) Everything seems happy now. Karl -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of ext Yonik Seeley Sent: Thursday, October 28, 2010 10:17 AM To: dev@lucene.apache.org Subject: Re: ArrayIndexOutOfBounds exception using FieldCache On

RE: ArrayIndexOutOfBounds exception using FieldCache

2010-10-28 Thread karl.wright
The internet is not the bottleneck ;-). It's the intranet here. Index is 14GB. Besides, it looks like Yonik found the problem. Karl -Original Message- From: ext Walter Underwood [mailto:wun...@wunderwood.org] Sent: Thursday, October 28, 2010 11:00 AM To: dev@lucene.apache.org Subject:

RE: ArrayIndexOutOfBounds exception using FieldCache

2010-10-28 Thread karl.wright
Glad to be of service. ;-) Karl -Original Message- From: ext Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Thursday, October 28, 2010 11:48 AM To: dev@lucene.apache.org; simon.willna...@gmail.com Subject: Re: ArrayIndexOutOfBounds exception using FieldCache On Thu, Oct 28,

RE: inconsistency/performance trap of empty terms

2010-10-28 Thread karl.wright
In database queries, it is often useful to treat an empty value specially, and be able to search explicitly for records that have (for instance) no field X, or no value for field X. I can't regurgitate offhand all the precise situations that I've used this and claim that they would apply to a s

Compilation errors

2010-11-05 Thread karl.wright
Solr trunk seems to have compilation errors: [javac] C:\wip\solr-dym\lucene_solr_trunk\solr\src\java\org\apache\solr\handler\component\ResponseBuilder.java:124: cannot find symbol [javac] symbol : variable debug [javac] location: class org.apache.solr.handler.component.ResponseBuild

RE: Compilation errors

2010-11-05 Thread karl.wright
Never mind - this was due to a local change in my work area. Karl _ From: Wright Karl (Nokia-MS/Boston) Sent: Friday, November 05, 2010 3:51 PM To: 'dev@lucene.apache.org' Subject: Compilation errors Solr trunk seems to have compilation errors: [j

RE: svn commit: r1032995 - in /lucene/dev/trunk/solr/src/site/src/documentation/content/xdocs/images: solr.jpg solr_FC.eps

2010-11-09 Thread karl.wright
Is this something ManifoldCF needs to do also? Karl -Original Message- From: ext Grant Ingersoll [mailto:gsing...@apache.org] Sent: Tuesday, November 09, 2010 3:34 PM To: dev@lucene.apache.org Subject: Re: svn commit: r1032995 - in /lucene/dev/trunk/solr/src/site/src/documentation/conten

Stemming using automata

2010-11-17 Thread karl.wright
Folks, I had an interesting conversation with Simon a few weeks back. It occurred to me that it might be possible to build an automaton that handles stemming and pluralization on searches. Just a thought... Karl

LICENSE/NOTICE file contents

2011-01-08 Thread karl.wright
This list might be interested to know that the current Solr LICENSE and NOTICE file contents are not Apache standard. The ManifoldCF project based its LICENSE and NOTICE files on the Solr ones and got the following icy reception in the incubator: >> The NOTICE file is still incorrect and i

RE: LICENSE/NOTICE file contents

2011-01-08 Thread karl.wright
>From svn, Yonik seems to be the go-to guy for LICENSE and NOTICE stuff. >Yonik, do you remember why the HSQLDB and Jetty notice text was included in >Solr's NOTICE.txt? The incubator won't release ManifoldCF until we answer >this question. ;-) Karl F

RE: LICENSE/NOTICE file contents

2011-01-08 Thread karl.wright
>> Nope - wasn't me that added the license stuff into NOTICE.txt ;-) But, including Jetty's NOTICE seems appropriate for our NOTICE. It's just the license parts of the HSQLDB and SLF4J that should be moved to LICENSE.txt << The NOTICE text is actually different from the LICENSE text for

RE: LICENSE/NOTICE file contents

2011-01-10 Thread karl.wright
Everyone should (carefully) read the Apache License 2.0 section 4(d). It turns out that Apache has a somewhat unusual definition for the term "derivative work". It has to be something you actually modified, not just included. So the incubator approach seems correct; neither the HSQLDB notice n

RE: Lucene 4.0 memory usage during indexing - is this expected?

2012-10-03 Thread karl.wright
There's a fixed-sized thread pool involved in doing the indexing, of a size that depends on the machine parameters. Karl -Original Message- From: ext Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Wednesday, October 03, 2012 10:43 AM To: Wright Karl (Nokia-LC/Boston) Subject

RE: Lucene 4.0 memory usage during indexing - is this expected?

2012-10-03 Thread karl.wright
Threads are managed via an executor service backed by a fixed-size thread pool, of size 16 on this machine. There are not a lot of fields in the schema (a half dozen). We do use PerFieldAnalyzerWrapper. I'm still grappling with the MAT reports; it's possible of course that we're holding onto so
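
As a rough illustration of the setup described in this message (an executor service with a fixed-size pool of 16 threads), here is a minimal Java sketch. The class name, the indexOne stand-in, and the sample documents are hypothetical; the actual indexing code is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class FixedPoolIndexingSketch {

    // Hypothetical stand-in for indexing one document; returns its length.
    static int indexOne(String doc) {
        return doc.length();
    }

    // Submits every document to a fixed-size pool and sums the results.
    static int indexAll(List<String> docs, int poolSize) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        try {
            List<Future<Integer>> pending = new ArrayList<>();
            for (String doc : docs) {
                Callable<Integer> task = () -> indexOne(doc);
                pending.add(pool.submit(task));
            }
            int total = 0;
            for (Future<Integer> f : pending) {
                total += f.get(); // blocks until that task has finished
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown(); // accept no new tasks; workers exit once idle
        }
    }

    public static void main(String[] args) {
        System.out.println(indexAll(Arrays.asList("a", "bb", "ccc"), 16));
    }
}
```

With a fixed pool, the number of worker threads (and therefore the number of per-thread analysis structures alive at once) stays bounded by the pool size, which is why a steadily growing allocation pointed elsewhere.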

RE: Lucene 4.0 memory usage during indexing - is this expected?

2012-10-03 Thread karl.wright
Mystery resolved; the problem was due to an ever-increasing record size, which was in turn due to a record structure that was never being cleared. This caused it to appear as if the total allocation of structures used for analysis was steadily growing. But the number of such entities did NOT g

Query parser contract changes?

2011-01-17 Thread karl.wright
Hi folks, I'm sorely puzzled by the fact that my QParser implementation ceased to work after the latest Solr/Lucene trunk update. My previous update was about ten days ago, right after Mike made his index changes. The symptom is that, although the query parser is correctly called, and seems t

RE: Query parser contract changes?

2011-01-17 Thread karl.wright
Another data point: the standard query parser actually ALSO fails when you do anything other than a *:* query. When you specify a field name, it returns zero results: root@duck93:/data/solr-dym/solr-dym# curl "http://localhost:8983/solr/nose/standard?q=value_0:a*"; 07value_0:a* But: root@

RE: Query parser contract changes?

2011-01-18 Thread karl.wright
This turns out to have indeed been due to a recent, but un-announced, index format change. A rebuilt index worked properly. Thanks! Karl From: ext karl.wri...@nokia.com [karl.wri...@nokia.com] Sent: Monday, January 17, 2011 10:53 AM To: dev@lucene.apache

RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright
I tried commenting out the final OR term, and that excluded all records that were out-of-language as expected. It's just the boost that doesn't seem to work. Exploring the explain is challenging because of its size, but there are NO boosts recorded of the size I am using (10.0). Here's the ba

RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright
The original query is fine, and has the boost as expected: ((+language:eng +( CutoffQueryWrapper((+value_0:bunker~0.8332333 +value_0:hill)^0.667) CutoffQueryWrapper((+othervalue_0:bunker~0.8332333 +value_0:hill)^0.5714286) CutoffQueryWrapper((+value_0:bunker~0.8332333 +otherval

RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright
So I think I understand where the blank values and repeats come from. Those are the expansions of fuzzy queries against fields that have no matches whatsoever for the fuzzy values in question. So those are indeed OK. I guess then that the problem is that the scoring explanation makes no sense.

RE: Odd Boolean scoring behavior?

2011-01-20 Thread karl.wright
Found the cause of the zero querynorms, and fixed it. But the results are still not as I would expect. The first result has language=ger but scores higher than the second result which has language=eng. And yet, my query is boosting like this: Boolean OR Boolean (boost = 100.0) AND (langua

RE: Odd Boolean scoring behavior?

2011-01-21 Thread karl.wright
Turns out that I inadvertently reverted one of Simon's changes to CutoffQueryWrapper, which explains the second effect. So all is now well. Thanks for your assistance! Karl From: Wright Karl (Nokia-MS/Boston) Sent: Thursday, January 20, 2011 9:44 PM To:

RE: Odd Boolean scoring behavior?

2011-01-21 Thread karl.wright
This is a query that wraps another query, which limits the number of results returned from it to some specific number. It seems very helpful for the situation where you have a lot of clauses in a query and each of them is expected to be small, but there is a chance of having one clause return l

RE: Lucene & Google Summer of Code 2011

2011-01-24 Thread karl.wright
A nice idea. I've always wondered about this, because for me "summer" and "code" do not go together very well. ;-) Karl -Original Message- From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Monday, January 24, 2011 3:30 PM To: dev@lucene.apache.org Subject: Lucene &

Scoring woes?

2011-01-26 Thread karl.wright
I have an interesting scoring problem, which I can't seem to get around. The problem is best stated as follows: (1) My schema has several independent fields, e.g. "value_0", "value_1", ... "value_6". (2) Every document has all of these fields set, with a priori field norm values. Where

RE: Scoring woes?

2011-01-26 Thread karl.wright
Interesting datapoint: After the reindexing, the following query returns the right results in the right order: (+value_3:Lexington~0.877 +value_1:Massachusetts~0.877 +*:*^0.0 +*:*^0.0 +*:*^0.0) (+value_3:Lexington~0.877 +value_1:Massachusetts~0.877 +value_4:_empty_ +value_5:_empty_ +value_6:_em

RE: Scoring woes?

2011-01-26 Thread karl.wright
I took my own suggestion and used the DisjunctionMaxQuery. This solved the problem. Karl From: Wright Karl (Nokia-MS/Boston) Sent: Wednesday, January 26, 2011 6:40 PM To: Wright Karl (Nokia-MS/Boston); 'dev@lucene.apache.org' Cc: 'simon.willna...@gmail.com' Subject: RE: Scoring woes? Interestin
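
For readers unfamiliar with DisjunctionMaxQuery: it scores a document by its best-matching clause, plus an optional tie-breaker fraction of the other matching clauses, instead of summing all clauses. The snippet is truncated and does not show how the query was actually restructured, so the following is only a plain-Java sketch of that scoring arithmetic, with hypothetical names and made-up scores; it does not call Lucene.

```java
final class DisMaxScoreSketch {

    // Max-based combination: best clause score plus tieBreaker times the rest.
    static double disMax(double[] clauseScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tieBreaker * (sum - max);
    }

    // Sum-based combination, as in a plain Boolean OR of the same clauses.
    static double booleanSum(double[] clauseScores) {
        double sum = 0.0;
        for (double s : clauseScores) {
            sum += s;
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] scores = {2.0, 1.0, 1.0}; // made-up per-clause scores
        System.out.println(disMax(scores, 0.0));
        System.out.println(disMax(scores, 0.5));
        System.out.println(booleanSum(scores));
    }
}
```

With a tie-breaker of 0, a document that matches many weak clauses cannot outscore one that matches a single strong clause, which is the usual reason to prefer this query type over a summed OR.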

RE: [jira] Commented: (SOLR-2026) Need infrastructure support in Solr for requests that perform multiple sequential queries

2011-03-04 Thread karl.wright
All that the patch contributes is the infrastructure needed to allow multiple queries. It's structured so that the results from one query are available to construct the query for the next. The patch does not contribute a multi-query query parser, or means of merging the results into a final re

RE: Brainstorming on Improving the Release Process

2011-03-30 Thread karl.wright
Hi Grant, This is a great post. I'm not a committer for Lucene or Solr, but I'm seriously thinking that much of what Lucene/Solr does right should be considered by the project I AM a committer for: ManifoldCF. Key things I would add based on experience with commercial software development: (A

RE: Welcome Jan Høydahl as Lucene/Solr committer

2011-06-13 Thread karl.wright
Congratulations, Jan! Karl -Original Message- From: ext Mark Miller [mailto:markrmil...@gmail.com] Sent: Monday, June 13, 2011 10:43 AM To: dev@lucene.apache.org Subject: Welcome Jan Høydahl as Lucene/Solr committer I'm happy to announce that the Lucene/Solr PMC has voted in Jan Høydahl

Related project link to ManifoldCF from Solr site?

2011-06-16 Thread karl.wright
Hi folks, How hard would it be to get a link to ManifoldCF from the Solr site's related-link section? I'm seeing a lot of people who know Solr but have no idea ManifoldCF even exists, and I'd like to find some way to correct that problem. Karl

RE: Related project link to ManifoldCF from Solr site?

2011-06-16 Thread karl.wright
I created a ticket for it - SOLR-2602. I'll attach a patch shortly. Karl -Original Message- From: ext Simon Willnauer [mailto:simon.willna...@googlemail.com] Sent: Thursday, June 16, 2011 2:00 PM To: dev@lucene.apache.org Subject: Re: Related project link to ManifoldCF from Solr site? a

Solr updater trunk changes

2011-07-27 Thread karl.wright
Hi folks, I'm trying to update to the latest trunk, and there have been changes to the Solr updater that I don't understand how to use. For instance, the following code: CommitUpdateCommand commit = new CommitUpdateCommand(this.request,optimize); ... now requires an array of IndexReader ob

How to access a Lucene contrib package from a Solr contrib package?

2011-09-16 Thread karl.wright
Hi folks, I'm trying to turn SOLR-1895 into a real contrib module but I'm having some trouble with the ant build for it. Specifically, the module needs the lucene contrib jar lucene-queries.jar, but I don't know the right way to indicate that in my new solr/contrib/auth/build.xml file. Does a

RE: How to access a Lucene contrib package from a Solr contrib package?

2011-09-16 Thread karl.wright
Thanks for the reply! Unfortunately, there must be something more to it. This is what I have: >> Solr Integration with ManifoldCF, for repository document authorization << The lucene-libs directory is not even create

RE: How to access a Lucene contrib package from a Solr contrib package?

2011-09-16 Thread karl.wright
common.compile-core: [javac] Compiling 1 source file to C:\wip\solr\trunk\solr\build\contrib\solr-auth\classes\java [javac] C:\wip\solr\trunk\solr\contrib\auth\src\java\org\apache\solr\auth\ManifoldCFSecurityFilter.java:163: cannot find symbol [javac] symbol : class BooleanFilter

RE: How to access a Lucene contrib package from a Solr contrib package?

2011-09-16 Thread karl.wright
You’re right – the package moved since this was originally developed. An awful lot of stuff has, in fact, moved. ;-) That made the difference in finding that class – now I’ve got to chase down a few others and I should be set. Karl From: ext Steven A Rowe [mailto:sar...@syr.edu] Sent: Friday,

RE: [jira] [Issue Comment Edited] (SOLR-1895) ManifoldCF SearchComponent plugin for enforcing ManifoldCF security at search time

2011-09-18 Thread karl.wright
I think your expectation for s-d13 may be incorrect. If you use AD as a model, you are effectively applying share security that has no allow sids but some deny sids. With AD you would not get this doc either. -Original Message - From: ext Koji Sekiguchi (JIRA) Sent: 17/09/2011, 11:49

RE: Solr plugin component resource cleanup?

2012-01-02 Thread karl.wright
This works fine for a SearchComponent, but if I try this for a QParserPlugin I get the following: [junit] org.apache.solr.common.SolrException: Invalid 'Aware' object: org.apache.solr.mcf.ManifoldCFQParserPlugin@18941f7 -- org.apache.solr.util.plugin.SolrCoreAware must be an instance of: [

RE: Solr plugin component resource cleanup?

2012-01-08 Thread karl.wright
I created a ticket for this: SOLR-3015. I hope there's a simple solution and I can just close it, but if not I will experiment and try to produce a patch. Karl From: Wright Karl (Nokia-LC/Boston) Sent: Monday, January 02, 2012 11:02 AM To: dev@lucene.apa

RE: Solr plugin component resource cleanup?

2012-01-11 Thread karl.wright
"SolrCoreAware" and "CloseHook" are related in that you need a SolrCore object in order to call SolrCore.addCloseHook(). Indeed, the javadoc for the CloseHook interface states that the expected way you are supposed to use this in a plugin is via something like this: public void inform(SolrCore

RE: Solr plugin component resource cleanup?

2012-01-11 Thread karl.wright
Thanks, Erik, this is not ideal but it will work for my purposes. But it seems a shame that the whole SolrCoreAware setup as it was designed turned out to be so problematic. Karl -Original Message- From: ext Erik Hatcher [mailto:erik.hatc...@gmail.com] Sent: Wednesday, January 11, 201

RE: Lucene 4.0 Beta

2012-01-11 Thread karl.wright
Having some interest in this issue, may I suggest setting a branch date? On the agreed-upon date, a branch is made. After that date, commits go to trunk and (maybe) are pulled up into the 4.0 branch. If the date is oh, say, 1 week away, people can plan accordingly to yield a relatively stable

Solr plugin component resource cleanup?

2011-12-20 Thread karl.wright
Is there a preferred time/manner for a Solr component (e.g. a SearchComponent) to clean up resources that have been allocated during the time of its existence, other than via a finalizer? There seems to be nothing for this in the NamedListInitializedPlugin interface, and yet if you allocate a r

Solr posting question

2012-07-12 Thread karl.wright
Hi all, I received a report of a problem with posting data to Solr. The post method is a multi-part form, so if you inspect it, it looks something like this: >> boundary--- Content-Disposition: form-data; name=metadata_attribute_name Content-Type: text; charset=utf-8 abc;def;ghi ---bou

RE: Solr posting question

2012-07-12 Thread karl.wright
I'll need to ask the reporter for more details since it appears the answer is not simple. It may even be an app server issue. Thanks Karl Sent from my Windows Phone -Original Message- From: ext Chris Hostetter Sent: 7/12/2012 8:29 PM To: dev@lucene.apache.org Subject: Re: Solr posting

RE: Solr posting question

2012-07-13 Thread karl.wright
Hoss, Here are the details: (1) The actual metadata posted is a string of the form "12345;#string". There is only one value posted for the metadata field, but Solr complains that we're trying to apply multiple values to a single-valued field and does not index the document, unless the ";"

RE: Solr posting question

2012-07-14 Thread karl.wright
I'm sorry the info has been dribbling in slowly; it's all now summarized in CONNECTORS-491. Now that I've confirmed that this even occurs for them without the ";" (unlike what I was originally told) it is clear it is a config related issue. I have urged them to look to this list for further he

Spatial4j dependency in lucene 4.0.0, final

2012-11-15 Thread karl.wright
Hi guys, The 4.0.0 lucene-spatial maven dependency on spatial4j is UNVERSIONED. But the two spatial4j versions in play (0.2 and 0.3) are significantly different. We have code developed for lucene-spatial 4.0.0 beta which doesn't seem to compile with either spatial4j version. What was the int

RE: Spatial4j dependency in lucene 4.0.0, final

2012-11-15 Thread karl.wright
Hi David, We found the version in the grandparent pom, so that's ok. The build issue against 0.2 was due to other changes in Lucene 4.0.0-BETA vs. Lucene 4.0.0. I am willing to assist to some extent with spatial4j, if that is yours. It changed significantly from 0.2 to 0.3, and not just in th

RE: Lucene tests killed one other SSD - Policeman Jenkins

2013-08-19 Thread karl.wright
I am told that SSDs are spec'd for only 70 full writes before they get an error. The error block is set aside but eventually something critical gets hit. So you should probably expect this to happen again. Karl -Original Message- From: ext Uwe Schindler [mailto:u...@thetaphi.d

RE: Lucene tests killed one other SSD - Policeman Jenkins

2013-08-19 Thread karl.wright
" Only 70 full writes seems a little bit low for an SSD." That's what I thought. I was astounded to learn that that is in fact correct (at least for some of the drives we are using here). Automatic recovery is how the SSD copes with this failure rate. But it is entirely possible that the caus

RE: Lucene tests killed one other SSD - Policeman Jenkins

2013-08-19 Thread karl.wright
Mike, I'm talking about a 1TB SSD option for some hardware we are buying. If you are really curious, I can ask the people who are doing the project for the model and specs. Karl -Original Message- From: ext Michael McCandless [mailto:luc...@mikemccandless.com] Sent: Monday, August 19,

RE: Lucene tests killed one other SSD - Policeman Jenkins

2013-08-19 Thread karl.wright
Right, that's what I said. And one write means writing the *whole* disk. So Mike and I may *both* be right. ;-) Karl -Original Message- From: ext Uwe Schindler [mailto:u...@thetaphi.de] Sent: Monday, August 19, 2013 1:07 PM To: dev@lucene.apache.org Subject: RE: Lucene tests killed on

Is there documentation anywhere describing interoperability of SolrJ?

2012-12-28 Thread karl.wright
Hi all, For the ManifoldCF project, we have an output connector for Solr, and we'd like to port it to use SolrJ instead of homegrown code. However, I cannot find any mention anywhere of whether anyone has tried to maintain compatibility between later versions of SolrJ (e.g. 4.0.0) and previous

RE: Is there documentation anywhere describing interoperability of SolrJ?

2012-12-28 Thread karl.wright
Thanks for the reply. The ticket in question is CONNECTORS-594, if you would like to just comment there. Karl Sent from my Windows Phone From: ext Ryan McKinley Sent: 12/28/2012 4:03 PM To: solr-...@lucene.apache.org Subject: Re: Is there documentation anywhere

Solrj/Tika question about content types

2013-01-17 Thread karl.wright
Hi all, I'm researching the ticket CONNECTORS-513. In this ticket we seem to have different behavior between Solr 3.x and Solr 4.x as far as Tika content extraction is concerned. The differences seem to be related to the content type that is posted to Solr, and can be demonstrated with cURL.

RE: Solrj/Tika question about content types

2013-01-17 Thread karl.wright
A quick update - it appears that cURL is providing a Content-Type header in the content part of its multipart post, and is using the file extension to come up with "text/plain". Changing the file name causes cURL to change this content-type to "application/octet-stream". But the questions stil

FW: Is there a really performant way to store a full 32-bit int in doc values?

2013-10-08 Thread karl.wright
Hi All (and especially Robert), Lucene NumericDocValues seems to operate slower than we would expect. In our application, we're using it for storing coordinate values, which we retrieve to compute a distance. While doing timings trying to determine the impact of including a sqrt in the calcul

RE: FW: Is there a really performant way to store a full 32-bit int in doc values?

2013-10-08 Thread karl.wright
. That is both the x & y into the same byte[] chunk. I've done this for a Solr integration in https://issues.apache.org/jira/browse/SOLR-5170 ~ David karl.wright-2 wrote > Hi All (and especially Robert), > > Lucene NumericDocValues seems to operate slower than we would ex
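
The reply above suggests keeping both x and y in the same byte[] chunk, so one lookup yields both coordinates. A minimal Java sketch of one way to pack two 32-bit ints into a single 8-byte array follows. The layout (big-endian, x first) and all names are assumptions for illustration; this is not the encoding used in SOLR-5170.

```java
import java.nio.ByteBuffer;

final class PackedPointSketch {

    // Packs x then y into one 8-byte array (ByteBuffer defaults to big-endian).
    static byte[] pack(int x, int y) {
        return ByteBuffer.allocate(8).putInt(x).putInt(y).array();
    }

    // Reads the two ints back in the same order they were written.
    static int[] unpack(byte[] packed) {
        ByteBuffer buf = ByteBuffer.wrap(packed);
        int x = buf.getInt();
        int y = buf.getInt();
        return new int[] {x, y};
    }

    public static void main(String[] args) {
        int[] point = unpack(pack(-123456, 789012));
        System.out.println(point[0] + ", " + point[1]);
    }
}
```

The point of such a layout is that a distance computation needs one per-document fetch instead of two separate numeric lookups.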

RE: Solrj/Tika question about content types

2013-02-13 Thread karl.wright
Wow, Hoss, this post was so long ago I barely remember writing it. ;-) The problem we were having is not that the content type is not set in SolrJ - it's that SolrCell does not discover it as it did when we used multipart posts and ran with Solr 3.6. We still aren't sure where the change is tha

RE: [VOTE] Lucene / Solr 4.6.0"

2013-11-14 Thread karl.wright
Congratulations, Uwe! Karl Sent from my Windows Phone From: ext Koji Sekiguchi Sent: 11/14/2013 6:35 PM To: dev@lucene.apache.org Subject: Re: [VOTE] Lucene / Solr 4.6.0" Congrats Uwe! :) koji (13/11/15 5:11), Uwe Schindler wrote: > The PMC Chair is going to mar

Solr best practices for search components vis-a-vis sharding

2013-11-20 Thread karl.wright
Hi folks, Maybe this is documented somewhere, and someone can point me at it. For the ManifoldCF Solr plugins, we supply a SearchComponent, which wraps the supplied query in order to perform authorization restrictions on returned documents. The component only fires if the SHARDS parameter is

RE: The Old Git Discussion

2014-01-03 Thread karl.wright
As an interested party, and deeply involved in another related Apache project, I have to say that there is a huge benefit for all Apache projects to use common source control. If we were starting over, or if svn was going to die forever, it might be a different story - but given that svn is ali

RE: The Old Git Discussion

2014-01-03 Thread karl.wright
It also doesn't deal with a major difference between git and svn - in svn, directories are first-class objects, and in git they aren't (they are created as needed). So when you try using gitsvn you almost always wind up with directories you want to remove but can't. Karl From: ext Michael Del

FW: Solr and LCF security at query time

2010-04-20 Thread karl.wright
FYI From: Wright Karl (Nokia-S/Cambridge) Sent: Tuesday, April 20, 2010 8:16 AM To: 'dominique.bej...@eolya.fr' Cc: 'solr-...@apache.org'; 'connectors-...@incubator.apache.org'; 'connectors-u...@incubator.apache.org' Subject: RE: Solr and LCF security at query tim

RE: FW: Solr and LCF security at query time

2010-04-20 Thread karl.wright
SOLR-1872 looks exactly like what I was envisioning, from the search query perspective, although instead of the acl xml file you specify LCF stipulates you would dynamically query the lcf-authority-service servlet for the access tokens themselves. That would get you support for AD, Documentum,

RE: FW: Solr and LCF security at query time

2010-04-20 Thread karl.wright
Hi Peter, I'm the principal committer for LCF, but I don't know as much about Solr as I ought to, so it sounds like a potentially productive collaboration. LCF does exactly what you are looking for - the only issue at all is that you need to fetch a URL from a webapp to get what you are looking

RE: FW: Solr and LCF security at query time

2010-04-21 Thread karl.wright
Hi Peter, I just committed the promised changes to the LCF Solr output connector. ACL metadata will now be posted to the Solr Http interface along with the document as the two following fields: __ACCESS_TOKEN__document __DENY_TOKEN__document There will, of course, potentially be multiple value

RE: FW: Solr and LCF security at query time

2010-04-22 Thread karl.wright
Looking around for no-Apache java-only solutions to the AD authentication problem, it seems to me that what we mainly have available is JAAS plus the following JAAS login module: com.sun.security.auth.module.Krb5LoginModule ... which should permit AD authentication to take place, if properly

RE: FW: Solr and LCF security at query time

2010-04-22 Thread karl.wright
Hi Peter, >> For general Solr access control, there's two layers of security that need to be addressed: 1. Authentication - make sure the incoming query is from a valid user, and the passed-in credentials (hash, certificate etc.) are correct 2. Query filtering - potentially reduce the nu

RE: FW: Solr and LCF security at query time

2010-04-27 Thread karl.wright
Hi Peter, I finally had a moment to review the SOLR-1872 and SOLR-1834 contributions in detail, and have a couple of SOLR-related questions. Both contributions rely on a SearchComponent to work their magic. However, it also appears that each modifies the user query in a different way. 1834 us

RE: FW: Solr and LCF security at query time

2010-04-27 Thread karl.wright
Ok, not hearing back from Peter, I've done some Solr research and written some code that might work. The approach I've taken is most similar to SOLR-1834, other than the LCF-centric logic. Hopefully there will be a chance to try this out in a full end-to-end way on the weekend, after which I

Solr query question

2010-04-28 Thread karl.wright
Hi Solr-knowledgeable folks, The LCF Solr SearchComponent plugin I'm developing doesn't quite work. The query I'm trying to do is: -(allow_token_document:*) and -(deny_token_document:*) and The result I'm seeing is that everything in the user's search matches, unlike what I see in the admin

RE: Solr query question

2010-04-28 Thread karl.wright
Turns out that, for the standard requestHandler, running this SearchComponent first causes its rewritten query to be lost. Running last fixed the problem. (I'd *love* to know why that would be necessary.) But I'd still like comment as to whether the WildcardFilter construct is expected to be

RE: Solr query question

2010-04-28 Thread karl.wright
Adding to the getFilters() list seems reasonable - although, to be fair, my code does seem to work as intended when the component is added "last". I'll do some experimentation and see which model works most consistently. TermRangeQuery doesn't seem to map readily to the functionality

RE: Solr query question

2010-04-28 Thread karl.wright
That's certainly an option, and I had thought of it already, but the downside is that you won't be able to search for documents that *aren't* indexed via LCF under that model. Which is why I wanted to try to make the other approach fly. FWIW, I was also told by a colleague that, because this is

RE: Solr query question

2010-04-28 Thread karl.wright
I tried the getFilters() approach. It turned out I also needed to create a list and do setFilters() if getFilters() returns null, but that was easily remedied. When this is done, it once again works fine if the component is added "last". But if it is added "first", we now get a stack trace fr
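
The null-handling step mentioned in this message (create a list and call setFilters() when getFilters() returns null) can be sketched as follows. The Holder class is a hypothetical stand-in for the object that exposes getFilters()/setFilters(), and filters are modelled as plain strings, so this shows only the pattern and not Solr's actual types.

```java
import java.util.ArrayList;
import java.util.List;

final class FilterListSketch {

    // Hypothetical stand-in: the filter list is null until someone sets it.
    static final class Holder {
        private List<String> filters;

        List<String> getFilters() {
            return filters;
        }

        void setFilters(List<String> filters) {
            this.filters = filters;
        }
    }

    // Appends a filter, creating and registering the list on first use.
    static void addFilter(Holder holder, String filter) {
        List<String> filters = holder.getFilters();
        if (filters == null) {
            filters = new ArrayList<>();
            holder.setFilters(filters);
        }
        filters.add(filter);
    }

    public static void main(String[] args) {
        Holder holder = new Holder();
        addFilter(holder, "allow-clause");
        addFilter(holder, "deny-clause");
        System.out.println(holder.getFilters());
    }
}
```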

RE: Solr query question

2010-04-28 Thread karl.wright
Turns out that FilteredQuery is what is causing the issue in this case. I removed FilteredQuery and constructed the search using Query objects instead of Filter objects, and everything is happy now. Karl From: Wright Karl (Nokia-S/Cambridge) Se

RE: FW: Solr and LCF security at query time

2010-04-28 Thread karl.wright
Hi Peter, I'm more than happy to hear your customer's requirements, so no problem there. It does seem to me that they are a bit different than what I've seen. I think there is plenty of room for different flavors of solution, so please by all means go ahead and propose your take on it! Karl

RE: FW: Solr and LCF security at query time

2010-04-29 Thread karl.wright
Putting access control lookup at search-result time has the following benefits: - It sees changes right away, when the underlying repository changes Here are the drawbacks, as far as I can see: - There's a significant extra load on the repository, because every search result has to be checked a

RE: FW: Solr and LCF security at query time

2010-04-29 Thread karl.wright
If we aren't talking about a repository of some kind, then we aren't talking about using LCF. If your design point is about applying security to NFS via an acl-xml file, your uploaded contribution will do that just fine (although I think you might want to use Filters in some places you are curr

RE: FW: Solr and LCF security at query time

2010-04-29 Thread karl.wright
Hi Peter, You should be able to use LCF authorities for your purposes. I'm less clear about what you mean by the "interface into decoupled acl storage". Existing repository connectors are not aware of any decoupled storage, and if you were to adopt the LCF model in its entirety, you've defeate

RE: Security Questions on Solr & Tomcat 6

2010-05-04 Thread karl.wright
How low-tech do you want to go? For example, you can run solr under an entirely different instance of tomcat, listening on a different port. You can configure (via server.xml) the instance to only accept connections from the local machine. Your application, which is happily running on a diffe

RE: Security Questions on Solr & Tomcat 6

2010-05-04 Thread karl.wright
>> Can you explain this localhost restriction thing? If I restrict it to localhost only would users on the internet still be able to access the solr instance? Would the application have to make the request and pass back the results to the external user? << Hi Matt, This connection bind

RE: Security Questions on Solr & Tomcat 6

2010-05-04 Thread karl.wright
That's not what I am talking about at all. Look inside your tomcat instance's server.xml file. There's a <Connector> tag in there somewhere. You just change that to: Note the "address" attribute. That's the one that causes local binding. Karl -Original Message- From: ext Matthew Mau
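
A minimal sketch of the kind of server.xml entry being described, assuming a stock Tomcat 6 HTTP connector. Only the "address" attribute comes from the message above; the port, protocol, connectionTimeout, and redirectPort values are illustrative defaults, not values from the original post.

```xml
<!-- server.xml, inside the <Service> element.
     All values except the idea of setting "address" are assumptions. -->
<Connector port="8080" protocol="HTTP/1.1"
           address="127.0.0.1"
           connectionTimeout="20000"
           redirectPort="8443" />
<!-- address="127.0.0.1" binds the listener to the loopback interface only,
     so connections from other machines are refused at the socket level. -->
```

With a binding like this, the front-end application on the same machine can still reach Solr at 127.0.0.1 on the chosen port, while external clients have to go through that application.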

RE: solr and analyzers module

2010-05-19 Thread karl.wright
Nobody in their right mind can disagree with (1). I should also point out that writing a custom analyzer is a very typical activity (as is a custom scorer), so this should be made as straightforward as possible. Karl -Original Message- From: ext Robert Muir [mailto:rcm...@gmail.com

Solr updateRequestHandler and performance vs. atomicity

2010-05-24 Thread karl.wright
Hi all, It seems to me that the "commit" logic in the Solr updateRequestHandler (or wherever the logic is actually located) conflates two different semantics. One semantic is what you need to do to make the index process perform well. The other semantic is guaranteed atomicity of document rec

RE: Solr updateRequestHandler and performance vs. atomicity

2010-05-24 Thread karl.wright
Hi Mark, Unfortunately, indexing performance *is* of concern, otherwise I'd already be committing on every post. If your guess is correct, you are basically saying that adding a document to an index in Solr/Lucene is just as fast as writing that file directly to the disk. Because, obviously,

RE: Solr updateRequestHandler and performance vs. atomicity

2010-05-24 Thread karl.wright
The reason for this is simple. LCF keeps track of which documents it has handed off to Solr, and has a fairly involved mechanism for making sure that every document LCF *thinks* got there, actually does. It even uses a mechanism akin to a 2-phase commit to make sure that its internal records a

RE: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread karl.wright
Hi Simon, I think you are on the right track. I believe it is not even possible to write a middleware-style layer that stores documents and performs periodic commits on its own, because the update request handler never ACKs individual documents on a commit, but merely everything it has seen si

RE: Solr updateRequestHandler and performance vs. atomicity

2010-05-25 Thread karl.wright
I created SOLR-1924. Let me know if it's clear enough, or if you'd like me to modify the ticket in any way. Thanks, Karl From: ext Mark Miller [markrmil...@gmail.com] Sent: Tuesday, May 25, 2010 5:20 AM To: dev@lucene.apache.org Subject: Re: Solr updateReq

RE: lucene-dev.jar vs lucene-SNAPSHOT.jar

2010-06-07 Thread karl.wright
I don't understand the -dev requirement either, but for maven the jar suffix names do matter, and that's where -SNAPSHOT comes in (it represents a nightly build in Maven naming parlance). I suspect that since some people will want Lucene without Solr, it is probably going to be necessary to cre

Solr spewage and dropped documents, while indexing

2010-06-07 Thread karl.wright
Hi folks, This morning I was experimenting with using multiple threads while indexing some 20,000,000 records worth of content. In fact, my test spun up some 50 threads, and happily chugged away for a couple of hours before I saw the following output from my test code: >> Http protocol er

Repeat to the right list: Solr spewage and possible re-entrancy problem?

2010-06-07 Thread karl.wright
Hi folks, This morning I was experimenting with using multiple threads while indexing some 20,000,000 records worth of content. In fact, my test spun up some 50 threads, and happily chugged away for a couple of hours before I saw the following output from my test code: >> Http protocol er
