Re: SolrJ Tutorial
I got the solution. Attached is one complete sample I made, as follows. Thanks, LB

package com.greatfree.Solr;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.client.solrj.beans.Field;
import java.net.MalformedURLException;

public class SolrJExample
{
    public static void main(String[] args) throws MalformedURLException, SolrServerException
    {
        // Query an existing core and print the number of matching documents.
        SolrServer solr = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/CategorizedHub");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        QueryResponse rsp = solr.query(query);
        SolrDocumentList docs = rsp.getResults();
        System.out.println(docs.getNumFound());

        try
        {
            // Index a POJO into another core via the bean binding support.
            SolrServer solrScore = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/score");
            Score score = new Score();
            score.id = 4;
            score.type = "modern";
            score.name = "iphone";
            score.score = 97;
            solrScore.addBean(score);
            solrScore.commit();
        }
        catch (Exception e)
        {
            System.out.println(e.toString());
        }
    }
}

On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog goks...@gmail.com wrote:
The unit tests are simple and show the steps. Lance

On Fri, Jan 21, 2011 at 10:41 PM, Bing Li lbl...@gmail.com wrote:
Hi, all, In the past, I always used SolrNet to interact with Solr. It works great. Now, I need to use SolrJ. I think it should be easier to do that than SolrNet since Solr and SolrJ should be homogeneous. But I cannot find a tutorial that is easy to follow. No tutorials explain the SolrJ programming step by step. No complete samples are found. Could anybody offer me some online resources to learn SolrJ? I also noticed Solr Cell and SolrJ POJO. Do you have detailed resources on them? Thanks so much! LB

-- Lance Norskog goks...@gmail.com
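The Score bean passed to addBean() above is not included in the post; a minimal sketch of what it might look like, assuming its members match fields in the score core's schema. The @Field annotation (from org.apache.solr.client.solrj.beans.Field) tells SolrJ's DocumentObjectBinder which Solr field each member maps to; with no value given it uses the member name.

package com.greatfree.Solr;

import org.apache.solr.client.solrj.beans.Field;

public class Score
{
    // Each annotated member is written to the Solr field of the same name
    // when addBean() converts the object into a document.
    @Field
    public int id;

    @Field
    public String type;

    @Field
    public String name;

    @Field
    public int score;
}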
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
Where do you get your Lucene/Solr downloads from?

[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream project)

-- Sami Siren
Re: Is solr 4.0 ready for prime time? (or other ways to use geo distance in search)
On Fri, Jan 21, 2011 at 11:53 PM, Lance Norskog goks...@gmail.com wrote:
The Solr 4 branch is nowhere near ready for prime time. For example, within the past week code was added that forces you to completely reindex all of the documents you had.

Solr 4 is really the trunk. The low-level stuff is being massively changed to allow very big performance improvements and new features. Changing the index format is not a sign of instability, we did this to improve performance. So, changing the index format is in no way a bad sign, nor indicative of whether or not the trunk is good for production use. You aren't forced to re-index all your documents if you are riding trunk -- it's your decision to make that tradeoff when you type 'svn update'. If you want stability you can take a snapshot (e.g. nightly build), and just stick with it.
Re: Is solr 4.0 ready for prime time? (or other ways to use geo distance in search)
I tried to build yesterday's svn trunk of 4.0 and got massive failures... The Hudson zipped-up version seems to work without any issues. Has anyone else seen this build issue on the Mac? I guess this also has to do with Grant's recent poll... Adam

On Jan 22, 2011, at 6:34 AM, Robert Muir rcm...@gmail.com wrote:
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream project)

Please put an X in the box that applies to you. Multiple selections are OK (for instance, if one project uses a mirror and another uses Maven). Please do not turn this thread into a discussion on Maven and its (de)merits; I simply want to know, informally, where people get their JARs from. In other words, no discussion is necessary (we already have that going on d...@lucene.apache.org, which you are welcome to join.) Thanks, Grant
SolrCloud Questions for MultiCore Setup
Hello list, i want to experiment with the new SolrCloud feature. So far, I got absolutely no experience in distributed search with Solr. However, there are some things that remain unclear to me:

1) What is the use case of a collection? As far as I understood: A collection is the same as a core but in a distributed sense. It contains a set of cores on one or multiple machines. It makes sense that all the cores in a collection got the same schema and solrconfig - right? Can someone tell me if I understood the concept of a collection correctly?

2) The wiki says this will cause an update -Durl=http://localhost:8983/solr/collection1/update However, as far as I know this causes an update to a CORE named collection1 at localhost:8983, not to the full collection. Am I correct here? So *I* have to care about consistency between the different replicas inside my cloud?

3) If I got replicas of the same shard inside a collection, how does SolrCloud determine that two documents in a result set are equal? Is it necessary to define a unique key? Is it random which of the two documents is picked into the final resultset?

--- I think these are my most basic questions. However, there is one more tricky thing: If I understood the collection-idea correctly: What happens if I create two cores and each core belongs to a different collection and THEN I do a SWAP. Say: core1-collection1, core2-collection2 SWAP core1,core2 Does core2 now map to collection1? Thank you!

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2309443.html Sent from the Solr - User mailing list archive at Nabble.com.
api key filtering
Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to allowed db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, they'll be queried using a filter query on api_key. The only concern of mine is that, what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
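A minimal SolrJ sketch of the approach described above, for illustration only: the host, core, document values and field names are made up, and api_key is assumed to be declared as an indexed, non-stored, multi-valued field in the schema.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class ApiKeyFilterExample
{
    public static void main(String[] args) throws Exception
    {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Index time: attach every api key that is allowed to see this record.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "record-1");
        doc.addField("api_key", "key-abc");
        doc.addField("api_key", "key-xyz");
        solr.add(doc);
        solr.commit();

        // Query time: restrict results to the caller's key with a filter query.
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("api_key:key-abc");
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound());
    }
}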
Re: api key filtering
The only way that you would have that many api keys per record, is if one of them represented 'public', right? 'public' is a ROLE. Your answer is to use RBAC style techniques. Here are some links that I have on the subject. What I'm thinking of doing is: Sorry for formatting, Firefox is freaking out. I cut and pasted these from an email from my sent box. I hope the links came out. Part 1 http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ Part2 Role-based access control in SQL, part 2 at Xaprb ACL/RBAC Bookmarks ALL UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone Linux file and directory permissions MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root Password per RECORD/Entity permissions? - symfony users | Google Groups Special Topics: Authentication and Authorization | The Definitive Guide to Yii | Yii Framework att.net Mail (gear...@sbcglobal.net) Solr - User - Modelling Access Control PHP Generic Access Control Lists Row-level Model Access Control for CakePHP « some flot, some jet Row-level Model Access Control for CakePHP « some flot, some jet Yahoo! GeoCities: Get a web site with easy-to-use site building tools. Class that acts as a client to a JSON service : JSON « GWT « Java Juozas Kaziukėnas devBlog Re: [symfony-users] Implementing an existing ACL API in symfony php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow W3C ACL System makeAclTables.sql SchemaWeb - Classes And Properties - ACL Schema Reardon's Ruminations: Spring Security ACL Schema for Oracle trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Acl.php - kohana-mptt - Project Hosting on Google Code Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform The page cannot be found Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 11:48:22 AM Subject: api key filtering Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to allowed db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, they'll be queried using a filter query on api_key. The only concern of mine is that, what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
Re: api key filtering
Hey thanks I'll definitely have a read. The only problem with this though, is that our api is a thin layer of app-code, with solr only (no db), we index data from our sql db into solr, and push the index off for consumption. The only other idea I had was to send a list of the allowed document ids along with every solr query, but then I'm sure I'd run into a filter query limit. Each key could be associated with up to 2k documents, so that's 2k values in an fq which would probably be too many for lucene (I think its limit is 1024). Matt

On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon gear...@sbcglobal.net wrote:
Re: Solr with many indexes
See below.

On Wed, Jan 19, 2011 at 7:26 PM, Joscha Feth jos...@feth.com wrote:
Hello Erick, Thanks for your answer! But I question why you *require* many different indexes. [...] including isolating one users' data from all others, [...] Yes, that's exactly what I am after - I need to make sure that indexes don't mix, as every user shall only be able to query his own data (index).

Well, this can also be handled by simply appending the equivalent of +user:theuser to each query. This solution does have some interesting side effects though. In particular if you autosuggest based on combined documents, users will see terms NOT in documents they own.

And even using lots of cores can be made to work if you don't pre-warm newly-opened cores, assuming that the response time when using cold searchers is adequate. Could you explain that further or point me to some documentation? Are you talking about: http://wiki.apache.org/solr/CoreAdmin#UNLOAD? if yes, LOAD does not seem to be implemented, yet. Or has this something to do with http://wiki.apache.org/solr/SolrCaching#autowarmCount only? About what time per X documents are we talking here for delay if auto warming is disabled? Is there more documentation about this setting?

It's the autoWarm parameter. When you open a core the first few queries that run on it will pay some penalty for filling caches etc. If your cores are small enough, then this penalty may not be noticeable to your users, in which case you can just not bother autowarming (see firstSearcher, newSearcher). You might also be able to get away with having very small caches, it mostly depends on your usage patterns. If your pattern is that a user signs on, makes one search and signs off, there may not be much good in having large caches. On the other hand, if users sign on and search for hours continually, their experience may be enhanced by having significant caches. It all depends.

Hope that helps
Erick

Kind regards, Joscha
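A minimal sketch of the single-index, per-user filter Erick describes, reusing the SolrJ classes from the example earlier in this digest; the user field name and query text are made up, and the filter value would come from the authenticated session.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerUserFilterExample
{
    public static void main(String[] args) throws Exception
    {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One shared index: restrict results to the signed-on user with the
        // equivalent of +user:theuser, rather than a core per user. The
        // filter query is cached in the filterCache, so repeated searches
        // by the same user reuse the cached result set.
        SolrQuery query = new SolrQuery("some search terms");
        query.addFilterQuery("user:theuser");
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound());
    }
}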
Re: api key filtering
The links didn't work, so here they are again, NOT from a sent folder:

PHP Access Control - PHP5 CMS Framework Development | PHP Zone A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb UserRbac - symfony - Trac Acl.php - kohana-mptt - Project Hosting on Google Code CANDIDATE-PHP Generic Access Control Lists http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql makeAclTables.sql php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow PHP Generic Access Control Lists Reardon's Ruminations: Spring Security ACL Schema for Oracle Re: [symfony-users] Implementing an existing ACL API in symfony SchemaWeb - Classes And Properties - ACL Schema trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Using Zend_Acl with a database backend - Zend Framework Wiki W3C ACL System

Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

- Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 12:50:24 PM Subject: Re: api key filtering
Re: solrconfig.xml settings question
Yep, that's about it. By far the main constraint is memory and the caches are what eats it up. So by minimizing the caches on the master (since they are filled by searching) you speed that part up. By maximizing the cache settings on the searchers, you make them go as fast as possible. RamBufferSize is irrelevant on the searcher. It governs how much data is stored in RAM when *indexing* before flushing to disk. This usually gets to diminishing returns at 128M BTW.

Oh, there is one other thing on the searchers that can really hurt: too frequent polling of the master. If the master is furiously indexing, polling too often can lead to thrashing if the time it takes to autowarm is longer than the polling interval, which you can only figure out by measuring...

Best Erick

On Thu, Jan 20, 2011 at 8:34 AM, kenf_nc ken.fos...@realestate.com wrote:
Is that it? Of all the strange, esoteric, little understood configuration settings available in solrconfig.xml, the only thing that affects Index Speed vs Query Speed is turning on/off the Query Cache and RamBufferSize? And for the latter, why wouldn't RamBufferSize be the same for both...that is, as high as you can make it?

-- View this message in context: http://lucene.472066.n3.nabble.com/solrconfig-xml-settings-question-tp2271594p2294668.html Sent from the Solr - User mailing list archive at Nabble.com.
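For reference, the two knobs discussed above live in solrconfig.xml (ramBufferSizeMB in the indexDefaults/mainIndex section, the caches in the query section). A sketch only; the values are just the examples from this thread, not recommendations:

<!-- Indexing side: how much is buffered in RAM before flushing to disk.
     Diminishing returns are typically reached around 128 MB, as noted above. -->
<ramBufferSizeMB>128</ramBufferSizeMB>

<!-- Searching side: caches are what consume memory, so keep them small (or
     autowarmCount="0") on the master and size them generously on the searchers. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>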
Re: api key filtering
Dang! There were hot, clickable links in the web mail I put them in. I guess you guys can search for those strings on google and find them. Sorry.

- Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 1:09:26 PM Subject: Re: api key filtering
Re: Indexing same data in multiple fields with different filters
I'm assuming that this is just one example of many different kinds of transformations you could do. It *seems* like a variant of a synonym analyzer, so you could write a custom analyzer (it's not actually hard) to create a bunch of synonyms for your special terms at index time. Or you could use the synonyms at query time (query time is more flexible).

Best Erick

On Thu, Jan 20, 2011 at 5:38 AM, shm s...@dbc.dk wrote:
Hi, I have a little problem regarding indexing, that i don't know how to solve, i need to index the same data in different ways into the same field. The problem is a normalization problem, and here is an example: I have a special character \uA732, which i need to normalize in two different ways for phrase searching. So if i encounter this character in, for example, the title field I would like it to result in these two phrase fields: raw data = \uA732lborg phrase.title = ålborg phrase.title = aalborg Because both ways are valid representations of the phrase. I can copy the field from the raw data, but then i cannot normalize them differently, so i am at a loss. Does anyone have a solution or a good idea? Regards shm
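An alternative to the synonym approach that avoids custom code is two copyField targets whose analyzers differ only in a MappingCharFilterFactory mapping file. A schema.xml sketch with made-up field, type and file names; each mapping file would hold a single rule for \uA732:

<!-- mapping-aa.txt would contain:  "\uA732" => "aa"
     mapping-a.txt  would contain:  "\uA732" => "å"   -->

<fieldType name="phrase_aa" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-aa.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="phrase_a" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-a.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="phrase_title_aa" type="phrase_aa" indexed="true" stored="false"/>
<field name="phrase_title_a"  type="phrase_a"  indexed="true" stored="false"/>

<copyField source="title" dest="phrase_title_aa"/>
<copyField source="title" dest="phrase_title_a"/>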
Re: Multicore Relaod Theoretical Question
This seems far too complex to me. Why not just optimize on the master and let replication do all the rest for you? Best Erick On Fri, Jan 21, 2011 at 1:07 PM, Em mailformailingli...@yahoo.de wrote: Hi, are there no experiences or thoughts? How would you solve this at Lucene-Level? Regards Em wrote: Hello list, I got a theoretical question about a Multicore-Situation: I got two cores: active, inactive The active core serves all the queries. The inactive core is the tricky thing: I create an optimized index outside the environment and want to insert that optimized index 1 to 1 into the inactive core, which means replacing everything inside the index-directory. After this is done, I would like to reload the inactive core, so that it is ready for a core-swap and ready for serving queries on top of the new inserted optimized index. Is it possible to handle such a situation? Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2303585.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: api key filtering
1024 is the default number, it can be increased. See maxBooleanClauses in solrconfig.xml. This shouldn't be a problem with 2K clauses, but expanding it to tens of thousands is probably a mistake (but test to be sure).

Best Erick

On Sat, Jan 22, 2011 at 3:50 PM, Matt Mitchell goodie...@gmail.com wrote:
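For reference, the setting lives in the query section of solrconfig.xml; a sketch showing the default mentioned above. Raising it only permits larger boolean/filter queries, which are themselves what get expensive, so test before going much higher.

<!-- solrconfig.xml, inside the <query> section -->
<maxBooleanClauses>1024</maxBooleanClauses>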
Re: Multicore Relaod Theoretical Question
Hi Erick, thanks for your response. Yes, it's really not that easy. However, the target is to avoid any kind of master-slave-setup. The most recent idea i got is to create a new core with a data-dir pointing to an already existing directory with a fully optimized index. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2310709.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: api key filtering
Got it, here are the links that I have on RBAC/ACL/Access Control. Some of these are specific to Solr.

http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
http://www.xaprb.com/blog/2006/08/18/role-based-access-control-in-sql-part-2/
http://php.dzone.com/articles/php-access-control?page=0,1
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://www.tonymarston.net/php-mysql/menuguide/appendixc.html
http://php.dzone.com/articles/php-access-control?page=0,1
http://trac.symfony-project.org/wiki/UserRbac
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://www.tonymarston.net/php-mysql/menuguide/appendixc.html
http://trac.symfony-project.org/wiki/UserRbac
http://code.google.com/p/kohana-mptt/source/browse/trunk/acl/libraries/Acl.php?r=82
http://www.oracle.com/technetwork/articles/javaee/ajax-135201.html
http://phpgacl.sourceforge.net/
http://www.java2s.com/Code/Java/GWT/ClassthatactsasaclienttoaJSONservice.htm
http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql
http://dev.juokaz.com/
http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql
http://stackoverflow.com/questions/54230/cakephp-acl-database-setup-aro-aco-structure
http://phpgacl.sourceforge.net/
http://blog.reardonsoftware.com/2010/07/spring-security-acl-schema-for-oracle.html
http://www.mail-archive.com/symfony-users@googlegroups.com/msg29537.html
http://www.schemaweb.info/schema/SchemaInfo.aspx?id=167
http://www.assembla.com/code/backendpro/subversion/nodes/trunk/modules/auth/libraries/Khacl.php?rev=169
http://framework.zend.com/wiki/display/ZFUSER/Using+Zend_Acl+with+a+database+backend
http://www.w3.org/2001/04/20-ACLs#Structure
http://lucene.472066.n3.nabble.com/Modelling-Access-Control-td1756817.html#a1759372
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://phpgacl.sourceforge.net/
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/#comment-112
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/
http://www.xaprb.com/blog/2006/08/18/role-based-access-control-in-sql-part-2/
http://php.dzone.com/articles/php-access-control?page=0,1
https://issues.apache.org/jira/browse/SOLR-1834
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://php.dzone.com/articles/php-access-control?page=0,1
http://www.yiiframework.com/doc/guide/1.1/en/topics.auth#role-based-access-control
http://lucene.472066.n3.nabble.com/Modelling-Access-Control-td1756817.html#a1759372
http://phpgacl.sourceforge.net/
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/#comment-112
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/
http://www.yiiframework.com/doc/guide/topics.auth#role-based-access-control

- Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 1:22:04 PM Subject: Re: api key filtering
Re: Multicore Relaod Theoretical Question
Em, yes, you can replace the index (get the new one into a separate folder like index.new and then rename it to the index folder) outside the Solr, then just do the http call to reload the core. Note that the old index files may still be in use (continue to serve the queries while reloading), even if the old index folder is deleted - that is on Linux filesystems, not sure about NTFS. That means the space on disk will be freed only when the old files are not referenced by Solr searcher any longer. -Alexander On Sat, Jan 22, 2011 at 1:51 PM, Em mailformailingli...@yahoo.de wrote: Hi Erick, thanks for your response. Yes, it's really not that easy. However, the target is to avoid any kind of master-slave-setup. The most recent idea i got is to create a new core with a data-dir pointing to an already existing directory with a fully optimized index. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2310709.html Sent from the Solr - User mailing list archive at Nabble.com.
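The reload Alexander mentions can be issued as a plain CoreAdmin HTTP call (e.g. http://localhost:8983/solr/admin/cores?action=RELOAD&core=inactive) or from SolrJ. A minimal sketch; the host, port and core name "inactive" are assumptions taken from this thread's example setup:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCoreExample
{
    public static void main(String[] args) throws Exception
    {
        // Points at the CoreAdmin handler (the root /solr URL, not a core).
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Reload the core whose index directory was just replaced, so a new
        // searcher opens on the new files before the cores are swapped.
        CoreAdminRequest.reloadCore("inactive", admin);
    }
}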
Re: old index files not deleted on slave
The file system checked out, I also tried creating a slave on a different machine and could reproduce the issue. I logged SOLR-2329. On Sat, Dec 18, 2010 at 8:01 PM, Lance Norskog goks...@gmail.com wrote: This could be a quirk of the native locking feature. What's the file system? Can you fsck it? If this error keeps happening, please file this. It should not happen. Add the text above and also your solrconfigs if you can. One thing you could try is to change from the native locking policy to the simple locking policy - but only on the child. On Sat, Dec 18, 2010 at 4:44 PM, feedly team feedly...@gmail.com wrote: I have set up index replication (triggered on optimize). The problem I am having is the old index files are not being deleted on the slave. After each replication, I can see the old files still hanging around as well as the files that have just been pulled. This causes the data directory size to increase by the index size every replication until the disk fills up. Checking the logs, I see the following error: SEVERE: SnapPull failed org.apache.solr.common.SolrException: Index fetch failed : at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:265) at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/solrhome/data/index/lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1065) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:954) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:192) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173) at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376) at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319) ... 11 more lsof reveals that the file is still opened from the java process. I am running 4.0 rev 993367 with patch SOLR-1316. Otherwise, the setup is pretty vanilla. The OS is linux, the indexes are on local directories, write permissions look ok, nothing unusual in the config (default deletion policy, etc.). 
Contents of the index data dir: master: -rw-rw-r-- 1 feeddo feeddo 191 Dec 14 01:06 _1lg.fnm -rw-rw-r-- 1 feeddo feeddo 26M Dec 14 01:07 _1lg.fdx -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 14 01:07 _1lg.fdt -rw-rw-r-- 1 feeddo feeddo 474M Dec 14 01:12 _1lg.tis -rw-rw-r-- 1 feeddo feeddo 15M Dec 14 01:12 _1lg.tii -rw-rw-r-- 1 feeddo feeddo 144M Dec 14 01:12 _1lg.prx -rw-rw-r-- 1 feeddo feeddo 277M Dec 14 01:12 _1lg.frq -rw-rw-r-- 1 feeddo feeddo 311 Dec 14 01:12 segments_1ji -rw-rw-r-- 1 feeddo feeddo 23M Dec 14 01:12 _1lg.nrm -rw-rw-r-- 1 feeddo feeddo 191 Dec 18 01:11 _24e.fnm -rw-rw-r-- 1 feeddo feeddo 26M Dec 18 01:12 _24e.fdx -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 01:12 _24e.fdt -rw-rw-r-- 1 feeddo feeddo 483M Dec 18 01:23 _24e.tis -rw-rw-r-- 1 feeddo feeddo 15M Dec 18 01:23 _24e.tii -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 01:23 _24e.prx -rw-rw-r-- 1 feeddo feeddo 283M Dec 18 01:23 _24e.frq -rw-rw-r-- 1 feeddo feeddo 311 Dec 18 01:24 segments_1xz -rw-rw-r-- 1 feeddo feeddo 23M Dec 18 01:24 _24e.nrm -rw-rw-r-- 1 feeddo feeddo 191 Dec 18 13:15 _25z.fnm -rw-rw-r-- 1 feeddo feeddo 26M Dec 18 13:16 _25z.fdx -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 13:16 _25z.fdt -rw-rw-r-- 1 feeddo feeddo 484M Dec 18 13:35 _25z.tis -rw-rw-r-- 1 feeddo feeddo 15M Dec 18 13:35 _25z.tii -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 13:35 _25z.prx -rw-rw-r-- 1 feeddo feeddo 284M Dec 18 13:35 _25z.frq -rw-rw-r-- 1 feeddo feeddo 20 Dec 18 13:35
Re: SolrCloud Questions for MultiCore Setup
A collection is your data, like newspaper articles or movie titles. It is a user-level concept, not really a Solr design concept. A core is a Solr/Lucene index. It is addressable as solr/collection-name on one machine. You can use a core to store a collection, or you can break it up among multiple cores (usually for performance reasons). When you use a core like this, it is called a shard. All of the different shards of a collection form the collection. Solr has a feature called Distributed Search that presents the separate shards as if they were one Solr collection. You should set up Distributed Search first. It does not use SolrCloud, but shows you how these ideas work. After that, SolrCloud will make more sense.

Lance

On Sat, Jan 22, 2011 at 9:35 AM, Em mailformailingli...@yahoo.de wrote:

-- Lance Norskog goks...@gmail.com
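A small sketch of the Distributed Search request Lance refers to, reusing the SolrJ classes from the example earlier in this digest; the hosts and core names are made up. Note that distributed search needs a uniqueKey field, which is also how duplicate documents from different shards are detected (this relates to question 3 above).

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchExample
{
    public static void main(String[] args) throws Exception
    {
        // Any one core can receive the request and aggregate the others.
        SolrServer solr = new CommonsHttpSolrServer("http://host1:8983/solr/core1");

        // The shards parameter lists the cores that together hold the collection.
        SolrQuery query = new SolrQuery("*:*");
        query.set("shards", "host1:8983/solr/core1,host2:8983/solr/core2");
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound());
    }
}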
Re: old index files not deleted on slave
I see the file -rw-rw-r-- 1 feeddo feeddo 0 Dec 15 01:19 lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock was created on Dec. 15. At the end of the replication, as far as I remember, the SnapPuller tries to open the writer to ensure the old files are deleted, and in your case it cannot obtain a lock on the index folder on Dec 16, 17, 18. Can you reproduce the problem if you delete the lock file, restart the slave and try replication again? Do you have any other Writer(s) open for this folder outside of this core? -Alexander

On Sat, Jan 22, 2011 at 3:52 PM, feedly team feedly...@gmail.com wrote:
Re: DIH with full-import and cleaning still keeps old index
You're not doing an optimize; I think optimize would delete your old index. Try it out with the additional parameter optimize=true - Espen

On Thu, Jan 20, 2011 at 11:30 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:
Hi list, after sending full-import=true&clean=true&commit=true Solr 4.x (apache-solr-4.0-2010-11-24_09-25-17) responds with:
- DataImporter doFullImport
- DirectUpdateHandler2 deleteAll ...
- DocBuilder finish
- SolrDeletionPolicy.onCommit: commits:num=2
- SolrDeletionPolicy updateCommits
- SolrIndexSearcher init
- INFO: end_commit_flush
- SolrIndexSearcher warm ...
- QuerySenderListener newSearcher
- SolrCore registerSearcher
- SolrIndexSearcher close ...
This all looks good to me but why is the old index not deleted? Am I missing a parameter? Regards, Bernd
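For reference, the full-import call with that parameter added might look like this; the host, port and handler path are assumptions (the DataImportHandler is commonly mounted at /dataimport):

http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true&optimize=true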
RE: api key filtering
If you COULD solve your problem by indexing 'public', or other tokens from a limited vocabulary of document roles, in a field -- then I'd definitely suggest you look into doing that, rather than doing odd things with Solr instead. If the only barrier is not currently having sufficient logic at the indexing stage to do that, then it is going to end up being a lot less of a headache in the long term to simply add a layer at the indexing stage to add that in, than trying to get Solr to do things outside of its, well, 'comfort zone'.

Of course, depending on your requirements, it might not be possible to do that, maybe you can't express the semantics in terms of a limited set of roles applied to documents. And then maybe your best option really is sending an up to 2k element list (not exactly the same list every time, presumably) of acceptable documents to Solr with every query, and maybe you can get that to work reasonably. Depending on how many different complete lists of documents you have, maybe there's a way to use Solr caches effectively in that situation, or maybe that's not even necessary since lookup by unique id should be pretty quick anyway, not really sure.

But if the semantics are possible, much better to work with Solr rather than against it, it's going to take a lot less tinkering to get Solr to perform well if you can just send an fq=role:public or something, instead of a list of document IDs. You won't need to worry about it, it'll just work, because you know you're having Solr do what it's built to do. Totally worth a bit of work to add a logic layer at the indexing stage. IMO.

From: Erick Erickson [erickerick...@gmail.com] Sent: Saturday, January 22, 2011 4:50 PM To: solr-user@lucene.apache.org Subject: Re: api key filtering
Re: api key filtering
Totally agree, do it at indexing time, in the index.

Dennis Gearon
Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

----- Original Message -----
From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org
Sent: Sat, January 22, 2011 5:28:50 PM
Subject: RE: api key filtering
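To make "do it at indexing time, in the index" concrete, here is a rough SolrJ sketch that stamps each document with the roles (or api keys) allowed to see it as it is indexed. The core URL and field names are invented for illustration, and the access field is assumed to be declared multivalued in schema.xml.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import java.io.IOException;
import java.net.MalformedURLException;

public class IndexWithRoles {
    public static void main(String[] args)
            throws MalformedURLException, SolrServerException, IOException {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr/docs");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("title", "some indexed document");
        // Multivalued access field: every role (or api key) allowed to see this doc.
        // This is the extra "logic layer at the indexing stage" discussed above.
        doc.addField("role", "public");
        doc.addField("role", "partner-key-123");   // hypothetical api key value

        solr.add(doc);
        solr.commit();
    }
}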
Re: api key filtering
I think that indexing the access information is going to work nicely, and I agree that sticking with the simplest, Solr-native way is best. The constraint is super simple: you can either view this set of documents or you can't, based on an api key: fq=api_key:xxx

Thanks for the feedback on this, guys!

Matt
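For the per-key scheme Matt settles on, the query side could look something like the sketch below. The api_key field name is taken from his message; the URL and key value are placeholders, and the key should be validated in the application layer before being dropped into the filter string.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import java.net.MalformedURLException;

public class ApiKeyFilterQuery {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        String apiKey = "xxx";   // placeholder; comes from the caller's credentials

        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr/docs");

        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        // A single-term filter per key: no 2k-clause ID list, so the default
        // maxBooleanClauses limit of 1024 never comes into play, and the
        // filterCache holds one cached entry per distinct api_key filter.
        query.addFilterQuery("api_key:" + apiKey);

        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound() + " documents visible to this key");
    }
}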