Re: SolrJ Tutorial
I got the solution. Attached is one complete sample I made, as follows. Thanks, LB

package com.greatfree.Solr;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.client.solrj.beans.Field;
import java.net.MalformedURLException;

public class SolrJExample
{
    public static void main(String[] args) throws MalformedURLException, SolrServerException
    {
        // Query an existing core and print the number of matching documents.
        SolrServer solr = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/CategorizedHub");
        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        QueryResponse rsp = solr.query(query);
        SolrDocumentList docs = rsp.getResults();
        System.out.println(docs.getNumFound());

        try
        {
            // Index a POJO into another core via the bean binding support.
            SolrServer solrScore = new CommonsHttpSolrServer("http://192.168.210.195:8080/solr/score");
            Score score = new Score();
            score.id = 4;
            score.type = "modern";
            score.name = "iphone";
            score.score = 97;
            solrScore.addBean(score);
            solrScore.commit();
        }
        catch (Exception e)
        {
            System.out.println(e.toString());
        }
    }
}

On Sat, Jan 22, 2011 at 3:58 PM, Lance Norskog goks...@gmail.com wrote:
The unit tests are simple and show the steps. Lance

On Fri, Jan 21, 2011 at 10:41 PM, Bing Li lbl...@gmail.com wrote:
Hi, all, In the past, I always used SolrNet to interact with Solr. It works great. Now, I need to use SolrJ. I think it should be easier to do that than SolrNet since Solr and SolrJ should be homogeneous. But I cannot find a tutorial that is easy to follow. No tutorials explain the SolrJ programming step by step. No complete samples are found. Could anybody offer me some online resources to learn SolrJ? I also noticed Solr Cell and SolrJ POJO. Do you have detailed resources on them? Thanks so much! LB

-- Lance Norskog goks...@gmail.com
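The Score bean passed to addBean() above is not included in the post; a minimal sketch of what it might look like, assuming its members match fields in the score core's schema. The @Field annotation (from org.apache.solr.client.solrj.beans.Field) tells SolrJ's DocumentObjectBinder which Solr field each member maps to; with no value given it uses the member name.

package com.greatfree.Solr;

import org.apache.solr.client.solrj.beans.Field;

public class Score
{
    // Each annotated member is written to the Solr field of the same name
    // when addBean() converts the object into a document.
    @Field
    public int id;

    @Field
    public String type;

    @Field
    public String name;

    @Field
    public int score;
}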
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
Where do you get your Lucene/Solr downloads from?

[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[X] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[X] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream project)

-- Sami Siren
Re: Is solr 4.0 ready for prime time? (or other ways to use geo distance in search)
On Fri, Jan 21, 2011 at 11:53 PM, Lance Norskog goks...@gmail.com wrote:
The Solr 4 branch is nowhere near ready for prime time. For example, within the past week code was added that forces you to completely reindex all of the documents you had.

Solr 4 is really the trunk. The low-level stuff is being massively changed to allow very big performance improvements and new features. Changing the index format is not a sign of instability, we did this to improve performance. So, changing the index format is in no way a bad sign, nor indicative of whether or not the trunk is good for production use. You aren't forced to re-index all your documents if you are riding trunk -- it's your decision to make that tradeoff when you type 'svn update'. If you want stability you can take a snapshot (e.g. nightly build), and just stick with it.
Re: Is solr 4.0 ready for prime time? (or other ways to use geo distance in search)
I tried to build yesterday's svn trunk of 4.0 and got massive failures... The Hudson zipped-up version seems to work without any issues. Has anyone else seen this build issue on the Mac? I guess this also has to do with Grant's recent poll... Adam

On Jan 22, 2011, at 6:34 AM, Robert Muir rcm...@gmail.com wrote:
Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?
[] ASF Mirrors (linked in our release announcements or via the Lucene website)
[x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.)
[] I/we build them from source via an SVN/Git checkout.
[] Other (someone in your company mirrors them internally or via a downstream project)

Please put an X in the box that applies to you. Multiple selections are OK (for instance, if one project uses a mirror and another uses Maven). Please do not turn this thread into a discussion on Maven and its (de)merits; I simply want to know, informally, where people get their JARs from. In other words, no discussion is necessary (we already have that going on d...@lucene.apache.org, which you are welcome to join.) Thanks, Grant
SolrCloud Questions for MultiCore Setup
Hello list, i want to experiment with the new SolrCloud feature. So far, I got absolutely no experience in distributed search with Solr. However, there are some things that remain unclear to me:

1) What is the use case of a collection? As far as I understood: A collection is the same as a core but in a distributed sense. It contains a set of cores on one or multiple machines. It makes sense that all the cores in a collection got the same schema and solrconfig - right? Can someone tell me if I understood the concept of a collection correctly?

2) The wiki says this will cause an update -Durl=http://localhost:8983/solr/collection1/update However, as far as I know this causes an update to a CORE named collection1 at localhost:8983, not to the full collection. Am I correct here? So *I* have to care about consistency between the different replicas inside my cloud?

3) If I got replicas of the same shard inside a collection, how does SolrCloud determine that two documents in a result set are equal? Is it necessary to define a unique key? Is it random which of the two documents is picked into the final resultset?

--- I think these are my most basic questions. However, there is one more tricky thing: If I understood the collection-idea correctly: What happens if I create two cores and each core belongs to a different collection and THEN I do a SWAP. Say: core1-collection1, core2-collection2 SWAP core1,core2 Does core2 now map to collection1? Thank you!

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-Questions-for-MultiCore-Setup-tp2309443p2309443.html Sent from the Solr - User mailing list archive at Nabble.com.
api key filtering
Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to allowed db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, they'll be queried using a filter query on api_key. The only concern of mine is that, what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
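A minimal SolrJ sketch of the approach described above, for illustration only: the host, core, document values and field names are made up, and api_key is assumed to be declared as an indexed, non-stored, multi-valued field in the schema.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

public class ApiKeyFilterExample
{
    public static void main(String[] args) throws Exception
    {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Index time: attach every api key that is allowed to see this record.
        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "record-1");
        doc.addField("api_key", "key-abc");
        doc.addField("api_key", "key-xyz");
        solr.add(doc);
        solr.commit();

        // Query time: restrict results to the caller's key with a filter query.
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("api_key:key-abc");
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound());
    }
}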
Re: api key filtering
The only way that you would have that many api keys per record, is if one of them represented 'public', right? 'public' is a ROLE. Your answer is to use RBAC style techniques. Here are some links that I have on the subject. What I'm thinking of doing is: Sorry for formatting, Firefox is freaking out. I cut and pasted these from an email from my sent box. I hope the links came out. Part 1 http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/ Part2 Role-based access control in SQL, part 2 at Xaprb ACL/RBAC Bookmarks ALL UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone Linux file and directory permissions MySQL :: MySQL 5.0 Reference Manual :: C.5.4.1 How to Reset the Root Password per RECORD/Entity permissions? - symfony users | Google Groups Special Topics: Authentication and Authorization | The Definitive Guide to Yii | Yii Framework att.net Mail (gear...@sbcglobal.net) Solr - User - Modelling Access Control PHP Generic Access Control Lists Row-level Model Access Control for CakePHP « some flot, some jet Row-level Model Access Control for CakePHP « some flot, some jet Yahoo! GeoCities: Get a web site with easy-to-use site building tools. Class that acts as a client to a JSON service : JSON « GWT « Java Juozas Kaziukėnas devBlog Re: [symfony-users] Implementing an existing ACL API in symfony php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow W3C ACL System makeAclTables.sql SchemaWeb - Classes And Properties - ACL Schema Reardon's Ruminations: Spring Security ACL Schema for Oracle trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Acl.php - kohana-mptt - Project Hosting on Google Code Asynchronous JavaScript Technology and XML (Ajax) With the Java Platform The page cannot be found Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 11:48:22 AM Subject: api key filtering Just wanted to see if others are handling this in some special way, but I think this is pretty simple. We have a database of api keys that map to allowed db records. I'm planning on indexing the db records into solr, along with their api keys in an indexed, non-stored, multi-valued field. Then, to query for docs that belong to a particular api key, they'll be queried using a filter query on api_key. The only concern of mine is that, what if we end up with 100k api_keys? Would it be a problem to have 100k non-stored keys in each document? We have about 500k documents total. Matt
Re: api key filtering
Hey thanks I'll definitely have a read. The only problem with this though, is that our api is a thin layer of app-code, with solr only (no db), we index data from our sql db into solr, and push the index off for consumption. The only other idea I had was to send a list of the allowed document ids along with every solr query, but then I'm sure I'd run into a filter query limit. Each key could be associated with up to 2k documents, so that's 2k values in an fq which would probably be too many for lucene (I think its limit is 1024). Matt

On Sat, Jan 22, 2011 at 3:40 PM, Dennis Gearon gear...@sbcglobal.net wrote:
Re: Solr with many indexes
See below.

On Wed, Jan 19, 2011 at 7:26 PM, Joscha Feth jos...@feth.com wrote:
Hello Erick, Thanks for your answer! But I question why you *require* many different indexes. [...] including isolating one users' data from all others, [...] Yes, that's exactly what I am after - I need to make sure that indexes don't mix, as every user shall only be able to query his own data (index).

Well, this can also be handled by simply appending the equivalent of +user:theuser to each query. This solution does have some interesting side effects though. In particular if you autosuggest based on combined documents, users will see terms NOT in documents they own.

And even using lots of cores can be made to work if you don't pre-warm newly-opened cores, assuming that the response time when using cold searchers is adequate. Could you explain that further or point me to some documentation? Are you talking about: http://wiki.apache.org/solr/CoreAdmin#UNLOAD? if yes, LOAD does not seem to be implemented, yet. Or has this something to do with http://wiki.apache.org/solr/SolrCaching#autowarmCount only? About what time per X documents are we talking here for delay if auto warming is disabled? Is there more documentation about this setting?

It's the autoWarm parameter. When you open a core the first few queries that run on it will pay some penalty for filling caches etc. If your cores are small enough, then this penalty may not be noticeable to your users, in which case you can just not bother autowarming (see firstSearcher, newSearcher). You might also be able to get away with having very small caches, it mostly depends on your usage patterns. If your pattern is that a user signs on, makes one search and signs off, there may not be much good in having large caches. On the other hand, if users sign on and search for hours continually, their experience may be enhanced by having significant caches. It all depends.

Hope that helps
Erick

Kind regards, Joscha
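A minimal sketch of the single-index, per-user filter Erick describes, reusing the SolrJ classes from the example earlier in this digest; the user field name and query text are made up, and the filter value would come from the authenticated session.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PerUserFilterExample
{
    public static void main(String[] args) throws Exception
    {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // One shared index: restrict results to the signed-on user with the
        // equivalent of +user:theuser, rather than a core per user. The
        // filter query is cached in the filterCache, so repeated searches
        // by the same user reuse the cached result set.
        SolrQuery query = new SolrQuery("some search terms");
        query.addFilterQuery("user:theuser");
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound());
    }
}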
Re: api key filtering
The links didn't work, so here they are again, NOT from a sent folder:

PHP Access Control - PHP5 CMS Framework Development | PHP Zone A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb PHP Access Control - PHP5 CMS Framework Development | PHP Zone UserRbac - symfony - Trac A Role-Based Access Control (RBAC) system for PHP Appendix C: Task-Field Access Role-based access control in SQL, part 2 at Xaprb UserRbac - symfony - Trac Acl.php - kohana-mptt - Project Hosting on Google Code CANDIDATE-PHP Generic Access Control Lists http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql makeAclTables.sql php - CakePHP ACL Database Setup: ARO / ACO structure? - Stack Overflow PHP Generic Access Control Lists Reardon's Ruminations: Spring Security ACL Schema for Oracle Re: [symfony-users] Implementing an existing ACL API in symfony SchemaWeb - Classes And Properties - ACL Schema trunk/modules/auth/libraries/Khacl.php | Source/SVN | Assembla Using Zend_Acl with a database backend - Zend Framework Wiki W3C ACL System

Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.

- Original Message From: Matt Mitchell goodie...@gmail.com To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 12:50:24 PM Subject: Re: api key filtering
Re: solrconfig.xml settings question
Yep, that's about it. By far the main constraint is memory and the caches are what eats it up. So by minimizing the caches on the master (since they are filled by searching) you speed that part up. By maximizing the cache settings on the searchers, you make them go as fast as possible. RamBufferSize is irrelevant on the searcher. It governs how much data is stored in RAM when *indexing* before flushing to disk. This usually gets to diminishing returns at 128M BTW.

Oh, there is one other thing on the searchers that can really hurt: too frequent polling of the master. If the master is furiously indexing, polling too often can lead to thrashing if the time it takes to autowarm is longer than the polling interval, which you can only figure out by measuring...

Best Erick

On Thu, Jan 20, 2011 at 8:34 AM, kenf_nc ken.fos...@realestate.com wrote:
Is that it? Of all the strange, esoteric, little understood configuration settings available in solrconfig.xml, the only thing that affects Index Speed vs Query Speed is turning on/off the Query Cache and RamBufferSize? And for the latter, why wouldn't RamBufferSize be the same for both...that is, as high as you can make it?

-- View this message in context: http://lucene.472066.n3.nabble.com/solrconfig-xml-settings-question-tp2271594p2294668.html Sent from the Solr - User mailing list archive at Nabble.com.
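For reference, the two knobs discussed above live in solrconfig.xml (ramBufferSizeMB in the indexDefaults/mainIndex section, the caches in the query section). A sketch only; the values are just the examples from this thread, not recommendations:

<!-- Indexing side: how much is buffered in RAM before flushing to disk.
     Diminishing returns are typically reached around 128 MB, as noted above. -->
<ramBufferSizeMB>128</ramBufferSizeMB>

<!-- Searching side: caches are what consume memory, so keep them small (or
     autowarmCount="0") on the master and size them generously on the searchers. -->
<filterCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="128"/>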
Re: api key filtering
Dang! There were hot, clickable links in the web mail I put them in. I guess you guys can search for those strings on google and find them. Sorry.

- Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 1:09:26 PM Subject: Re: api key filtering
Re: Indexing same data in multiple fields with different filters
I'm assuming that this is just one example of many different kinds of transformations you could do. It *seems* like a variant of a synonym analyzer, so you could write a custom analyzer (it's not actually hard) to create a bunch of synonyms for your special terms at index time. Or you could use the synonyms at query time (query time is more flexible).

Best Erick

On Thu, Jan 20, 2011 at 5:38 AM, shm s...@dbc.dk wrote:
Hi, I have a little problem regarding indexing, that i don't know how to solve, i need to index the same data in different ways into the same field. The problem is a normalization problem, and here is an example: I have a special character \uA732, which i need to normalize in two different ways for phrase searching. So if i encounter this character in, for example, the title field I would like it to result in these two phrase fields: raw data = \uA732lborg phrase.title = ålborg phrase.title = aalborg Because both ways are valid representations of the phrase. I can copy the field from the raw data, but then i cannot normalize them differently, so i am at a loss. Does anyone have a solution or a good idea? Regards shm
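An alternative to the synonym approach that avoids custom code is two copyField targets whose analyzers differ only in a MappingCharFilterFactory mapping file. A schema.xml sketch with made-up field, type and file names; each mapping file would hold a single rule for \uA732:

<!-- mapping-aa.txt would contain:  "\uA732" => "aa"
     mapping-a.txt  would contain:  "\uA732" => "å"   -->

<fieldType name="phrase_aa" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-aa.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<fieldType name="phrase_a" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-a.txt"/>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="phrase_title_aa" type="phrase_aa" indexed="true" stored="false"/>
<field name="phrase_title_a"  type="phrase_a"  indexed="true" stored="false"/>

<copyField source="title" dest="phrase_title_aa"/>
<copyField source="title" dest="phrase_title_a"/>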
Re: Multicore Relaod Theoretical Question
This seems far too complex to me. Why not just optimize on the master and let replication do all the rest for you? Best Erick On Fri, Jan 21, 2011 at 1:07 PM, Em mailformailingli...@yahoo.de wrote: Hi, are there no experiences or thoughts? How would you solve this at Lucene-Level? Regards Em wrote: Hello list, I got a theoretical question about a Multicore-Situation: I got two cores: active, inactive The active core serves all the queries. The inactive core is the tricky thing: I create an optimized index outside the environment and want to insert that optimized index 1 to 1 into the inactive core, which means replacing everything inside the index-directory. After this is done, I would like to reload the inactive core, so that it is ready for a core-swap and ready for serving queries on top of the new inserted optimized index. Is it possible to handle such a situation? Thank you. -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2303585.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: api key filtering
1024 is the default number, it can be increased. See maxBooleanClauses in solrconfig.xml. This shouldn't be a problem with 2K clauses, but expanding it to tens of thousands is probably a mistake (but test to be sure).

Best Erick

On Sat, Jan 22, 2011 at 3:50 PM, Matt Mitchell goodie...@gmail.com wrote:
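For reference, the setting lives in the query section of solrconfig.xml; a sketch showing the default mentioned above. Raising it only permits larger boolean/filter queries, which are themselves what get expensive, so test before going much higher.

<!-- solrconfig.xml, inside the <query> section -->
<maxBooleanClauses>1024</maxBooleanClauses>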
Re: Multicore Relaod Theoretical Question
Hi Erick, thanks for your response. Yes, it's really not that easy. However, the target is to avoid any kind of master-slave-setup. The most recent idea i got is to create a new core with a data-dir pointing to an already existing directory with a fully optimized index. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2310709.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: api key filtering
Got it, here are the links that I have on RBAC/ACL/Access Control. Some of these are specific to Solr.

http://www.xaprb.com/blog/2006/08/16/how-to-build-role-based-access-control-in-sql/
http://www.xaprb.com/blog/2006/08/18/role-based-access-control-in-sql-part-2/
http://php.dzone.com/articles/php-access-control?page=0,1
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://www.tonymarston.net/php-mysql/menuguide/appendixc.html
http://php.dzone.com/articles/php-access-control?page=0,1
http://trac.symfony-project.org/wiki/UserRbac
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://www.tonymarston.net/php-mysql/menuguide/appendixc.html
http://trac.symfony-project.org/wiki/UserRbac
http://code.google.com/p/kohana-mptt/source/browse/trunk/acl/libraries/Acl.php?r=82
http://www.oracle.com/technetwork/articles/javaee/ajax-135201.html
http://phpgacl.sourceforge.net/
http://www.java2s.com/Code/Java/GWT/ClassthatactsasaclienttoaJSONservice.htm
http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql
http://dev.juokaz.com/
http://dev.w3.org/perl/modules/W3C/Rnodes/bin/makeAclTables.sql
http://stackoverflow.com/questions/54230/cakephp-acl-database-setup-aro-aco-structure
http://phpgacl.sourceforge.net/
http://blog.reardonsoftware.com/2010/07/spring-security-acl-schema-for-oracle.html
http://www.mail-archive.com/symfony-users@googlegroups.com/msg29537.html
http://www.schemaweb.info/schema/SchemaInfo.aspx?id=167
http://www.assembla.com/code/backendpro/subversion/nodes/trunk/modules/auth/libraries/Khacl.php?rev=169
http://framework.zend.com/wiki/display/ZFUSER/Using+Zend_Acl+with+a+database+backend
http://www.w3.org/2001/04/20-ACLs#Structure
http://lucene.472066.n3.nabble.com/Modelling-Access-Control-td1756817.html#a1759372
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://phpgacl.sourceforge.net/
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/#comment-112
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/
http://www.xaprb.com/blog/2006/08/18/role-based-access-control-in-sql-part-2/
http://php.dzone.com/articles/php-access-control?page=0,1
https://issues.apache.org/jira/browse/SOLR-1834
http://www.tonymarston.net/php-mysql/role-based-access-control.html
http://php.dzone.com/articles/php-access-control?page=0,1
http://www.yiiframework.com/doc/guide/1.1/en/topics.auth#role-based-access-control
http://lucene.472066.n3.nabble.com/Modelling-Access-Control-td1756817.html#a1759372
http://phpgacl.sourceforge.net/
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/#comment-112
http://jmcneese.wordpress.com/2009/04/05/row-level-model-access-control-for-cakephp/
http://www.yiiframework.com/doc/guide/topics.auth#role-based-access-control

- Original Message From: Dennis Gearon gear...@sbcglobal.net To: solr-user@lucene.apache.org Sent: Sat, January 22, 2011 1:22:04 PM Subject: Re: api key filtering
Re: Multicore Relaod Theoretical Question
Em, yes, you can replace the index (get the new one into a separate folder like index.new and then rename it to the index folder) outside the Solr, then just do the http call to reload the core. Note that the old index files may still be in use (continue to serve the queries while reloading), even if the old index folder is deleted - that is on Linux filesystems, not sure about NTFS. That means the space on disk will be freed only when the old files are not referenced by Solr searcher any longer. -Alexander On Sat, Jan 22, 2011 at 1:51 PM, Em mailformailingli...@yahoo.de wrote: Hi Erick, thanks for your response. Yes, it's really not that easy. However, the target is to avoid any kind of master-slave-setup. The most recent idea i got is to create a new core with a data-dir pointing to an already existing directory with a fully optimized index. Regards, Em -- View this message in context: http://lucene.472066.n3.nabble.com/Multicore-Relaod-Theoretical-Question-tp2293999p2310709.html Sent from the Solr - User mailing list archive at Nabble.com.
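The reload Alexander mentions can be issued as a plain CoreAdmin HTTP call (e.g. http://localhost:8983/solr/admin/cores?action=RELOAD&core=inactive) or from SolrJ. A minimal sketch; the host, port and core name "inactive" are assumptions taken from this thread's example setup:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.CoreAdminRequest;

public class ReloadCoreExample
{
    public static void main(String[] args) throws Exception
    {
        // Points at the CoreAdmin handler (the root /solr URL, not a core).
        SolrServer admin = new CommonsHttpSolrServer("http://localhost:8983/solr");

        // Reload the core whose index directory was just replaced, so a new
        // searcher opens on the new files before the cores are swapped.
        CoreAdminRequest.reloadCore("inactive", admin);
    }
}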
Re: old index files not deleted on slave
The file system checked out, I also tried creating a slave on a different machine and could reproduce the issue. I logged SOLR-2329. On Sat, Dec 18, 2010 at 8:01 PM, Lance Norskog goks...@gmail.com wrote: This could be a quirk of the native locking feature. What's the file system? Can you fsck it? If this error keeps happening, please file this. It should not happen. Add the text above and also your solrconfigs if you can. One thing you could try is to change from the native locking policy to the simple locking policy - but only on the child. On Sat, Dec 18, 2010 at 4:44 PM, feedly team feedly...@gmail.com wrote: I have set up index replication (triggered on optimize). The problem I am having is the old index files are not being deleted on the slave. After each replication, I can see the old files still hanging around as well as the files that have just been pulled. This causes the data directory size to increase by the index size every replication until the disk fills up. Checking the logs, I see the following error: SEVERE: SnapPull failed org.apache.solr.common.SolrException: Index fetch failed : at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:329) at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:265) at org.apache.solr.handler.SnapPuller$1.run(SnapPuller.java:159) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRunAndReset(FutureTask.java:317) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:150) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:98) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.runPeriodic(ScheduledThreadPoolExecutor.java:181) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:205) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) Caused by: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@/var/solrhome/data/index/lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock at org.apache.lucene.store.Lock.obtain(Lock.java:84) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1065) at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:954) at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:192) at org.apache.solr.update.UpdateHandler.createMainIndexWriter(UpdateHandler.java:99) at org.apache.solr.update.DirectUpdateHandler2.openWriter(DirectUpdateHandler2.java:173) at org.apache.solr.update.DirectUpdateHandler2.forceOpenWriter(DirectUpdateHandler2.java:376) at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:471) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319) ... 11 more lsof reveals that the file is still opened from the java process. I am running 4.0 rev 993367 with patch SOLR-1316. Otherwise, the setup is pretty vanilla. The OS is linux, the indexes are on local directories, write permissions look ok, nothing unusual in the config (default deletion policy, etc.). 
Contents of the index data dir: master: -rw-rw-r-- 1 feeddo feeddo 191 Dec 14 01:06 _1lg.fnm -rw-rw-r-- 1 feeddo feeddo 26M Dec 14 01:07 _1lg.fdx -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 14 01:07 _1lg.fdt -rw-rw-r-- 1 feeddo feeddo 474M Dec 14 01:12 _1lg.tis -rw-rw-r-- 1 feeddo feeddo 15M Dec 14 01:12 _1lg.tii -rw-rw-r-- 1 feeddo feeddo 144M Dec 14 01:12 _1lg.prx -rw-rw-r-- 1 feeddo feeddo 277M Dec 14 01:12 _1lg.frq -rw-rw-r-- 1 feeddo feeddo 311 Dec 14 01:12 segments_1ji -rw-rw-r-- 1 feeddo feeddo 23M Dec 14 01:12 _1lg.nrm -rw-rw-r-- 1 feeddo feeddo 191 Dec 18 01:11 _24e.fnm -rw-rw-r-- 1 feeddo feeddo 26M Dec 18 01:12 _24e.fdx -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 01:12 _24e.fdt -rw-rw-r-- 1 feeddo feeddo 483M Dec 18 01:23 _24e.tis -rw-rw-r-- 1 feeddo feeddo 15M Dec 18 01:23 _24e.tii -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 01:23 _24e.prx -rw-rw-r-- 1 feeddo feeddo 283M Dec 18 01:23 _24e.frq -rw-rw-r-- 1 feeddo feeddo 311 Dec 18 01:24 segments_1xz -rw-rw-r-- 1 feeddo feeddo 23M Dec 18 01:24 _24e.nrm -rw-rw-r-- 1 feeddo feeddo 191 Dec 18 13:15 _25z.fnm -rw-rw-r-- 1 feeddo feeddo 26M Dec 18 13:16 _25z.fdx -rw-rw-r-- 1 feeddo feeddo 1.9G Dec 18 13:16 _25z.fdt -rw-rw-r-- 1 feeddo feeddo 484M Dec 18 13:35 _25z.tis -rw-rw-r-- 1 feeddo feeddo 15M Dec 18 13:35 _25z.tii -rw-rw-r-- 1 feeddo feeddo 146M Dec 18 13:35 _25z.prx -rw-rw-r-- 1 feeddo feeddo 284M Dec 18 13:35 _25z.frq -rw-rw-r-- 1 feeddo feeddo 20 Dec 18 13:35
Re: SolrCloud Questions for MultiCore Setup
A collection is your data, like newspaper articles or movie titles. It is a user-level concept, not really a Solr design concept. A core is a Solr/Lucene index. It is addressable as solr/collection-name on one machine. You can use a core to store a collection, or you can break it up among multiple cores (usually for performance reasons). When you use a core like this, it is called a shard. All of the different shards of a collection form the collection. Solr has a feature called Distributed Search that presents the separate shards as if they were one Solr collection. You should set up Distributed Search first. It does not use SolrCloud, but shows you how these ideas work. After that, SolrCloud will make more sense.

Lance

On Sat, Jan 22, 2011 at 9:35 AM, Em mailformailingli...@yahoo.de wrote:

-- Lance Norskog goks...@gmail.com
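A small sketch of the Distributed Search request Lance refers to, reusing the SolrJ classes from the example earlier in this digest; the hosts and core names are made up. Note that distributed search needs a uniqueKey field, which is also how duplicate documents from different shards are detected (this relates to question 3 above).

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class DistributedSearchExample
{
    public static void main(String[] args) throws Exception
    {
        // Any one core can receive the request and aggregate the others.
        SolrServer solr = new CommonsHttpSolrServer("http://host1:8983/solr/core1");

        // The shards parameter lists the cores that together hold the collection.
        SolrQuery query = new SolrQuery("*:*");
        query.set("shards", "host1:8983/solr/core1,host2:8983/solr/core2");
        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound());
    }
}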
Re: old index files not deleted on slave
I see the file -rw-rw-r-- 1 feeddo feeddo 0 Dec 15 01:19 lucene-cdaa80c0fefe1a7dfc7aab89298c614c-write.lock was created on Dec. 15. At the end of the replication, as far as I remember, the SnapPuller tries to open the writer to ensure the old files are deleted, and in your case it cannot obtain a lock on the index folder on Dec 16, 17, 18. Can you reproduce the problem if you delete the lock file, restart the slave and try replication again? Do you have any other Writer(s) open for this folder outside of this core? -Alexander

On Sat, Jan 22, 2011 at 3:52 PM, feedly team feedly...@gmail.com wrote:
Re: DIH with full-import and cleaning still keeps old index
You're not doing an optimize; I think optimize would delete your old index. Try it out with the additional parameter optimize=true - Espen

On Thu, Jan 20, 2011 at 11:30 AM, Bernd Fehling bernd.fehl...@uni-bielefeld.de wrote:
Hi list, after sending full-import=true&clean=true&commit=true Solr 4.x (apache-solr-4.0-2010-11-24_09-25-17) responds with:
- DataImporter doFullImport
- DirectUpdateHandler2 deleteAll ...
- DocBuilder finish
- SolrDeletionPolicy.onCommit: commits:num=2
- SolrDeletionPolicy updateCommits
- SolrIndexSearcher init
- INFO: end_commit_flush
- SolrIndexSearcher warm ...
- QuerySenderListener newSearcher
- SolrCore registerSearcher
- SolrIndexSearcher close ...
This all looks good to me but why is the old index not deleted? Am I missing a parameter? Regards, Bernd
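For reference, the full-import call with that parameter added might look like this; the host, port and handler path are assumptions (the DataImportHandler is commonly mounted at /dataimport):

http://localhost:8983/solr/dataimport?command=full-import&clean=true&commit=true&optimize=true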
RE: api key filtering
If you COULD solve your problem by indexing 'public', or other tokens from a limited vocabulary of document roles, in a field -- then I'd definitely suggest you look into doing that, rather than doing odd things with Solr instead. If the only barrier is not currently having sufficient logic at the indexing stage to do that, then it is going to end up being a lot less of a headache in the long term to simply add a layer at the indexing stage to add that in, than trying to get Solr to do things outside of its, well, 'comfort zone'.

Of course, depending on your requirements, it might not be possible to do that, maybe you can't express the semantics in terms of a limited set of roles applied to documents. And then maybe your best option really is sending an up to 2k element list (not exactly the same list every time, presumably) of acceptable documents to Solr with every query, and maybe you can get that to work reasonably. Depending on how many different complete lists of documents you have, maybe there's a way to use Solr caches effectively in that situation, or maybe that's not even necessary since lookup by unique id should be pretty quick anyway, not really sure.

But if the semantics are possible, much better to work with Solr rather than against it, it's going to take a lot less tinkering to get Solr to perform well if you can just send an fq=role:public or something, instead of a list of document IDs. You won't need to worry about it, it'll just work, because you know you're having Solr do what it's built to do. Totally worth a bit of work to add a logic layer at the indexing stage. IMO.

From: Erick Erickson [erickerick...@gmail.com] Sent: Saturday, January 22, 2011 4:50 PM To: solr-user@lucene.apache.org Subject: Re: api key filtering
Re: api key filtering
Totally agree, do it at indexing time, in the index.

Dennis Gearon
Signature Warning: It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others' mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036'
EARTH has a Right To Life, otherwise we all die.

----- Original Message -----
From: Jonathan Rochkind rochk...@jhu.edu
To: solr-user@lucene.apache.org
Sent: Sat, January 22, 2011 5:28:50 PM
Subject: RE: api key filtering
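To make "do it at indexing time, in the index" concrete, here is a rough SolrJ sketch that stamps each document with the roles (or api keys) allowed to see it as it is indexed. The core URL and field names are invented for illustration, and the access field is assumed to be declared multivalued in schema.xml.

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import java.io.IOException;
import java.net.MalformedURLException;

public class IndexWithRoles {
    public static void main(String[] args)
            throws MalformedURLException, SolrServerException, IOException {
        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr/docs");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "42");
        doc.addField("title", "some indexed document");
        // Multivalued access field: every role (or api key) allowed to see this doc.
        // This is the extra "logic layer at the indexing stage" discussed above.
        doc.addField("role", "public");
        doc.addField("role", "partner-key-123");   // hypothetical api key value

        solr.add(doc);
        solr.commit();
    }
}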
Re: api key filtering
I think that indexing the access information is going to work nicely, and I agree that sticking with the simplest, Solr-native way is best. The constraint is super simple: you can either view this set of documents or you can't, based on an api key: fq=api_key:xxx

Thanks for the feedback on this, guys!

Matt
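For the per-key scheme Matt settles on, the query side could look something like the sketch below. The api_key field name is taken from his message; the URL and key value are placeholders, and the key should be validated in the application layer before being dropped into the filter string.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import java.net.MalformedURLException;

public class ApiKeyFilterQuery {
    public static void main(String[] args) throws MalformedURLException, SolrServerException {
        String apiKey = "xxx";   // placeholder; comes from the caller's credentials

        SolrServer solr = new CommonsHttpSolrServer("http://localhost:8080/solr/docs");

        SolrQuery query = new SolrQuery();
        query.setQuery("*:*");
        // A single-term filter per key: no 2k-clause ID list, so the default
        // maxBooleanClauses limit of 1024 never comes into play, and the
        // filterCache holds one cached entry per distinct api_key filter.
        query.addFilterQuery("api_key:" + apiKey);

        QueryResponse rsp = solr.query(query);
        System.out.println(rsp.getResults().getNumFound() + " documents visible to this key");
    }
}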