Unbuffered Exception while setting permissions

2010-06-30 Thread Rakhi Khatwani
Hi,
   I am trying out Solr security on my setup, following these links:
http://wiki.apache.org/solr/SolrSecurity
http://www.lucidimagination.com/search/document/d1e338dc452db2e4/how_can_i_protect_the_solr_cores

Following is my configuration:

realms.properties:
admin: admin,server-administrator,content-administrator,admin
other: OBF:1xmk1w261u9r1w1c1xmq
guest: guest,read-only
rakhi: rakhi,RW-role

jetty.xml:
...
<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New class="org.mortbay.jetty.security.HashUserRealm">
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Item>
  </Array>
</Set>

...

WebDefault.xml:
<!-- block by default. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Default</web-resource-name>
    <url-pattern>/</url-pattern>
  </web-resource-collection>
  <auth-constraint/> <!-- BLOCK! -->
</security-constraint>

<!-- Setting admin access. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
    <url-pattern>/core1/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<!-- this constraint has no auth constraint or data constraint = allows
without auth. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>AllowedQueries</web-resource-name>
    <url-pattern>/core1/select/*</url-pattern>
  </web-resource-collection>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>
<security-role>
  <role-name>Admin-role</role-name>
</security-role>
<security-role>
  <role-name>FullAccess-role</role-name>
</security-role>
<security-role>
  <role-name>RW-role</role-name>
</security-role>


So far everything works well: I get a forbidden exception as soon as I try
to commit documents to Solr.
But when I add the following security-constraint tag in webdefault.xml,

<!-- this constraint allows access to modify the data in the SOLR service,
with basic auth -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>RW</web-resource-name>
    <!-- the dataimport handler for each individual core -->
    <url-pattern>/core1/dataimport</url-pattern>
    <!-- the update handler (XML over HTTP) for each individual core -->
    <url-pattern>/core1/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <!-- Roles of users are defined in the properties file -->
    <!-- we allow users with rw-only access -->
    <role-name>RW-role</role-name>
    <!-- we allow users with full access -->
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

I get the following exception:

org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
	at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
	at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
	at Authentication.AuthenticationTest.main(AuthenticationTest.java:35)
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
	at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
	at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
	at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
	at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
	at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
	at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
	at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:416)
	... 4 more


My Java code is as follows:
public class AuthenticationTest {
    public static void main(String[] args) {
        try {
            HttpClient client = new HttpClient();
            AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT);
            client.getState().setCredentials(scope,
                    new UsernamePasswordCredentials("rakhi", "rakhi"));
            SolrServer server = new CommonsHttpSolrServer(
                    "http://localhost:8983/solr/core1/", client);

            SolrQuery query = new SolrQuery();
            query.setQuery("*:*");
            QueryResponse response = server.query(query);
            System.out.println(response.getStatus());

            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("aid", 0);
            doc.setField("rct", "Sample Data for authentication");
            server.add(doc);
            server.commit();
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (SolrServerException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Re: Unbuffered Exception while setting permissions

2010-06-30 Thread Rakhi Khatwani
PS: I am using solr 1.4

Regards,
Raakhi


Strange problem when use dismax handler

2010-06-30 Thread Scott Zhang
Hi. All.
   I am using the default dismax handler to search within Solr.
The problem is that when I search, I want to specify the type to restrict the
results.
Here is what I do:
1. Query String with one type (Works!)
:design)) AND ((type:product) )))
2. Query String with 2 types (Works!)
:design)) AND ((type:product) OR (type:member
3. Query String with 3 types (Works!)
:design)) AND ((type:product) OR (type:member) OR (type:forum)
)))
4. Query string with more than 3 types (doesn't work!)
:design)) AND ((type:product) OR (type:member) OR (type:forum)
OR (type:stamp) OR (type:answer) OR (type:page
Nothing was returned, and I don't know why. I think this is caused by a dismax
setting, probably mm (Minimum 'Should' Match).

But I have no idea how to configure it.

Please help. Thanks!


Regards.
Scott


Re: Strange problem when use dismax handler

2010-06-30 Thread Scott Zhang
I used debugQuery to check my query URL:
I notice the query URL is parsed incorrectly.

The type:book clause was parsed as part of the query string too. Sigh~~~

+((+DisjunctionMaxQuery((keyword_level1:design^10.0 |
keyword_level2:design)~0.01) DisjunctionMaxQuery((keyword_level1:type^10.0 |
keyword_level2:type)~0.01) DisjunctionMaxQuery((keyword_level1:product^10.0
| keyword_level2:product)~0.01)
DisjunctionMaxQuery((keyword_level1:type^10.0 | keyword_level2:type)~0.01)
DisjunctionMaxQuery((keyword_level1:book^10.0 |
keyword_level2:book)~0.01))~3) ()


Then the question is how to specify the query field (in my case, type =
book) besides using design to search over the 2 fields?

Thanks



Re: [ANN] Solr 1.4.1 Released

2010-06-30 Thread Stevo Slavić
Created issue https://issues.apache.org/jira/browse/SOLR-1977 for this.

Regards,
Stevo.

On Sun, Jun 27, 2010 at 2:54 AM, Ken Krugler kkrugler_li...@transpac.com wrote:


 On Jun 26, 2010, at 5:18pm, Jason Chaffee wrote:

  It appears the 1.4.1 version was deployed with a new maven groupId

 For example, if you are trying to download solr-core, here are the
 differences between 1.4.0 and 1.4.1.

 1.4.0
 groupId: org.apache.solr
 artifactId: solr-core

 1.4.1
 groupId: org.apache.solr.solr
 artifactId:solr-core

 Was this change intentional or a mistake?  If it was a mistake, can
 someone please fix it in maven's central repository?


 I believe it was a mistake. From a recent email thread on this list, Mark
 Miller said:

  Can a solr/maven dude look at this? I simply used the copy command on
 the release to-do wiki (sounds like it should be updated).

 If no one steps up, I'll try and straighten it out later.

 On 6/25/10 10:28 AM, Stevo Slavić wrote:

 Congrats on the release!

 Something seems to be wrong with the solr 1.4.1 maven artifacts; there is an
 extra solr in the path. E.g. solr-parent-1.4.1.pom is at

 http://repo1.maven.org/maven2/org/apache/solr/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom
 while it should be at

 http://repo1.maven.org/maven2/org/apache/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom
 .
 Pom's seem to contain correct maven artifact coordinates.

 Regards,
 Stevo.


 -- Ken

 
 Ken Krugler
 +1 530-210-6378
 http://bixolabs.com
 e l a s t i c   w e b   m i n i n g







Re: Strange problem when use dismax handler

2010-06-30 Thread Scott Zhang
Well, I figured it out.

I should use the fq parameter:

fq=type:music type:movie type:product
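
For reference, a sketch of the full request this implies (handler name assumed; qf fields taken from the debugQuery output earlier in this thread):

/solr/select?qt=dismax&q=design&fq=type:(product OR member OR forum OR stamp OR answer OR page)

And if the dismax mm default ever needs adjusting, it lives in the handler defaults in solrconfig.xml, e.g.:

<requestHandler name="dismax" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="qf">keyword_level1^10.0 keyword_level2</str>
    <!-- assumed example: all clauses required up to 2; above 2, all but one; above 5, all but two -->
    <str name="mm">2&lt;-1 5&lt;-2</str>
  </lst>
</requestHandler>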





Re: Unbuffered Exception while setting permissions

2010-06-30 Thread Rakhi Khatwani
I was going through the logs. Every time I try doing an update (and of course
end up with the unbuffered exception), the log outputs the following line:

[30/Jun/2010:09:02:52 +0000] "POST /solr/core1/update?wt=javabin&version=1
HTTP/1.1" 401 1389

Regards
Raakhi


Re: problem with formulating a negative query

2010-06-30 Thread Sascha Szott

Hi Erick,

thanks for your explanations. But why are all docs being *removed* from 
the set of all docs that contain R in their topic field? This would 
correspond to a boolean AND and would stand in conflict with the clause 
q.op=OR. This seems a bit strange to me.


Furthermore, Smiley & Pugh state in their Solr 1.4 book on pg. 102 that 
adding a subexpression containing the negative query (-[* TO *]) and 
the match-all-docs clause (*:*) is only a workaround. Why is this 
workaround necessary at all?


Best,
Sascha

Erick Erickson wrote:

This may help:
http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Boolean%20operators

But the clause you specified translates roughly as find all the
documents that contain R, then remove any of them that match
* TO *. * TO * contains all the documents with R, so everything
you just matched is removed from your results.

HTH
Erick

On Tue, Jun 29, 2010 at 12:40 PM, Sascha Szott sz...@zib.de wrote:


Hi Ahmet,

it works, thanks a lot!

To be honest, I have no idea what the problem is with
defType=lucene&q.op=OR&df=topic&q=R NOT [* TO *]

-Sascha


Ahmet Arslan wrote:


I have a (multi-valued) field topic in my index which does

not need to exist in every document. Now, I'm struggling
with formulating a query that returns all documents that
either have no topic field at all *or* whose topic field
value is R.



Does this work?
defType=lucene&q.op=OR&q=topic:R (+*:* -topic:[* TO *])




Re: Unbuffered Exception while setting permissions

2010-06-30 Thread Rakhi Khatwani
This error usually occurs when I do a server.add(inpDoc).

From the logs:

192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "GET
/solr/GPTWPI/update?qt=%2Fupdate&optimize=true&wt=javabin&version=1
HTTP/1.1" 200 41

192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "GET
/solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1" 401 1389

192.168.0.106 - admin [30/Jun/2010:11:30:38 +0000] "GET
/solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1" 200 70

192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "POST
/solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1" 200 41 (works when I
comment out the auth-constraint for RW)

AND

192.168.0.106 - - [30/Jun/2010:11:29:09 +0000] "POST
/solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1" 401 1389 (does not work
when I add the auth-constraint for RW)

192.168.0.106 - - [30/Jun/2010:11:30:38 +0000] "GET
/solr/GPTWPI/update?qt=%2Fupdate&commit=true&wt=javabin&version=1 HTTP/1.1"
200 41

So what I conclude is that authentication does not work when we do a
POST and works for GET methods. Correct me if I am wrong.
And how do I get it working?

Regards,
Raakhi


Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting

2010-06-30 Thread stockii

I only get this exception when I sort on my popularity field. WHY ???

...sort=score desc, popularity desc -- BAD
...sort=score desc -- All fine.

that's not good =(


Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting

2010-06-30 Thread Koji Sekiguchi

(10/06/30 20:27), stockii wrote:

Hello.

I get a SEVERE: java.lang.ArrayIndexOutOfBoundsException and I don't know
the reason for this.

I have 4 cores, and every core is running, but for a few minutes I get this
bad exception in one core. It's absolutely not acceptable ...

When I search with this:
.../?omitHeader=false&fl=id&sort=score+desc,+popularity+desc&start=0&q=fuck&json.nl=map&qt=standard&wt=json&rows=16&version=1.2

I get the exception, but when I search without the sort parameter I get no
exception. ?!?!?!


SEVERE: java.lang.ArrayIndexOutOfBoundsException: 30988
	at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:713)
	...

If popularity field is multiValued and/or tokenized,
sorting on such field throws AIOOBE in Solr 1.4.1.
(Sort works correctly on untokenized/single-valued field)
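
For reference, a minimal schema sketch of a field this can safely be sorted on (field name from this thread; sint is solr.SortableIntField in the stock example schema and is untokenized):

<field name="popularity" type="sint" indexed="true" stored="true" multiValued="false"/>

Note that single-valued has to hold in the data too: a multiValued="true" declaration, or several values ending up in one field, is enough to trigger the AIOOBE described above.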

Koji

--
http://www.rondhuit.com/en/



Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting

2010-06-30 Thread stockii

aha, the type is sint

do I need to use string, or which field type doesn't use any tokenizer? ^^
I thought sint was untokenized...


Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting

2010-06-30 Thread stockii

it makes no sense.

I can sort by popularity on my playground core, but not on my live core.
Both cores have the same configuration ...

The problem occurs after using facet search. Maybe that's the reason?
Or too much data?
Too small a server?

any idea


Re: REST calls

2010-06-30 Thread Tim Williams
On Wed, Jun 30, 2010 at 12:39 AM, Don Werve d...@madwombat.com wrote:
 2010/6/27 Jason Chaffee jchaf...@ebates.com

 The solr docs say it is RESTful, yet it seems that it doesn't use http
 headers in a RESTful way.  For example, it doesn't seem to use the Accept:
 request header to determine the media-type to be returned.  Instead, it
 requires a query parameter to be used in the URL.  Also, it doesn't seem to
 use return 304 Not Modified if the request header if-modified-since is
 used.


 The summary:

 Solr is restful, and does a very good job of it.

I'm not so sure...

 The long version:

 There is no official 'REST' standard that dictates the behavior of the
 implementation; rather, REST is a set of guidelines on building APIs that
 are both discoverable and easily usable without having to resort to
 third-party libraries.

 Generally speaking, an application is RESTful if it provides an API that
 accepts arguments passed as HTTP form variables, returns results in an open
 format (XML, JSON, YAML, etc.), and respects certain semantics relating to
 HTTP verbs; e.g., GET/HEAD return the resource without modification, DELETEs
 are destructive, POST creates a resource, PUT alters it.

 Solr meets all of these requirements.

With fairly limited knowledge of Solr (I'm a lucene user), I'd like to
offer an alternate view.

- Solr seems to violate the hypermedia-driven constraint (e.g. it
seems not to be hypertext driven at all) [1]

- Solr seems to violate the uniform interface constraint and the
identification of resources constraint (e.g. by having commands in
the entity body instead of exposing resources with state that is
manipulated through the standard methods and, I gather, overloading
methods instead of using standard ones (e.g. deletes)).

I'd conclude Solr is not RESTful.

The representation argument is a bit of a red-herring, btw.  Not using
Accept for conneg isn't the problem, using agent-driven negotiation
without being hypertext driven is [from a REST pov].

--tim

[1] - http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven


Re: REST calls

2010-06-30 Thread Tim Williams
On Wed, Jun 30, 2010 at 9:17 AM, Jak Akdemir jakde...@gmail.com wrote:

 Actually it is not a constraint to use all four of *GET*, *PUT*, *POST*,
 *DELETE*.
 To define RESTful, using GET and POST requests is enough, as Roy Fielding
 noted.
 http://roy.gbiv.com/untangled/2009/it-is-okay-to-use-post

In Roy's post, I'd point out: POST only becomes an issue when it is
used in a situation for which some other method is ideally suited
(e.g. DELETE to delete).

Also, GET and POST *could* be enough if and only if you took care to
design your resources properly[1].

--tim

[1] - http://www.amundsen.com/blog/archives/1063


Re: REST calls

2010-06-30 Thread Yonik Seeley
Solr's APIs are described as REST-like, and probably do qualify as
restful the way the term is commonly used.

I'm personally much more interested in making our APIs more powerful
and easier to use, regardless of any REST purity tests.

-Yonik
http://www.lucidimagination.com


Re: dataimport.properties is not updated on delta-import

2010-06-30 Thread warb

I've finally found the problem causing the delta-import to fail and thought I
would post it here for future reference (if someone makes the same mistake I
did).

I had forgotten to select the id column in the deltaImportQuery. I should,
of course, have known this from the log entries about documents not being
added because they lacked id fields.

My broken query: SELECT column_1, column_2, column_3 FROM table WHERE id =
${dataimporter.delta.id}
My working query: SELECT id, column_1, column_2, column_3 FROM table WHERE
id = ${dataimporter.delta.id}
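
For future readers, a sketch of the surrounding DIH entity (table and column names as in the queries above; the deltaQuery shown is an assumed example):

<entity name="item" pk="id"
        query="SELECT id, column_1, column_2, column_3 FROM table"
        deltaQuery="SELECT id FROM table WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, column_1, column_2, column_3 FROM table WHERE id = ${dataimporter.delta.id}"/>

During a delta-import only deltaImportQuery produces the documents, so it must select every column a document needs, including the uniqueKey.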

Thanks all for your help!


Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread bbarani

Hi,

I am trying to delete a group of documents using a wildcard, something like

update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><doc><field%20name=uid>6-HOST*</field></doc></delete>'

I want to delete all documents whose uid starts with 6-HOST,
but this query doesn't seem to work. Am I doing anything wrong??

Thanks,
BB


Re: REST calls

2010-06-30 Thread Chantal Ackermann
On Wed, 2010-06-30 at 16:12 +0200, Yonik Seeley wrote:
 Solr's APIs are described as REST-like, and probably do qualify as
 restful the way the term is commonly used.
 
 I'm personally much more interested in making our APIs more powerful
 and easier to use, regardless of any REST purity tests.
 
 -Yonik
 http://www.lucidimagination.com


Hi Yonik,

yes, please - and thanks!

I'm again and again positively surprised how efficient and yet simple
SOLR's (and Lucene's) query and response language (incl. response
formats) is. Some things seem complex/difficult at first (like dismax or
function queries) but turn out to be simple/easy to use considering the
complexity of the problems they solve.

Chantal



Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Sascha Szott

Hi,

you can delete all docs that match a certain query:

<delete><query>uid:6-HOST*</query></delete>

-Sascha



Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread bbarani

Hi,

Thanks a lot for your reply..

I tried the below query

update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><query>uid:6-HOST*</query></delete>'

But even now none of the documents are getting deleted. Am I forming the
URL wrong?

Thanks,
BB


SolrQueryResponse - Solr Documents

2010-06-30 Thread Amdebirhan, Samson, VF-Group
Hello all,

 

How can I view Solr docs in response writers before the response is sent
to the client? What I get is only a DocSlice with int values, its size
equal to the number of docs requested. All this while debugging the
SolrQueryResponse object.

 

 

Thanks

Sam 
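
In case it helps, a sketch of how a response writer can resolve those ints into stored documents; the ints in the DocSlice are internal Lucene doc ids, and the stored fields are fetched lazily through the searcher (Solr 1.4-era API, variable names assumed):

// inside QueryResponseWriter.write(writer, req, rsp); uses
// org.apache.solr.search.DocList/DocIterator, org.apache.solr.search.SolrIndexSearcher,
// and org.apache.lucene.document.Document
DocList docs = (DocList) rsp.getValues().get("response");  // a DocSlice implements DocList
SolrIndexSearcher searcher = req.getSearcher();
DocIterator it = docs.iterator();
while (it.hasNext()) {
    int luceneId = it.nextDoc();          // internal Lucene document id
    Document d = searcher.doc(luceneId);  // loads the stored fields
    // d.get("someField") ...
}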



Re: Leading Wildcard query strangeness

2010-06-30 Thread dbashford

An update in case someone stumbles upon this...

At first I thought you meant the fields I intend to do leading wildcard
searches on needed to have ReversedWildcardFilterFactory on them.  But that
didn't make sense because our prod app isn't using that at all.

But our prod app does have the text_rev still in it from the example
schema we copied over and used as our template.  One of the things we've
done in dev is clean that out and try to get down to what we are using. So I
tossed the text_rev back into the schema.xml, didn't actually use that field
type for any fields, and now I can do leading wildcard searches again.

I'm going to guess that is what you meant, that the very presence of the
filter in the schema, whether it is used or not, allows you to do wildcard
searches.

Is that documented anywhere and I just missed it?  I'm sure it is.


Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Sascha Szott

Hi,

does /select?q=uid:6-HOST* return any documents?

-Sascha



Re: Leading Wildcard query strangeness

2010-06-30 Thread Ahmet Arslan
 I'm going to guess that is what you meant, that the very presence of the
 filter in the schema, whether it is used or not, allows you to do wildcard
 searches.

Exactly.

 Is that documented anywhere and I just missed it?  I'm sure it is.

I knew it from the source code of SolrQueryParser:
protected void checkAllowLeadingWildcards()
I didn't see anything in the wiki about it.
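
For the archives, this is the schema fragment whose mere presence flips the switch (attribute values as in the stock example schema's text_rev type):

<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
</fieldType>

checkAllowLeadingWildcards() scans all field types in the schema for a ReversedWildcardFilterFactory and, on finding one, enables leading wildcards for the whole index.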





Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread bbarani

Yeah, I am getting the results when I use the /select handler.

I tried the below query..

/select?q=uid:6-HOST*

Got <result name="response" numFound="52920" start="0">

Thanks
BB


Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Jan Høydahl / Cominvent
Hi,

You need to use HTTP POST in order to send those parameters, I believe. Try with
curl:

curl "http://localhost:8983/solr/update?commit=true" -H "Content-Type: text/xml" \
--data-binary '<delete><query>uid:6-HOST*</query></delete>'

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com




Wiki Documentation of facet.sort

2010-06-30 Thread Chantal Ackermann
Hi there,

in the wiki, on http://wiki.apache.org/solr/SimpleFacetParameters
it says:


The default is true/count if facet.limit is greater than 0, false/index
otherwise.


I've just migrated to 1.4.1 (reindexed). I can't remember how it was
with 1.4.0.

When I specify my facet query with facet.mincount=0 (explicitly) or
without mincount (default is 0), the resulting facets are sorted by
count nevertheless. Changing mincount from 0 to 1 and back actually
makes no difference in the sorting.
I'm fine with a constant default behaviour (always sorting by count,
e.g., no matter what parameters are given).
If this is intended - shall I change the wiki accordingly?
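
For comparison, one can pin the sort explicitly both ways and diff the results (field name assumed):

/select?q=*:*&facet=true&facet.field=category&facet.mincount=0&facet.sort=index
/select?q=*:*&facet=true&facet.field=category&facet.mincount=0&facet.sort=count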

Cheers,
Chantal



Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Sascha Szott

Hi,

take a look inside Solr's log file. Are there any error messages with 
respect to the update request?


Furthermore, you could try the following two commands instead:

curl "http://host:port/solr/update" --form-string stream.body='<delete><query>uid:6-HOST*</query></delete>'

curl "http://host:port/solr/update" --form-string stream.body='<commit/>'

-Sascha



Disable Solr Response Formatting

2010-06-30 Thread JohnRodey

By default my SOLR response comes back formatted, like such



C/



Is there a way to tell it to return it unformatted? like:

C/ 


Re: Disable Solr Response Formatting

2010-06-30 Thread JohnRodey

Oops, let me try that again...

By default my SOLR response comes back formatted, like such 



  
C/
  





Is there a way to tell it to return it unformatted? like: 

C/ 


Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread bbarani

Hi,

I was able to successfully delete multiple documents using the below URL

/update?stream.body=<delete><query>uid:6-HOST*</query></delete>

Thanks,
BB


GC tuning - heap size autoranging

2010-06-30 Thread Robert Petersen

Is this a true statement???  This seems to contradict other statements 
regarding setting the heap size I have seen here...

Default Heap Size
If not otherwise set on the command line, the initial and maximum heap sizes 
are calculated based on the amount of memory on the machine. The proportion of 
memory to use for the heap is controlled by the command line 
options DefaultInitialRAMFraction and DefaultMaxRAMFraction, as shown in the 
table below. (In the table, memory represents the amount of memory on the 
machine.)

Pasted from 
http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting


RE: Re: Disable Solr Response Formatting

2010-06-30 Thread Markus Jelsma
Hi,

 

My client makes a mess out of your example but if you mean formatting as in 
indenting, then send indent=false, but it's already false by default. Check 
your requestHandler settings.

 

Cheers,
 


Re: Is there a way to delete multiple documents using wildcard?

2010-06-30 Thread Jan Høydahl / Cominvent
Hmm, nice one - I was not aware of that trick.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com



RE: Re: Disable Solr Response Formatting

2010-06-30 Thread JohnRodey

Thanks!  I was looking for things to change in the solrconfig.xml file.

indent=off


Bizarre Terms revisited

2010-06-30 Thread Darren Govoni
Hi,
  I really think there is something not quite right going on here
after much study. Here are my findings.

Using MLT, I get terms that appear to be long concatenations of words
that are space delimited in the original text.
I can't think of any reason for these sentence-like terms to exist  (see
below).

All my data and config follows:

Here is the output from MLT:

<lst name="interestingTerms">
  <float name="text_t:result">1.0</float>
  <float name="text_t:concepts">1.0</float>
  <float name="text_t:identified">1.0</float>
  <float name="text_t:row">1.0</float>
  <float name="text_t:based">1.0</float>
  <float name="text_t:000">1.0</float>
  <float name="text_t:ontreweb">1.0</float>
  <float name="text_t:in">1.0</float>
  <float name="text_t:and">1.0</float>
  <float name="text_t:2">1.0</float>

  <!-- These do not look like valid or useful terms to have in the index. -->
  <!-- Why do these exist? -->
  <float name="text_t:searchinonelanguagefindresultsinanother">1.0</float>
  <float name="text_t:ontrewebstartpage">1.0</float>
  <float name="text_t:unlimitedmutliwordandphrasematching">1.0</float>
  <float name="text_t:wordsandphrases">1.0</float>
  <float name="text_t:pluggablevocabulariesontologies">1.0</float>
  <float name="text_t:mappedconcepts">1.0</float>
  <float name="text_t:ontrewebproductfeatures">1.0</float>
  <float name="text_t:multilinguallexiconsfrenchenglishetc">1.0</float>
  <float name="text_t:multipleinheritanceofconcepts">1.0</float>

  <float name="text_t:4">1.0</float>
  <float name="text_t:string">1.0</float>
  <float name="text_t:english">1.0</float>
  <float name="text_t:mapped">1.0</float>
  <float name="text_t:multilingual">1.0</float>
  <float name="text_t:mutliword">1.0</float>
</lst>

My field:

   <field name="text_t" type="textgen" indexed="true" stored="true"
          multiValued="true" termVectors="true" termPositions="true"
          termOffsets="true"/>


Field definition taken from the default schema.xml

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory"
            ignoreCase="true"
            words="stopwords.txt"
            enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Original text (partially snipped) as it appears in the stored index.

Ontreweb Product Features 

     

Unlimited mutliword and phrase matching Multiple inheritance of concepts 
Pluggable vocabularies, ontologies Multilingual 
lexicons: french, english, etc. Search in one language, find results in another 
200,000+ words and phrases, 35,000 mapped 
concepts.

1. 2. 3. 4.




RE: OOM on uninvert field request

2010-06-30 Thread Robert Petersen
Hey so after adding those GC options, I was able to incrementally push my max 
(and min) memory settings up and when we got to max=min=12GB we started looking 
much better!  One slave handles all the load with no OOMs at all!  I'm watching 
the live tomcat log using 'tail'.  Next I will convert that field type to 
(trie) int and reindex.  I'll have to start a new index from scratch with a 
field type change like that so I'll have to delete the old one first on our 
master... It takes us a couple days to index 15 million products (some are sets 
so the final index size is only 8 million) so I don't want to do *that* too 
often as the slaves will be quite stale by the time it's done!  :)

Thanks for the help!
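
For reference, a sketch of the trie variant (fieldType as in the stock 1.4 example schema; the dynamicField mirrors the string-typed one quoted further down):

<fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true" positionIncrementGap="0"/>
<dynamicField name="*_contentAttributeToken" type="tint" indexed="true" stored="true" multiValued="true" required="false"/>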

-Original Message-
From: Robert Petersen [mailto:rober...@buy.com] 
Sent: Wednesday, June 30, 2010 9:49 AM
To: solr-user@lucene.apache.org
Subject: RE: OOM on uninvert field request

At and above 4GB we get those GC errors though!  Should I switch to something 
like this?

Recommended Options
To use i-cms in Java SE 6, use the following command line options:

-XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps


Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead 
limit exceeded
at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068)
at 
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418)
at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467)
at 
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319)
... 11 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded


-Original Message-
From: Lance Norskog [mailto:goks...@gmail.com] 
Sent: Tuesday, June 29, 2010 8:42 PM
To: solr-user@lucene.apache.org
Subject: Re: OOM on uninvert field request

Yes, it is better to use ints for ids than strings. Also, the Trie int
fields have a compressed format that may cut the storage needs even
more. 8m * 4 = 32mb, times a few hundred, we'll say 300, is ~9.6GB of
IDs.  I don't know how these fields are stored, but if they are
separate objects we've blown up to several gigs (per-object overheads
are surprising).

4G is probably not enough for what you want. If you watch the total
memory with 'top' and hit it with different queries, you will get a
stronger sense of how much memory your use cases need.

On Tue, Jun 29, 2010 at 4:32 PM, Robert Petersen rober...@buy.com wrote:
 Hello I am trying to find the right max and min settings for Java 1.6 on 20GB 
 index with 8 million docs, running 1.6_018 JVM with solr 1.4, and am 
 currently have java set to an even 4GB (export JAVA_OPTS=-Xmx4096m 
 -Xms4096m) for both min and max which is doing pretty well but occasionally 
 still getting the below OOM errors.  We're running on dual quad core xeons 
 with 16GB memory installed.  I've been getting the below OOM exceptions still 
 though.

 Is the memSize mentioned in the INFO for the uninvert in bytes?  I.e. does 
 memSize=29604020 mean 29MB?  We have a few hundred of these fields and they 
 contain ints used as IDs, and so I guess could they eat all the memory to 
 uninvert them all after we apply load and enough queries are performed.  Does 
 the field type matter, would int be better than string if these are lookup 
 ids sparsely populated across the index?  BTW these are used for faceting and 
 filtering only.

        <dynamicField name="*_contentAttributeToken" type="string"
 indexed="true" multiValued="true" stored="true" required="false"/>

 Jun 29, 2010 3:54:50 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field 
 {field=768_contentAttributeToken,memSize=29604014,tindexSize=50,time=1841,phase1=1824,nTerms=1,bigTerms=0,termInstances=18,uses=0}
 Jun 29, 2010 3:54:52 PM org.apache.solr.request.UnInvertedField uninvert
 INFO: UnInverted multi-valued field 
 {field=749_contentAttributeToken,memSize=29604020,tindexSize=56,time=1847,phase1=1829,nTerms=143,bigTerms=0,termInstances=951,uses=0}
 Jun 29, 2010 3:54:59 PM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: Java heap space
        at 
 org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:191)
        at 
 org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:178)
        at 
 org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839)
        at 
 org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:250)
        at 
 org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283)
        at 
 org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166)




-- 
Lance Norskog
goks...@gmail.com


Multiple Solr servers and a shared index vs master+slaves

2010-06-30 Thread David Thompson
I'm a newbie looking at setting up an intranet search service using Solr, so 
I'm having a hard time understanding why I should forego the high availability 
and clustering mechanisms we already have available, and use Solr's 
implementations instead.  I'm hoping some experienced Solr architects could 
take the time to comment.

Our corporate standard is for any java web app to be deployed as an ear file 
targeted to a 4-server Weblogic 10.3 cluster on virtual Solaris boxes, 
operating behind a cluster of Apache web servers.  All servers have NFS mounts 
to high availability SANs.  So my Solr proof-of-concept tries to make use of 
those tools.  I've deployed Solr to the cluster, and all of them use the same 
solr.home on the NFS mount.  This seems to be just fine for searching, query 
requests are evenly distributed across the cluster, and search performance 
seems to be fine with the index living on the NFS mount.  

The problems, of course, start when add/update requests come in.  This setup is 
the equivalent of having 4 standalone Solr servers using the same index.  So if 
I use the simple lock file mechanism, in my testing so far it seems to keep 
them all separate just fine, except that the first update comes in to serverA, 
it grabs the write lock, then if any other servers receive an update near the 
same time, it must wait for the write lock to be be removed by serverA after it 
commits.  I think I can pretty well mitigate this by directing all updates 
through a single server (via virtual IP address), but then I need the other 
servers to realize the index has changed after each commit.  It looks like I 
can make a call like http://serverB/solr/update/extract?commit=true and that's 
good enough to get it to open a new reader, but that seems a little clunky.  
I've read in another thread about the use of commit hooks that can trigger 
user-defined events, I think, so
 I'm looking into that now.
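
For reference, a sketch of such a postCommit hook in solrconfig.xml (the script path is hypothetical):

<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">/opt/solr/bin/notify-peers.sh</str>
  <str name="dir">.</str>
  <bool name="wait">false</bool>
</listener>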

Now when I look at using Solr's master+slaves architecture, I feel like it's 
duplicating the trusted (and expensive) services we already have at our 
disposal.  Weblogic+Apache clusters do a good job of distributing load, 
monitoring health, failing-over, restarting, etc.  And if we used slaves that 
pulled index snapshots, they'd be using (by policy) the same NFS mount to store 
those snapshots, so we'd be pulling it over the wire only to write it right 
next to the original index.  If we didn't have these HA clustering mechanisms 
available already, I'm sure I'd be much more willing to look at a Solr 
master+slave architecture.  But since we do, it seems hard to justify using 
Solr's mechanisms anyway.  So, that's my scenario; comments welcome.  :)

 -dKt



  

Re: OOM on uninvert field request

2010-06-30 Thread Yonik Seeley
On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote:
 Hello, I am trying to find the right max and min heap settings for Java 1.6 on a 
 20GB index with 8 million docs, running the 1.6.0_18 JVM with Solr 1.4.  I 
 currently have Java set to an even 4GB (export JAVA_OPTS=-Xmx4096m 
 -Xms4096m) for both min and max, which is doing pretty well, but I am still 
 occasionally getting the OOM errors below.  We're running on dual quad-core 
 Xeons with 16GB of memory installed.

 Is the memSize mentioned in the INFO line for the uninvert in bytes?  Does 
 memSize=29604020 mean 29MB?

Yes.

 We have a few hundred of these fields, and they contain ints used as IDs, so I 
 guess uninverting them all could eat all the memory once we apply load and 
 enough queries are performed.  Does the field type matter; would int be better 
 than string if these are lookup IDs sparsely populated across the index?

No, using UnInvertedField faceting, the fieldType won't matter much at
all for the space it takes up.

The key here is that it looks like the number of unique terms in these
fields is low - you would probably do much better with
facet.method=enum (which iterates over terms rather than documents).
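
For example, a hedged SolrJ sketch (the field name is taken from the log above):

    import org.apache.solr.client.solrj.SolrQuery;

    public class EnumFacetExample {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.addFacetField("749_contentAttributeToken");
            // enum iterates over the terms (one filter intersection per term)
            // instead of building the per-field UnInvertedField structure.
            q.set("facet.method", "enum");
            System.out.println(q);
        }
    }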

-Yonik
http://www.lucidimagination.com


Re: Unbuffered Exception while setting permissions

2010-06-30 Thread Lance Norskog
Other problems with this error have been solved by doing pre-emptive
authentication.
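
A minimal sketch with SolrJ 1.4's CommonsHttpSolrServer and commons-httpclient 3.x (URL and credentials are placeholders):

    import org.apache.commons.httpclient.HttpClient;
    import org.apache.commons.httpclient.UsernamePasswordCredentials;
    import org.apache.commons.httpclient.auth.AuthScope;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class PreemptiveAuth {
        public static void main(String[] args) throws Exception {
            HttpClient client = new HttpClient();
            // Send the credentials with the first request instead of waiting
            // for a 401, so the unbuffered POST body never has to be replayed.
            client.getParams().setAuthenticationPreemptive(true);
            client.getState().setCredentials(AuthScope.ANY,
                    new UsernamePasswordCredentials("rakhi", "rakhi"));
            CommonsHttpSolrServer server = new CommonsHttpSolrServer(
                    "http://localhost:8983/solr/core1", client);
            // server.add(inpDoc); server.commit();
        }
    }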

On Wed, Jun 30, 2010 at 4:26 AM, Rakhi Khatwani rkhatw...@gmail.com wrote:
 This error usually occurs when I do a server.add(inpDoc).

 Behind the logs:

 192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET
 /solr/GPTWPI/update?qt=%2Fupdate&optimize=true&wt=javabin&version=1
 HTTP/1.1 200 41

 192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET
 /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1 401 1389

 192.168.0.106 - admin [30/Jun/2010:11:30:38 +] GET
 /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1 200 70

 192.168.0.106 - - [30/Jun/2010:11:30:38 +] POST
 /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1 200 41 (works when I
 comment out the auth-constraint for RW)

                                        AND

 192.168.0.106 - - [30/Jun/2010:11:29:09 +] POST
 /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1 401 1389 (does not work
 when I add the auth-constraint for RW)

 192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET
 /solr/GPTWPI/update?qt=%2Fupdate&commit=true&wt=javabin&version=1 HTTP/1.1
 200 41

 So what I conclude is that authentication fails when we do a POST and works
 for GETs; correct me if I am wrong.  And how do I get it working?

 Regards,
 Raakhi

 On Wed, Jun 30, 2010 at 2:22 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 I was going through the logs.  Every time I try doing an update (and of course
 end up with the unbuffered exception), the log outputs the following line:
 [30/Jun/2010:09:02:52 +] POST /solr/core1/update?wt=javabin&version=1
 HTTP/1.1 401 1389

 Regards
 Raakhi

 On Wed, Jun 30, 2010 at 12:27 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

 PS: I am using solr 1.4

 Regards,
 Raakhi


Re: REST calls

2010-06-30 Thread Lance Norskog
The stream.file/stream.url/stream.body parameters allow a GET to alter
the index. The core management operations are also usable from GET.

This allows one to bookmark and mail around a link that changes or
blows up the index. Apparently this is not ReStFuL. It is IMVHO insane.
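
For illustration, a sketch of the kind of link I mean (assuming the request parser accepts stream.body, as a stock 1.4 setup does):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class GetThatWrites {
        public static void main(String[] args) throws Exception {
            // %3Ccommit%2F%3E is URL-encoded <commit/> -- a bookmarkable GET
            // that writes to the index.
            URL u = new URL(
                "http://localhost:8983/solr/update?stream.body=%3Ccommit%2F%3E");
            HttpURLConnection c = (HttpURLConnection) u.openConnection();
            System.out.println(c.getResponseCode());  // 200 if the commit ran
        }
    }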

On Wed, Jun 30, 2010 at 7:45 AM, Chantal Ackermann
chantal.ackerm...@btelligent.de wrote:
 On Wed, 2010-06-30 at 16:12 +0200, Yonik Seeley wrote:
 Solr's APIs are described as REST-like, and probably do qualify as
 restful the way the term is commonly used.

 I'm personally much more interested in making our APIs more powerful
 and easier to use, regardless of any REST purity tests.

 -Yonik
 http://www.lucidimagination.com


 Hi Yonik,

 yes, please - and thanks!

 Again and again I'm positively surprised by how efficient and yet simple
 SOLR's (and Lucene's) query and response language (incl. response
 formats) is.  Some things seem complex/difficult at first (like dismax or
 function queries) but turn out to be simple/easy to use considering the
 complexity of the problems they solve.

 Chantal





-- 
Lance Norskog
goks...@gmail.com


Re: REST calls

2010-06-30 Thread Yonik Seeley
On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote:
  Apparently this is not ReStFuL. It is IMVHO insane.

Patches welcome...

-Yonik
http://www.lucidimagination.com


Re: OOM on uninvert field request

2010-06-30 Thread Lance Norskog
On Wed, Jun 30, 2010 at 1:38 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote:
 Hello, I am trying to find the right max and min heap settings for Java 1.6 on a 
 20GB index with 8 million docs, running the 1.6.0_18 JVM with Solr 1.4.  I 
 currently have Java set to an even 4GB (export JAVA_OPTS=-Xmx4096m 
 -Xms4096m) for both min and max, which is doing pretty well, but I am still 
 occasionally getting the OOM errors below.  We're running on dual quad-core 
 Xeons with 16GB of memory installed.

 Is the memSize mentioned in the INFO line for the uninvert in bytes?  Does 
 memSize=29604020 mean 29MB?

 Yes.

 We have a few hundred of these fields, and they contain ints used as IDs, so I 
 guess uninverting them all could eat all the memory once we apply load and 
 enough queries are performed.  Does the field type matter; would int be better 
 than string if these are lookup IDs sparsely populated across the index?

 No, using UnInvertedField faceting, the fieldType won't matter much at
 all for the space it takes up.

 The key here is that it looks like the number of unique terms in these
 fields is low - you would probably do much better with
 facet.method=enum (which iterates over terms rather than documents).

 -Yonik
 http://www.lucidimagination.com




-- 
Lance Norskog
goks...@gmail.com


Re: REST calls

2010-06-30 Thread Lance Norskog
I've looked at the problem. It's fairly involved. It probably would
take several iterations. (But not as many as field collapsing :)

On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
 On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote:
  Apparently this is not ReStFuL. It is IMVHO insane.

 Patches welcome...

 -Yonik
 http://www.lucidimagination.com




-- 
Lance Norskog
goks...@gmail.com


tomcat solr logs

2010-06-30 Thread Robert Petersen
Sorry if this is at all off topic.  Our Solr log files need grooming, and we 
would also like to analyze them, perhaps pulling various data points into a DB 
table.  Is there a preferred app for doing log file analysis and/or an easy way 
to delete the old log files?


Re: REST calls

2010-06-30 Thread Ryan McKinley
If there is a real desire/need to make things restful in the
official sense, it is worth looking at using a REST framework as the
controller rather than the current solution.  Perhaps:

http://www.restlet.org/
https://jersey.dev.java.net/

These would be cool since they encapsulate lots of the request
plumbing work; it would be better to leverage more widely used
approaches than to support our own.

That said, what we have is functional and powerful -- if you are
concerned about people editing the index (with GET/POST or whatever),
there are plenty of ways to solve this.

ryan


On Wed, Jun 30, 2010 at 5:31 PM, Lance Norskog goks...@gmail.com wrote:
 I've looked at the problem. It's fairly involved. It probably would
 take several iterations. (But not as many as field collapsing :)

 On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote:
  Apparently this is not ReStFuL. It is IMVHO insane.

 Patches welcome...

 -Yonik
 http://www.lucidimagination.com




 --
 Lance Norskog
 goks...@gmail.com



RE: OOM on uninvert field request

2010-06-30 Thread Robert Petersen
Most of these hundreds of facet fields have tens of values, but a couple have 
thousands.  Are thousands of different values too many for enum, or is that 
still OK?  If so, I could apply it carte blanche to all the fields...

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Wednesday, June 30, 2010 1:38 PM
To: solr-user@lucene.apache.org
Subject: Re: OOM on uninvert field request

On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote:
 Hello, I am trying to find the right max and min heap settings for Java 1.6 on a 
 20GB index with 8 million docs, running the 1.6.0_18 JVM with Solr 1.4.  I 
 currently have Java set to an even 4GB (export JAVA_OPTS=-Xmx4096m 
 -Xms4096m) for both min and max, which is doing pretty well, but I am still 
 occasionally getting the OOM errors below.  We're running on dual quad-core 
 Xeons with 16GB of memory installed.

 Is the memSize mentioned in the INFO line for the uninvert in bytes?  Does 
 memSize=29604020 mean 29MB?

Yes.

 We have a few hundred of these fields, and they contain ints used as IDs, so I 
 guess uninverting them all could eat all the memory once we apply load and 
 enough queries are performed.  Does the field type matter; would int be better 
 than string if these are lookup IDs sparsely populated across the index?

No, using UnInvertedField faceting, the fieldType won't matter much at
all for the space it takes up.

The key here is that it looks like the number of unique terms in these
fields is low - you would probably do much better with
facet.method=enum (which iterates over terms rather than documents).

-Yonik
http://www.lucidimagination.com


Re: Very basic questions: Faceted front-end?

2010-06-30 Thread Peter Spam
Wow, thanks Lance - it's really fast now!

The last piece of the puzzle is setting up a nice front-end.  Are there any 
pre-built front-ends available that mimic Google (for example), with facets?


-Peter

On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote:

 To highlight a field, Solr needs some extra Lucene values. If these
 are not configured for the field in the schema, Solr has to re-analyze
 the field to highlight it. If you want faster highlighting, you have
 to add term vectors to the schema. Here is the grand map of such
 things:
 
 http://wiki.apache.org/solr/FieldOptionsByUseCase
 
 On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson erickerick...@gmail.com 
 wrote:
 What are your actual highlighting requirements? You could try
 things like maxAnalyzedChars, requireFieldMatch, etc.
 
 http://wiki.apache.org/solr/HighlightingParameters
 has a good list, but you've probably already seen that page
 
 Best
 Erick
 
 On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam ps...@mac.com wrote:
 
 To follow up, I've found that my queries are very fast (even with fq=),
 until I add hl=true.  What can I do to speed up highlighting?  Should I
 consider injecting a line at a time, rather than the entire file as a field?
 
 
 -Pete
 
 On Jun 29, 2010, at 11:07 AM, Peter Spam wrote:
 
 Thanks for everyone's help - I have this working now, but sometimes the
queries are incredibly slow!!  For example, <int name="QTime">461360</int>.
  Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to
 inject without throwing heap memory errors.  However, my data set is very
 small!  36 text files, for a total of 113MB.  (It will grow to many TB, but
 for now, this is a test).  The largest file is 34MB.
 
 Therefore, I'm sure I'm doing something wrong :-)  Here's my config:
 
 
 ---
 
 For the schema.xml, types is all default.  For fields, here are the
 only lines that aren't commented out:
 
   <field name="id" type="string" indexed="true" stored="true" required="true" />
   <field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
   <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
   <field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
   <field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
   <dynamicField name="*" type="ignored" multiValued="true" />
 
 ... then, for the rest:
 
 <uniqueKey>id</uniqueKey>

 <!-- field for the QueryParser to use when an explicit fieldname is
 absent -->
 <defaultSearchField>body</defaultSearchField>

 <!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
 <solrQueryParser defaultOperator="AND"/>
 
 
 
 ---
 
 
 Invoking:  java -Xmx3584M -Xms1024M -jar start.jar
 
 
 
 ---
 
 
 Injecting:
 
 #!/bin/sh

 J=0
 for i in `find . -name \*.txt`; do
   (( J++ ))
   curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" \
     -F "myfile=@$i";
 done;


 echo "- Committing"
 curl "http://localhost:8983/solr/update/extract?commit=true";
 
 
 
 ---
 
 
 Searching:
 
 
 http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true
 
 
 
 
 
 -Pete
 
 On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote:
 
 try adding hl.fl=text
 to specify your highlight field. I don't understand why you're only
 getting the ID field back though. Do note that the highlighting
 is after the docs, related by the ID.
 
 Try a (non highlighting) query of just * to verify that you're
 pointing at the index you think you are. It's possible that
 you've modified a different index with SolrJ than your web
 server is pointing at.
 
 Also, SOLR has no way of knowing you've modified your index
 with SolrJ, so it may not be automatically reopening an
 IndexReader so your recent changes may not be visible
 until you force the SOLR reader to reopen.
 
 HTH
 Erick
 
 On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam ps...@mac.com wrote:
 
 On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote:
 
 1) I can get my docs in the index, but when I search, it
 returns the entire document.  I'd love to have it only
 return the line (or two) around the search term.
 
 Solr can generate Google-like snippets as you describe.
 http://wiki.apache.org/solr/HighlightingParameters
 
 Here's how I commit my documents:
 
 J=0;
 for i in `find . -name \*.txt`; do
  (( J++ ))
  curl "http://localhost:8983/solr/update/extract?literal.id=doc$J" -F "myfile=@$i";
 done;

 echo "- Committing"
 curl "http://localhost:8983/solr/update/extract?commit=true";
 
 
 Then, I try to query using
 
 

Disk usage per-field

2010-06-30 Thread Shawn Heisey
Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of 
the total index disk space is used by each field?  It would also be very 
nice to know, for each field, how much is used by the index and how much 
is used for stored data.




RE: REST calls

2010-06-30 Thread Jason Chaffee
Using Accept headers is a pretty standard practice and so are conditional GETs.

Quite easy to test with curl:

 curl -X GET -H "Accept: application/xml" http://solr.com/search

curl -X GET -H "Accept: application/json" http://solr.com/search


Jason

-Original Message-
From: Don Werve [mailto:d...@madwombat.com] 
Sent: Tuesday, June 29, 2010 9:40 PM
To: solr-user@lucene.apache.org
Subject: Re: REST calls

2010/6/27 Jason Chaffee jchaf...@ebates.com

 The solr docs say it is RESTful, yet it seems that it doesn't use http
 headers in a RESTful way.  For example, it doesn't seem to use the Accept:
 request header to determine the media-type to be returned.  Instead, it
 requires a query parameter to be used in the URL.  Also, it doesn't seem to
 return 304 Not Modified if the request header If-Modified-Since is
 used.


The summary:

Solr is restful, and does a very good job of it.

The long version:

There is no official 'REST' standard that dictates the behavior of the
implementation; rather, REST is a set of guidelines on building APIs that
are both discoverable and easily usable without having to resort to
third-party libraries.

Generally speaking, an application is RESTful if it provides an API that
accepts arguments passed as HTTP form variables, returns results in an open
format (XML, JSON, YAML, etc.), and respects certain semantics relating to
HTTP verbs; e.g., GET/HEAD return the resource without modification, DELETEs
are destructive, POST creates a resource, PUT alters it.

Solr meets all of these requirements.

The nature of the result format, and how to change it, is entirely up to the
implementer.  A common convention is to use a filename extension (.json,
.xml) appended to the URL.  It's less common to specify the request format
as part of the query parameters (like Solr does), but not unheard of.  And,
to be honest, this is actually the first time I've heard of using the
'Accept' header to change the result format, as it makes it a lot harder to
use a web browser, or command-line tools like curl or wget, to debug your
API.


RE: REST calls

2010-06-30 Thread Jason Chaffee
In that case, being able to use Accept headers and conditional GETs
would make them more powerful and easier to use.  The Accept header
could be used, if present, otherwise use the query parameter.  Or, vice
versa.  Also, conditional GETs are a big win when you know the data and
results are not changing often.
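
A rough sketch of the fallback logic I have in mind (all names are illustrative, not Solr code):

    public class FormatNegotiation {
        // Pick the response writer: Accept header wins, then wt, then xml.
        static String chooseFormat(String acceptHeader, String wtParam) {
            if (acceptHeader != null && acceptHeader.contains("application/json")) {
                return "json";
            }
            if (acceptHeader != null && acceptHeader.contains("application/xml")) {
                return "xml";
            }
            return (wtParam != null) ? wtParam : "xml";
        }

        public static void main(String[] args) {
            System.out.println(chooseFormat("application/json", null)); // json
            System.out.println(chooseFormat(null, "javabin"));          // javabin
        }
    }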

Jason

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: Wednesday, June 30, 2010 7:12 AM
To: solr-user@lucene.apache.org
Subject: Re: REST calls

Solr's APIs are described as REST-like, and probably do qualify as
restful the way the term is commonly used.

I'm personally much more interested in making our APIs more powerful
and easier to use, regardless of any REST purity tests.

-Yonik
http://www.lucidimagination.com


RE: REST calls

2010-06-30 Thread Jason Chaffee
Two more JAX-RS solutions:

http://www.jboss.org/resteasy

http://cxf.apache.org/docs/jax-rs.html


However, I am not suggesting changing the core implementation; I just want to 
make it more powerful by utilizing headers.  I can accept the other issues that 
have been mentioned as not being RESTful.  

Also, I do plan to make patches for the issues I mentioned.  I just wanted to 
know if I was missing anything or whether someone else had already contributed 
an extension.

Jason

-Original Message-
From: Ryan McKinley [mailto:ryan...@gmail.com] 
Sent: Wednesday, June 30, 2010 3:07 PM
To: solr-user@lucene.apache.org
Subject: Re: REST calls

If there is a real desire/need to make things restful in the
official sense, it is worth looking at using a REST framework as the
controller rather than the current solution.  Perhaps:

http://www.restlet.org/
https://jersey.dev.java.net/

These would be cool since they encapsulate lots of the request
plumbing work; it would be better to leverage more widely used
approaches than to support our own.

That said, what we have is functional and powerful -- if you are
concerned about people editing the index (with GET/POST or whatever),
there are plenty of ways to solve this.

ryan


On Wed, Jun 30, 2010 at 5:31 PM, Lance Norskog goks...@gmail.com wrote:
 I've looked at the problem. It's fairly involved. It probably would
 take several iterations. (But not as many as field collapsing :)

 On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley
 yo...@lucidimagination.com wrote:
 On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote:
  Apparently this is not ReStFuL. It is IMVHO insane.

 Patches welcome...

 -Yonik
 http://www.lucidimagination.com




 --
 Lance Norskog
 goks...@gmail.com



Re: Very basic questions: Faceted front-end?

2010-06-30 Thread Peter Spam
Ah, I found this:

https://issues.apache.org/jira/browse/SOLR-634

... aka solr-ui.  Is there anything else along these lines?  Thanks!


-Peter


Re: OOM on uninvert field request

2010-06-30 Thread Yonik Seeley
On Wed, Jun 30, 2010 at 6:19 PM, Robert Petersen rober...@buy.com wrote:
 Most of these hundreds of facet fields have tens of values, but a couple have 
 thousands.  Are thousands of different values too many for enum, or is that 
 still OK?  If so, I could apply it carte blanche to all the fields...

enum can still handle thousands, but is often slower (and remember to
increase the size of your filterCache, which will now see greater
usage).

I would do facet.method=enum for the default and then override that
for those few fields with thousands of unique terms via
f.123_contentAttributeToken.facet.method=fc
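
For example, a hedged SolrJ sketch (the field name is illustrative):

    import org.apache.solr.client.solrj.SolrQuery;

    public class PerFieldFacetMethod {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.set("facet.method", "enum");  // default for the many small fields
            // Per-field override for a field with thousands of unique terms.
            q.set("f.123_contentAttributeToken.facet.method", "fc");
            System.out.println(q);
        }
    }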

-Yonik
http://www.lucidimagination.com

 -Original Message-
 From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
 Sent: Wednesday, June 30, 2010 1:38 PM
 To: solr-user@lucene.apache.org
 Subject: Re: OOM on uninvert field request

 On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote:
 Hello, I am trying to find the right max and min heap settings for Java 1.6 on a 
 20GB index with 8 million docs, running the 1.6.0_18 JVM with Solr 1.4.  I 
 currently have Java set to an even 4GB (export JAVA_OPTS=-Xmx4096m 
 -Xms4096m) for both min and max, which is doing pretty well, but I am still 
 occasionally getting the OOM errors below.  We're running on dual quad-core 
 Xeons with 16GB of memory installed.

 Is the memSize mentioned in the INFO line for the uninvert in bytes?  Does 
 memSize=29604020 mean 29MB?

 Yes.

 We have a few hundred of these fields, and they contain ints used as IDs, so I 
 guess uninverting them all could eat all the memory once we apply load and 
 enough queries are performed.  Does the field type matter; would int be better 
 than string if these are lookup IDs sparsely populated across the index?

 No, using UnInvertedField faceting, the fieldType won't matter much at
 all for the space it takes up.

 The key here is that it looks like the number of unique terms in these
 fields is low - you would probably do much better with
 facet.method=enum (which iterates over terms rather than documents).

 -Yonik
 http://www.lucidimagination.com



Re: Wiki Documentation of facet.sort

2010-06-30 Thread Koji Sekiguchi

(10/07/01 1:12), Chantal Ackermann wrote:

Hi there,

in the wiki, on http://wiki.apache.org/solr/SimpleFacetParameters
it says:


The default is true/count if facet.limit is greater than 0, false/index
otherwise.


I've just migrated to 1.4.1 (reindexed).  I can't remember how it was
with 1.4.0.

When I specify my facet query with facet.mincount=0 (explicitly) or
without mincount (the default is 0), the resulting facets are
nevertheless sorted by count.  Changing mincount from 0 to 1 and back
actually makes no difference in the sorting.
I'm fine with a constant default behaviour (always sorting by count,
no matter what parameters are given).
If this is intended - shall I change the wiki accordingly?

Cheers,
Chantal
   

Chantal,

Wiki says facet.limit but you are changing facet.mincount?
:)
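
Setting facet.sort explicitly takes the default out of the equation entirely - a hedged SolrJ sketch (field name illustrative):

    import org.apache.solr.client.solrj.SolrQuery;

    public class ExplicitFacetSort {
        public static void main(String[] args) {
            SolrQuery q = new SolrQuery("*:*");
            q.setFacet(true);
            q.addFacetField("category");
            q.set("facet.sort", "index");  // lexical order, whatever facet.limit is
            System.out.println(q);
        }
    }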

Koji

--
http://www.rondhuit.com/en/



Re: REST calls

2010-06-30 Thread Erik Hatcher

Solr has 304 support via the Last-Modified and ETag headers.
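
For example, a hedged sketch of a conditional GET (the date is a stale placeholder; a real client would replay the Last-Modified value it previously received):

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ConditionalGet {
        public static void main(String[] args) throws Exception {
            URL u = new URL("http://localhost:8983/solr/select?q=solr");
            HttpURLConnection c = (HttpURLConnection) u.openConnection();
            c.setRequestProperty("If-Modified-Since",
                    "Wed, 30 Jun 2010 00:00:00 GMT");
            // 304 with an empty body if the index hasn't changed since then.
            System.out.println(c.getResponseCode());
        }
    }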

Erik

On Jun 30, 2010, at 7:52 PM, Jason Chaffee wrote:


In that case, being able to use Accept headers and conditional GETs
would make them more powerful and easier to use.  The Accept header
could be used, if present, otherwise use the query parameter.  Or, vice
versa.  Also, conditional GETs are a big win when you know the data and
results are not changing often.

Jason

-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik
Seeley
Sent: Wednesday, June 30, 2010 7:12 AM
To: solr-user@lucene.apache.org
Subject: Re: REST calls

Solr's APIs are described as REST-like, and probably do qualify as
restful the way the term is commonly used.

I'm personally much more interested in making our APIs more powerful
and easier to use, regardless of any REST purity tests.

-Yonik
http://www.lucidimagination.com




Dilemma - Very Frequent Synonym updates for Huge Index

2010-06-30 Thread Ravi Kiran
Hello,
Hoping some Solr guru can help me out here.  We are a news
organization trying to migrate 10 million documents from FAST to Solr.  The
plan is to have our Editorial team add/modify synonyms multiple times during
a day as they deem appropriate.  Hence we plan on using query-time synonyms,
as we cannot reindex every time they modify the synonyms file (for the
entities extracted by OpenNLP, like locations/organizations/person names from
the article body).  Since the synonyms are for names, I am concerned that the
multi-phrase issue crops up with the query-time synonyms.  For example,
synonyms could be as follows:

The Washington Post Co., The Washington Post, Washington Post, The Post,
TWP, WAPO
DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security
USCIS, United States Citizenship and Immigration Services, U.S.C.I.S.

Barack Obama,Barack H. Obama,Barack Hussein Obama,President Obama
Hillary Clinton,Hillary R. Clinton,Hillary Rodham Clinton,Secretary
Clinton,Sen. Clinton
William J. Clinton,William Jefferson Clinton,President Clinton,President
Bill Clinton

Virginia, Va., VA
D.C,Washington D.C, District of Columbia

I have the following fieldType in schema.xml for the keywords/entities... What
issues should I be aware of?  And is there a better way to achieve it
without having to reindex a million docs on each synonym change?  NOTE that I
use tokenizerFactory="solr.KeywordTokenizerFactory" for the
SynonymFilterFactory to keep the words intact without splitting.

<!-- Field Type Keywords/Entities Extracted from OpenNLP -->
<fieldType name="keywordText" class="solr.TextField"
    sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true"
        words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory"
        tokenizerFactory="solr.KeywordTokenizerFactory"
        synonyms="person-synonyms.txt,organization-synonyms.txt,location-synonyms.txt,subject-synonyms.txt"
        ignoreCase="true" expand="true" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
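
One hedged way to pick up edited synonym files without reindexing is to reload the core so the query-time analyzers re-read them (core name and URL are placeholders):

    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.request.CoreAdminRequest;

    public class ReloadSynonyms {
        public static void main(String[] args) throws Exception {
            // Point at the container URL (CoreAdmin handler), not a core URL.
            CommonsHttpSolrServer admin =
                    new CommonsHttpSolrServer("http://localhost:8983/solr");
            // RELOAD rebuilds the core's schema and analyzers, re-reading
            // the synonym files; queries keep working during the swap.
            CoreAdminRequest.reloadCore("core1", admin);
        }
    }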