Unbuffered Exception while setting permissions
Hi, I am trying out Solr security on my setup, following these links:

http://wiki.apache.org/solr/SolrSecurity
http://www.lucidimagination.com/search/document/d1e338dc452db2e4/how_can_i_protect_the_solr_cores

Following is my configuration:

realms.properties:

admin: admin,server-administrator,content-administrator,admin
other: OBF:1xmk1w261u9r1w1c1xmq
guest: guest,read-only
rakhi: rakhi,RW-role

jetty.xml:

...
<Set name="UserRealms">
  <Array type="org.mortbay.jetty.security.UserRealm">
    <Item>
      <New class="org.mortbay.jetty.security.HashUserRealm">
        <Set name="name">Test Realm</Set>
        <Set name="config"><SystemProperty name="jetty.home" default="."/>/etc/realm.properties</Set>
      </New>
    </Item>
  </Array>
</Set>
...

webdefault.xml:

<!-- block by default. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Default</web-resource-name>
    <url-pattern>/</url-pattern>
  </web-resource-collection>
  <auth-constraint/> <!-- BLOCK! -->
</security-constraint>

<!-- Setting admin access. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>Solr authenticated application</web-resource-name>
    <url-pattern>/admin/*</url-pattern>
    <url-pattern>/core1/admin/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>admin</role-name>
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

<!-- this constraint has no auth constraint or data constraint => allows access without auth. -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>AllowedQueries</web-resource-name>
    <url-pattern>/core1/select/*</url-pattern>
  </web-resource-collection>
</security-constraint>

<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Test Realm</realm-name>
</login-config>

<security-role><role-name>Admin-role</role-name></security-role>
<security-role><role-name>FullAccess-role</role-name></security-role>
<security-role><role-name>RW-role</role-name></security-role>

So far everything works fine: I get a forbidden response as soon as I try to commit documents to Solr. But when I add the following security-constraint in webdefault.xml:

<!-- this constraint allows access to modify the data in the Solr service, with basic auth -->
<security-constraint>
  <web-resource-collection>
    <web-resource-name>RW</web-resource-name>
    <!-- the dataimport handler for each individual core -->
    <url-pattern>/core1/dataimport</url-pattern>
    <!-- the update handler (XML over HTTP) for each individual core -->
    <url-pattern>/core1/update/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <!-- Roles of users are defined in the properties file -->
    <!-- we allow users with rw-only access -->
    <role-name>RW-role</role-name>
    <!-- we allow users with full access -->
    <role-name>FullAccess-role</role-name>
  </auth-constraint>
</security-constraint>

I get the following exception:

org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:469)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:64)
    at Authentication.AuthenticationTest.main(AuthenticationTest.java:35)
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
    at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
    at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:416)
    ... 4 more

My Java code is as follows:

public class AuthenticationTest {
    public static void main(String[] args) {
        try {
            HttpClient client = new HttpClient();
            AuthScope scope = new AuthScope(AuthScope.ANY_HOST, AuthScope.ANY_PORT);
            client.getState().setCredentials(scope,
                    new UsernamePasswordCredentials("rakhi", "rakhi"));
            SolrServer server = new CommonsHttpSolrServer(
                    "http://localhost:8983/solr/core1/", client);
            SolrQuery query = new SolrQuery();
            query.setQuery("*:*");
            QueryResponse response = server.query(query);
            System.out.println(response.getStatus());
            SolrInputDocument doc = new SolrInputDocument();
            doc.setField("aid", 0);
            doc.setField("rct", "Sample Data for authentication");
            server.add(doc);
            server.commit();
        } catch (MalformedURLException e) {
            // TODO Auto-generated catch block
            e.printStackTrace();
        } catch (SolrServerException e) { //
Re: Unbuffered Exception while setting permissions
PS: I am using Solr 1.4. Regards, Raakhi

On Wed, Jun 30, 2010 at 12:05 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:
Strange problem when use dismax handler
Hi all. I am using the default dismax handler to search within Solr. The problem is that when I search I want to specify the type to restrict the results. Here is what I do:

1. Query string with one type (works!) :design)) AND ((type:product) )))
2. Query string with 2 types (works!) :design)) AND ((type:product) OR (type:member
3. Query string with 3 types (works!) :design)) AND ((type:product) OR (type:member) OR (type:forum) )))
4. Query string with more than 3 types (doesn't work!) :design)) AND ((type:product) OR (type:member) OR (type:forum) OR (type:stamp) OR (type:answer) OR (type:page

Nothing is returned, and I don't know why. I think this is caused by a dismax setting, probably mm (minimum "should" match), but I have no idea how to configure it. Please help. Thanks! Regards, Scott
Re: Strange problem when use dismax handler
I used debugQuery to check my query URL and noticed it is parsed incorrectly: the type:book clause is being parsed as part of the query string too. Sigh~~~

+((+DisjunctionMaxQuery((keyword_level1:design^10.0 | keyword_level2:design)~0.01) DisjunctionMaxQuery((keyword_level1:type^10.0 | keyword_level2:type)~0.01) DisjunctionMaxQuery((keyword_level1:product^10.0 | keyword_level2:product)~0.01) DisjunctionMaxQuery((keyword_level1:type^10.0 | keyword_level2:type)~0.01) DisjunctionMaxQuery((keyword_level1:book^10.0 | keyword_level2:book)~0.01))~3) ()

So the question is: how do I specify the query field (in my case, type = book) besides using "design" to search over the two keyword fields? Thanks

On Wed, Jun 30, 2010 at 4:08 PM, Scott Zhang macromars...@gmail.com wrote:
Re: [ANN] Solr 1.4.1 Released
Created issue https://issues.apache.org/jira/browse/SOLR-1977 for this. Regards, Stevo.

On Sun, Jun 27, 2010 at 2:54 AM, Ken Krugler kkrugler_li...@transpac.com wrote: On Jun 26, 2010, at 5:18pm, Jason Chaffee wrote: It appears the 1.4.1 version was deployed with a new Maven groupId. For example, if you are trying to download solr-core, here are the differences between 1.4.0 and 1.4.1. 1.4.0 groupId: org.apache.solr, artifactId: solr-core. 1.4.1 groupId: org.apache.solr.solr, artifactId: solr-core. Was this change intentional or a mistake? If it was a mistake, can someone please fix it in Maven's central repository? I believe it was a mistake. From a recent email thread on this list, Mark Miller said: Can a solr/maven dude look at this? I simply used the copy command on the release to-do wiki (sounds like it should be updated). If no one steps up, I'll try and straighten it out later.

On 6/25/10 10:28 AM, Stevo Slavić wrote: Congrats on the release! Something seems to be wrong with the Solr 1.4.1 Maven artifacts: there is an extra "solr" in the path. E.g. solr-parent-1.4.1.pom is at http://repo1.maven.org/maven2/org/apache/solr/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom while it should be at http://repo1.maven.org/maven2/org/apache/solr/solr-parent/1.4.1/solr-parent-1.4.1.pom . The poms seem to contain correct Maven artifact coordinates. Regards, Stevo. -- Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
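For anyone pulling Solr from a build at the time, the coordinates the thread concludes are the intended ones (the extra "solr" path segment being the mistake) would be declared in a POM roughly like this; this is a sketch of the expected layout, not a confirmation that the repository has been fixed:

```xml
<!-- intended coordinates per the thread: groupId org.apache.solr (no doubled segment) -->
<dependency>
  <groupId>org.apache.solr</groupId>
  <artifactId>solr-core</artifactId>
  <version>1.4.1</version>
</dependency>
```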
Re: Strange problem when use dismax handler
Well, I figured it out. I should use the fq parameter: fq=type:music type:movie type:product

On Wed, Jun 30, 2010 at 4:15 PM, Scott Zhang macromars...@gmail.com wrote:
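For reference, a minimal sketch of why fq fixes this: q carries the user's keywords and is parsed by dismax, while fq is parsed by the standard lucene parser, so field:value syntax works there. The core URL and field values below are illustrative, and the fq value uses the lucene parser's OR syntax rather than the space-separated form above:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class DismaxFilterQuery {
    public static void main(String[] args) {
        // q: free-text keywords, handled by the dismax parser
        String q = URLEncoder.encode("design", StandardCharsets.UTF_8);
        // fq: the type restriction, handled by the standard lucene parser
        // and cached independently of the main query
        String fq = URLEncoder.encode("type:(product OR member OR forum)",
                StandardCharsets.UTF_8);
        String url = "http://localhost:8983/solr/select?defType=dismax"
                + "&q=" + q + "&fq=" + fq;
        System.out.println(url);
    }
}
```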
Re: Unbuffered Exception while setting permissions
I was going through the logs. Every time I try doing an update (and of course end up with the unbuffered exception), the log outputs the following line:

[30/Jun/2010:09:02:52 +] POST /solr/core1/update?wt=javabin&version=1 HTTP/1.1 401 1389

Regards, Raakhi

On Wed, Jun 30, 2010 at 12:27 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:
Re: problem with formulating a negative query
Hi Erick, thanks for your explanations. But why are all docs being *removed* from the set of all docs that contain R in their topic field? This would correspond to a boolean AND and would conflict with the clause q.op=OR, which seems a bit strange to me. Furthermore, Smiley & Pugh state in their Solr 1.4 book on pg. 102 that adding a subexpression containing the negative query (-[* TO *]) and the match-all-docs clause (*:*) is only a workaround. Why is this workaround necessary at all? Best, Sascha

Erick Erickson wrote: This may help: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html#Boolean%20operators But the clause you specified translates roughly as "find all the documents that contain R, then remove any of them that match [* TO *]". [* TO *] contains all the documents with R, so everything you just matched is removed from your results. HTH, Erick

On Tue, Jun 29, 2010 at 12:40 PM, Sascha Szott sz...@zib.de wrote: Hi Ahmet, it works, thanks a lot! To be honest, I have no idea what the problem is with defType=lucene&q.op=OR&df=topic&q=R NOT [* TO *] -Sascha

Ahmet Arslan wrote: I have a (multi-valued) field topic in my index which does not need to exist in every document. Now, I'm struggling with formulating a query that returns all documents that either have no topic field at all *or* whose topic field value is R. Does this work? defType=lucene&q.op=OR&q=topic:R (+*:* -topic:[* TO *])
Re: Unbuffered Exception while setting permissions
This error usually occurs when I do a server.add(inpDoc). Behind the scenes, the logs show:

192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/update?qt=%2Fupdate&optimize=true&wt=javabin&version=1 HTTP/1.1 200 41
192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1 401 1389
192.168.0.106 - admin [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1 200 70
192.168.0.106 - - [30/Jun/2010:11:30:38 +] POST /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1 200 41

(works when I comment out the auth-constraint for RW)

AND

192.168.0.106 - - [30/Jun/2010:11:29:09 +] POST /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1 401 1389

(does not work when I add the auth-constraint for RW)

192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/update?qt=%2Fupdate&commit=true&wt=javabin&version=1 HTTP/1.1 200 41

So what I conclude is that authentication does not work when we do a POST, but works for GET requests. Correct me if I am wrong. And how do I get it working?
Regards, Raakhi

On Wed, Jun 30, 2010 at 2:22 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:
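The 401-then-POST pattern in those logs matches the exception: commons-httpclient streams the javabin POST body, gets challenged with a 401, and then cannot replay the unbuffered body to answer the challenge. A common workaround (my suggestion, not something from this thread) is preemptive authentication, enabled in commons-httpclient 3.x with client.getParams().setAuthenticationPreemptive(true), so the credentials go out on the very first request and no challenge occurs. The sketch below just shows the Basic Authorization header value that preemptive auth sends up front:

```java
import java.util.Base64;

public class PreemptiveBasicAuth {
    // Build the "Basic <base64(user:pass)>" header value that preemptive
    // auth attaches to the first request, avoiding the 401 challenge that
    // forces commons-httpclient to replay the unbuffered POST body.
    static String basicAuth(String user, String pass) {
        String token = Base64.getEncoder()
                .encodeToString((user + ":" + pass).getBytes());
        return "Basic " + token;
    }

    public static void main(String[] args) {
        System.out.println(basicAuth("rakhi", "rakhi"));
        // → Basic cmFraGk6cmFraGk=
    }
}
```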
Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting
I only get this exception when I sort on my popularity field. Why?

...&sort=score desc, popularity desc --> BAD
...&sort=score desc --> all fine.

That's not good =( -- View this message in context: http://lucene.472066.n3.nabble.com/ArrayIndexOutOfBoundsException-heelp-Sorting-tp932956p932968.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting
(10/06/30 20:27), stockii wrote: Hello. I get a SEVERE: java.lang.ArrayIndexOutOfBoundsException and I don't know the reason for it. I have 4 cores, and every core is running, but for a few minutes I get this bad exception in one core. It's absolutely not acceptable... When I search with this:

.../?omitHeader=false&fl=id&sort=score+desc,+popularity+desc&start=0&q=fuck&json.nl=map&qt=standard&wt=json&rows=16&version=1.2

I get the exception, but when I search without the sort parameter I get no exception?!

SEVERE: java.lang.ArrayIndexOutOfBoundsException: 30988 at org.apache.lucene.search.FieldCacheImpl$StringIndexCache.createValue(FieldCacheImpl.java:713)

If the popularity field is multiValued and/or tokenized, sorting on such a field throws AIOOBE in Solr 1.4.1. (Sorting works correctly on an untokenized, single-valued field.) Koji -- http://www.rondhuit.com/en/
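Following Koji's explanation, a sort-safe definition in schema.xml would look roughly like the sketch below. The field and type names are illustrative (SortableIntField is the class behind sint in Solr 1.4); the key point is single-valued and untokenized:

```xml
<!-- sortable fields must be single-valued and must not be tokenized -->
<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true"/>
<field name="popularity" type="sint" indexed="true" stored="true" multiValued="false"/>
```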
Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting
Aha, the type is sint. Do I need to use string, or which field type does not use any tokenizer? ^^ I thought sint was untokenized...
Re: ArrayIndexOutOfBoundsException heeeeeelp !?!?!?!!?! Sorting
It makes no sense. I can sort by popularity on my playground core, but not on my live core, and both cores have the same configuration... The problem occurs after using facet search; maybe that's the reason? Or too much data? Too weak a server? Any ideas?
Re: REST calls
On Wed, Jun 30, 2010 at 12:39 AM, Don Werve d...@madwombat.com wrote: 2010/6/27 Jason Chaffee jchaf...@ebates.com The solr docs say it is RESTful, yet it seems that it doesn't use http headers in a RESTful way. For example, it doesn't seem to use the Accept: request header to determine the media-type to be returned. Instead, it requires a query parameter to be used in the URL. Also, it doesn't seem to use return 304 Not Modified if the request header if-modified-since is used. The summary: Solr is restful, and does a very good job of it. I'm not so sure... The long version: There is no official 'REST' standard that dictates the behavior of the implementation; rather, REST is a set of guidelines on building APIs that are both discoverable and easily usable without having to resort to third-party libraries. Generally speaking, an application is RESTful if it provides an API that accepts arguments passed as HTTP form variables, returns results in an open format (XML, JSON, YAML, etc.), and respects certain semantics relating to HTTP verbs; e.g., GET/HEAD return the resource without modification, DELETEs are destructive, POST creates a resource, PUT alters it. Solr meets all of these requirements. With fairly limited knowledge of Solr (I'm a lucene user), I'd like to offer an alternate view. - Solr seems to violate the hypermedia-driven constraint. (e.g. it seems not to be hypertext driven at all) [1] - Solr seems to violate the uniform interface constraint and the identification of resources constraint.(E.g. by having commands in the entity body instead of exposing resources with state that is manipulated through the standard methods and, I gather, overloading methods instead of using standard ones (e.g. deletes). I'd conclude Solr is not RESTful. The representation argument is a bit of a red-herring, btw. Not using Accept for conneg isn't the problem, using agent-driven negotiation without being hypertext driven is [from a REST pov]. 
--tim [1] - http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hypertext-driven
Re: REST calls
On Wed, Jun 30, 2010 at 9:17 AM, Jak Akdemir jakde...@gmail.com wrote: Actually, it is not a constraint to use all four of *GET*, *PUT*, *POST*, *DELETE*. To define RESTful, using GET and POST requests is enough, as Roy Fielding noted: http://roy.gbiv.com/untangled/2009/it-is-okay-to-use-post

In Roy's post, I'd point out: "POST only becomes an issue when it is used in a situation for which some other method is ideally suited (e.g. DELETE to delete)." Also, GET and POST *could* be enough if and only if you took care to design your resources properly[1]. --tim

[1] - http://www.amundsen.com/blog/archives/1063
Re: REST calls
Solr's APIs are described as REST-like, and probably do qualify as restful the way the term is commonly used. I'm personally much more interested in making our APIs more powerful and easier to use, regardless of any REST purity tests. -Yonik http://www.lucidimagination.com
Re: dataimport.properties is not updated on delta-import
I've finally found the problem causing the delta-import to fail and thought I would post it here for future reference (in case someone makes the same mistake I did). I had forgotten to select the id column in the deltaImportQuery. I should, of course, have known this from the log entries about documents not being added because they lacked id fields. My broken query: SELECT column_1, column_2, column_3 FROM table WHERE id = ${dataimporter.delta.id} My working query: SELECT id, column_1, column_2, column_3 FROM table WHERE id = ${dataimporter.delta.id} Thanks all for your help! -- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-properties-is-not-updated-on-delta-import-tp916753p933342.html Sent from the Solr - User mailing list archive at Nabble.com.
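For reference, the fix described above corresponds to a DataImportHandler entity along these lines (a minimal sketch; the table and column names are placeholders, not the poster's actual schema):

```xml
<entity name="item" pk="id"
        query="SELECT id, column_1, column_2, column_3 FROM table"
        deltaQuery="SELECT id FROM table
                    WHERE last_modified &gt; '${dataimporter.last_index_time}'"
        deltaImportQuery="SELECT id, column_1, column_2, column_3 FROM table
                          WHERE id = '${dataimporter.delta.id}'">
</entity>
```

The key point: the deltaImportQuery must select the uniqueKey column (id here), otherwise the resulting documents are rejected for lacking an id, exactly as the log entries indicated.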
Is there a way to delete multiple documents using wildcard?
Hi, I am trying to delete a group of documents using a wildcard. Something like update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><doc><field%20name="uid">6-HOST*</field></doc></delete>' I want to delete all documents whose uid starts with 6-HOST, but this query doesn't seem to work. Am I doing anything wrong? Thanks, BB
Re: REST calls
On Wed, 2010-06-30 at 16:12 +0200, Yonik Seeley wrote: Solr's APIs are described as REST-like, and probably do qualify as restful the way the term is commonly used. I'm personally much more interested in making our APIs more powerful and easier to use, regardless of any REST purity tests. -Yonik http://www.lucidimagination.com Hi Yonik, yes, please - and thanks! I'm again and again positively surprised how efficient and yet simple SOLR's (and Lucene's) query and response language (incl. response formats) is. Some things seem complex/difficult at first (like dismax or function queries) but turn out to be simple/easy to use considering the complexity of the problems they solve. Chantal
Re: Is there a way to delete multiple documents using wildcard?
Hi, you can delete all docs that match a certain query: <delete><query>uid:6-HOST*</query></delete> -Sascha bbarani wrote: Hi, I am trying to delete a group of documents using a wildcard. Something like update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><doc><field%20name="uid">6-HOST*</field></doc></delete>' I want to delete all documents whose uid starts with 6-HOST, but this query doesn't seem to work. Am I doing anything wrong? Thanks, BB
Re: Is there a way to delete multiple documents using wildcard?
Hi, Thanks a lot for your reply. I tried the below query update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><query>uid:6-HOST*</query></delete>' But even now none of the documents are getting deleted. Am I forming the URL wrong? Thanks, BB
SolrQueryResponse - Solr Documents
Hello all, How can I view the Solr documents in a response writer before the response is sent to the client? What I get is only a DocSlice with int values, having size equal to the number of docs requested. All this while debugging on the SolrQueryResponse object. Thanks Sam
Re: Leading Wildcard query strangeness
An update in case someone stumbles upon this... At first I thought you meant the fields I intend to do leading wildcard searches on needed to have ReversedWildcardFilterFactory on them. But that didn't make sense because our prod app isn't using that at all. But our prod app does have the text_rev field type still in it from the example schema we copied over and used as our template. One of the things we've done in dev is clean that out and try to get down to what we are using. So I tossed text_rev back into the schema.xml, didn't actually use that field type for any fields, and now I can do leading wildcard searches again. I'm going to guess that is what you meant: the very presence of the filter in the schema, whether it is used or not, allows you to do leading wildcard searches. Is that documented anywhere and I just missed it? I'm sure it is.
Re: Is there a way to delete multiple documents using wildcard?
Hi, does /select?q=uid:6-HOST* return any documents? -Sascha bbarani wrote: Hi, Thanks a lot for your reply. I tried the below query update?commit=true%20-H%20Content-Type:%20text/xml%20--data-binary%20'<delete><query>uid:6-HOST*</query></delete>' But even now none of the documents are getting deleted. Am I forming the URL wrong? Thanks, BB
Re: Leading Wildcard query strangeness
I'm going to guess that is what you meant, that the very presence of the filter in the schema, whether it is used or not, allows you to do wildcard searches. Exactly. Is that documented anywhere and I just missed it? I'm sure it is. I knew it from the source code of SolrQueryParser: protected void checkAllowLeadingWildcards() I didn't see anything in the wiki about it.
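For anyone hitting this later, the relevant schema.xml fragment is a field type whose index analyzer contains ReversedWildcardFilterFactory; a sketch based on the stock 1.4 example schema (the attribute values shown are illustrative):

```xml
<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- its mere presence in the schema flips SolrQueryParser's
         leading-wildcard check, even if no field uses this type -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Note that with this trick alone the leading-wildcard query is still evaluated the slow way; only fields actually using the reversed type get the fast reversed-term matching.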
Re: Is there a way to delete multiple documents using wildcard?
Yeah, I am getting the results when I use the /select handler. I tried the below query: /select?q=uid:6-HOST* Got result: <result name="response" numFound="52920" start="0"> Thanks BB
Re: Is there a way to delete multiple documents using wildcard?
Hi, You need to use HTTP POST in order to send those parameters, I believe. Try with curl: curl 'http://localhost:8983/solr/update?commit=true' -H 'Content-Type: text/xml' --data-binary '<delete><query>uid:6-HOST*</query></delete>' -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 30. juni 2010, at 17.53, bbarani wrote: Yeah, I am getting the results when I use the /select handler. I tried the below query: /select?q=uid:6-HOST* Got result: <result name="response" numFound="52920" start="0"> Thanks BB
Wiki Documentation of facet.sort
Hi there, in the wiki, on http://wiki.apache.org/solr/SimpleFacetParameters it says: The default is true/count if facet.limit is greater than 0, false/index otherwise. I've just migrated to 1.4.1 (reindexed). I can't remember how it was with 1.4.0. When I specify my facet query with facet.mincount=0 (explicitly) or without mincount (the default is 0), the resulting facets are sorted by count nevertheless. Changing mincount from 0 to 1 and back actually makes no difference in the sorting. I'm fine with a constant default behaviour (always sorting by count, e.g., no matter what parameters are given). If this is intended - shall I change the wiki accordingly? Cheers, Chantal
Re: Is there a way to delete multiple documents using wildcard?
Hi, take a look inside Solr's log file. Are there any error messages with respect to the update request? Furthermore, you could try the following two commands instead: curl 'http://host:port/solr/update' --form-string 'stream.body=<delete><query>uid:6-HOST*</query></delete>' curl 'http://host:port/solr/update' --form-string 'stream.body=<commit/>' -Sascha bbarani wrote: Yeah, I am getting the results when I use the /select handler. I tried the below query: /select?q=uid:6-HOST* Got result: <result name="response" numFound="52920" start="0"> Thanks BB
Disable Solr Response Formatting
By default my SOLR response comes back formatted, like such C/ Is there a way to tell it to return it unformatted? like: C/
Re: Disable Solr Response Formatting
Oops, let me try that again... By default my SOLR response comes back formatted, like such C/ Is there a way to tell it to return it unformatted? like: C/
Re: Is there a way to delete multiple documents using wildcard?
Hi, I was able to successfully delete multiple documents using the below URL: /update?stream.body=<delete><query>uid:6-HOST*</query></delete> Thanks, BB
GC tuning - heap size autoranging
Is this a true statement??? This seems to contradict other statements regarding setting the heap size I have seen here... Default Heap Size If not otherwise set on the command line, the initial and maximum heap sizes are calculated based on the amount of memory on the machine. The proportion of memory to use for the heap is controlled by the command line options DefaultInitialRAMFraction and DefaultMaxRAMFraction, as shown in the table below. (In the table, memory represents the amount of memory on the machine.) Pasted from http://java.sun.com/javase/technologies/hotspot/gc/gc_tuning_6.html#available_collectors.selecting
RE: Re: Disable Solr Response Formatting
Hi, My client makes a mess out of your example, but if you mean formatting as in indenting, then send indent=false - but it's already false by default. Check your requestHandler settings. Cheers, -Original message- From: JohnRodey timothydd...@yahoo.com Sent: Wed 30-06-2010 18:39 To: solr-user@lucene.apache.org; Subject: Re: Disable Solr Response Formatting Oops, let me try that again... By default my SOLR response comes back formatted, like such C/ Is there a way to tell it to return it unformatted? like: C/
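The requestHandler settings referred to above live in solrconfig.xml; if indentation is on without being requested, it is usually set in the handler defaults. A sketch of where to look (the handler name and surrounding params are assumptions, not the poster's actual config):

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <!-- remove this line, or pass indent=off on the request,
         to get the compact single-line response -->
    <str name="indent">on</str>
  </lst>
</requestHandler>
```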
Re: Is there a way to delete multiple documents using wildcard?
Hmm, nice one - I was not aware of that trick. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Training in Europe - www.solrtraining.com On 30. juni 2010, at 18.41, bbarani wrote: Hi, I was able to successfully delete multiple documents using the below URL: /update?stream.body=<delete><query>uid:6-HOST*</query></delete> Thanks, BB
RE: Re: Disable Solr Response Formatting
Thanks! I was looking for things to change in the solrconfig.xml file. indent=off
Bizarre Terms revisited
Hi, I really think there is something not quite right going on here after much study. Here are my findings. Using MLT, I get terms that appear to be long concatenations of words that are space-delimited in the original text. I can't think of any reason for these sentence-like terms to exist (see below). All my data and config follows. Here is the output from MLT:

<lst name="interestingTerms">
  <float name="text_t:result">1.0</float>
  <float name="text_t:concepts">1.0</float>
  <float name="text_t:identified">1.0</float>
  <float name="text_t:row">1.0</float>
  <float name="text_t:based">1.0</float>
  <float name="text_t:000">1.0</float>
  <float name="text_t:ontreweb">1.0</float>
  <float name="text_t:in">1.0</float>
  <float name="text_t:and">1.0</float>
  <float name="text_t:2">1.0</float>
  <!-- These do not look like valid or useful terms to have in the index. -->
  <!-- Why do these exist? -->
  <float name="text_t:searchinonelanguagefindresultsinanother">1.0</float>
  <float name="text_t:ontrewebstartpage">1.0</float>
  <float name="text_t:unlimitedmutliwordandphrasematching">1.0</float>
  <float name="text_t:wordsandphrases">1.0</float>
  <float name="text_t:pluggablevocabulariesontologies">1.0</float>
  <float name="text_t:mappedconcepts">1.0</float>
  <float name="text_t:ontrewebproductfeatures">1.0</float>
  <float name="text_t:multilinguallexiconsfrenchenglishetc">1.0</float>
  <float name="text_t:multipleinheritanceofconcepts">1.0</float>
  <float name="text_t:4">1.0</float>
  <float name="text_t:string">1.0</float>
  <float name="text_t:english">1.0</float>
  <float name="text_t:mapped">1.0</float>
  <float name="text_t:multilingual">1.0</float>
  <float name="text_t:mutliword">1.0</float>
</lst>

My field:

<field name="text_t" type="textgen" indexed="true" stored="true" multiValued="true" termVectors="true" termPositions="true" termOffsets="true"/>

Field definition taken from the default schema.xml:

<fieldType name="textgen" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Original text (partially snipped) as it appears in the stored index: Ontreweb Product Features Unlimited mutliword and phrase matching Multiple inheritance of concepts Pluggable vocabularies, ontologies Multilingual lexicons: french, english, etc. Search in one language, find results in another 200,000+ words and phrases, 35,000 mapped concepts. 1. 2. 3. 4.
RE: OOM on uninvert field request
Hey so after adding those GC options, I was able to incrementally push my max (and min) memory settings up and when we got to max=min=12GB we started looking much better! One slave handles all the load with no OOMs at all! I'm watching the live tomcat log using 'tail'. Next I will convert that field type to (trie) int and reindex. I'll have to start a new index from scratch with a field type change like that so I'll have to delete the old one first on our master... It takes us a couple days to index 15 million products (some are sets so the final index size is only 8 million) so I don't want to do *that* too often as the slaves will be quite stale by the time it's done! :) Thanks for the help! -Original Message- From: Robert Petersen [mailto:rober...@buy.com] Sent: Wednesday, June 30, 2010 9:49 AM To: solr-user@lucene.apache.org Subject: RE: OOM on uninvert field request At and above 4GB we get those GC errors though! Should I switch to something like this? Recommended Options To use i-cms in Java SE 6, use the following command line options: -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \ -XX:+PrintGCDetails -XX:+PrintGCTimeStamps Caused by: java.lang.RuntimeException: java.lang.OutOfMemoryError: GC overhead limit exceeded at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:1068) at org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:418) at org.apache.solr.handler.SnapPuller.doCommit(SnapPuller.java:467) at org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:319) ... 11 more Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: Tuesday, June 29, 2010 8:42 PM To: solr-user@lucene.apache.org Subject: Re: OOM on uninvert field request Yes, it is better to use ints for ids than strings. Also, the Trie int fields have a compressed format that may cut the storage needs even more. 
8M docs * 4 bytes = 32 MB per field, times a few hundred, we'll say 300, is roughly 9.6 GB of IDs. I don't know how these fields are stored, but if they are separate objects we've blown up even further (per-object overheads are surprising). 4G is probably not enough for what you want. If you watch the total memory with 'top' and hit it with different queries, you will get a stronger sense of how much memory your use cases need. On Tue, Jun 29, 2010 at 4:32 PM, Robert Petersen rober...@buy.com wrote: Hello I am trying to find the right max and min settings for Java 1.6 on 20GB index with 8 million docs, running 1.6_018 JVM with solr 1.4, and currently have java set to an even 4GB (export JAVA_OPTS=-Xmx4096m -Xms4096m) for both min and max which is doing pretty well but occasionally still getting the below OOM errors. We're running on dual quad core xeons with 16GB memory installed. I've been getting the below OOM exceptions still though. Is the memsize mentioned in the INFO for the uninvert in bytes? I.e., does memSize=29604020 mean 29MB? We have a few hundred of these fields and they contain ints used as IDs, and so I guess could they eat all the memory to uninvert them all after we apply load and enough queries are performed. Does the field type matter, would int be better than string if these are lookup ids sparsely populated across the index? BTW these are used for faceting and filtering only.
<dynamicField name="*_contentAttributeToken" type="string" indexed="true" multiValued="true" stored="true" required="false"/> Jun 29, 2010 3:54:50 PM org.apache.solr.request.UnInvertedField uninvert INFO: UnInverted multi-valued field {field=768_contentAttributeToken,memSize=29604014,tindexSize=50,time=1841,phase1=1824,nTerms=1,bigTerms=0,termInstances=18,uses=0} Jun 29, 2010 3:54:52 PM org.apache.solr.request.UnInvertedField uninvert INFO: UnInverted multi-valued field {field=749_contentAttributeToken,memSize=29604020,tindexSize=56,time=1847,phase1=1829,nTerms=143,bigTerms=0,termInstances=951,uses=0} Jun 29, 2010 3:54:59 PM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: Java heap space at org.apache.solr.request.UnInvertedField.uninvert(UnInvertedField.java:191) at org.apache.solr.request.UnInvertedField.init(UnInvertedField.java:178) at org.apache.solr.request.UnInvertedField.getUnInvertedField(UnInvertedField.java:839) at org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:250) at org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:283) at org.apache.solr.request.SimpleFacets.getFacetCounts(SimpleFacets.java:166) -- Lance Norskog goks...@gmail.com
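For reference, the JVM settings discussed across this thread combine into something like the following (a sketch only; the 12 GB figure is what the poster eventually settled on, and all sizes must be tuned to your own machine):

```shell
# Hypothetical JAVA_OPTS combining the heap pinning and incremental CMS
# options mentioned in the thread (Java 6 HotSpot).
export JAVA_OPTS="-Xms12288m -Xmx12288m \
  -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode \
  -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
echo "$JAVA_OPTS"
```

Setting -Xms equal to -Xmx avoids heap resizing pauses, and the Print flags make the GC log visible in catalina.out so you can see whether the collector is keeping up.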
Multiple Solr servers and a shared index vs master+slaves
I'm a newbie looking at setting up an intranet search service using Solr, so I'm having a hard time understanding why I should forego the high availability and clustering mechanisms we already have available, and use Solr's implementations instead. I'm hoping some experienced Solr architects could take the time to comment. Our corporate standard is for any java web app to be deployed as an ear file targeted to a 4-server Weblogic 10.3 cluster on virtual Solaris boxes, operating behind a cluster of Apache web servers. All servers have NFS mounts to high availability SANs. So my Solr proof-of-concept tries to make use of those tools. I've deployed Solr to the cluster, and all of them use the same solr.home on the NFS mount. This seems to be just fine for searching, query requests are evenly distributed across the cluster, and search performance seems to be fine with the index living on the NFS mount. The problems, of course, start when add/update requests come in. This setup is the equivalent of having 4 standalone Solr servers using the same index. So if I use the simple lock file mechanism, in my testing so far it seems to keep them all separate just fine, except that when the first update comes in to serverA, it grabs the write lock; then if any other server receives an update near the same time, it must wait for the write lock to be removed by serverA after it commits. I think I can pretty well mitigate this by directing all updates through a single server (via virtual IP address), but then I need the other servers to realize the index has changed after each commit. It looks like I can make a call like http://serverB/solr/update/extract?commit=true and that's good enough to get it to open a new reader, but that seems a little clunky. I've read in another thread about the use of commit hooks that can trigger user-defined events, I think, so I'm looking into that now.
Now when I look at using Solr's master+slaves architecture, I feel like it's duplicating the trusted (and expensive) services we already have at our disposal. Weblogic+Apache clusters do a good job of distributing load, monitoring health, failing-over, restarting, etc. And if we used slaves that pulled index snapshots, they'd be using (by policy) the same NFS mount to store those snapshots, so we'd be pulling it over the wire only to write it right next to the original index. If we didn't have these HA clustering mechanisms available already, then I'm sure I'd be much more willing to look at a Solr master+slave architecture. But since we do, it seems like I'm a little bit hamstrung to use Solr's mechanisms anyway. So, that's my scenario, comments welcome. :) -dKt
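Regarding the commit-hook idea mentioned in the question: solrconfig.xml supports event listeners on the update handler, e.g. RunExecutableListener, which can run a script after each commit. A sketch, where notify-searchers.sh is a hypothetical script that would hit the commit URL on the other cluster members (the path is an assumption):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- fires on the node that performed the commit -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">/opt/solr/bin/notify-searchers.sh</str>
    <str name="dir">.</str>
    <!-- don't block the commit while the script runs -->
    <bool name="wait">false</bool>
  </listener>
</updateHandler>
```

This keeps the "commit=true ping" approach but automates it, so the writer node itself tells the readers to reopen their searchers.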
Re: OOM on uninvert field request
On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote: Hello I am trying to find the right max and min settings for Java 1.6 on 20GB index with 8 million docs, running 1.6_018 JVM with solr 1.4, and am currently have java set to an even 4GB (export JAVA_OPTS=-Xmx4096m -Xms4096m) for both min and max which is doing pretty well but occasionally still getting the below OOM errors. We're running on dual quad core xeons with 16GB memory installed. I've been getting the below OOM exceptions still though. Is the memsize mentioned in the INFO for the uninvert in bytes? is memSize=29604020 mean 29MB? Yes. We have a few hundred of these fields and they contain ints used as IDs, and so I guess could they eat all the memory to uninvert them all after we apply load and enough queries are performed. Does the field type matter, would int be better than string if these are lookup ids sparsely populated across the index? No, using UnInvertedField faceting, the fieldType won't matter much at all for the space it takes up. The key here is that it looks like the number of unique terms in these fields is low - you would probably do much better with facet.method=enum (which iterates over terms rather than documents). -Yonik http://www.lucidimagination.com
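To apply the facet.method=enum suggestion without touching application code, it can be set in the request handler defaults in solrconfig.xml, either globally or per field via the f.<field>.<param> override syntax (if your version supports per-field overrides). A sketch, using one field name from the thread as an example:

```xml
<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <!-- iterate over terms rather than uninverting documents
         for this low-cardinality field -->
    <str name="f.749_contentAttributeToken.facet.method">enum</str>
  </lst>
</requestHandler>
```

Request parameters of the same name still win over these defaults, so individual queries can override the choice for testing.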
Re: Unbuffered Exception while setting permissions
Other problems with this error have been solved by doing pre-emptive authentication. On Wed, Jun 30, 2010 at 4:26 AM, Rakhi Khatwani rkhatw...@gmail.com wrote: This error usually occurs when I do a server.add(inpDoc). Behind the logs: 192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/update?qt=%2Fupdate&optimize=true&wt=javabin&version=1 HTTP/1.1 200 41 192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1 401 1389 192.168.0.106 - admin [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/select?q=aid%3A30234&wt=javabin&version=1 HTTP/1.1 200 70 192.168.0.106 - - [30/Jun/2010:11:30:38 +] POST /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1 200 41 (works when I comment out the auth-constraint for RW) AND 192.168.0.106 - - [30/Jun/2010:11:29:09 +] POST /solr/GPTWPI/update?wt=javabin&version=1 HTTP/1.1 401 1389 (does not work when I add the auth-constraint for RW) 192.168.0.106 - - [30/Jun/2010:11:30:38 +] GET /solr/GPTWPI/update?qt=%2Fupdate&commit=true&wt=javabin&version=1 HTTP/1.1 200 41 So what I conclude is that authentication fails when we do a POST but works for GET methods. Correct me if I am wrong - and how do I get it working?
Regards, Raakhi On Wed, Jun 30, 2010 at 2:22 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: I was going through the logs. Every time I try doing an update (and of course ending up with the unbuffered exception) the log outputs the following line: [30/Jun/2010:09:02:52 +] POST /solr/core1/update?wt=javabin&version=1 HTTP/1.1 401 1389 Regards Raakhi On Wed, Jun 30, 2010 at 12:27 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: PS: I am using solr 1.4 Regards, Raakhi On Wed, Jun 30, 2010 at 12:05 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: [original message with the full configuration and exception snipped - see the first message in this thread]
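To illustrate the pre-emptive authentication suggestion: the failure happens because HttpClient first sends the POST without credentials, gets the 401 challenge, and then cannot replay the unbuffered request body. Sending the Basic credentials on the very first request avoids the retry entirely. A sketch using curl and the rakhi user from the thread's realms.properties (the commented-out curl line is illustrative, not a command from the thread):

```shell
# A Basic auth header is just base64("user:password"). curl's -u option
# sends Basic credentials pre-emptively, so the POST body is never replayed.
AUTH=$(printf 'rakhi:rakhi' | base64)
echo "Authorization: Basic $AUTH"   # prints: Authorization: Basic cmFraGk6cmFraGk=
# curl -u rakhi:rakhi -H 'Content-Type: text/xml' \
#   --data-binary '<add>...</add>' 'http://localhost:8983/solr/core1/update'
```

In SolrJ the equivalent is enabling pre-emptive auth on the underlying Commons HttpClient (setAuthenticationPreemptive(true) on its params, plus credentials in its state), so the first POST already carries the header.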
Re: REST calls
The stream.file/stream.url/stream.body parameters allow a GET to alter the index. The core management operations are also usable from GET. This allows one to bookmark and mail around a link that changes or blows up the index. Apparently this is not ReStFuL. It is IMVHO insane. On Wed, Jun 30, 2010 at 7:45 AM, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: On Wed, 2010-06-30 at 16:12 +0200, Yonik Seeley wrote: Solr's APIs are described as REST-like, and probably do qualify as restful the way the term is commonly used. I'm personally much more interested in making our APIs more powerful and easier to use, regardless of any REST purity tests. -Yonik http://www.lucidimagination.com Hi Yonik, yes, please - and thanks! I'm again and again positively surprised how efficient and yet simple SOLR's (and Lucene's) query and response language (incl. response formats) is. Some things seem complex/difficult at first (like dismax or function queries) but turn out to be simple/easy to use considering the complexity of the problems they solve. Chantal -- Lance Norskog goks...@gmail.com
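For anyone worried by the above: remote streaming can at least be switched off in solrconfig.xml. A sketch (note the caveat that in Solr of this era the flag governs stream.file and stream.url; stream.body is not covered by it, so URL-level access control is still needed for updates):

```xml
<requestDispatcher handleSelect="true">
  <!-- enableRemoteStreaming="false" rejects stream.file / stream.url,
       closing the "GET a local file or remote URL into the index" hole -->
  <requestParsers enableRemoteStreaming="false"
                  multipartUploadLimitInKB="2048"/>
</requestDispatcher>
```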
Re: REST calls
On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote: Apparently this is not ReStFuL It is IMVHO insane. Patches welcome... -Yonik http://www.lucidimagination.com
Re: REST calls
I've looked at the problem. It's fairly involved. It probably would take several iterations. (But not as many as field collapsing :) On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote: Apparently this is not ReStFuL It is IMVHO insane. Patches welcome... -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
tomcat solr logs
Sorry if this is at all off topic. Our Solr log files need grooming and we would also like to analyze them, perhaps pulling various data points into a DB table. Is there a preferred app for doing log file analysis, and/or an easy way to delete the old log files?
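Not Solr-specific, but for the deletion half of the question a standard logrotate stanza handles Tomcat-style logs; a sketch, where the path is an assumption about your Tomcat install:

```
/var/log/tomcat/catalina.out {
    daily
    rotate 14
    compress
    missingok
    # copytruncate lets Tomcat keep its open file handle
    # instead of requiring a restart after rotation
    copytruncate
}
```

Dropped into /etc/logrotate.d/, this keeps two weeks of compressed history and deletes anything older, which also caps disk usage for the analysis job.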
Re: REST calls
If there is a real desire/need to make things restful in the official sense, it is worth looking at using a REST framework as the controller rather than the current solution. Perhaps: http://www.restlet.org/ https://jersey.dev.java.net/ These would be cool since they encapsulate lots of the request plumbing work; it would be better if we could leverage more widely used approaches than supporting our own. That said, what we have is functional and powerful -- if you are concerned about people editing the index (with GET/POST or whatever) there are plenty of ways to solve this. ryan On Wed, Jun 30, 2010 at 5:31 PM, Lance Norskog goks...@gmail.com wrote: I've looked at the problem. It's fairly involved. It probably would take several iterations. (But not as many as field collapsing :) On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote: Apparently this is not ReStFuL It is IMVHO insane. Patches welcome... -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
RE: OOM on uninvert field request
Most of these hundreds of facet fields have tens of values, but a couple have thousands. Is thousands of different values too many to do enum, or is that still OK? If so, I could apply it carte blanche to the whole field... -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 30, 2010 1:38 PM To: solr-user@lucene.apache.org Subject: Re: OOM on uninvert field request On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote: Hello I am trying to find the right max and min settings for Java 1.6 on 20GB index with 8 million docs, running 1.6_018 JVM with solr 1.4, and am currently have java set to an even 4GB (export JAVA_OPTS=-Xmx4096m -Xms4096m) for both min and max which is doing pretty well but occasionally still getting the below OOM errors. We're running on dual quad core xeons with 16GB memory installed. I've been getting the below OOM exceptions still though. Is the memsize mentioned in the INFO for the uninvert in bytes? is memSize=29604020 mean 29MB? Yes. We have a few hundred of these fields and they contain ints used as IDs, and so I guess could they eat all the memory to uninvert them all after we apply load and enough queries are performed. Does the field type matter, would int be better than string if these are lookup ids sparsely populated across the index? No, using UnInvertedField faceting, the fieldType won't matter much at all for the space it takes up. The key here is that it looks like the number of unique terms in these fields is low - you would probably do much better with facet.method=enum (which iterates over terms rather than documents). -Yonik http://www.lucidimagination.com
Re: Very basic questions: Faceted front-end?
Wow, thanks Lance - it's really fast now! The last piece of the puzzle is setting up a nice front-end. Are there any pre-built front-ends available that mimic Google (for example), with facets? -Peter On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote: To highlight a field, Solr needs some extra Lucene values. If these are not configured for the field in the schema, Solr has to re-analyze the field to highlight it. If you want faster highlighting, you have to add term vectors to the schema. Here is the grand map of such things: http://wiki.apache.org/solr/FieldOptionsByUseCase On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: What are your actual highlighting requirements? You could try things like maxAnalyzedChars, requireFieldMatch, etc. http://wiki.apache.org/solr/HighlightingParameters has a good list, but you've probably already seen that page. Best Erick On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam ps...@mac.com wrote: To follow up, I've found that my queries are very fast (even with fq=), until I add hl=true. What can I do to speed up highlighting? Should I consider injecting a line at a time, rather than the entire file as a field? -Pete On Jun 29, 2010, at 11:07 AM, Peter Spam wrote: Thanks for everyone's help - I have this working now, but sometimes the queries are incredibly slow!! For example, <int name="QTime">461360</int>. Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to inject without throwing heap memory errors. However, my data set is very small! 36 text files, for a total of 113MB. (It will grow to many TB, but for now, this is a test). The largest file is 34MB. Therefore, I'm sure I'm doing something wrong :-) Here's my config: --- For the schema.xml, types is all default.
For fields, here are the only lines that aren't commented out:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="*" type="ignored" multiValued="true"/>

... then, for the rest:

<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>body</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>

--- Invoking: java -Xmx3584M -Xms1024M -jar start.jar

--- Injecting:

#!/bin/sh
J=0
for i in `find . -name \*.txt`; do
  (( J++ ))
  curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F myfi...@$i
done
echo - Committing
curl "http://localhost:8983/solr/update/extract?commit=true"

--- Searching: http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true

-Pete On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote: try adding hl.fl=text to specify your highlight field. I don't understand why you're only getting the ID field back though. Do note that the highlighting is after the docs, related by the ID. Try a (non highlighting) query of just * to verify that you're pointing at the index you think you are. It's possible that you've modified a different index with SolrJ than your web server is pointing at. Also, SOLR has no way of knowing you've modified your index with SolrJ, so it may not be automatically reopening an IndexReader, so your recent changes may not be visible until you force the SOLR reader to reopen.
HTH Erick On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam ps...@mac.com wrote: On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe. http://wiki.apache.org/solr/HighlightingParameters Here's how I commit my documents: J=0; for i in `find . -name \*.txt`; do (( J++ )) curl http://localhost:8983/solr/update/extract?literal.id=doc$J; -F myfi...@$i; done; echo - Committing curl http://localhost:8983/solr/update/extract?commit=true; Then, I try to query using
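Lance's term-vector advice above translates to a schema change along these lines -- a sketch against the "body" field quoted earlier, using the standard Solr 1.4 term-vector attributes:

```xml
<!-- Storing term vectors with positions and offsets lets the highlighter
     skip re-analyzing the stored text. A reindex is required afterwards. -->
<field name="body" type="text" indexed="true" stored="true" multiValued="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```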
Disk usage per-field
Is it possible for Solr (or Luke/Lucene) to tell me exactly how much of the total index disk space is used by each field? It would also be very nice to know, for each field, how much is used by the index and how much is used for stored data.
RE: REST calls
Using Accept headers is a pretty standard practice and so are conditional GETs. Quite easy to test with curl: curl -X GET -H Accept:application/xml http://solr.com/search curl -X GET -H Accept:application/json http://solr.com/search Jason -Original Message- From: Don Werve [mailto:d...@madwombat.com] Sent: Tuesday, June 29, 2010 9:40 PM To: solr-user@lucene.apache.org Subject: Re: REST calls 2010/6/27 Jason Chaffee jchaf...@ebates.com The solr docs say it is RESTful, yet it seems that it doesn't use http headers in a RESTful way. For example, it doesn't seem to use the Accept: request header to determine the media-type to be returned. Instead, it requires a query parameter to be used in the URL. Also, it doesn't seem to use return 304 Not Modified if the request header if-modified-since is used. The summary: Solr is restful, and does a very good job of it. The long version: There is no official 'REST' standard that dictates the behavior of the implementation; rather, REST is a set of guidelines on building APIs that are both discoverable and easily usable without having to resort to third-party libraries. Generally speaking, an application is RESTful if it provides an API that accepts arguments passed as HTTP form variables, returns results in an open format (XML, JSON, YAML, etc.), and respects certain semantics relating to HTTP verbs; e.g., GET/HEAD return the resource without modification, DELETEs are destructive, POST creates a resource, PUT alters it. Solr meets all of these requirements. The nature of the result format, and how to change it, is entirely up to the implementer. A common convention is to use a filename extension (.json, .xml) appended to the URL. It's less common to specify the request format as part of the query parameters (like Solr does), but not unheard of. 
And, to be honest, this is actually the first time I've heard of using the 'Accept' header to change the result format, as it makes it a lot harder to use a web browser, or command-line tools like curl or wget, to debug your API.
RE: REST calls
In that case, being able to use Accept headers and conditional GETs would make them more powerful and easier to use. The Accept header could be used, if present; otherwise use the query parameter. Or vice versa. Also, conditional GETs are a big win when you know the data and results are not changing often. Jason -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 30, 2010 7:12 AM To: solr-user@lucene.apache.org Subject: Re: REST calls Solr's APIs are described as REST-like, and probably do qualify as restful the way the term is commonly used. I'm personally much more interested in making our APIs more powerful and easier to use, regardless of any REST purity tests. -Yonik http://www.lucidimagination.com
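As Erik notes elsewhere in this thread, Solr already emits Last-Modified/ETag headers, so a conditional GET can be tried directly from the command line. A sketch, assuming a Solr instance on the default port (not runnable without a live server):

```shell
# First request: capture the Last-Modified header Solr sends back.
curl -sI 'http://localhost:8983/solr/select?q=*:*' | grep -i last-modified
# Replay with If-Modified-Since; an unchanged index should yield a 304.
curl -s -o /dev/null -w '%{http_code}\n' \
  -H 'If-Modified-Since: Wed, 30 Jun 2010 20:00:00 GMT' \
  'http://localhost:8983/solr/select?q=*:*'
```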
RE: REST calls
Two more jaxrs solutions: http://www.jboss.org/resteasy http://cxf.apache.org/docs/jax-rs.html However, I am not suggesting changing the core implementation. Just want to make it more powerful by utilizing headers. I can accept the other issues that have been mentioned as not RESTful. Also, I do plan to make patches for the issues I mentioned. I just wanted to know if I was missing anything or someone else already had contributed an extension. Jason -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: Wednesday, June 30, 2010 3:07 PM To: solr-user@lucene.apache.org Subject: Re: REST calls If there is a real desire/need to make things restful in the official sense, it is worth looking at using a REST framework as the controller rather then the current solution. perhaps: http://www.restlet.org/ https://jersey.dev.java.net/ These would be cool since they encapsulate lots of the request plumbing work that it would be better if we could leverage more widely used approaches then support our own. That said, what we have is functional and powerful -- if you are concerned about people editing the index (with GET/POST or whatever) there are plenty of ways to solve this. ryan On Wed, Jun 30, 2010 at 5:31 PM, Lance Norskog goks...@gmail.com wrote: I've looked at the problem. It's fairly involved. It probably would take several iterations. (But not as many as field collapsing :) On Wed, Jun 30, 2010 at 2:11 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Jun 30, 2010 at 4:55 PM, Lance Norskog goks...@gmail.com wrote: Apparently this is not ReStFuL It is IMVHO insane. Patches welcome... -Yonik http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
Re: Very basic questions: Faceted front-end?
Ah, I found this: https://issues.apache.org/jira/browse/SOLR-634 ... aka solr-ui. Is there anything else along these lines? Thanks! -Peter On Jun 30, 2010, at 3:59 PM, Peter Spam wrote: Wow, thanks Lance - it's really fast now! The last piece of the puzzle is setting up a nice front-end. Are there any pre-built front-ends available that mimic Google (for example), with facets? -Peter On Jun 29, 2010, at 9:04 PM, Lance Norskog wrote: To highlight a field, Solr needs some extra Lucene values. If these are not configured for the field in the schema, Solr has to re-analyze the field to highlight it. If you want faster highlighting, you have to add term vectors to the schema. Here is the grand map of such things: http://wiki.apache.org/solr/FieldOptionsByUseCase On Tue, Jun 29, 2010 at 6:29 PM, Erick Erickson erickerick...@gmail.com wrote: What are your actual highlighting requirements? You could try things like maxAnalyzedChars, requireFieldMatch, etc. http://wiki.apache.org/solr/HighlightingParameters has a good list, but you've probably already seen that page. Best Erick On Tue, Jun 29, 2010 at 9:11 PM, Peter Spam ps...@mac.com wrote: To follow up, I've found that my queries are very fast (even with fq=), until I add hl=true. What can I do to speed up highlighting? Should I consider injecting a line at a time, rather than the entire file as a field? -Pete On Jun 29, 2010, at 11:07 AM, Peter Spam wrote: Thanks for everyone's help - I have this working now, but sometimes the queries are incredibly slow!! For example, <int name="QTime">461360</int>. Also, I had to bump up the min/max RAM size to 1GB/3.5GB for things to inject without throwing heap memory errors. However, my data set is very small! 36 text files, for a total of 113MB. (It will grow to many TB, but for now, this is a test). The largest file is 34MB. Therefore, I'm sure I'm doing something wrong :-) Here's my config: --- For the schema.xml, types is all default.
For fields, here are the only lines that aren't commented out:

<field name="id" type="string" indexed="true" stored="true" required="true"/>
<field name="body" type="text" indexed="true" stored="true" multiValued="true"/>
<field name="timestamp" type="date" indexed="true" stored="true" default="NOW" multiValued="false"/>
<field name="build" type="string" indexed="true" stored="true" multiValued="false"/>
<field name="device" type="string" indexed="true" stored="true" multiValued="false"/>
<dynamicField name="*" type="ignored" multiValued="true"/>

... then, for the rest:

<uniqueKey>id</uniqueKey>
<!-- field for the QueryParser to use when an explicit fieldname is absent -->
<defaultSearchField>body</defaultSearchField>
<!-- SolrQueryParser configuration: defaultOperator="AND|OR" -->
<solrQueryParser defaultOperator="AND"/>

--- Invoking: java -Xmx3584M -Xms1024M -jar start.jar

--- Injecting:

#!/bin/sh
J=0
for i in `find . -name \*.txt`; do
  (( J++ ))
  curl "http://localhost:8983/solr/update/extract?literal.id=doc$J&fmap.content=body" -F myfi...@$i
done
echo - Committing
curl "http://localhost:8983/solr/update/extract?commit=true"

--- Searching: http://localhost:8983/solr/select?q=testing&hl=true&fl=id,score&hl.snippets=5&hl.mergeContiguous=true

-Pete On Jun 28, 2010, at 5:22 PM, Erick Erickson wrote: try adding hl.fl=text to specify your highlight field. I don't understand why you're only getting the ID field back though. Do note that the highlighting is after the docs, related by the ID. Try a (non highlighting) query of just * to verify that you're pointing at the index you think you are. It's possible that you've modified a different index with SolrJ than your web server is pointing at. Also, SOLR has no way of knowing you've modified your index with SolrJ, so it may not be automatically reopening an IndexReader, so your recent changes may not be visible until you force the SOLR reader to reopen.
HTH Erick On Mon, Jun 28, 2010 at 6:49 PM, Peter Spam ps...@mac.com wrote: On Jun 28, 2010, at 2:00 PM, Ahmet Arslan wrote: 1) I can get my docs in the index, but when I search, it returns the entire document. I'd love to have it only return the line (or two) around the search term. Solr can generate Google-like snippets as you describe. http://wiki.apache.org/solr/HighlightingParameters Here's how I commit my documents: J=0; for i in `find . -name \*.txt`; do (( J++ )) curl http://localhost:8983/solr/update/extract?literal.id=doc$J; -F
Re: OOM on uninvert field request
On Wed, Jun 30, 2010 at 6:19 PM, Robert Petersen rober...@buy.com wrote: Most of these hundreds of facet fields have tens of values but a couple have thousands, is thousands of different values too many to do enum or is that still ok? If so I could apply it carte blanche to the whole field... enum can still handle thousands, but often slower (and remember to increase the size of your filterCache which will now see greater usage). I would do facet.method=enum for the default and then override that for those few fields with thousands of unique terms via f.123_contentAttributeToken.facet.method=fc -Yonik http://www.lucidimagination.com -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 30, 2010 1:38 PM To: solr-user@lucene.apache.org Subject: Re: OOM on uninvert field request On Tue, Jun 29, 2010 at 7:32 PM, Robert Petersen rober...@buy.com wrote: Hello I am trying to find the right max and min settings for Java 1.6 on 20GB index with 8 million docs, running 1.6_018 JVM with solr 1.4, and am currently have java set to an even 4GB (export JAVA_OPTS=-Xmx4096m -Xms4096m) for both min and max which is doing pretty well but occasionally still getting the below OOM errors. We're running on dual quad core xeons with 16GB memory installed. I've been getting the below OOM exceptions still though. Is the memsize mentioned in the INFO for the uninvert in bytes? is memSize=29604020 mean 29MB? Yes. We have a few hundred of these fields and they contain ints used as IDs, and so I guess could they eat all the memory to uninvert them all after we apply load and enough queries are performed. Does the field type matter, would int be better than string if these are lookup ids sparsely populated across the index? No, using UnInvertedField faceting, the fieldType won't matter much at all for the space it takes up. 
The key here is that it looks like the number of unique terms in these fields is low - you would probably do much better with facet.method=enum (which iterates over terms rather than documents). -Yonik http://www.lucidimagination.com
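Yonik's suggestion above, expressed as request parameters -- a sketch against a local Solr, where "color" and "123_contentAttributeToken" are illustrative field names (the latter taken from his example override):

```shell
# Default all faceting to the term-iterating enum method, then override
# back to fc for the one field with thousands of unique terms.
curl 'http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=color&facet.field=123_contentAttributeToken&facet.method=enum&f.123_contentAttributeToken.facet.method=fc'
```

The same parameters can also be set as defaults on the request handler in solrconfig.xml so clients don't have to pass them on every query.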
Re: Wiki Documentation of facet.sort
(10/07/01 1:12), Chantal Ackermann wrote: Hi there, in the wiki, on http://wiki.apache.org/solr/SimpleFacetParameters it says: The default is true/count if facet.limit is greater than 0, false/index otherwise. I've just migrated to 1.4.1 (reindexed). I can't remember how it was with 1.4.0. When I specify my facet query with facet.mincount=0 (explicitly) or without mincount (default is 0), the resulting facets are sorted by count nevertheless. Changing mincount from 0 to 1 and back actually makes no difference in the sorting. I'm fine with a constant default behaviour (always sorting by count, no matter what parameters are given). If this is intended - shall I change the wiki accordingly? Cheers, Chantal Chantal, the wiki says facet.limit, but you are changing facet.mincount? :) Koji -- http://www.rondhuit.com/en/
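The interaction Koji points at can be checked directly: the documented default for facet.sort keys off facet.limit, not facet.mincount. A sketch against a local Solr, assuming an illustrative facet field named "cat":

```shell
# facet.limit defaults to 100 (greater than 0), so facets come back
# count-sorted regardless of facet.mincount:
curl 'http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat'
# Requesting index (lexicographic) order explicitly, independent of
# limit or mincount:
curl 'http://localhost:8983/solr/select?q=*:*&facet=true&facet.field=cat&facet.sort=index'
```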
Re: REST calls
Solr has 304 support with the last-modified and etag headers. Erik On Jun 30, 2010, at 7:52 PM, Jason Chaffee wrote: In that case, being able to use Accept headers and conditional GET's would make them more powerful and easier to use. The Accept header could be used, if present, otherwise use the query parameter. Or, vice versa. Also, conditional GET's are a big win when you know the data and results are not changing often. Jason -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Wednesday, June 30, 2010 7:12 AM To: solr-user@lucene.apache.org Subject: Re: REST calls Solr's APIs are described as REST-like, and probably do qualify as restful the way the term is commonly used. I'm personally much more interested in making our APIs more powerful and easier to use, regardless of any REST purity tests. -Yonik http://www.lucidimagination.com
Dilemma - Very Frequent Synonym updates for Huge Index
Hello, hoping some Solr guru can help me out here. We are a news organization trying to migrate 10 million documents from FAST to Solr. The plan is to have our Editorial team add/modify synonyms multiple times during a day as they deem appropriate. Hence we plan on using query-time synonyms, as we cannot reindex every time they modify the synonyms file (for the entities extracted by OpenNLP, like locations/organizations/person names from the article body). Since the synonyms are for names, I am concerned that the multi-word synonym issue crops up with the query-time synonyms. For example, synonyms could be as follows:

The Washington Post Co., The Washington Post, Washington Post, The Post, TWP, WAPO
DHS,D.H.S,D.H.S.,Department of Homeland Security,Homeland Security
USCIS, United States Citizenship and Immigration Services, U.S.C.I.S.
Barack Obama,Barack H. Obama,Barack Hussein Obama,President Obama
Hillary Clinton,Hillary R. Clinton,Hillary Rodham Clinton,Secretary Clinton,Sen. Clinton
William J. Clinton,William Jefferson Clinton,President Clinton,President Bill Clinton
Virginia, Va., VA
D.C,Washington D.C, District of Columbia

I have the following fieldType in schema.xml for the keywords/entities... What issues should I be aware of? And is there a better way to achieve it without having to reindex a million docs on each synonym change.
NOTE that I use tokenizerFactory="solr.KeywordTokenizerFactory" for the SynonymFilterFactory to keep the words intact without splitting!

<!-- Field Type Keywords/Entities Extracted from OpenNLP -->
<fieldType name="keywordText" class="solr.TextField" sortMissingLast="true" omitNorms="true" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.TrimFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt,entity-stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.SynonymFilterFactory" tokenizerFactory="solr.KeywordTokenizerFactory" synonyms="person-synonyms.txt,organization-synonyms.txt,location-synonyms.txt,subject-synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>
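Since the SynonymFilterFactory here runs only in the query-time analyzer, an edited synonyms file should take effect without any reindexing; the core just has to be reloaded so the analyzer re-reads the files. A sketch using the CoreAdmin RELOAD command (the core name "core1" is illustrative):

```shell
# Reload the core so query-time analyzers pick up the edited synonym
# files; no reindex of the documents is needed for query-time synonyms.
curl 'http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1'
```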