Re: Getting 500s on distributed queries with SolrCloud
The Grouping feature only works if groups are in the same shard. Perhaps that is the problem here? I could find https://issues.apache.org/jira/browse/SOLR-4164 which says that once the sharding was fixed, the problem went away. We should come up with a better exception message, though.

On Fri, Mar 21, 2014 at 10:49 PM, Ugo Matrangolo ugo.matrang...@gmail.com wrote:

Hi, I have a two-shard collection running and I'm getting this error on each query:

2014-03-21 17:08:42,018 [qtp-75] ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.IllegalArgumentException: numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count
at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1130)
at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
at org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector.<init>(AbstractSecondPassGroupingCollector.java:75)
at org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector.<init>(TermSecondPassGroupingCollector.java:49)
at org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.create(TopGroupsFieldCommand.java:129)
at org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:142)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)

Note that I'm using grouping, and disabling it fixed the problem. I was aware that SolrCloud does not fully support grouping in a distributed setup, but I was expecting incorrect results (which have to be addressed with custom hashing, AFAIK) rather than an error. Has anyone seen this error before? Ugo

-- Regards, Shalin Shekhar Mangar.
how to generate json response from the php solarium ?
How can I get the JSON response from Solr? That is, how can I get the search results in JSON format and print them from Solarium PHP code? -- Regards, Sohan Kalsariya
Re: how to generate json response from the php solarium ?
On 24 March 2014 12:35, Sohan Kalsariya sohankalsar...@gmail.com wrote:

How can I get the JSON response from Solr? That is, how can I get the search results in JSON format and print them from Solarium PHP code?

Adding wt=json to the query will get you Solr results in JSON format. Please refer to the Solarium documentation for how to print the results. Regards, Gora
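[For reference, a minimal sketch of such a request; the host, port, and core name below are placeholders, not from the thread:

http://localhost:8983/solr/collection1/select?q=*:*&wt=json&indent=true

The indent=true parameter is optional and just makes the JSON easier to read while testing.]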
Re: Solr dih to read Clob contents
My database configuration is as below:

<entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only" transformer="ClobTransformer">
  <field column="MBR" name="mbr" />
  <entity name="y" dataSource="xmldata" dataField="x.SMRY" processor="XPathEntityProcessor" forEach="/*:summary" rootEntity="true">
    <field column="card_no" xpath="/cardNo" />
  </entity>
</entity>

and I get my response from Solr as below:

<doc><str name="card_no">org...@1c8e807</str>

Am I missing anything? Thanks, Prasi

On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty g...@mimirtech.com wrote:

On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:

Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML stored as a CLOB. I want to read the contents of the XML through DIH and map each of the XML tags to a separate Solr field. Below is my CLOB content:

<root> <author>A</author> <date>02-Dec-2013</date> . . . </root>

I want to read the contents of the CLOB and map author to author_solr and date to date_solr. Is this possible with a ClobTransformer or a ScriptTransformer?

You will need to use a FieldReaderDataSource, and an XPathEntityProcessor along with the ClobTransformer. You do not provide details of your DIH data configuration file, but this should look something like:

<dataSource name="xmldata" type="FieldReaderDataSource"/>
...
<document>
  <entity name="x" query="..." transformer="ClobTransformer">
    <entity name="y" dataSource="xmldata" dataField="x.clob_column" processor="XPathEntityProcessor" forEach="/root">
      <field column="author_solr" xpath="/author" />
      <field column="date_solr" xpath="/date" />
    </entity>
  </entity>
</document>

Regards, Gora
Re: join and filter query with AND
Hi, Yonik, thank you for explaining the reason for the issue. The workarounds you suggested are working fine. Kranti, your suggestion was also good :-) Thanks a lot!

On 21 March 2014 20:00, Kranti Parisa kranti.par...@gmail.com wrote:

My example should also work, am I missing something?

q=({!join from=inner_id to=outer_id fromIndex=othercore v=$joinQuery})&joinQuery=(city:"Stara Zagora" AND prod:214)

Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa

On Fri, Mar 21, 2014 at 2:11 PM, Yonik Seeley yo...@heliosearch.com wrote:

Correct. This is only a limitation of embedding a local-params style subquery within lucene syntax. The parser, not knowing the syntax of the embedded query, currently assumes the query text ends at whitespace or other special punctuation such as ")".

Original:
(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)

Some possible workarounds that should work:

q={!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora"&fq=prod:214

q=({!join from=inner_id to=outer_id fromIndex=othercore v='city:"Stara Zagora"'} AND prod:214)

q=({!join from=inner_id to=outer_id fromIndex=othercore v=$jq} AND prod:214)&jq=city:"Stara Zagora"

-Yonik http://heliosearch.org - solve Solr GC pauses with off-heap filters and fieldcache

On Fri, Mar 21, 2014 at 1:54 PM, Jack Krupansky j...@basetechnology.com wrote:

I suspect that this is a bug in the implementation of the parsing of embedded nested query parsers. That's a fairly new feature compared to non-embedded nested query parsers - maybe Yonik could shed some light. This may date from when he made a copy of the Lucene query parser for Solr and added the parsing of embedded nested query parsers to the grammar. It seems like the embedded nested query parser is only being applied to a single, whitespace-delimited term, and not respecting the fact that the term is a quoted phrase. -- Jack Krupansky

-----Original Message----- From: Marcin Rzewucki Sent: Thursday, March 20, 2014 5:19 AM To: solr-user@lucene.apache.org Subject: Re: join and filter query with AND

Nope. There is no line break in the string and it is not fed from a file. What else could be the reason?

On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote:

It looks to me like you're feeding this from some kind of text file and you really _do_ have a line break after "Stara". Or you have a line break in the string you paste into the URL, or something similar. Kind of shooting in the dark though. Erick

On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, I have the following issue with the join query parser and a filter query. For such a query:

<str name="q">*:*</str>
<str name="fq">(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)</str>

I got an error:

<lst name="error"><str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara "</str><int name="code">400</int></lst>

Stack:

DEBUG - 2014-03-19 13:35:20.825; org.eclipse.jetty.servlet.ServletHandler; chain=SolrRequestFilter-default
DEBUG - 2014-03-19 13:35:20.826; org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter SolrRequestFilter
ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara "
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557) at
Re: Solr4.7 No live SolrServers available to handle this request
Hi Greg, This is my clusterstate.json:

WatchedEvent state:SyncConnected type:None path:null
[zk: 10.10.1.72:2185(CONNECTED) 0] get /clusterstate.json
{"set_recent":{
  "shards":{
    "shard1":{
      "range":"80000000-d554ffff",
      "state":"active",
      "replicas":{
        "10.10.1.16:4040_solr_set_recent_shard1_replica1":{"state":"active","base_url":"http://10.10.1.16:4040/solr","core":"set_recent_shard1_replica1","node_name":"10.10.1.16:4040_solr"},
        "10.10.1.72:2020_solr_set_recent_shard1_replica2":{"state":"active","base_url":"http://10.10.1.72:2020/solr","core":"set_recent_shard1_replica2","node_name":"10.10.1.72:2020_solr"},
        "10.10.1.19:3030_solr_set_recent_shard1_replica3":{"state":"active","base_url":"http://10.10.1.19:3030/solr","core":"set_recent_shard1_replica3","node_name":"10.10.1.19:3030_solr","leader":"true"},
        "10.10.1.21:1010_solr_set_recent_shard1_replica4":{"state":"active","base_url":"http://10.10.1.21:1010/solr","core":"set_recent_shard1_replica4","node_name":"10.10.1.21:1010_solr"},
        "10.10.1.14:5050_solr_set_recent_shard1_replica5":{"state":"active","base_url":"http://10.10.1.14:5050/solr","core":"set_recent_shard1_replica5","node_name":"10.10.1.14:5050_solr"}}},
    "shard2":{
      "range":"d5550000-2aa9ffff",
      "state":"active",
      "replicas":{
        "10.10.1.16:4040_solr_set_recent_shard2_replica1":{"state":"active","base_url":"http://10.10.1.16:4040/solr","core":"set_recent_shard2_replica1","node_name":"10.10.1.16:4040_solr"},
        "10.10.1.72:2020_solr_set_recent_shard2_replica2":{"state":"active","base_url":"http://10.10.1.72:2020/solr","core":"set_recent_shard2_replica2","node_name":"10.10.1.72:2020_solr"},
        "10.10.1.19:3030_solr_set_recent_shard2_replica3":{"state":"active","base_url":"http://10.10.1.19:3030/solr","core":"set_recent_shard2_replica3","node_name":"10.10.1.19:3030_solr","leader":"true"},
        "10.10.1.21:1010_solr_set_recent_shard2_replica4":{"state":"active","base_url":"http://10.10.1.21:1010/solr","core":"set_recent_shard2_replica4","node_name":"10.10.1.21:1010_solr"},
        "10.10.1.14:5050_solr_set_recent_shard2_replica5":{"state":"active","base_url":"http://10.10.1.14:5050/solr","core":"set_recent_shard2_replica5","node_name":"10.10.1.14:5050_solr"}}},
    "shard3":{
      "range":"2aaa0000-7fffffff",
      "state":"active",
      "replicas":{
        "10.10.1.16:4040_solr_set_recent_shard3_replica1":{"state":"active","base_url":"http://10.10.1.16:4040/solr","core":"set_recent_shard3_replica1","node_name":"10.10.1.16:4040_solr"},
        "10.10.1.72:2020_solr_set_recent_shard3_replica2":{"state":"active","base_url":"http://10.10.1.72:2020/solr","core":"set_recent_shard3_replica2","node_name":"10.10.1.72:2020_solr"},
        "10.10.1.19:3030_solr_set_recent_shard3_replica3":{"state":"active","base_url":"http://10.10.1.19:3030/solr","core":"set_recent_shard3_replica3","node_name":"10.10.1.19:3030_solr","leader":"true"},
        "10.10.1.21:1010_solr_set_recent_shard3_replica4":{"state":"active","base_url":"http://10.10.1.21:1010/solr","core":"set_recent_shard3_replica4","node_name":"10.10.1.21:1010_solr"},
        "10.10.1.14:5050_solr_set_recent_shard3_replica5":{"state":"active","base_url":"http://10.10.1.14:5050/solr","core":"set_recent_shard3_replica5","node_name":"10.10.1.14:5050_solr"}}}},
  "maxShardsPerNode":"3",
  "router":{"name":"compositeId"},
  "replicationFactor":"5"}}
cZxid = 0x10014
ctime = Tue Mar 18 13:05:38 IST 2014
mZxid = 0x5027c
mtime = Mon Mar 24 14:22:24 IST 2014
pZxid = 0x10014
cversion = 0
dataVersion = 387
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4182
numChildren = 0

Kindly let me know if you need further inputs.
Re: Solr dih to read Clob contents
1. I don't see the definition of a dataSource named 'xmldata' in your data-config.
2. You have forEach="/*:summary" but I don't think that is a syntax supported by XPathRecordReader.

If you can give a sample of the XML stored as a CLOB in your database, then we can help you write the right xpaths.

On Mon, Mar 24, 2014 at 12:55 PM, Prasi S prasi1...@gmail.com wrote:

My database configuration is as below:

<entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only" transformer="ClobTransformer">
  <field column="MBR" name="mbr" />
  <entity name="y" dataSource="xmldata" dataField="x.SMRY" processor="XPathEntityProcessor" forEach="/*:summary" rootEntity="true">
    <field column="card_no" xpath="/cardNo" />
  </entity>
</entity>

and I get my response from Solr as below:

<doc><str name="card_no">org...@1c8e807</str>

Am I missing anything? Thanks, Prasi

On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty g...@mimirtech.com wrote:

On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:

Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML stored as a CLOB. I want to read the contents of the XML through DIH and map each of the XML tags to a separate Solr field. Below is my CLOB content:

<root> <author>A</author> <date>02-Dec-2013</date> . . . </root>

I want to read the contents of the CLOB and map author to author_solr and date to date_solr. Is this possible with a ClobTransformer or a ScriptTransformer?

You will need to use a FieldReaderDataSource, and an XPathEntityProcessor along with the ClobTransformer. You do not provide details of your DIH data configuration file, but this should look something like:

<dataSource name="xmldata" type="FieldReaderDataSource"/>
...
<document>
  <entity name="x" query="..." transformer="ClobTransformer">
    <entity name="y" dataSource="xmldata" dataField="x.clob_column" processor="XPathEntityProcessor" forEach="/root">
      <field column="author_solr" xpath="/author" />
      <field column="date_solr" xpath="/date" />
    </entity>
  </entity>
</document>

Regards, Gora

-- Regards, Shalin Shekhar Mangar.
Re: Solr dih to read Clob contents
Below is my full configuration:

<dataConfig>
  <dataSource driver="com.ibm.db2.jcc.DB2Driver" url="jdbc:db2://IP:port/dbname" user="" password="" />
  <dataSource name="xmldata" type="FieldReaderDataSource"/>
  <document>
    <entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only" transformer="ClobTransformer">
      <field column="MBR" name="mbr" />
      <entity name="y" dataSource="xmldata" dataField="x.SMRY" processor="XPathEntityProcessor" forEach="/*:summary" rootEntity="true">
        <field column="card_no" xpath="/cardNo" />
      </entity>
    </entity>
  </document>
</dataConfig>

And this is my XML data:

<ns:summary xmlns:ns="***">
  <cardNo>ZAYQ5181</tripId>
  <firstName>Sam</firstName>
  <lastName>Mathews</lastName>
  <date>2013-01-18T23:29:04.492</date>
</ns:summary>

Thanks, Prasi

On Mon, Mar 24, 2014 at 3:23 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

1. I don't see the definition of a dataSource named 'xmldata' in your data-config.
2. You have forEach="/*:summary" but I don't think that is a syntax supported by XPathRecordReader.

If you can give a sample of the XML stored as a CLOB in your database, then we can help you write the right xpaths.

On Mon, Mar 24, 2014 at 12:55 PM, Prasi S prasi1...@gmail.com wrote:

My database configuration is as below:

<entity name="x" query="SELECT ID, XMLSERIALIZE(SMRY as CLOB(1M)) as SMRY FROM BOOK_REC fetch first 40 rows only" transformer="ClobTransformer">
  <field column="MBR" name="mbr" />
  <entity name="y" dataSource="xmldata" dataField="x.SMRY" processor="XPathEntityProcessor" forEach="/*:summary" rootEntity="true">
    <field column="card_no" xpath="/cardNo" />
  </entity>
</entity>

and I get my response from Solr as below:

<doc><str name="card_no">org...@1c8e807</str>

Am I missing anything? Thanks, Prasi

On Thu, Mar 20, 2014 at 4:25 PM, Gora Mohanty g...@mimirtech.com wrote:

On 20 March 2014 14:53, Prasi S prasi1...@gmail.com wrote:

Hi, I have a requirement to index a database table with CLOB content. Each row in my table has a column which is an XML stored as a CLOB. I want to read the contents of the XML through DIH and map each of the XML tags to a separate Solr field. Below is my CLOB content:

<root> <author>A</author> <date>02-Dec-2013</date> . . . </root>

I want to read the contents of the CLOB and map author to author_solr and date to date_solr. Is this possible with a ClobTransformer or a ScriptTransformer?

You will need to use a FieldReaderDataSource, and an XPathEntityProcessor along with the ClobTransformer. You do not provide details of your DIH data configuration file, but this should look something like:

<dataSource name="xmldata" type="FieldReaderDataSource"/>
...
<document>
  <entity name="x" query="..." transformer="ClobTransformer">
    <entity name="y" dataSource="xmldata" dataField="x.clob_column" processor="XPathEntityProcessor" forEach="/root">
      <field column="author_solr" xpath="/author" />
      <field column="date_solr" xpath="/date" />
    </entity>
  </entity>
</document>

Regards, Gora

-- Regards, Shalin Shekhar Mangar.
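[For reference, one way the inner entity might be rewritten, as a sketch only: it assumes that DIH's XPathRecordReader matches on element names without namespace support (so the ns: prefix is dropped from the xpaths), and that the stray closing </tripId> tag in the sample above is a paste typo for </cardNo>. Neither assumption has been tested here:

<entity name="y" dataSource="xmldata" dataField="x.SMRY"
        processor="XPathEntityProcessor" forEach="/summary" rootEntity="true">
  <field column="card_no" xpath="/summary/cardNo" />
</entity>

Note that XPathEntityProcessor xpaths are absolute from the document root, which is why the field xpath repeats the /summary prefix.]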
Re: Getting 500s on distributed queries with SolrCloud
Hi Shalin, Thank you for your answer. I'm already using custom hashing to make sure all the docs that are going to be grouped together are on the same shard. During indexing I make sure the uniqueKey is something like productId!skuId, so all the skus belonging to the same product end up on the same shard. At query time I then group on the product id (I want all the skus grouped by their owning product). While working correctly, the above did not fix the problem :/

What I have found by selectively switching off my grouping instructions to Solr is that the problem is in the group.limit=-1 that I append to each query. This query (with all the skus sharing the same product sharded correctly onto the same shard) does not work:

http://localhost:9766/skus/product_looks_for_sale?q=newsale&distrib=true&fl=doc_id,%20id&group=true&group.limit=-1

while this works fine:

http://localhost:9766/skus/product_looks_for_sale?q=new-sale&start=0&distrib=true&fl=doc_id,%20id&group=true&group.limit=100

AFAIK the -1 only tells Solr to give back all the matching docs in a group, so given that I do not think I will have more than 100 skus in a single product, I'm going to fix this issue by setting the limit to 100. It would still be nice to know why the -1 makes the query fail :) Any thoughts? Thank you, Ugo

On Mon, Mar 24, 2014 at 6:30 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

The Grouping feature only works if groups are in the same shard. Perhaps that is the problem here? I could find https://issues.apache.org/jira/browse/SOLR-4164 which says that once the sharding was fixed, the problem went away. We should come up with a better exception message, though.

On Fri, Mar 21, 2014 at 10:49 PM, Ugo Matrangolo ugo.matrang...@gmail.com wrote:

Hi, I have a two-shard collection running and I'm getting this error on each query:

2014-03-21 17:08:42,018 [qtp-75] ERROR org.apache.solr.servlet.SolrDispatchFilter - null:java.lang.IllegalArgumentException: numHits must be > 0; please use TotalHitCountCollector if you just need the total hit count
at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1130)
at org.apache.lucene.search.TopFieldCollector.create(TopFieldCollector.java:1079)
at org.apache.lucene.search.grouping.AbstractSecondPassGroupingCollector.<init>(AbstractSecondPassGroupingCollector.java:75)
at org.apache.lucene.search.grouping.term.TermSecondPassGroupingCollector.<init>(TermSecondPassGroupingCollector.java:49)
at org.apache.solr.search.grouping.distributed.command.TopGroupsFieldCommand.create(TopGroupsFieldCommand.java:129)
at org.apache.solr.search.grouping.CommandHandler.execute(CommandHandler.java:142)
at org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:387)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:214)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)

Note that I'm using grouping, and disabling it fixed the problem. I was aware that SolrCloud does not fully support grouping in a distributed setup, but I was expecting incorrect results (which have to be addressed with custom hashing, AFAIK) rather than an error. Has anyone seen this error before? Ugo

-- Regards, Shalin Shekhar Mangar.
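[For readers unfamiliar with the routing scheme Ugo describes, a minimal sketch of such documents; the id values and the product_id field name are placeholders, not from the thread. With the compositeId router, everything before the "!" is hashed, so all skus sharing the product prefix land on the same shard:

<add>
  <doc>
    <field name="id">product42!sku-001</field>
    <field name="product_id">product42</field>
  </doc>
  <doc>
    <field name="id">product42!sku-002</field>
    <field name="product_id">product42</field>
  </doc>
</add>

Grouping on product_id then sees each group in full on a single shard, which is the precondition Shalin mentions above.]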
highlight did not work correctly
Hi all, While using Solr 4.6 to highlight results, I ran into a strange situation. Most search results were correctly highlighted, but a few gave back the entire content of the indexed webpage without any highlighted keywords. Has anybody ever met this problem? Here is my solrconfig.xml:
--
<bool name="hl">true</bool>
<str name="hl.fl">content title</str>
<str name="hl.simple.pre">&lt;b&gt;&lt;em&gt;&lt;big&gt;</str>
<str name="hl.simple.post">&lt;/b&gt;&lt;/em&gt;&lt;/big&gt;</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.content.hl.snippets">1</str>
<str name="f.content.hl.fragsize">200</str>
<str name="f.content.hl.alternateField">content</str>
<str name="f.content.hl.maxAlternateFieldLength">200</str>
--
I would appreciate any reply. THX
Re: highlight did not work correctly
Hi, You may need to increase hl.maxAnalyzedChars, which has a default of 51200.

On Monday, March 24, 2014 2:33 PM, panzj.f...@cn.fujitsu.com panzj.f...@cn.fujitsu.com wrote:

Hi all, While using Solr 4.6 to highlight results, I ran into a strange situation. Most search results were correctly highlighted, but a few gave back the entire content of the indexed webpage without any highlighted keywords. Has anybody ever met this problem? Here is my solrconfig.xml:
--
<bool name="hl">true</bool>
<str name="hl.fl">content title</str>
<str name="hl.simple.pre">&lt;b&gt;&lt;em&gt;&lt;big&gt;</str>
<str name="hl.simple.post">&lt;/b&gt;&lt;/em&gt;&lt;/big&gt;</str>
<str name="f.title.hl.fragsize">0</str>
<str name="f.title.hl.alternateField">title</str>
<str name="f.content.hl.snippets">1</str>
<str name="f.content.hl.fragsize">200</str>
<str name="f.content.hl.alternateField">content</str>
<str name="f.content.hl.maxAlternateFieldLength">200</str>
--
I would appreciate any reply. THX
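[For reference, the parameter can be raised per request or in the handler defaults; the 1000000 below is an arbitrary illustration, sized to cover the longest pages you index:

&hl.maxAnalyzedChars=1000000

or, alongside the other highlighting defaults in solrconfig.xml:

<int name="hl.maxAnalyzedChars">1000000</int>

If a match occurs past the analyzed window, the highlighter finds no fragment and falls back to the alternate field, which is consistent with the whole page coming back unhighlighted.]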
Ram usage
Hi All, We have a 4-node cluster with a collection that's sharded into 2, each shard having a master and a slave for redundancy. However, 1 node has decided to use twice the RAM that the others in the cluster are using. The only difference we can spot between the nodes is that the one with the high RAM usage is saying it's a slave, while all the others are reporting that they are masters. Does anyone have any ideas why this has occurred? Cheers, David
Re: Ram usage
Hi David; Which version of Solr are you using? Thanks; Furkan KAMACI

2014-03-24 15:15 GMT+02:00 David Flower dflo...@amplience.com:

Hi All, We have a 4-node cluster with a collection that's sharded into 2, each shard having a master and a slave for redundancy. However, 1 node has decided to use twice the RAM that the others in the cluster are using. The only difference we can spot between the nodes is that the one with the high RAM usage is saying it's a slave, while all the others are reporting that they are masters. Does anyone have any ideas why this has occurred? Cheers, David
Re: Ram usage
We're still on 4.4.0. David

On 24/03/2014 13:19, Furkan KAMACI furkankam...@gmail.com wrote:

Hi David; Which version of Solr are you using? Thanks; Furkan KAMACI

2014-03-24 15:15 GMT+02:00 David Flower dflo...@amplience.com:

Hi All, We have a 4-node cluster with a collection that's sharded into 2, each shard having a master and a slave for redundancy. However, 1 node has decided to use twice the RAM that the others in the cluster are using. The only difference we can spot between the nodes is that the one with the high RAM usage is saying it's a slave, while all the others are reporting that they are masters. Does anyone have any ideas why this has occurred? Cheers, David
Re: SolrCloud from Stopping recovery for warnings to crash
Yes, we upgraded Solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before Solr started crashing). When we were upgrading, we just upgraded Solr and changed the versions in the collections' configs. When Solr crashes we do get OOM, but only 2h after the first "Stopping recovery" warnings. Do you have any idea when "Stopping recovery" warnings are thrown? Right now we have no idea what could cause this issue.

Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar shalinman...@gmail.com:

Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can cause out of memory issues. Can you check your logs for out of memory errors?

On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Solr version: 4.7
Architecture: 2 solrs (1 shard, leader + replica), 3 zookeepers
Servers:
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper
Solr data:
* 21 collections
* Many fields, small docs, docs count per collection from 1k to 500k

About a week ago Solr started crashing. It crashes every day, 3-4 times a day, usually at night. I can't tell what it could be related to, because at that time we hadn't made any configuration changes. The load hasn't changed either. Everything starts with "Stopping recovery for .." warnings (every warning is repeated several times):

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find election node to remove
WARN org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
WARN - 2014-03-23 04:00:54.126; org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272 refcount=2} active=true starting pos=356216606

Then again "Stopping recovery for .." warnings:

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms, collection: collection1 slice: shard1
ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state down for IP:PORT_solr but I still do not see the requested state. I see state: active live:false

After this the servers mostly didn't recover.

-- Regards, Shalin Shekhar Mangar.
Re: Solr4.7 No live SolrServers available to handle this request
Sathya, We're still missing a fair amount of information here, though it looks like your cluster is healthy. How are you indexing, and what's the request you're sending that results in the error you're seeing? Have you checked your nodes' logs for errors that correspond with the one you're seeing while indexing? Thanks, Greg

On Mar 22, 2014, at 2:32 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:

Thanks Michael! I just committed your fix. It will be released with 4.7.1.

On Fri, Mar 21, 2014 at 8:30 PM, Michael Sokolov msoko...@safaribooksonline.com wrote:

I just managed to track this down -- as you said, the disconnect was a red herring. Ultimately the problem was caused by a custom analysis component we wrote that was raising an IOException -- it was missing some configuration files it relies on. What might be interesting for Solr devs to have a look at is that the exception was completely swallowed by JavabinCodec, making it very difficult to track down the problem. Furthermore, if the /add request was routed directly to the shard where the document was destined to end up, then the IOException raised by the analysis component (a char filter) showed up in the Solr HTTP response (probably because my client used XML format in one test -- javabin is used internally in SolrCloud). But if the request was routed to a different shard, then the only exception that showed up anywhere (in the logs, in the HTTP response) was kind of irrelevant. I think this could be fixed pretty easily; see SOLR-5985 for my suggestion. -Mike

On 03/21/2014 10:20 AM, Greg Walters wrote:

Broken pipe errors are generally caused by unexpected disconnections and are sometimes hard to track down. Given the stack traces you've provided, it's hard to point to any one thing, and I suspect the relevant information was snipped out in the long dump of document fields. You might grab the entire error from the client you're uploading documents with, from the server you're connected to, and from any other nodes that have an error at the same time, and put it on pastebin or the like. Thanks, Greg

On Mar 20, 2014, at 3:36 PM, Michael Sokolov msoko...@safaribooksonline.com wrote:

I'm getting a similar exception when writing documents (on the client side). I can write one document fine, but the second (which is being routed to a different shard) generates the error. It happens every time - definitely not a resource issue or timing problem, since this database is completely empty -- I'm just getting started and running some tests, so there must be some kind of setup problem. But it's difficult to diagnose (for me, anyway)! I'd appreciate any insight, hints, guesses, etc. since I'm stuck. Thanks!

One node (the leader?) is reporting "Internal Server Error" in its log, and another node (presumably the shard where the document is being directed) bombs out like this:

ERROR - 2014-03-20 15:56:53.022; org.apache.solr.common.SolrException; null:org.apache.solr.common.SolrException: ERROR adding document SolrInputDocument( ... long dump of document fields ... )
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:99)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:166)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:136)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:225)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:190)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:173)
at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:106)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
...
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
at org.apache.coyote.http11.InternalOutputBuffer.realWriteBytes(InternalOutputBuffer.java:215)
at
Re: SolrCloud from Stopping recovery for warnings to crash
I am guessing that it is all related to memory issues. As the used heap grows, full GC cycles become more frequent, causing ZK timeouts, which in turn cause more recoveries to be initiated. In the end, everything blows up with the out-of-memory errors. Do you log GC activity on your servers?

I suggest that you roll back to 4.6.1 for now and upgrade to 4.7.1 when it is released next week.

On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Yes, we upgraded Solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before Solr started crashing). When we were upgrading, we just upgraded Solr and changed the versions in the collections' configs. When Solr crashes we do get OOM, but only 2h after the first "Stopping recovery" warnings. Do you have any idea when "Stopping recovery" warnings are thrown? Right now we have no idea what could cause this issue.

Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar shalinman...@gmail.com:

Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can cause out of memory issues. Can you check your logs for out of memory errors?

On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Solr version: 4.7
Architecture: 2 solrs (1 shard, leader + replica), 3 zookeepers
Servers:
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper
Solr data:
* 21 collections
* Many fields, small docs, docs count per collection from 1k to 500k

About a week ago Solr started crashing. It crashes every day, 3-4 times a day, usually at night. I can't tell what it could be related to, because at that time we hadn't made any configuration changes. The load hasn't changed either. Everything starts with "Stopping recovery for .." warnings (every warning is repeated several times):

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find election node to remove
WARN org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
WARN - 2014-03-23 04:00:54.126; org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272 refcount=2} active=true starting pos=356216606

Then again "Stopping recovery for .." warnings:

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms, collection: collection1 slice: shard1
ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state down for IP:PORT_solr but I still do not see the requested state. I see state: active live:false

After this the servers mostly didn't recover.

-- Regards, Shalin Shekhar Mangar.

-- Regards, Shalin Shekhar Mangar.
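[For anyone wanting to turn on the GC logging Shalin asks about, one common set of HotSpot flags for the Java 6/7 JVMs of that era (the log path is a placeholder):

-verbose:gc -Xloggc:/var/log/solr_gc.log -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime

Pauses reported there that exceed the ZooKeeper session timeout would line up with the recovery storms described above.]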
Re: Ram usage
On 3/24/2014 7:15 AM, David Flower wrote:

We have a 4-node cluster with a collection that's sharded into 2, each shard having a master and a slave for redundancy. However, 1 node has decided to use twice the RAM that the others in the cluster are using. The only difference we can spot between the nodes is that the one with the high RAM usage is saying it's a slave, while all the others are reporting that they are masters.

If you are using SolrCloud, then there are no masters and no slaves. Each shard has a leader, but that is not a permanent role. The master and slave designations that you see on the Replication tab have zero meaning in SolrCloud unless a replication happens to be in progress at that moment. In SolrCloud, replication is only used at node startup, and only if it's required. The master/slave roles are decided at the moment of replication and are not changed until another replication becomes necessary.

When you say it's using twice the RAM, what *precisely* are you looking at which tells you this? Due to Solr using MMap for file access, some of the numbers reported by the operating system will look bad but will not indicate a problem.

Thanks, Shawn
Re: SolrCloud from Stopping recovery for warnings to crash
Garbage Collectors Summary: https://apps.sematext.com/spm-reports/s/rgRnwuShgI
Pool Size: https://apps.sematext.com/spm-reports/s/H16ndqichM
First "Stopping recovery" warning: 4:00; OOM error: 6:30.

2014-03-24 16:35 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com:

I am guessing that it is all related to memory issues. As the used heap grows, full GC cycles become more frequent, causing ZK timeouts, which in turn cause more recoveries to be initiated. In the end, everything blows up with the out-of-memory errors. Do you log GC activity on your servers? I suggest that you roll back to 4.6.1 for now and upgrade to 4.7.1 when it is released next week.

On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Yes, we upgraded Solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before Solr started crashing). When we were upgrading, we just upgraded Solr and changed the versions in the collections' configs. When Solr crashes we do get OOM, but only 2h after the first "Stopping recovery" warnings. Do you have any idea when "Stopping recovery" warnings are thrown? Right now we have no idea what could cause this issue.

Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar shalinman...@gmail.com:

Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can cause out of memory issues. Can you check your logs for out of memory errors?

On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Solr version: 4.7
Architecture: 2 solrs (1 shard, leader + replica), 3 zookeepers
Servers:
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper
Solr data:
* 21 collections
* Many fields, small docs, docs count per collection from 1k to 500k

About a week ago Solr started crashing. It crashes every day, 3-4 times a day, usually at night. I can't tell what it could be related to, because at that time we hadn't made any configuration changes. The load hasn't changed either. Everything starts with "Stopping recovery for .." warnings (every warning is repeated several times):

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find election node to remove
WARN org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
WARN - 2014-03-23 04:00:54.126; org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272 refcount=2} active=true starting pos=356216606

Then again "Stopping recovery for .." warnings:

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms, collection: collection1 slice: shard1
ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state down for IP:PORT_solr but I still do not see the requested state. I see state: active live:false

After this the servers mostly didn't recover.

-- Regards, Shalin Shekhar Mangar.

-- Regards, Shalin Shekhar Mangar.
Re: Ram usage
I'm looking at the dashboard page on all 4 nodes and seeing Physical Memory at 92% compared with ~41-44%, and JVM-Memory at 52.9% compared to 23-28%. The reason I mentioned "slave" is that on the core overview page there is an entry for Slave (Searching) that doesn't appear on any of the other nodes. Cheers, David

On 24/03/2014 14:47, Shawn Heisey s...@elyograg.org wrote:

On 3/24/2014 7:15 AM, David Flower wrote:

We have a 4-node cluster with a collection that's sharded into 2, each shard having a master and a slave for redundancy. However, 1 node has decided to use twice the RAM that the others in the cluster are using. The only difference we can spot between the nodes is that the one with the high RAM usage is saying it's a slave, while all the others are reporting that they are masters.

If you are using SolrCloud, then there are no masters and no slaves. Each shard has a leader, but that is not a permanent role. The master and slave designations that you see on the Replication tab have zero meaning in SolrCloud unless a replication happens to be in progress at that moment. In SolrCloud, replication is only used at node startup, and only if it's required. The master/slave roles are decided at the moment of replication and are not changed until another replication becomes necessary.

When you say it's using twice the RAM, what *precisely* are you looking at which tells you this? Due to Solr using MMap for file access, some of the numbers reported by the operating system will look bad but will not indicate a problem.

Thanks, Shawn
Re: join and filter query with AND
Glad the suggestions are working for you! Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa

On Mon, Mar 24, 2014 at 4:10 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, Yonik, thank you for explaining the reason for the issue. The workarounds you suggested are working fine. Kranti, your suggestion was also good :-) Thanks a lot!

On 21 March 2014 20:00, Kranti Parisa kranti.par...@gmail.com wrote:

My example should also work, am I missing something?

q=({!join from=inner_id to=outer_id fromIndex=othercore v=$joinQuery})&joinQuery=(city:"Stara Zagora" AND prod:214)

Thanks, Kranti K. Parisa http://www.linkedin.com/in/krantiparisa

On Fri, Mar 21, 2014 at 2:11 PM, Yonik Seeley yo...@heliosearch.com wrote:

Correct. This is only a limitation of embedding a local-params style subquery within lucene syntax. The parser, not knowing the syntax of the embedded query, currently assumes the query text ends at whitespace or other special punctuation such as ")".

Original:
(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)

Some possible workarounds that should work:

q={!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora"&fq=prod:214

q=({!join from=inner_id to=outer_id fromIndex=othercore v='city:"Stara Zagora"'} AND prod:214)

q=({!join from=inner_id to=outer_id fromIndex=othercore v=$jq} AND prod:214)&jq=city:"Stara Zagora"

-Yonik http://heliosearch.org - solve Solr GC pauses with off-heap filters and fieldcache

On Fri, Mar 21, 2014 at 1:54 PM, Jack Krupansky j...@basetechnology.com wrote:

I suspect that this is a bug in the implementation of the parsing of embedded nested query parsers. That's a fairly new feature compared to non-embedded nested query parsers - maybe Yonik could shed some light. This may date from when he made a copy of the Lucene query parser for Solr and added the parsing of embedded nested query parsers to the grammar. It seems like the embedded nested query parser is only being applied to a single, whitespace-delimited term, and not respecting the fact that the term is a quoted phrase. -- Jack Krupansky

-----Original Message----- From: Marcin Rzewucki Sent: Thursday, March 20, 2014 5:19 AM To: solr-user@lucene.apache.org Subject: Re: join and filter query with AND

Nope. There is no line break in the string and it is not fed from a file. What else could be the reason?

On 19 March 2014 17:57, Erick Erickson erickerick...@gmail.com wrote:

It looks to me like you're feeding this from some kind of text file and you really _do_ have a line break after "Stara". Or you have a line break in the string you paste into the URL, or something similar. Kind of shooting in the dark though. Erick

On Wed, Mar 19, 2014 at 8:48 AM, Marcin Rzewucki mrzewu...@gmail.com wrote:

Hi, I have the following issue with the join query parser and a filter query. For such a query:

<str name="q">*:*</str>
<str name="fq">(({!join from=inner_id to=outer_id fromIndex=othercore}city:"Stara Zagora")) AND (prod:214)</str>

I got an error:

<lst name="error"><str name="msg">org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara "</str><int name="code">400</int></lst>

Stack:

DEBUG - 2014-03-19 13:35:20.825; org.eclipse.jetty.servlet.ServletHandler; chain=SolrRequestFilter-default
DEBUG - 2014-03-19 13:35:20.826; org.eclipse.jetty.servlet.ServletHandler$CachedChain; call filter SolrRequestFilter
ERROR - 2014-03-19 13:35:20.828; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Cannot parse 'city:"Stara': Lexical error at line 1, column 12. Encountered: <EOF> after : "\"Stara "
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:179)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:193)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1916)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:780)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:427)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:217) at
Re: SolrCloud from Stopping recovery for warnings to crash
We tried to set the ZK timeout to 1s and did load testing (both indexing and search), and this issue didn't happen.

2014-03-24 17:00 GMT+02:00 Lukas Mikuckis lukasmikuc...@gmail.com:

Garbage Collectors Summary: https://apps.sematext.com/spm-reports/s/rgRnwuShgI
Pool Size: https://apps.sematext.com/spm-reports/s/H16ndqichM
First "Stopping recovery" warning: 4:00; OOM error: 6:30.

2014-03-24 16:35 GMT+02:00 Shalin Shekhar Mangar shalinman...@gmail.com:

I am guessing that it is all related to memory issues. As the used heap grows, full GC cycles become more frequent, causing ZK timeouts, which in turn cause more recoveries to be initiated. In the end, everything blows up with the out-of-memory errors. Do you log GC activity on your servers? I suggest that you roll back to 4.6.1 for now and upgrade to 4.7.1 when it is released next week.

On Mon, Mar 24, 2014 at 7:51 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Yes, we upgraded Solr from 4.6.1 to 4.7 3 weeks ago (2 weeks before Solr started crashing). When we were upgrading, we just upgraded Solr and changed the versions in the collections' configs. When Solr crashes we do get OOM, but only 2h after the first "Stopping recovery" warnings. Do you have any idea when "Stopping recovery" warnings are thrown? Right now we have no idea what could cause this issue.

Mon, 24 Mar 2014 04:03:17 GMT Shalin Shekhar Mangar shalinman...@gmail.com:

Did you upgrade recently to Solr 4.7? 4.7 has a bad bug which can cause out of memory issues. Can you check your logs for out of memory errors?

On Sun, Mar 23, 2014 at 9:07 PM, Lukas Mikuckis lukasmikuc...@gmail.com wrote:

Solr version: 4.7
Architecture: 2 solrs (1 shard, leader + replica), 3 zookeepers
Servers:
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper + solr (heap 4gb) - RAM 8gb, 2 cpu cores
* zookeeper
Solr data:
* 21 collections
* Many fields, small docs, docs count per collection from 1k to 500k

About a week ago Solr started crashing. It crashes every day, 3-4 times a day, usually at night. I can't tell what it could be related to, because at that time we hadn't made any configuration changes. The load hasn't changed either. Everything starts with "Stopping recovery for .." warnings (every warning is repeated several times):

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
WARN org.apache.solr.cloud.ElectionContext; cancelElection did not find election node to remove
WARN org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:26.286; org.apache.solr.update.PeerSync; no frame of reference to tell if we've missed updates
WARN - 2014-03-23 04:00:30.728; org.apache.solr.handler.SnapPuller; File _f9m_Lucene41_0.doc expected to be 6218278 while it is 7759879
WARN - 2014-03-23 04:00:54.126; org.apache.solr.update.UpdateLog$LogReplayer; Starting log replay tlog{file=/path/solr/collection1_shard1_replica2/data/tlog/tlog.0003272 refcount=2} active=true starting pos=356216606

Then again "Stopping recovery for .." warnings:

WARN org.apache.solr.cloud.RecoveryStrategy; Stopping recovery for zkNodeName=core_node1core=**
ERROR - 2014-03-23 05:19:29.566; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: No registered leader was found after waiting for 4000ms, collection: collection1 slice: shard1
ERROR - 2014-03-23 05:20:03.961; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: I was asked to wait on state down for IP:PORT_solr but I still do not see the requested state. I see state: active live:false

After this the servers mostly didn't recover.

-- Regards, Shalin Shekhar Mangar.

-- Regards, Shalin Shekhar Mangar.
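[For reference, the ZooKeeper session timeout being discussed is typically configured in the solrcloud section of solr.xml (or via the matching system property); a sketch of the common form, with the stock 30-second default shown:

<int name="zkClientTimeout">${zkClientTimeout:30000}</int>

A timeout shorter than the worst-case GC pause means sessions expire during pauses, which is what triggers the recovery cycles discussed in this thread.]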
Using Sentence Information For Snippet Generation
Hi; When I generate a snippet via Solr, I do not want the beginning of any sentence to be cut off in the snippet, so I need to do sentence detection. I think that I can do it before I send documents into Solr: I can put some special character that marks the beginning or end of a sentence, and then use that information when generating the snippet. On the other hand, I should not show that special character to the user. What do you think: how can I do it, or do you have any other ideas for my purpose? PS: I am not doing this for English sentences. Thanks; Furkan KAMACI
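[A hedged sketch of the pre-indexing step described above, using the JDK's locale-aware java.text.BreakIterator, which handles many non-English languages; the marker character and the Turkish locale are arbitrary illustrative choices, not from the thread:

import java.text.BreakIterator;
import java.util.Locale;

public class SentenceMarker {
    // U+2029 (paragraph separator) rarely occurs in web text and can be
    // stripped from the highlighted snippet before showing it to the user.
    private static final char MARKER = '\u2029';

    // Insert MARKER before each sentence detected for the given locale.
    public static String markSentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        StringBuilder out = new StringBuilder(text.length() + 16);
        int start = it.first();
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            out.append(MARKER).append(text, start, end);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String marked = markSentences("Merhaba dunya. Bu ikinci cumle.", new Locale("tr"));
        // Print with a visible stand-in for the marker.
        System.out.println(marked.replace('\u2029', '|'));
    }
}

The snippet code can then prefer fragment boundaries that start at a marker, and strip the marker before display.]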
Re: Ram usage
I'm looking at the dashboard page on all 4 nodes and seeing Physical Memory at 92% compared with ~41-44%, and JVM-Memory at 52.9% compared to 23-28%. The reason I mentioned "slave" is that on the core overview page there is an entry for Slave (Searching) that doesn't appear on any of the other nodes. Cheers, David

It's completely normal for physical memory to be nearly 100 percent at all times, unless you have memory available that greatly exceeds the size of your index and any other data used by programs on the server. This is simply how operating systems work. http://en.wikipedia.org/wiki/Page_cache

JVM memory usage is highly variable and will fluctuate all over. Do a Google image search for 'JVM sawtooth' to see how memory usage graphs within the JVM.

Thanks, Shawn
Re: Ram usage
It's not sawtoothing, though; it's sitting solidly at 52%.

On 24/03/2014 15:46, Shawn Heisey s...@elyograg.org wrote:

I'm looking at the dashboard page on all 4 nodes and seeing Physical Memory at 92% compared with ~41-44%, and JVM-Memory at 52.9% compared to 23-28%. The reason I mentioned "slave" is that on the core overview page there is an entry for Slave (Searching) that doesn't appear on any of the other nodes. Cheers, David

It's completely normal for physical memory to be nearly 100 percent at all times, unless you have memory available that greatly exceeds the size of your index and any other data used by programs on the server. This is simply how operating systems work. http://en.wikipedia.org/wiki/Page_cache

JVM memory usage is highly variable and will fluctuate all over. Do a Google image search for 'JVM sawtooth' to see how memory usage graphs within the JVM.

Thanks, Shawn
Re: Can the solr dataimporthandler consume an atom feed?
The only message I get is: "Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. Requests: 1, Skipped: 0". And there are no errors in the log. Here's what the IBM atom feed looks like:

<?xml version="1.0" encoding="utf-16"?>
<atom:feed xmlns:atom="http://www.w3.org/2005/Atom" xmlns:wplc="http://www.ibm.com/wplc/atom/1.0" xmlns:age="http://purl.org/atompub/age/1.0" xmlns:snx="http://www.ibm.com/xmlns/prod/sn" xmlns:lconn="http://www.ibm.com/lotus/connections/seedlist/atom/1.0">
<atom:id>https://[redacted]/files/seedlist/myserver?Action=GetDocuments&amp;Format=ATOM&amp;Locale=en_US&amp;Range=2&amp;Start=0</atom:id>
<atom:link href="https://[redacted]/files/seedlist/myserver?Action=GetDocuments&amp;Range=2&amp;Start=1000&amp;Format=ATOM&amp;Locale=en_US&amp;State=U0VDT05EXzIwMTQtMDMtMTMgMTY6MjM6NTguODRfMjAxMS0wNi0wNiAwODowNDoxNC42MjJfNmQ1YzQ3MWMtYTM3ZS00ZjlmLWE0OGEtZWZjYjMyZjU2NDgzXzEwMDBfZmFsc2U%3D" rel="next" type="application/atom+xml" title="Next page" />
<atom:generator xml:lang="en-US" version="1.2" lconn:version="4.0.0.0">Seedlist Service Backend System</atom:generator>
<atom:category term="ContentSourceType/Files" scheme="com.ibm.wplc.taxonomy://feature_taxonomy" label="Files" />
<atom:title xml:lang="en-US">Files : 1,000 entries of Seedlist FILES</atom:title>
<wplc:action do="update" />
<wplc:fieldInfo id="title" name="Title" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
<wplc:fieldInfo id="author" name="Owner's directory id" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="published" name="Created timestamp" type="date" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="false" supportsExactMatch="false" />
<wplc:fieldInfo id="updated" name="Last modification timestamp (major change only, as indicated in UI)" type="date" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="false" />
<wplc:fieldInfo id="summary" name="Description" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
<wplc:fieldInfo id="tag" name="Tag" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
<wplc:fieldInfo id="commentCount" name="Number of comments" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
<wplc:fieldInfo id="downloadCount" name="Number of downloads" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
<wplc:fieldInfo id="recommendCount" name="Number of recommendations" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
<wplc:fieldInfo id="fileUpdated" name="Binary file last modification timestamp" type="date" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="true" supportsExactMatch="true" />
<wplc:fieldInfo id="fileSize" name="Binary file size" type="int" contentSearchable="false" fieldSearchable="false" parametric="true" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="fileName" name="File name" type="string" contentSearchable="true" fieldSearchable="true" parametric="false" returnable="true" sortable="true" supportsExactMatch="false" />
<wplc:fieldInfo id="sharedWithUser" name="Shared with user's directory id" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="sharedWithUserName" name="Shared with user's name" type="string" contentSearchable="false" fieldSearchable="false" parametric="false" returnable="true" sortable="false" supportsExactMatch="false" />
<wplc:fieldInfo id="libraryId" name="The id of library owning the file" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="ORGANISATIONAL_ID" name="The id of the organization the owning user belongs to" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="communityId" name="The id of the community associated to the file" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="containerType" name="The type of the container (library) associated to the file" type="string" contentSearchable="false" fieldSearchable="true" parametric="false" returnable="true" sortable="false" supportsExactMatch="true" />
<wplc:fieldInfo id="ATOMAPISOURCE" name="Atom API link" type="string"
Re: Ram usage
On 3/24/2014 9:48 AM, David Flower wrote: It's not sawtoothing, though; it's sitting solidly at 52%. It may be very difficult to see the sawtooth effect unless you actually connect an app like jconsole to your running Solr instance and watch the graphs over time. My point was that what you've described does not sound like a problem at all. If you are having symptoms noticeable from the client side, then we can tackle those, but these memory numbers sound fine to me. Thanks, Shawn
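To make the sawtooth concrete, here is a minimal, self-contained sketch (an illustration, not from the thread) that polls the JVM's own heap numbers through the standard MemoryMXBean; run inside a busy JVM, the "used" figure climbs and then drops at each garbage collection. To watch Solr itself you would attach jconsole, VisualVM, or a similar tool to the Solr process instead.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryMXBean;
    import java.lang.management.MemoryUsage;

    // Prints heap usage once per second; over time the "used" number should
    // rise and fall (the sawtooth) as garbage collection runs.
    public class HeapWatch {
        public static void main(String[] args) throws InterruptedException {
            MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
            while (true) {
                MemoryUsage heap = memory.getHeapMemoryUsage();
                System.out.printf("used=%dMB committed=%dMB max=%dMB%n",
                        heap.getUsed() >> 20, heap.getCommitted() >> 20, heap.getMax() >> 20);
                Thread.sleep(1000L);
            }
        }
    }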
Re: Shingles in solr for bigrams,trigrams in parsed_query
Hi, Query rewrite happens down the chain, after query parsing. For example, a wildcard query triggers an index-based query rewrite where terms matching the wildcard are added into the original query. In your case, it looks like the query rewrite will generate the ngrams and add them into the original query. So just make sure that the analysis page shows what you expect on the indexing and querying sides. Out of curiosity: what are you trying to achieve with the query-side shingles? Aren't index-time shingles alone enough?

On Thu, Mar 20, 2014 at 8:06 PM, Jyotirmoy Sundi sundi...@gmail.com wrote: Hi Folks, I am using shingles to index bigrams/trigrams. The same is also used for the query side in the schema.xml file. But when I run the query in debug mode for a collection, I don't see the bigrams in the parsed_query. Any idea what I might be missing?

solr/colection/select?q=best%20price&debugQuery=on
<str name="parsedquery_toString">text:best text:price</str>

I was hoping to see:

<str name="parsedquery_toString">text:best text:price text:best price</str>

My schema file looks like this:

<types>
 <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
 <fieldType name="int" class="solr.TrieIntField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
 <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
   <charFilter class="solr.HTMLStripCharFilterFactory"/>
   <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true" />
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.LengthFilterFactory" min="3" max="50" />
   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="0" generateNumberParts="0" catenateWords="1" catenateNumbers="1" catenateAll="1" preserveOriginal="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
   <filter class="solr.StopFilterFactory"/>
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   <filter class="solr.TrimFilterFactory" />
  </analyzer>
  <analyzer type="query">
   <filter class="solr.LowerCaseFilterFactory"/>
   <filter class="solr.LengthFilterFactory" min="3" max="50" />
   <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   <filter class="solr.StopFilterFactory"/>
   <filter class="solr.TrimFilterFactory" />
   <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
   <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="0" splitOnNumerics="0" stemEnglishPossessive="1"/>
   <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true" />
   <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
   <!--filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true"/>
   <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="4" outputUnigrams="true" /-->
  </analyzer>
 </fieldType>
</types>

-- Best Regards, Jyotirmoy Sundi

-- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
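For reference, a minimal analysis-time sketch, assuming the Lucene 4.x analysis APIs that ship with Solr 4.x, of what a whitespace-tokenize / lowercase / shingle chain emits. If the query side is analyzed the same way, the parsed query should contain these shingle terms; if the admin analysis page disagrees with this output, the schema chain is not doing what you expect.

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.core.WhitespaceTokenizer;
    import org.apache.lucene.analysis.shingle.ShingleFilter;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.util.Version;

    public class ShingleDemo {
        public static void main(String[] args) throws IOException {
            // Tokenize, lowercase, then emit unigrams plus 2- to 4-gram shingles,
            // mirroring the ShingleFilterFactory settings in the schema above.
            TokenStream ts = new WhitespaceTokenizer(Version.LUCENE_47, new StringReader("Best Price Today"));
            ts = new LowerCaseFilter(Version.LUCENE_47, ts);
            ts = new ShingleFilter(ts, 2, 4);

            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // Prints: best, "best price", "best price today", price, "price today", today
                System.out.println(term.toString());
            }
            ts.end();
            ts.close();
        }
    }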
Re: Solr Cloud collection keep going down?
Shawn, Thanks for pointing me in the right direction. After consulting the above document I *think* that the problem may be too large a heap, which may be affecting GC and hence causing ZK timeouts. We have around 20G of memory on these machines, with min/max heap at 6 and 10 respectively (-Xms6G -Xmx10G). The rest was set aside for disk cache. Why did we choose 6-10? No other reason than we wanted to allot enough for disk cache, and then everything else was thrown at Solr. Does this sound about right? I took some screenshots of VisualVM and our NewRelic reporting, as well as some relevant portions of our SolrConfig.xml. Any thoughts/comments would be greatly appreciated. http://postimg.org/gallery/4t73sdks/1fc10f9c/ Thanks

On Sat, Mar 22, 2014 at 2:26 PM, Shawn Heisey s...@elyograg.org wrote: On 3/22/2014 1:23 PM, Software Dev wrote: We have 2 collections with 1 shard each, replicated over 5 servers in the cluster. We see a lot of flapping (down or recovering) on one of the collections. When this happens, the other collection hosted on the same machine is still marked as active. When this happens it takes a fairly long time (~30 minutes) for the collection to come back online, if at all. I find that it's usually more reliable to completely shut down Solr on the affected machine and bring it back up with its core disabled. We then re-enable the core when it's marked as active. A few questions: 1) What is the healthcheck in Solr-Cloud? Put another way, what is failing that marks one collection as down but the other on the same machine as up? 2) Why does recovery take forever when a node goes down... even if it's only down for 30 seconds. Our index is only 7-8G and we are running on SSDs. 3) What can be done to diagnose and fix this problem?

Unless you are actually using the ping request handler, the healthcheck config will not matter. Or were you referring to something else? Referencing the logs you included in your reply: the EofException errors happen because your client code times out and disconnects before the request it made has completed. That is most likely just a symptom that has nothing at all to do with the problem. Read the following wiki page. What I'm going to say below will reference information you can find there: http://wiki.apache.org/solr/SolrPerformanceProblems

Relevant side note: the default zookeeper client timeout is 15 seconds. A typical zookeeper config defines tickTime as 2 seconds, and the timeout cannot be configured to be more than 20 times the tickTime, which means it cannot go beyond 40 seconds. The default timeout value of 15 seconds is usually more than enough, unless you are having performance problems.

If you are not actually taking Solr instances down, then the fact that you are seeing the log-replay messages indicates to me that something is taking so much time that the connection to Zookeeper times out. When it finally responds, it will attempt to recover the index, which means first it will replay the transaction log and then it might replicate the index from the shard leader. Replaying the transaction log is likely the reason it takes so long to recover. The wiki page I linked above has a "slow startup" section that explains how to fix this. There is some kind of underlying problem that is causing the zookeeper connection to time out. It is most likely garbage collection pauses or insufficient RAM to cache the index, possibly both. You did not indicate how much total RAM you have or how big your Java heap is.
As the wiki page mentions in the SSD section, SSD is not a substitute for having enough RAM to cache a significant percentage of your index. Thanks, Shawn
Solr 4.3.1 memory swapping
Hello all, we have a SolrCloud implementation in production, with two servers running Solr 4.3.1 in a SolrCloud configuration. Our search index is about 70-80GB in size. The trouble is that after several days of uptime, we will suddenly have periods where the operating system Solr is running in starts swapping heavily. This gets progressively worse until the swapping slows things down so much that Zookeeper thinks the nodes are no longer available. If both nodes are swapping, it can lead to an outage, which has happened to us a couple of times. My question is: why is it swapping? Here's an example with numbers from our prod environment:

- Total physical memory: 16GB
- Physical memory usage: 15.58GB (99.4%)
- Total swap space: 4GB
- Swap space usage: 1.51GB (37.7%)
- Total JVM memory: 10GB
- JVM heap: 1.89GB/4.44GB

The top command reports that the JVM has 3.8GB resident RAM and 81.8GB virtual. Note that it is using up close to half of the swap space, even though the JVM only needs a subset of the physical memory. So what is causing the swapping, and what should I do about it? I can add more memory to the VMs if I need to, but how much? And how much should I allocate to the JVM vs. leave available for the OS? I could attach a screenshot of our Solr console and the top output if the listserv allows attachments. Any ideas? Thanks!

Darrell Burgan | Chief Architect, PeopleAnswers office: 214 445 2172 | mobile: 214 564 4450 | fax: 972 692 5386 | darrell.bur...@infor.com | http://www.infor.com
Fixing corrupted index?
My Lucene index - built with Solr using Lucene 4.1 - is corrupted. Upon trying to read the index using the following code I get an org.apache.solr.common.SolrException: No such core: collection1 exception:

File configFile = new File(cacheFolder + File.separator + "solr.xml");
CoreContainer container = new CoreContainer(cacheFolder, configFile);
SolrServer server = new EmbeddedSolrServer(container, "collection1");
ModifiableSolrParams params = new ModifiableSolrParams();
params.set("q", idFieldName + ":" + ClientUtils.escapeQueryChars(queryId));
params.set("fl", idFieldName + "," + valueFieldName);
QueryResponse response = server.query(params);

I used the CheckIndex util to check the integrity of the index, and it is unable to complete the task, throwing the following error:

Opening index @ /../solrindex_cache/zookeeper/solr/collection1/data/index
ERROR: could not read any segments file in directory
java.io.FileNotFoundException: /../solrindex_cache/zookeeper/solr/collection1/data/index/segments_b5tb (No such file or directory)
    at java.io.RandomAccessFile.open(Native Method)
    at java.io.RandomAccessFile.<init>(RandomAccessFile.java:233)
    at org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:223)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:285)
    at org.apache.lucene.index.SegmentInfos$1.doBody(SegmentInfos.java:347)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:783)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:630)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:343)
    at org.apache.lucene.index.CheckIndex.checkIndex(CheckIndex.java:383)
    at org.apache.lucene.index.CheckIndex.main(CheckIndex.java:1777)

The file segments_b5tb that the index checker is looking for is indeed missing in the index folder. The only file that looks similar is segments.gen. However, the index segment files including .si, .tip, .doc, .fdx etc. still exist. Is there any way to fix this, as it took me 2 weeks to build this index? Many many thanks for your kind advice!
Indexer: java.io.IOException: Job failed!
Hi, I'm trying to integrate Solr with Nutch, and I performed all of the necessary steps, but after Nutch performs the crawl it appears that I'm receiving a connection refused error.

2014-03-24 11:42:43,062 INFO indexer.IndexerMapReduce - IndexerMapReduce: crawldb: TestCrawl/crawldb
2014-03-24 11:42:43,062 INFO indexer.IndexerMapReduce - IndexerMapReduce: linkdb: TestCrawl/linkdb
2014-03-24 11:42:43,062 INFO indexer.IndexerMapReduce - IndexerMapReduces: adding segment: TestCrawl/segments/20140324113941
2014-03-24 11:42:43,304 WARN util.NativeCodeLoader - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2014-03-24 11:42:43,942 INFO anchor.AnchorIndexingFilter - Anchor deduplication is: off
2014-03-24 11:42:44,456 INFO indexer.IndexWriters - Adding org.apache.nutch.indexwriter.solr.SolrIndexWriter
2014-03-24 11:42:44,465 INFO solr.SolrUtils - Authenticating as: my username
2014-03-24 11:42:44,483 INFO solr.SolrMappingReader - source: content dest: content
2014-03-24 11:42:44,483 INFO solr.SolrMappingReader - source: title dest: title
2014-03-24 11:42:44,483 INFO solr.SolrMappingReader - source: host dest: host
2014-03-24 11:42:44,483 INFO solr.SolrMappingReader - source: segment dest: segment
2014-03-24 11:42:44,483 INFO solr.SolrMappingReader - source: boost dest: boost
2014-03-24 11:42:44,484 INFO solr.SolrMappingReader - source: digest dest: digest
2014-03-24 11:42:44,484 INFO solr.SolrMappingReader - source: tstamp dest: tstamp
2014-03-24 11:42:44,484 INFO solr.SolrMappingReader - source: url dest: id
2014-03-24 11:42:44,484 INFO solr.SolrMappingReader - source: url dest: url
2014-03-24 11:42:44,616 INFO solr.SolrIndexWriter - Indexing 22 documents
2014-03-24 11:42:44,704 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,704 INFO httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,707 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,707 INFO httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,707 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,707 INFO httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,708 INFO solr.SolrIndexWriter - Indexing 22 documents
2014-03-24 11:42:44,709 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,709 INFO httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,709 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,709 INFO httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,709 INFO httpclient.HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
2014-03-24 11:42:44,709 INFO httpclient.HttpMethodDirector - Retrying request
2014-03-24 11:42:44,715 WARN mapred.LocalJobRunner - job_local319933392_0001
java.io.IOException
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.makeIOException(SolrIndexWriter.java:173)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:159)
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:118)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:44)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.close(ReduceTask.java:467)
    at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:535)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:421)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
Caused by: org.apache.solr.client.solrj.SolrServerException: java.net.ConnectException: Connection refused
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:478)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:155)
    ... 6 more
Caused by: java.net.ConnectException: Connection refused
    at java.net.PlainSocketImpl.socketConnect(Native Method)
    at java.net.PlainSocketImpl.doConnect(PlainSocketImpl.java:351)
    at java.net.PlainSocketImpl.connectToAddress(PlainSocketImpl.java:213)
    at java.net.PlainSocketImpl.connect(PlainSocketImpl.java:200)
    at
Re: Indexer: java.io.IOException: Job failed!
So the problem might be because I'm running Solr on Tomcat port 8080. Is there a way to resolve this so I can run the command successfully?

Thanks,
Laura
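If Tomcat is serving Solr on port 8080, the Solr URL that Nutch is given (e.g., the solr.server.url property) must include that port. Here is a minimal SolrJ sketch for verifying the URL is reachable before re-running the crawl; the URL and the use of current SolrJ classes are assumptions, not taken from the thread.

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.SolrPingResponse;

    public class PingSolr {
        public static void main(String[] args) throws Exception {
            // Point this at the same URL Nutch is configured to use.
            SolrServer server = new HttpSolrServer("http://localhost:8080/solr");
            SolrPingResponse pong = server.ping(); // throws if nothing is listening there
            System.out.println("Solr responded in " + pong.getElapsedTime() + " ms");
        }
    }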
Re: Using Sentence Information For Snippet Generation
Hi Furkan, I have done an implementation with a custom filler (special character) sequence in between sentences. A better solution I landed at was increasing the position of each sentence's first token by a large number, like 10000 (perhaps a smaller number could be used too). Then a user search can be conducted with a proximity query: "some tokens"~5000 (the recently committed complexphrase parser supports rich phrase syntax, for example). This of course expects that a sentence fits the 5000 window size and that the total number of sentences in the field * 10k does not exceed Integer.MAX_VALUE. Then on the highlighter side you'd get the hits within sentences naturally. Is this something you are looking for? Dmitry

On Mon, Mar 24, 2014 at 5:43 PM, Furkan KAMACI furkankam...@gmail.com wrote: Hi; When I generate a snippet via Solr I do not want to remove the beginning of any sentence in the snippet, so I need to do sentence detection. I think that I can do it before I send documents into Solr. I can put some special characters that mark the beginning or end of a sentence, then use that information when generating the snippet. On the other hand, I should not show that special character to the user. How can I do it, or do you have any other ideas for my purpose? PS: I do not do it for English sentences. Thanks; Furkan KAMACI

-- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
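Here is a minimal sketch of the position-gap idea as a Lucene 4.x TokenFilter. This is not Dmitry's actual code: the gap size and the crude punctuation-based boundary test (which assumes a tokenizer that keeps trailing punctuation, like WhitespaceTokenizer) are assumptions for illustration.

    import java.io.IOException;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

    // Inserts a large position gap after each sentence, so a proximity query
    // like "some tokens"~5000 can only match terms inside one sentence.
    public final class SentenceGapFilter extends TokenFilter {
        private static final int SENTENCE_GAP = 10000; // assumed gap size

        private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
        private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);
        private boolean lastTokenEndedSentence = false;

        public SentenceGapFilter(TokenStream input) {
            super(input);
        }

        @Override
        public boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                return false;
            }
            if (lastTokenEndedSentence) {
                posIncAtt.setPositionIncrement(posIncAtt.getPositionIncrement() + SENTENCE_GAP);
            }
            // Crude boundary test: the token ends with sentence-final punctuation.
            String term = termAtt.toString();
            lastTokenEndedSentence = term.endsWith(".") || term.endsWith("!") || term.endsWith("?");
            return true;
        }

        @Override
        public void reset() throws IOException {
            super.reset();
            lastTokenEndedSentence = false;
        }
    }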
Re: Can the solr dataimporthandler consume an atom feed?
I confirmed the XPath is correct with a third-party XPath visualizer. /atom:feed/atom:entry parses the XML correctly. Can anyone confirm or deny that the dataimporthandler can handle an atom feed?
Re: w/10 ? [was: Partial Counts in SOLR]
On 3/19/14 5:13 PM, Otis Gospodnetic wrote: Hi, Guessing it's surround query parser's support for "within" backed by span queries. Otis

You mean this? http://wiki.apache.org/solr/SurroundQueryParser I guess this parser needs improvement in the documentation area. It doesn't explain or have an example of the w/int syntax at all. (Is this the infix notation of W?) An example would also help explain the difference between W and N; some readers may not understand what "ordered" and "unordered" mean in this context. Kuro
Re: Fixing corrupted index?
Hi, Have a look at: http://lucene.apache.org/core/4_1_0/core/org/apache/lucene/index/CheckIndex.html

HTH,
Dmitry

-- Dmitry Blog: http://dmitrykan.blogspot.com Twitter: http://twitter.com/dmitrykan
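For reference, a minimal sketch of driving CheckIndex from code against a Lucene 4.x index; the path is a placeholder. Note that CheckIndex can only report on (or, via its fix option, drop) corrupt segments - it cannot recover a missing segments_N file like the one reported above.

    import java.io.File;

    import org.apache.lucene.index.CheckIndex;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class CheckIndexDemo {
        public static void main(String[] args) throws Exception {
            // Path to the core's index directory (placeholder).
            Directory dir = FSDirectory.open(new File("/path/to/collection1/data/index"));
            CheckIndex checker = new CheckIndex(dir);
            checker.setInfoStream(System.out); // print per-segment diagnostics
            CheckIndex.Status status = checker.checkIndex();
            System.out.println(status.clean ? "Index is clean" : "Index is corrupt");
            dir.close();
        }
    }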
Re: Fixing corrupted index?
Hi, Thanks. But I am already using CheckIndex, and the error is given by the CheckIndex utility: it could not even continue after reporting "could not read any segments file in directory".
Re: Can the solr dataimporthandler consume an atom feed?
Ok, I found one typo: the links need to be this: /atom:feed/atom:entry/atom:link/@href But the import still doesn't work... :( I guess I have to convert the feed over to RSS 2.0
Re: w/10 ? [was: Partial Counts in SOLR]
I think SQP is getting axed, no?

Otis
-- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/
solr 4.x reindexing issues
Hello, We are trying to reindex as part of our move from 3.6.2 to 4.6.1 and have faced various issues reindexing 1.5 million docs. We don't use SolrCloud; it's still a master/slave config. For testing this I am using a single test server, reading from it and putting back into the same index. We send docs in batches of 100 but only 10/100 are getting indexed; is this related to the maxBufferedAddsPerServer setting that is hard-coded? Also I tried to play with the autocommit and softcommit settings, but in vain:

<autoCommit>
  <maxDocs>5</maxDocs>
  <maxTime>5000</maxTime>
  <openSearcher>true</openSearcher>
</autoCommit>
<autoSoftCommit>
  <maxTime>1000</maxTime>
</autoSoftCommit>

I use these on the test system just to check if docs are being indexed, but even with a batch of 5 my SolrJ client code runs faster than indexing, causing some docs to not get indexed. The function that's indexing is a recursive method call (shown below) which fails after some time with a stack overflow (I did not have this issue with 3.6.2 with the same code):

private static void processDocs(HttpSolrServer server, Integer start, Integer rows) throws Exception {
    SolrQuery query = new SolrQuery();
    query.setQuery("*:*");
    query.addFilterQuery("-allfields:[* TO *]");
    QueryResponse resp = server.query(query);
    SolrDocumentList list = resp.getResults();
    Long total = list.getNumFound();
    if (list != null && !list.isEmpty()) {
        for (SolrDocument doc : list) {
            SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
            // To index full doc again
            iDoc.removeField("_version_");
            server.add(iDoc, 1000);
        }
        System.out.println("Indexed " + (start + rows) + "/" + total);
        if (total >= (start + rows)) {
            processDocs(server, (start + rows), rows);
        }
    }
}

I also tried turning on the updateLog, but that was filling up so fast that it is useless. How do we do bulk updates in a Solr 4.x environment? Is there any setting that I am missing?

Thanks Ravi Kiran Bhaskar Technical Architect The Washington Post
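For comparison, a hedged sketch (not the original code) of the same loop written iteratively. Two things stand out in the posted version: it never calls setStart/setRows, so every query fetches Solr's default of 10 rows, which would match the 10-of-100 symptom; and the recursion depth grows with the number of pages, which would explain the stack overflow.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.client.solrj.util.ClientUtils;
    import org.apache.solr.common.SolrDocument;
    import org.apache.solr.common.SolrDocumentList;
    import org.apache.solr.common.SolrInputDocument;

    // Iterative paging: no recursion, and start/rows are set explicitly.
    private static void processDocs(HttpSolrServer server, int rows) throws Exception {
        int start = 0;
        long total = Long.MAX_VALUE;
        while (start < total) {
            SolrQuery query = new SolrQuery("*:*");
            query.addFilterQuery("-allfields:[* TO *]");
            query.setStart(start);
            query.setRows(rows);
            QueryResponse resp = server.query(query);
            SolrDocumentList list = resp.getResults();
            total = list.getNumFound();
            for (SolrDocument doc : list) {
                SolrInputDocument iDoc = ClientUtils.toSolrInputDocument(doc);
                iDoc.removeField("_version_"); // re-index the full doc
                server.add(iDoc, 1000);
            }
            start += rows;
            System.out.println("Indexed " + Math.min(start, total) + "/" + total);
        }
    }

One caveat with this sketch: if reindexed documents gain an allfields value and a commit happens mid-run, they drop out of the filtered result set and advancing start would skip documents; in that case, repeatedly querying from start=0 until numFound reaches zero is safer.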
Multiple Languages in Same Core
I recently deployed Solr to back the site search feature of a site I work on. The site itself is available in hundreds of languages. With the initial release of site search we have enabled the feature for ten of those languages. This is distributed across eight cores, with two Chinese languages plus Korean combined into one CJK core and each of the other seven languages in their own individual cores. The reason for splitting these into separate cores was so that we could have the same field names across all cores but have different configuration for analyzers, etc., per core. Now I have some questions on this approach.

1) Scalability: Considering I need to scale this to many dozens more languages, perhaps hundreds more, is there a better way so that I don't end up needing dozens or hundreds of cores? My initial plan was that many languages that didn't have special support within Solr would simply get lumped into a single default core that has some default analyzers that are applicable to the majority of languages.

1b) Related to this: is there a practical limit to the number of cores that can be run on one instance of Lucene?

2) Auto Suggest: In phase two I intend to add auto-suggestions as a user types a query. In reviewing how this is implemented and how the suggestion dictionary is built I have concerns. If I have more than one language in a single core (and I keep the same field name for suggestions on all languages within a core) then it seems that I could get suggestions from another language returned with a suggest query. Is there a way to build a separate dictionary for each language, but keep these languages within the same core?

If it's helpful to know: I have a field in every core for Locale. Values will be the locale of the language of that document, i.e. en, es, zh_hans, etc. I'd like to be able to: 1) when building a suggestion dictionary, divide it into multiple dictionaries, grouping them by locale, and 2) supply a parameter to the suggest query that allows the suggest component to only return suggestions from the appropriate dictionary for that locale.

If the answer to #1 is "keep splitting groups of languages that have different analyzers into their own cores" and the answer to #2 is "that's not supported", then I'd be curious: where would I start to write my own extension that supported #2? I looked last night at the suggest lookup classes, dictionary classes, etc. But I didn't see a clear point where it would be clean to implement something like I'm suggesting above.

Best Regards, Jeremy Thomerson
Re: Best approach to handle large volume of documents with constantly high incoming rate?
Jack, thanks. Actually the 20K events/sec is some low-end rate we estimated. It is not necessarily related to sensors; when you want to centralize data from many sources, regardless of multi-tenancy, even for a single tenant, many events per second have to be handled. I have a question regarding the size of nodes used in Solr Cloud: what are the general pros/cons between using big or small nodes to set up Solr Clouds for similar cases as I described? For example, mainly considering memory: 256 (GB) x 4 vs. 32 (GB) x 32, or a little extreme: 256 (GB) x 4 vs. 8 (GB) x 128. Is it better to use fewer, bigger nodes to set up a Solr Cloud, or more, smaller nodes? In the latter (a little extreme) example, multiple Solr Clouds could be considered, as Erick mentioned. Regards. Shushuai

From: Jack Krupansky j...@basetechnology.com To: solr-user@lucene.apache.org Sent: Sunday, March 23, 2014 1:03 AM Subject: Re: Best approach to handle large volume of documents with constantly high incoming rate?

I defer to Erick on this level of detail and experience. Let's continue the discussion - some of it will be a matter of how to configure and tune Solr, how to select, configure, and tune hardware, the need for further Lucene/Solr improvements, and how much further we have to go to get to the next level with Big Data. I mean, 20K events/sec is not necessarily beyond the realm of reality these days with sensor data (20K/sec = 1 event every 50 microseconds). -- Jack Krupansky

-Original Message- From: Erick Erickson Sent: Saturday, March 22, 2014 11:02 PM To: solr-user@lucene.apache.org ; shushuai zhu Subject: Re: Best approach to handle large volume of documents with constantly high incoming rate?

Well, the commonsense limits Jack is referring to in that post are more (IMO) the scales at which you should count on doing some _serious_ prototyping/configuring/etc. As you scale out, you'll run into edge cases that aren't the common variety and aren't reliably tested every night. I mean, how would you set up a test bed that had 1,000 nodes? Sure, it can be done, but nobody's volunteered yet to provide the Apache Solr project that much hardware. I suspect that it would make Uwe's week if someone did, though.

In the practical-limit vein, one example: you'll run up against the laggard problem. Let's assume that you successfully put up 2,000 nodes; for simplicity's sake, no replicas, just leaders, and they all stay up all the time. To successfully do a search, you need to send out a request to all 2,000 nodes. The chance that one of them is slow for _any_ reason (GC, high CPU load, it's just tired) increases the more nodes you have. (If a single node is slow on, say, 1% of requests, the chance that at least one of 2,000 nodes is slow on a given query is 1 - 0.99^2000, essentially 100%.) And since you have to wait until the slowest node responds, your query rate will suffer correspondingly.

I've seen 4-node clusters handle a 5,000 docs/sec update rate, FWIW. YMMV of course. However, you say "...dedicated indexing servers". There's no such thing in SolrCloud. Every document gets sent to every member of the slice it belongs to. How else could NRT be supported? When I saw that comment I wondered how well you understand SolrCloud. I flat guarantee you'll understand SolrCloud really, really well if you try to scale as you indicate :). There'll be a whole bunch of learning experiences along the way; some will be painful. I guarantee that too.

Responding to your points:
1) Yes, no, and maybe. For relatively small docs on relatively modern hardware, it's a good place to start. Then you have to push it until it falls over to determine your _real_ rates.
See: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

2) Nobody knows. There's no theoretical reason why SolrCloud shouldn't; no a priori hard limits. I _strongly_ suspect you'll be on the bleeding edge of size, though. Expect some things to be learning experiences.

3) No, it doesn't mean that at all. 64 is an arbitrary number that means, IMO, "here there be dragons". As you start to scale out beyond this you'll run into pesky issues, I expect. Your network won't be as reliable as you think. You'll find one of your VMs (which I expect you'll be running on) has some glitches. Someone loaded a very CPU-intensive program on three of your machines and your Solrs on those machines are being starved. Etc.

4) I've personally seen 1,000-node clusters. You ought to see the very cool SolrCloud admin graph I recently saw... But I expect you'll actually be in for some kind of divide-and-conquer strategy whereby you have a bunch of clusters that are significantly smaller. You could, for instance, determine that the use case you support is searching across small ranges, say a week at a time, and have 52 clusters of 128 machines or so. You could have 365 clusters of 20 machines. It all depends on how the index will be used.

5) Not at all. See above, I've seen 5K/sec on 4 nodes, also supporting simultaneous
Question on highlighting edgegrams
In 3.5.0 we have the following:

<fieldType name="autocomplete" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="30"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

If we searched for "c" with highlighting enabled we would get back results such as:

<em>c</em>dat
<em>c</em>rocdile
<em>c</em>ool beans

But in the latest Solr (4.7) we get the full words highlighted back. Did something change from these versions with regards to highlighting? Thanks
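For reference, a minimal Lucene 4.x sketch (assuming the 4.7-era analysis APIs) of what the index-side chain above produces for "cool": each prefix becomes its own indexed term. A plausible explanation for the behavior change - not confirmed in this thread - is that the offsets attached to those grams changed across 4.x releases, and the highlighter wraps whatever span the offsets describe, so newer versions mark the whole word.

    import java.io.IOException;
    import java.io.StringReader;

    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.core.LowerCaseFilter;
    import org.apache.lucene.analysis.ngram.EdgeNGramTokenFilter;
    import org.apache.lucene.analysis.standard.StandardTokenizer;
    import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
    import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;
    import org.apache.lucene.util.Version;

    public class EdgeGramDemo {
        public static void main(String[] args) throws IOException {
            // Same chain as the index analyzer above: tokenize, lowercase, edge ngrams 1..30.
            TokenStream ts = new StandardTokenizer(Version.LUCENE_47, new StringReader("Cool"));
            ts = new LowerCaseFilter(Version.LUCENE_47, ts);
            ts = new EdgeNGramTokenFilter(Version.LUCENE_47, ts, 1, 30);

            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                // Prints c, co, coo, cool with the offsets a highlighter would use.
                System.out.println(term + " [" + offset.startOffset() + "," + offset.endOffset() + "]");
            }
            ts.end();
            ts.close();
        }
    }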
Re: w/10 ? [was: Partial Counts in SOLR]
Hi, There is no w/int syntax in surround. /* Query language operators: OR, AND, NOT, W, N, (, ), ^, *, ?, and comma */

Ahmet
Re: w/10 ? [was: Partial Counts in SOLR]
That is similar to Verity VQL, but that used NEAR/10.

--wunder

-- Walter Underwood wun...@wunderwood.org
Re: Required fields
: What is the default value for the required attribute of a field element : in a schema? I've just looked everywhere I can think of in the wiki, the : reference manual, and the JavaDoc. Most of the documentation doesn't : even mention that attribute. Good catch, fixed... https://cwiki.apache.org/confluence/pages/diffpages.action?pageId=32604269originalId=40506114 https://cwiki.apache.org/confluence/display/solr/Field+Type+Definitions+and+Properties https://cwiki.apache.org/confluence/display/solr/Defining+Fields -Hoss http://www.lucidworks.com/
Re: Multiple Languages in Same Core
Solr In Action has a significant discussion of the multi-lingual approach. They also have some code samples out there. Might be worth a look.

Regards, Alex.
Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)
Re: Can the solr dataimporthandler consume an atom feed?
On 25 March 2014 01:15, eShard zim...@yahoo.com wrote: I confirmed the XPath is correct with a third-party XPath visualizer. /atom:feed/atom:entry parses the XML correctly. Can anyone confirm or deny that the dataimporthandler can handle an atom feed?

Yes, an ATOM feed can be consumed by DIH, as noted in the documentation. We have done this in the past, and a Google search turns up examples, e.g., http://blog.florian-hopf.de/2012/05/importing-atom-feeds-in-solr-using-data.html

I have not dealt with namespaces, but here is a line from the documentation that is probably relevant to your ATOM feed: "It does not support namespaces, but it can handle xmls with namespaces. When you provide the xpath, just drop the namespace and give the rest (eg if the tag is 'dc:subject' the mapping should just contain 'subject')."

Other than that, I still see nothing wrong with your DIH data configuration. The message from the dataimport shows that it did make a request to the URLDataSource. If things still do not work:
* Can you double-check that the specified URL in the url attribute of the entity does indeed retrieve the desired XML?
* I am pretty sure that you have checked this, but are your fields properly defined in the Solr schema?

Regards, Gora
Re: w/10 ? [was: Partial Counts in SOLR]
Basically we just created this syntax for the ease of users; otherwise, on the back end it uses the W or N operators.

-- Regards, Salman Akram
Re: w/10 ? [was: Partial Counts in SOLR]
Perhaps useful: here is an open source implementation with near[digit] support, incl. analysis of proximity tokens. When days become longer maybe it will be packaged into a nice lib... :-) https://github.com/romanchyla/montysolr/blob/master/contrib/adsabs/grammars/ADS.g
Re: solr cloud distributed optimize() becomes serialized
Found it - https://issues.apache.org/jira/browse/LUCENE-5481

On Fri, Mar 21, 2014 at 8:11 PM, Mark Miller markrmil...@gmail.com wrote: Recently fixed in Lucene - should be able to find the issue if you dig a little. -- Mark Miller about.me/markrmiller

On March 21, 2014 at 10:25:56 AM, Greg Walters (greg.walt...@answers.com) wrote: I've seen this on 4.6. Thanks, Greg

On Mar 20, 2014, at 11:58 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: That's not right. Which Solr versions are you on (question for both William and Chris)?

On Fri, Mar 21, 2014 at 8:07 AM, William Bell billnb...@gmail.com wrote: Yeah. optimize() also used to come back immediately if the index was already optimized. It just reopened the index. We used to use that for cleaning up the old directories quickly. But now it does another optimize() even though the index is already optimized. Very strange.

On Tue, Mar 18, 2014 at 11:30 AM, Chris Lu chris...@gmail.com wrote: I wonder whether this is a known bug. In previous SOLR cloud versions, 4.4 or maybe 4.5, an explicit optimize(), without any parameters, usually took 2 minutes for a 32-core cluster. However, in 4.6.1, the same call took about 1 hour. Checking the index modification time for each core shows 2-minute gaps if sorted. We are using a solrj client connecting to zookeeper. I found it is talking to a specific solr server A, and that server A is distributing the calls to all other solr servers. Here is the thread dump for this server A:

    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:395)
    at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
    at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrServer.request(ConcurrentUpdateSolrServer.java:293)
    at org.apache.solr.update.SolrCmdDistributor.submit(SolrCmdDistributor.java:226)
    at org.apache.solr.update.SolrCmdDistributor.distribCommit(SolrCmdDistributor.java:195)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processCommit(DistributedUpdateProcessor.java:1250)
    at org.apache.solr.handler.RequestHandlerUtils.handleCommit(RequestHandlerUtils.java:69)

-- Bill Bell billnb...@gmail.com cell 720-256-8076

-- Regards, Shalin Shekhar Mangar.
Semantic search with python numpy and Solr
I am a beginner with Solr; I started playing with it in the last month. I am building a search mechanism for http://allevents.in and I want to implement semantic search with Solr when someone searches for events on our website. The back-end is in PHP (Solarium client). So can you please guide me on semantic search with Solr? I have gone through the article at http://java.dzone.com/articles/semantic-search-solr-and So how should I proceed further to implement semantic search with Solr, and specifically with my site (allevents.in)? Will it give me results like: if someone searches for "music events in new york", then it should also give results like "dj night in new york", "concerts in new york", and other related results. Is it possible? Can anyone here please guide me or suggest some material or an example of semantic search from the above article? -- Regards, *Sohan Kalsariya*