Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken
Continuing the discussion on the mailing list from Jira. An example:

id  group  f1    f2
1   g1      5    10
2   g1      5  1000
3   g1      5  1000
4   g1     10   100
5   g2      5    10
6   g2      5  1000
7   g2      5  1000
8   g2     10   100

sort = f1 asc, f2 desc, id desc

*Without collapse this gives:* (7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1)

*On collapsing by group_s the expected output is:* (7,g2), (3,g1)
Solr standard collapsing does give this output with group=on, group.field=group_s, group.main=true.

*Collapsing with CollapsingQParserPlugin,* fq={!collapse field=group_s}, gives: (5,g2), (1,g1)

*Summarizing the Jira discussion:*
1. CollapsingQParserPlugin picks the group heads from the matching results and passes only those further, in essence filtering out some of the matching documents so that subsequent collectors never see them. It can also pass on the score to subsequent collectors using a dummy scorer.
2. TopDocCollector comes later in the hierarchy and sorts the collapsed set. That part works fine.

The issue is with step 1. Collapsing is done by a single comparator, which can take its value from a field or a function and defaults to score. Function queries do allow us to combine multiple fields / value sources, but it would be difficult to construct such a function for a given set of sort fields, primarily because:
a) The range of values for a given sort field is not known in advance. One sort field may be unbounded while another is bounded within a small range.
b) A sort field can itself hold custom logic.
Because of (a) the group head selected by CollapsingQParserPlugin will be incorrect and the subsequent sorting will break.

On 14 June 2014 12:38, Umesh Prasad umesh.i...@gmail.com wrote: Thanks Joel for the quick response. I have opened a new jira ticket. https://issues.apache.org/jira/browse/SOLR-6168

On 13 June 2014 17:45, Joel Bernstein joels...@gmail.com wrote: Let's open a new ticket. Joel Bernstein Search Engineer at Heliosearch

On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad umesh.i...@gmail.com wrote: The patch in SOLR-5408 fixes the issue with sorting only for two sort fields. Sorting still breaks when 3 or more sort fields are used. I have attached a test case which demonstrates the broken behavior when 3 sort fields are used. The failing test case patch is against Lucene/Solr 4.7, revision number 1602388. Can someone apply and verify the bug? Also, should I re-open SOLR-5408 or open a new ticket?

---
Thanks & Regards
Umesh Prasad
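For illustration, a minimal SolrJ 4.x sketch of the two requests being compared above; the URL, core name and field names (group_s, f1, f2, id) are assumptions taken from the example data, and the result comments simply restate the outputs listed above.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class CollapseVsGrouping {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // Standard result grouping: returns the expected heads (7,g2), (3,g1)
        SolrQuery grouped = new SolrQuery("*:*");
        grouped.set("sort", "f1 asc, f2 desc, id desc");
        grouped.set("group", "true");
        grouped.set("group.field", "group_s");
        grouped.set("group.main", "true");
        QueryResponse groupedRsp = server.query(grouped);
        System.out.println(groupedRsp.getResults());

        // CollapsingQParserPlugin: currently returns (5,g2), (1,g1) instead
        SolrQuery collapsed = new SolrQuery("*:*");
        collapsed.set("sort", "f1 asc, f2 desc, id desc");
        collapsed.addFilterQuery("{!collapse field=group_s}");
        QueryResponse collapsedRsp = server.query(collapsed);
        System.out.println(collapsedRsp.getResults());
      }
    }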
Re: Warning message logs on startup after upgrading to 4.8.1
On Thu, Jun 19, 2014 at 12:49 AM, Chris Hostetter hossman_luc...@fucit.org wrote: : WARN o.a.s.r.ManagedResource- No stored data found for : /schema/analysis/stopwords/english : WARN o.a.s.r.ManagedResource- No stored data found for : /schema/analysis/synonyms/english : : I fixed these by commenting out the managed_en field type in my : schema, see https://github.com/xwiki/xwiki-platform/commit/d41580c383f40d2aa4e4f551971418536a3f3a20#diff-44d79e64e45f3b05115aebcd714bd897L486 FWIW: Unless i'm missing something, you should have only gotten those warnings in the situation where you started using the 4.8 example schema.xml (or cut/pasted those from the 4.8 into your existing schema) but you didn't use the rest of the cof files that came with 4.8 -- so you didn't have the stored data JSON file that goes with it -- in which case that is a legitimate warning that you have an analysis factory existing to use a managed resource but there is no managed data file available. Yes, you're right, I've merged my schema with the one provided with 4.8. : WARN o.a.s.r.ManagedResource- No stored data found for /rest/managed : WARN o.a.s.r.ManagedResource- No registered observers for /rest/managed : : How can I get rid of these 2? : : This jira issue is related https://issues.apache.org/jira/browse/SOLR-6128 . I agree, there's no reason i can see for those to be warnings -- so as to keep SOLR-6128 focused on just one thing, i've created SOLR-6179 to track the ManagedResource WARNs... https://issues.apache.org/jira/browse/SOLR-6179 Thanks, Marius -Hoss http://www.lucidworks.com/
Re: add new Fields with SolrJ without changing schema.xml
Hello, Because I will not stay with the current company, no one will know how to change schema.xml manually if they need to add new fields in the future. So I need to do this with Java code. Can you please help me with an example to complete this:

public static void addNewField(Boolean uniqueId, String type, Boolean indexed, Boolean stored, Boolean multivalued, Boolean sortmissinglast, Boolean required) {
  ...
}

Thanks, Best regards, Anass BENJELLOUN

2014-06-18 18:21 GMT+02:00 Walter Underwood [via Lucene] ml-node+s472066n4142571...@n3.nabble.com: Why can't you change schema.xml? --wunder

On Jun 18, 2014, at 8:56 AM, benjelloun [hidden email] wrote: Hello, this is what I want to do: public static void addNewField(Boolean uniqueId, String type, Boolean indexed, Boolean stored, Boolean multivalued, Boolean sortmissinglast, Boolean required) { ... } Any example please. Thanks, Best regards, Anass BENJELLOUN

--
View this message in context: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142769.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: add new Fields with SolrJ without changing schema.xml
Use dynamic fields definitions perhaps? Just suffix the fields with _s, _i, etc. As per schema.xml. You could also use new schemaless mode, but then when they send a value that auto-creates a field of wrong type, it would be really hard to troubleshoot. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Jun 19, 2014 at 2:07 PM, benjelloun anass@gmail.com wrote: Hello, Because i will not stay working with the actual entreprise. then no one know how to change it manually if they need to add new fields in the futur. so i need to do this with java code, can you please help me with an exemple to complete this: public static void addNewField(Boolean uniqueId,String type, Boolean indexed,Boolean stored,Boolean multivalued,Boolean sortmissinglast,Boolean required){ . . } thanks, Best regards, Anass BENJELLOUN 2014-06-18 18:21 GMT+02:00 Walter Underwood [via Lucene] ml-node+s472066n4142571...@n3.nabble.com: Why can't you change schema.xml? --wunder On Jun 18, 2014, at 8:56 AM, benjelloun [hidden email] http://user/SendEmail.jtp?type=nodenode=4142571i=0 wrote: Hello, this is what i want to do: public static void addNewField(Boolean uniqueId,String type, Boolean indexed,Boolean stored,Boolean multivalued,Boolean sortmissinglast,Boolean required){ . . } any exemple please, thanks, Best regards, Anass BENJELLOUN -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142571.html To unsubscribe from add new Fields with SolrJ without changing schema.xml, click here http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4142515code=YW5hc3MuYm5qQGdtYWlsLmNvbXw0MTQyNTE1fC0xMDQyNjMzMDgx . NAML http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142769.html Sent from the Solr - User mailing list archive at Nabble.com.
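As a concrete illustration of the dynamic-field suggestion, a minimal SolrJ sketch; the URL is an assumption and the *_s / *_i suffixes rely on the stock dynamicField rules shipped in the example schema.xml.

    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class DynamicFieldIndexing {
      public static void main(String[] args) throws Exception {
        // Assumes the stock dynamicField rules (*_s = string, *_i = int, ...) exist in schema.xml.
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        doc.addField("title_s", "A new field without editing schema.xml");  // matches *_s
        doc.addField("pages_i", 42);                                        // matches *_i

        server.add(doc);
        server.commit();
      }
    }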
Store Java object in field and retrieve it in custom function?
Hi, I'm trying to save a Java object in a binary field and afterwards use this value in a custom Solr function. I'm able to put and retrieve the Java object in Base64 via the UI, but I can't seem to retrieve the value in the custom function. In the function I'm using:

termsIndex = FieldCache.DEFAULT.getTermsIndex(reader, fieldName);
termsIndex.get(doc, spare);
Log.debug("Length: " + spare.length);

The length is always 0. It works well if the field type is not binary, but string. Do you have any tips? Thanks, Costi
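One likely reason, with a hedged sketch: FieldCache.getTermsIndex only reads indexed terms, and a binary field's payload is stored rather than indexed, so the terms index comes back empty. Reading the stored value per document instead (slower, but fine for illustration) could look roughly like this on Lucene 4.x; the field name and the assumption that the object was written with ObjectOutputStream are hypothetical.

    import java.io.ByteArrayInputStream;
    import java.io.ObjectInputStream;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.AtomicReader;
    import org.apache.lucene.util.BytesRef;

    public final class StoredObjectReader {
      private StoredObjectReader() {}

      /** Loads and deserializes the Java object stored in a binary field for one document. */
      public static Object readObject(AtomicReader reader, int docId, String fieldName)
          throws Exception {
        Document doc = reader.document(docId);          // loads stored fields for this doc
        BytesRef raw = doc.getBinaryValue(fieldName);   // null if the field is absent
        if (raw == null) {
          return null;
        }
        // Assumption: the value was serialized with ObjectOutputStream before indexing.
        try (ObjectInputStream in = new ObjectInputStream(
            new ByteArrayInputStream(raw.bytes, raw.offset, raw.length))) {
          return in.readObject();
        }
      }
    }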
Re: add new Fields with SolrJ without changing schema.xml
Hello, I will use DynamicField for some fields but some other fields need to be created. this is an exemple: *Informations* *unique ID field* *Type* *indexed* *stored* *multivalued* *sortmissinglast* *required* *Iddocument* True long false True False True True so how to add this field? by default the id is indexed and its type is string any idea how i can do that without changing manually the shema.xml? thanks, Best regards, Anass BENJELLOUN 2014-06-19 9:28 GMT+02:00 Alexandre Rafalovitch [via Lucene] ml-node+s472066n4142771...@n3.nabble.com: Use dynamic fields definitions perhaps? Just suffix the fields with _s, _i, etc. As per schema.xml. You could also use new schemaless mode, but then when they send a value that auto-creates a field of wrong type, it would be really hard to troubleshoot. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Jun 19, 2014 at 2:07 PM, benjelloun [hidden email] http://user/SendEmail.jtp?type=nodenode=4142771i=0 wrote: Hello, Because i will not stay working with the actual entreprise. then no one know how to change it manually if they need to add new fields in the futur. so i need to do this with java code, can you please help me with an exemple to complete this: public static void addNewField(Boolean uniqueId,String type, Boolean indexed,Boolean stored,Boolean multivalued,Boolean sortmissinglast,Boolean required){ . . } thanks, Best regards, Anass BENJELLOUN 2014-06-18 18:21 GMT+02:00 Walter Underwood [via Lucene] [hidden email] http://user/SendEmail.jtp?type=nodenode=4142771i=1: Why can't you change schema.xml? --wunder On Jun 18, 2014, at 8:56 AM, benjelloun [hidden email] http://user/SendEmail.jtp?type=nodenode=4142571i=0 wrote: Hello, this is what i want to do: public static void addNewField(Boolean uniqueId,String type, Boolean indexed,Boolean stored,Boolean multivalued,Boolean sortmissinglast,Boolean required){ . . } any exemple please, thanks, Best regards, Anass BENJELLOUN -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142571.html To unsubscribe from add new Fields with SolrJ without changing schema.xml, click here . NAML http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewerid=instant_html%21nabble%3Aemail.namlbase=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespacebreadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml -- View this message in context: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142769.html Sent from the Solr - User mailing list archive at Nabble.com. -- If you reply to this email, your message will be added to the discussion below: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142771.html To unsubscribe from add new Fields with SolrJ without changing schema.xml, click here http://lucene.472066.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_codenode=4142515code=YW5hc3MuYm5qQGdtYWlsLmNvbXw0MTQyNTE1fC0xMDQyNjMzMDgx . 
--
View this message in context: http://lucene.472066.n3.nabble.com/add-new-Fields-with-SolrJ-without-changing-schema-xml-tp4142515p4142777.html
Sent from the Solr - User mailing list archive at Nabble.com.
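If editing schema.xml by hand is off the table, another option is the Schema REST API, which requires switching the core to a managed schema (ManagedIndexSchemaFactory with mutable="true" in solrconfig.xml). A hedged plain-HTTP sketch; the endpoint shape and JSON keys reflect the Solr 4.x Schema API as I understand it, and the field definition mirrors the Iddocument row above, so verify both against your version before relying on it.

    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    public class AddFieldViaSchemaApi {
      public static void main(String[] args) throws Exception {
        // Assumption: the core uses ManagedIndexSchemaFactory with mutable="true",
        // otherwise the Schema API rejects writes.
        String core = "http://localhost:8983/solr/collection1";
        String fieldName = "Iddocument";
        String json = "{\"type\":\"long\",\"indexed\":false,\"stored\":true,"
                    + "\"multiValued\":false,\"required\":true}";

        URL url = new URL(core + "/schema/fields/" + fieldName);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("PUT");   // PUT /schema/fields/{name} adds a single named field
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
          out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        // Note: the uniqueKey declaration itself still lives in the schema and is not settable here.
        System.out.println("Schema API response code: " + conn.getResponseCode());
      }
    }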
Segment Count of my Index is greater than the Configured MergeFactor
Hi, I am using Solr 4.5.1 and have created an index of 114.8 MB. I have the following index configuration:

<indexConfig>
  <maxIndexingThreads>8</maxIndexingThreads>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <mergeFactor>10</mergeFactor>
</indexConfig>

I have given a ramBufferSizeMB of 100 and a mergeFactor of 10. So this means that after indexing is completed I should see <= 10 segments. That is my assumption, and even the documentation says that. But after the indexing completed, I went into the Solr Dashboard and selected the collection for which indexing had completed. It is showing a segment count of 13. How is this possible? As I have given a mergeFactor of 10, at any point of time there should not be more than 9 segments in the index. I want to understand why 13 segments are created in my index. Could appreciate a response ASAP. Thanks, Radha

--
View this message in context: http://lucene.472066.n3.nabble.com/Segment-Count-of-my-Index-is-greater-than-the-Configured-MergeFactor-tp4142783.html
Sent from the Solr - User mailing list archive at Nabble.com.
making solr to understand English
Hi, I'm trying to set up Solr so that it understands English. For example, I've indexed our company website (www.biginfolabs.com), but it could be any other website or our own data. If I put in English-like queries I should get the one-word answer, just like Google does. Example queries:
* Where is India located?
* Who is the father of Obama?
What I have tried so far:
* Integrated UIMA and Mahout with Solr.
* Read the book Taming Text and implemented https://github.com/tamingtext/book, but did not get what I want.
Can anyone please tell me how to move further? It can be anything; our team is ready to do it. Thanks, Vivek
Tracing Files Which Have Errors
Hi there, I have posted 190,000 simple XML files using post.jar and only 8 of them had errors. But how do I know which ones have errors? Thank you in advance, Simon Cheng.
Re: making solr to understand English
LoL. That's several levels of abstraction and complication above what Solr provides. You are looking at full Natural Language Processing and things like SemEval (http://en.wikipedia.org/wiki/SemEval). Or at least statistical and/or frame-based analysis (http://en.wikipedia.org/wiki/Frame_language). Plus, it's usually domain specific, not just point at the website and run. You may want to start with a PhD (yes, that would be the easy bit). Or you could look for heavy-duty commercial systems. Again, the keywords above would be your friends. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Jun 19, 2014 at 4:42 PM, Vivekanand Ittigi vi...@biginfolabs.com wrote: Hi, I'm trying to setup solr that should understand English. For example I've indexed our company website (www.biginfolabs.com) or it could be any other website or our own data. If i put some English like queries i should get the one word answer just what Google does;queries are: * Where is India located. * who is the father of Obama Workaround: * Integrated UIMA,Mahout with solr * I read the book called Taming Text and implemented https://github.com/tamingtext/book. But Did not get what i want Can anyone please tell how to move further. It can be anything our team is ready to do it. Thanks, Vivek
Re: Tracing Files Which Have Errors
How did you post them? Didn't you get an error at some point? If you are completely stuck, you could probably export the IDs only back (as a CSV format) and compare to the list of what you sent. Quite doable. Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency On Thu, Jun 19, 2014 at 4:33 PM, Simon Cheng simonwhch...@gmail.com wrote: Hi there, I have posted 190,000 simple XML using POST.JAR and there are only 8 files that were with errors. But how do I know which are the ones have errors? Thank you in advance, Simon Cheng.
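A hedged SolrJ sketch of that comparison; it pages with start/rows for brevity, assumes the uniqueKey field is named id, and loadPostedIds() is a hypothetical helper standing in for however you read the ids out of the 190,000 source files.

    import java.util.HashSet;
    import java.util.Set;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class FindMissingIds {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // 1) Collect every id that actually made it into the index.
        Set<String> indexed = new HashSet<String>();
        int rows = 10000;
        for (int start = 0; ; start += rows) {
          SolrQuery q = new SolrQuery("*:*");
          q.setFields("id");
          q.setStart(start);
          q.setRows(rows);
          QueryResponse rsp = server.query(q);
          for (SolrDocument d : rsp.getResults()) {
            indexed.add((String) d.getFieldValue("id"));
          }
          if (start + rows >= rsp.getResults().getNumFound()) {
            break;
          }
        }

        // 2) Compare against the ids you posted (however you load that list).
        Set<String> posted = loadPostedIds();   // hypothetical helper
        posted.removeAll(indexed);
        System.out.println("Ids that never made it into the index: " + posted);
      }

      // Hypothetical: read the ids out of the source XML files you posted.
      private static Set<String> loadPostedIds() {
        return new HashSet<String>();
      }
    }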
deep faceting issues in distributed mode
Hello, We face an issue with deep faceting in a distributed non-SolrCloud setting. A query comes in through the solr frontend (router) and broadcasts to each shard. The exception below appears in the frontend's logs, but shards' logs are clear, each subquery sent by the router succeeds. RAM graph looks quite nice for the router, there is plenty of RAM free and plenty allocated to every shard and the router. So at least I'm not worried on that side. The issue is easily eliminated by shortening the date range parameter = could imply RAM issue, but this theory is not consistent with what we observe on RAM graph. Is this like a core known bug or could the issue be investigated / debugged further? Solr: 4.3.1 jetty 9 Error response: response lst name=responseHeader int name=status500/int int name=QTime211/int lst name=params str name=facettrue/str str name=facet.mincount1/str str name=facet.offset23250/str str name=q*:*/str str name=facet.limit750/str str name=facet.fieldsome_facet_field/str arr name=fq str DateRangeParam:[2014-05-31T21:00:00.000Z TO 2014-06-13T21:00:00.000Z] /str strSomeOtherParam:(Value1 OR *)/str /arr str name=rows0/str /lst /lst lst name=error str name=msg java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format /str str name=trace org.apache.solr.common.SolrException: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:302) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1820) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:656) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:359) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:155) at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1486) at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:503) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:138) at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:564) at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:213) at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1094) at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:432) at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:175) at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1028) at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:136) at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:258) at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:317) at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97) at org.eclipse.jetty.server.Server.handle(Server.java:445) at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:267) at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:224) at org.eclipse.jetty.io.AbstractConnection$ReadCallback.run(AbstractConnection.java:358) at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:601) at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:532) at java.lang.Thread.run(Thread.java:722) Caused by: java.lang.RuntimeException: Invalid version (expected 2, but 60) or the data in not in 'javabin' format at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:109) at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:41) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:385) at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:180) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:156) at org.apache.solr.handler.component.HttpShardHandler$1.call(HttpShardHandler.java:119) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) ... 1 more /str int name=code500/int /lst /response -- Dmitry Kan Blog: http://dmitrykan.blogspot.com Twitter:
Unable to start solr 4.8
Hi experts, i have cnfigured solrcloud, on three machines , zookeeper started with no errors, tomcat log also no errors , solr log alos no errors reported but all the tomcat configured solr clusterstate shows as 'down' ,8870931 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 8870934 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:down, base_url:http://10.***.***.28:7090/solr;, core:collection1, roles:null, node_name:10.***.***.28:7090_solr, shard:shard2, collection:collection1, numShards:2, core_node_name:10.***.***.28:7090_solr_collection1} 8870939 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8870942 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5) 8919667 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (4) 8933777 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (3) 8965906 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (4) 8965994 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8965997 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 8966000 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:down, base_url:http://10.***.***.29:7070/solr;, core:collection1, roles:null, node_name:10.***.***.29:7070_solr, shard:shard1, collection:collection1, numShards:2, core_node_name:110.***.***.29:7070_solr_collection1} 8966006 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8966008 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4) 8986466 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (5) 8986648 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8986652 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 8986654 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:down, base_url:http://10.***.***.30:7080/solr;, core:collection1, roles:null, node_name:10.***.***.30:7080_solr, shard:shard1, collection:collection1, numShards:2, core_node_name:10.***.***.30:7080_solr_collection1} 8986661 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 898 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... 
(live nodes size: 5) 9008407 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (6) when i browse the 28,29 and 30th solr url , its throwing error like, HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init failure: Index locked for write for core collection1,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Index locked for write for core collection1 at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:753) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:347) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:950) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:116) at
Re: Unable to start solr 4.8
Hi - remove the lock file in your solr/collection_name/data/index.*/ directory. Markus On Thursday, June 19, 2014 04:10:51 AM atp wrote: Hi experts, i have cnfigured solrcloud, on three machines , zookeeper started with no errors, tomcat log also no errors , solr log alos no errors reported but all the tomcat configured solr clusterstate shows as 'down' ,8870931 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 8870934 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:down, base_url:http://10.***.***.28:7090/solr;, core:collection1, roles:null, node_name:10.***.***.28:7090_solr, shard:shard2, collection:collection1, numShards:2, core_node_name:10.***.***.28:7090_solr_collection1} 8870939 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8870942 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5) 8919667 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (4) 8933777 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (3) 8965906 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (4) 8965994 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8965997 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 8966000 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:down, base_url:http://10.***.***.29:7070/solr;, core:collection1, roles:null, node_name:10.***.***.29:7070_solr, shard:shard1, collection:collection1, numShards:2, core_node_name:110.***.***.29:7070_solr_collection1} 8966006 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8966008 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 4) 8986466 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (5) 8986648 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 8986652 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 
8986654 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:down, base_url:http://10.***.***.30:7080/solr;, core:collection1, roles:null, node_name:10.***.***.30:7080_solr, shard:shard1, collection:collection1, numShards:2, core_node_name:10.***.***.30:7080_solr_collection1} 8986661 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 898 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 5) 9008407 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â Updating live nodes... (6) when i browse the 28,29 and 30th solr url , its throwing error like, HTTP Status 500 - {msg=SolrCore 'collection1' is not available due to init failure: Index locked for write for core collection1,trace=org.apache.solr.common.SolrException: SolrCore 'collection1' is not available due to init failure: Index locked for write for core collection1 at org.apache.solr.core.CoreContainer.getCore(CoreContainer.java:753) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 347) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 207) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application FilterChain.java:241) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh ain.java:208) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja va:220) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja va:122) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171 ) at
Re: Unable to start solr 4.8
Thank you so much Markus , I have removed the contents from Index , now its working but one of the node went Recovering state, the log says, please help this to make to live. â Unable to get file names for indexCommit generation: 2 12118803 [qtp1490747277-11] INFO org.apache.solr.core.SolrCore â [collection1] webapp=/solr path=/replication params={command=filelistqt=/replicationwt=javabingeneration=2version=2} status=0 QTime=3 12182820 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 12182824 [Thread-13] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 12182827 [qtp1490747277-12] INFO org.apache.solr.handler.admin.CoreAdminHandler â Going to wait for coreNodeName: 10.137.12.247:7080_solr_collection1, state: recovering, checkLive: true, onlyIfLeader: true 12182828 [Thread-13] INFO org.apache.solr.cloud.Overseer â Update state numShards=2 message={ operation:state, state:recovering, base_url:http://10.***.***.29:7080/solr;, core:collection1, roles:null, node_name:10.***.***.29:7080_solr, shard:shard1, collection:collection1, numShards:2, core_node_name:10.***.***.29:7080_solr_collection1} 12182834 [main-EventThread] INFO org.apache.solr.cloud.DistributedQueue â LatchChildWatcher fired on path: /overseer/queue state: SyncConnected type NodeChildrenChanged 12182839 [main-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader â A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 6) 12182853 [qtp1490747277-12] INFO org.apache.solr.common.cloud.ZkStateReader â Updating cloud state from ZooKeeper... 12182856 [qtp1490747277-12] INFO org.apache.solr.handler.admin.CoreAdminHandler â Will wait a max of 183 seconds to see collection1 (shard1 of collection1) have state: recovering 12182856 [qtp1490747277-12] INFO org.apache.solr.handler.admin.CoreAdminHandler â Waited coreNodeName: 10.137.12.247:7080_solr_collection1, state: recovering, checkLive: true, onlyIfLeader: true for: 0 seconds. 12182858 [qtp1490747277-12] INFO org.apache.solr.servlet.SolrDispatchFilter â [admin] webapp=null path=/admin/cores params={coreNodeName=10.137.12.247:7080_solr_collection1onlyIfLeaderActive=truestate=recoveringnodeName=10.137.12.247:7080_solraction=PREPRECOVERYcheckLive=truecore=collection1wt=javabinonlyIfLeader=trueversion=2} status=0 QTime=31 12184865 [qtp1490747277-19] INFO org.apache.solr.update.UpdateHandler â start commit{,optimize=false,openSearcher=false,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} 12184865 [qtp1490747277-19] INFO org.apache.solr.update.UpdateHandler â No uncommitted changes. Skipping IW.commit. 
12184867 [qtp1490747277-19] INFO org.apache.solr.update.UpdateHandler â end_commit_flush 12184867 [qtp1490747277-19] INFO org.apache.solr.update.processor.LogUpdateProcessor â [collection1] webapp=/solr path=/update params={waitSearcher=trueopenSearcher=falsecommit=truewt=javabincommit_end_point=trueversion=2softCommit=false} {commit=} 0 3 12184873 [qtp1490747277-18] INFO org.apache.solr.core.SolrCore â [collection1] webapp=/solr path=/replication params={command=indexversionqt=/replicationwt=javabinversion=2} status=0 QTime=1 12184878 [qtp1490747277-18] ERROR org.apache.solr.handler.ReplicationHandler â Unable to get file names for indexCommit generation: 2 java.io.FileNotFoundException: _0.fnm at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:260) at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:177) at org.apache.solr.handler.ReplicationHandler.getFileList(ReplicationHandler.java:421) at org.apache.solr.handler.ReplicationHandler.handleRequestBody(ReplicationHandler.java:209) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1952) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:774) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418) Thanks , ATP -- View this message in context: http://lucene.472066.n3.nabble.com/Unable-to-start-solr-4-8-tp4142810p4142816.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Unable to start solr 4.8
Hi Markus , It been recoverd automatically after several attempts , once again thanks a lot for your help. Now all the nodes are became live. [zk: Hadoop-Main:7001(CONNECTED) 0] get /clusterstate.json {collection1:{ shards:{ shard1:{ range:8000-, state:active, replicas:{ core_node1:{ state:active, base_url:http://10.***.***.28:8983/solr;, core:collection1, node_name:10.***.***.28:8983_solr, leader:true}, core_node3:{ state:active, base_url:http://10.***.***.30:8983/solr;, core:collection1, node_name:10.***.***.30:8983_solr}, 10.***.***.28:7070_solr_collection1:{ state:active, base_url:http://10.***.***.28:7070/solr;, core:collection1, node_name:10.***.***.28:7070_solr}, 10.***.***.29:7080_solr_collection1:{ state:active, base_url:http://10.***.***.29:7080/solr;, core:collection1, node_name:10.***.***.29:7080_solr}}}, shard2:{ range:0-7fff, state:active, replicas:{ core_node2:{ state:active, base_url:http://10.***.***.29:8983/solr;, core:collection1, node_name:10.***.***.29:8983_solr, leader:true}, 10.***.***.30:7090_solr_collection1:{ state:active, base_url:http://10.***.***.30:7090/solr;, core:collection1, node_name:10.***.***.30:7090_solr, maxShardsPerNode:1, router:{name:compositeId}, replicationFactor:1, autoCreated:true}} Regards, ATP. -- View this message in context: http://lucene.472066.n3.nabble.com/Unable-to-start-solr-4-8-tp4142810p4142822.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Segment Count of my Index is greater than the Configured MergeFactor
On 6/19/2014 2:51 AM, RadhaJayalakshmi wrote: I am using Solr 4.5.1. In that i have created an Index 114.8 MB. Also i have the following index configuration indexConfig maxIndexingThreads8/maxIndexingThreads ramBufferSizeMB100/ramBufferSizeMB mergeFactor10/mergeFactor /indexConfig I have given the ramBufferSizeMB of 100 and mergefactor of 10. So this means, that after indexing is completed. i should see =10 segments. Thats my assumption and even documentation says that. But, after the indexing is completed, i went into Solr Dashboard, and selected the collection, for which indexing is completed. It is showing a Segment count of 13. How is this possible? As i have given mergefactor 0f 10, at any point of time, there should not be more than 9 segments in the index. I want to understand why 13 segments are created in my index?? Could appreciate if i can get response ASAP Imagine the following scenario. You start from a clean index and do enough indexing to create ten little segments. At that point, Solr will merge these segments into one large segment. Let's say that now you do enough indexing to create ten more segments. It won't do the merge when you reach nine little segments and one large segment ... it will do the merge when you have ten little segments. When the merge is done, you'll be left with two large segments. If you do enough indexing now to create twenty new segments, then at the end you'll be left with four large segments. After this, if you index nine new segments, you've got thirteen segments in your index and it won't do any more merging until another segment is created. Additional merge levels exist. When you reach ten large segments, Solr will merge those into one huge segment. If indexing continues long enough to create ten huge segments, they will be merged into one enormous segment. It would be possible to have a stable index with 9 segments at each of the levels that I have mentioned -- 36 segments. The merge policy that Solr uses by default will continue creating additional merge levels until the segments at the highest reach at least five gigabytes in size -- nothing larger will be created unless you optimize the index. My effective merge factor is 35. I have personally witnessed stable indexes on my system with 80 or 90 segments. Thanks, Shawn
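A toy simulation of the behaviour Shawn describes: a simplified log-merge model with mergeFactor=10 that ignores segment sizes and the tiered policy's size limits, purely to show how a count like 13 is perfectly normal. The flush count of 139 is an arbitrary assumption.

    import java.util.ArrayList;
    import java.util.List;

    public class MergeSimulation {
      public static void main(String[] args) {
        int mergeFactor = 10;
        // levels.get(0) counts freshly flushed (small) segments, levels.get(1) the
        // segments produced by merging ten small ones, and so on up the levels.
        List<Integer> levels = new ArrayList<Integer>();
        levels.add(0);

        int flushes = 139;   // pretend indexing flushed 139 small segments in total
        for (int i = 0; i < flushes; i++) {
          levels.set(0, levels.get(0) + 1);
          // cascade merges: ten segments at one level become one at the next level
          for (int lvl = 0; lvl < levels.size(); lvl++) {
            if (levels.get(lvl) == mergeFactor) {
              levels.set(lvl, 0);
              if (lvl + 1 == levels.size()) {
                levels.add(0);
              }
              levels.set(lvl + 1, levels.get(lvl + 1) + 1);
            }
          }
        }

        int total = 0;
        for (int count : levels) {
          total += count;
        }
        // 139 flushes -> 9 small + 3 merged + 1 larger = 13 segments, matching the report
        System.out.println("Segments per level: " + levels + ", total: " + total);
      }
    }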
Re: Calculating filterCache size
Ben: As Shawn says, you're on the right track... Do note, though, that a 10K size here is probably excessive, YMMV of course. And an autowarm count of 5,000 is almost _certainly_ far more than you want. All these fq clauses get re-executed whenever a new searcher is opened (soft commit or hard commit with openSearcher=true). I realize this may just be illustrative. Is this your actual setup? And if so, what is your motivation for a 5,000 autowarm count? Best, Erick

On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey s...@elyograg.org wrote: On 6/18/2014 10:57 AM, Benjamin Wiens wrote: Thanks Erick! So let's say I have a config of

<filterCache class="solr.FastLRUCache" size="10000" initialSize="10000" autowarmCount="5000"/>

and MaxDocuments = 1,000,000. So according to your formula, the filterCache should roughly have the potential to consume this much RAM:

((1,000,000 / 8) + 128) * 10,000 = 1,251,280,000 bytes / 1,000 = 1,251,280 KB / 1,000 = 1,251.28 MB / 1,000 = 1.25 GB

Yes, this is essentially correct. If you want to arrive at a number that's more accurate for the way that OS tools will report memory, you'll divide by 1024 instead of 1000 for each of the larger units. That results in a size of 1.16GB instead of 1.25. Computers think in powers of 2; dividing by 1000 assumes a bias to how people think, in powers of 10. It's the same thing that causes your computer to report 931GB for a 1TB hard drive. Thanks, Shawn
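The same back-of-the-envelope estimate in code form, as a hedged worst case: it assumes every cached entry is a full bitset of maxDoc/8 bytes plus roughly 128 bytes of overhead, whereas real entries can be much smaller when Solr stores small filter results as sorted doc-id lists.

    public class FilterCacheEstimate {
      /** Worst-case filterCache memory: every entry stored as a full bitset. */
      static long worstCaseBytes(long maxDoc, long cacheSize) {
        long perEntry = (maxDoc / 8) + 128;   // bitset bytes plus rough per-entry overhead
        return perEntry * cacheSize;
      }

      public static void main(String[] args) {
        long bytes = worstCaseBytes(1000000L, 10000L);
        System.out.printf("~%d bytes (~%.2f GB)%n",
            bytes, bytes / (1024.0 * 1024.0 * 1024.0));
      }
    }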
Re: Synonyms - 20th and 20
You almost certainly have WordDelimiterFilterFactory in your analysis chain after the synonym insertion. Its _job_ is to split on letter/non-letter transitions. The admin/analysis page is your friend. Best, Erick On Wed, Jun 18, 2014 at 12:47 PM, Diego Fernandez difer...@redhat.com wrote: What tokenizer and filters are you using? Diego Fernandez - 爱国 Software Engineer US GSS Supportability - Diagnostics - Original Message - I have a synonyms.txt file which has 20th,twentieth Once I apply the synonym, I see 20th, twentieth and 20 for 20th. Does anyone know where 20 comes from? How can I have only 20th and twentieth? Thanks, Jae
Re: Bug in Collapsing QParserPlugin : Sort by 3 or more fields is broken
Umesh, this is a good summary. So, the question is what is the cost (performance and memory) of having the CollapsingQParserPlugin choose the group head by using the Solr sort criteria? Keep in mind that the CollapsingQParserPlugin's main design goal is to provide fast performance when collapsing on a high cardinality field. How you choose the group head can have a big impact here, both on memory consumption performance. The function query collapse criteria was added to allow you to come up with custom formulas for selecting the group head, with little or no impact on performance and memory. Using Solr's recip() function query it seems like you could come up with some nice scenarios where two variables could be used to select the group head. For example: fq={!collapse field=a max='sub(prod(cscore(),1000), recip(field(x),1, 1000, 1000))'} This seems like it would basically give you two sort critea: cscore(), which returns the score, would be the primary criteria. The recip of field x would be the secondary criteria. Joel Bernstein Search Engineer at Heliosearch On Thu, Jun 19, 2014 at 2:18 AM, Umesh Prasad umesh.i...@gmail.com wrote: Continuing the discussion on mailing list from Jira. An Example *id group f1 f2*1 g1 5 10 2 g1 5 1000 3 g1 5 1000 4 g1 10 100 5 g2 5 10 6 g2 5 1000 7 g2 5 1000 8 g210 100 sort= f1 asc, f2 desc , id desc *Without collapse will give : * (7,g2), (6,g2), (3,g1), (2,g1), (5,g2), (1,g1), (8,g2), (4,g1) *On collapsing by group_s expected output is : * (7,g2), (3,g1) solr standard collapsing does give this output with group=on,group.field=group_s,group.main=true * Collapsing with CollapsingQParserPlugin* fq={!collapse field=group_s} : (5,g2), (1,g1) * Summarizing Jira Discussion :* 1. CollapsingQParserPlugin picks up the group heads from matching results and passes those further. So in essence filtering some of the matching documents, so that subsequent collectors never see them. It can also pass on score to subsequent collectors using a dummy scorer. 2. TopDocCollector comes later in hierarchy and it will sort on the collapsed set. That works fine. The issue is with step 1. Collapsing is done by a single comparator which can take its value from a field or function. It defaults to score. Function queries do allow us to combine multiple fields / value sources, however it would be difficult to construct a function for given sort fields. Primarily because a) The range of values for a given sort field is not known in advance. It is possible for one sort field to unbounded, but other to be bounded within a small range. b) The sort field can itself hold custom logic. Because of (a) the group head selected by CollapsingQParserPlugin will be incorrect and subsequent sorting will break. On 14 June 2014 12:38, Umesh Prasad umesh.i...@gmail.com wrote: Thanks Joel for the quick response. I have opened a new jira ticket. https://issues.apache.org/jira/browse/SOLR-6168 On 13 June 2014 17:45, Joel Bernstein joels...@gmail.com wrote: Let's open a new ticket. Joel Bernstein Search Engineer at Heliosearch On Fri, Jun 13, 2014 at 8:08 AM, Umesh Prasad umesh.i...@gmail.com wrote: The patch in SOLR-5408 fixes the issue with sorting only for two sort fields. Sorting still breaks when 3 or more sort fields are used. I have attached a test case, which demonstrates the broken behavior when 3 sort fields are used. The failing test case patch is against Lucene/Solr 4.7 revision number 1602388 Can someone apply and verify the bug ? Also, should I re-open SOLR-5408 or open a new ticket ? 
--- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad -- --- Thanks Regards Umesh Prasad
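To make that composite criterion concrete (a hedged reading, assuming field x stands in for the secondary sort field): Solr's recip(x, m, a, b) is defined as a/(m*x + b), so

    recip(field(x), 1, 1000, 1000) = 1000 / (x + 1000)

which always falls in (0, 1000] and shrinks as x grows. For two documents with equal scores, the one with the larger x loses less in the subtraction and wins the max, giving a descending secondary ordering on x. The score stays the primary criterion only while score gaps multiplied by 1000 exceed the possible swing of the recip term, which is exactly the kind of value-range assumption that point (a) in the summary above warns about.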
Query Response in Html
Hi, I am using the XSLTResponseWriter in my application to transform the XML response into HTML. I have set the following params for that purpose:

query.set("wt", "xslt");
query.set("indent", "true");
query.set("tr", "example.xsl");

but the response comes back as normal text. Even when I remove the params, the response is the same as before without any change. I have also tried the VelocityResponseWriter by setting the following params:

query.set("wt", "velocity");
query.set("v.template", "browse");
query.set("v.layout", "layout");

and again I get the same response as normal text. I would like to get an HTML response. Could you please suggest a solution? Thanks, Venkata Krishna Tolusuri.

--
View this message in context: http://lucene.472066.n3.nabble.com/Query-Response-in-Html-tp4142838.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Calculating filterCache size
Thanks to both of you. Yes, the mentioned config is illustrative; we decided on 512 after thorough testing. However, when you google "Solr filterCache" the first link is the community wiki, which has a config even higher than the illustration and quite different from the official reference guide. It might be a good idea to change this unless there's a very small index. http://wiki.apache.org/solr/SolrCaching#filterCache

<filterCache class="solr.LRUCache" size="16384" initialSize="4096" autowarmCount="4096"/>

On Thu, Jun 19, 2014 at 9:48 AM, Erick Erickson erickerick...@gmail.com wrote: Ben: As Shawn says, you're on the right track... Do note, though, that a 10K size here is probably excessive, YMMV of course. And an autowarm count of 5,000 is almost _certainly_ far more than you want. All these fq clauses get re-executed whenever a new searcher is opened (soft commit or hard commit with openSearcher=true). I realize this may just be illustrative. Is this your actual setup? And if so, what is your motivation for a 5,000 autowarm count? Best, Erick

On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey s...@elyograg.org wrote: On 6/18/2014 10:57 AM, Benjamin Wiens wrote: Thanks Erick! So let's say I have a config of <filterCache class="solr.FastLRUCache" size="10000" initialSize="10000" autowarmCount="5000"/> and MaxDocuments = 1,000,000. So according to your formula, the filterCache should roughly have the potential to consume this much RAM: ((1,000,000 / 8) + 128) * 10,000 = 1,251,280,000 bytes / 1,000 = 1,251,280 KB / 1,000 = 1,251.28 MB / 1,000 = 1.25 GB Yes, this is essentially correct. If you want to arrive at a number that's more accurate for the way that OS tools will report memory, you'll divide by 1024 instead of 1000 for each of the larger units. That results in a size of 1.16GB instead of 1.25. Computers think in powers of 2; dividing by 1000 assumes a bias to how people think, in powers of 10. It's the same thing that causes your computer to report 931GB for a 1TB hard drive. Thanks, Shawn
Re: Limit Porter stemmer to plural stemming only?
Hi, Do you mind attaching the plural-only stemmer? I can't find it in this post. Thanks, Jerry -- View this message in context: http://lucene.472066.n3.nabble.com/Limit-Porter-stemmer-to-plural-stemming-only-tp486449p4142867.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Why aren't my nested documents nesting?
Thanks, I tried the block join query via the browser this morning with no success; My URL ( encoded of course). I used this as a guide https://cwiki.apache.org/confluence/display/solr/Other+Parsers http://localhost:8088/solr/test_core/select?q={!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TXwt=jsonindent=true ( equivalent to http://localhost:8088/solr/test_core/select?q=%7b!parent+which%3d%22content_type%3aparentDocument%22%7dATTRIBUTES.STATE%3aTX%26wt%3djson%26indent%3dtrue ) Resulting in response lst name=responseHeader int name=status0/int int name=QTime1/int lst name=params str name=q {!parent which=content_type:parentDocument}ATTRIBUTES.STATE:TXwt=jsonindent=true /str /lst /lst result name=response numFound=0 start=0/ /response On Wed, Jun 18, 2014 at 11:30 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: because you need you query by special query parser http://blog.griddynamics.com/2013/09/solr-block-join-support.html to nest the output you need https://issues.apache.org/jira/browse/SOLR-5285 On Thu, Jun 19, 2014 at 3:20 AM, Vinay B, vybe3...@gmail.com wrote: Probably a silly error. Can someone point out my mistake? Code and output gists at https://gist.github.com/anonymous/fb9cdb5b44e76b2c308d Thanks Code: SolrInputDocument solrDoc = new SolrInputDocument(); solrDoc.addField(id, documentId); solrDoc.addField(content_type, parentDocument); solrDoc.addField(Constants.REMOTE_FILE_PATH, filePath == null ? : filePath); solrDoc.addField(Constants.REMOTE_FILE_LOAD, Constants.TRUE); SolrInputDocument childDoc = new SolrInputDocument(); childDoc.addField(Constants.ID, documentId+-A); childDoc.addField(ATTRIBUTES.STATE, LA); childDoc.addField(ATTRIBUTES.STATE, TX); solrDoc.addChildDocument(childDoc); solrServer.add(solrDoc); solrServer.commit(); -- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
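For reference, a minimal SolrJ sketch of indexing one parent block and issuing the same block-join query programmatically, with the which clause quoted as a literal string; the URL, core and field names are taken from the code quoted above, and this is only an illustrative sketch rather than a diagnosis of the zero-hit result.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class BlockJoinSketch {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8088/solr/test_core");

        // Index a parent with one nested child (same shape as the code quoted above).
        SolrInputDocument parent = new SolrInputDocument();
        parent.addField("id", "doc-1");
        parent.addField("content_type", "parentDocument");

        SolrInputDocument child = new SolrInputDocument();
        child.addField("id", "doc-1-A");
        child.addField("ATTRIBUTES.STATE", "LA");
        child.addField("ATTRIBUTES.STATE", "TX");
        parent.addChildDocument(child);

        server.add(parent);
        server.commit();

        // Ask for parents whose children match ATTRIBUTES.STATE:TX.
        SolrQuery q = new SolrQuery(
            "{!parent which=\"content_type:parentDocument\"}ATTRIBUTES.STATE:TX");
        QueryResponse rsp = server.query(q);
        System.out.println("Parents found: " + rsp.getResults().getNumFound());
      }
    }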
clarification on index-to-ram ratio
Hello All, The documentation and general feedback on the mailing list suggest the following: *... Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB ...* http://wiki.apache.org/solr/SolrPerformanceProblems#General_information So, when we say index size does it include ALL the replicas or just one of the replica? Say for example, if the solr instance had 2 replicas each of size 8GB, should we consider 16GB as our index size or just 8GB - for the above index-ram-ratio consideration? Thanks Vinay
Re: Segment Count of my Index is greater than the Configured MergeFactor
: I want to understand why 13 segments are created in my index?? : Could appreciate if i can get response ASAP : Imagine the following scenario. You start from a clean index and do FWIW: the TL;DR of Shawn's response can be seen in this animation of how Log based MergePolicy's work in the simplest scenerios... https://www.youtube.com/watch?v=YW0bOvLp72E More animations of other scnerioes and other MergePolicies can be found here... http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html -Hoss http://www.lucidworks.com/
RE: clarification on index-to-ram ratio
Vinay Pothnis [poth...@gmail.com] wrote: *... Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB ...* So, when we say index size does it include ALL the replicas or just one of the replica? Say for example, if the solr instance had 2 replicas each of size 8GB, should we consider 16GB as our index size or just 8GB - for the above index-ram-ratio consideration? 16GB, according to the above principle. Enough RAM to hold all index data on storage. Two things though, 1) If you have replicas of the same data on the same machine, I hope that you have them on separate physical drives. If not, it is just wasted disk cache with no benefits. 2) The general advice is only really usable when we're either talking fairly small indexes on spinning drives or there is a strong need for the absolute lowest latency possible. As soon as we scale up and do not have copious amounts of money, solid state drives provides much better bang for the buck than a spinning drives + RAM combination. - Toke Eskildsen
Re: clarification on index-to-ram ratio
Thanks! And yes, the replica belongs to a different shard - not the same data. -Vinay On 19 June 2014 11:21, Toke Eskildsen t...@statsbiblioteket.dk wrote: Vinay Pothnis [poth...@gmail.com] wrote: *... Let's say that you have a Solr index size of 8GB. If your OS, Solr's Java heap, and all other running programs require 4GB of memory, then an ideal memory size for that server is at least 12GB ...* So, when we say index size does it include ALL the replicas or just one of the replica? Say for example, if the solr instance had 2 replicas each of size 8GB, should we consider 16GB as our index size or just 8GB - for the above index-ram-ratio consideration? 16GB, according to the above principle. Enough RAM to hold all index data on storage. Two things though, 1) If you have replicas of the same data on the same machine, I hope that you have them on separate physical drives. If not, it is just wasted disk cache with no benefits. 2) The general advice is only really usable when we're either talking fairly small indexes on spinning drives or there is a strong need for the absolute lowest latency possible. As soon as we scale up and do not have copious amounts of money, solid state drives provides much better bang for the buck than a spinning drives + RAM combination. - Toke Eskildsen
Re: Query Response in Html
Show us the complete code you’re using. Is the “text” not HTML text? What are you receiving exactly and what you expecting instead? To use SolrJ with other types of responses (non-XML/javabin) you’ll need to configure a ResponseParser. The NoOpResponseParser may do the trick, where you get back the text (though it would be HTML text) in the “response” key of the NamedList returned. Erik On Jun 19, 2014, at 10:27 AM, Venkata krishna venkat1...@gmail.com wrote: Hi, I am using XSLResponseWriter on my application for to transform xml response into html.The following params i have set for that purpose. query.set(wt, xslt); query.set(indent,true); query.set(tr, example.xsl); but the response is coming as normal text.Even though i remove the params the response is coming same as previous with out any change. I have also tried with velocity Response writer also by setting the following params. query.set(wt, velocity); query.set(v.template,browse); query.set(v.layout, layout); then also i am getting same response as normal text. I would like to get html response. So could you please provide any solution. Thanks, Venkata Krishna Tolusuri. -- View this message in context: http://lucene.472066.n3.nabble.com/Query-Response-in-Html-tp4142838.html Sent from the Solr - User mailing list archive at Nabble.com.
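A hedged SolrJ 4.x sketch along the lines Erik describes; the stylesheet name is just the example one, and the assumption that NoOpResponseParser's constructor takes the writer type (so SolrJ requests wt=xslt instead of its default) should be verified against your SolrJ version.

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrServer;
    import org.apache.solr.client.solrj.impl.NoOpResponseParser;
    import org.apache.solr.client.solrj.request.QueryRequest;
    import org.apache.solr.common.util.NamedList;

    public class HtmlResponseExample {
      public static void main(String[] args) throws Exception {
        HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("*:*");
        query.set("tr", "example.xsl");   // stylesheet under conf/xslt/

        QueryRequest request = new QueryRequest(query);
        // Keep SolrJ from parsing the body (and from forcing its own wt): return it raw.
        // Assumption: this constructor sets the writer type used for the request to "xslt".
        request.setResponseParser(new NoOpResponseParser("xslt"));

        NamedList<Object> result = server.request(request);
        String html = (String) result.get("response");   // the transformed output as a string
        System.out.println(html);
      }
    }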
Re: Calculating filterCache size
That's specific to using the facet.method=enum, but do admit it's easy to miss that. I added a note about that though... Thanks for pointing that out! On Thu, Jun 19, 2014 at 9:38 AM, Benjamin Wiens benjamin.wi...@gmail.com wrote: Thanks to both of you. Yes the mentioned config is illustrative, we decided for 512 after thorough testing. However, when you google Solr filterCache the first link is the community wiki which has a config even higher than the illustration which is quite different from the official reference guide. It might be a good idea to change this unless there's a very small index. http://wiki.apache.org/solr/SolrCaching#filterCache filterCache class=solr.LRUCache size=16384 initialSize=4096 autowarmCount=4096/ On Thu, Jun 19, 2014 at 9:48 AM, Erick Erickson erickerick...@gmail.com wrote: Ben: As Shawn says, you're on the right track... Do note, though, that a 10K size here is probably excessive, YMMV of course. And an autowarm count of 5,000 is almost _certainly_ far more than you want. All these fq clauses get re-executed whenever a new searcher is opened (soft commit or hard commit with openSearcher=true). I realize this may just be illustrative. Is this your actual setup? And if so, what is your motivation for 5,000 autowarm count? Best, Erick On Wed, Jun 18, 2014 at 11:42 AM, Shawn Heisey s...@elyograg.org wrote: On 6/18/2014 10:57 AM, Benjamin Wiens wrote: Thanks Erick! So let's say I have a config of filterCache class=solr.FastLRUCache size=1 initialSize=1 autowarmCount=5000/ MaxDocuments = 1,000,000 So according to your formula, filterCache should roughly have the potential to consume this much RAM: ((1,000,000 / 8) + 128) * (10,000) = 1,251,280,000 byte / 1,000 = 1,251,280 kb / 1,000 = 1,251.28 mb / 1000 = 1.25 gb Yes, this is essentially correct. If you want to arrive at a number that's more accurate for the way that OS tools will report memory, you'll divide by 1024 instead of 1000 for each of the larger units. That results in a size of 1.16GB instead of 1.25. Computers think in powers of 2, dividing by 1000 assumes a bias to how people think, in powers of 10. It's the same thing that causes your computer to report 931GB for a 1TB hard drive. Thanks, Shawn
[ANN] Heliosearch 0.06 released, native code faceting
FYI, for those who want to try out the new native code faceting, this is the first release containing it (for single valued string fields only as of yet). http://heliosearch.org/download/ Heliosearch v0.06 Features: o Heliosearch v0.06 is based on (and contains all features of) Lucene/Solr 4.9.0 o Native code faceting for single valued string fields. - Written in C++, statically compiled with gcc for Windows, Mac OS-X, Linux - static compilation avoids JVM hotspot warmup period, mis-compilation bugs, and variations between runs - Improves performance over 2x o Top level Off-heap fieldcache for single valued string fields in nCache. - Improves sorting and faceting speed - Reduces garbage collection overhead - Eliminates FieldCache “insanity” that exists in Apache Solr from faceting and sorting on the same field o Full request Parameter substitution / macro expansion, including default value support. o frange query now only returns documents with a value. For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will also return documents without a value since the numeric default value of 0 lies within the range requested. o New JSON features via Noggit upgrade, allowing optional comments (C/C++ and shell style), unquoted keys, and relaxed escaping that allows one to backslash escape any character. -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
Re: Cursor deep paging new behavior
if by old behavior you mean incrementing the start param, then the order of results when doing concurrent indexing was always dependent on what exactly your sort was. when using a cursor, the impacts of concurrent indexing are also dependent on what your sort clause looks like -- but in different ways. both situations are extensively documented... https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results#PaginationofResults-HowBasicPaginationisAffectedbyIndexUpdates https://cwiki.apache.org/confluence/display/solr/Pagination+of+Results#PaginationofResults-HowcursorsareAffectedbyIndexUpdates : I have a quick question about this new implementation - in the old : implementation AFAIK, in a real-time indexing scenario, the results : gathered from paging would not be consecutive. Meaning you would ask for : 50 docs, new docs arrive, when you ask for the next 50 docs - you get an : arbitrary new document set (50 after any newly inserted docs). : : Having read a bit about the cursor implementation, is it true that the : next 50 results are now consecutive to the first set due to the fact : that lucene actually tracks the mark? -Hoss http://www.lucidworks.com/
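For anyone who wants to see it in code, a bare-bones SolrJ cursor loop along the lines of those docs -- the URL and the sort field are placeholders; the one hard requirement is that the sort ends on the uniqueKey field:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.CursorMarkParams;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
SolrQuery q = new SolrQuery("*:*");
q.setRows(50);
q.setSort(SolrQuery.SortClause.asc("id"));  // must include the uniqueKey as the final tie-breaker

String cursorMark = CursorMarkParams.CURSOR_MARK_START;  // "*"
boolean done = false;
while (!done) {
  q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
  QueryResponse rsp = server.query(q);
  // ... process rsp.getResults() ...
  String nextCursorMark = rsp.getNextCursorMark();
  done = cursorMark.equals(nextCursorMark);  // unchanged mark means the result set is exhausted
  cursorMark = nextCursorMark;
}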
Re: Limit Porter stemmer to plural stemming only?
: Can you please share the Java code for Plural Only Porter Stemmer for English if you don't mind? The Porter stemmer algorithm, by definition, does more than just strip plurals. If you are interested in a lighter-weight stemmer for English, this is exactly what the EnglishMinimalStemFilterFactory is for... https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/en/EnglishMinimalStemFilterFactory.html Although you may also be interested in combining it with the EnglishPossessiveFilterFactory... https://lucene.apache.org/core/4_8_0/analyzers-common/org/apache/lucene/analysis/en/EnglishPossessiveFilterFactory.html -Hoss http://www.lucidworks.com/
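If it helps to see the same chain at the Lucene level rather than in schema.xml, here is a rough sketch of an analyzer wiring those two filters together (untested; the tokenizer choice, the lowercasing step, and the Version constant are my additions):

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.en.EnglishMinimalStemFilter;
import org.apache.lucene.analysis.en.EnglishPossessiveFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

Analyzer lightEnglish = new Analyzer() {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    Tokenizer source = new StandardTokenizer(Version.LUCENE_48, reader);
    TokenStream stream = new EnglishPossessiveFilter(Version.LUCENE_48, source); // strips trailing 's
    stream = new LowerCaseFilter(Version.LUCENE_48, stream);
    stream = new EnglishMinimalStemFilter(stream);  // only removes plural endings
    return new TokenStreamComponents(source, stream);
  }
};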
Re: Multivalue wild card search
Ahmet, Assuming there is a multiValued field called Name, of type string, stored in the index - //Doc 1 id : 23512 HotelId : [ 12, 23, 12 ] Name : [ [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]], [], [[\"hifte\", \"Grop\", \"\"]] ] // Doc 2 id : 23513 HotelId : [ 12, 12 ] Name : [ [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"\", \"\"]], [], ] Here, how do I find the document whose Name contains Steve Wonder? I tried q=*[\"Steve\", \"Wonder\", \"\"]] but that doesn't work. On Fri, Jun 6, 2014 at 11:10 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Ethan, It is hard to understand your example. Can you re-write it? Using xml? On Friday, June 6, 2014 9:07 PM, Ethan eh198...@gmail.com wrote: Bumping the thread to see if anyone has a solution. On Thu, Jun 5, 2014 at 9:52 AM, Ethan eh198...@gmail.com wrote: Wildcard searches do work on a multiValued field. I was able to pull up records for the following multiValued field - Code : [ 12344, 4534, 674 ] q=Code:45* fetched the correct document. It doesn't work inside quotes (q=Code:"45*"), however. Is there a workaround? On Thu, Jun 5, 2014 at 9:34 AM, Ethan eh198...@gmail.com wrote: Are you implying there is no way to look up a multiValued field with a substring? If so, then how is it usually handled? On Wed, Jun 4, 2014 at 4:44 PM, Jack Krupansky j...@basetechnology.com wrote: Wildcard, fuzzy, and regex query operate on a single term of a single tokenized field value or a single string field value. -- Jack Krupansky -Original Message- From: Ethan Sent: Wednesday, June 4, 2014 6:59 PM To: solr-user Subject: Multivalue wild card search I can't seem to find a solution to do a wild card search on a multiValued field. For example, consider a multiValued field called Name with 3 values - Name : [ [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]], [], [[\"hifte\", \"Grop\", \"\"]] ] For a multiValued field like the above, I want a search like - q=*[\"Steve\", \"Wonder\", \"\"] But I do not get any results back. Any ideas on how to create such a query?
Indexing a term into separate Lucene indexes
If I have documents with a person and his email address: u...@domain.com How can I configure Solr (4.6) so that the email address source field is indexed as - the user part of the address (e.g., user) is in Lucene index X - the domain part of the address (e.g., domain.com) is in a separate Lucene index Y I would like to be able to search as follows: - Find all people whose email addresses have user part = userXyz - Find all people whose email addresses have domain part = domainABC.com - Find the person with exact email address = user...@domainabc.com Would I use a copyField declaration in my schema? http://wiki.apache.org/solr/SchemaXml#Copy_Fields Thanks!
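copyField copies the whole source value as-is, so by itself it won't split the address; the splitting has to happen either in the destination field's analyzer or on the client. One hedged sketch of the client-side option (the field names email, email_user and email_domain are invented and would need to exist in your schema): split the address with SolrJ and index each piece into its own field, keeping the full address in a string field for the exact-match case:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

String email = "someone@example.com";  // hypothetical address
int at = email.indexOf('@');

SolrInputDocument doc = new SolrInputDocument();
doc.addField("id", "person-1");                          // invented unique key
doc.addField("email", email);                            // string field, for exact matches
doc.addField("email_user", email.substring(0, at));      // the "user" part
doc.addField("email_domain", email.substring(at + 1));   // the "domain" part
server.add(doc);
server.commit();

A schema-only route is also possible -- copyField the address into two extra fields whose analyzers keep only the user or the domain part (a pattern-based tokenizer or char filter can do that) -- but the client-side split above is the easier thing to reason about.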
Fwd: Tracing Files Which Have Errors
Hi there, I have posted 190,000 simple XML files using post.jar and only 8 of them had errors. But how do I know which ones have errors? Thank you in advance, Simon Cheng.
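post.jar prints the name of each file as it POSTs it, so its console output together with the Solr log should point at the eight offenders. If that output is gone, one way to pin them down is to re-post the files one at a time with SolrJ and log whichever ones fail -- a rough sketch (directory and URL are placeholders):

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder
for (File f : new File("/data/xml").listFiles()) {          // placeholder directory
  try {
    ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update");
    req.addFile(f, "application/xml");
    server.request(req);
  } catch (Exception e) {
    // the exception message normally carries Solr's error for that document
    System.err.println("FAILED: " + f.getName() + " -> " + e.getMessage());
  }
}
server.commit();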
running Post jar from different server
Hi, I have a situation where my SQL job initiates a console application, which calls post.jar to upload data to Solr. The SQL DB and Solr are on two different servers. I am calling post.jar from my SQL DB server, where the path is mapped to a network drive, and I am getting a "file not found" error. Is the above scenario possible? If anyone has experience with this, please share; any direction will be really appreciated. Thanks Ravi
Re: running Post jar from different server
Ravi, post.jar is a standalone utility that does not have to be on the same server. If you can share the command you are executing, there might be some pointers in there. Thanks, -- *Sameer Maggon* http://measuredsearch.com On Thu, Jun 19, 2014 at 8:54 PM, EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com wrote: Hi, I have situation where my SQL Job initiate a console application , where I am calling the post.jar to upload data to SOLR. Both SQL DB and SOLR are 2 different servers. I am calling post.jar from my SQLDB where the path is mapped to a network drive. I am getting an error file not found. Is the above scenario is possible, if anyone has some experience on this can you share or any direction will be really appreciated. Thanks Ravi
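For example, something along these lines (host name and file path are placeholders) tells post.jar to send the file to a remote Solr via its url system property:

java -Durl=http://solrhost:8983/solr/collection1/update -jar post.jar C:\data\products.xml

If the "file not found" refers to the data file rather than post.jar itself, it may be that the mapped network drive isn't visible to the account the SQL job runs under; a full UNC path to the file is often the safer bet.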
[ANN][Meta] Apache Solr popularizers LinkedIn group
Hello, ( TL;DR: http://www.linkedin.com/groups?gid=6713853 ) Based on - short :-] - Twitter discussion, we have decided to have a go at some sort of a group for Solr popularizers: people who are teaching Solr, running meetups, building Solr examples, and writing Solr books. Basically, anybody whose goal is not just to learn Solr for themselves but to also spread the sunny goodness message to others. The current attempt at this conversation space is a private LinkedIn group. Which obviously has some advantages and disadvantages. I thought about Google and Yahoo groups, but feel they have even more disadvantages. The group is private (for now) to see if it will foster more frank discussion and maybe things like early slide sharing and work-in-progress. So, if you are popularizing Solr and feel you could benefit from a community of other people struggling with the same meta-level explanation issues, this is for you. Come and help us build that community: http://www.linkedin.com/groups?gid=6713853 Regards, Alex. Personal website: http://www.outerthoughts.com/ Current project: http://www.solr-start.com/ - Accelerating your Solr proficiency
Re: Multivalue wild card search
1. Wildcards do not work within quoted terms. 2. Spaces in terms need to be escaped. 3. The quotes embedded in a term do not need to be escaped. So, try: q=*["Steve",\ "Wonder",\ ""]] or q=*["Steve",\ "Wonder",\ ""]* -- Jack Krupansky -Original Message- From: Ethan Sent: Thursday, June 19, 2014 5:16 PM To: solr-user ; Ahmet Arslan Subject: Re: Multivalue wild card search Ahmet, Assuming there is a multiValued field called Name, of type string, stored in the index - //Doc 1 id : 23512 HotelId : [ 12, 23, 12 ] Name : [ [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]], [], [[\"hifte\", \"Grop\", \"\"]] ] // Doc 2 id : 23513 HotelId : [ 12, 12 ] Name : [ [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"\", \"\"]], [], ] Here, how do I find the document whose Name contains Steve Wonder? I tried q=*[\"Steve\", \"Wonder\", \"\"]] but that doesn't work. On Fri, Jun 6, 2014 at 11:10 AM, Ahmet Arslan iori...@yahoo.com.invalid wrote: Hi Ethan, It is hard to understand your example. Can you re-write it? Using xml? On Friday, June 6, 2014 9:07 PM, Ethan eh198...@gmail.com wrote: Bumping the thread to see if anyone has a solution. On Thu, Jun 5, 2014 at 9:52 AM, Ethan eh198...@gmail.com wrote: Wildcard searches do work on a multiValued field. I was able to pull up records for the following multiValued field - Code : [ 12344, 4534, 674 ] q=Code:45* fetched the correct document. It doesn't work inside quotes (q=Code:"45*"), however. Is there a workaround? On Thu, Jun 5, 2014 at 9:34 AM, Ethan eh198...@gmail.com wrote: Are you implying there is no way to look up a multiValued field with a substring? If so, then how is it usually handled? On Wed, Jun 4, 2014 at 4:44 PM, Jack Krupansky j...@basetechnology.com wrote: Wildcard, fuzzy, and regex query operate on a single term of a single tokenized field value or a single string field value. -- Jack Krupansky -Original Message- From: Ethan Sent: Wednesday, June 4, 2014 6:59 PM To: solr-user Subject: Multivalue wild card search I can't seem to find a solution to do a wild card search on a multiValued field. For example, consider a multiValued field called Name with 3 values - Name : [ [[\"Ethan\", \"G\", \"\"],[\"Steve\", \"Wonder\", \"\"]], [], [[\"hifte\", \"Grop\", \"\"]] ] For a multiValued field like the above, I want a search like - q=*[\"Steve\", \"Wonder\", \"\"] But I do not get any results back. Any ideas on how to create such a query?
Re: [ANN] Heliosearch 0.06 released, native code faceting
Congrats! Any idea when will native faceting off-heap fieldcache be available for multivalued fields? Most of my fields are multivalued so that's the big one for me. Andy On Thursday, June 19, 2014 3:46 PM, Yonik Seeley yo...@heliosearch.com wrote: FYI, for those who want to try out the new native code faceting, this is the first release containing it (for single valued string fields only as of yet). http://heliosearch.org/download/ Heliosearch v0.06 Features: o Heliosearch v0.06 is based on (and contains all features of) Lucene/Solr 4.9.0 o Native code faceting for single valued string fields. - Written in C++, statically compiled with gcc for Windows, Mac OS-X, Linux - static compilation avoids JVM hotspot warmup period, mis-compilation bugs, and variations between runs - Improves performance over 2x o Top level Off-heap fieldcache for single valued string fields in nCache. - Improves sorting and faceting speed - Reduces garbage collection overhead - Eliminates FieldCache “insanity” that exists in Apache Solr from faceting and sorting on the same field o Full request Parameter substitution / macro expansion, including default value support. o frange query now only returns documents with a value. For example, in Apache Solr, {!frange l=-1 u=1 v=myfield} will also return documents without a value since the numeric default value of 0 lies within the range requested. o New JSON features via Noggit upgrade, allowing optional comments (C/C++ and shell style), unquoted keys, and relaxed escaping that allows one to backslash escape any character. -Yonik http://heliosearch.org - native code faceting, facet functions, sub-facets, off-heap data
Re: Segment Count of my Index is greater than the Configured MergeFactor
Thanks Shawn and thanks Chris!! Shawn, your explanation was very clear and clarified my doubts. Chris, the video was also very useful. -- View this message in context: http://lucene.472066.n3.nabble.com/Segment-Count-of-my-Index-is-greater-than-the-Configured-MergeFactor-tp4142783p4142987.html Sent from the Solr - User mailing list archive at Nabble.com.
Solr index pdf/word document with attachements
Hi, How can I index Word/PDF documents that have attachments in Solr? I have tried indexing a simple file with an attachment using Tika, but it does not index the attachment separately. Only the original document is getting indexed. Thanks, Prasi
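For reference, sending a file through the ExtractingRequestHandler from SolrJ typically looks roughly like this (the handler path and field mapping assume the stock example solrconfig; the id is invented). As far as I know the extracting handler flattens everything Tika parses -- including embedded attachments -- into a single document, which matches what you're seeing; getting attachments indexed as separate documents means pulling them out client-side before posting.

import java.io.File;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1"); // placeholder

ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
req.addFile(new File("report-with-attachment.pdf"), "application/pdf");
req.setParam("literal.id", "doc-1");    // invented unique key
req.setParam("fmap.content", "text");   // map Tika's extracted body into the "text" field
req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(req);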