Re: Facet pivot and distributed search
Thx!

Geert Van Huychem
IT Consultant
iFrameWorx BVBA
Mobile: +32 497 27 69 03
E-mail: ge...@iframeworx.be
Site: http://www.iframeworx.be
LinkedIn: http://www.linkedin.com/in/geertvanhuychem

On Fri, Feb 7, 2014 at 8:55 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
> Yes this is a open issue.
> https://issues.apache.org/jira/browse/SOLR-2894
>
> On Fri, Feb 7, 2014 at 1:13 PM, Geert Van Huychem ge...@iframeworx.be wrote:
>> Hi
>> I'm using Solr 4.5 in a multi-core environment. I've setup
>> - one core per documenttype: text, rss, tweet and external documents.
>> - one distrib core which basically distributes the query to the 4 cores mentioned hereabove.
>> Facet pivot works on each core individually, but when I send the exact same query to the distrib core, I get no results.
>> Anyone? Bug? Open issue?
>> Best
>> Geert Van Huychem
>
> --
> Regards,
> Shalin Shekhar Mangar.
Re: Tf-Idf for a specific query
Hello Dave,

You can get the DF from http://wiki.apache.org/solr/TermsComponent (invert it yourself). Then, for a given term, you can get the number of occurrences per document with http://wiki.apache.org/solr/FunctionQuery#tf

On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote:
> Hi Guys..
> I require to obtain Tf-idf score from Solr for a certain set of documents. But the catch is that, I needs the IDF (or DF) to be calculated on the documents returned by the specific query and not the entire corpus. Please provide me some hint on whether Solr has this feature or if I can use the Lucene Api directly to achieve this.
> Thanks in advance,
> Dave

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: Intercept updates and cascade loading of Index.
Thanks for the insights. This helps indeed; however, I'm not sure how to get the delta on commit. I guess I need to run a custom query to find what has changed since the last update, or something like that. I will experiment with that; if anyone already does this, please share.

-- View this message in context: http://lucene.472066.n3.nabble.com/Intercept-updates-and-cascade-loading-of-Index-tp4115833p4116010.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: MoreLikeThis
Hi iorixxx,

Sorry for the delay in replying. The code is below:

public void performSearch(DSpaceObject dso) throws SearchServiceException {
    if (queryResults != null) {
        return;
    }
    this.queryArgs = prepareDefaultFilters(getView());
    this.queryArgs.setRows(1);
    this.queryArgs.add("fl", "dc.contributor.author,handle");
    this.queryArgs.add("mlt", "true");
    this.queryArgs.add("mlt.fl", "dc.contributor.author,handle");
    this.queryArgs.add("mlt.mindf", "1");
    this.queryArgs.add("mlt.mintf", "1");
    this.queryArgs.setQuery("handle:" + dso.getHandle());
    this.queryArgs.setRows(1);
    queryResults = getSearchService().search(queryArgs);
}

I use dc.contributor.author for similarity (mlt.fl). I don't know which parameter is mlt.qf; the code above is the default, and I only changed the dc.contributor.author part.

Thanks

-- View this message in context: http://lucene.472066.n3.nabble.com/MoreLikeThis-tp4114605p4116022.html
SolrCloud: problems with delete
Hi all!

Does SolrCloud delete documents correctly? When I send many POST requests, each with a small number of ids, some documents are left in the index instead of being deleted.

Thanks.

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-problems-with-delete-tp4116026.html
Re: solr joins
> Basically, i am trying to understand where and how solr joins differ from lucene joins. Any pointers, much appreciated ?

Hello Anand,

I'm keen on index-time joins (aka block joins), so I've never looked into query-time ones. I didn't even know that there are two different query-time joins. This divergence might be caused by the fabulous drama: segmented vs top-level.

OK. It seems that Solr's query-time join never scores:
https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/java/org/apache/solr/search/JoinQParserPlugin.java#L535
But Lucene's JoinUtil does:
https://github.com/apache/lucene-solr/blob/trunk/lucene/join/src/java/org/apache/lucene/search/join/JoinUtil.java?source=cc#L78
Also, Solr's join can join across different cores.

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Re: solr joins
Thanks Mikhail. I'm curious why scoring was left out of Solr, and whether there is any plan to port it. Also, could you please elaborate on segmented vs top-level?

Thanks,
Anand
Re: solr joins
On Fri, Feb 7, 2014 at 3:32 PM, anand chandak anand.chan...@oracle.com wrote:

> Thanks Mikhail, curious why was scoring left out of solr ?

Have no idea. I've never been involved in query-time join.

> And if there's any plan to port it ?

I suppose there is no plan until someone raises a Jira issue.

> Also, if you can please elaborate on the segmented vs toplevel

http://vimeo.com/44113003 "Is Your Index Reader Really Atomic or Maybe Slow?" by Uwe Schindler

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
Solr Composite Unique key from existing fields in schema
Hi,

I am developing a search application using Solr. I don't have a primary key in any table; a composite key is used in my application. How do I implement a composite key as the unique key in this case? Please help, I am struggling.

--
Thanks & Regards
Anurag Verma
Arise! Awake! And stop not till the goal is reached!
Re: SolrCloud: problems with delete
Hi;

What is your commit policy? Check whether this works: solr/update?commit=true

If it works, could you post your autocommit configuration?

Thanks;
Furkan KAMACI
Re: SolrCloud: problems with delete
Autocommit and /update/?commit=true work fine. To give an example: I send 807632 docs to be indexed on my 3-shard cluster and everything is fine, but when I try to remove them using POST requests with a small number of ids, let's say 100 per request, some docs are still in the index although they should not be. When I send POST requests with around 1K or 2K ids each, all docs are deleted from the index.

-- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-problems-with-delete-tp4116026p4116038.html
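The batching described above can be reproduced with a small script. The following is only a sketch, not a diagnosis of the reported behaviour: `delete_payloads` is a hypothetical helper, the document ids are made up, and it only builds the XML delete bodies (the standard `<delete><id>...</id></delete>` update message) without sending them anywhere.

```python
from xml.sax.saxutils import escape


def delete_payloads(ids, batch_size=100):
    """Yield one <delete> XML body per batch of document ids."""
    for start in range(0, len(ids), batch_size):
        batch = ids[start:start + batch_size]
        # Escape each id so characters such as & or < stay valid XML.
        body = "".join("<id>%s</id>" % escape(str(doc_id)) for doc_id in batch)
        yield "<delete>" + body + "</delete>"


# Hypothetical ids; a real test would use the ids that were indexed.
ids = ["doc-%d" % i for i in range(250)]
payloads = list(delete_payloads(ids, batch_size=100))
```

Each payload would then be POSTed to the update handler, followed by a commit, and the same id list can be replayed with `batch_size=1000` to compare the two cases reported in this thread.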
Re: Need help for integrating solr-4.5.1 with UIMA
Hi,

I tried almost all combinations in solrconfig.xml for using UIMA with Solr, but each time I index data into Solr I get this exception:

113701 [http-bio-8080-exec-1] ERROR org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: org.apache.uima.resource.ResourceInitializationException
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)
    at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:60)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:953)
    at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
    at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:408)
    at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1008)
    at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
    at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.uima.resource.ResourceInitializationException
    at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:58)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:61)
    ... 22 more
Caused by: java.lang.NullPointerException
    at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)
    at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getInputSource(BasicAEProvider.java:84)
    at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:50)
    ... 23 more
113873 [http-bio-8080-exec-1] INFO org.apache.solr.core.SolrCore - [collection1] webapp=/solr path=/update params={version=2.2} status=500 QTime=203
113888 [http-bio-8080-exec-1] ERROR org.apache.solr.servlet.SolrDispatchFilter - null:org.apache.solr.common.SolrException: org.apache.uima.resource.ResourceInitializationException
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)
    at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:60)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
    at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
    at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
    at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:222)
    at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
    at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:171)
    at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:99)
    at ...
Re: Facet pivot and distributed search
FYI, the last distributed pivot facet patch functionally works, but there are some sub-optimal data structures being used and some unnecessary duplicate processing of values. As a result, we found that for certain worst-case scenarios (i.e. data is not randomly distributed across Solr cores and requires significant refinement) pivot facets with multiple levels could take over a minute to aggregate and process results. This was using a dataset of several hundred million documents and dozens of pivot facets across 120 Solr cores distributed over 20 servers, so it is a more extreme use-case than most will encounter.

Nevertheless, we've refactored the code and data structures and brought the processing time from over a minute down to less than a second using the above configuration. We plan to post the patch within the next week.
ExtendedDismax and NOT operator
Hi

This is my config:

<requestHandler name="edismax_basic" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">body</str>
    <str name="pf">title^30 introduction^15 body^10</str>
    <str name="ps">0</str>
  </lst>
</requestHandler>

Executing the following link:

http://localhost:8983/solr/distrib/select?q=term1 NOT term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

gives me as debuginfo:

<str name="parsedquery">(+(DisjunctionMaxQuery((body:term1)) -DisjunctionMaxQuery((body:term2))) DisjunctionMaxQuery((title:"term1 term2"^30.0)) DisjunctionMaxQuery((introduction:"term1 term2"^15.0)) DisjunctionMaxQuery((body:"term1 term2"^10.0)))/no_coord</str>

My question is: why is term2 included in the phrase query part?

Best
Geert Van Huychem
Re: Solr and Polygon/Radius based spatial searches
David,

Thanks for the response, the info should be very helpful!

Lee

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-and-Polygon-Radius-based-spatial-searches-tp4115121p4116068.html
Re: ExtendedDismax and NOT operator
I suspect that's a bug. The phrase boost code should have the logic to exclude negated terms. File a Jira. Thanks for reporting this.

-- Jack Krupansky
Group.Facet issue in Sharded Solr Setup
I am facing an issue with counts when using group.facet in my sharded Solr setup. (Groups do not overlap across shards, and for various reasons I cannot use group.truncate.)

The problem is that for items ranking lower in the faceted list sorted by count, the group facet counts come out *higher* than the actual values. Searching online, I came across details of sharded faceting at this link: http://lucene.472066.n3.nabble.com/At-a-high-level-how-does-faceting-in-SolrCloud-work-td4009897.html

From the above link it appears there is a *third corrective step* in which the coordinator node, after getting the individual results and building a final list, asks each shard to compute its exact count for the selected constraints.

I wanted to ask whether the group.facet implementation in 4.x takes this step into account, i.e. whether the coordinator node asks for grouped facet values instead of ungrouped facet counts during the corrective step. I am asking because the counts are right for the most popular 50% of items but are incorrect (and always higher) for the less popular ones.

Has anyone else faced this?

Ritesh

-- View this message in context: http://lucene.472066.n3.nabble.com/Group-Facet-issue-in-Sharded-Solr-Setup-tp4116077.html
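To make the corrective step described above concrete, here is a toy model in Python. It is a sketch under stated assumptions, not Solr's implementation: the shard data, `shard_top_k` and `merge_with_refinement` are all hypothetical, and it models plain (ungrouped) facet counts only, so it does not reproduce or explain the group.facet behaviour reported here.

```python
from collections import Counter


def shard_top_k(shard_counts, k):
    """Phase 1: a shard reports only its k most frequent terms."""
    return dict(Counter(shard_counts).most_common(k))


def merge_with_refinement(shards, k):
    """Toy coordinator: merge per-shard top-k lists, then refine them."""
    first_pass = [shard_top_k(shard, k) for shard in shards]
    candidates = set()
    for partial in first_pass:
        candidates.update(partial)
    totals = {}
    for term in candidates:
        total = 0
        for shard, partial in zip(shards, first_pass):
            if term in partial:
                total += partial[term]
            else:
                # Corrective step: ask the shard for the exact count of a
                # term that it left out of its first-pass response.
                total += shard.get(term, 0)
        totals[term] = total
    return totals


# Hypothetical per-shard term counts.
shards = [
    {"red": 10, "blue": 4, "green": 3},
    {"green": 9, "blue": 5, "red": 1},
]

# Without refinement, each term only gets the counts from shards that
# happened to include it in their top-k list.
unrefined = Counter()
for shard in shards:
    unrefined.update(shard_top_k(shard, 1))

refined = merge_with_refinement(shards, 1)
```

In this toy model the unrefined merge undercounts both terms, and the refinement pass restores the exact totals for every candidate term. If the refinement request asked for a different kind of count than the first pass (for example ungrouped instead of grouped), the merged numbers would mix two measures, which is the scenario the question above is asking about.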
Re: ExtendedDismax and NOT operator
Just to clarify: is the actual URL properly space-escaped?

http://localhost:8983/solr/distrib/select?q=term1%20NOT%20term2&start=0&rows=0&qt=edismax_basic&debugQuery=true

alexei martchenko
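One way to rule out escaping problems is to let a library build the query string. The sketch below uses Python's standard `urllib.parse.urlencode` with the host, core and handler names taken from this thread; it only constructs the URL and does not contact Solr.

```python
from urllib.parse import urlencode

base = "http://localhost:8983/solr/distrib/select"
params = {
    "q": "term1 NOT term2",   # raw query, spaces not yet escaped
    "start": 0,
    "rows": 0,
    "qt": "edismax_basic",
    "debugQuery": "true",
}

# urlencode escapes each value and joins the pairs with "&".
url = base + "?" + urlencode(params)
print(url)
```

Note that `urlencode` writes spaces as `+` rather than `%20`; both forms are decoded to a space in a query string.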
Re: ExtendedDismax and NOT operator
Yes, it is.

Geert Van Huychem
Re: Need help for integrating solr-4.5.1 with UIMA
The UIMA component is not very error-friendly - NPE gets thrown for missing or misspelled parameter names. Basically, you have to look at the source code based on that stack trace to find out which parameter was missing.

-- Jack Krupansky
SOLR 4.6 and Highlight Snippets with spannear
Hi.

I am using Solr 4.6 with the XmlQueryParser from Jira. I have noticed that if I have a spannear query, no highlight snippets are returned. I have tried both the regular highlighter and the fast vector highlighter. Is there any limitation of the highlighters with respect to spannear queries?

Regards
Puneet
Swap space,JVM-Memory,Physical memory on Solr Admin UI explanation
Hi,

I am using Solr 4.6.1 on a Windows 7 server with 32 GB RAM. I have a SolrCloud with 3 shards, 2 replicas and an embedded ZooKeeper, all on the one box. I have allocated 5 GB of heap (-Xmx5g) to each Solr instance at startup, with -XX:MaxNewSize=1636m.

On the Solr Admin UI I see Swap space (32.5 GB/64 GB), JVM-Memory (521.1 MB/4.73 GB) and Physical memory (11.07 GB/32 GB). That usage is confusing me. The swap space went up when indexing 15 million documents, but the JVM memory did not (it went up to a maximum of 1.1 GB or so). So, does that mean I don't need to allocate that much RAM to each Solr instance?

Could someone explain the three terms clearly in terms of their use in indexing and querying: Swap space, JVM-Memory and Physical memory?

TIA,
Vijay
Re: Tf-Idf for a specific query
David,

I would say that the DF over a result set is what facets give you!

--
Sincerely yours
Mikhail Khludnev
Principal Engineer, Grid Dynamics
http://www.griddynamics.com
mkhlud...@griddynamics.com
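The facet suggestion can be sketched as follows. This is an illustration under assumptions, not something confirmed in the thread: the query and field name are hypothetical, the facet counts are made-up numbers, `facet_df_params` and `idf_from_facets` are hypothetical helpers, and the IDF formula `log(N / df)` is just one common variant. The idea is that a field facet on a query reports, for each indexed term, how many matching documents contain it, which is the DF restricted to the result set.

```python
import math
from urllib.parse import urlencode


def facet_df_params(query, field, limit=100):
    """Build request parameters asking Solr for per-term facet counts."""
    return urlencode({
        "q": query,
        "rows": 0,             # only counts are needed, not documents
        "facet": "true",
        "facet.field": field,
        "facet.limit": limit,
        "wt": "json",
    })


def idf_from_facets(facet_counts, num_found):
    """Map term -> log(N / df), where N is the size of the result set."""
    return {
        term: math.log(num_found / df)
        for term, df in facet_counts.items()
        if df > 0
    }


params = facet_df_params("body:solr", "body")

# Made-up facet counts for a result set of 50K documents.
idf = idf_from_facets({"lucene": 25000, "pivot": 500}, 50000)
```

The per-document term frequency would still come from the `tf` function query mentioned earlier in this thread; multiplying it by the result-set IDF gives a score scoped to the query results rather than the whole corpus.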
Inconsistent results in a distributed configuration
I'm getting inconsistent results in a distributed configuration. Using the stats command over a single core containing about 3 million docs, I get 452660794509326.7 (a double type field). On the other hand, when partitioning the data into 2 or 4 cores, I get a different result: 452660794509325.4. Has anyone faced the same problem? Is it a misconfiguration or a bug? Any hints?

-- View this message in context: http://lucene.472066.n3.nabble.com/Inconsistent-results-in-a-distributed-configuration-tp4116061.html
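One possible explanation, which this thread does not confirm, is that floating-point addition is not associative: adding the same doubles in a different grouping (one core versus per-shard partial sums that are combined afterwards) can change the low-order digits. The Python sketch below uses made-up values, not the poster's data, and `naive_sum` is a hypothetical helper standing in for plain left-to-right accumulation.

```python
import math


def naive_sum(values):
    """Plain left-to-right double-precision accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


# Made-up values chosen so that rounding is easy to see.
values = [1e16, 1.0, -1e16, 1.0]

# Single core: every value is added to one running total.
single_core = naive_sum(values)

# Two shards: each computes a partial sum, then the partials are added.
shards = [[1e16, -1e16], [1.0, 1.0]]
distributed = naive_sum([naive_sum(shard) for shard in shards])

# Exactly rounded reference value.
exact = math.fsum(values)

print(single_core, distributed, exact)
```

Here the single-core order loses one of the small additions to rounding while the sharded order does not, so the two totals differ even though the inputs are identical. At a magnitude of about 4.5e14, adjacent doubles are 0.0625 apart, so a difference of roughly 1.3 after a few million additions is within what accumulated rounding could produce.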
Re: Tf-Idf for a specific query
Hi Mikhail,

The DF seems to be based on the entire document set. What I require is a DF based on the results of a single query.

Suppose my Solr query returns a set of 50K documents from a superset of 10 million documents: I need the DF to be calculated on just those 50K documents. But currently it seems to be calculated on the entire doc set.

So, is there any way to get the DF or IDF based only on the docs returned by the query?

Regards,
Dave
After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1, and indexing ceased (the indexer returned "No live servers for shard", but the real root cause from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not for adding documents.

21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:232)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:223)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:114)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:721)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:417)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:201)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:953)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Thread.java:724)
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
Hey, yeah, blew it on this one. Someone just reported it the other day - the way that a bug was fixed was not back and forward compatible. The first implementation was wrong. You have to update the other nodes to 4.6.1 as well. I’m going to look at some scripting test that can help check for this type of thing.

- Mark
http://about.me/markrmiller

On Feb 7, 2014, 7:01:24 PM, Brett Hoerner br...@bretthoerner.com wrote:
    I have Solr 4.6.1 on the server and just upgraded my indexer app to SolrJ 4.6.1 and indexing ceased (indexer returned No live servers for shard but the real root from the Solr servers is below). Note that SolrJ 4.6.1 is fine for the query side, just not adding documents.
    21:35:21.508 [qtp1418442930-22296231] ERROR o.a.solr.servlet.SolrDispatchFilter - null:java.lang.RuntimeException: Unknown type 19
Re: After upgrading indexer to SolrJ 4.6.1: o.a.solr.servlet.SolrDispatchFilter - Unknown type 19
On Fri, Feb 7, 2014 at 6:15 PM, Mark Miller markrmil...@gmail.com wrote:
    You have to update the other nodes to 4.6.1 as well.

I'm not sure I follow. All of the Solr instances in the cluster are 4.6.1, to my knowledge.

Thanks,
Brett
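One way to double-check that every node really runs the same release is to ask each one for its reported version. The following is only a minimal sketch, not something from this thread: it assumes each node exposes the system info handler at `/admin/info/system` and that the JSON response carries the release under `lucene` / `solr-spec-version`; the node URLs in the usage comment are hypothetical.

```python
import json
from urllib.request import urlopen


def fetch_system_info(base_url, timeout=5.0):
    """Fetch the system info JSON from one node.

    Assumption: base_url looks like "http://host:8983/solr" and the node
    serves the system info handler at /admin/info/system.
    """
    with urlopen(f"{base_url}/admin/info/system?wt=json", timeout=timeout) as resp:
        return json.load(resp)


def solr_spec_version(system_info):
    """Pull the reported Solr release out of a parsed system info response."""
    return system_info["lucene"]["solr-spec-version"]


def nodes_not_on(expected, versions):
    """Return the nodes whose reported version differs from the expected one."""
    return sorted(node for node, version in versions.items() if version != expected)


# Usage (hypothetical node URLs, not taken from the thread):
#   nodes = ["http://solr1:8983/solr", "http://solr2:8983/solr"]
#   versions = {n: solr_spec_version(fetch_system_info(n)) for n in nodes}
#   print(nodes_not_on("4.6.1", versions))
```

An empty list from `nodes_not_on` would suggest the cluster is uniform, in which case the mismatch has to be looked for elsewhere (for example, an older Solr or SolrJ jar still on a classpath).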
Index a new record in MySQL
Hi, How do I approach the issue of firing the DIH without it having to index the whole DB when adding a new record? It appears that when a new record is added the delta query on DIH doesn’t pick up the record. And I don’t want to run a full index on the DB when adding 1 single row. Any suggestions please? Thanks -Peri
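For context, a DIH delta import is normally triggered with `command=delta-import` on the `/dataimport` handler, and its `deltaQuery` selects the ids of rows whose timestamp column is newer than `${dataimporter.last_index_time}`. The sketch below only imitates that comparison against an in-memory SQLite table so it can be run anywhere. The table `item`, the column `last_modified`, and the sample rows are hypothetical, and a missing timestamp on insert is just one possible reason a new row gets skipped, not a diagnosis of the setup above.

```python
import sqlite3

# Stand-in for a DIH deltaQuery such as:
#   SELECT id FROM item WHERE last_modified > '${dataimporter.last_index_time}'
DELTA_QUERY = "SELECT id FROM item WHERE last_modified > ?"


def delta_ids(conn, last_index_time):
    """Return the ids a timestamp-based delta query would hand to the importer."""
    rows = conn.execute(DELTA_QUERY, (last_index_time,)).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE item (id INTEGER PRIMARY KEY, name TEXT, last_modified TEXT)"
)
# Row 1 was indexed by an earlier import.
conn.execute("INSERT INTO item VALUES (1, 'old row', '2014-02-01 10:00:00')")
# Row 2 is new and the application set its timestamp on insert.
conn.execute("INSERT INTO item VALUES (2, 'new row', '2014-02-07 12:00:00')")
# Row 3 is new but nothing populated the timestamp column.
conn.execute("INSERT INTO item VALUES (3, 'new row, no timestamp', NULL)")

last_index_time = "2014-02-07 09:00:00"
changed = delta_ids(conn, last_index_time)
```

Here `changed` contains row 2 but not row 3: a comparison against NULL never matches, so a new row without a timestamp stays invisible to the delta query. If that is the situation, giving the column a default of the current time on insert and update would be one thing to try.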
Re: Tf-Idf for a specific query
Thanks Mikhail, It seems that this was what I was looking for. Being new to this, I wasn't aware of such a use of facets. Now I can probably combine the term vectors and facets to fit my scenario. Regards, Dave

On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
    David, I can imagine that DF for the result set is just facets!

On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com wrote:
    Hi Mikhail, The DF seems to be based on the entire document set. What I require is DF based on the results of a single query. Suppose my Solr query returns a set of 50K documents from a superset of 10 million documents; I need to calculate the DF based on just the 50K documents. But currently it seems to be calculated on the entire doc set. So, is there any way to get the DF or IDF just on the basis of the docs returned by the query? Regards, Dave

On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote:
    Hello Dave, you can get DF from http://wiki.apache.org/solr/TermsComponent (invert it yourself); then, for a certain term, you can get the number of occurrences per document with http://wiki.apache.org/solr/FunctionQuery#tf

On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote:
    Hi Guys.. I need to obtain the Tf-idf score from Solr for a certain set of documents. But the catch is that I need the IDF (or DF) to be calculated on the documents returned by the specific query and not on the entire corpus. Please give me a hint on whether Solr has this feature or whether I can use the Lucene API directly to achieve this. Thanks in advance, Dave

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
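To make the facet idea concrete: a field facet count is the number of documents in the result set that contain the term, which is the DF restricted to that query, and `numFound` plays the role of the collection size. The sketch below is an illustration under assumptions rather than anything posted in this thread: it expects facet counts in Solr's default flat JSON layout (`[term, count, term, count, ...]`), uses the classic Lucene idf formula `1 + ln(numDocs / (docFreq + 1))`, and the sample terms and counts are made up.

```python
import math


def facet_pairs(flat_facets):
    """Turn a flat [term, count, term, count, ...] list into {term: count}."""
    return dict(zip(flat_facets[0::2], flat_facets[1::2]))


def idf(doc_freq, num_docs):
    """Classic Lucene-style idf: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def result_set_idf(num_found, flat_facets):
    """idf per term, with DF and N both taken from the query's result set."""
    return {
        term: idf(doc_freq, num_found)
        for term, doc_freq in facet_pairs(flat_facets).items()
    }


# Made-up response fragment: 50K matching docs, facet counts on a text field.
num_found = 50000
facet_counts = ["solr", 49999, "lucene", 4999]
scores = result_set_idf(num_found, facet_counts)
```

A term found in nearly every matching document ("solr" here) ends up with an idf close to 1, while a rarer one ("lucene") scores higher. To cover all terms the facet request would probably need `facet.limit=-1`, which can get expensive on a large text field.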