Re: Open Too Many Files
or change the index to a compound index in solrconfig.xml: <useCompoundFile>true</useCompoundFile> so Solr creates one index file and not thousands. --- System: One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents, other Cores 100,000 - Solr1 for Search-Requests - commit every Minute - 4GB Xmx - Solr2 for Update-Requests - delta every 2 Minutes - 4GB Xmx
facet.mincount
Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia.
Re: facet.mincount
Could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia.
Re: DataImportHandler: no queries when using entity=something
Add clean=false to the URL: http://solr:8983/solr/dataimport?command=full-import&entity=games&clean=false *clean*: (default 'true'). Tells whether to clean up the index before the indexing is started.
Re: DataImportHandler: no queries when using entity=something
check your log file you might have a connection problem
Re: DataImportHandler: no queries when using entity=something
On Thu, Feb 3, 2011 at 3:23 PM, Darx Oman darxo...@gmail.com wrote: add to url clean=false http://solr:8983/solr/dataimport?command=full-importentity=games; clean=false *clean* : (default 'true'). Tells whether to clean up the index before the indexing is started [...] Sorry, what does that have to do with the original poster's question? Regards, Gora
Re: Open Too Many Files
Or decrease the mergeFactor. Or change the index to a compound index in solrconfig.xml: <useCompoundFile>true</useCompoundFile> so Solr creates one index file and not thousands. --- System: One Server, 12 GB RAM, 2 Solr Instances, 7 Cores, 1 Core with 31 Million Documents, other Cores 100,000 - Solr1 for Search-Requests - commit every Minute - 4GB Xmx - Solr2 for Update-Requests - delta every 2 Minutes - 4GB Xmx
Re: facet.mincount
Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia. -- Thanks & Regards, Isan Fulia.
Re: facet.mincount
Have you looked at your log file? What does it say; is there any exception? I have never seen facet.mincount=1 not work. What version of Solr are you using? - Thanx: Grijesh http://lucidimagination.com
Re: Open Too Many Files
The best option is to use <useCompoundFile>true</useCompoundFile>; decreasing the mergeFactor may slow indexing down. - Thanx: Grijesh http://lucidimagination.com
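For reference, a minimal sketch of where these two settings live in the stock Solr 1.4 solrconfig.xml (the values shown are illustrative only, not recommendations; the same elements also appear in the mainIndex section):

<indexDefaults>
  <useCompoundFile>true</useCompoundFile>
  <mergeFactor>10</mergeFactor>
  ...
</indexDefaults>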
Re: from long to tlong, compatible?
Thanks for the fast answer. Yeah, I was afraid that I needed to re-index for the precision to take effect in this case. - Original Message From: Yonik Seeley yo...@lucidimagination.com To: solr-user@lucene.apache.org Sent: Wed, February 2, 2011 10:12:42 PM Subject: Re: from long to tlong, compatible? On Wed, Feb 2, 2011 at 3:46 PM, Dan G diser...@yahoo.se wrote: My question is whether it would be possible to just change the field to the preferred type tlong with a precision of 8? Would this change be compatible with my indexed data or should I re-index the data (a pain with 800+M docs :))? I think you'll need to re-index, or range queries on that field will miss many of the documents you've already indexed with precisionStep=0 -Yonik http://lucidimagination.com
Re: facet.mincount
I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after setting facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia. -- Thanks & Regards, Isan Fulia.
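As a side note, facet parameters can also be given per field using the f.<fieldname>. prefix; if mincount were honoured for date facets, the request would look roughly like the sketch below (whether Solr 1.4 actually applies it to date facets is exactly what is in question in this thread):

...&facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&f.aUpdDt.facet.mincount=1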
Re: facet.mincount
I am using solr1.4.1 release version I got the following error while using facet.mincount java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) at org.apache.jasper.runtime.JspWriterImpl.initOut(JspWriterImpl.java:158) at org.apache.jasper.runtime.JspWriterImpl.flushBuffer(JspWriterImpl.java:151) at org.apache.jasper.runtime.PageContextImpl.release(PageContextImpl.java:208) at org.apache.jasper.runtime.JspFactoryImpl.internalReleasePageContext(JspFactoryImpl.java:144) at org.apache.jasper.runtime.JspFactoryImpl.releasePageContext(JspFactoryImpl.java:95) at org.apache.jsp.admin.index_jsp._jspService(org.apache.jsp.admin.index_jsp:397) at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:80) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:373) at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:464) at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:358) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:367) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:268) at org.mortbay.jetty.servlet.Dispatcher.forward(Dispatcher.java:126) at org.mortbay.jetty.servlet.DefaultServlet.doGet(DefaultServlet.java:431) at javax.servlet.http.HttpServlet.service(HttpServlet.java:707) at javax.servlet.http.HttpServlet.service(HttpServlet.java:820) at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:487) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1098) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:286) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:821) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:513) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think 
facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=onfacet.date=aUpdDtfacet.date.start=2011-01-02T08:00:00.000Zfacet.date.end=2011-02-03T08:00:00.000Zfacet.date.gap=%2B1HOURfacet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after making facet.mincount=1 , it is showing the results with count = 0. Does anyone know why this is happening. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia. -- Thanks Regards, Isan Fulia.
Re: facet.mincount
Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that facet.mincount is not applicable to date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using the solr1.4.1 release version. I got the following error while using facet.mincount: java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) [...] On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all, Even after making facet.mincount=1, it is showing results with count = 0. Does anyone know why this is happening? -- Thanks & Regards, Isan Fulia.
Re: facet.mincount
I also cannot find where the wiki mentions that facet.mincount will not work with date faceting. But I have checked with a query, and it is not working for me either. We may have to report a bug. - Thanx: Grijesh http://lucidimagination.com
How effective are faceted queries ?
Hi, I was wondering if there are any documented performance characteristics for facets. As I understand facets, they are subqueries that perform certain counts on the result set. This means that a facet will be evaluated on every shard along with the main query. But how will the facet query be evaluated? If the result set is sorted, will the facet query take advantage of that when evaluating? Example: a search is done for all documents within a given range of dates on the field createdDate. The result set is sorted by that field. Would a facet query then be able to use this sorting when it counts how many documents were created per week, or per day for that matter? Kind regards, Christian Sonne Jensen
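For concreteness, a sketch of the kind of request described above, assuming createdDate is a Solr date field (the range bounds and gap are illustrative):

q=*:*&fq=createdDate:[2011-01-01T00:00:00Z TO 2011-02-01T00:00:00Z]&sort=createdDate asc&facet=true&facet.date=createdDate&facet.date.start=2011-01-01T00:00:00Z&facet.date.end=2011-02-01T00:00:00Z&facet.date.gap=%2B1DAY

The facet counts are computed per range over the whole matching set, independently of how the results are sorted.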
Re: facet.mincount
facet.mincount is grouped only under the field value faceting parameters, not the date faceting parameters. On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that facet.mincount is not applicable to date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using the solr1.4.1 release version. I got the following error while using facet.mincount: java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) [...] On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia isan.fu...@germinait.com wrote: Any query followed by facet=on&facet.date=aUpdDt&facet.date.start=2011-01-02T08:00:00.000Z&facet.date.end=2011-02-03T08:00:00.000Z&facet.date.gap=%2B1HOUR&facet.mincount=1 On 3 February 2011 15:14, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: could you post the query you are submitting to Solr? On 3 February 2011 09:33, Isan Fulia isan.fu...@germinait.com wrote: Hi all,
Re: Malformed XML with exotic characters
Hi I've seen almost all funky charsets but gothic is always trouble. I'm also unsure if its really a bug in Solr. It could well be the Xerces being unable to cope. Besides, most systems indeed don't go well with gothic. This mail client does, but my terminal can't find its cursor after (properly) displaying such text. http://got.wikipedia.org/wiki/%F0%90%8C%B7%F0%90%8C%B0%F0%90%8C%BF%F0%90%8C%B1%F0%90%8C%B9%F0%90%8C%B3%F0%90%8C%B0%F0%90%8C%B1%F0%90%8C%B0%F0%90%8C%BF%F0%90%8D%82%F0%90%8C%B2%F0%90%8D%83/Haubidabaurgs Thanks for the input. Cheers, On Tuesday 01 February 2011 19:59:33 Robert Muir wrote: Hi, it might only be a problem with your xml tools (e.g. firefox). the problem here is characters outside of the basic multilingual plane (in this case Gothic). XML tools typically fall apart on these portions of unicode (in lucene we recently reverted to a patched/hacked copy of xerces specifically for this reason). If you care about characters outside of the basic multilingual plane actually working, unfortunately you have to start being very very very particular about what software you use... you can assume most software/setups WON'T work. For example, if you were to use mysql's utf8 character set you would find it doesn't actually support all of UTF-8! in this case you would need to use the recent 'utf8mb4' or something instead, that is actually utf-8! Thats just one example of a well-used piece of software that suffers from issues like this, there are others. Its for reasons like these that if support for these languages is important to you, I would stick with the most simple/textual methods for input and output: e.g. using things like CSV and JSON if you can. I would also fully test every component/jar in your application individually and once you get it working, don't ever upgrade. In any case, if you are having problems with characters outside of the basic multilingual plane, and you suspect its actually a bug in Solr, please open a JIRA issue, especially if you can provide some way to reproduce it
Re: escaping parenthesis in search query don't work...
WordDelimiterFilterFactory is probably stripping out the parens. If you try running your terms through http://localhost:8983/solr/admin/analysis.jsp http://localhost:8983/solr/admin/analysis.jspyou'll see the effects of various tokenizers and filters, be sure to check the verbose checkbox. Here's a very good place to start understanding the intention of the various options: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters In particular, about WordDelimiterFilterFactory: split on intra-word delimiters (all non alpha-numeric characters). - Wi-Fi - Wi, Fi http://wiki.apache.org/solr/AnalyzersTokenizersTokenFiltersBest Erick On Tue, Feb 1, 2011 at 8:52 AM, Pierre-Yves LANDRON pland...@hotmail.comwrote: Hello !I've seen that in order to search term with parenthesis=2C those have to be=escaped as in title:\(term\).But it doesn't seem to work - parenthesis are=n't taken in account.here is the field type I'm using to index these data : fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/!-- in this example, we will only use synonyms at query time filter class=solr.SynonymFilterFactory synonyms=index_synonyms.txt ignoreCase=true expand=false/ -- !-- Case insensitive stop word removal. enablePositionIncrements=true ensures that a 'gap' is left to allow for accurate phrase queries. -- filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true / filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter class=solr.SnowballPorterFilterFactory language=French / filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory words=stopwords.txt ignoreCase=true /filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ !-- filter class=solr.EnglishPorterFilterFactory protected=protwords.txt/ -- filter class=solr.SnowballPorterFilterFactory language=French / filter class=solr.RemoveDuplicatesTokenFilterFactory/ /analyzer /fieldType How can I search parenthesis within my query ?Thanks,P.
Re: Terms and termscomponent questions
There are a couple of things going on here. First, WordDelimiterFilterFactory is splitting things up on letter/number boundaries. Take a look at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a list of *some* of the available tokenizers. You may want to just use one of the others, or change the parameters to WordDelimiterFilterFilterFactory to not split as it is. See the page: http://localhost:8983/solr/admin/analysis.jsp and check the verbose box to see what the effects of the various elements in your analysis chain are. This is a very important page for understanding the analysis part of the whole operation. Second, if you've been trying different things out, you may well have some old stuff in your index. When you delete documents, the terms are still in the index until an optimize. I'd advise starting with a clean slate for your experiments each time. The cheap way to do this is stop your server and delete solr_home/data/index. Delete the index directory too, not just the contents. So it's possible your TermsComponent is returning data from previous attempts, because I sure don't see how the concatenated terms would be in this index given the definition you've posted. And if none of that works, well, we'll try something else G.. Best Erick On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open openvic...@gmail.comwrote: Dear Erick, Thank you for your answer, here is my fieldtype definition. I took the standard one because I don't need a better one for this field fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Now my field : field name=p_field type=text indexed=true stored=true/ But I have a doubt now... Do I really put a space between words or is it just a coma... If I only put a coma then the whole process is going to be impacted ? What I don't really understand is that I find the separate words, but also their concatenation (but again in one direction only). Let me explain : if a have man bear pig I will find : manbearpig bearpig but never pigman or anyother combination in a different order. Thank you very much Best Regards, Victor 2011/2/1 Erick Erickson erickerick...@gmail.com Nope, this isn't what I'd expect. There are a couple of possibilities: 1 check out what WordDelimiterFilterFactory is doing, although if you're really sending spaces that's probably not it. 2 Let's see the field and fieldType definitions for the field in question. type=text doesn't say anything about analysis, and that's where I'd expect you're having trouble. 
In particular if your analysis chain uses KeywordTokenizerFactory for instance. 3 Look at the admin/schema browse page, look at your field and see what the actual tokens are. That'll tell you what TermsComponents is returning, perhaps the concatenation is happening somewhere else. Bottom line: Solr will not concatenate terms like this unless you tell it to, so I suspect you're telling it to, you just don't realize it G... Best Erick On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open openvic...@gmail.com wrote: Dear Solr users, I am currently using SolR and TermsComponents to make an auto suggest for my website. I have a field called p_field indexed and stored with type=text in the schema xml. Nothing out of the usual. I feed to Solr a set of words separated by a coma and a space such as (for two documents) : Document 1: word11, word12, word13. word14 Document 2: word21, word22, word23. word24 When I use my newly designed field I get things for the prefix word1 : word11, word12, word13. word14 word11word12 word11word13 etc... Is it normal to have the concatenation of words and not only the words indexed ? Did I miss something about Terms ? Thank you very much, Best regards all, Victor
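For illustration, the auto-suggest style of TermsComponent request discussed in this thread looks roughly like the following, assuming the stock /terms handler from the example solrconfig.xml is enabled (the prefix and limit are illustrative):

http://localhost:8983/solr/terms?terms.fl=p_field&terms.prefix=word1&terms.limit=10

It simply enumerates indexed terms that start with the given prefix, which is why whatever the analysis chain produced at index time (including any concatenated tokens) is exactly what comes back.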
Re: chaning schema
Erik: Is this a Tomcat-specific issue? Because I regularly delete just the data/index directory on my Windows box running Jetty without any problems. (3_x and trunk) Mostly want to know because I just encouraged someone to just delete the index dir based on my experience... Thanks Erick On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher erik.hatc...@gmail.comwrote: the trick is, you have to remove the data/ directory, not just the data/index subdirectory. and of course then restart Solr. or delete *:*?commit=true, depending on what's the best fit for your ops. Erik On Feb 1, 2011, at 11:41 , Dennis Gearon wrote: I tried removing the index directory once, and tomcat refused to sart up because it didn't have a segments file. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, February 1, 2011 5:04:51 AM Subject: Re: chaning schema That sounds right. You can cheat and just remove solr_home/data/index rather than delete *:* though (you should probably do that with the Solr instance stopped) Make sure to remove the directory index as well. Best Erick On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon gear...@sbcglobal.net wrote: Anyone got a great little script for changing a schema? i.e., after changing: database, the view in the database for data import the data-config.xml file the schema.xml file I BELIEVE that I have to run: a delete command for the whole index *:* a full import and optimize This all sound right? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
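For reference, the delete-everything step mentioned above can be issued as a single update request; a sketch against the example instance (host, port and core path are the stock ones, adjust for your setup):

http://localhost:8983/solr/update?stream.body=<delete><query>*:*</query></delete>&commit=true

followed by a full-import and an optimize, as listed in the original message.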
Re: Partial matches don't work (solr.NGramFilterFactory
On Wed, Feb 2, 2011 at 4:44 PM, Script Head scripth...@gmail.com wrote: Yes, I have tried searching on text_ngrams as well and it produces no results. On a related note, since I have copyField source=text_ngrams dest=text/ wouldn't the ngrams produced by text_ngrams field definition also be available within the text field? No, look at: http://wiki.apache.org/solr/SchemaXml#Copy_Fields Solr will apply the corresponding analysis chain for each field. http://wiki.apache.org/solr/SchemaXml#Copy_FieldsAnyway, you should be able to find the document when doing queries like text_ngrams:hippo I can see you are storing the field text_ngrams, when you search for Hippopotamus (and find results), how do you see the field text_ngrams on the returned docs? you should see the NGrams there (the same data that you should see when using the analysis page of Solr admin) Tomás 2011/2/2 Tomás Fernández Löbbe tomasflo...@gmail.com: About this: copyField source=text_ngrams dest=text/ The NGrams are going to be indexed on the field text_ngrams, not on text. For the field text, Solr will apply the text analysis (which I guess doesn't have NGrams). You have to search on the text_ngrams field, something like text_ngrams:hippo or text_ngrams:potamu. Are you searching like this? Tomás On Wed, Feb 2, 2011 at 4:07 PM, Script Head scripth...@gmail.com wrote: Hello, I have the following definitions in my schema.xml: fieldType name=testedgengrams class=solr.TextField analyzer type=index tokenizer class=solr.LowerCaseTokenizerFactory/ filter class=solr.NGramFilterFactory minGramSize=3 maxGramSize=15/ /analyzer analyzer type=query tokenizer class=solr.LowerCaseTokenizerFactory/ /analyzer /fieldType ... field name=text_ngrams type=testedgengrams indexed=true stored=true/ ... copyField source=text_ngrams dest=text/ There is a document Hippopotamus is fatter than a Platypus indexed. When I search for Hippopotamus I receive the expected result. When I search for any partial such as Hippo or potamu I get nothing. I could use some guidance. Script Head
Re: value for maxFieldLength
This is not really very large; Solr should handle this easily (assuming you've given it enough memory), so I'd go with a large number, say 20M. If you start running out of memory, then you've probably given the JVM too little memory. But Solr should handle this without a burp. Best Erick On Wed, Feb 2, 2011 at 10:20 AM, McGibbney, Lewis John lewis.mcgibb...@gcu.ac.uk wrote: Hello list, I am aware that setting the value of maxFieldLength in solrconfig.xml too high may/will result in out-of-mem errors. I wish to provide content extraction on a number of pdf documents which are large, by large I mean 8-11MB (occasionally more), and I am also not sure how many terms reside in each field when it is indexed. My question is therefore what is a sensible number to set this value to in order to include the majority/all terms within documents of this size. Thank you Lewis
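For reference, maxFieldLength is set in solrconfig.xml (in the indexDefaults section of the stock 1.4 example config); a sketch with the large value suggested above, purely as an illustration:

<maxFieldLength>20000000</maxFieldLength>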
Re: geodist and spacial search
Further down that very page <G>... Here's an example of sorting by distance ascending: ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc (i.e. http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc ). The key is just the sort=geodist(); I'm pretty sure that's independent of the bbox, but I could be wrong. Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz
Re: Reg filter criteria on multivalued attribute
Hmmm, why doesn't +relationship:DEF_BY -relationship:BEL_TO work? Then I don't think the second part matters... Best Erick On Wed, Feb 2, 2011 at 12:09 PM, bbarani bbar...@gmail.com wrote: Hi, I have a question on filters on a multivalued attribute. Is there a way to filter a multivalued attribute based on a particular value inside that attribute? Consider the example below. <arr name="relationship"><str>DEF_BY</str><str>BEL_TO</str></arr> I want to do a search which returns only the results that have the relationship DEF_BY and not BEL_TO. Currently if I do a normal search for DEF_BY, documents which contain DEF_BY along with other relationships are also returned, whereas I want only the documents that contain just DEF_BY under relationship to be returned. Also, is there a way to make SOLR return documents based on the number of elements in a multivalued attribute? If that's possible I can first make SOLR return those documents and then filter against that for my search on top of the results returned. Is there a way to write a query to do this? Any pointers or help in this regard would be appreciated. Thanks, Barani
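One practical note on trying that from a browser or curl: in a URL query string a literal + is decoded as a space, so the leading + of the required clause has to be percent-encoded. A sketch of the query parameter, assuming the field really is named relationship:

q=%2Brelationship:DEF_BY+-relationship:BEL_TO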
Re: DataImportHandler: no queries when using entity=something
Here's a magic URL, not available from the admin page that may help debugging: /solr/admin/dataimport.jsp Best Erick On Wed, Feb 2, 2011 at 7:38 PM, Jon Drukman j...@cluttered.com wrote: So I'm trying to update a single entity in my index using DataImportHandler. http://solr:8983/solr/dataimport?command=full-importentity=games It ends near-instantaneously without hitting the database at all, apparently. Status shows: str name=Total Requests made to DataSource0/str str name=Total Rows Fetched0/str str name=Total Documents Processed0/str str name=Total Documents Skipped0/str str name= Indexing completed. Added/Updated: 0 documents. Deleted 0 documents. /str str name=Committed2011-02-02 16:24:13/str str name=Optimized2011-02-02 16:24:13/str str name=Time taken 0:0:0.20/str The query isn't that extreme. It returns 8771 rows in about 3 seconds. How can I debug this?
RE: value for maxFieldLength
Thank you Erick Lewis -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: 03 February 2011 13:25 To: solr-user@lucene.apache.org Subject: Re: value for maxFieldLength This is not really very large; Solr should handle this easily (assuming you've given it enough memory), so I'd go with a large number, say 20M. If you start running out of memory, then you've probably given the JVM too little memory. But Solr should handle this without a burp. Best Erick On Wed, Feb 2, 2011 at 10:20 AM, McGibbney, Lewis John lewis.mcgibb...@gcu.ac.uk wrote: Hello list, I am aware that setting the value of maxFieldLength in solrconfig.xml too high may/will result in out-of-mem errors. I wish to provide content extraction on a number of pdf documents which are large, by large I mean 8-11MB (occasionally more), and I am also not sure how many terms reside in each field when it is indexed. My question is therefore what is a sensible number to set this value to in order to include the majority/all terms within documents of this size. Thank you Lewis
Re: facet.mincount
Ahh, I see your point. Well, if that's true, then facet.missing/facet.method are also not supported? I'm not sure if this is the case, or whether the Date Faceting Parameters = Field Value Faceting Parameters + the extra ones. Maybe the page author(s) can clarify. On 3 February 2011 11:32, dan sutton danbsut...@gmail.com wrote: facet.mincount is grouped only under the field value faceting parameters, not the date faceting parameters. On Thu, Feb 3, 2011 at 11:08 AM, Savvas-Andreas Moysidis savvas.andreas.moysi...@googlemail.com wrote: Hi Dan, I'm probably just not able to spot this, but where does the wiki page mention that facet.mincount is not applicable to date fields? On 3 February 2011 10:55, Isan Fulia isan.fu...@germinait.com wrote: I am using the solr1.4.1 release version. I got the following error while using facet.mincount: java.lang.IllegalStateException: STREAM at org.mortbay.jetty.Response.getWriter(Response.java:571) [...] On 3 February 2011 16:17, dan sutton danbsut...@gmail.com wrote: I don't think facet.mincount works with date faceting, see here: http://wiki.apache.org/solr/SimpleFacetParameters Dan On Thu, Feb 3, 2011 at 10:11 AM, Isan Fulia
Re: My spellchecker experiment
On Thu, Feb 3, 2011 at 8:55 AM, Emmanuel Espina espinaemman...@gmail.com wrote: It uses fuzzy queries instead of an ngram query, and then I rank the results by word frequency in the text with the aid of a python script (all that is explained in the post). I got pretty good results (between 50% and 90% improvements), but slower (about double the time). Hi Emmanuel: I think it's great you are evaluating different techniques here, our spelling could use some help :) By the way: we added a new spellchecking technique that sounds quite similar to what you describe (DirectSpellChecker), but hopefully without the performance issues. It's only available in trunk (http://svn.apache.org/repos/asf/lucene/dev/trunk/) I tried to do a very rough evaluation on its jira issue: https://issues.apache.org/jira/browse/LUCENE-2507, but nothing as serious or in-depth as what it looks like you did. Anyway, if you want to play you can experiment with it either at the lucene level (it's in contrib/spellchecker) or via solr, by using DirectSolrSpellChecker... though I think the parameters in the example solrconfig are likely not the best :) I have an app using this more fleshed-out config (in combination with the new collation options), and it seems to be reasonable:
<!-- a spellchecker that uses no auxiliary index -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">text</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="minPrefix">1</str>
  <str name="maxEdits">2</str>
  <str name="maxInspections">25</str>
  <!-- probably way too high for most apps though -->
  <str name="minQueryLength">3</str>
  <str name="comparatorClass">freq</str>
  <str name="thresholdTokenFrequency">1</str>
  <str name="distanceMeasure">org.apache.lucene.search.spell.JaroWinklerDistance</str>
</lst>
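For completeness, a sketch of the request parameters that exercise such a spellchecker, assuming the spellcheck search component is attached to the handler being queried (parameter names are the standard SpellCheckComponent ones; values are illustrative):

...&spellcheck=true&spellcheck.dictionary=default&spellcheck.count=5&spellcheck.collate=true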
Re: Terms and termscomponent questions
Dear Erick, You were totally right about the fact that I didn't use any space to separate words, cause SolR to concatenate words ! Everything is solved now. Thank you very much for your help ! Best regards, Victor Kabdebon 2011/2/3 Erick Erickson erickerick...@gmail.com There are a couple of things going on here. First, WordDelimiterFilterFactory is splitting things up on letter/number boundaries. Take a look at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a list of *some* of the available tokenizers. You may want to just use one of the others, or change the parameters to WordDelimiterFilterFilterFactory to not split as it is. See the page: http://localhost:8983/solr/admin/analysis.jsp and check the verbose box to see what the effects of the various elements in your analysis chain are. This is a very important page for understanding the analysis part of the whole operation. Second, if you've been trying different things out, you may well have some old stuff in your index. When you delete documents, the terms are still in the index until an optimize. I'd advise starting with a clean slate for your experiments each time. The cheap way to do this is stop your server and delete solr_home/data/index. Delete the index directory too, not just the contents. So it's possible your TermsComponent is returning data from previous attempts, because I sure don't see how the concatenated terms would be in this index given the definition you've posted. And if none of that works, well, we'll try something else G.. Best Erick On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open openvic...@gmail.com wrote: Dear Erick, Thank you for your answer, here is my fieldtype definition. I took the standard one because I don't need a better one for this field fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Now my field : field name=p_field type=text indexed=true stored=true/ But I have a doubt now... Do I really put a space between words or is it just a coma... If I only put a coma then the whole process is going to be impacted ? What I don't really understand is that I find the separate words, but also their concatenation (but again in one direction only). Let me explain : if a have man bear pig I will find : manbearpig bearpig but never pigman or anyother combination in a different order. Thank you very much Best Regards, Victor 2011/2/1 Erick Erickson erickerick...@gmail.com Nope, this isn't what I'd expect. 
There are a couple of possibilities: 1 check out what WordDelimiterFilterFactory is doing, although if you're really sending spaces that's probably not it. 2 Let's see the field and fieldType definitions for the field in question. type=text doesn't say anything about analysis, and that's where I'd expect you're having trouble. In particular if your analysis chain uses KeywordTokenizerFactory for instance. 3 Look at the admin/schema browse page, look at your field and see what the actual tokens are. That'll tell you what TermsComponents is returning, perhaps the concatenation is happening somewhere else. Bottom line: Solr will not concatenate terms like this unless you tell it to, so I suspect you're telling it to, you just don't realize it G... Best Erick On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open openvic...@gmail.com wrote: Dear Solr users, I am currently using SolR and TermsComponents to make an auto suggest for my website. I have a field called p_field indexed and stored with type=text in the schema xml. Nothing out of the usual. I feed to Solr a set of words separated by a coma and a space such as (for two documents) : Document 1: word11, word12, word13. word14 Document
Re: facet.mincount
Hi, facet.mincount does not work with the facet.date option AFAIK. There is an issue for it, SOLR-343, which has been resolved. Applying the patch provided as a solution in that issue may solve the problem. The fix version for this may be 1.5. - Thanx: Grijesh http://lucidimagination.com
Re: Using terms and N-gram
Thank you, I will do that and hopefully it will be handy! But can someone explain to me the difference between CommonGramsFilterFactory and NGramFilterFactory? (Maybe the solution is there.) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what is happening at index time and query time with your input data. You can use highlighting to see if a match is found. - Thanx: Grijesh http://lucidimagination.com
Re: geodist and spacial search
Hi Erick, Thanks, I saw that example, but I am trying to sort by distance AND specify the max distance in one query. The reason is: running bbox on 2 million documents with a 20km distance takes only 200ms. Sorting 2 million documents by distance takes over 1.5 seconds! So it will be much faster for Solr to first filter down to the 20km documents and then sort them. Regards Ericz On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson erickerick...@gmail.com wrote: Further down that very page <G>... Here's an example of sorting by distance ascending: ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc (i.e. http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc ) The key is just the sort=geodist(); I'm pretty sure that's independent of the bbox, but I could be wrong. Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz
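For what it's worth, the two can be combined in a single request by using the bbox as a filter query and geodist() only for the sort; a sketch using the example field names from the wiki page (the distance value is illustrative):

...q=*:*&sfield=store&pt=45.15,-93.85&d=20&fq={!bbox}&sort=geodist() asc

This restricts the match set to documents within roughly 20 km first, and the distance sort then only applies to that subset.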
Re: chaning schema
Well, the nice thing is that I have an Amazon based dev server, and it's AMI stored. So if I screw something up, I just throw away that server and get a fresh one all configured and full of dev data and BAM back to where I was. So I'll try it again with the -rf flags. I did shut down the server and I am using Tomcat. Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501tag=nl.e036' EARTH has a Right To Life, otherwise we all die. - Original Message From: Gora Mohanty g...@mimirtech.com To: solr-user@lucene.apache.org Sent: Thu, February 3, 2011 6:56:29 AM Subject: Re: chaning schema On Thu, Feb 3, 2011 at 6:47 PM, Erick Erickson erickerick...@gmail.com wrote: Erik: Is this a Tomcat-specific issue? Because I regularly delete just the data/index directory on my Windows box running Jetty without any problems. (3_x and trunk) Mostly want to know because I just encouraged someone to just delete the index dir based on my experience... Thanks Erick On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher erik.hatc...@gmail.comwrote: the trick is, you have to remove the data/ directory, not just the data/index subdirectory. and of course then restart Solr. or delete *:*?commit=true, depending on what's the best fit for your ops. Erik On Feb 1, 2011, at 11:41 , Dennis Gearon wrote: I tried removing the index directory once, and tomcat refused to sart up because it didn't have a segments file. [...] I have seen this error with Tomcat, but in my experience, this has been due to doing a rm data/index/* rather than rm -rf /data/index, or due to doing this without first shutting down Tomcat. Regards, Gora
Re: Open Too Many Files
Try it: ulimit -n20 2011/2/3 Grijesh pintu.grij...@gmail.com The best option is to use <useCompoundFile>true</useCompoundFile>; decreasing the mergeFactor may slow indexing down. - Thanx: Grijesh http://lucidimagination.com
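For anyone hitting the same limit: ulimit -n with no argument prints the current per-process file-descriptor limit, and a sketch of raising it for the current shell would be ulimit -n 65535 (the value is illustrative; the hard limit and any permanent change are governed by the OS, e.g. /etc/security/limits.conf on Linux).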
Re: Terms and termscomponent questions
Ah, good. Good luck with the rest of your app! WordDelimiterFilterFactory is powerful, but tricky G... Best Erick On Thu, Feb 3, 2011 at 9:51 AM, openvictor Open openvic...@gmail.comwrote: Dear Erick, You were totally right about the fact that I didn't use any space to separate words, cause SolR to concatenate words ! Everything is solved now. Thank you very much for your help ! Best regards, Victor Kabdebon 2011/2/3 Erick Erickson erickerick...@gmail.com There are a couple of things going on here. First, WordDelimiterFilterFactory is splitting things up on letter/number boundaries. Take a look at: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a list of *some* of the available tokenizers. You may want to just use one of the others, or change the parameters to WordDelimiterFilterFilterFactory to not split as it is. See the page: http://localhost:8983/solr/admin/analysis.jsp and check the verbose box to see what the effects of the various elements in your analysis chain are. This is a very important page for understanding the analysis part of the whole operation. Second, if you've been trying different things out, you may well have some old stuff in your index. When you delete documents, the terms are still in the index until an optimize. I'd advise starting with a clean slate for your experiments each time. The cheap way to do this is stop your server and delete solr_home/data/index. Delete the index directory too, not just the contents. So it's possible your TermsComponent is returning data from previous attempts, because I sure don't see how the concatenated terms would be in this index given the definition you've posted. And if none of that works, well, we'll try something else G.. Best Erick On Tue, Feb 1, 2011 at 10:07 AM, openvictor Open openvic...@gmail.com wrote: Dear Erick, Thank you for your answer, here is my fieldtype definition. I took the standard one because I don't need a better one for this field fieldType name=text class=solr.TextField positionIncrementGap=100 analyzer type=index tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=1 catenateNumbers=1 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer analyzer type=query tokenizer class=solr.WhitespaceTokenizerFactory/ filter class=solr.SynonymFilterFactory synonyms=synonyms.txt ignoreCase=true expand=true/ filter class=solr.StopFilterFactory ignoreCase=true words=stopwords.txt enablePositionIncrements=true/ filter class=solr.WordDelimiterFilterFactory generateWordParts=1 generateNumberParts=1 catenateWords=0 catenateNumbers=0 catenateAll=0 splitOnCaseChange=1/ filter class=solr.LowerCaseFilterFactory/ filter class=solr.SnowballPorterFilterFactory language=English protected=protwords.txt/ /analyzer /fieldType Now my field : field name=p_field type=text indexed=true stored=true/ But I have a doubt now... Do I really put a space between words or is it just a coma... If I only put a coma then the whole process is going to be impacted ? What I don't really understand is that I find the separate words, but also their concatenation (but again in one direction only). 
Let me explain: if I have "man bear pig" I will find "manbearpig" and "bearpig", but never "pigman" or any other combination in a different order. Thank you very much Best Regards, Victor 2011/2/1 Erick Erickson erickerick...@gmail.com Nope, this isn't what I'd expect. There are a couple of possibilities: 1 check out what WordDelimiterFilterFactory is doing, although if you're really sending spaces that's probably not it. 2 Let's see the field and fieldType definitions for the field in question. type=text doesn't say anything about analysis, and that's where I'd expect you're having trouble. In particular if your analysis chain uses KeywordTokenizerFactory for instance. 3 Look at the admin/schema browse page, look at your field and see what the actual tokens are. That'll tell you what TermsComponents is returning, perhaps the concatenation is happening somewhere else. Bottom line: Solr will not concatenate terms like this unless you tell it to, so I suspect you're telling it to, you just don't realize it G... Best Erick On Tue, Feb 1, 2011 at 1:33 AM, openvictor Open openvic...@gmail.com wrote: Dear Solr users, I am currently using SolR and TermsComponents to make an autocomplete system
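For anyone else hitting the same symptom: Erick's point about changing the WordDelimiterFilterFactory parameters refers mainly to the catenate* flags, which glue split word parts back together into extra tokens. A minimal index-time sketch with catenation switched off (the field type name is illustrative, not from the thread):

<fieldType name="text_no_catenate" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- split on case change and letter/number boundaries, but do not emit catenated tokens -->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>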
Solr for finding similar word between two documents
Is there a way to use Solr to get the similar words between two documents (files)? Any ideas? Regards Rohan
What is the best protocol for data transfer rate HTTP or RMI?
Hello, I am doing a comparative study of Lucene and Solr and wish to obtain more concrete data on data transfer, comparing the Lucene RemoteSearch, which uses RMI, with the data transfer of Solr, which uses the HTTP protocol. Gustavo Maia
Use Parallel Search
Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed across three machines, each machine with 6 HDs (15k rpm). The server queries the search index using the remote search class, and each machine searches using parallel search (searching the 6 HDs simultaneously). So during a search we are using the three machines and 18 HDs simultaneously, which gives me a very good response time. Now I am studying Solr and am interested in knowing more about searching and the use of distributed parallel search on the same machine. What would be the best scenario using Solr that improves on what I already have today with Lucene alone? Note: do I need to have 6 Solr instances installed on each machine, one for each HD? Or is there some alternative way for me to use the 6 HDs without having 6 instances of the Solr server? Another question: does Solr have some limit on index size per hard drive? It would be good to keep each index from growing too large, because the larger the index gets, the longer the search takes. Thanks for everything. Gustavo Maia
Re: changing schema
It could be related to Tomcat. I've had inconsistent experiences there too; I _thought_ I could delete just the contents of the data/ directory, but at some point I realized that wasn't working, confusing me as to whether I was remembering correctly that deleting just the contents ever worked. At the moment, on my setup, I definitely need to delete the whole data/ directory. At one point I switched my setup from Jetty to Tomcat, but at about the same point I switched my setup from single core to multi-core too. So it could be a multi-core thing too (which seems somewhat more likely than Jetty vs Tomcat making a difference). Or it could be something completely else that none of us know; I just report my limited observations from experience. :) Jonathan On 2/3/2011 8:17 AM, Erick Erickson wrote: Erik: Is this a Tomcat-specific issue? Because I regularly delete just the data/index directory on my Windows box running Jetty without any problems. (3_x and trunk) Mostly want to know because I just encouraged someone to just delete the index dir based on my experience... Thanks Erick On Tue, Feb 1, 2011 at 12:24 PM, Erik Hatcher erik.hatc...@gmail.com wrote: the trick is, you have to remove the data/ directory, not just the data/index subdirectory. and of course then restart Solr. or delete *:*?commit=true, depending on what's the best fit for your ops. Erik On Feb 1, 2011, at 11:41, Dennis Gearon wrote: I tried removing the index directory once, and Tomcat refused to start up because it didn't have a segments file. - Original Message From: Erick Erickson erickerick...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, February 1, 2011 5:04:51 AM Subject: Re: changing schema That sounds right. You can cheat and just remove <solr_home>/data/index rather than delete *:* though (you should probably do that with the Solr instance stopped). Make sure to remove the directory index as well. Best Erick On Tue, Feb 1, 2011 at 1:27 AM, Dennis Gearon gear...@sbcglobal.net wrote: Anyone got a great little script for changing a schema? i.e., after changing: the database, the view in the database for data import, the data-config.xml file, and the schema.xml file, I BELIEVE that I have to run: a delete command for the whole index (*:*), a full import, and optimize. Does this all sound right? Dennis Gearon Signature Warning It is always a good idea to learn from your own mistakes. It is usually a better idea to learn from others’ mistakes, so you do not have to make them yourself. from 'http://blogs.techrepublic.com.com/security/?p=4501&tag=nl.e036' EARTH has a Right To Life, otherwise we all die.
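For completeness, the "delete *:*" alternative mentioned by Erik is an ordinary XML update message posted to the update handler; a minimal sketch (host, port and core path are illustrative):

<!-- POST to http://localhost:8983/solr/update -->
<delete><query>*:*</query></delete>
<!-- then make the deletion visible -->
<commit/>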
RE: Using terms and N-gram
I don't suppose it's something silly like the fact that your indexing chain includes 'words=stopwords.txt', and your query chain does not? Bob Sandiford | Lead Software Engineer | SirsiDynix P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com www.sirsidynix.com _ Early COSUGI birds get the worm! Register by 15 February and get a one time viewing of the three course Circulation Basics self-paced training suite. http://www.cosugi.org/ -Original Message- From: openvictor Open [mailto:openvic...@gmail.com] Sent: Thursday, February 03, 2011 12:02 AM To: solr-user@lucene.apache.org Subject: Using terms and N-gram Dear all, I am trying to implement an autocomplete system for research. But I am stuck on some problems that I can't solve. Here is my problem: I give text like "the cat is black" and I want to explore all 1-grams to 8-grams for all the text that is passed: the, cat, is, black, the cat, cat is, is black, etc... In order to do that I have defined the following fieldtype in my schema:

<!-- Custom fieldtype -->
<fieldType name="ngram_field" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" words="stopwords.txt" ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.CommonGramsFilterFactory" ignoreCase="true" maxGramSize="8" minGramSize="1"/>
  </analyzer>
</fieldType>

Then the following field:

<field name="p_title_ngram" type="ngram_field" indexed="true" stored="true"/>

Then I feed Solr with some phrases, and I was really surprised to see that Solr didn't behave as expected. I went to the schema browser to see the result for the very profound query "the cat is black and it rains". The results are quite disappointing: first, 1-grams are not found; some 2-grams are found, like the_cat, and_it, etc... but not what I expected. Is there something I am missing here? (By the way, I also tried removing minGramSize and maxGramSize, and even the words attribute.) Thank you, Victor Kabdebon
Re: Using terms and N-gram
First, you'll get a lot of insight by defining something simply and looking at the analysis page from the Solr admin. That's a very valuable page. To your question: CommonGrams are shingles that work between stopwords and other words. For instance, "this is some text" gets analyzed into this, this_is, is, is_some, some, text. Note that the stopwords are the only things that get combined with the text after. NGrams form on letters. It's too long to post the whole thing, but the above phrase gets analyzed as t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me... It splits a single token into grams, whereas CommonGrams essentially combines tokens when they're stopwords. Have you looked at shingles? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Best Erick On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open openvic...@gmail.com wrote: Thank you, I will do that and hopefully it will be handy! But can someone explain me the difference between CommonGramsFilterFactory and NGramFilterFactory? (Maybe the solution is there.) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what is happening at index time and query time with your input data. You can use highlighting to see if a match is found. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html Sent from the Solr - User mailing list archive at Nabble.com.
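A minimal field type along the lines Erick suggests, using ShingleFilterFactory (the type name and shingle size are illustrative; check the wiki page above for the exact attributes supported by your version):

<fieldType name="text_shingle" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit word n-grams ("shingles") up to 8 tokens long, keeping the single tokens too -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="8" outputUnigrams="true"/>
  </analyzer>
</fieldType>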
Re: Using terms and N-gram
Thank you for these inputs. I was silly asking for ngrams because I already knew it. I think I was tired yesterday... Thank you Eric Erickson, once again you gave me a more than useful comment. Indeed Shingles seems to be the perfect fit for the work I want to do. I will try to implement that tonight and I will come back to see if it's working. Regards, Victor 2011/2/3 Erick Erickson erickerick...@gmail.com First, you'll get a lot of insight by defining something simply and looking at the analysis page from solr admin. That's a very valuable page. To your question: commongrams are shingles that work between stopwords and other words. For instance, this is some text gets analyzed into this, this_is, is, is_some, some text. Note that the stopwords are the only things that get combined with the text after. NGrams form on letters. It's too long to post the whole thing, but the above phrase gets analyzed as t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a single token into grams whereas commongrams essentially combines tokens when they're stopwords. Have you looked at shingles? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Best Erick On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open openvic...@gmail.com wrote: Thank you, I will do that and hopefuly it will be handy ! But can someone explain me difference between CommonGramFIlterFactory et NGramFilterFactory ? ( Maybe the solution is there) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what happening at index time and query time with your input data.You can use highlighting to see if match found. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr for finding similar word between two documents
On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote: Is there a way to use solr and get similar words between two document (files). [...] This is *way* too vague to make any sense out of. Could you elaborate, as I could have sworn that what you seem to want is the essential function of a search engine. Regards, Gora
Re: Solr for finding similar word between two documents
Rohan : what you want to do can be done with fairly little effort if your documents have a limited size (up to a few MB), using common and basic structures like a HashMap. Do you have any additional information on your problem so that we can give you more useful input? 2011/2/3 Gora Mohanty g...@mimirtech.com On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote: Is there a way to use solr and get similar words between two document (files). [...] This is *way* too vague to make any sense out of. Could you elaborate, as I could have sworn that what you seem to want is the essential function of a search engine. Regards, Gora
Re: Use Parallel Search
Hello Gustavo, well, I did not use Nutch at all, but I have some experience with using Solr. In Solr you could use a multicore setup where each core points to a different hard drive of your server. As with separate Solr servers, each core is a separate index, so to query all drives of one server you have to make a distributed request to get the results from all cores (indices). You get a little HTTP overhead, because you have to send six HTTP requests per server to get your results. You could also set up 6 Solr instances per box, or 3 with two cores per instance, but I do not see any reason to do so. Could you please explain what you mean by "remote class search"? Is it a Nutch-specific thing I have never heard of before? There is no difference between a Lucene index created by Solr and a Lucene index created by Nutch or Lucene itself. Solr is just a server implementation of the Lucene framework. Regards On 03.02.2011 19:06, Gustavo Maia wrote: Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed on three machines and each machine with 6 hds (15k rmp). The server queries the search index using the remote class search. And each machine is made to search using the parallel search (search simultaneously in 6 hds). So during the search are simulating using the three machines and 18 hds, returning me to a very good response time. Today I am studying the SOLR and am interested in knowing more about the searches and use of distributed parallel search on the same machine. What would be the best scenario using SOLR that is better than I already am using today only with lucene? Note: I need to have installed on each machine 6 SOLR instantiate from my server? One for each hd? Or would some other alternative way for me to use the 6 hds without having 6 instances of SORL server? Another question would be if the SOLR would have some limiting size index for Hard drive? It would be interesting not index too big because when the index increased the longer the search. Thanks for everything. Gustavo Maia
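To make the distributed request Em describes concrete: with one Solr instance hosting several cores (one per drive), a single query can be fanned out over all of them with the shards parameter. A sketch, assuming cores named core1..core6 on the default port (names, host and port are illustrative):

http://localhost:8983/solr/core1/select?q=your+query&shards=localhost:8983/solr/core1,localhost:8983/solr/core2,localhost:8983/solr/core3,localhost:8983/solr/core4,localhost:8983/solr/core5,localhost:8983/solr/core6

Solr merges the per-core results into a single response, so the client itself only issues one request.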
Re: Solr for finding similar word between two documents
Let's say I have a document (file) which is large and contains words inside it, and the 2nd document is also a text file. The problem is to find all those words in the 2nd document which are present in the first document, when both of the files are large. Regards Rohan On Fri, Feb 4, 2011 at 1:01 AM, openvictor Open openvic...@gmail.com wrote: Rohan : what you want to do can be done with quite little effort if your document has a limited size (up to some Mo) with common and basic structures like Hasmap. Do you have any additional information on your problem so that we can give you more useful inputs ? 2011/2/3 Gora Mohanty g...@mimirtech.com On Thu, Feb 3, 2011 at 11:32 PM, rohan rai hiroha...@gmail.com wrote: Is there a way to use solr and get similar words between two document (files). [...] This is *way* too vague to make any sense out of. Could you elaborate, as I could have sworn that what you seem to want is the essential function of a search engine. Regards, Gora
Re: geodist and spacial search
Use a filter query? See the {!geofilt} stuff on the wiki page. That gives you your filter to restrict down your result set, then you can sort by exact distance to get your sort of just those docs that make it through the filter. On Feb 3, 2011, at 10:24 AM, Eric Grobler wrote: Hi Erick, Thanks I saw that example, but I am trying to sort by distance AND specify the max distance in 1 query. The reason is: running bbox on 2 million documents with a 20km distance takes only 200ms. Sorting 2 million documents by distance takes over 1.5 seconds! So it will be much faster for solr to first filter the 20km documents and then to sort them. Regards Ericz On Thu, Feb 3, 2011 at 1:27 PM, Erick Erickson erickerick...@gmail.com wrote: Further down that very page G... Here's an example of sorting by distance ascending: - ...q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist() asc http://localhost:8983/solr/select?wt=json&indent=true&fl=name,store&q=*:*&sfield=store&pt=45.15,-93.85&sort=geodist()%20asc The key is just the sort=geodist(), I'm pretty sure that's independent of the bbox, but I could be wrong. Best Erick On Wed, Feb 2, 2011 at 11:18 AM, Eric Grobler impalah...@googlemail.com wrote: Hi In http://wiki.apache.org/solr/SpatialSearch there is an example of a bbox filter and a geodist function. Is it possible to do a bbox filter and sort by distance - combine the two? Thanks Ericz -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
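Putting Grant's suggestion together with Erick's example, a combined filter-then-sort request might look like this (field name, point and radius are illustrative; the {!geofilt} syntax is the one documented on the SpatialSearch wiki page, with d given in kilometres):

http://localhost:8983/solr/select?q=*:*&fq={!geofilt}&sfield=store&pt=45.15,-93.85&d=20&sort=geodist()%20asc

The fq restricts the result set to documents within 20 km of the point first, and geodist() then only has to sort the documents that survive the filter.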
DataImportHandler usage with RDF database
Hello List, I am very interested in DataImportHandler. I have data stored in an RDF db and wish to use this data to boost query results via Solr. I wish to keep this data stored in db as I have a web app which directly maintains this db. Is it possible to use a DataImportHandler to read RDF data from db in memory, without sending an index commit to Solr. As far as I can see DataImportHandler currently supports full and delta imports which mean I would be indexing. So far I have yet to find a requestHandler which is able to read then store data in memory, then use this data elsewhere prior to returning documents via queryResponseWriter. Can anyone provide their thoughts/insight Thank you Lewis Glasgow Caledonian University is a registered Scottish charity, number SC021474 Winner: Times Higher Education's Widening Participation Initiative of the Year 2009 and Herald Society's Education Initiative of the Year 2009. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,6219,en.html Winner: Times Higher Education's Outstanding Support for Early Career Researchers of the Year 2010, GCU as a lead with Universities Scotland partners. http://www.gcu.ac.uk/newsevents/news/bycategory/theuniversity/1/name,15691,en.html
Re: Use Parallel Search
Can you describe a bit more what you are searching (types of docs) and what your query rate looks like? Also, what features are you using? Faceting? Sorting? ... On Feb 3, 2011, at 1:06 PM, Gustavo Maia wrote: Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed on three machines and each machine with 6 hds (15k rmp). The server queries the search index using the remote class search. And each machine is made to search using the parallel search (search simultaneously in 6 hds). So during the search are simulating using the three machines and 18 hds, returning me to a very good response time. Today I am studying the SOLR and am interested in knowing more about the searches and use of distributed parallel search on the same machine. What would be the best scenario using SOLR that is better than I already am using today only with lucene? Note: I need to have installed on each machine 6 SOLR instantiate from my server? No, you generally treat Solr like a database and provision it separately from you app. 30M docs may very well all fit nicely on one machine depending on some of your answers above (I've certainly seen bigger) One for each hd? Or would some other alternative way for me to use the 6 hds without having 6 instances of SORL server? I'd probably start simple and see what I can do in 1 instance of Solr and what query/indexing throughput you can get. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem docs using Solr/Lucene: http://www.lucidimagination.com/search
Index Not Matching
Greetings! My organization is new to SOLR, so please bear with me. At times, we experience an out-of-sync condition between the SOLR index files and our database. We resolved that by clearing the index file and performing a full crawl of the database. The last time we noticed an out-of-sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, searching for swim on the DB we get 440 products, yet SOLR states we have 214 products. Has anyone experienced anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR, so any help you can provide is greatly appreciated. Thanks! Will
HTTP ERROR 400 undefined field: *
Hey Guys, I was working on a checkout of the 3.x branch from about 6 months ago. Everything was working pretty well, but we decided that we should update and get what was at the head. However, after upgrading, I am now getting this error through the admin: HTTP ERROR 400 undefined field: * If I clear the fl parameter (the default is set to *, score) then it works fine, with one big problem: no score data. If I try and set fl=score I get the same error, except it says undefined field: score?! This works great in the older version; what changed? I've googled for about an hour now and I can't seem to find anything. Jed.
Re: Index Not Matching
Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
Re: Function Question
Thoughts? On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell billnb...@gmail.com wrote: This is posted as an enhancement on SOLR-2345. I am willing to work on it. But I am stuck. I would like to loop through the lat/long values when they are stored in a multiValued list, but it appears that I cannot figure out how to do that. For example: sort=geodist() asc This should grab the closest point in the multiValued list and return the distance so that it can be scored. The problem is I cannot find a way to get the multiValued list. The function in src/java/org/apache/solr/search/function/distance/HaversineConstFunction.java has code similar to:

VectorValueSource p2;
this.p2 = vs;
List<ValueSource> sources = p2.getSources();
ValueSource latSource = sources.get(0);
ValueSource lonSource = sources.get(1);
DocValues latVals = latSource.getValues(context1, readerContext1);
DocValues lonVals = lonSource.getValues(context1, readerContext1);
double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS;
etc...

It would be good if I could loop through sources.get(), but it only returns 2 sources even when there are 2 pairs of lat/long. The getSources() only returns the following: sources:[double(store_0_coordinate), double(store_1_coordinate)] How do I just get the 4 values in the function?
Re: Index Not Matching
that's odd..are you viewing the results through your application or the admin console? if you aren't, I'd suggest you use the admin console just to eliminate the possibility of an application bug. We had a similar problem in the past and turned out to be a mixup of our dev/test instances.. On 3 February 2011 21:41, Esclusa, Will william.escl...@bonton.com wrote: Hello Saavs, I am 100% sure we are not updating the DB after we index the data. We are specifying the same fields on both queries. Our prod boxes do not have access to QA or DEV, so I would expect a connection error when indexing if this is the case. No connection errors in the logs. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:26 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
DB2 and DataImportHandler
I get the following error when trying to index using a DataImportHandler with solr 1.4.1. I see that there is an open JIRA with no resolution. Do I have to write my own data import handler to work around this issue? Thanks, Mark Feb 3, 2011 5:21:09 PM org.apache.solr.handler.dataimport.JdbcDataSource closeConnection SEVERE: Ignoring Error when closing connection com.ibm.db2.jcc.b.SqlException: [jcc][t4][10251][10308][3.50.152] java.sql.Connection.close() requested while a transaction is in progress on the connection. The transaction remains active, and the connection cannot be closed. ERRORCODE=-4471, SQLSTATE=null at com.ibm.db2.jcc.b.wc.a(wc.java:55) at com.ibm.db2.jcc.b.wc.a(wc.java:119) at com.ibm.db2.jcc.b.eb.t(eb.java:996) at com.ibm.db2.jcc.b.eb.w(eb.java:1019) at com.ibm.db2.jcc.b.eb.u(eb.java:1005) at com.ibm.db2.jcc.b.eb.close(eb.java:989) at org.apache.solr.handler.dataimport.JdbcDataSource.closeConnection(JdbcDataSource.java:399) at org.apache.solr.handler.dataimport.JdbcDataSource.close(JdbcDataSource.java:390) at org.apache.solr.handler.dataimport.DataConfig$Entity.clearCache(DataConfig.java:173) at org.apache.solr.handler.dataimport.DataConfig.clearCaches(DataConfig.java:331) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:339) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370)
Re: Index Not Matching
Make sure your index is completely commited. curl 'http://localhost:8983/solr/update?commit=true' http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 for an overview: http://lucene.apache.org/solr/tutorial.html hth, Geert-Jan http://techgurulive.com/2010/11/22/apache-solr-commit-and-optimize/ 2011/2/3 Esclusa, Will william.escl...@bonton.com Both the application and the SOLR gui match (with the incorrect number of course :-) ) At first I thought it could be a schema problem, but we went though it with a fine comb and compared it to the one in our stage environment. What is really weird is that I grabbed one of the product ID that are not showing up in SOLR from the DB, search through the SOLR GUI and it found it. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:57 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching that's odd..are you viewing the results through your application or the admin console? if you aren't, I'd suggest you use the admin console just to eliminate the possibility of an application bug. We had a similar problem in the past and turned out to be a mixup of our dev/test instances.. On 3 February 2011 21:41, Esclusa, Will william.escl...@bonton.com wrote: Hello Saavs, I am 100% sure we are not updating the DB after we index the data. We are specifying the same fields on both queries. Our prod boxes do not have access to QA or DEV, so I would expect a connection error when indexing if this is the case. No connection errors in the logs. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:26 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
RE: Index Not Matching
: At first I thought it could be a schema problem, but we went though it : with a fine comb and compared it to the one in our stage environment. : What is really weird is that I grabbed one of the product ID that are : not showing up in SOLR from the DB, search through the SOLR GUI and it : found it. unless i'm completely misunderstanding you, that means there is a record in Solr for that record in the DB -- which suggests the problem is not DB records getting indexed, it's analysis of some kind -- does a *:* (ie: return all docs) query to solr return the same number of results as a select count(*) query on the DB? there's really not enough info here to make any meaningful guesses as to the problem. -Hoss
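A quick way to run Hoss's count comparison on the Solr side (host, port and handler path are illustrative): request http://localhost:8983/solr/select?q=*:*&rows=0 and read the numFound attribute in the response, which is the total number of committed documents; compare that against a SELECT COUNT(*) on the source table.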
Re: Index Not Matching
which field type are you specifying in your schema.xml for the fields that you search upon? if you are using text then this causes your input text to be stemmed to a common root making your searches more flexible. For instance: if you have the term dreaming in one row/document and the term dream in another, then this could be stemmed to dreami or something like during indexing. This effectively causes both your documents to match when you search for dream in Solr but you would only return 1 result if you searched directly in your database. On 3 February 2011 22:37, Geert-Jan Brits gbr...@gmail.com wrote: Make sure your index is completely commited. curl 'http://localhost:8983/solr/update?commit=true' http://wiki.apache.org/solr/UpdateXmlMessages#A.22commit.22_and_.22optimize.22 for an overview: http://lucene.apache.org/solr/tutorial.html hth, Geert-Jan http://techgurulive.com/2010/11/22/apache-solr-commit-and-optimize/ 2011/2/3 Esclusa, Will william.escl...@bonton.com Both the application and the SOLR gui match (with the incorrect number of course :-) ) At first I thought it could be a schema problem, but we went though it with a fine comb and compared it to the one in our stage environment. What is really weird is that I grabbed one of the product ID that are not showing up in SOLR from the DB, search through the SOLR GUI and it found it. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:57 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching that's odd..are you viewing the results through your application or the admin console? if you aren't, I'd suggest you use the admin console just to eliminate the possibility of an application bug. We had a similar problem in the past and turned out to be a mixup of our dev/test instances.. On 3 February 2011 21:41, Esclusa, Will william.escl...@bonton.com wrote: Hello Saavs, I am 100% sure we are not updating the DB after we index the data. We are specifying the same fields on both queries. Our prod boxes do not have access to QA or DEV, so I would expect a connection error when indexing if this is the case. No connection errors in the logs. -Original Message- From: Savvas-Andreas Moysidis [mailto:savvas.andreas.moysi...@googlemail.com] Sent: Thursday, February 03, 2011 4:26 PM To: solr-user@lucene.apache.org Subject: Re: Index Not Matching Hello, Are you definitely positive your database isn't updated after you index your data? Are you querying against the same field(s) specifying the same criteria both in Solr and in the database? Any chance you might be pointing to a dev/test instance of Solr ? Regards, - Savvas On 3 February 2011 20:17, Esclusa, Will william.escl...@bonton.com wrote: Greetings! My organization is new to SOLR, so please bare with me. At times, we experience an out of sync condition between SOLR index files and our Database. We resolved that by clearing the index file and performing a full crawl of the database. Last time we noticed an out of sync condition, we went through our procedure of deleting and crawling, but this time it did not fix it. For example, search for swim on the DB and we get 440 products, but yet SOLR states we have 214 products. Has anyone experience anything like this? Does anyone have any suggestions on a trace we can turn on? Again, we are new to SOLR so any help you can provide is greatly appreciated. Thanks! Will
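To illustrate the stemming point: a stemmed field type typically runs a stemming filter at both index and query time, so variants such as dream and dreaming collapse to the same indexed term. A minimal sketch (the type name is illustrative):

<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- reduces "dreaming", "dreams" and "dream" to a common root at index and query time -->
    <filter class="solr.SnowballPorterFilterFactory" language="English"/>
  </analyzer>
</fieldType>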
RE: Index Not Matching
Hello Hoss, That is exactly what is going on. It seems to be failing in the analysis of the record. How do I get all the records from SOLR? http://localhost:8080/select?*.* ? Thanks! -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Thursday, February 03, 2011 5:42 PM To: solr-user@lucene.apache.org Subject: RE: Index Not Matching : At first I thought it could be a schema problem, but we went though it : with a fine comb and compared it to the one in our stage environment. : What is really weird is that I grabbed one of the product ID that are : not showing up in SOLR from the DB, search through the SOLR GUI and it : found it. unless i'm completely misunderstanding you, that means there is a record in Solr for that record in the DB -- which suggests the problem is not DB records getting indexed, it's analysis of some kind -- does a *:* (ie: return all docs) query to solr return the same number of results as a select count(*) query on the DB? there's really not enough info here to make any meaningful guesses as to the problem. -Hoss
response when using my own QParserPlugin
Hi, I wrote a QParserPlugin. When I hit solr and use this QParserPlugin, the response does not have the column names associated with the data such as: 0 29 0 {!tnav} faketn1 CA city san francisco US 10 - - 495,496,497 500,657,498,499 us:ca:san francisco faketn,fakeregression 037.74 -122.49 faketn1 faketn1 faketn1 faketn1 faketn1 99902837 +3774-12250|+3774-12250@1|+3772-12252@2 94116:us 495,496,497 fakecs,fakeatti,fakevenable 500,657,498,499 San Francisco 667 US 37.742369 -122.491240 boldMain Dishes/bold boldPancakes/bold faketn1 2.99 Enjoy best chinese food. faketn1 1;0:0:0:0:8:20% off.0:0:0:3:0.0 4158281775 94116 ACTION_MODEL TN CA 2350 Taraval St Enjoy best chinese food 40233 - 5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3 2027 - How do I get the data to be associated with the index columns so I can parse it and know the context of the data (such as this data is the business name, this data is the address, etc). --- i was hoping it return something like this or some sort of structure. ?xml version=1.0 encoding=UTF-8 ? - response - lstname=responseHeader intname=status0/int intname=QTime1/int - lstname=params strname=indenton/str strname=start0/str strname=qI_NAME_EXACT:faketn1/str strname=rows10/str strname=version2.2/str /lst /lst - resultname=responsenumFound=1start=0 - doc - arrname=I_BASE_ID str-/str str-/str /arr strname=I_BLOCK_CATEGORY_ID495,496,497/str strname=I_CATEGORY_ID500,657,498,499/str strname=I_CITY_DISTRICTus:ca:san francisco/str strname=I_KEYWORDfaketn,fakeregression/str strname=I_LAT_RANGE037.74/str strname=I_LON_RANGE-122.49/str strname=I_NAME_AS_KEYWORDfaketn1/str strname=I_NAME_ENUMfaketn1/str strname=I_NAME_EXACTfaketn1/str strname=I_NAME_NGRAMfaketn1/str strname=I_NAME_PACKfaketn1/str strname=I_POI_ID99902837/str strname=I_SPATIAL_BLOCK+3774-12250|+3774-12250@1|+3772-12252@2/str strname=I_ZIP_DISTRICT94116:us/str strname=S_BLOCK_CATEGORY_ID495,496,497/str strname=S_BLOCK_KEYWORDSfakecs,fakeatti,fakevenable/str strname=S_CATEGORY_ID500,657,498,499/str strname=S_CITYSan Francisco/str strname=S_COMPAIGN_ID667/str strname=S_COUNTRYUS/str str name=S_FAX/ strname=S_LATITUDE37.742369/str strname=S_LONGTITUDE-122.491240/str strname=S_MENUboldMain Dishes/bold boldPancakes/bold faketn1 2.99/str strname=S_MERCHANT_CONTENTEnjoy best chinese food./str strname=S_NAMEfaketn1/str strname=S_OFFERS1;0:0:0:0:8:20% off.0:0:0:3:0.0/str strname=S_PHONE_NUMBER4158281775/str strname=S_POSTALCODE94116/str strname=S_PRICEMODEACTION_MODEL/str strname=S_SOURCE_NAMETN/str str name=S_SPONSOREDTEXT/ strname=S_STATECA/str strname=S_STREET2350 Taraval St/str str name=S_STREET2/ str name=S_SUIT/ strname=S_TAGLINEEnjoy best chinese food/str strname=S_TARGET_DISTANCE_IN_METER40233/str strname=S_TA_ID-/str strname=S_USER_ACTIONS5;10:ACTION_MAP0:3:0.315:ACTION_DRIVE_TO0:3:0.517:ACTION_IMPRESSION0:6:0.005014:ACTION_PROFILE0:3:0.111:ACTION_CALL0:3:0.3/str strname=S_VENDOR_ID2027/str str name=S_WEBURL/ strname=S_YPC_ID-/str /doc /result /response Tri
Re: Scale out design patterns
I am thinking along the same lines. I could shard based on a field, but there are two practical difficulties. 1. If a normal user logs in, the results can be fetched from the corresponding search server, but if an admin user logs in, they may need to see all data; the query should then be issued across servers and the results consolidated. 2. Consider a scenario where I am sharding based on the user: I have a single search server and it is handling 1000 members. Now, as memory consumption is high, I add one more search server. New users can go to the second server, but what about the old users? Their data will still be added to server 1. How do I address this issue? Is rebuilding the index the only way? Could anyone share their experience of how they solved scale-out problems? Regards Ganesh - Original Message - From: Anshum ansh...@gmail.com To: java-u...@lucene.apache.org Sent: Friday, January 21, 2011 12:04 PM Subject: Re: Scale out design patterns Hi Ganesh, I'd suggest, if you have a particular dimension/field on which you could shard your data such that the query/data breakup gets predictable, that would be a good way to scale out e.g. if you have users which are equally active/searched then you may want to split their data on a simple mod of some numeric (auto increment) userid. This works well under normal cases unless your partitioning is not predictable. -- Anshum Gupta http://ai-cafe.blogspot.com On Fri, Jan 21, 2011 at 10:52 AM, Ganesh emailg...@yahoo.co.in wrote: Hello all, Could you any one guide me what all the various ways we could scale out? 1. Index: Add data to the nodes in round-robin. Search: Query all the nodes and cluster the results using carrot2. 2.Horizontal partitioning and No shared architecture, Index: Split the data based on userid and index few set of users data in each node. Search: Have a mapper kind of application which could tell which userid is mapped to node, redirect the search traffic to corresponding node. Which one is best? Did you guys tried any of these approach. Please share your thoughts. Regards Ganesh Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php
Re: Function Question
I like it. You would think it would be easy to get the values from a multiValue field in the geodist() function, but I guess it was not built for that. If anyone has done something similar, let me know. Thanks. Bill On Thu, Feb 3, 2011 at 3:18 PM, Geert-Jan Brits gbr...@gmail.com wrote: I don't have a direct answer to your question, but you could consider having fields: latCombined and LongCombined where you pairwise combine the latitudes and longitudes, e.g: latCombined: 48.0-49.0-50.0 longcombined: 2.0-3.0-4.0 Than in your custom scorer above split latCombined and longcombined and calculate the closests distance to the user-defined point. hth, Geert-Jan 2011/2/3 William Bell billnb...@gmail.com Thoughts? On Wed, Feb 2, 2011 at 10:38 PM, Bill Bell billnb...@gmail.com wrote: This is posted as an enhancement on SOLR-2345. I am willing to work on it. But I am stuck. I would like to loop through the lat/long values when they are stored in a multiValue list. But it appears that I cannot figure out to do that. For example: sort=geodist() asc This should grab the closest point in the MultiValue list, and return the distance so that is can be scored. The problem is I cannot find a way to get the MultiValue list? In function: src/java/org/apache/solr/search/function/distance/HaversineConstFunction.ja va Has code similar to: VectorValueSource p2; this.p2 = vs ListValueSource sources = p2.getSources(); ValueSource latSource = sources.get(0); ValueSource lonSource = sources.get(1); DocValues latVals = latSource.getValues(context1, readerContext1); DocValues lonVals = lonSource.getValues(context1, readerContext1); double latRad = latVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS; double lonRad = lonVals.doubleVal(doc) * DistanceUtils.DEGREES_TO_RADIANS; etc... It would be good if I could loop through sources.get() but it only returns 2 sources even when there are 2 pairs of lat/long. The getSources() only returns the following: sources:[double(store_0_coordinate), double(store_1_coordinate)] How do I just get the 4 values in the function?
Re: response when using my own QParserPlugin
Are you looking at your output in a browser? Which browser are you using? If Chrome, then look at the page source (view source); it will give you the desired XML output. Or switch to another browser to see the XML output. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/response-when-using-my-own-QParserPlugin-tp2419367p2421499.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTTP ERROR 400 undefined field: *
How did you upgrade? Did you change everything (all jars, data, config), or are you still using anything from the older version? - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/HTTP-ERROR-400-undefined-field-tp2417938p2421569.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: HTTP ERROR 400 undefined field: *
: I was working on an checkout of the 3.x branch from about 6 months ago. : Everything was working pretty well, but we decided that we should update and : get what was at the head. However after upgrading, I am now getting this FWIW: please be specific. head of what? the 3x branch? or trunk? what revision in svn does that correspond to? (the svnversion command will tell you) : HTTP ERROR 400 undefined field: * : : If I clear the fl parameter (default is set to *, score) then it works fine : with one big problem, no score data. If I try and set fl=score I get the same : error except it says undefined field: score?! : : This works great in the older version, what changed? I've googled for about : an hour now and I can't seem to find anything. i can't reproduce this using either trunk (r1067044) or 3x (r1067045) all of these queries work just fine... http://localhost:8983/solr/select/?q=* http://localhost:8983/solr/select/?q=solr&fl=*,score http://localhost:8983/solr/select/?q=solr&fl=score http://localhost:8983/solr/select/?q=solr ...you'll have to provide us with a *lot* more details to help understand why you might be getting an error (like: what your configs look like, what the request looks like, what the full stack trace of your error is in the logs, etc...) -Hoss
Re: facet.mincount
Thanks to all On 3 February 2011 20:21, Grijesh pintu.grij...@gmail.com wrote: Hi, facet.mincount does not work with the facet.date option, AFAIK. There is an issue for it, SOLR-343, but it has been resolved. Trying to apply the patch provided as a solution in that issue may solve the problem. The fix version for this may be 1.5. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/facet-mincount-tp2411930p2414232.html Sent from the Solr - User mailing list archive at Nabble.com. -- Thanks Regards, Isan Fulia.
Re: Use Parallel Search
I am having similar kind of problem. I need to scale out. Could you explain how you have done distributed indexing and search using Lucene. Regards Ganesh - Original Message - From: Gustavo Maia gust...@goshme.com To: solr-user@lucene.apache.org Sent: Thursday, February 03, 2011 11:36 PM Subject: Use Parallel Search Hello, Let me give a brief description of my scenario. Today I am only using Lucene 2.9.3. I have an index of 30 million documents distributed on three machines and each machine with 6 hds (15k rmp). The server queries the search index using the remote class search. And each machine is made to search using the parallel search (search simultaneously in 6 hds). So during the search are simulating using the three machines and 18 hds, returning me to a very good response time. Today I am studying the SOLR and am interested in knowing more about the searches and use of distributed parallel search on the same machine. What would be the best scenario using SOLR that is better than I already am using today only with lucene? Note: I need to have installed on each machine 6 SOLR instantiate from my server? One for each hd? Or would some other alternative way for me to use the 6 hds without having 6 instances of SORL server? Another question would be if the SOLR would have some limiting size index for Hard drive? It would be interesting not index too big because when the index increased the longer the search. Thanks for everything. Gustavo Maia Send free SMS to your Friends on Mobile from your Yahoo! Messenger. Download Now! http://messenger.yahoo.com/download.php
Re: Using terms and N-gram
Okay so as suggested Shingle works perfectly well for what I need ! Thank you Erick 2011/2/3 openvictor Open openvic...@gmail.com Thank you for these inputs. I was silly asking for ngrams because I already knew it. I think I was tired yesterday... Thank you Eric Erickson, once again you gave me a more than useful comment. Indeed Shingles seems to be the perfect fit for the work I want to do. I will try to implement that tonight and I will come back to see if it's working. Regards, Victor 2011/2/3 Erick Erickson erickerick...@gmail.com First, you'll get a lot of insight by defining something simply and looking at the analysis page from solr admin. That's a very valuable page. To your question: commongrams are shingles that work between stopwords and other words. For instance, this is some text gets analyzed into this, this_is, is, is_some, some text. Note that the stopwords are the only things that get combined with the text after. NGrams form on letters. It's too long to post the whole thing, but the above phrase gets analyzed as t, h, i, s, th, hi, is, i, s, is, s, o, m, e, so, om, me.. It splits a single token into grams whereas commongrams essentially combines tokens when they're stopwords. Have you looked at shingles? See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory Best Erick On Thu, Feb 3, 2011 at 10:15 AM, openvictor Open openvic...@gmail.com wrote: Thank you, I will do that and hopefuly it will be handy ! But can someone explain me difference between CommonGramFIlterFactory et NGramFilterFactory ? ( Maybe the solution is there) Thank you all, best regards 2011/2/3 Grijesh pintu.grij...@gmail.com Use analysis.jsp to see what happening at index time and query time with your input data.You can use highlighting to see if match found. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Using-terms-and-N-gram-tp2410938p2411244.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SOLR 1.4 and Lucene 3.0.3 index problem
Hi, I would not try to change the Lucene version in Solr 1.4.1 from 2.9.x to 3.0.x. As Koji said, the best solution is to get the 3.x branch or the trunk and build it. You need svn and ant.

1. Create a working directory
$ mkdir ~/solr

2. Get the source
$ cd ~/solr
$ svn co http://svn.apache.org/repos/asf/lucene/dev/trunk
or
$ svn co http://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x

3. Build
$ cd ~/solr/modules
$ ant compile
$ cd ~/solr/lucene
$ ant dist
$ cd ~/solr/modules
$ ant dist

Dominique On 02/02/11 12:47, Churchill Nanje Mambe wrote: thanks guys I will try the trunk as for unpacking the war and changing the lucene... I am not an expert and this may get complicated for me, maybe over time when I am comfortable Mambe Churchill Nanje 237 33011349, AfroVisioN Founder, President,CEO http://www.afrovisiongroup.com | http://mambenanje.blogspot.com skypeID: mambenanje www.twitter.com/mambenanje On Wed, Feb 2, 2011 at 8:03 AM, Grijesh pintu.grij...@gmail.com wrote: You can extract the solr.war using Java's jar -xvf solr.war command, replace the lucene-2.9.jar with your lucene-3.0.3.jar in the WEB-INF/lib directory, then use jar -cvf solr.war * to pack the war again. Deploy that war. Hope that works. - Thanx: Grijesh -- View this message in context: http://lucene.472066.n3.nabble.com/SOLR-1-4-and-Lucene-3-0-3-index-problem-tp2396605p2403542.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr faceting on score
No, you cannot get facets on score; score is not a field defined in the schema, and facets can only be computed on fields defined in the schema, as far as I know. - Thanx: Grijesh http://lucidimagination.com -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceting-on-score-tp2422076p2422121.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr faceting on score
Thanks for reply -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-faceting-on-score-tp2422076p2422147.html Sent from the Solr - User mailing list archive at Nabble.com.
Problem in faceting
Dear sir, I have a problem with faceting. I am searching for the text "water treatment plant" on Solr using the dismax request handler. The final query which goes to Solr is here:

<str name="parsedquery_toString">+((TDR_SUBIND_PROD_NAMES:water^2.5 | TDR_SUBIND_LOC_ZIP:water^2.5 | TDR_SUBIND_COMP_NAME:water^1.5 | TDR_SUBIND_TENDER_NO:water | TDR_SUBIND_SUBTDR_SHORT:water^3.0 | TDR_SUBIND_SUBTDR_DETAILS:water^2.0 | TDR_SUBIND_LOC_CITY:water^3.0 | TDR_SUBIND_LOC_STATE:water^3.0 | TDR_SUBIND_NAME:water^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:treatment^2.5 | TDR_SUBIND_LOC_ZIP:treatment^2.5 | TDR_SUBIND_COMP_NAME:treatment^1.5 | TDR_SUBIND_TENDER_NO:treatment | TDR_SUBIND_SUBTDR_SHORT:treatment^3.0 | TDR_SUBIND_SUBTDR_DETAILS:treatment^2.0 | TDR_SUBIND_LOC_CITY:treatment^3.0 | TDR_SUBIND_LOC_STATE:treatment^3.0 | TDR_SUBIND_NAME:treatment^1.5)~0.2 (TDR_SUBIND_PROD_NAMES:plant^2.5 | TDR_SUBIND_LOC_ZIP:plant^2.5 | TDR_SUBIND_COMP_NAME:plant^1.5 | TDR_SUBIND_TENDER_NO:plant | TDR_SUBIND_SUBTDR_SHORT:plant^3.0 | TDR_SUBIND_SUBTDR_DETAILS:plant^2.0 | TDR_SUBIND_LOC_CITY:plant^3.0 | TDR_SUBIND_LOC_STATE:plant^3.0 | TDR_SUBIND_NAME:plant^1.5)~0.2) (TDR_SUBIND_SUBTDR_DETAILS:water treatment plant^10.0 | TDR_SUBIND_COMP_NAME:water treatment plant^20.0 | TDR_SUBIND_SUBTDR_SHORT:water treatment plant^15.0)~0.2</str>

Now I want to facet over only those results which have the complete text "water treatment plant" in them, i.e. the records which match "water treatment plant" completely. I do not want to facet on results which match only 1 or 2 words, like "water" or "treatment". The main problem: there is a field FACET_CITY in my schema.xml, and I want to find only those cities for which the complete text "water treatment plant" matches. I do not want those cities for which only the words "water" or "treatment" match. I have two possibilities to achieve this functionality: 1. Either, somehow, get the list of cities for which the complete text matches, i.e. facet only on the documents matching the complete text, OR 2. Facet over only the first 100 documents (for example, the 100 documents with the highest score) for the city list. Please suggest how I can achieve this. -- View this message in context: http://lucene.472066.n3.nabble.com/Problem-in-faceting-tp2422182p2422182.html Sent from the Solr - User mailing list archive at Nabble.com.
Facet Query
Hi, Do the facet.query and fq parameters work only for range queries? Can I make a general query with them, like facet.query=city:mumbai, and get results back? Please suggest. When I made this query I only got a count back for it. How can I get the documents for it? -- View this message in context: http://lucene.472066.n3.nabble.com/Facet-Query-tp2422212p2422212.html Sent from the Solr - User mailing list archive at Nabble.com.