Re: Loading lineshape data into Solr
Anyone?

On 15-04-29 09:07 PM, Arthur Zubarev wrote:

Hi Solr community,

My immediate task at hand is to load lineshape data into Solr (the lineshape data is a set of points on a curve, given as latitude and longitude coordinates). The data sits in a SQL Server 2012 table. Extracting the data to a flat file is not possible because the lineshape data comes out as unreadable binary. The other columns hold streets, points of interest, etc.

The end result of the undertaking would be a Solr query that locates an address based on lat+long.

Any hints/tips are welcome!

Thank you!

Regards,
Arthur
Re: Upgraded to 4.10.3, highlighting performance unusably slow
Hi,

Can you also include the details of your research that narrowed the issue to the highlighter?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <michael.r...@lexisnexis.com> wrote:

> Are you able to identify if there is a particular part of the code that is
> slow?
>
> A simple way to do this is to use the jstack command (assuming your server
> has the full JDK installed). You can run it like this:
> /path/to/java/bin/jstack PID
>
> If you run that a bunch of times while your highlight query is running,
> you might be able to spot the hotspot. Usually I'll do something like this
> to see the stacktrace for the thread running the query:
> /path/to/java/bin/jstack PID | grep SearchHandler -B30
>
> A few more questions:
> - What are response times you are seeing before and after the upgrade? Is
> "unusably slow" 1 second, 10 seconds...?
> - If you run the exact same query multiple times, is it consistently slow?
> Or is it only slow on the first run?
> - While the query is running, do you see high user CPU on your server, or
> high IO wait, or both? (You can check this with the top command or vmstat
> command in Linux.)
>
> -Michael
>
> -Original Message-
> From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
> Sent: Saturday, May 02, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Upgraded to 4.10.3, highlighting performance unusably slow
>
> Hello,
>
> We recently upgraded solr from 3.8.0 to 4.10.3. We saw that this upgrade
> caused a incredible slowdown in our searches. We were able to narrow it
> down to the highlighting. The slowdown is extreme enough that we are
> holding back our release until we can resolve this. Our research indicated
> using TermVectors & FastHighlighter were the way to go, however this still
> does nothing for the performance. I think we may be overlooking a crucial
> configuration, but cannot figure it out. I was hoping for some guidance and
> help. Sorry for the long email, I wanted to provide enough information.
>
> Our documents are largely dynamic fields, and so we have been using ‘*’ as
> the field for highlighting. This is the same setting as in prior versions
> of solr use. The dynamic fields are of type ’text’ and we added
> customizations to the schema.xml for the type ’text’:
>
> storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
> termOffsets="true">
>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> protected="protwords.txt"/>
>
> words="stopwords.txt" enablePositionIncrements="true"/>
> generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> protected="protwords.txt"/>
>
> One of the two dynamic fields we use:
>
> stored="true" required="false" multiValued="true"/>
>
> In our solrConfig.xml file, we have:
>
> name="defaults"> explicit
> 13
> true
> true
>
> tvComponent
>
> class="solr.highlight.GapFragmenter">
> 100
>
> 70
> 0.5
> [-\w ,/\n\"']{20,200}
>
> class="solr.highlight.HtmlFormatter">
>
> class="solr.highlight.SimpleFragListBuilder"/>
> class="solr.highlight.SingleFragListBuilder"/>
> class="solr.highlight.WeightedFragListBuilder"/>
> class="solr.highlight.ScoreOrderFragmentsBuilder">
>
> class="solr.highlight.ScoreOrderFragmentsBuilder">
>
> class="solr.highlight.SimpleBoundaryScanner">
> 10
> .,!?
>
> class="solr.highlight.BreakIteratorBoundaryScanner">
> WORD
> en
> US
>
> And in our code:
>
> final SolrQuery query = new SolrQuery( luceneQueryStr );
> query.setRequestHandler("/eiHandler");
> query.setStart( request.getStartIndex() );
> query.setRows( request.getMaxResults() );
> query.setSort(new SortClause(request.getSortOrder().getFieldName(),
>     request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
> query.addHighlightField( "*" );
> query.setFields( "*", "score" );
>
> Any assistance is greatly appreciated. Thank you.
>
> Sincerely,
> Sophia
Re: "Avoiding" a schema.xml
Thanks!

Indeed, one of my issues is that I cannot know which fields will be indexed before I have seen the browsed documents and run some entity extraction on them. That is why I was hoping to avoid defining a schema ...

The schema API sounds interesting! Is it available via SolrJ?

Many thanks!

Benjamin

On Thu, Apr 30, 2015 at 6:27 PM, Erick Erickson wrote:

> Could you explain a bit more _why_ you want to do this? As you're
> probably well aware, there are multiple ways to shoot yourself in the
> foot in lower-level Lucene.
>
> If you have some situation where you're creating indexes on the fly
> that may vary then you could consider the "managed schema" that lets
> you create a schema via API calls, then you wouldn't need to mess with
> editing the schema.xml file for instance.
>
> Best,
> Erick
>
> On Thu, Apr 30, 2015 at 8:12 AM, Shawn Heisey wrote:
> > On 4/30/2015 8:43 AM, Sznajder ForMailingList wrote:
> >> I am interested to index some documents in Solr, as I did in Lucene.
> >>
> >> I mean: giving via solrJ all the information about the field I am adding
> >> (Tokenize, store, facet etc...)
> >>
> >> can we do that? Or is it mandatory to define a schema on the collection?
> >
> > All that information is defined on the server. You do not have direct
> > access to the Lucene index - Solr is intended as an abstraction, so the
> > admin and the users/applications that use Solr do not need to understand
> > all the low-level details that go into a Lucene application. The admin
> > just has to deal with configuration files like schema.xml, and the users
> > just need to know what fields are in each document and how the query
> > syntax works. Deeper Lucene knowledge is helpful, but not strictly
> > necessary.
> >
> > If you want Lucene-level control, you'll need to write the search server
> > yourself using Lucene. If you have very specific needs that Solr's
> > approach can't satisfy, you always have this option.
> >
> > The newest Solr versions do have an example of what's known as a
> > "data-driven" schema, or schemaless mode. In this mode, Solr builds up
> > the schema automatically, guessing the field type based on what kind of
> > data is the first to arrive for each field. This is good for
> > prototyping, but for production use, I would want to be in full manual
> > control of the schema.
> >
> > Thanks,
> > Shawn
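Erick's pointer to the managed schema can be made a little more concrete with a rough sketch. It assumes a Solr release whose Schema API accepts `add-field` commands POSTed as JSON to `/solr/<collection>/schema`, and a collection that is configured with a managed schema. The class name, the example URL, and the field name and type are hypothetical, and field names are assumed to need no JSON escaping. It uses plain HTTP from Java rather than any SolrJ-specific call.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class SchemaFieldCommand {

    // Builds the JSON body for an add-field command.
    // Assumes name and type contain no characters that need JSON escaping.
    static String addField(String name, String type, boolean stored, boolean indexed) {
        return "{\"add-field\":{"
                + "\"name\":\"" + name + "\","
                + "\"type\":\"" + type + "\","
                + "\"stored\":" + stored + ","
                + "\"indexed\":" + indexed
                + "}}";
    }

    // POSTs the command to the schema endpoint and returns the HTTP status code.
    static int post(String schemaUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(schemaUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical field discovered by entity extraction.
        String command = addField("person_names", "text_general", true, true);
        System.out.println(command);
        // Only contact a server when a schema URL is passed, e.g.
        // http://localhost:8983/solr/mycollection/schema (hypothetical).
        if (args.length > 0) {
            System.out.println("HTTP " + post(args[0], command));
        }
    }
}
```

With this approach the application can add each newly discovered field before indexing documents that use it, while the field definitions still live on the server as Shawn describes.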
RE: Upgraded to 4.10.3, highlighting performance unusably slow
Are you able to identify if there is a particular part of the code that is slow?

A simple way to do this is to use the jstack command (assuming your server has the full JDK installed). You can run it like this:

    /path/to/java/bin/jstack PID

If you run that a bunch of times while your highlight query is running, you might be able to spot the hotspot. Usually I'll do something like this to see the stacktrace for the thread running the query:

    /path/to/java/bin/jstack PID | grep SearchHandler -B30

A few more questions:
- What are response times you are seeing before and after the upgrade? Is "unusably slow" 1 second, 10 seconds...?
- If you run the exact same query multiple times, is it consistently slow? Or is it only slow on the first run?
- While the query is running, do you see high user CPU on your server, or high IO wait, or both? (You can check this with the top command or vmstat command in Linux.)

-Michael

-Original Message-
From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
Sent: Saturday, May 02, 2015 4:13 PM
To: solr-user@lucene.apache.org
Subject: Upgraded to 4.10.3, highlighting performance unusably slow

Hello,

We recently upgraded solr from 3.8.0 to 4.10.3. We saw that this upgrade caused a incredible slowdown in our searches. We were able to narrow it down to the highlighting. The slowdown is extreme enough that we are holding back our release until we can resolve this. Our research indicated using TermVectors & FastHighlighter were the way to go, however this still does nothing for the performance. I think we may be overlooking a crucial configuration, but cannot figure it out. I was hoping for some guidance and help. Sorry for the long email, I wanted to provide enough information.

Our documents are largely dynamic fields, and so we have been using ‘*’ as the field for highlighting. This is the same setting as in prior versions of solr use. The dynamic fields are of type ’text’ and we added customizations to the schema.xml for the type ’text’:

One of the two dynamic fields we use:

In our solrConfig.xml file, we have:

explicit 13 true true tvComponent 100 70 0.5 [-\w ,/\n\"']{20,200} 10 .,!? WORD en US

And in our code:

    final SolrQuery query = new SolrQuery( luceneQueryStr );
    query.setRequestHandler("/eiHandler");
    query.setStart( request.getStartIndex() );
    query.setRows( request.getMaxResults() );
    query.setSort(new SortClause(request.getSortOrder().getFieldName(),
        request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
    query.addHighlightField( "*" );
    query.setFields( "*", "score" );

Any assistance is greatly appreciated. Thank you.

Sincerely,
Sophia
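The `jstack PID | grep SearchHandler -B30` step in Michael's reply can also be sketched in Java for servers where a fixed `-B30` window is too coarse. This is only an illustration: the class and method names are hypothetical, and it assumes the dump separates threads with blank lines, as jstack output normally does. It reads a dump from standard input and prints only the thread blocks that mention the keyword.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

class JstackFilter {

    // Splits a thread dump into blocks separated by blank lines and keeps
    // only the blocks that contain the keyword (e.g. "SearchHandler").
    static List<String> threadsMentioning(String dump, String keyword) {
        List<String> hits = new ArrayList<>();
        for (String block : dump.split("\\R\\s*\\R")) {
            if (block.contains(keyword)) {
                hits.add(block.trim());
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        String keyword = args.length > 0 ? args[0] : "SearchHandler";
        StringBuilder dump = new StringBuilder();
        // Read the whole dump from standard input.
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            String line;
            while ((line = in.readLine()) != null) {
                dump.append(line).append('\n');
            }
        }
        for (String block : threadsMentioning(dump.toString(), keyword)) {
            System.out.println(block);
            System.out.println();
        }
    }
}
```

One way to use it would be to pipe several dumps through it while the slow highlight query runs, for example `/path/to/java/bin/jstack PID | java JstackFilter SearchHandler`, and compare which frames keep reappearing.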
Indexing
Hi,

Is there any way to replicate directly from MySQL to Solr using the binlog, instead of using the import handler? My MySQL queries take a long time to fetch the data. Is there any solution? How can I minimize the data fetch latency?

I have seen https://github.com/linkedin/databus. Would that be useful for me?
suggest.Suggester - Loading stored lookup data failed
Hi,

When my Solr core is loading, I get the error below. It is only logged at WARN level, but I would still like to fix it. It reports a missing file; is there a sample file for this? I could not find one, even in the Apache Solr SVN repository.

2015-05-01 11:33:52,475 WARN suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /solr/Applications/shards/shard1/data/solr/cores/syslog/data/autocomplete/tst.dat (No such file or directory)
    at java.io.FileInputStream.open0(Native Method)
    at java.io.FileInputStream.open(FileInputStream.java:195)
    at java.io.FileInputStream.<init>(FileInputStream.java:138)
    at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:117)
    at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:636)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:651)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:849)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:641)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:583)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:264)
    at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:256)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Please suggest what I should do to remove this warning from my logs.

Thanks,
Jilani
Re: Negative Boosting documents with a certain word
Thank you very much Chris. I'm sorry I could not get back to you sooner; I did not have the time to try this.

If I change my query from

    q=laptops

to

    q=laptops%20(*:*%20-Refurbished)^10%20(*:*%20-Recertified)^10

I get exactly what I want! Thank you!!

Is there any way to handle a list of such words? If I have about 10 to 15 words, this query would keep getting longer and longer. Is there a better way to handle this?

Right now, I specify the boost for my request handler as:

    ln(qty)

Is there a way to specify this boost in the solrconfig.xml? I tried:

    (*:* -Refurbished)^10

and I get the following exception:

ERROR - 2015-05-01 15:13:41.609; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError: Expected identifier at pos 0 str='(*:* -Refurbished)^10'
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:204)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
    at org.eclipse.jetty.server.Server.handle(Server.java:368)
    at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
    at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
    at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
    at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
    at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
    at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
    at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.search.SyntaxError: Expected identifier at pos 0 str='(*:* -Refurbished)^10'
    at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:771)
    at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:750)
    at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:345)
    at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
    at org.apache.solr.search.QParser.getQuery(QParser.java:149)
    at org.apache.solr.search.ExtendedDismaxQParser.getMultiplicativeBoosts(ExtendedDismaxQParser.java:448)
    at org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:211)
    at org.apache.solr.search.QParser.getQuery(QParser.java:149)
    at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:147)
    ... 31 more

I'm using Solr 4.10.3.

Thank you once again,
O. O.

Chris Hostetter-3 wrote
> https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F
>
> The general principle you need to follow is to boost documents that
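For the question about a growing list of words, one option is to assemble the query string on the client instead of typing it by hand. The sketch below is only that: the class and method names are hypothetical, and it simply repeats the `(*:* -word)^boost` clause from the working query above once per word.

```java
import java.util.Arrays;
import java.util.List;

class DemoteQueryBuilder {

    // Appends one "(*:* -word)^boost" clause per word to the user's query,
    // so documents that do NOT contain the word receive the boost.
    static String build(String userQuery, List<String> demotedWords, int boost) {
        StringBuilder q = new StringBuilder(userQuery);
        for (String word : demotedWords) {
            q.append(" (*:* -").append(word).append(")^").append(boost);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        // Prints: laptops (*:* -Refurbished)^10 (*:* -Recertified)^10
        System.out.println(build("laptops", Arrays.asList("Refurbished", "Recertified"), 10));
    }
}
```

The result is the unencoded form of the `q` value shown above, so it still has to be URL-encoded if it is placed in a raw request URL. Note also that the trace passes through `FunctionQParser` via `getMultiplicativeBoosts`, which suggests the `boost` parameter was parsed as a function; a query clause like this one is therefore more likely to belong in `q`, or possibly `bq`, than in `boost`.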
Upgraded to 4.10.3, highlighting performance unusably slow
Hello,

We recently upgraded Solr from 3.8.0 to 4.10.3. We saw that this upgrade caused an incredible slowdown in our searches. We were able to narrow it down to the highlighting. The slowdown is extreme enough that we are holding back our release until we can resolve this. Our research indicated that using TermVectors and the FastHighlighter was the way to go; however, this still does nothing for the performance. I think we may be overlooking a crucial configuration, but cannot figure it out. I was hoping for some guidance and help. Sorry for the long email, I wanted to provide enough information.

Our documents are largely dynamic fields, and so we have been using ‘*’ as the field for highlighting. This is the same setting we used in prior versions of Solr. The dynamic fields are of type ‘text’ and we added customizations to the schema.xml for the type ‘text’:

One of the two dynamic fields we use:

In our solrConfig.xml file, we have:

explicit 13 true true tvComponent 100 70 0.5 [-\w ,/\n\"']{20,200} 10 .,!? WORD en US

And in our code:

    final SolrQuery query = new SolrQuery( luceneQueryStr );
    query.setRequestHandler("/eiHandler");
    query.setStart( request.getStartIndex() );
    query.setRows( request.getMaxResults() );
    query.setSort(new SortClause(request.getSortOrder().getFieldName(),
        request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
    query.addHighlightField( "*" );
    query.setFields( "*", "score" );

Any assistance is greatly appreciated. Thank you.

Sincerely,
Sophia
solr training
Hey guys,

My company has a training budget that it wants me to use. So I'd like to find out whether there are any instructor-led courses in the NY/NJ area, or instructor-led online courses, that you could recommend.

Thanks,
Tim

--
GPG me!! gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B
SolrJ 5.1 json.facets
How can I access the results of 'json.facets' from SolrJ? I don't see any specific API in QueryResponse. Can I use the getBeans API?

Thanks,
Sandeep

--
View this message in context: http://lucene.472066.n3.nabble.com/SolrJ-5-1-json-facets-tp4203509.html
Sent from the Solr - User mailing list archive at Nabble.com.