Re: Loading lineshape data into Solr

2015-05-02 Thread Arthur

Anyone?

On 15-04-29 09:07 PM, Arthur Zubarev wrote:

Hi Solr community,
My immediate task is to load lineshape data into Solr (the lineshape
data is a set of points on a curve, given as lat/long coordinates).
The data sits in a SQL Server 2012 table. Extracting the data to a flat file
does not work because it comes out as unreadable binary.

The other columns hold streets, points of interest, etc.
The end goal is to query Solr for an address based on lat+long.
Any hints/tips are welcome!
Thank you!
Regards,
Arthur







Re: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Joel Bernstein
Hi,

Can you also include the details of your research that narrowed the issue
to the highlighter?

Joel Bernstein
http://joelsolr.blogspot.com/

On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <
michael.r...@lexisnexis.com> wrote:

> Are you able to identify if there is a particular part of the code that is
> slow?
>
> A simple way to do this is to use the jstack command (assuming your server
> has the full JDK installed). You can run it like this:
> /path/to/java/bin/jstack PID
>
> If you run that a bunch of times while your highlight query is running,
> you might be able to spot the hotspot. Usually I'll do something like this
> to see the stacktrace for the thread running the query:
> /path/to/java/bin/jstack PID | grep SearchHandler -B30
>
> A few more questions:
> - What are response times you are seeing before and after the upgrade? Is
> "unusably slow" 1 second, 10 seconds...?
> - If you run the exact same query multiple times, is it consistently slow?
> Or is it only slow on the first run?
> - While the query is running, do you see high user CPU on your server, or
> high IO wait, or both? (You can check this with the top command or vmstat
> command in Linux.)
>
> -Michael
>
> -Original Message-
> From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
> Sent: Saturday, May 02, 2015 4:13 PM
> To: solr-user@lucene.apache.org
> Subject: Upgraded to 4.10.3, highlighting performance unusably slow
>
> Hello,
>
> We recently upgraded Solr from 3.8.0 to 4.10.3.  We saw that this upgrade
> caused an incredible slowdown in our searches. We were able to narrow it
> down to the highlighting. The slowdown is extreme enough that we are
> holding back our release until we can resolve this.  Our research indicated
> using TermVectors & FastHighlighter were the way to go; however, this still
> does nothing for the performance. I think we may be overlooking a crucial
> configuration, but cannot figure it out. I was hoping for some guidance and
> help. Sorry for the long email; I wanted to provide enough information.
>
> Our documents are largely dynamic fields, and so we have been using '*' as
> the field for highlighting. This is the same setting we used in prior
> versions of Solr. The dynamic fields are of type 'text' and we added
> customizations to the schema.xml for the type 'text':
>
>  storeOffsetsWithPositions="true" termVectors="true" termPositions="true"
> termOffsets="true">
>   
> 
> 
> 
> 
>  words="stopwords.txt" enablePositionIncrements="true"/>
>  generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
>   
>   
> 
> 
> 
>  words="stopwords.txt" enablePositionIncrements="true"/>
>  generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> catenateAll="0" splitOnCaseChange="1"/>
> 
>  protected="protwords.txt"/>
>   
> 
>
> One of the two dynamic fields we use:
>
>  stored="true" required="false" multiValued="true"/>
>
> In our solrConfig.xml file, we have:
>
>   name="defaults"> explicit
>  13
>  true
>  true
>
> 
> tvComponent
> 
> 
> 
>   
>  class="solr.highlight.GapFragmenter">
>   
> 100
>   
> 
> 
>   
> 70
> 0.5
> [-\w ,/\n\"']{20,200}
>   
> 
>
>  class="solr.highlight.HtmlFormatter">
>   
> 
> 
>   
> 
>
> 
>  class="solr.highlight.SimpleFragListBuilder"/>
>  class="solr.highlight.SingleFragListBuilder"/>
>  class="solr.highlight.WeightedFragListBuilder"/>
>  class="solr.highlight.ScoreOrderFragmentsBuilder">
> 
>
> 
>  class="solr.highlight.ScoreOrderFragmentsBuilder">
>   
> 
> 
>   
> 
>
>  class="solr.highlight.SimpleBoundaryScanner">
>   
> 10
> .,!? 	

>   
> 
>
>  class="solr.highlight.BreakIteratorBoundaryScanner">
>   
> WORD
> en
> US
>   
> 
>   
> 
>
> And in our code:
>
> final SolrQuery query = new SolrQuery( luceneQueryStr );
> query.setRequestHandler("/eiHandler");
> query.setStart( request.getStartIndex() );
> query.setRows( request.getMaxResults() );
> query.setSort(new SortClause(request.getSortOrder().getFieldName(),
> request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
> query.addHighlightField( "*" );
> query.setFields( "*", "score" );
>
> Any assistance is greatly appreciated.  Thank you.
>
> Sincerely,
> Sophia
>


Re: "Avoiding" a schema.xml

2015-05-02 Thread Sznajder ForMailingList
Thanks!

Indeed, one of my issues is that I cannot know which fields will be
indexed until I have seen the browsed documents and run some entity
extraction on them.
That is why I wanted to avoid defining a schema ...

The schema API sounds interesting! Does it exist via SolrJ?
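
On the schema API question: in Solr 5.x with the managed schema enabled, a
field can be added by POSTing an add-field command to the collection's
/schema endpoint, so plain Java is enough even without a dedicated SolrJ
call. The sketch below is only an illustration under that assumption. The
class name, the field name "person_name" and the type "text_general" are
made up for the example, and the endpoint URL has to be supplied by the
caller.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper: adds a field through Solr's Schema API over plain HTTP. */
final class SchemaApiSketch {

    /** Builds an add-field command. Assumes name and type need no JSON escaping. */
    static String addFieldJson(String name, String type, boolean stored) {
        return "{\"add-field\":{\"name\":\"" + name + "\",\"type\":\"" + type
                + "\",\"stored\":" + stored + "}}";
    }

    /** POSTs the command to the given schema endpoint and returns the HTTP status. */
    static int post(String schemaUrl, String json) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(schemaUrl).openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/json");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(json.getBytes(StandardCharsets.UTF_8));
        }
        return conn.getResponseCode();
    }

    public static void main(String[] args) throws IOException {
        // "person_name" and "text_general" are illustrative values only.
        String json = addFieldJson("person_name", "text_general", true);
        System.out.println(json);
        if (args.length == 1) {
            // args[0] is the schema endpoint, e.g. http://localhost:8983/solr/<collection>/schema
            System.out.println("HTTP " + post(args[0], json));
        }
    }
}
```

Run after entity extraction, a call like this could register each newly
discovered field before the documents that use it are indexed.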

Many thanks!

Benjamin

On Thu, Apr 30, 2015 at 6:27 PM, Erick Erickson 
wrote:

> Could you explain a bit more _why_ you want to do this? As you're
> probably well aware, there are multiple ways to shoot yourself in the
> foot in lower-level Lucene.
>
> If you have some situation where you're creating indexes on the fly
> that may vary then you could consider the "managed schema" that lets
> you create a schema via API calls, then you wouldn't need to mess with
> editing the schema.xml file for instance.
>
> Best,
> Erick
>
> On Thu, Apr 30, 2015 at 8:12 AM, Shawn Heisey  wrote:
> > On 4/30/2015 8:43 AM, Sznajder ForMailingList wrote:
> >> I am interested to index some documents in Solr, as I did in Lucene.
> >>
> >> I mean: giving via solrJ all the information about the field I am adding
> >> (Tokenize, store, facet etc...)
> >>
> >> can we do that? Or is it mandatory to define a schema on the collection?
> >
> > All that information is defined on the server.  You do not have direct
> > access to the Lucene index - Solr is intended as an abstraction, so the
> > admin and the users/applications that use Solr do not need to understand
> > all the low-level details that go into a Lucene application.  The admin
> > just has to deal with configuration files like schema.xml, and the users
> > just need to know what fields are in each document and how the query
> > syntax works.  Deeper Lucene knowledge is helpful, but not strictly
> > necessary.
> >
> > If you want Lucene-level control, you'll need to write the search server
> > yourself using Lucene.  If you have very specific needs that Solr's
> > approach can't satisfy, you always have this option.
> >
> > The newest Solr versions do have an example of what's known as a
> > "data-driven" schema, or schemaless mode.  In this mode, Solr builds up
> > the schema automatically, guessing the field type based on what kind of
> > data is the first to arrive for each field.  This is good for
> > prototyping, but for production use, I would want to be in full manual
> > control of the schema.
> >
> > Thanks,
> > Shawn
> >
>


RE: Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Ryan, Michael F. (LNG-DAY)
Are you able to identify if there is a particular part of the code that is slow?

A simple way to do this is to use the jstack command (assuming your server has 
the full JDK installed). You can run it like this:
/path/to/java/bin/jstack PID

If you run that a bunch of times while your highlight query is running, you 
might be able to spot the hotspot. Usually I'll do something like this to see 
the stacktrace for the thread running the query:
/path/to/java/bin/jstack PID | grep SearchHandler -B30
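
The repeated sampling described above could be scripted roughly as follows.
This is a sketch only: the class name, the sample count and the pause between
samples are assumptions, and contextBefore merely approximates what
grep -B30 does.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: samples jstack several times and keeps lines around a marker. */
final class JstackSampler {

    /** Rough equivalent of `grep NEEDLE -B before`: each match plus the lines preceding it. */
    static List<String> contextBefore(List<String> lines, String needle, int before) {
        List<String> out = new ArrayList<>();
        int lastEmitted = -1; // index of the last line already copied, to avoid duplicates
        for (int i = 0; i < lines.size(); i++) {
            if (lines.get(i).contains(needle)) {
                int start = Math.max(Math.max(0, i - before), lastEmitted + 1);
                for (int j = start; j <= i; j++) {
                    out.add(lines.get(j));
                }
                lastEmitted = i;
            }
        }
        return out;
    }

    /** Runs the given jstack binary against a PID and returns its output lines. */
    static List<String> runJstack(String jstackPath, String pid)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(jstackPath, pid).redirectErrorStream(true).start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                lines.add(line);
            }
        }
        p.waitFor();
        return lines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: JstackSampler /path/to/java/bin/jstack PID");
            return;
        }
        int samples = 10;       // assumed number of samples
        long pauseMillis = 500; // assumed pause between samples
        for (int s = 0; s < samples; s++) {
            System.out.println("=== sample " + (s + 1) + " ===");
            for (String line : contextBefore(runJstack(args[0], args[1]), "SearchHandler", 30)) {
                System.out.println(line);
            }
            Thread.sleep(pauseMillis);
        }
    }
}
```

Frames that show up in most samples while the highlight query is running are
the likely hotspot.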

A few more questions:
- What are response times you are seeing before and after the upgrade? Is 
"unusably slow" 1 second, 10 seconds...?
- If you run the exact same query multiple times, is it consistently slow? Or 
is it only slow on the first run?
- While the query is running, do you see high user CPU on your server, or high 
IO wait, or both? (You can check this with the top command or vmstat command in 
Linux.)

-Michael

-Original Message-
From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu] 
Sent: Saturday, May 02, 2015 4:13 PM
To: solr-user@lucene.apache.org
Subject: Upgraded to 4.10.3, highlighting performance unusably slow

Hello,

We recently upgraded Solr from 3.8.0 to 4.10.3.  We saw that this upgrade
caused an incredible slowdown in our searches. We were able to narrow it down to
the highlighting. The slowdown is extreme enough that we are holding back our
release until we can resolve this.  Our research indicated using TermVectors &
FastHighlighter were the way to go; however, this still does nothing for the
performance. I think we may be overlooking a crucial configuration, but cannot
figure it out. I was hoping for some guidance and help. Sorry for the long
email; I wanted to provide enough information.

Our documents are largely dynamic fields, and so we have been using '*' as the
field for highlighting. This is the same setting we used in prior versions of
Solr. The dynamic fields are of type 'text' and we added customizations to the
schema.xml for the type 'text':


  








  
  







  


One of the two dynamic fields we use:



In our solrConfig.xml file, we have:

  explicit
 13
 true
 true
   

tvComponent



  

  
100
  


  
70
0.5
[-\w ,/\n\"']{20,200}
  



  


  











  


  



  
10
.,!? 	

  



  
WORD
en
US
  

  


And in our code:

final SolrQuery query = new SolrQuery( luceneQueryStr ); 
query.setRequestHandler("/eiHandler");
query.setStart( request.getStartIndex() );
query.setRows( request.getMaxResults() );
query.setSort(new SortClause(request.getSortOrder().getFieldName(),
request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
query.addHighlightField( "*" );
query.setFields( "*", "score" );

Any assistance is greatly appreciated.  Thank you.

Sincerely,
Sophia


Indexing

2015-05-02 Thread Midas A
Hi ,

Is there any way to replicate directly from MySQL to Solr using the binlog,
instead of using the import handler?

My MySQL queries take a long time to fetch data. Is there any solution?

How can I minimize the data fetch latency?

I have seen
https://github.com/linkedin/databus

Would that be useful for me?


suggest.Suggester - Loading stored lookup data failed

2015-05-02 Thread Jilani Shaik
Hi,

When my Solr core is loading, I get the error below. Even though it is only
a WARN, I want to fix it. Please let me know how. It reports a missing
file; is there a sample file for this? I did not find one even in the
Apache Solr SVN.

2015-05-01 11:33:52,475 WARN suggest.Suggester - Loading stored lookup data failed
java.io.FileNotFoundException: /solr/Applications/shards/shard1/data/solr/cores/syslog/data/autocomplete/tst.dat (No such file or directory)
at java.io.FileInputStream.open0(Native Method)
at java.io.FileInputStream.open(FileInputStream.java:195)
at java.io.FileInputStream.<init>(FileInputStream.java:138)
at org.apache.solr.spelling.suggest.Suggester.init(Suggester.java:117)
at org.apache.solr.handler.component.SpellCheckComponent.inform(SpellCheckComponent.java:636)
at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:651)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:849)
at org.apache.solr.core.SolrCore.<init>(SolrCore.java:641)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:583)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:264)
at org.apache.solr.core.CoreContainer$1.call(CoreContainer.java:256)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)


Please suggest what I should do to remove this warning from my logs.


Thanks,
Jilani


Re: Negative Boosting documents with a certain word

2015-05-02 Thread O. Olson
Thank you very much Chris. I'm sorry I could not get back to you because I
did not have the time to try this.

If I change my query from q=laptops   to 
q=laptops%20(*:*%20-Refurbished)^10%20(*:*%20-Recertified)^10   I get
exactly what I want! Thank you!!

Is there any way to handle a list of such words? If I have about 10 to 15
words, this query would keep getting longer and longer. Is there a better
way to handle this?
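
One way to keep a longer word list manageable is to generate the clauses on
the client instead of typing them by hand. The sketch below only assembles
the same query string as above from a list. The class and method names are
made up for the example, and the words are assumed to be plain single terms
that need no query-syntax escaping.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: builds the "boost everything except X" clauses from a word list. */
final class NegativeBoostQuery {

    /** Appends one (*:* -word)^boost clause per word to the user query. */
    static String build(String userQuery, List<String> demotedWords, int boost) {
        StringBuilder q = new StringBuilder(userQuery);
        for (String word : demotedWords) {
            q.append(" (*:* -").append(word).append(")^").append(boost);
        }
        return q.toString();
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Refurbished", "Recertified");
        // prints: laptops (*:* -Refurbished)^10 (*:* -Recertified)^10
        System.out.println(build("laptops", words, 10));
    }
}
```

The result still has to be URL-encoded when it is sent as a raw q parameter,
as in the %20 form shown earlier.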

Right now, I specify the boost for my request handler as:

  .
  ln(qty)
  
 

Is there a way to specify this boost in solrconfig.xml?

I tried: (*:* -Refurbished)^10   and I get the
following exception: 

ERROR - 2015-05-01 15:13:41.609; org.apache.solr.common.SolrException;
org.apache.solr.common.SolrException: org.apache.solr.search.SyntaxError:
Expected identifier at pos 0 str='(*:* -Refurbished)^10'
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:204)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1976)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:368)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:942)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:1004)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:640)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Unknown Source)
Caused by: org.apache.solr.search.SyntaxError: Expected identifier at pos 0
str='(*:* -Refurbished)^10'
at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:771)
at org.apache.solr.search.QueryParsing$StrParser.getId(QueryParsing.java:750)
at org.apache.solr.search.FunctionQParser.parseValueSource(FunctionQParser.java:345)
at org.apache.solr.search.FunctionQParser.parse(FunctionQParser.java:68)
at org.apache.solr.search.QParser.getQuery(QParser.java:149)
at org.apache.solr.search.ExtendedDismaxQParser.getMultiplicativeBoosts(ExtendedDismaxQParser.java:448)
at org.apache.solr.search.ExtendedDismaxQParser.parse(ExtendedDismaxQParser.java:211)
at org.apache.solr.search.QParser.getQuery(QParser.java:149)
at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:147)
... 31 more


I'm using Solr 4.10.3

Thank you once again
O. O.


Chris Hostetter-3 wrote
> https://wiki.apache.org/solr/SolrRelevancyFAQ#How_do_I_give_a_negative_.28or_very_low.29_boost_to_documents_that_match_a_query.3F
> 
> The general principle you need to follow is to boost documents that 

Upgraded to 4.10.3, highlighting performance unusably slow

2015-05-02 Thread Cheng, Sophia Kuen
Hello,

We recently upgraded Solr from 3.8.0 to 4.10.3.  We saw that this upgrade
caused an incredible slowdown in our searches. We were able to narrow it down to
the highlighting. The slowdown is extreme enough that we are holding back our
release until we can resolve this.  Our research indicated using TermVectors &
FastHighlighter were the way to go; however, this still does nothing for the
performance. I think we may be overlooking a crucial configuration, but cannot
figure it out. I was hoping for some guidance and help. Sorry for the long
email; I wanted to provide enough information.

Our documents are largely dynamic fields, and so we have been using '*' as the
field for highlighting. This is the same setting we used in prior versions of
Solr. The dynamic fields are of type 'text' and we added customizations to the
schema.xml for the type 'text':


  








  
  







  


One of the two dynamic fields we use:



In our solrConfig.xml file, we have:



explicit
 13
 true
 true
   

tvComponent



  

  
100
  


  
70
0.5
[-\w ,/\n\"']{20,200}
  



  


  











  


  



  
10
.,!? 	

  



  
WORD
en
US
  

  


And in our code:

final SolrQuery query = new SolrQuery( luceneQueryStr );
query.setRequestHandler("/eiHandler");
query.setStart( request.getStartIndex() );
query.setRows( request.getMaxResults() );
query.setSort(new SortClause(request.getSortOrder().getFieldName(), 
request.getSortOrder().isAscending()?ORDER.asc:ORDER.desc) );
query.addHighlightField( "*" );
query.setFields( "*", "score" );

Any assistance is greatly appreciated.  Thank you.

Sincerely,
Sophia


solr training

2015-05-02 Thread Tim Dunphy
Hey guys,

 My company has a training budget that it wants me to use. So what I'd like
to find out is whether there are any instructor-led courses in the NY/NJ area,
or instructor-led online courses, that you could recommend.

Thanks,
Tim

-- 
GPG me!!

gpg --keyserver pool.sks-keyservers.net --recv-keys F186197B


SolrJ 5.1 json.facets

2015-05-02 Thread pkulksandeep
How do I access the results of 'json.facets' from SolrJ?

I don't see any specific API in QueryResponse. Can I use the getBeans API?

Thanks,
Sandeep



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrJ-5-1-json-facets-tp4203509.html
Sent from the Solr - User mailing list archive at Nabble.com.