Re: Upgraded to 4.10.3, highlighting performance unusably slow
We ran into this as well on 4.10.3 (not related to an upgrade). It was identified during load testing, when a small percentage of queries would take more than 20 seconds to return. We were able to isolate it by rerunning the same query multiple times: regardless of cache hits, the queries would still take a long time to return. We used this method to narrow the performance problem down to a small number of very large records (many, many fields in a single record).

We fixed it by turning on hl.requireFieldMatch on the query, so that only fields that have an actual hit are passed through the highlighter.

Hopefully this helps,
Jaime Spicciati

On Sat, May 2, 2015 at 8:20 PM, Joel Bernstein wrote:
> Hi,
>
> Can you also include the details of your research that narrowed the issue
> to the highlighter?
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Sat, May 2, 2015 at 5:27 PM, Ryan, Michael F. (LNG-DAY) <
> michael.r...@lexisnexis.com> wrote:
>
> > Are you able to identify if there is a particular part of the code that
> > is slow?
> >
> > A simple way to do this is to use the jstack command (assuming your
> > server has the full JDK installed). You can run it like this:
> > /path/to/java/bin/jstack PID
> >
> > If you run that a bunch of times while your highlight query is running,
> > you might be able to spot the hotspot. Usually I'll do something like
> > this to see the stacktrace for the thread running the query:
> > /path/to/java/bin/jstack PID | grep SearchHandler -B30
> >
> > A few more questions:
> > - What response times are you seeing before and after the upgrade? Is
> > "unusably slow" 1 second, 10 seconds...?
> > - If you run the exact same query multiple times, is it consistently
> > slow, or is it only slow on the first run?
> > - While the query is running, do you see high user CPU on your server,
> > high IO wait, or both? (You can check this with the top command or
> > vmstat command on Linux.)
> >
> > -Michael
> >
> > -----Original Message-----
> > From: Cheng, Sophia Kuen [mailto:sophia_ch...@hms.harvard.edu]
> > Sent: Saturday, May 02, 2015 4:13 PM
> > To: solr-user@lucene.apache.org
> > Subject: Upgraded to 4.10.3, highlighting performance unusably slow
> >
> > Hello,
> >
> > We recently upgraded Solr from 3.8.0 to 4.10.3. We saw that this
> > upgrade caused an incredible slowdown in our searches. We were able to
> > narrow it down to the highlighting. The slowdown is extreme enough that
> > we are holding back our release until we can resolve this. Our research
> > indicated that using term vectors and the FastVectorHighlighter was the
> > way to go; however, this still does nothing for the performance. I
> > think we may be overlooking a crucial configuration, but cannot figure
> > it out. I was hoping for some guidance and help. Sorry for the long
> > email; I wanted to provide enough information.
> >
> > Our documents are largely dynamic fields, and so we have been using '*'
> > as the field for highlighting. This is the same setting we used in
> > prior versions of Solr.
> > The dynamic fields are of type 'text', and we added customizations to
> > the 'text' type in schema.xml:
> >
> > <fieldType name="text" class="solr.TextField"
> >     storeOffsetsWithPositions="true" termVectors="true"
> >     termPositions="true" termOffsets="true">
> >   <analyzer type="index">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory"
> >         words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >         generateNumberParts="1" catenateWords="1" catenateNumbers="1"
> >         catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >         protected="protwords.txt"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> >     <filter class="solr.StopFilterFactory"
> >         words="stopwords.txt" enablePositionIncrements="true"/>
> >     <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
> >         generateNumberParts="1" catenateWords="0" catenateNumbers="0"
> >         catenateAll="0" splitOnCaseChange="1"/>
> >     <filter class="solr.LowerCaseFilterFactory"/>
> >     <filter class="solr.SnowballPorterFilterFactory" language="English"
> >         protected="protwords.txt"/>
> >   </analyzer>
> > </fieldType>
> >
> > One of the two dynamic fields we use:
> >
> > <dynamicField name="..." type="text" indexed="true" stored="true"
> >     required="false" multiValued="true"/>
> >
> > In our solrConfig.xml file, we have:
> >
> > <requestHandler name="..." class="solr.SearchHandler">
> >   <lst name="defaults">
> >     <str name="echoParams">explicit</str>
> >     <int name="...">13</int>
> >     <str name="...">true</str>
> >     <str name="...">true</str>
> >   </lst>
> >   <arr name="last-components">
> >     <str>tvComponent</str>
> > ...
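A minimal SolrJ sketch of the fix described in this thread: hl.requireFieldMatch combined with the FastVectorHighlighter that the termVectors/termPositions/termOffsets settings above enable. The host, core name, and query string are placeholders.

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightCheck {
    public static void main(String[] args) throws SolrServerException {
        // Placeholder host/core; point this at your own instance.
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("some search terms");    // placeholder query
        q.setHighlight(true);
        q.set("hl.fl", "*");                                 // highlight across the dynamic fields
        q.set("hl.requireFieldMatch", "true");               // only fields with a hit reach the highlighter
        q.set("hl.useFastVectorHighlighter", "true");        // needs termVectors/termPositions/termOffsets

        QueryResponse rsp = solr.query(q);
        Map<String, Map<String, List<String>>> snippets = rsp.getHighlighting();
        System.out.println(snippets);
    }
}

With hl.fl=*, hl.requireFieldMatch keeps the highlighter from processing every stored field in the very large records, which is where the 20-second responses in this thread were coming from.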
Re: Java.net.socketexception: broken pipe Solr 4.10.2
We ran into this during our indexing process running on 4.10.3. After increasing ZooKeeper timeouts, client timeouts, and socket timeouts, and implementing retry logic in our loading process, the thing that worked was changing the hard commit timing. We were performing a hard commit every 5 minutes, and after a couple of hours of loading data some of the shards would start going down because they would time out with ZooKeeper and/or close connections. Changing the timeouts just moved the problem later in the ingest process.

Through a combination of decreasing the hard commit interval to 15 seconds and migrating to the G1 garbage collector, we are able to prevent ingest failures. For us, the periodic stop-the-world garbage collections were causing connections to be closed and other nasty things, such as ZooKeeper timeouts that would cause recovery to kick in. (Soft commits are turned off until the full ingest/baseline completes.) I believe that until a hard commit is issued, Solr keeps the data in memory, which explains the nasty garbage collections we were experiencing.

The other change we made which may have helped is that we ensured the socket timeouts were in sync between the Jetty instance running Solr and the SolrJ client loading the data. During some of our batch updates Solr would take a couple of minutes to respond, and I believe in some instances the socket would be closed on the server side (the maxIdleTime setting in Jetty).

Hope this helps,
Jaime Spicciati

On Tue, Apr 14, 2015 at 9:26 AM, vsilgalis wrote:
> Right now index size is about 10GB on each shard (yes, I could use more
> RAM), but I'm looking more for a step-up rather than a step-down
> approach. I will try adding more RAM to these machines as my next step.
>
> 1. Zookeeper is external to these boxes, in a three-node cluster with
> more than enough RAM to keep everything off disk.
>
> 2. OS disk cache: when I add more RAM, I will just add it as RAM for the
> machine and not to the Java heap, unless that is something you recommend.
>
> 3. Java heap looks good so far; GC is minimal as far as I can tell, but
> I can look into this some more.
>
> 4. We do have 2 cores per machine, but the second core is a joke (10MB).
>
> note: zkClientTimeout is set to 30 for safety's sake.
>
> java settings:
>
> -XX:+CMSClassUnloadingEnabled -XX:+AggressiveOpts
> -XX:+ParallelRefProcEnabled -XX:+CMSParallelRemarkEnabled
> -XX:CMSMaxAbortablePrecleanTime=6000 -XX:CMSTriggerPermRatio=80
> -XX:CMSInitiatingOccupancyFraction=50 -XX:+UseCMSInitiatingOccupancyOnly
> -XX:CMSFullGCsBeforeCompaction=1 -XX:PretenureSizeThreshold=64m
> -XX:+CMSScavengeBeforeRemark -XX:ParallelGCThreads=4 -XX:ConcGCThreads=4
> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:MaxTenuringThreshold=8
> -XX:TargetSurvivorRatio=90 -XX:SurvivorRatio=4 -XX:NewRatio=3
> -XX:-UseSuperWord -Xmx5588m -Xms1596m
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Java-net-socketexception-broken-pipe-Solr-4-10-2-tp4199484p4199561.html
> Sent from the Solr - User mailing list archive at Nabble.com.
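A sketch of the timeout-sync part with the stock SolrJ client; the host and the values are illustrative. The hard-commit side lives in solrconfig.xml (an autoCommit maxTime on the order of 15000 ms, typically with openSearcher=false during a bulk load).

import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class LoaderClient {
    public static void main(String[] args) {
        // Placeholder host/core; values are illustrative.
        HttpSolrServer loader = new HttpSolrServer("http://solr-host:8983/solr/collection1");
        loader.setConnectionTimeout(15000); // ms to establish the connection
        loader.setSoTimeout(120000);        // socket read timeout; keep in sync with Jetty's
                                            // maxIdleTime so the server doesn't close sockets
                                            // the client still considers live
    }
}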
Re: Leading Wildcard Support (ReversedWildcardFilterFactory)
Thanks for the quick response. The index I am currently testing with has the following configuration, which is the default for text_general_rev:

The field type is solr.TextField
maxFractionAsterisk=.33
maxPosAsterisk=3
maxPosQuestion=2
withOriginal=true

Through additional review I think it *might* be working as expected, even though the Analysis tab and the debugQuery parsed query led me to think otherwise. If I look at the explain section of the debugQuery output for a query that actually gets a hit, I see the word(s) come back in reversed order with the "\u0001" prefix character, so the actual hit against the inverted index appears to be correct even though the parsed query doesn't reflect this. Is it safe to say that things are in fact working correctly?

Thanks again

On Thu, Feb 26, 2015 at 3:34 PM, Jack Krupansky wrote:
> Please post your field type... or at least confirm a comparison to the
> example in the javadoc:
>
> http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/analysis/ReversedWildcardFilterFactory.html
>
> -- Jack Krupansky
>
> On Thu, Feb 26, 2015 at 2:38 PM, jaime spicciati <
> jaime.spicci...@gmail.com> wrote:
>
> > All,
> >
> > I am currently using 4.10.3 running SolrCloud.
> >
> > I have configured my index analyzer to leverage
> > solr.ReversedWildcardFilterFactory with various settings for
> > maxFractionAsterisk, maxPosAsterisk, etc. Currently I am running with
> > the defaults (i.e. not configured).
> >
> > Using the Analysis capability in the Solr admin UI, I see the "Field
> > Value (Index)" tokens going in correctly, in both normal and reversed
> > order. However, on the "Field Value (Query)" side it is not generating
> > a reversed token as expected (no matter where I place the * in the
> > leading position of the search term). I also confirmed through the
> > Query capability with debugQuery turned on that the parsed query is
> > not reversed as expected.
> >
> > From my current understanding, you do not need anything configured on
> > the query analyzer to make leading wildcards work as expected with
> > ReversedWildcardFilterFactory. The default query parser will know to
> > look at the index analyzer and leverage the
> > ReversedWildcardFilterFactory configuration if the term contains a
> > leading wildcard. (This is what I have read.)
> >
> > Without uploading my entire configuration to this email, I was hoping
> > someone could point me in the right direction, because I am at a loss
> > at this point.
> >
> > Thanks!
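The same check can be scripted with SolrJ instead of the admin UI; the host, field name, and query below are made up for illustration.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WildcardDebug {
    public static void main(String[] args) throws SolrServerException {
        HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery q = new SolrQuery("title_rev:*ility"); // hypothetical field of the reversed type
        q.set("debugQuery", "true");

        QueryResponse rsp = solr.query(q);
        // If the reversed path is taken, the explain entries for hits show the
        // reversed form with the \u0001 marker, as observed above.
        System.out.println(rsp.getDebugMap().get("parsedquery"));
        System.out.println(rsp.getDebugMap().get("explain"));
    }
}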
Leading Wildcard Support (ReversedWildcardFilterFactory)
All,

I am currently using 4.10.3 running SolrCloud.

I have configured my index analyzer to leverage solr.ReversedWildcardFilterFactory with various settings for maxFractionAsterisk, maxPosAsterisk, etc. Currently I am running with the defaults (i.e. not configured).

Using the Analysis capability in the Solr admin UI, I see the "Field Value (Index)" tokens going in correctly, in both normal and reversed order. However, on the "Field Value (Query)" side it is not generating a reversed token as expected (no matter where I place the * in the leading position of the search term). I also confirmed through the Query capability with debugQuery turned on that the parsed query is not reversed as expected.

From my current understanding, you do not need anything configured on the query analyzer to make leading wildcards work as expected with ReversedWildcardFilterFactory. The default query parser will know to look at the index analyzer and leverage the ReversedWildcardFilterFactory configuration if the term contains a leading wildcard. (This is what I have read.)

Without uploading my entire configuration to this email, I was hoping someone could point me in the right direction, because I am at a loss at this point.

Thanks!
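A toy illustration of why the reversed tokens make leading wildcards cheap (this is just the idea, not Solr's internal code): reversing the indexed token turns a leading-wildcard pattern into an ordinary prefix match.

public class ReversedWildcardIdea {
    public static void main(String[] args) {
        // "\u0001" stands in for the marker ReversedWildcardFilterFactory
        // prepends to reversed tokens so they can't collide with real terms.
        String term = "capability";
        String reversedToken = "\u0001" + new StringBuilder(term).reverse(); // what gets indexed

        // A leading-wildcard pattern like "*ility", reversed, becomes a prefix:
        String pattern = "*ility";
        String prefix = "\u0001" + new StringBuilder(pattern.substring(1)).reverse(); // "\u0001ytili"

        System.out.println(reversedToken.startsWith(prefix)); // true: a fast prefix lookup
    }
}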
Question about session affinity and SolrCloud
All,

This is my current understanding of how SolrCloud load balancing works: within SolrCloud, for a cluster with more than one shard and at least one replica, the ZooKeeper-aware SolrJ client uses LBHttpSolrServer, which round-robins across the replicas and leaders in the cluster. In turn, the node that performs the distributed query (which can be a leader or a replica) may then go to the leader or a replica of each shard, again round robin via LBHttpSolrServer.

If this is correct, then in a SolrCloud instance with, say, one replica, the initial query from a user may go to the leader of shard 1, and when the user paginates to the second page the subsequent query may go to the replica of shard 1. This seems inefficient from a caching perspective: the queryResultCache and possibly the filterCache would need to be reloaded.

From what I can find, there does not appear to be any option for session affinity within SolrCloud query execution?

Thanks!
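One possible workaround, sketched with SolrJ (this is not a built-in affinity feature, and the hosts and core names are made up): send the request to a fixed node and pin the fan-out with an explicit shards parameter derived from the user's session.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class PinnedQuery {
    public static void main(String[] args) throws SolrServerException {
        // Hypothetical hosts/cores: target a fixed node and name the exact
        // replica cores so the distributed fan-out skips the round-robin choice.
        HttpSolrServer node = new HttpSolrServer("http://solr-a:8983/solr/collection1");

        SolrQuery q = new SolrQuery("*:*");
        q.set("shards",
              "solr-a:8983/solr/collection1_shard1_replica1,"
            + "solr-b:8983/solr/collection1_shard2_replica1");

        System.out.println(node.query(q).getResults().getNumFound());
    }
}

Reusing the same replica list for a session keeps queryResultCache and filterCache hits on the same cores, at the cost of doing the balancing and failure handling yourself.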
SolrCloud multi-datacenter failover?
All,

At my current customer we have developed a custom federator that federates queries between Endeca and Solr, to ease the transition from an extremely large (TBs of data) Endeca index to Solr. (Endeca is similar to Solr in terms of search, faceted navigation, etc.)

During this transition we need to support multi-datacenter failover, which we have historically handled via load balancers with the appropriate failover configurations (think F5). We are currently playing our data loads into multiple datacenters to ensure data consistency. (Each datacenter has a stand-alone SolrCloud instance with its own redundancy/failover.)

I am curious how the community handles multi-datacenter failover at the presentation layer (datacenter A goes down and we want to fail over to B). SolrCloud will handle failures within a single datacenter, but I haven't seen a definitive answer on how to handle failover across datacenters. At this point the only two options I can come up with are:

1) Fail the entire datacenter if SolrCloud goes offline (GUI/index/etc. go offline). This is problematic because some portion of user activity will fail; queries that are in transit will not complete.

2) Implement failover at the custom federator level. In doing so we would need to detect a failure of datacenter A within our federator, query datacenter B to fulfill the user request, and then potentially fail the entire datacenter A once all in-flight transactions against A have been fulfilled.

Since we are looking up the active Solr instance via ZooKeeper (SolrCloud) per datacenter, I don't see any reasonable means of failing over to another datacenter if a given SolrCloud instance goes down. Any thoughts are welcome at this point.

Thanks
Jaime
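For option 2, a bare-bones sketch of what federator-level failover could look like, with one CloudSolrServer per datacenter's ZooKeeper ensemble; the ZooKeeper hosts and collection name are placeholders.

import java.net.MalformedURLException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrException;

public class FederatorFailover {
    private final CloudSolrServer dcA;
    private final CloudSolrServer dcB;

    public FederatorFailover() throws MalformedURLException {
        // Each datacenter runs its own ZooKeeper ensemble (placeholder hosts).
        dcA = new CloudSolrServer("zk-a1:2181,zk-a2:2181,zk-a3:2181");
        dcB = new CloudSolrServer("zk-b1:2181,zk-b2:2181,zk-b3:2181");
        dcA.setDefaultCollection("collection1"); // placeholder collection
        dcB.setDefaultCollection("collection1");
    }

    public QueryResponse query(SolrQuery q) throws SolrServerException {
        try {
            return dcA.query(q);              // primary datacenter
        } catch (SolrServerException | SolrException e) {
            return dcB.query(q);              // fall back to datacenter B
        }
    }
}

Queries already in flight against A will still fail when A goes down, so this narrows, but does not eliminate, the window described in option 1.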