Best way to track cumulative GC pauses in Solr

2015-11-13 Thread Tom Evans
Hi all

We have some issues with our Solr servers spending too much time
paused doing GC. From turning on gc debug, and extracting numbers from
the GC log, we're getting an idea of just how much of a problem it is.

I'm currently doing this in a hacky, inefficient way:

grep -h 'Total time for which application threads were stopped:' solr_gc* \
| awk '($11 > 0.3) { print $1, $11 }' \
| sed 's#:.*:##' \
| sort -n \
| sum_by_date.py

(Yes, I really am using sed, grep and awk all in one line. Just wrong :)

The "sum_by_date.py" program simply adds up all the values with the
same first column, and remembers the largest value seen. This is
giving me the cumulative GC time for extended pauses (those over the
0.3s threshold in the awk filter), and the maximum pause seen in a given
time period (hourly), e.g.:

2015-11-13T11 119.124037 2.203569
2015-11-13T12 184.683309 3.156565
2015-11-13T13 65.934526 1.978202
2015-11-13T14 63.970378 1.411700


This is fine for seeing that we have a problem. However, I really need
to get this into our monitoring systems - we use munin. I'm
struggling to work out the best way to extract this information for
our monitoring systems, and I think this comes down to my naivety about
Java and about what should be monitored.

I've turned on JMX and have been looking at the different beans
available using jconsole, but I'm drowning in information. What would
be the best thing to monitor?

Ideally, like the stats above, I'd like to know the cumulative time
spent paused in GC since the last poll, and the longest GC pause that
we see. munin polls every 5 minutes; are there suitable counters
exposed by JMX that it could extract?
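
For what it's worth, this is the kind of thing I've been poking at from
a scratch Java class (a rough, untested sketch; it assumes remote JMX is
enabled on the Solr JVM, e.g. via ENABLE_REMOTE_JMX_OPTS/RMI_PORT in
solr.in.sh, and as far as I can tell these beans only expose cumulative
totals, not the longest individual pause):

import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class GcPausePoller {
    public static void main(String[] args) throws Exception {
        // Host and port are whatever the Solr JVM exposes for remote JMX.
        JMXServiceURL url = new JMXServiceURL(
            "service:jmx:rmi:///jndi/rmi://localhost:18983/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();
            // One bean per collector, e.g. "ParNew" and "ConcurrentMarkSweep".
            Set<ObjectName> gcBeans = conn.queryNames(
                new ObjectName("java.lang:type=GarbageCollector,*"), null);
            for (ObjectName name : gcBeans) {
                long count = (Long) conn.getAttribute(name, "CollectionCount");
                long timeMs = (Long) conn.getAttribute(name, "CollectionTime");
                // CollectionTime is cumulative milliseconds since JVM start,
                // so munin would report it as a DERIVE/COUNTER value and
                // graph the per-poll delta itself.
                System.out.printf("%s count=%d timeMs=%d%n",
                    name.getKeyProperty("name"), count, timeMs);
            }
        }
    }
}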

Thanks in advance

Tom


HELP!!!!

2015-11-13 Thread Alfredo Vega Ramírez
Greetings, I'm new to Solr. I'm having trouble creating a client
application. How do I go about it? Do I need to use a framework, or does
Solr have a built-in way to create client applications?


Re: Best way to track cumulative GC pauses in Solr

2015-11-13 Thread Shawn Heisey
On 11/13/2015 8:00 AM, Tom Evans wrote:
> We have some issues with our Solr servers spending too much time
> paused doing GC. From turning on gc debug, and extracting numbers from
> the GC log, we're getting an idea of just how much of a problem.

Try loading your gc log into gcviewer.

https://github.com/chewiebug/GCViewer/releases

Here's a screenshot of this in action with a gc log from Solr loaded:

https://www.dropbox.com/s/orwt0fcmii5691l/solr-gc-gcviewer-1.35-snapshot.png?dl=0

This screenshot is from a snapshot build including a feature request
that I made:

https://github.com/chewiebug/GCViewer/issues/139

If you use the 1.34.1 version, you will not see some of the numbers
shown in my screenshot, but the info you asked for, accumulated GC
pauses, IS included in that version.

Thanks,
Shawn



Re: HELP!!!!

2015-11-13 Thread Alfredo Vega Ramírez
Greetings, I'm new to Solr. I'm having trouble creating a client
application. How do I go about it? Do I need to use a framework, or does
Solr have a built-in way to create client applications?




Re: HELP!!!!

2015-11-13 Thread Alexandre Rafalovitch
Welcome to the Solr world.

Yes, usually you use a client application. If you are working in Java,
you use SolrJ or you can look into Spring Data. For other languages,
there are libraries too. You can see a reasonable list at:
https://wiki.apache.org/solr/IntegratingSolr . Be aware that not all
clients support all the latest features of Solr.
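
If Java is your language, a minimal SolrJ query looks roughly like the
sketch below (the URL, collection name and field names are placeholders,
adjust them to your own setup):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class FirstSolrJQuery {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient("http://localhost:8983/solr/mycollection")) {
            SolrQuery query = new SolrQuery("title:solr"); // your query here
            query.setRows(10);
            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id"));
            }
        }
    }
}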

You do not want to expose Solr directly to the clients, so no
Javascript talking directly to Solr in production (unless you really,
really know what you are doing).

To learn Solr itself, do the examples, check out the books (e.g. Solr
in Action).

Once you are past the absolute basics, you will have much more
detailed questions to ask.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 13 November 2015 at 11:14, Alfredo Vega Ramírez
 wrote:
> Greetings, I'm new to Solr. I'm having trouble creating a client
> application. How do I go about it? Do I need to use a framework, or does
> Solr have a built-in way to create client applications?


Re: Best way to track cumulative GC pauses in Solr

2015-11-13 Thread Walter Underwood
Also, what GC settings are you using? We may be able to make some suggestions.

Cumulative GC pauses aren’t very interesting to me. I’m more interested in the 
longest ones, 90th percentile, 95th, etc.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Nov 13, 2015, at 8:32 AM, Shawn Heisey  wrote:
> 
> On 11/13/2015 8:00 AM, Tom Evans wrote:
>> We have some issues with our Solr servers spending too much time
>> paused doing GC. From turning on gc debug, and extracting numbers from
>> the GC log, we're getting an idea of just how much of a problem.
> 
> Try loading your gc log into gcviewer.
> 
> https://github.com/chewiebug/GCViewer/releases
> 
> Here's a screenshot of this in action with a gc log from Solr loaded:
> 
> https://www.dropbox.com/s/orwt0fcmii5691l/solr-gc-gcviewer-1.35-snapshot.png?dl=0
> 
> This screenshot is from a snapshot build including a feature request
> that I made:
> 
> https://github.com/chewiebug/GCViewer/issues/139
> 
> If you use the 1.34.1 version, you will not see some of the numbers
> shown in my screenshot, but the info you asked for, accumulated GC
> pauses, IS included in that version.
> 
> Thanks,
> Shawn
> 



Re: DocValues error

2015-11-13 Thread Anshum Gupta
Hi Devansh,

Yes, you'd need to reindex your data in order to use DocValues. It's
highlighted in the official ref guide here:

https://cwiki.apache.org/confluence/display/solr/DocValues

On Fri, Nov 13, 2015 at 10:00 AM, Dhutia, Devansh 
wrote:

> We have an existing collection with a field called lastpublishdate of type
> tdate. It already has a lot of data indexed, and we want to add docValues
> to improve our sorting performance on the field.
>
> The old field definition was:
>
>   <field name="lastpublishdate" type="tdate" ... />
>
> We recently changed it to
>
>   <field name="lastpublishdate" type="tdate" ... docValues="true"/>
>
> Is that considered a breaking change? Upon deploying the schema &
> reloading the collection, sorting on the field fails with the following error:
>
> unexpected docvalues type NONE for field 'lastpublishdate'
> (expected=NUMERIC). Use UninvertingReader or index with docvalues.
>
> Do we really need to wipe & rebuild the entire index to add docValues to
> an existing dataset?
>
> Thanks
>



-- 
Anshum Gupta


DocValues error

2015-11-13 Thread Dhutia, Devansh
We have an existing collection with a field called lastpublishdate of type 
tdate. It already has a lot of data indexed, and we want to add docValues to 
improve our sorting performance on the field.

The old field definition was:

  <field name="lastpublishdate" type="tdate" ... />

We recently changed it to

  <field name="lastpublishdate" type="tdate" ... docValues="true"/>

Is that considered a breaking change? Upon deploying the schema & reloading the
collection, sorting on the field fails with the following error:

unexpected docvalues type NONE for field 'lastpublishdate' (expected=NUMERIC). 
Use UninvertingReader or index with docvalues.

Do we really need to wipe & rebuild the entire index to add docValues to an 
existing dataset?

Thanks


Solr DIH CachedSqlEntityProcessor question

2015-11-13 Thread Nilesh Maheshwari
Hi Gurus,

I am trying to use the Solr DIH CachedSqlEntityProcessor. In my case, I
also need to reference another column from the parent entity in the child
entity. How can that be done?

In my data-config, the child entity references the product_name column
from the parent entity.

This does not give me any error, but it produces an incorrect end result.

Any thoughts?

Regards,

Nilesh


Re: DevOps question : auto deployment/setup of Solr & Zookeeper on medium-large clusters

2015-11-13 Thread Susheel Kumar
Hi Davis,  I wanted to thank you for suggesting Ansible as one of the
automation tools; it has been working very well in automating the
deployments of Zookeeper and Solr on our clusters.

Thanks,
Susheel

On Wed, Oct 21, 2015 at 10:47 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> Susheel,
>
> Our puppet stuff is very close to our infrastructure, using specific
> Netapp volumes and such, and assuming some files come from NFS.
> It is also personally embarrassing to me that we still use NIS - doh!
>
> -Original Message-
> From: Susheel Kumar [mailto:susheel2...@gmail.com]
> Sent: Tuesday, October 20, 2015 8:34 PM
> To: solr-user@lucene.apache.org
> Subject: Re: DevOps question : auto deployment/setup of Solr & Zookeeper
> on medium-large clusters
>
> Thanks, Davis, Jeff.
>
> We are not using AWS.  Is there any scripts/framework already developed
> using puppet available?
>
> On Tue, Oct 20, 2015 at 7:59 PM, Jeff Wartes 
> wrote:
>
> >
> > If you’re using AWS, there’s this:
> > https://github.com/LucidWorks/solr-scale-tk
> > If you’re using chef, there’s this:
> > https://github.com/vkhatri/chef-solrcloud
> >
> > (There are several other chef cookbooks for Solr out there, but this
> > is the only one I’m aware of that supports Solr 5.3.)
> >
> > For ZK, I’m less familiar, but if you’re using chef there’s this:
> > https://github.com/SimpleFinance/chef-zookeeper
> > And this might be handy to know about too:
> > https://github.com/Netflix/exhibitor/wiki
> >
> >
> > On 10/20/15, 6:37 AM, "Davis, Daniel (NIH/NLM) [C]"
> > 
> > wrote:
> >
> > >Waste of money in my opinion.   I would point you towards other tools -
> > >bash scripts and free configuration managers such as puppet, chef, salt,
> > >or ansible.Depending on what development you are doing, you may want
> > >a continuous integration environment.   For a small company starting
> out,
> > >using a free CI, maybe SaaS, is a good choice.   A professional version
> > >such as Bamboo, TeamCity, Jenkins are almost essential in a large
> > >enterprise if you are doing diverse builds.
> > >
> > >When you create a VM, you can generally specify a script to run after
> the
> > >VM is mostly created.   There is a protocol (PXE Boot) that enables this
> > >- a PXE server listens and hears that a new server with such-and-such
> > >Ethernet Address is starting.   The PXE server makes it boot like a
> > >CD-ROM/DVD install, booting from installation media on the network and
> > >installing.Once that install is down, a custom script may be
> invoked.
> > >  This script is typically a bash script, because you may not be able to
> > >count on too much else being installed.   However, python/perl are also
> > >reasonable choices - just be careful that the modules/libraries you are
> > >using for the script are present.The same PXE protocol is used in
> > >large on-premises installations (vCenter) and in the cloud
> > >(AWS/Digital Ocean).  We don't care about the PXE server - the point
> > >is that you can generally run a bash script after your install.
> > >
> > >The bash script can bootstrap other services such as puppet, chef, or
> > >salt, and/or setup keys so that push configuration management tools such
> > >as ansible can reach the server.   The bash script may even be smart
> > >enough to do all of the setup you need, depending on what other servers
> > >you need to configure.   Smart bash scripts are good for a small
> company,
> > >but for large setups, I'd use puppet, chef, salt, and/or ansible.
> > >
> > >What I tend to do is to deploy things in such a way that puppet
> > >(because it is what we use here) can setup things so that a "solradm"
> > >account can setup everything else, and solr and zookeeper are running
> as a "solrapp"
> > >user using puppet.Then, my continuous integration server, which is
> > >Atlassian Bamboo (you can also use tools such as Jenkins, TeamCity,
> > >BuildBot), installs solr as "solradm" and sets it up to run as
> "solrapp".
> > >
> > >I am not a systems administrator, and I'm not really in "DevOps", my
> > >job is to be above all of that and do "systems architecture" which I
> > >am lucky still involves coding both in system administration and
> applications
> > >development.   So, that's my 2 cents.
> > >
> > >Dan Davis, Systems/Applications Architect (Contractor), Office of
> > >Computer and Communications Systems, National Library of Medicine,
> > >NIH
> > >
> > >-Original Message-
> > >From: Susheel Kumar [mailto:susheel2...@gmail.com]
> > >Sent: Tuesday, October 20, 2015 9:19 AM
> > >To: solr-user@lucene.apache.org
> > >Subject: DevOps question : auto deployment/setup of Solr & Zookeeper
> > >on medium-large clusters
> > >
> > >Hello,
> > >
> > >Resending to see opinion from Dev-Ops perspective on the tools for
> > >installing/deployment of Solr & ZK on large no of machines and
> > >maintaining them. I have heard Bladelogic or HP OO 

RE: HELP!!!!

2015-11-13 Thread Mark Horninger
It depends on what you want to do with it.  There are tons of ways to skin the
proverbial cat.  You really
need to start by learning the very basics of Solr, and then move forward from 
there.  Once you understand Solr, you must also understand your data at least 
in a limited fashion.  You will need to use that understanding in order to make 
the correct decisions around your schema design, as well as the correct query 
engine.

Only then should you attempt to write a client application for Solr, otherwise 
you are guaranteed failure.

--Mark H.

-Original Message-
From: Alfredo Vega Ramírez [mailto:alfredo.v...@vertice.cu]
Sent: Friday, November 13, 2015 11:28 AM
To: solr-user@lucene.apache.org
Subject: Re: HELP

Greetings, I'm new to Solr. I'm having trouble creating a client application.
How do I go about it? Do I need to use a framework, or does Solr have a
built-in way to create client applications?




DIH Caching w/ BerkleyBackedCache

2015-11-13 Thread Todd Long
We currently index using DIH along with the SortedMapBackedCache cache
implementation which has worked well until recently when we needed to index
a much larger table. We were running into memory issues using the
SortedMapBackedCache so we tried switching to the BerkleyBackedCache but
appear to have some configuration issues. I've included our basic setup
below. The issue we're running into is that it appears the Berkley database
is evicting database files (see message below) before they've completed.
When I watch the cache directory I only ever see two database files at a
time with each one being ~1GB in size (this appears to be hard coded). Is
there some additional configuration I'm missing to prevent the process from
"cleaning" up database files before the index has finished? I think this
"cleanup" continues to kick off the caching, which never completes... without
caching, the indexing takes ~2 hours. Any help would be greatly appreciated.
Thanks.

Cleaning message: "Chose lowest utilized file for cleaning. fileChosen: 0x0
..."

<dataConfig>
  <document>
    <entity name="parent" query="select ID, tp.* from TABLE_PARENT tp">
      <entity name="child"
              query="select ID, NAME, VALUE from TABLE_CHILD"
              cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
              cacheKey="ID"
              cacheLookup="parent.ID"
              persistCacheName="CHILD"
              persistCacheBaseDir="/some/cache/dir"
              persistCacheFieldNames="ID,NAME,VALUE"
              persistCacheFieldTypes="STRING,STRING,STRING"
              berkleyInternalCacheSize="100"
              berkleyInternalShared="true" />
    </entity>
  </document>
</dataConfig>

--
View this message in context: 
http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: fl=value equals?

2015-11-13 Thread simon
Please do push your script to github - I (re)-compile custom code
infrequently and never remember how to setup the environment.

On Thu, Nov 12, 2015 at 5:14 AM, Upayavira  wrote:

> Okay, makes sense. As to your question - making a new ValueSourceParser
> that handles 'equals' sounds pretty straight-forward.
>
> If it helps, I have somewhere an Ant project that will unpack Solr and
> compile custom components against it. I could push that to github or
> something.
>
> Upayavira
>
> On Thu, Nov 12, 2015, at 07:59 AM, billnb...@gmail.com wrote:
> > fl=$b tells me it works. Or I can do a sort=$b asc
> >
> > The idea is to calculate a score but only include geo if it is not a
> > national search. Do we want to send in a parameter into the QT which
> > allows us to omit geo from national searches
> >
> >
> > Bill Bell
> > Sent from mobile
> >
> > > On Nov 11, 2015, at 1:15 AM, Upayavira  wrote:
> > >
> > > I concur with Jan - what does b= do?
> > >
> > > Also asking, how did you identify that it worked?
> > >
> > > Upayavira
> > >
> > >> On Wed, Nov 11, 2015, at 02:58 AM, William Bell wrote:
> > >> I was able to get it to work kinda with a map().
> > >>
> > >> http://localhost:8983/solr/select?q=*:*=1=
> > >> <
> http://localhost:8983/solr/select?q=*:*=national=if(equals($radius,%27national%27),0,geodist())
> >
> > >> map($radius,1,1,0,geodist())
> > >>
> > >> Where 1= National
> > >>
> > >> Do you have an example of a SearchComponent? It would be pretty easy
> to
> > >> copy map() and develop an equals() right?
> > >>
> > >> if(equals($radius, 'national'), 0, geodist())
> > >>
> > >> This would probably be useful for everyone.
> > >>
> > >> On Tue, Nov 10, 2015 at 4:05 PM, Jan Høydahl 
> > >> wrote:
> > >>
> > >>> Where is your “b” parameter used? I think that instead of trying to
> set a
> > >>> new “b” http param (which solr will not evaluate as a function), you
> should
> > >>> instead try to insert your function or switch qParser directly where
> the
> > >>> “b” param is used, e.g. in a bq or similar.
> > >>>
> > >>> A bit heavy weight, but you could of course write a custom
> SearchComponent
> > >>> to construct your “b” parameter...
> > >>>
> > >>> --
> > >>> Jan Høydahl, search solution architect
> > >>> Cominvent AS - www.cominvent.com
> > >>>
> >  On 10 Nov 2015, at 23:52, William Bell  wrote:
> > 
> >  We are trying to look at a value, and change another value based on
> that.
> > 
> >  For example, for national search we want to pass in
> radius=national, and
> >  then set another variable equal to 0, else set the other variable =
> to
> >  geodist() calculation.
> > 
> >  We tried {!switch} but this only appears to work on fq/q. There is
> no
> >  function for constants for equals
> > >>>
> http://localhost:8983/solr/select?q=*:*=national=if(equals($radius,'national'),0,geodist())
> > 
> >  This does not work:
> > 
> >  http://localhost:8983/solr/select?q=*:*=national={!switch
> >  case.national=0 default=geodist() v=$radius}
> > 
> >  Ideas?
> > 
> > 
> > 
> >  --
> >  Bill Bell
> >  billnb...@gmail.com
> >  cell 720-256-8076
> > >>
> > >>
> > >> --
> > >> Bill Bell
> > >> billnb...@gmail.com
> > >> cell 720-256-8076
>


Re: Solr 5.3 spellcheck always return lower case?

2015-11-13 Thread Erick Erickson
Let's see:
1> the fieldType. Possibly you're missing something there.
2> The fact that you see the doc returned without lowercasing means
nothing; it's returning the _stored_ field, which is a verbatim copy.
The spellcheck is returning an _indexed_ value.

Best,
Erick

On Fri, Nov 13, 2015 at 5:39 AM, QuestionNews .  wrote:
> The data displayed when doing a query is correct case. The fieldType
> doesn't do any case manipulation and the requestHandler/searchComponent
> don't have any settings declared that I can see.
>
> Why is my spellcheck returning results that are all lower case?
>
> Is there a way for me to stop this from happening or have spellcheck return
> an additional field.
>
> Thanks for your help and pardon me if I am not using this mailing list
> properly.  It is my first time utilizing it.


Re: DocValues error

2015-11-13 Thread Dhutia, Devansh
Ugh! I totally missed the highlight. 

Thanks for clarifying. 




On 11/13/15, 1:07 PM, "Anshum Gupta"  wrote:

>Hi Devansh,
>
>Yes you'd need to reindex your data in order to use DocValues. It's
>highlighted here @ the official ref guide :
>
>https://cwiki.apache.org/confluence/display/solr/DocValues
>
>On Fri, Nov 13, 2015 at 10:00 AM, Dhutia, Devansh 
>wrote:
>
>> We have an existing collection with a field called lastpublishdate of type
>> tdate. It already has a lot of data indexed, and we want to add docValues
>> to improve our sorting performance on the field.
>>
>> The old field definition was:
>>
>>   <field name="lastpublishdate" type="tdate" ... />
>>
>> We recently changed it to
>>
>>   <field name="lastpublishdate" type="tdate" ... docValues="true"/>
>>
>> Is that considered a breaking change? Upon deploying the schema &
>> reloading the collection, sorting on the field fails with the following error:
>>
>> unexpected docvalues type NONE for field 'lastpublishdate'
>> (expected=NUMERIC). Use UninvertingReader or index with docvalues.
>>
>> Do we really need to wipe & rebuild the entire index to add docValues to
>> an existing dataset?
>>
>> Thanks
>>
>
>
>
>-- 
>Anshum Gupta


Disabling Query result cache at runtime

2015-11-13 Thread KNitin
Hi,

 Is there a way to make solr not cache the results when we send the query?
(mainly for query result cache). I need to still enable doc and filter
caching.

Let me know if this is possible,

Thanks
Nitin


Re: Disabling Query result cache at runtime

2015-11-13 Thread Erick Erickson
Why do you want to do this? Worst-case testing?

But you can always set your size parameter for the queryResultCache to 0.

On Fri, Nov 13, 2015 at 10:31 AM, KNitin  wrote:
> Hi,
>
>  Is there a way to make solr not cache the results when we send the query?
> (mainly for query result cache). I need to still enable doc and filter
> caching.
>
> Let me know if this is possible,
>
> Thanks
> Nitin


Re: Boost query at search time according set of roles with least performance impact

2015-11-13 Thread Andrea Open Source
Hi Alessandro,
Thanks for answering. Unfortunately bq is not enough as I have several roles 
that I need to score in different ways. I was thinking of building a custom 
function that reads the weights of the roles from solr config and applies them 
at runtime. I am a bit concerned about performance though and that's the reason 
behind my question. What's your thought about such solution?
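
Just to make the idea concrete, this is roughly what I had in mind (an
untested sketch against the Lucene/Solr function query API; the class name
and the solrconfig-driven weight format are made up, and it only handles a
single-valued role field):

import java.util.HashMap;
import java.util.Map;
import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.SimpleFloatFunction;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

public class RoleWeightValueSourceParser extends ValueSourceParser {

    private final Map<String, Float> weights = new HashMap<>();

    @Override
    public void init(NamedList args) {
        // Weights come from solrconfig.xml, e.g.
        // <valueSourceParser name="roleWeight" class="...RoleWeightValueSourceParser">
        //   <float name="CEO">5</float> <float name="Architect">3</float> ...
        // </valueSourceParser>
        for (int i = 0; i < args.size(); i++) {
            weights.put(args.getName(i), ((Number) args.getVal(i)).floatValue());
        }
    }

    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        ValueSource role = fp.parseValueSource(); // e.g. roleWeight(AuthorRole)
        return new SimpleFloatFunction(role) {
            @Override
            protected String name() {
                return "roleWeight";
            }

            @Override
            protected float func(int doc, FunctionValues vals) {
                Float w = weights.get(vals.strVal(doc));
                return w == null ? 1f : w; // unknown roles get a neutral weight
            }
        };
    }
}

It could then be referenced from a bf/boost, e.g. bf=roleWeight(AuthorRole),
so the weights live in config rather than in the query string.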

Kind Regards,
Andrea Roggerone

> On 09/nov/2015, at 12:29, Alessandro Benedetti  wrote:
> 
> ehehe your request is kinda delicate :
> 1)  I can't store the
> payload at index time
> 2) Passing all the weights at query time is not an option
> 
> So you seem to exclude all the possible solutions ...
> Anyway, just thinking loud, have you tried the edismax query parser and the
> boost query feature?
> 
> 1) the first strategy is the one you would prefer to avoid :
> you define the AuthorRole, then you use the Boost Query parameter to boost
> differently your roles :
> AuthorRole:"ADMIN"^100 AuthorRole:"ARCHITECT"^50 etc ...
> If you have 20 roles , the query could be not readable.
> 
> 2) you index the "weight" for the role in the original document.
> Then you use a Boost Function according to your requirement (using the
> "weight" field)
> 
> Hope this helps,
> 
> Cheers
> 
> e.g. from the Solr wiki
> The bq (Boost Query) Parameter
> 
> The bq parameter specifies an additional, optional, query clause that will
> be added to the user's main query to influence the score. For example, if
> you wanted to add a relevancy boost for recent documents:
> q=cheese
> bq=date:[NOW/DAY-1YEAR TO NOW/DAY]
> 
> You can specify multiple bq parameters. If you want your query to be parsed
> as separate clauses with separate boosts, use multiple bq parameters.
> The bf (Boost Functions) Parameter
> 
> The bf parameter specifies functions (with optional boosts) that will be
> used to construct FunctionQueries which will be added to the user's main
> query as optional clauses that will influence the score. Any function
> supported natively by Solr can be used, along with a boost value. For
> example:
> recip(rord(myfield),1,2,3)^1.5
> 
> Specifying functions with the bf parameter is essentially just shorthand
> for using the bq param combined with the {!func} parser.
> 
> For example, if you want to show the most recent documents first, you could
> use either of the following:
> bf=recip(rord(creationDate),1,1000,1000)
>  ...or...
> bq={!func}recip(rord(creationDate),1,1000,1000)
> 
> On 6 November 2015 at 16:44, Andrea Roggerone <
> andrearoggerone.o...@gmail.com> wrote:
> 
>> Hi all,
>> I am working on a mechanism that applies additional boosts to documents
>> according to the role covered by the author. For instance we have
>> 
>> CEO|5 Architect|3 Developer|1 TeamLeader|2
>> 
>> keeping in mind that an author could cover multiple roles (e.g. for a
>> design document, a Team Leader could be also a Developer).
>> 
>> I am aware that is possible to implement a function that leverages
>> payloads, however the weights need to be configurable so I can't store the
>> payload at index time.
>> Passing all the weights at query time is not an option as we have more than
>> 20 roles and query readability and performance would be heavily affected.
>> 
>> Do we have any "out of the box mechanism" in Solr to implement the
>> described behavior? If not, what other options do we have?
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


Re: Best way to track cumulative GC pauses in Solr

2015-11-13 Thread Otis Gospodnetic
Hi Tom,

SPM for SOLR should be helpful here. See http://sematext.com/spm

Otis

 

> On Nov 13, 2015, at 10:00, Tom Evans  wrote:
> 
> Hi all
> 
> We have some issues with our Solr servers spending too much time
> paused doing GC. From turning on gc debug, and extracting numbers from
> the GC log, we're getting an idea of just how much of a problem.
> 
> I'm currently doing this in a hacky, inefficient way:
> 
> grep -h 'Total time for which application threads were stopped:' solr_gc* \
>| awk '($11 > 0.3) { print $1, $11 }' \
>| sed 's#:.*:##' \
>| sort -n \
>| sum_by_date.py
> 
> (Yes, I really am using sed, grep and awk all in one line. Just wrong :)
> 
> The "sum_by_date.py" program simply adds up all the values with the
> same first column, and remembers the largest value seen. This is
> giving me the cumulative GC time for extended pauses (over 0.5s), and
> the maximum pause seen in a given time period (hourly), eg:
> 
> 2015-11-13T11 119.124037 2.203569
> 2015-11-13T12 184.683309 3.156565
> 2015-11-13T13 65.934526 1.978202
> 2015-11-13T14 63.970378 1.411700
> 
> 
> This is fine for seeing that we have a problem. However, really I need
> to get this in to our monitoring systems - we use munin. I'm
> struggling to work out the best way to extract this information for
> our monitoring systems, and I think this might be my naivety about
> Java, and working out what should be logged.
> 
> I've turned on JMX debugging, and looking at the different beans
> available using jconsole, but I'm drowning in information. What would
> be the best thing to monitor?
> 
> Ideally, like the stats above, I'd like to know the cumulative time
> spent paused in GC since the last poll, and the longest GC pause that
> we see. munin polls every 5 minutes, are there suitable counters
> exposed by JMX that it could extract?
> 
> Thanks in advance
> 
> Tom


Re: DIH Caching w/ BerkleyBackedCache

2015-11-13 Thread Mikhail Khludnev
Hello Todd,

"External merge" join helps to avoid boilerplate caching in such simple
cases.

It should be something like a child entity declared with join="zipper" (the
"external merge" join), with both the parent and child queries ordered by the
join key.

On Fri, Nov 13, 2015 at 10:54 PM, Todd Long  wrote:

> We currently index using DIH along with the SortedMapBackedCache cache
> implementation which has worked well until recently when we needed to index
> a much larger table. We were running into memory issues using the
> SortedMapBackedCache so we tried switching to the BerkleyBackedCache but
> appear to have some configuration issues. I've included our basic setup
> below. The issue we're running into is that it appears the Berkley database
> is evicting database files (see message below) before they've completed.
> When I watch the cache directory I only ever see two database files at a
> time with each one being ~1GB in size (this appears to be hard coded). Is
> there some additional configuration I'm missing to prevent the process from
> "cleaning" up database files before the index has finished? I think this
> "cleanup" continues to kickoff the caching which never completes... without
> caching the indexing is ~2 hours. Any help would be greatly appreciated.
> Thanks.
>
> Cleaning message: "Chose lowest utilized file for cleaning. fileChosen: 0x0
> ..."
>
> <dataConfig>
>   <document>
>     <entity name="parent" query="select ID, tp.* from TABLE_PARENT tp">
>       <entity name="child"
>               query="select ID, NAME, VALUE from TABLE_CHILD"
>               cacheImpl="org.apache.solr.handler.dataimport.BerkleyBackedCache"
>               cacheKey="ID"
>               cacheLookup="parent.ID"
>               persistCacheName="CHILD"
>               persistCacheBaseDir="/some/cache/dir"
>               persistCacheFieldNames="ID,NAME,VALUE"
>               persistCacheFieldTypes="STRING,STRING,STRING"
>               berkleyInternalCacheSize="100"
>               berkleyInternalShared="true" />
>     </entity>
>   </document>
> </dataConfig>
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/DIH-Caching-w-BerkleyBackedCache-tp4240142.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





FastVectorHighlighter problem with large fields

2015-11-13 Thread Philippe Soares
Hi,
I have documents with very large fields which I want to highlight.
When a matching word occurs near the far end of those fields, the
FastVectorHighlighter is unable to pull any fragment.
I'm able to pull fragments using the original highlighter and setting
maxAnalyzedChars to a very high value, but I can't find any combination
that would make this work with the FastVectorHighlighter.

I tried setting fragSize = 0 and fragListBuilder = "single", but it's still
not returning any snippet. Also my requirement is to pull a single smaller
fragment (say around 300 chars) even if the match is at the end of a large
field.
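
For reference, this is roughly how I'm issuing the request from SolrJ (a
sketch; the field name, core name and query are placeholders for our real
ones):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class HighlightProbe {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr =
                 new HttpSolrClient("http://localhost:8983/solr/docs")) {
            SolrQuery q = new SolrQuery("body:needle");
            q.setHighlight(true);
            q.addHighlightField("body");
            q.set("hl.useFastVectorHighlighter", "true");
            q.set("hl.fragsize", "300");            // also tried 0
            q.set("hl.fragListBuilder", "single");  // also tried the default
            QueryResponse rsp = solr.query(q);
            // Comes back empty when the match is near the end of a large field.
            System.out.println(rsp.getHighlighting());
        }
    }
}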

I don't want to use the original highlighter because it's putting the <em>
tags around each highlighted word instead of around the first and last
highlighted word.

Is there any way to achieve this with the FVH ?

Thanks in advance for any help you could provide.

Philippe


Issues in export handler

2015-11-13 Thread Ray Niu
Hello:
I am seeing following Exception during call export handler, is anyone
familiar with it?
at org.apache.lucene.util.BitSetIterator.<init>(BitSetIterator.java:58)
at
org.apache.solr.response.SortingResponseWriter.write(SortingResponseWriter.java:138)
at
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:53)
at org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:727)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:459)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:227)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:196)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
at
com.stubhub.newplatform.http.filter.DyeHttpFilter.doFilter(DyeHttpFilter.java:33)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:274)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:242)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:275)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:161)
at
org.jboss.web.tomcat.security.SecurityAssociationValve.invoke(SecurityAssociationValve.java:181)
at
org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.event(CatalinaContext.java:285)
at
org.jboss.modcluster.catalina.CatalinaContext$RequestListenerValve.invoke(CatalinaContext.java:261)
at
org.jboss.web.tomcat.security.JaccContextValve.invoke(JaccContextValve.java:88)
at
org.jboss.web.tomcat.security.SecurityContextEstablishmentValve.invoke(SecurityContextEstablishmentValve.java:100)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:159)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.jboss.web.tomcat.service.jca.CachedConnectionValve.invoke(CachedConnectionValve.java:158)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:567)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.jboss.web.tomcat.service.request.ActiveRequestResponseCacheValve.invoke(ActiveRequestResponseCacheValve.java:53)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:362)
at org.apache.coyote.ajp.AjpAprProcessor.process(AjpAprProcessor.java:489)
at
org.apache.coyote.ajp.AjpAprProtocol$AjpConnectionHandler.process(AjpAprProtocol.java:452)
at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:2019)
at java.lang.Thread.run(Thread.java:745)


Re: Disabling Query result cache at runtime

2015-11-13 Thread KNitin
Yes for worst case analysis. I usually do this by setting it in zk config
but wanted to check if we can do this at runtime. We tried the
q={!cache=false} but it does not help.

On Fri, Nov 13, 2015 at 12:53 PM, Mikhail Khludnev <
mkhlud...@griddynamics.com> wrote:

> Does q={!cache=false}foo:bar&... work in this case?
>
> On Fri, Nov 13, 2015 at 9:31 PM, KNitin  wrote:
>
> > Hi,
> >
> >  Is there a way to make solr not cache the results when we send the
> query?
> > (mainly for query result cache). I need to still enable doc and filter
> > caching.
> >
> > Let me know if this is possible,
> >
> > Thanks
> > Nitin
> >
>
>
>
> --
> Sincerely yours
> Mikhail Khludnev
> Principal Engineer,
> Grid Dynamics
>
> 
> 
>


Re: Disabling Query result cache at runtime

2015-11-13 Thread Mikhail Khludnev
Does q={!cache=false}foo:bar&... work in this case?

On Fri, Nov 13, 2015 at 9:31 PM, KNitin  wrote:

> Hi,
>
>  Is there a way to make solr not cache the results when we send the query?
> (mainly for query result cache). I need to still enable doc and filter
> caching.
>
> Let me know if this is possible,
>
> Thanks
> Nitin
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





CloudSolrClient Connect To Zookeeper with ACL Protected files

2015-11-13 Thread Kevin Lee
Hi,

Is there a way to use CloudSolrClient and connect to a Zookeeper instance where 
ACL is enabled and resources/files like /live_nodes, etc are ACL protected?  
Couldn’t find a way to set the ACL credentials.

Thanks,
Kevin

Compression for solrbin?

2015-11-13 Thread Gregg Donovan
We've had success with LZ4 compression in a custom ShardHandler to reduce
network overhead, getting ~25% compression with low CPU impact. LZ4 or
Snappy seem like reasonable choices[1] for maximizing compression +
transfer + decompression times in the data center.

Would it make sense to integrate compression into javabin itself? For the
ShardHandler and transaction log javabin usage it seems to make sense. We
could flip on gzip in Jetty for HTTP, but GZIP may add more CPU than is
desirable and wouldn't help with the transaction log.

If we did, it seems incrementing the javabin version[2] and
compressing/decompressing inside of JavaBinCodec#marshal[3] and
JavaBinCodec#unmarshal[4] would allow us to retain backwards compatibility
with older clients or existing files.
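
Very roughly, the kind of thing I'm imagining, hand-waving the version
handling (an untested sketch using the lz4-java block streams; class and
method names here are just illustrative, and the real change would live
inside JavaBinCodec itself):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import net.jpountz.lz4.LZ4BlockInputStream;
import net.jpountz.lz4.LZ4BlockOutputStream;
import org.apache.solr.common.util.JavaBinCodec;

public class CompressedJavaBin {

    // Write the javabin payload through an LZ4 block stream.
    // (Closing the block stream also closes the underlying stream, which a
    // real patch inside the codec would want to avoid.)
    public static void write(Object payload, OutputStream out) throws IOException {
        try (LZ4BlockOutputStream lz4 = new LZ4BlockOutputStream(out)) {
            new JavaBinCodec().marshal(payload, lz4);
        }
    }

    // Read it back, decompressing transparently.
    public static Object read(InputStream in) throws IOException {
        try (LZ4BlockInputStream lz4 = new LZ4BlockInputStream(in)) {
            return new JavaBinCodec().unmarshal(lz4);
        }
    }
}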

Thoughts?

--Gregg

[1] http://cyan4973.github.io/lz4/#tab-2
[2]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L83
[3]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L112:L120
[4]
https://github.com/apache/lucene-solr/blob/trunk/solr/solrj/src/java/org/apache/solr/common/util/JavaBinCodec.java#L129:L137


Re: fl=value equals?

2015-11-13 Thread William Bell
How about we just add a new function called equals() and put it into the
solution?
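
Something like this is what I had in mind - a quick, untested sketch against
the ValueSource API (the class name and the solrconfig registration are just
examples):

import org.apache.lucene.queries.function.FunctionValues;
import org.apache.lucene.queries.function.ValueSource;
import org.apache.lucene.queries.function.valuesource.DualFloatFunction;
import org.apache.solr.search.FunctionQParser;
import org.apache.solr.search.SyntaxError;
import org.apache.solr.search.ValueSourceParser;

// Registered in solrconfig.xml with something like:
// <valueSourceParser name="equals" class="com.example.EqualsValueSourceParser"/>
public class EqualsValueSourceParser extends ValueSourceParser {

    @Override
    public ValueSource parse(FunctionQParser fp) throws SyntaxError {
        final ValueSource lhs = fp.parseValueSource();
        final ValueSource rhs = fp.parseValueSource();
        // Returns 1.0 when the two arguments have the same string value for a
        // document, else 0.0, so it can be fed straight into if(), e.g.
        // if(equals($radius,'national'),0,geodist())
        return new DualFloatFunction(lhs, rhs) {
            @Override
            protected String name() {
                return "equals";
            }

            @Override
            protected float func(int doc, FunctionValues aVals, FunctionValues bVals) {
                String a = aVals.strVal(doc);
                String b = bVals.strVal(doc);
                return (a == null ? b == null : a.equals(b)) ? 1f : 0f;
            }
        };
    }
}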

On Fri, Nov 13, 2015 at 11:36 AM, simon  wrote:

> Please do push your script to github - I (re)-compile custom code
> infrequently and never remember how to setup the environment.
>
> On Thu, Nov 12, 2015 at 5:14 AM, Upayavira  wrote:
>
> > Okay, makes sense. As to your question - making a new ValueSourceParser
> > that handles 'equals' sounds pretty straight-forward.
> >
> > If it helps, I have somewhere an Ant project that will unpack Solr and
> > compile custom components against it. I could push that to github or
> > something.
> >
> > Upayavira
> >
> > On Thu, Nov 12, 2015, at 07:59 AM, billnb...@gmail.com wrote:
> > > fl=$b tells me it works. Or I can do a sort=$b asc
> > >
> > > The idea is to calculate a score but only include geo if it is not a
> > > national search. Do we want to send in a parameter into the QT which
> > > allows us to omit geo from national searches
> > >
> > >
> > > Bill Bell
> > > Sent from mobile
> > >
> > > > On Nov 11, 2015, at 1:15 AM, Upayavira  wrote:
> > > >
> > > > I concur with Jan - what does b= do?
> > > >
> > > > Also asking, how did you identify that it worked?
> > > >
> > > > Upayavira
> > > >
> > > >> On Wed, Nov 11, 2015, at 02:58 AM, William Bell wrote:
> > > >> I was able to get it to work kinda with a map().
> > > >>
> > > >> http://localhost:8983/solr/select?q=*:*=1=
> > > >> <
> >
> http://localhost:8983/solr/select?q=*:*=national=if(equals($radius,%27national%27),0,geodist())
> > >
> > > >> map($radius,1,1,0,geodist())
> > > >>
> > > >> Where 1= National
> > > >>
> > > >> Do you have an example of a SearchComponent? It would be pretty easy
> > to
> > > >> copy map() and develop an equals() right?
> > > >>
> > > >> if(equals($radius, 'national'), 0, geodist())
> > > >>
> > > >> This would probably be useful for everyone.
> > > >>
> > > >> On Tue, Nov 10, 2015 at 4:05 PM, Jan Høydahl  >
> > > >> wrote:
> > > >>
> > > >>> Where is your “b” parameter used? I think that instead of trying to
> > set a
> > > >>> new “b” http param (which solr will not evaluate as a function),
> you
> > should
> > > >>> instead try to insert your function or switch qParser directly
> where
> > the
> > > >>> “b” param is used, e.g. in a bq or similar.
> > > >>>
> > > >>> A bit heavy weight, but you could of course write a custom
> > SearchComponent
> > > >>> to construct your “b” parameter...
> > > >>>
> > > >>> --
> > > >>> Jan Høydahl, search solution architect
> > > >>> Cominvent AS - www.cominvent.com
> > > >>>
> > >  On 10 Nov 2015, at 23:52, William Bell  wrote:
> > > 
> > >  We are trying to look at a value, and change another value based
> on
> > that.
> > > 
> > >  For example, for national search we want to pass in
> > radius=national, and
> > >  then set another variable equal to 0, else set the other variable
> =
> > to
> > >  geodist() calculation.
> > > 
> > >  We tried {!switch} but this only appears to work on fq/q. There is
> > no
> > >  function for constants for equals
> > > >>>
> >
> http://localhost:8983/solr/select?q=*:*=national=if(equals($radius,'national'),0,geodist())
> > > 
> > >  This does not work:
> > > 
> > > 
> http://localhost:8983/solr/select?q=*:*=national={!switch
> > >  case.national=0 default=geodist() v=$radius}
> > > 
> > >  Ideas?
> > > 
> > > 
> > > 
> > >  --
> > >  Bill Bell
> > >  billnb...@gmail.com
> > >  cell 720-256-8076
> > > >>
> > > >>
> > > >> --
> > > >> Bill Bell
> > > >> billnb...@gmail.com
> > > >> cell 720-256-8076
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Solr 5.3 spellcheck always return lower case?

2015-11-13 Thread QuestionNews .
The data displayed when doing a query is correct case. The fieldType
doesn't do any case manipulation and the requestHandler/searchComponent
don't have any settings declared that I can see.

Why is my spellcheck returning results that are all lower case?

Is there a way for me to stop this from happening or have spellcheck return
an additional field.

Thanks for your help and pardon me if I am not using this mailing list
properly.  It is my first time utilizing it.