Re: SOLR war for SOLR 6

2016-06-17 Thread Shawn Heisey
On 6/16/2016 1:20 AM, Bharath Kumar wrote:
> I was trying to generate a solr war out of the solr 6 source, but even
> after i create the war, i was not able to get it deployed correctly on
> jboss. Wanted to know if anyone was able to successfully generate solr
> war and deploy it on tomcat or jboss? Really appreciate your help on
> this.

FYI: If you do this, you're running an unsupported configuration. 
You're on your own for both getting it working AND any problems that are
related to the deployment rather than Solr itself.

You actually don't need to create a war.  Just run "ant clean server" in
the solr directory of the source code and then install the exploded
webapp (found in server/solr-webapp/webapp) into the container.  There
should be instructions available for how to install an exploded webapp
into tomcat or jboss.  As already stated, you are on your own for
finding and following those instructions, and if Solr doesn't deploy,
you will need to talk to somebody who knows the container for help. 
Once they are sure you have the config for the container right, they may
refer you back here ... but because it's an unsupported config, the
amount of support we can offer is minimal.

https://wiki.apache.org/solr/WhyNoWar

If you want the admin UI to work when you install into a user-supplied
container, then you must set the context path for the app to "/solr". 
The admin UI in 6.x will not work if you use another path, and that is
not considered a bug, because the only supported container has the path
hardcoded to /solr.
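
As a rough illustration only (the paths below are invented examples, and this
is exactly the sort of container-specific setup you're on your own for), a
Tomcat deployment of the exploded webapp might look something like this:

  # build the exploded webapp from a source checkout
  cd solr && ant clean server
  # the webapp ends up in server/solr-webapp/webapp

  <!-- conf/Catalina/localhost/solr.xml : the file name pins the context path to /solr -->
  <Context docBase="/opt/solr-source/solr/server/solr-webapp/webapp">
    <!-- you still need to point solr.solr.home at a directory containing
         solr.xml and your cores, e.g. -Dsolr.solr.home=/var/solr/data in the
         container's JVM options -->
  </Context>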

Thanks,
Shawn



Re: tlogs not deleting as usual in Solr 5.5.1?

2016-06-17 Thread Chris Morley
Thanks Erick - that's what we have settled on doing until we are using 
SolrCloud, which will be later this year with any luck.  We want to get up 
onto Solr 5.5.1 first (ASAP) and we tried disabling tlogs today and that 
seems to fit the bill.
  
  
  


 From: "Erick Erickson" 
Sent: Friday, June 17, 2016 2:36 PM
To: "solr-user" , ch...@depahelix.com
Subject: Re: tlogs not deleting as usual in Solr 5.5.1?   
If you are NOT using SolrCloud and don't
care about Real Time Get, you can just disable the
tlogs entirely. They're not doing you all that much
good in that case...

The tlogs are irrelevant when it comes to master/slave
replication.

FWIW,
Erick

On Fri, Jun 17, 2016 at 9:14 AM, Chris Morley  wrote:
> After some more searching, I found a thread online where Erick Erickson is
> telling someone about how there are old tlogs left around in case there is
> a need for a peer to sync even if SolrCloud is not enabled. That makes
> sense, but we'll probably want to enable autoCommit and then trigger
> replication on the slaves when we know everything is committed after a full
> import. (We disable polling.)
>
>
>
>
> 
> From: "Chris Morley" 
> Sent: Thursday, June 16, 2016 3:20 PM
> To: "Solr Newsgroup" 
> Subject: tlogs not deleting as usual in Solr 5.5.1?
> The repetition below is on purpose to show the contrast between solr
> versions.
>
> In Solr 4.10.3, we have autocommits disabled. We do a dataimport of a few
> hundred thousand records and have a tlog that grows to ~1.2G.
>
> In Solr 5.5.1, we have autocommits disabled. We do a dataimport of a few
> hundred thousand records and have a tlog that grows to ~1.6G. (same exact
> data, slightly larger tlog but who knows, that's fine)
>
> In Solr 4.10.3 tlogs ARE deleted after issuing update?commit=true.
> (And deleted immediately.)
>
> In Solr 5.5.1 tlogs ARE NOT deleted after issuing update?commit=true.
>
> We want the tlog to delete like it did in Solr 4.10.3. Perhaps there is a
> configuration setting or feature of Solr 5.5.1 that causes this?
>
> Would appreciate any tips on configuration or code we could change to
> ensure the tlog will delete after a hard commit.
>
>
>
 



Thank You Guys

2016-06-17 Thread Jamal, Sarfaraz
Hi Guys,

Thank you all - I got synonyms, highlighting, stemming all working the way I 
wanted to.

I am sure I will have more questions later on =)

Thanks!

Sas


Re: tlogs not deleting as usual in Solr 5.5.1?

2016-06-17 Thread Erick Erickson
If you are NOT using SolrCloud and don't
care about Real Time Get, you can just disable the
tlogs entirely. They're not doing you all that much
good in that case...

The tlogs are irrelevant when it comes to master/slave
replication.
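
(For reference, a minimal sketch of what "disabling the tlogs" means in
practice, assuming the stock solrconfig.xml layout -- comment out or remove
the updateLog element and reload the core:)

  <updateHandler class="solr.DirectUpdateHandler2">
    <!-- removing/commenting this block turns off transaction logging
    <updateLog>
      <str name="dir">${solr.ulog.dir:}</str>
    </updateLog>
    -->
  </updateHandler>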

FWIW,
Erick

On Fri, Jun 17, 2016 at 9:14 AM, Chris Morley  wrote:
> After some more searching, I found a thread online where Erick Erickson is
> telling someone about how there are old tlogs left around in case there is
> a need for a peer to sync even if SolrCloud is not enabled.  That makes
> sense, but we'll probably want to enable autoCommit and then trigger
> replication on the slaves when we know everything is committed after a full
> import.  (We disable polling.)
>
>
>
>
> 
>  From: "Chris Morley" 
> Sent: Thursday, June 16, 2016 3:20 PM
> To: "Solr Newsgroup" 
> Subject: tlogs not deleting as usual in Solr 5.5.1?
> The repetition below is on purpose to show the contrast between solr
> versions.
>
> In Solr 4.10.3, we have autocommits disabled. We do a dataimport of a few
> hundred thousand records and have a tlog that grows to ~1.2G.
>
> In Solr 5.5.1, we have autocommits disabled. We do a dataimport of a few
> hundred thousand records and have a tlog that grows to ~1.6G. (same exact
> data, slightly larger tlog but who knows, that's fine)
>
> In Solr 4.10.3 tlogs ARE deleted after issuing update?commit=true.
> (And deleted immediately.)
>
> In Solr 5.5.1 tlogs ARE NOT deleted after issuing update?commit=true.
>
> We want the tlog to delete like it did in Solr 4.10.3. Perhaps there is a
> configuration setting or feature of Solr 5.5.1 that causes this?
>
> Would appreciate any tips on configuration or code we could change to
> ensure the tlog will delete after a hard commit.
>
>
>


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread MaryJo Sminkey
> OK - Slapping forehead now... D'oh!
>
> <float name="synonyms.originalBoost">1.2</float>
> Float, not int!
>


LOL, we've all been there. I'm surprised I didn't notice that myself.

MJ


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread John Bickerstaff
OK - Slapping forehead now... D'oh!

<float name="synonyms.originalBoost">1.2</float>
Float, not int!

John Bickerstaff wrote:

> Hi all -
>
> I've successfully run the hon-lucene-synonyms plugin from the Admin
> console by adding the following to the Raw Query Parameters field...
>
>
> df=text&defType=synonym_edismax&synonyms=true&synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1
>
> I got those from the Read Me on the github account.
>
> Now I'm trying to make this work via a requestHandler in solrconfig.xml.
>
> I think the following should work, but it just hangs if I add the last
> line referencing synonyms.originalBoost
>
> <requestHandler name="/test1" class="solr.SearchHandler">
>   <lst name="defaults">
>     <str name="echoParams">explicit</str>
>     <int name="rows">10</int>
>     <str name="defType">synonym_edismax</str>
>     <str name="df">text</str>
>     <str name="synonyms">true</str>
>     <int name="synonyms.originalBoost">1.2</int> --> If I add this
> line, the admin console just hangs when I hit /test1
>   </lst>
> </requestHandler>
>
> If I do NOT add the last line and only have the line that sets
> synonyms=true, it appears to work fine.
>
> I see the dot notation all over the sample entries in solrconfig.xml...
> Am I missing something here?
>
> Essentially, how do I get these variables set correctly from inside a
> requestHandler configured in the solrconfig.xml file?
>
> On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson <
> jlaw...@opensourceconnections.com> wrote:
>
>> MaryJo you might want to start a new thread, I think we kinda hijacked
>> this
>> one. Also if you are interested in tuning queries check out
>> http://splainer.io/ and https://www.quepid.com which are interactive
>> tools
>> (both of which my company makes) to tune for search relevancy.
>>
>> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey 
>> wrote:
>>
>> > I'm really thinking this just might not be the right tool for us, what
>> we
>> > really need is a solution that works like the normal synonym filter
>> does,
>> > just with proper multi-term support, so I can apply the synonyms only on
>> > certain fields (copied fields) that have their own, lower boost
>> settings.
>> > The way this plugin works across the entire query just seems too
>> > problematic when you need to do complex queries with lots of different
>> > boost settings to get good relevancy. Anyone used a different method of
>> > handling multi-term synonyms that isn't as global?
>> >
>> > Mary Jo
>> >
>> >
>> >
>> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey 
>> > wrote:
>> >
>> > > Here's the issue I am still having with getting the right search
>> > relevancy
>> > > with the synonym plugin in place. We typically have users searching on
>> > > multiple terms, and we want matches across multiple terms,
>> particularly
>> > > those that appear as phrases, to appear higher than matches for the
>> same
>> > > term multiple times. The synonym filter makes this complicated since
>> we
>> > may
>> > > have cases where the term the user enters, like "sbc", maps to a
>> > multi-term
>> > > synonym like "small block", and we always want the matches for the
>> > original
>> > > term to pop up first, so I'm trying to make sure the original boost is
>> > high
>> > > enough to override a phrase boost that the multi-term synonym would
>> give.
>> > > Unfortunately this then means matches on the same term multiple times
>> get
>> > > pushed up over my phrase matches...those aren't going to be the most
>> > > relevant matches. Not sure there's a way to solve this successfully,
>> > > without a completely different approach to the synonyms... or not
>> > counting
>> > > the number of matches on terms (I assume you can drop that ability,
>> > > although that's not ideal either...just better than what I have now).
>> > >
>> > > MJ
>> > >
>> > >
>> > >
>> > >
>> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey 
>> > > wrote:
>> > >
>> > >>
>> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
>> > >> jlaw...@opensourceconnections.com> wrote:
>> > >>
>> > >>>
>> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0
>> boosts
>> > >>> were no match for the product name and keyword field boosts so that
>> > would
>> > >>> influence your search as well.
>> > >>
>> > >>
>> > >>
>> > >> Yeah I definitely will have to play with the values a bit as we want
>> the
>> > >> product name matches to always appear highest, whether original or
>> > >> synonyms, but I'll have to figure out how to get that result without
>> one
>> > >> word terms that have multi word synonyms getting overly boosted for a
>> > >> phrase match while still sufficiently boosting the normal phrase
>> > match
>> > >> stuff too. With the normal synonym filter I was able to just copy
>> fields
>> > >> that could have synonyms to a new field (which would be the only one
>> > with
>> > >> the synonym filter), and use a different, lower boost on those
>> fields,
>> > but
>> > >> that won't work with this plugin which applies across everything in
>> the
>> > >> query. Makes it a bit more complicated to get everything just right.
>> > >>
>> > >> MJ
>> > >>
>> > >>

Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread MaryJo Sminkey
On Fri, Jun 17, 2016 at 2:15 PM, John Bickerstaff 
wrote:

> If I do NOT add the last line and only have the line that sets
> synonyms=true, it appears to work fine.
>
> I see the dot notation all over the sample entries in solrconfig.xml...  Am
> I missing something here?
>
> Essentially, how do I get these variables set correctly from inside a
> requestHandler configured in the solrconfig.xml file?
>


I know I didn't have any issues using those boosts but I was sending them
on the query string (or otherwise as part of my query request), rather than
setting them in the config. You might try that to see if it makes a
difference.
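
(For reference, sending them on the query string looks something like the
request below -- parameter names are the ones referenced in this thread and
in the hon-lucene-synonyms README, and host/collection are placeholders:)

  http://localhost:8983/solr/collection1/select
    ?q=sbc
    &defType=synonym_edismax
    &synonyms=true
    &synonyms.originalBoost=1.2
    &synonyms.synonymBoost=1.1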

Mary Jo


Re: Solr Cloud and Multi-word Synonyms :: synonym_edismax parser

2016-06-17 Thread John Bickerstaff
Hi all -

I've successfully run the hon-lucene-synonyms plugin from the Admin console
by adding the following to the Raw Query Parameters field...

df=text&defType=synonym_edismax&synonyms=true&synonyms.originalBoost=1.2&synonyms.synonymBoost=1.1

I got those from the Read Me on the github account.

Now I'm trying to make this work via a requestHandler in solrconfig.xml.

I think the following should work, but it just hangs if I add the last line
referencing synonyms.originalBoost:

 <requestHandler name="/test1" class="solr.SearchHandler">
   <lst name="defaults">
     <str name="echoParams">explicit</str>
     <int name="rows">10</int>
     <str name="defType">synonym_edismax</str>
     <str name="df">text</str>
     <str name="synonyms">true</str>
     <int name="synonyms.originalBoost">1.2</int> --> If I add this line,
 the admin console just hangs when I hit /test1
   </lst>
 </requestHandler>

If I do NOT add the last line and only have the line that sets
synonyms=true, it appears to work fine.

I see the dot notation all over the sample entries in solrconfig.xml...  Am
I missing something here?

Essentially, how do I get these variables set correctly from inside a
requestHandler configured in the solrconfig.xml file?

On Tue, Jun 7, 2016 at 11:47 AM, Joe Lawson <
jlaw...@opensourceconnections.com> wrote:

> MaryJo you might want to start a new thread, I think we kinda hijacked this
> one. Also if you are interested in tuning queries check out
> http://splainer.io/ and https://www.quepid.com which are interactive tools
> (both of which my company makes) to tune for search relevancy.
>
> On Tue, Jun 7, 2016 at 1:45 PM, MaryJo Sminkey 
> wrote:
>
> > I'm really thinking this just might not be the right tool for us, what we
> > really need is a solution that works like the normal synonym filter does,
> > just with proper multi-term support, so I can apply the synonyms only on
> > certain fields (copied fields) that have their own, lower boost settings.
> > The way this plugin works across the entire query just seems too
> > problematic when you need to do complex queries with lots of different
> > boost settings to get good relevancy. Anyone used a different method of
> > handling multi-term synonyms that isn't as global?
> >
> > Mary Jo
> >
> >
> >
> > On Tue, Jun 7, 2016 at 1:31 PM, MaryJo Sminkey 
> > wrote:
> >
> > > Here's the issue I am still having with getting the right search
> > relevancy
> > > with the synonym plugin in place. We typically have users searching on
> > > multiple terms, and we want matches across multiple terms, particularly
> > > those that appear as phrases, to appear higher than matches for the
> same
> > > term multiple times. The synonym filter makes this complicated since we
> > may
> > > have cases where the term the user enters, like "sbc", maps to a
> > multi-term
> > > synonym like "small block", and we always want the matches for the
> > original
> > > term to pop up first, so I'm trying to make sure the original boost is
> > high
> > > enough to override a phrase boost that the multi-term synonym would
> give.
> > > Unfortunately this then means matches on the same term multiple times
> get
> > > pushed up over my phrase matches...those aren't going to be the most
> > > relevant matches. Not sure there's a way to solve this successfully,
> > > without a completely different approach to the synonyms... or not
> > counting
> > > the number of matches on terms (I assume you can drop that ability,
> > > although that's not ideal either...just better than what I have now).
> > >
> > > MJ
> > >
> > >
> > >
> > >
> > > On Mon, Jun 6, 2016 at 9:39 PM, MaryJo Sminkey 
> > > wrote:
> > >
> > >>
> > >> On Mon, Jun 6, 2016 at 7:36 PM, Joe Lawson <
> > >> jlaw...@opensourceconnections.com> wrote:
> > >>
> > >>>
> > >>> We were thinking, as you experimented with, that the 0.5 and 2.0
> boosts
> > >>> were no match for the product name and keyword field boosts so that
> > would
> > >>> influence your search as well.
> > >>
> > >>
> > >>
> > >> Yeah I definitely will have to play with the values a bit as we want
> the
> > >> product name matches to always appear highest, whether original or
> > >> synonyms, but I'll have to figure out how to get that result without
> one
> > >> word terms that have multi word synonyms getting overly boosted for a
> > >> phrase match while still sufficiently boosting the normal phrase
> > match
> > >> stuff too. With the normal synonym filter I was able to just copy
> fields
> > >> that could have synonyms to a new field (which would be the only one
> > with
> > >> the synonym filter), and use a different, lower boost on those fields,
> > but
> > >> that won't work with this plugin which applies across everything in
> the
> > >> query. Makes it a bit more complicated to get everything just right.
> > >>
> > >> MJ
> > >>
> > >>
> > >>
> > >
> > >
> >
>


Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Walter Underwood
I try to adjust the new generation size so that it can handle all the 
allocations needed for HTTP requests. Those short-lived objects should never 
come from tenured space.

Even without facets, I run a pretty big new generation, 2 GB in an 8 GB heap.

The tenured space will always grow in Solr, because objects ejected from cache 
have been around a while. Caches create garbage in tenured space.
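
(As a concrete illustration -- the exact numbers are just what fits our
load, not a recommendation -- pinning a 2 GB new generation inside an 8 GB
CMS heap looks roughly like this:)

  # fixed heap with an explicit 2 GB young generation, CMS for the old generation
  -Xms8g -Xmx8g -Xmn2g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC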

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Jun 17, 2016, at 10:01 AM, Jeff Wartes  wrote:
> 
> For what it’s worth, I looked into reducing the allocation footprint of 
> CollapsingQParserPlugin a bit, but without success. See 
> https://issues.apache.org/jira/browse/SOLR-9125
> 
> As it happened, I was collapsing on a field with such high cardinality that 
> the chances of a query even doing much collapsing of interest was pretty low. 
> That allowed me to use a vastly stripped-down version of 
> CollapsingQParserPlugin with a *much* lower memory footprint, in exchange for 
> collapsed document heads essentially being picked at random. (That is, when 
> collapsing two documents, the one that gets returned is random.)
> 
> If that’s of interest, I could probably throw the code someplace public.
> 
> 
> On 6/16/16, 3:39 PM, "Cas Rusnov"  wrote:
> 
>> Hey thanks for your reply.
>> 
>> Looks like running the suggested CMS config from Shawn, we're getting some
>> nodes with 30+sec pauses, I gather due to large heap, interestingly enough
>> while the scenario Jeff talked about is remarkably similar (we use field
>> collapsing), including the performance aspects of it, we are getting
>> concurrent mode failures both due to new space allocation failures and due
>> to promotion failures. I suspect there's a lot of garbage building up.
>> We're going to run tests with field collapsing disabled and see if that
>> makes a difference.
>> 
>> Cas
>> 
>> 
>> On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes  wrote:
>> 
>>> Check your gc log for CMS “concurrent mode failure” messages.
>>> 
>>> If a concurrent CMS collection fails, it does a stop-the-world pause while
>>> it cleans up using a *single thread*. This means the stop-the-world CMS
>>> collection in the failure case is typically several times slower than a
>>> concurrent CMS collection. The single-thread business means it will also be
>>> several times slower than the Parallel collector, which is probably what
>>> you’re seeing. I understand that it needs to stop the world in this case,
>>> but I really wish the CMS failure would fall back to a Parallel collector
>>> run instead.
>>> The Parallel collector is always going to be the fastest at getting rid of
>>> garbage, but only because it stops all the application threads while it
>>> runs, so it’s got less complexity to deal with. That said, it’s probably
>>> not going to be orders of magnitude faster than a (successfully) concurrent
>>> CMS collection.
>>> 
>>> Regardless, the bigger the heap, the bigger the pause.
>>> 
>>> If your application is generating a lot of garbage, or can generate a lot
>>> of garbage very suddenly, CMS concurrent mode failures are more likely. You
>>> can turn down the  -XX:CMSInitiatingOccupancyFraction value in order to
>>> give the CMS collection more of a head start at the cost of more frequent
>>> collections. If that doesn’t work, you can try using a bigger heap, but you
>>> may eventually find yourself trying to figure out what about your query
>>> load generates so much garbage (or causes garbage spikes) and trying to
>>> address that. Even G1 won’t protect you from highly unpredictable garbage
>>> generation rates.
>>> 
>>> In my case, for example, I found that a very small subset of my queries
>>> were using the CollapseQParserPlugin, which requires quite a lot of memory
>>> allocations, especially on a large index. Although generally this was fine,
>>> if I got several of these rare queries in a very short window, it would
>>> always spike enough garbage to cause CMS concurrent mode failures. The
>>> single-threaded concurrent-mode failure would then take long enough that
>>> the ZK heartbeat would fail, and things would just go downhill from there.
>>> 
>>> 
>>> 
>>> On 6/15/16, 3:57 PM, "Cas Rusnov"  wrote:
>>> 
 Hey Shawn! Thanks for replying.
 
 Yes I meant HugePages not HugeTable, brain fart. I will give the
 transparent off option a go.
 
 I have attempted to use your CMS configs as is and also the default
 settings and the cluster dies under our load (basically a node will get a
 35-60s GC STW and then the others in the shard will take the load, and they
 will in turn get long STWs until the shard dies), which is why basically in
 a fit of desperation I tried out ParallelGC and found it to be half-way
 acceptable. I will run a test using your configs (and the defaults) again
 just to be sure (since I'm certain the machine config has changed since we
 used your unaltered settings).

Re: Error when searching with special characters

2016-06-17 Thread Ahmet Arslan
Hi,

Maybe a URL encoding issue?
By the way, I would use a backslash to escape special characters.
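
For example, if the query travels in a URL, "&" and "#" have to be
percent-encoded or the container cuts the q parameter short -- which would
explain why the parser only saw '"Research ' in the error below.
Illustrative encoded values (assuming a GET request):

  q=%22Research%20%26%20Development%22   ("Research & Development", & as %26)
  q=%22C%23%20programming%22             ("C# programming", # as %23)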

Ahmet

On Friday, June 17, 2016 10:08 AM, Zheng Lin Edwin Yeo  
wrote:



Hi,

I encountered this error when I tried to search with special characters,
like "&" and "#".

{
  "responseHeader":{
"status":400,
"QTime":0},
  "error":{
"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'\"Research ': Lexical error at line 1, column 11.  Encountered: 
after : \"\\\"Research \"",
"code":400}}


I have done the search by putting inverted commas, like: q="Research &
Development"

What could be the issue here?

I'm facing this problem in both Solr 5.4.0 and Solr 6.0.1.


Regards,
Edwin


Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Jeff Wartes
For what it’s worth, I looked into reducing the allocation footprint of 
CollapsingQParserPlugin a bit, but without success. See 
https://issues.apache.org/jira/browse/SOLR-9125

As it happened, I was collapsing on a field with such high cardinality that the 
chances of a query even doing much collapsing of interest was pretty low. That 
allowed me to use a vastly stripped-down version of CollapsingQParserPlugin 
with a *much* lower memory footprint, in exchange for collapsed document heads 
essentially being picked at random. (That is, when collapsing two documents, 
the one that gets returned is random.)

If that’s of interest, I could probably throw the code someplace public.


On 6/16/16, 3:39 PM, "Cas Rusnov"  wrote:

>Hey thanks for your reply.
>
>Looks like running the suggested CMS config from Shawn, we're getting some
>nodes with 30+sec pauses, I gather due to large heap, interestingly enough
>while the scenario Jeff talked about is remarkably similar (we use field
>collapsing), including the performance aspects of it, we are getting
>concurrent mode failures both due to new space allocation failures and due
>to promotion failures. I suspect there's a lot of garbage building up.
>We're going to run tests with field collapsing disabled and see if that
>makes a difference.
>
>Cas
>
>
>On Thu, Jun 16, 2016 at 1:08 PM, Jeff Wartes  wrote:
>
>> Check your gc log for CMS “concurrent mode failure” messages.
>>
>> If a concurrent CMS collection fails, it does a stop-the-world pause while
>> it cleans up using a *single thread*. This means the stop-the-world CMS
>> collection in the failure case is typically several times slower than a
>> concurrent CMS collection. The single-thread business means it will also be
>> several times slower than the Parallel collector, which is probably what
>> you’re seeing. I understand that it needs to stop the world in this case,
>> but I really wish the CMS failure would fall back to a Parallel collector
>> run instead.
>> The Parallel collector is always going to be the fastest at getting rid of
>> garbage, but only because it stops all the application threads while it
>> runs, so it’s got less complexity to deal with. That said, it’s probably
>> not going to be orders of magnitude faster than a (successfully) concurrent
>> CMS collection.
>>
>> Regardless, the bigger the heap, the bigger the pause.
>>
>> If your application is generating a lot of garbage, or can generate a lot
>> of garbage very suddenly, CMS concurrent mode failures are more likely. You
>> can turn down the  -XX:CMSInitiatingOccupancyFraction value in order to
>> give the CMS collection more of a head start at the cost of more frequent
>> collections. If that doesn’t work, you can try using a bigger heap, but you
>> may eventually find yourself trying to figure out what about your query
>> load generates so much garbage (or causes garbage spikes) and trying to
>> address that. Even G1 won’t protect you from highly unpredictable garbage
>> generation rates.
>>
>> In my case, for example, I found that a very small subset of my queries
>> were using the CollapseQParserPlugin, which requires quite a lot of memory
>> allocations, especially on a large index. Although generally this was fine,
>> if I got several of these rare queries in a very short window, it would
>> always spike enough garbage to cause CMS concurrent mode failures. The
>> single-threaded concurrent-mode failure would then take long enough that
>> the ZK heartbeat would fail, and things would just go downhill from there.
>>
>>
>>
>> On 6/15/16, 3:57 PM, "Cas Rusnov"  wrote:
>>
>> >Hey Shawn! Thanks for replying.
>> >
>> >Yes I meant HugePages not HugeTable, brain fart. I will give the
>> >transparent off option a go.
>> >
>> >I have attempted to use your CMS configs as is and also the default
>> >settings and the cluster dies under our load (basically a node will get a
>> >35-60s GC STW and then the others in the shard will take the load, and they
>> >will in turn get long STWs until the shard dies), which is why basically in
>> >a fit of desperation I tried out ParallelGC and found it to be half-way
>> >acceptable. I will run a test using your configs (and the defaults) again
>> >just to be sure (since I'm certain the machine config has changed since we
>> >used your unaltered settings).
>> >
>> >Thanks!
>> >Cas
>> >
>> >
>> >On Wed, Jun 15, 2016 at 3:41 PM, Shawn Heisey 
>> wrote:
>> >
>> >> On 6/15/2016 3:05 PM, Cas Rusnov wrote:
>> >> > After trying many of the off the shelf configurations (including CMS
>> >> > configurations but excluding G1GC, which we're still taking the
>> >> > warnings about seriously), numerous tweaks, rumors, various instance
>> >> > sizes, and all the rest, most of which regardless of heap size and
>> >> > newspace size resulted in frequent 30+ second STW GCs, we settled on
>> >> > the following configuration which leads to 

Morphlines.cell and attachments in complex docs?

2016-06-17 Thread Allison, Timothy B.
I was just looking at SolrCellBuilder, and it looks like there's an assumption 
that documents will not have attachments/embedded objects.  Unless I 
misunderstand the code, users will not be able to search documents inside zips, 
or attachments in msg/doc/pdf/etc. (cf. SOLR-7189).

Are embedded documents extracted in a step before hitting SolrCellBuilder?

Bug or feature?

Thank you!

 Cheers,

Tim



Re: ConcurrentMergeScheduler options not exposed

2016-06-17 Thread Michael McCandless
Really we need the infoStream output, to see what IW is doing, to take so
long merging.

Likely only one merge thread is running (CMS tries to detect if your IO
system "spins" and if so, uses 1 merge thread) ... maybe try configuring
this to something higher since your RAID array can probably handle it?
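
(A sketch of what raising the merge thread count from solrconfig.xml can
look like -- standard indexConfig elements, but treat the exact values as
assumptions to verify against your version and hardware:)

  <indexConfig>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
      <int name="maxMergeCount">6</int>   <!-- merges that may be queued at once -->
      <int name="maxThreadCount">3</int>  <!-- merge threads running concurrently -->
    </mergeScheduler>
  </indexConfig>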

It's good that disabling auto IO throttling didn't change things ... that's
what I expected (since forced merges are not throttled by default).

Maybe capture all thread stacks and post back here?

Mike McCandless

http://blog.mikemccandless.com

On Thu, Jun 16, 2016 at 4:04 PM, Shawn Heisey  wrote:

> On 6/16/2016 2:35 AM, Michael McCandless wrote:
> >
> > Hmm, merging can't read at 800 MB/sec and only write at 20 MB/sec for
> > very long ... unless there is a huge percentage of deletes. Also, by
> > default CMS doesn't throttle forced merges (see
> > CMS.get/setForceMergeMBPerSec). Maybe capture
> > IndexWriter.setInfoStream output?
>
> I can see the problem myself.  I have a RAID10 array with six SATA
> disks.  When I click the Optimize button for a core that's several
> gigabytes, iotop shows me reads happening at about 100MB/s for several
> seconds, then writes clocking no more than 25 MB/s, and usually a lot
> less.  The last several gigabytes that were written were happening at
> less than 5 MB/s.  This is VERY slow, and does affect my nightly
> indexing processes.
>
> Asking the shell to copy a 5GB file revealed sustained write rates of
> over 500MB/s, so the hardware can definitely go faster.
>
> I patched in an option for solrconfig.xml where I could force it to call
> disableAutoIOThrottle().  I included logging in my patch to make
> absolutely sure that the new code was used.  This option made no
> difference in the write speed.  I also enabled infoStream, but either I
> configured it wrong or I do not know where to look for the messages.  I
> was modifying and compiling branch_5_5.
>
> This is the patch that I applied:
>
> http://apaste.info/wKG
>
> I did see the expected log entries in solr.log when I restarted with the
> patch and the new option in solrconfig.xml.
>
> What else can I look at?
>
> Thanks,
> Shawn
>
>


Accessing response docs in process method

2016-06-17 Thread Mark Robinson
Hi,

I would like to check the response for the *authors* data that comes in my
multiValued *authors* field and do some activity related to it before the
output is sent back.

I know how to access the facets and investigate them.

Could someone please advise (the APIs/methods etc.) on how I can get started
on this (accessing results in the *process* method)?
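
To make it concrete, this is the rough shape of what I'm hoping is possible
in a custom SearchComponent (only a sketch of my current understanding --
the "authors" field name is mine, and I'd like to confirm the
searcher/DocList calls are the right way in):

  import java.io.IOException;
  import org.apache.lucene.document.Document;
  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;
  import org.apache.solr.search.DocIterator;
  import org.apache.solr.search.DocList;
  import org.apache.solr.search.SolrIndexSearcher;

  public class AuthorsComponent extends SearchComponent {
    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // nothing to do before the query runs
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      SolrIndexSearcher searcher = rb.req.getSearcher();
      DocList docs = rb.getResults().docList;        // internal ids of the docs being returned
      DocIterator it = docs.iterator();
      while (it.hasNext()) {
        Document doc = searcher.doc(it.nextDoc());   // load stored fields for this hit
        String[] authors = doc.getValues("authors"); // multiValued stored field
        // ... inspect authors / adjust the response here ...
      }
    }

    @Override
    public String getDescription() {
      return "Inspects the authors field of returned documents";
    }
    // depending on the Solr version, further overrides may be required
  }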

Thanks!
Mark.


re: tlogs not deleting as usual in Solr 5.5.1?

2016-06-17 Thread Chris Morley
After some more searching, I found a thread online where Erick Erickson is 
telling someone about how there are old tlogs left around in case there is 
a need for a peer to sync even if SolrCloud is not enabled.  That makes 
sense, but we'll probably want to enable autoCommit and then trigger 
replication on the slaves when we know everything is committed after a full 
import.  (We disable polling.)
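
(For the record, the kind of configuration/call we have in mind -- the
numbers and host/core names are placeholders we would still need to tune:)

  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit at most every 60s, letting old tlogs be dropped -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher just for the commit -->
  </autoCommit>

  # after a full import completes, trigger replication on each slave explicitly
  # (we keep polling disabled)
  curl "http://slave-host:8983/solr/corename/replication?command=fetchindex"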
  
  
  


 From: "Chris Morley" 
Sent: Thursday, June 16, 2016 3:20 PM
To: "Solr Newsgroup" 
Subject: tlogs not deleting as usual in Solr 5.5.1?   
The repetition below is on purpose to show the contrast between solr
versions.

In Solr 4.10.3, we have autocommits disabled. We do a dataimport of a few
hundred thousand records and have a tlog that grows to ~1.2G.

In Solr 5.5.1, we have autocommits disabled. We do a dataimport of a few
hundred thousand records and have a tlog that grows to ~1.6G. (same exact
data, slightly larger tlog but who knows, that's fine)

In Solr 4.10.3 tlogs ARE deleted after issuing update?commit=true.
(And deleted immediately.)

In Solr 5.5.1 tlogs ARE NOT deleted after issuing update?commit=true.

We want the tlog to delete like it did in Solr 4.10.3. Perhaps there is a
configuration setting or feature of Solr 5.5.1 that causes this?

Would appreciate any tips on configuration or code we could change to
ensure the tlog will delete after a hard commit.

 



[ANNOUNCE] Apache Solr 6.1.0 released

2016-06-17 Thread Adrien Grand
17 June 2016, Apache Solr 6.1.0 available

Solr is the popular, blazing fast, open source NoSQL search platform from
the Apache Lucene project. Its major features include powerful full-text
search, hit highlighting, faceted search and analytics, rich document
parsing, geospatial search, extensive REST APIs as well as parallel SQL.
Solr is enterprise grade, secure and highly scalable, providing fault
tolerant distributed search and indexing, and powers the search and
navigation features of many of the world's largest internet sites.

Solr 6.1.0 is available for immediate download at:

 * http://lucene.apache.org/solr/mirrors-solr-latest-redir.html

Please read CHANGES.txt for a full list of new features and changes:

 * https://lucene.apache.org/solr/6_1_0/changes/Changes.html

Solr 6.1 Release Highlights:

 * Added graph traversal support, and new "sort" and "random" streaming
expressions. It's also now possible to create streaming expressions with
the Solr Admin UI.

 * Fixed the ENUM faceting method to not be unnecessarily rewritten to FCS,
which was causing slowdowns.

 * Reduced garbage creation when creating cache entries.

 * New [subquery] document transformer to obtain related documents per
result doc.

 * EmbeddedSolrServer allocates heap much more wisely, even with a plain
document list without callbacks.

 * New GeoJSON response writer for encoding geographic data in query
responses.

Further details of changes are available in the change log available at:
http://lucene.apache.org/solr/6_1_0/changes/Changes.html

Please report any feedback to the mailing lists (
http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring network
for distributing releases. It is possible that the mirror you are using may
not have replicated the release yet. If that is the case, please try
another mirror. This also applies to Maven access.

-- 
Adrien


Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Ere Maijala

On 17.6.2016 at 11.05, Bernd Fehling wrote:



On 17.06.2016 at 09:06, Ere Maijala wrote:

On 16.6.2016 at 1.41, Shawn Heisey wrote:

If you want to continue avoiding G1, you should definitely be using
CMS.  My recommendation right now would be to try the G1 settings on my
wiki page under the heading "Current experiments" or the CMS settings
just below that.


For what it's worth, we're currently running Shawn's G1 settings slightly 
modified for our workload on Java 1.8.0_91 25.91-b14:

GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=16m \
-XX:MaxGCPauseMillis=200 \
-XX:+UnlockExperimentalVMOptions \
-XX:G1NewSizePercent=3 \
-XX:ParallelGCThreads=12 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"


-XX:G1NewSizePercent

... Sets the percentage of the heap to use as the minimum for the young 
generation size.
The default value is 5 percent of your Java heap. ...

So you are reducing the young heap generation size to get a smoother running 
system.
This is strange, like reducing the bottle below the bottleneck.


True, but it works. Perhaps that's due to the default being too much 
with our heap size (> 10 GB). In any case, these settings allow us to 
run with average pause of <150ms and max pause of <2s while we 
previously struggled with pauses exceeding 20s at worst. All this was 
inspired by 
https://software.intel.com/en-us/blogs/2014/06/18/part-1-tuning-java-garbage-collection-for-hbase.


Regards,
Ere


Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Bernd Fehling


On 17.06.2016 at 09:06, Ere Maijala wrote:
> On 16.6.2016 at 1.41, Shawn Heisey wrote:
>> If you want to continue avoiding G1, you should definitely be using
>> CMS.  My recommendation right now would be to try the G1 settings on my
>> wiki page under the heading "Current experiments" or the CMS settings
>> just below that.
> 
> For what it's worth, we're currently running Shawn's G1 settings slightly 
> modified for our workload on Java 1.8.0_91 25.91-b14:
> 
> GC_TUNE=" \
> -XX:+UseG1GC \
> -XX:+ParallelRefProcEnabled \
> -XX:G1HeapRegionSize=16m \
> -XX:MaxGCPauseMillis=200 \
> -XX:+UnlockExperimentalVMOptions \
> -XX:G1NewSizePercent=3 \
> -XX:ParallelGCThreads=12 \
> -XX:+UseLargePages \
> -XX:+AggressiveOpts \
> "

-XX:G1NewSizePercent

... Sets the percentage of the heap to use as the minimum for the young 
generation size.
The default value is 5 percent of your Java heap. ...

So you are reducing the young heap generation size to get a smoother running 
system.
This is strange, like reducing the bottle below the bottleneck.

Just my 2 cents.

Regards
Bernd

> 
> It seems that our highly varying loads during day vs. night caused some 
> issues leading to long pauses until I added the G1NewSizePercent (which
> needs +UnlockExperimentalVMOptions). Things are running smoothly and there 
> are reports that the warnings regarding G1 with Lucene tests don't
> happen anymore with the newer Java versions, but it's of course up to you if 
> you're willing to take the chance.
> 
> Regards,
> Ere


Error when searching with special characters

2016-06-17 Thread Zheng Lin Edwin Yeo
Hi,

I encountered this error when I tried to search with special characters,
like "&" and "#".

{
  "responseHeader":{
"status":400,
"QTime":0},
  "error":{
"msg":"org.apache.solr.search.SyntaxError: Cannot parse
'\"Research ': Lexical error at line 1, column 11.  Encountered: 
after : \"\\\"Research \"",
"code":400}}


I have done the search by putting inverted commas, like: q="Research &
Development"

What could be the issue here?

I'm facing this problem in both Solr 5.4.0 and Solr 6.0.1.


Regards,
Edwin


Re: Long STW GCs with Solr Cloud

2016-06-17 Thread Ere Maijala

On 16.6.2016 at 1.41, Shawn Heisey wrote:

If you want to continue avoiding G1, you should definitely be using
CMS.  My recommendation right now would be to try the G1 settings on my
wiki page under the heading "Current experiments" or the CMS settings
just below that.


For what it's worth, we're currently running Shawn's G1 settings 
slightly modified for our workload on Java 1.8.0_91 25.91-b14:


GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=16m \
-XX:MaxGCPauseMillis=200 \
-XX:+UnlockExperimentalVMOptions \
-XX:G1NewSizePercent=3 \
-XX:ParallelGCThreads=12 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

It seems that our highly varying loads during day vs. night caused some 
issues leading to long pauses until I added the G1NewSizePercent (which 
needs +UnlockExperimentalVMOptions). Things are running smoothly and 
there are reports that the warnings regarding G1 with Lucene tests don't 
happen anymore with the newer Java versions, but it's of course up to 
you if you're willing to take the chance.


Regards,
Ere