Re: slow queries

2015-10-14 Thread Erick Erickson
A couple of things don't particularly make sense here:

You specify edismax, q=*:* yet you specify qf=
You're searching across whatever you defined as the default
field in the request handler. What do you see if you attach
&debug=true to the query?

I think this clause is wrong:
(cents_ri: [* 3000])

I think you mean
(cents_ri: [* TO 3000])

I'm not sure either of those is the problem, but those are the places I'd start.

As far as the size of your filter cache goes, a hit ratio of .87 actually
isn't bad. Upping the size would add some marginal benefit, but it's
unlikely to be a magic bullet.

But are these slow queries constant or intermittent? In other words,
are all queries of this general form slow or just the first few? In particular
is the first query that mentions sorting on this field slow but subsequent
ones faster? In that case consider adding a query to the newSearcher
event in solrconfig.xml that mentions this sort, that would pre-warm
the sort values. Also, defining all fields that you sort on as docValues="true"
is recommended at this point.
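For example, a newSearcher entry along these lines (a sketch; the sort field
is taken from your logged query, and rows=0 is just to avoid fetching docs):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="sort">view_counter_i desc</str>
      <str name="rows">0</str>
    </lst>
  </arr>
</listener>

That fires on every commit, so the first user query against the new searcher
no longer pays the cost of un-inverting the sort field.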

What I'd try is removing clauses to see which one is the problem. On
the surface this is surprisingly slow. And how heavily loaded is the server?
Your autocommit settings look fine, my question is more how much indexing
and querying is going on when you take these measurements.

Best,
Erick

On Wed, Oct 14, 2015 at 3:03 AM, Lorenzo Fundaró
 wrote:
> Hello,
>
> I have following conf for filters and commits :
>
> Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
> acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> </autoSoftCommit>
>
> and the following stats for filters:
>
> lookups = 3602
> hits  =  3148
> hit ratio = 0.87
> inserts = 455
> evictions = 400
> size = 63
> warmupTime = 770
>
> *Problem: *a lot of slow queries, for example:
>
> {q=*:*=1.0=edismax=standard=map==pk_i,score=0=view_counter_i
> desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50
> cache=true}in_languages_t:de={!cost=99
> cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
> (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378
>
> I could increase the size of the filter so I would decrease the amount of
> evictions, but it seems to me this would not be solving the root problem.
>
> Some ideas on where/how to start for optimisation ? Is it actually normal
> that this query takes this time ?
>
> We have an index of ~14 million docs. 4 replicas with two cores and 1 shard
> each.
>
> thank you.
>
>
> --
>
> --
> Lorenzo Fundaro
> Backend Engineer
> E-Mail: lorenzo.fund...@dawandamail.com
>
> Fax   + 49 - (0)30 - 25 76 08 52
> Tel+ 49 - (0)179 - 51 10 982
>
> DaWanda GmbH
> Windscheidstraße 18
> 10627 Berlin
>
> Geschäftsführer: Claudia Helming, Michael Pütz
> Amtsgericht Charlottenburg HRB 104695 B


Re: Replication and soft commits for NRT searches

2015-10-14 Thread Erick Erickson
bq: If a timeout between shard leader and replica can
lead to a smaller rf value (because replication has
timed out), is it possible to increase this timeout in the configuration?

Why do you care? If it timed out, then the follower will
no longer be active and will not serve queries. The Cloud view
should show it in "down", "recovery" or the like. Before it
goes back to the "active" state, it will synchronize from
the leader automatically without you having to do anything and
any docs that were indexed to the leader will be faithfully
reflected on the follower  _before_ the recovering
follower serves any new queries. So practically it makes no
difference whether there was an update timeout or not.

This is feeling a lot like an "XY" problem. You're asking detailed
questions about "X" (in this case timeouts, what rf means and the like)
without telling us what the problem you're concerned about is ("Y").

So please back up and tell us what your higher level concern is.
Do you have any evidence of Bad Things Happening?

And do, please, change your commit intervals to not happen after every
doc. That's a Really Bad Practice in Solr.
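Time-based intervals along these lines are the usual starting point (a
sketch; the values are illustrative, not a recommendation for your exact
case):

<autoCommit>
  <maxTime>15000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>10000</maxTime>
</autoSoftCommit>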

Best,
Erick

On Tue, Oct 13, 2015 at 11:58 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> thank you for the detailed answer.
>
> If a timeout between shard leader and replica can lead to a smaller rf value 
> (because replication has timed out), is it possible to increase this timeout 
> in the configuration?
>
> Best Regards,
> Martin Mois
>
> Comments inline:
>
> On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
>  wrote:
>> Hello,
>>
>> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
>> created with
> replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am 
> using autoCommit/maxDocs=1
> and autoSoftCommits/maxDocs=1 in order to achieve near realtime search 
> behavior.
>>
>> As far as I understand from section "Write Side Fault Tolerance" in the 
>> documentation
> (https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
>  I
> cannot enforce that an update gets replicated to all replicas, but I can only 
> get the achieved
> replication factor by requesting the return value rf.
>>
>> My question is now, what exactly does rf=2 mean? Does it only mean that the 
>> replica has
> written the update to its transaction log? Or has the replica also performed 
> the soft commit
> as configured with autoSoftCommits/maxDocs=1? The answer is important for me, 
> as if the update
> would only get written to the transaction log, I could not search for it 
> reliable, as the
> replica may not have added it to the searchable index.
>
> rf=2 means that the update was successfully replicated to and
> acknowledged by two replicas (including the leader). The rf only deals
> with the durability of the update and has no relation to visibility of
> the update to searchers. The auto(soft)commit settings are applied
> asynchronously and do not block an update request.
>
>>
>> My second question is, does rf=1 mean that the update was definitely not 
>> successful on
> the replica or could it also represent a timeout of the replication request 
> from the shard
> leader? If it could also represent a timeout, then there would be a small 
> chance that the
> replication was successfully despite of the timeout.
>
> Well, rf=1 implies that the update was only applied on the leader's
> index + tlog and either replicas weren't available or returned an
> error or the request timed out. So yes, you are right that it can
> represent a timeout and as such there is a chance that the replication
> was indeed successful despite of the timeout.
>
>>
>> Is there a way to retrieve the replication factor for a specific document 
>> after the update
> in order to check if replication was successful in the meantime?
>>
>
> No, there is no way to do that.
>
>> Thanks in advance.
>>
>> Best Regards,
>> Martin Mois
>> #
>> " This e-mail and any attached documents may contain confidential or 
>> proprietary information.
> If you are not the intended recipient, you are notified that any 
> dissemination, copying of
> this e-mail and any attachments thereto or use of their contents by any means 
> whatsoever is
> strictly prohibited. If you have received this e-mail in error, please advise 
> the sender immediately
> and delete this e-mail and all attached documents from your computer system."
>> #
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>
> #
> " This e-mail and any attached documents may contain confidential or 
> proprietary information. If you are not the intended recipient, you are 
> notified that any dissemination, copying of this e-mail and any attachments 
> thereto or use of their contents by any means whatsoever is strictly 
> prohibited. If you have received this e-mail in error, please advise the 
> sender immediately and delete this e-mail and all attached documents from 
> 

Bioinformatics search event in Cambridge UK Feb 3rd & 4th 2016

2015-10-14 Thread Charlie Hull

Hi all,

We're helping to run an event in Cambridge UK next year which will be an 
open workshop on search for bioinformatics:

http://www.ebi.ac.uk/pdbe/about/events/open-source-search-bioinformatics
Do please spread the word to anyone working with biological data and 
open source search! It's linked to our project BioSolr which is 
developing Solr features for bioinformaticians such as ontology 
indexers, JOINs with external data and faceting improvements (although 
we're hoping they're also of general use).


Cheers

Charlie
--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Run Solr 5.3.0 as a Service on Windows using NSSM

2015-10-14 Thread Anders Thulin
Did you add the -f param for running it in the foreground?
I noticed that the Solr service was restarted indefinitely when running it
as a background service.
It's also needed to stop the Windows service.

This test worked well here (on Windows 2012):

REM Test for running solr 5.3.1 as a windows service
C:\nssm\nssm64.exe install "Solr 5.3.1" C:\search\solr-5.3.1\bin\solr.cmd
"start -f -p 8983"

On 8 October 2015 at 04:34, Zheng Lin Edwin Yeo 
wrote:

> Hi Adrian and Upayavira,
>
> It works fine when I start Solr outside NSSM.
> As for the NSSM, so far I haven't tried the automatic startup yet. I start
> the services for ZooKeeper and Solr in NSSM manually from the Windows
> Component Services, so the ZooKeeper will have been started before I start
> Solr.
>
> I'll also try to write the script for Solr that can check it can access
> Zookeeper before attempting to start Solr.
>
> Regards,
> Edwin
>
>
> On 7 October 2015 at 19:16, Upayavira  wrote:
>
> > Wrap your script that starts Solr with one that checks it can access
> > Zookeeper before attempting to start Solr, that way, once ZK starts,
> > Solr will come up. Then, hand *that* script to NSSM.
> >
> > And finally, when one of you has got a setup that works with NSSM
> > starting Solr via the default bin\solr.cmd script, create a patch and
> > upload it to JIRA. It would be a valuable thing for Solr to have a
> > *standard* way to start Solr on Windows as a service. I recall checking
> > the NSSM license and it wouldn't be an issue to include it within Solr -
> > or to have a script that assumes it is installed.
> >
> > Upayavira
> >
> > On Wed, Oct 7, 2015, at 11:49 AM, Adrian Liew wrote:
> > > Hi Edwin,
> > >
> > > You may want to try exploring some of the configuration properties to
> > > configure in zookeeper.
> > >
> > >
> >
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkMulitServerSetup
> > >
> > > My recommendation is to try running your batch files outside of NSSM so it
> is
> > > easier to debug and observe what you see from the command window. I
> don't
> > > think ZK and Solr can be automated on startup well using NSSM due to
> the
> > > fact that ZK services need to be running before you start up Solr
> > > services. I just had a conversation with Shawn on this topic. NSSM cannot
> > > do the magic startup in a cluster setup. Given that, you may need to write
> > > custom scripting to get it right.
> > >
> > > Back to your original issue, I guess it is worth exploring timeout
> > > values. Then again, I will leave the real Solr experts to chip in their
> > > thoughts.
> > >
> > > Best regards,
> > >
> > > Adrian Liew
> > >
> > >
> > > -Original Message-
> > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> > > Sent: Wednesday, October 7, 2015 1:40 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM
> > >
> > > Hi Adrian,
> > >
> > > I've waited for more than 5 minutes and most of the time when I refresh
> > > it says that the page cannot be found. Once or twice the main Admin
> > > page loaded, but none of the cores were loaded.
> > >
> > > I have 20 cores which I'm loading. The cores are of various sizes, but
> > > the maximum one is 38GB. Others range from 10GB to 15GB, and there're some
> > > which are less than 1GB.
> > >
> > > My overall core size is about 200GB.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 7 October 2015 at 12:11, Adrian Liew 
> wrote:
> > >
> > > > Hi Edwin,
> > > >
> > > > I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up Solr
> > > > with a base standalone installation.
> > > >
> > > > You may have to give Solr some time to bootstrap things and wait for
> > > > the page to reload. Are you still seeing the page after 1 minute or
> so?
> > > >
> > > > What are your core sizes? And how many cores are you trying to load?
> > > >
> > > > Best regards,
> > > > Adrian
> > > >
> > > > -Original Message-
> > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> > > > Sent: Wednesday, October 7, 2015 11:46 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Run Solr 5.3.0 as a Service on Windows using NSSM
> > > >
> > > > Hi,
> > > >
> > > > I tried to follow this to start my Solr as a service using NSSM.
> > > > http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/
> > > >
> > > > Everything is fine when I start the services under Component
> Services.
> > > > However, when I tried to point to the Solr Admin page, it says that
> > > > the page cannot be found.
> > > >
> > > > I have tried the same thing in Solr 5.1, and it was able to work. Not
> > > > sure why it couldn't work for Solr 5.2 and Solr 5.3.
> > > >
> > > > Is there any changes required to what is listed on the website?
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> >
>



-- 
Kind Regards / Med vänlig hälsning



*Anders Thulin*

Founder, CTO


Re: slow queries

2015-10-14 Thread Pushkar Raste
Consider:
1. Turning on docValues for fields you are sorting/faceting on. This will
require you to reindex your data.
2. Trying a TrieInt type for the field you are doing range searches on (you
may have to fiddle with precisionStep) to balance index size vs performance.
3. If the slowness is intermittent, turning on GC logging to see if there are
any long pauses, and tuning your GC strategy accordingly.
(A sketch of the schema changes for points 1 and 2 follows.)
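Roughly, for points 1 and 2 (a sketch; the field names come from the logged
query, but precisionStep and the other attributes are assumptions):

<fieldType name="tint" class="solr.TrieIntField" precisionStep="8"
           positionIncrementGap="0"/>

<!-- range-searched field: smaller precisionStep = faster ranges, bigger index -->
<field name="cents_ri" type="tint" indexed="true" stored="false"/>

<!-- sort field: docValues avoids un-inverting on the first sorted query -->
<field name="view_counter_i" type="tint" indexed="true" stored="false"
       docValues="true"/>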

-- Pushkar Raste

On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró <
lorenzo.fund...@dawandamail.com> wrote:

> Hello,
>
> I have following conf for filters and commits :
>
> Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
> acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> </autoSoftCommit>
>
> and the following stats for filters:
>
> lookups = 3602
> hits  =  3148
> hit ratio = 0.87
> inserts = 455
> evictions = 400
> size = 63
> warmupTime = 770
>
> *Problem: *a lot of slow queries, for example:
>
> {q=*:*=1.0=edismax=standard
> =map==pk_i,​score=0=view_counter_i
> desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50
> cache=true}in_languages_t:de={!cost=99
> cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
> (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378
>
> I could increase the size of the filter so I would decrease the amount of
> evictions, but it seems to me this would not be solving the root problem.
>
> Some ideas on where/how to start for optimisation ? Is it actually normal
> that this query takes this time ?
>
> We have an index of ~14 million docs. 4 replicas with two cores and 1 shard
> each.
>
> thank you.
>
>
> --
>
> --
> Lorenzo Fundaro
> Backend Engineer
> E-Mail: lorenzo.fund...@dawandamail.com
>
> Fax   + 49 - (0)30 - 25 76 08 52
> Tel+ 49 - (0)179 - 51 10 982
>
> DaWanda GmbH
> Windscheidstraße 18
> 10627 Berlin
>
> Geschäftsführer: Claudia Helming, Michael Pütz
> Amtsgericht Charlottenburg HRB 104695 B
>


Can I use tokenizer twice ?

2015-10-14 Thread vit
I have Solr 4.2
I need to do the following:

1. white space tokenize
2. create shingles
3. use EdgeNGramFilter on each word in the shingles, but not on a shingle as a
single string

So can I do this?

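The straightforward chain would be (a sketch; the shingle and gram sizes are
placeholders):

<fieldType name="text_shingle_gram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3"
            outputUnigrams="true"/>
    <!-- note: placed after shingling, this grams each shingle as one string,
         spaces included, not each word inside the shingle as step 3 asks -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="15"/>
  </analyzer>
</fieldType>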




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: slow queries

2015-10-14 Thread Susheel Kumar
Hi Lorenzo,

Can you provide which Solr version you are using, index size on disk, and
hardware config (memory/processor) on each machine?

Thanks,
Susheel

On Wed, Oct 14, 2015 at 6:03 AM, Lorenzo Fundaró <
lorenzo.fund...@dawandamail.com> wrote:

> Hello,
>
> I have following conf for filters and commits :
>
> Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
> acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
> regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)
>
> <autoCommit>
>   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>   <openSearcher>false</openSearcher>
> </autoCommit>
>
> <autoSoftCommit>
>   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> </autoSoftCommit>
>
> and the following stats for filters:
>
> lookups = 3602
> hits  =  3148
> hit ratio = 0.87
> inserts = 455
> evictions = 400
> size = 63
> warmupTime = 770
>
> *Problem: *a lot of slow queries, for example:
>
> {q=*:*=1.0=edismax=standard
> =map==pk_i,​score=0=view_counter_i
> desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50
> cache=true}in_languages_t:de={!cost=99
> cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
> (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378
>
> I could increase the size of the filter so I would decrease the amount of
> evictions, but it seems to me this would not be solving the root problem.
>
> Some ideas on where/how to start for optimisation ? Is it actually normal
> that this query takes this time ?
>
> We have an index of ~14 million docs. 4 replicas with two cores and 1 shard
> each.
>
> thank you.
>
>
> --
>
> --
> Lorenzo Fundaro
> Backend Engineer
> E-Mail: lorenzo.fund...@dawandamail.com
>
> Fax   + 49 - (0)30 - 25 76 08 52
> Tel+ 49 - (0)179 - 51 10 982
>
> DaWanda GmbH
> Windscheidstraße 18
> 10627 Berlin
>
> Geschäftsführer: Claudia Helming, Michael Pütz
> Amtsgericht Charlottenburg HRB 104695 B
>


RE: How to formulate query

2015-10-14 Thread Prasanna S. Dhakephalkar
Hi Susheel, Mikhail, Erick,

Thanks for replies.

I need to learn more.

Regards,

Prasanna.

-Original Message-
From: Susheel Kumar [mailto:susheel2...@gmail.com] 
Sent: Tuesday, October 13, 2015 12:54 AM
To: solr-user@lucene.apache.org
Subject: Re: How to formulate query

Hi Prasanna, this is a highly custom relevancy/ordering requirement, and one
possible way you can try is creating multiple fields, coming up with a query
for each of the searches, and boosting them accordingly, along the lines of
the sketch below.
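For example (a sketch; the field names and boosts are placeholders, each qf
field being a differently-analyzed copy of the search field: first token
only, edge-ngrammed first token, plain tokenized, edge-ngrammed, substring):

q=pit&defType=edismax
  &qf=name_first^100 name_first_prefix^50 name_word^20 name_prefix^10 name_substring^1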

Thnx

On Mon, Oct 12, 2015 at 12:50 PM, Erick Erickson 
wrote:

> Nothing exists currently that would do this. I would urge you to
> revisit the requirements; this kind of super-specific ordering is
> often not worth the effort to try to enforce. How does the _user_
> benefit here?
>
> Best,
> Erick
>
> On Mon, Oct 12, 2015 at 12:47 AM, Prasanna S. Dhakephalkar 
>  wrote:
> > Hi,
> >
> >
> >
> > I am trying to make a solr search query to get result as under I am
> unable
> > to get do
> >
> >
> >
> > I have a search term say "pit"
> >
> > The result should have (in that order)
> >
> >
> >
> > All docs that have "pit" as first WORD in search field  (pit\ *)+
> >
> > All docs that have first WORD that starts with "pit"  (pit*\  *)+
> >
> > All docs that have "pit" as WORD anywhere in search field  (except 
> > first) (*\ pit\ *)+
> >
> > All docs that have  a WORD starting with "pit" anywhere in search 
> > field (except first) (*\ pit*\ *)+
> >
> > All docs that have "pit" as string anywhere in the search field 
> > except
> cases
> > covered above (*pit*)
> >
> >
> >
> > Example :
> >
> >
> >
> > Pit the pat
> >
> > Pit digger
> >
> > Pitch ball
> >
> > Pitcher man
> >
> > Dig a pit with shovel
> >
> > Why do you want to dig a pit with shovel
> >
> > Cricket pitch is 22 yards
> >
> > What is pithy, I don't know
> >
> > Per capita income
> >
> > Epitome of blah blah
> >
> >
> >
> >
> >
> > How can I achieve this ?
> >
> >
> >
> > Regards,
> >
> >
> >
> > Prasanna.
> >
> >
> >
>



Re: are there any SolrCloud supervisors?

2015-10-14 Thread Jeff Wartes

I’m aware of two public administration tools:
This was announced to the list just recently:
https://github.com/bloomreach/solrcloud-haft
And I’ve been working in this:
https://github.com/whitepages/solrcloud_manager

Both of these hook the Solrcloud client’s ZK access to inspect the cluster
state and execute more complex cluster-aware operations. I was also a bit
amused, because it looks like we both independently arrived at the same
replication-handler-based copy-collection operation. (Which suggests to me
that the functionality should be pushed into the collections API.)

Neither of these is a supervisor though, they merely provide a way to
execute cluster aware commands. Another monitor-oriented mechanism would
be needed to detect when to perform those commands, and I’ve not seen
anything existing along those lines.



On 10/13/15, 5:35 AM, "Susheel Kumar"  wrote:

>Sounds interesting...
>
>On Tue, Oct 13, 2015 at 12:58 AM, Trey Grainger 
>wrote:
>
>> I'd be very interested in taking a look if you post the code.
>>
>> Trey Grainger
>> Co-Author, Solr in Action
>> Director of Engineering, Search & Recommendations @ CareerBuilder
>>
>> On Fri, Oct 2, 2015 at 3:09 PM, r b  wrote:
>>
>> > I've been working on something that just monitors ZooKeeper to add and
>> > remove nodes from collections. the use case being I put SolrCloud in
>> > an autoscaling group on EC2 and as instances go up and down, I need
>> > them added to the collection. It's something I've built for work and
>> > could clean up to share on GitHub if there is much interest.
>> >
>> > I asked in the IRC about a SolrCloud supervisor utility but wanted to
>> > extend that question to this list. are there any more "full featured"
>> > supervisors out there?
>> >
>> >
>> > -renning
>> >
>>



partial search EdgeNGramFilterFactory

2015-10-14 Thread Brian Narsi
I have the following fieldtype in my schema:

<!-- chain reconstructed; apart from positionIncrementGap="100" and
     maxGramSize="25", the tokenizer/filter choices are assumptions -->
<fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and the following field:


With the following data:
 SellerName:CARDINAL HEALTH

When I do the following search

q:SellerName:cardinal

I get back the results with SellerName: CARDINAL HEALTH (correct)

or I do the search

q:SellerName:cardinal he

I get back the results with SellerName: CARDINAL HEALTH (correct)

But when I do the search

q:SellerName:cardinal hea

I am getting the results back with SellerName:INTEGRA RADIONICS

Why is that?

I need it to continue to return the correct results with CARDINAL HEALTH.
How do I make that happen?

Thanks in advance,


Re: slow queries

2015-10-14 Thread Pushkar Raste
You may want to start solr with following settings to enable logging GC
details. Here are some flags you might want to enable.

-Xloggc:<path>/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCTimeStamps
-XX:+PrintTenuringDistribution
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintHeapAtGC

Once you have GC logs, look for string "Total time for which application
threads were stopped" to check if you have long pauses (you may get long
pauses even with young generation GC).
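With the bin/solr script these can be passed through the -a option, e.g. (the
log path here is just an example):

bin/solr start -a "-Xloggc:/var/solr/logs/gc.log -XX:+PrintGCDetails \
  -XX:+PrintGCDateStamps -XX:+PrintGCTimeStamps -XX:+PrintTenuringDistribution \
  -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC"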

-- Pushkar Raste

On Wed, Oct 14, 2015 at 11:47 AM, Lorenzo Fundaró <
lorenzo.fund...@dawandamail.com> wrote:

> <<What do you see if you attach &debug=true to the query?>>
>
> "debug": { "rawquerystring": "*:*", "querystring": "*:*", "parsedquery":
> "(+MatchAllDocsQuery(*:*))/no_coord", "parsedquery_toString": "+*:*", "
> explain": { "Product:47047358": "\n1.0 = (MATCH) MatchAllDocsQuery, product
> of:\n 1.0 = queryNorm\n", "Product:3223": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:30852121":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:35018929": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:31682082": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:31077677": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:22298365":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:41094514": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:13106166": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:19142249": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38243373":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:20434065": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:25194801": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:885482": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:45356790":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:67719831": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:12843394": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:38126213": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38798130":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:30292169": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:11535854": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:8443674": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:51012182":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:75780871": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:20227881": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:38093629": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3142218":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:15295602": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:3375982": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:38276777": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:10726118":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:50827742": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n", "Product:5771722": "\n1.0 = (MATCH) MatchAllDocsQuery,
> product of:\n 1.0 = queryNorm\n", "Product:3245678": "\n1.0 = (MATCH)
> MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:13702130":
> "\n1.0
> = (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
> Product:25679953": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
> queryNorm\n" }, "QParser": "ExtendedDismaxQParser", "altquerystring": null,
> "boost_queries": null, "parsed_boost_queries": [], "boostfuncs": null, "
> filter_queries": [ "{!cost=1 cache=true}type_s:Product AND
> is_valid_b:true",
> "{!cost=50 cache=true}in_languages_t:de", "{!cost=99
> cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
> (cents_ri: [* TO 3000])" ], "parsed_filter_queries": [ "+type_s:Product
> +is_valid_b:true", "in_languages_t:de", "{!cache=false
> cost=99}+(shipping_country_codes_mt:de shipping_country_codes_mt:euro
> shipping_country_codes_mt:eur shipping_country_codes_mt:all) +cents_ri:[*
> TO 3000]" ], "timing": { "time": 18, "prepare": { "time": 0, "query": { "
> time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { "
> time": 0 }, "stats": { "time": 0 }, "expand": { "time": 0 }, "spellcheck":
> {
> 

Re: slow queries

2015-10-14 Thread Lorenzo Fundaró
On 14 October 2015 at 18:18, Pushkar Raste  wrote:

> Consider
> 1. Turning on docValues for fields you are sorting, faceting on. This will
> require to reindex your data
>

Yes. I am considering doing this.


> 2. Try using TrieInt type field you are trying to do range search on (you
> may have to fiddle with precisionStep) to balance index size vs
> performance.
>

Ok.


> 3. If slowness is intermittent - turn on GC logging and see if there are
> any long pauses and tune GC strategy accordingly.
>

The GC strategy is the default that comes when starting Solr with the bin/solr
start script. I was looking at the GC logs, and saw no Full GC at all.

Thank you !



>
> -- Pushkar Raste
>
> On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró <
> lorenzo.fund...@dawandamail.com> wrote:
>
> > Hello,
> >
> > I have following conf for filters and commits :
> >
> > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
> > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
> > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)
> >
> > <autoCommit>
> >   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
> >   <openSearcher>false</openSearcher>
> > </autoCommit>
> >
> > <autoSoftCommit>
> >   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
> > </autoSoftCommit>
> >
> > and the following stats for filters:
> >
> > lookups = 3602
> > hits  =  3148
> > hit ratio = 0.87
> > inserts = 455
> > evictions = 400
> > size = 63
> > warmupTime = 770
> >
> > *Problem: *a lot of slow queries, for example:
> >
> > {q=*:*=1.0=edismax=standard
> > =map==pk_i,​score=0=view_counter_i
> > desc={!cost=1 cache=true}type_s:Product AND
> is_valid_b:true={!cost=50
> > cache=true}in_languages_t:de={!cost=99
> > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
> > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378
> >
> > I could increase the size of the filter so I would decrease the amount of
> > evictions, but it seems to me this would not be solving the root problem.
> >
> > Some ideas on where/how to start for optimisation ? Is it actually normal
> > that this query takes this time ?
> >
> > We have an index of ~14 million docs. 4 replicas with two cores and 1
> shard
> > each.
> >
> > thank you.
> >
> >
> > --
> >
> > --
> > Lorenzo Fundaro
> > Backend Engineer
> > E-Mail: lorenzo.fund...@dawandamail.com
> >
> > Fax   + 49 - (0)30 - 25 76 08 52
> > Tel+ 49 - (0)179 - 51 10 982
> >
> > DaWanda GmbH
> > Windscheidstraße 18
> > 10627 Berlin
> >
> > Geschäftsführer: Claudia Helming, Michael Pütz
> > Amtsgericht Charlottenburg HRB 104695 B
> >
>



-- 

-- 
Lorenzo Fundaro
Backend Engineer
E-Mail: lorenzo.fund...@dawandamail.com

Fax   + 49 - (0)30 - 25 76 08 52
Tel+ 49 - (0)179 - 51 10 982

DaWanda GmbH
Windscheidstraße 18
10627 Berlin

Geschäftsführer: Claudia Helming, Michael Pütz
Amtsgericht Charlottenburg HRB 104695 B


Re: slow queries

2015-10-14 Thread Lorenzo Fundaró
<<What do you see if you attach &debug=true to the query?>>

"debug": { "rawquerystring": "*:*", "querystring": "*:*", "parsedquery":
"(+MatchAllDocsQuery(*:*))/no_coord", "parsedquery_toString": "+*:*", "
explain": { "Product:47047358": "\n1.0 = (MATCH) MatchAllDocsQuery, product
of:\n 1.0 = queryNorm\n", "Product:3223": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:30852121": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:35018929": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:31682082": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:31077677": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:22298365": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:41094514": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:13106166": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:19142249": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38243373": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:20434065": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:25194801": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:885482": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:45356790": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:67719831": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:12843394": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:38126213": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:38798130": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:30292169": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:11535854": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:8443674": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:51012182": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:75780871": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:20227881": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:38093629": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:3142218": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:15295602": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:3375982": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:38276777": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:10726118": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:50827742": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n", "Product:5771722": "\n1.0 = (MATCH) MatchAllDocsQuery,
product of:\n 1.0 = queryNorm\n", "Product:3245678": "\n1.0 = (MATCH)
MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "Product:13702130": "\n1.0
= (MATCH) MatchAllDocsQuery, product of:\n 1.0 = queryNorm\n", "
Product:25679953": "\n1.0 = (MATCH) MatchAllDocsQuery, product of:\n 1.0 =
queryNorm\n" }, "QParser": "ExtendedDismaxQParser", "altquerystring": null,
"boost_queries": null, "parsed_boost_queries": [], "boostfuncs": null, "
filter_queries": [ "{!cost=1 cache=true}type_s:Product AND is_valid_b:true",
"{!cost=50 cache=true}in_languages_t:de", "{!cost=99
cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
(cents_ri: [* TO 3000])" ], "parsed_filter_queries": [ "+type_s:Product
+is_valid_b:true", "in_languages_t:de", "{!cache=false
cost=99}+(shipping_country_codes_mt:de shipping_country_codes_mt:euro
shipping_country_codes_mt:eur shipping_country_codes_mt:all) +cents_ri:[*
TO 3000]" ], "timing": { "time": 18, "prepare": { "time": 0, "query": { "
time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { "
time": 0 }, "stats": { "time": 0 }, "expand": { "time": 0 }, "spellcheck": {
"time": 0 }, "debug": { "time": 0 } }, "process": { "time": 18, "query": { "
time": 0 }, "facet": { "time": 0 }, "mlt": { "time": 0 }, "highlight": { "
time": 0 }, "stats": { "time": 0 }, "expand": { "time": 0 }, "spellcheck": {
"time": 0 }, "debug": { "time": 18 } }

<<I think this clause is wrong:
(cents_ri: [* 3000])

I think you mean
(cents_ri: [* TO 3000])>>

I think it made no difference. I tried both and they both worked.

<<But are these slow queries constant or intermittent?>>

They are definitely cached. The second time runs in no time.

I'm gonna try adding them to the pre-warm cache too, and see the results.

The field that I used for sorting is indexed but not stored and it's not a
DocValue. I tried the query without the sort and the performance didn't

Re: slow queries

2015-10-14 Thread Erick Erickson
bq: They are definitely cached. The second time runs in no time.

That's not what I was referring to. Submitting the same query
over will certainly hit the queryResultCache and return in
almost no time.

What I meant was do things like vary the fq clause you have
where you've set cache=false. Or vary the parameters in the fq clauses.
The point is to only take measurements after enough queries have gone
through so you're sure the low-level caches are initialized. But the
queries all have to be different or you hit the queryResultCache.
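For instance, vary the range endpoint between requests (the values here are
arbitrary) so each query misses the queryResultCache but still exercises the
lower-level caches:

fq={!cost=99 cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND (cents_ri: [* TO 2500])
fq={!cost=99 cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND (cents_ri: [* TO 3500])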

 Best,
Erick

On Wed, Oct 14, 2015 at 9:50 AM, Lorenzo Fundaró
 wrote:
> On 14 October 2015 at 18:18, Pushkar Raste  wrote:
>
>> Consider
>> 1. Turning on docValues for fields you are sorting, faceting on. This will
>> require to reindex your data
>>
>
> Yes. I am considering doing this.
>
>
>> 2. Try using TrieInt type field you are trying to do range search on (you
>> may have to fiddle with precisionStep) to balance index size vs
>> performance.
>>
>
> Ok.
>
>
>> 3. If slowness is intermittent - turn on GC logging and see if there are
>> any long pauses and tune GC strategy accordingly.
>>
>
> The Gc strategy is the default that comes when starting solr with bin/solr
> start script. And I was looking at the GC logs, and saw no Full GC at all.
>
> Thank you !
>
>
>
>>
>> -- Pushkar Raste
>>
>> On Wed, Oct 14, 2015 at 5:03 AM, Lorenzo Fundaró <
>> lorenzo.fund...@dawandamail.com> wrote:
>>
>> > Hello,
>> >
>> > I have following conf for filters and commits :
>> >
>> > Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
>> > acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
>> > regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)
>> >
>> > <autoCommit>
>> >   <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
>> >   <openSearcher>false</openSearcher>
>> > </autoCommit>
>> >
>> > <autoSoftCommit>
>> >   <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
>> > </autoSoftCommit>
>> >
>> > and the following stats for filters:
>> >
>> > lookups = 3602
>> > hits  =  3148
>> > hit ratio = 0.87
>> > inserts = 455
>> > evictions = 400
>> > size = 63
>> > warmupTime = 770
>> >
>> > *Problem: *a lot of slow queries, for example:
>> >
>> > {q=*:*=1.0=edismax=standard
>> > =map==pk_i,score=0=view_counter_i
>> > desc={!cost=1 cache=true}type_s:Product AND
>> is_valid_b:true={!cost=50
>> > cache=true}in_languages_t:de={!cost=99
>> > cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
>> > (cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378
>> >
>> > I could increase the size of the filter so I would decrease the amount of
>> > evictions, but it seems to me this would not be solving the root problem.
>> >
>> > Some ideas on where/how to start for optimisation ? Is it actually normal
>> > that this query takes this time ?
>> >
>> > We have an index of ~14 million docs. 4 replicas with two cores and 1
>> shard
>> > each.
>> >
>> > thank you.
>> >
>> >
>> > --
>> >
>> > --
>> > Lorenzo Fundaro
>> > Backend Engineer
>> > E-Mail: lorenzo.fund...@dawandamail.com
>> >
>> > Fax   + 49 - (0)30 - 25 76 08 52
>> > Tel+ 49 - (0)179 - 51 10 982
>> >
>> > DaWanda GmbH
>> > Windscheidstraße 18
>> > 10627 Berlin
>> >
>> > Geschäftsführer: Claudia Helming, Michael Pütz
>> > Amtsgericht Charlottenburg HRB 104695 B
>> >
>>
>
>
>
> --
>
> --
> Lorenzo Fundaro
> Backend Engineer
> E-Mail: lorenzo.fund...@dawandamail.com
>
> Fax   + 49 - (0)30 - 25 76 08 52
> Tel+ 49 - (0)179 - 51 10 982
>
> DaWanda GmbH
> Windscheidstraße 18
> 10627 Berlin
>
> Geschäftsführer: Claudia Helming, Michael Pütz
> Amtsgericht Charlottenburg HRB 104695 B


Re: partial search EdgeNGramFilterFactory

2015-10-14 Thread Erick Erickson
try adding &debug=true to your query. The query
q=SellerName:cardinal he
actually parses as
q=SellerName:cardinal defaultSearchField:he

so I suspect you're getting matches on the default search field.

I'm not sure EdgeNGram is what you want here though.
That only grams individual tokens, so CARDINAL is grammed
totally separately from HEALTH. You might consider
a different tokenizer, say KeywordTokenizer and LowerCaseFilter
followed by edgeNGram to treat the whole thing as a unit. You'd have
to take some care to make sure you escaped spaces to get
the whole thing through the query parser though.
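A sketch of that alternative (the type name and gram sizes are assumptions):

<fieldType name="seller_name_edge" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

with the space escaped at query time, e.g. q=SellerName:cardinal\ hea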

Best,
Erick

On Wed, Oct 14, 2015 at 11:03 AM, Brian Narsi  wrote:
> I have the following fieldtype in my schema:
>
> <!-- chain reconstructed; apart from positionIncrementGap="100" and
>      maxGramSize="25", the tokenizer/filter choices are assumptions -->
> <fieldType name="text_edge" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
>
> and the following field:
>
> <field name="SellerName" type="text_edge" indexed="true" stored="true"
>        required="true" multiValued="false" />  <!-- type/indexed/stored assumed -->
>
> With the following data:
>  SellerName:CARDINAL HEALTH
>
> When I do the following search
>
> q:SellerName:cardinal
>
> I get back the results with SellerName: CARDINAL HEALTH (correct)
>
> or I do the search
>
> q:SellerName:cardinal he
>
> I get back the results with SellerName: CARDINAL HEALTH (correct)
>
> But when I do the search
>
> q:SellerName:cardinal hea
>
> I am getting the results back with SellerName:INTEGRA RADIONICS
>
> Why is that?
>
> I need it to continue to return the correct results with CARDINAL HEALTH.
> How do I make that happen?
>
> Thanks in advance,


Re: Replication and soft commits for NRT searches

2015-10-14 Thread MOIS Martin (MORPHO)
Hello,

thank you for the detailed answer.

If a timeout between shard leader and replica can lead to a smaller rf value 
(because replication has timed out), is it possible to increase this timeout in 
the configuration?

Best Regards,
Martin Mois

Comments inline:

On Mon, Oct 12, 2015 at 1:31 PM, MOIS Martin (MORPHO)
 wrote:
> Hello,
>
> I am running Solr 5.2.1 in a cluster with 6 nodes. My collections have been 
> created with
replicationFactor=2, i.e. I have one replica for each shard. Beyond that I am 
using autoCommit/maxDocs=1
and autoSoftCommits/maxDocs=1 in order to achieve near realtime search behavior.
>
> As far as I understand from section "Write Side Fault Tolerance" in the 
> documentation
(https://cwiki.apache.org/confluence/display/solr/Read+and+Write+Side+Fault+Tolerance),
 I
cannot enforce that an update gets replicated to all replicas, but I can only 
get the achieved
replication factor by requesting the return value rf.
>
> My question is now, what exactly does rf=2 mean? Does it only mean that the 
> replica has
written the update to its transaction log? Or has the replica also performed 
the soft commit
as configured with autoSoftCommits/maxDocs=1? The answer is important for me, 
as if the update
would only get written to the transaction log, I could not search for it 
reliable, as the
replica may not have added it to the searchable index.

rf=2 means that the update was successfully replicated to and
acknowledged by two replicas (including the leader). The rf only deals
with the durability of the update and has no relation to visibility of
the update to searchers. The auto(soft)commit settings are applied
asynchronously and do not block an update request.

>
> My second question is, does rf=1 mean that the update was definitely not 
> successful on
the replica or could it also represent a timeout of the replication request 
from the shard
leader? If it could also represent a timeout, then there would be a small 
chance that the
replication was successfully despite of the timeout.

Well, rf=1 implies that the update was only applied on the leader's
index + tlog and either replicas weren't available or returned an
error or the request timed out. So yes, you are right that it can
represent a timeout and as such there is a chance that the replication
was indeed successful despite of the timeout.

>
> Is there a way to retrieve the replication factor for a specific document 
> after the update
in order to check if replication was successful in the meantime?
>

No, there is no way to do that.

> Thanks in advance.
>
> Best Regards,
> Martin Mois
> #
> " This e-mail and any attached documents may contain confidential or 
> proprietary information.
If you are not the intended recipient, you are notified that any dissemination, 
copying of
this e-mail and any attachments thereto or use of their contents by any means 
whatsoever is
strictly prohibited. If you have received this e-mail in error, please advise 
the sender immediately
and delete this e-mail and all attached documents from your computer system."
> #



--
Regards,
Shalin Shekhar Mangar.

#
" This e-mail and any attached documents may contain confidential or 
proprietary information. If you are not the intended recipient, you are 
notified that any dissemination, copying of this e-mail and any attachments 
thereto or use of their contents by any means whatsoever is strictly 
prohibited. If you have received this e-mail in error, please advise the sender 
immediately and delete this e-mail and all attached documents from your 
computer system."
#


Re: Request for Wiki edit right

2015-10-14 Thread Arcadius Ahouansou
Thank you very much Erick.

Arcadius.

On 13 October 2015 at 22:04, Erick Erickson  wrote:

> Just added you to the Solr Wiki contributors group, if you need to
> access the Lucene Wiki let us know.
>
> Best,
> Erick
>
> On Tue, Oct 13, 2015 at 1:57 PM, Arcadius Ahouansou
>  wrote:
> > Hello Erick.
> > Thank you for the detailed info.
> > My username is arcadius.
> >
> > Thanks.
> >
> >
> > On 13 October 2015 at 16:58, Erick Erickson 
> wrote:
> >
> >> Create a user on the Wiki (anyone can), then tell us the user name
> >> you've created and we'll add you to the auth lists. There are separate
> >> lists for Solr and Lucene. We had to lock these down because we were
> >> getting a lot of spam pages created.
> >>
> >> The reference guide (CWiki) is restricted to committers though.
> >>
> >> Best,
> >> Erick
> >>
> >> On Tue, Oct 13, 2015 at 6:30 AM, Arcadius Ahouansou
> >>  wrote:
> >> > Hello.
> >> >
> >> > Please, can I have the right to edit the Wiki?
> >> >
> >> > Thanks.
> >> >
> >> > Arcadius.
> >>
> >
> >
> >
> > --
> > Arcadius Ahouansou
> > Menelic Ltd | Information is Power
> > M: 07908761999
> > W: www.menelic.com
> > ---
>



-- 
Arcadius Ahouansou
Menelic Ltd | Information is Power
M: 07908761999
W: www.menelic.com
---


slow queries

2015-10-14 Thread Lorenzo Fundaró
Hello,

I have following conf for filters and commits :

Concurrent LFU Cache(maxSize=64, initialSize=64, minSize=57,
acceptableSize=60, cleanupThread=false, timeDecay=true, autowarmCount=8,
regenerator=org.apache.solr.search.SolrIndexSearcher$2@169ee0fd)

<autoCommit>
  <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>${solr.autoSoftCommit.maxTime:60}</maxTime>
</autoSoftCommit>

and the following stats for filters:

lookups = 3602
hits  =  3148
hit ratio = 0.87
inserts = 455
evictions = 400
size = 63
warmupTime = 770

*Problem: *a lot of slow queries, for example:

{q=*:*=1.0=edismax=standard=map==pk_i,​score=0=view_counter_i
desc={!cost=1 cache=true}type_s:Product AND is_valid_b:true={!cost=50
cache=true}in_languages_t:de={!cost=99
cache=false}(shipping_country_codes_mt: (DE OR EURO OR EUR OR ALL)) AND
(cents_ri: [* 3000])=36=json} hits=3768003 status=0 QTime=1378

I could increase the size of the filter so I would decrease the amount of
evictions, but it seems to me this would not be solving the root problem.

Some ideas on where/how to start for optimisation ? Is it actually normal
that this query takes this time ?

We have an index of ~14 million docs. 4 replicas with two cores and 1 shard
each.

thank you.


-- 

-- 
Lorenzo Fundaro
Backend Engineer
E-Mail: lorenzo.fund...@dawandamail.com

Fax   + 49 - (0)30 - 25 76 08 52
Tel+ 49 - (0)179 - 51 10 982

DaWanda GmbH
Windscheidstraße 18
10627 Berlin

Geschäftsführer: Claudia Helming, Michael Pütz
Amtsgericht Charlottenburg HRB 104695 B


Re: Using SimpleNaiveBayesClassifier in solr

2015-10-14 Thread Yewint Ko
Thank Ales and Tommaso for your replies

So, is it that the classifier queries the whole index and loads it into memory
first, before running the tokenizer against the InputDocument? It sounds like,
if I don't close the classifier and my index is big, I might need a bigger
machine. Any way to reverse the order? Do I sound dumb?

On 12 October 2015 at 16:11, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Hi Yewint,
> >
> > The sample test code inside seems like that classifier read the whole
> index
> > db to train the model everytime when classification happened for
> > inputDocument. or am I misunderstanding something here?
>
>
> I would suggest you to take a look to a couple of articles I wrote last
> summer about the Classification in Lucene and Solr :
>
>
> http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html
>
>
> http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html
>
> Basically your misunderstanding is that this module works as a standard
> classifier, which is not the case.
> Lucene Classification doesn't train a model over time; the Index is your
> model.
> It uses the Index data structures to perform the classification processes
> (Knn and Simple Bayes are the algorithms I explored at that time) .
> Basically the algorithms access to Term Frequencies and Document
> Frequencies stored in the Inverted index.
>
> Having a big Index will have an effect, as of course we are querying the
> index, but not because we are building a model.
>
> +1 on all Tommaso's observations!
>
> Cheers
>
>
>
> On 10 October 2015 at 20:36, Yewint Ko  wrote:
>
> > Hi
> >
> > I am trying to use SimpleNaiveBayesClassifier in my solr project.
> Currently
> > looking at its test base ClassificationTestBase.java.
> >
> > The sample test code inside seems like that classifier read the whole
> index
> > db to train the model everytime when classification happened for
> > inputDocument. or am I misunderstanding something here? If i had a large
> > index db, will it impact performance?
> >
> > protected void checkCorrectClassification(Classifier<T> classifier, String
> > inputDoc, T expectedResult, Analyzer analyzer, String textFieldName, String
> > classFieldName, Query query) throws Exception {
> >
> > AtomicReader atomicReader = null;
> >
> > try {
> >
> >   populateSampleIndex(analyzer);
> >
> >   atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter
> > .getReader());
> >
> >   classifier.train(atomicReader, textFieldName, classFieldName,
> > analyzer,
> > query);
> >
> >   ClassificationResult<T> classificationResult =
> > classifier.assignClass(inputDoc);
> >
> >   assertNotNull(classificationResult.getAssignedClass());
> >
> >   assertEquals("got an assigned class of " +
> > classificationResult.getAssignedClass(),
> > expectedResult, classificationResult.getAssignedClass());
> >
> >   assertTrue("got a not positive score " +
> > classificationResult.getScore(),
> > classificationResult.getScore() > 0);
> >
> > } finally {
> >
> >   if (atomicReader != null)
> >
> > atomicReader.close();
> >
> > }
> >
> >   }
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Grouping facets: Possible to get facet results for each Group?

2015-10-14 Thread Alessandro Benedetti
Mmm, let's say that nested facets are a subset of Pivot Facets:
while pivot faceting works with the classic flat document structure, sub-facets
work with any nested structure.
So be careful about pivot faceting on a flat document with multi-valued
fields, because you lose the relation across the different fields' values.
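With the subfacet syntax from Yonik's article that looks roughly like this
(the field names are placeholders):

json.facet={
  groups : {
    type  : terms,
    field : group_field,
    facet : {
      values : { type : terms, field : value_field }
    }
  }
}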

Cheers

On 13 October 2015 at 18:06, Peter Sturge  wrote:

> Hi,
> Thanks for your response.
> I did have a look at pivots, and they could work in a way. We're still on
> Solr 4.3, so I'll have to wait for sub-facets - but they sure look pretty
> cool!
> Peter
>
>
> On Tue, Oct 13, 2015 at 12:30 PM, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Can you model your business domain with Solr nested Docs ? In the case
> you
> > can use Yonik article about nested facets.
> >
> > Cheers
> >
> > On 13 October 2015 at 05:05, Alexandre Rafalovitch 
> > wrote:
> >
> > > Could you use the new nested facets syntax?
> > > http://yonik.com/solr-subfacets/
> > >
> > > Regards,
> > >Alex.
> > > 
> > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > http://www.solr-start.com/
> > >
> > > On 11 October 2015 at 09:51, Peter Sturge 
> > wrote:
> > > > Been trying to coerce Group faceting to give some faceting back for
> > each
> > > > group, but maybe this use case isn't catered for in Grouping? :
> > > >
> > > > So the Use Case is this:
> > > > Let's say I do a grouped search that returns say, 9 distinct groups,
> > and
> > > in
> > > > these groups are various numbers of unique field values that need
> > > faceting
> > > > - but the faceting needs to be within each group:
> > >
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card - http://about.me/alessandro_benedetti
> > Blog - http://alexbenedetti.blogspot.co.uk
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: AutoComplete Feature in Solr

2015-10-14 Thread Alessandro Benedetti
Using the suggester feature you can in some cases rank the suggestions based
on an additional numeric field.
That's not quite your use case, though: you actually want a search handler
with a well-defined schema that will allow you, for example, to query on an
edge-ngram token-filtered field while applying a geo-distance boost function.

This is what I would use, and it would work fine with your applied filter
queries as well (reducing the space of suggestions); a sketch follows.
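A sketch of such a request (the field names name_edge and store, and the
boost shape, are assumptions):

http://localhost:8983/solr/collection/select?defType=edismax
    &q=name_edge:barack
    &sfield=store&pt=39.74,-104.99
    &boost=recip(geodist(),2,200,20)
    &fl=name,score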

Cheers

On 14 October 2015 at 05:09, William Bell  wrote:

> We want to use suggester but also want to show those results closest to my
> lat,long... Kinda combine suggester and bq=geodist()
>
> On Mon, Oct 12, 2015 at 2:24 PM, Salman Ansari 
> wrote:
>
> > Hi,
> >
> > I have been trying to get the autocomplete feature in Solr working with
> no
> > luck up to now. First I read that "suggest component" is the recommended
> > way as in the below article (and this is the exact functionality I am
> > looking for, which is to autocomplete multiple words)
> >
> >
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> >
> > Then I tried implementing suggest as described in the following articles
> in
> > this order
> > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> > implemented suggesting phrases)
> > 3)
> >
> >
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
> >
> > With no luck, after implementing each article when I run my query as
> > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
> >
> >
> >
> > I get
> >
> > <response>
> >   <lst name="responseHeader">
> >     <int name="status">0</int>
> >     <int name="QTime">0</int>
> >   </lst>
> > </response>
> >
> >  Although I have an entry for Barack Obama in my index. I am posting my
> > Solr configuration as well
> >
> > <searchComponent name="suggest" class="solr.SpellCheckComponent">
> >   <lst name="spellchecker">
> >     <str name="name">suggest</str>
> >     <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> >     <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
> >     <str name="field">entity_autocomplete</str>
> >     <str name="buildOnCommit">true</str>
> >   </lst>
> > </searchComponent>
> >
> > <requestHandler name="/suggest"
> > class="org.apache.solr.handler.component.SearchHandler">
> >   <lst name="defaults">
> >     <str name="spellcheck">true</str>
> >     <str name="spellcheck.dictionary">suggest</str>
> >     <str name="spellcheck.count">10</str>
> >     <str name="spellcheck.collate">true</str>
> >     <str name="spellcheck.onlyMorePopular">false</str>
> >   </lst>
> >   <arr name="components">
> >     <str>suggest</str>
> >   </arr>
> > </requestHandler>
> >
> > It looks like a very simple job, but even after following so many
> articles,
> > I could not get it right. Any comment will be appreciated!
> >
> > Regards,
> > Salman
> >
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: catchall fields or multiple fields

2015-10-14 Thread elisabeth benoit
Thanks for your suggestion Jack. In fact we're doing geographic search
(fields are country, state, county, town, hamlet, district)

So it's difficult to split.

Best regards,
Elisabeth

2015-10-13 16:01 GMT+02:00 Jack Krupansky :

> Performing a sequence of queries can help too. For example, if users
> commonly search for a product name, you could do an initial query on just
> the product name field which should be much faster than searching the text
> of all product descriptions, and highlighting would be less problematic. If
> that initial query comes up empty, then you could move on to the next
> highest most likely field, maybe product title (short one line
> description), and query voluminous fields like detailed product
> descriptions, specifications, and user comments/reviews only as a last
> resort.
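> 
> For example (field names made up): first try q=product_name:(widget); if
> numFound is 0, retry with q=title:(widget); and only fall back to
> q=description_text:(widget) when both come up empty.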
>
> -- Jack Krupansky
>
> On Tue, Oct 13, 2015 at 6:17 AM, elisabeth benoit <
> elisaelisael...@gmail.com
> > wrote:
>
> > Thanks to you all for those informed advices.
> >
> > Thanks Trey for your very detailed point of view. This is now very clear
> to
> > me how a search on multiple fields can grow slower than a search on a
> > catchall field.
> >
> > Our actual search model is problematic: we search on a catchall field,
> but
> > need to know which fields match, so we do highlighting on multi fields
> (not
> > indexed, but stored). To improve performance, we want to get rid of
> > highlighting and use the solr explain output. To get the explain output
> on
> > those fields, we need to do a search on those fields.
> >
> > So I guess we have to test whether removing highlighting and adding
> > multi-field search will improve performance or not.
> >
> > Best regards,
> > Elisabeth
> >
> >
> >
> > 2015-10-12 17:55 GMT+02:00 Jack Krupansky :
> >
> > > I think it may all depend on the nature of your application and how
> much
> > > commonality there is between fields.
> > >
> > > One interesting area is auto-suggest, where you can certainly suggest
> > from
> > > the union of all fields, you may want to give priority to suggestions
> > from
> > > preferred fields. For example, for actual product names or important
> > > keywords rather than random words from the English language that happen
> > to
> > > occur in descriptions, all of which would occur in a catchall.
> > >
> > > -- Jack Krupansky
> > >
> > > On Mon, Oct 12, 2015 at 8:39 AM, elisabeth benoit <
> > > elisaelisael...@gmail.com
> > > > wrote:
> > >
> > > > Hello,
> > > >
> > > > We're using solr 4.10 and storing all data in a catchall field. It
> > seems
> > > to
> > > > me that one good reason for using a catchall field is when using
> > scoring
> > > > with idf (with idf, a word might not have same score in all fields).
> We
> > > got
> > > > rid of idf and are now considering using multiple fields. I remember
> > > > reading somewhere that using a catchall field might speed up
> searching
> > > > time. I was wondering if some of you have any opinion (or experience)
> > > > related to this subject.
> > > >
> > > > Best regards,
> > > > Elisabeth
> > > >
> > >
> >
>


Re: Solr Pagination

2015-10-14 Thread Jan Høydahl
I have not benchmarked various numbers of segments at different sizes
on different HW etc., so my hunch could very well be wrong for Salman’s case.
I don’t know how frequently his data is updated, either.

Have you done #segments benchmarking for your huge datasets?
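
(For anyone who wants to test this: forcing a merge down to a single segment
is just an update request, e.g.
http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=1
with collection1 as a placeholder core name.)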

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 12. okt. 2015 kl. 12.56 skrev Toke Eskildsen :
> 
> On Mon, 2015-10-12 at 10:05 +0200, Jan Høydahl wrote:
>> What you do when you call optimize is to force Lucene to merge all
>> those 35M docs into ONE SINGLE index segment. You get better HW
>> utilization if you let Lucene/Solr automatically handle merging,
>> meaning you’ll have around 10 smaller segments that are faster to
>> search across than one huge segment.
> 
> As individual Lucene/Solr shard searches are very much single threaded,
> the single segment version should be faster. Have you observed
> otherwise?
> 
> 
> Optimization is a fine feature if one's workflow is batch-oriented with
> sufficiently long pauses between index updates. Nightly index updates
> with few active users at that time could be an example.
> 
> - Toke Eskildsen, State and University Library, Denmark
> 
> 



Re: Using SimpleNaiveBayesClassifier in solr

2015-10-14 Thread Alessandro Benedetti
Ahahah, absolutely not, you don't sound dumb.

You only need a basic knowledge of how Lucene manages IndexReaders and
IndexSearchers.

On 14 October 2015 at 09:08, Yewint Ko  wrote:

> Thank Ales and Tommaso for your replies
>
> So, is it that the classifier queries the whole index db and loads it into
> memory first, before running the tokenizer against the input document?


For durability, your index is flushed to disk on every hard commit, so
it will be physically present as a set of disk files (each file corresponding
to a specific data structure in the index) in your data directory.
From there, Lucene models the data directory depending on the
Directory implementation in use.
In Solr, for example, the default Lucene index Directory implementation is:

NRTCachingDirectoryFactory.

This Directory implementation is based on the OS memory-mapping feature,
optimized to manage near-real-time caching of small files for NRT search
systems.
This means that Lucene will leverage the OS memory-map implementation
(using the memory available to the OS).
Ideally, if your RAM allows it, you are going to see the entire
index in memory and searches will be really fast.
If the whole index doesn't fit, only portions of it will be held in
memory, some I/O will happen, and performance will degrade.

Hope this clarifies your first doubt. The index is not required to be
loaded into memory immediately; files will be cached in memory over
time, during the life of your system.
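
For reference, this is roughly the relevant bit of a stock solrconfig.xml
(the exact default can differ between Solr versions):

<directoryFactory name="DirectoryFactory"
                  class="${solr.directoryFactory:solr.NRTCachingDirectoryFactory}"/>

Under the hood, NRTCachingDirectory wraps an MMapDirectory on 64-bit
platforms, which is where the OS memory mapping comes in.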



> It sounds like if I
> don't close the classifier and my index is big, I might need a bigger
> machine. Any way to reverse the order? Do I sound dumb?
>

I would not be too worried about this; the memory-mapping management will
be quite efficient. Just focus on implementing your functionality and
prototype to measure the performance. If you don't meet your expectations,
you can work on the bottlenecks: if disk I/O turns out to be one, for
instance, you could switch to an SSD, etc.
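
Just to recap the shape of the API from the articles above, a minimal Lucene
4.x sketch ("text" and "category" are made-up field names; atomicReader and
analyzer come from your own index, as in the test code quoted below):

// the index itself acts as the "model": no separate training artifact is built
SimpleNaiveBayesClassifier classifier = new SimpleNaiveBayesClassifier();
classifier.train(atomicReader, "text", "category", analyzer);
ClassificationResult<BytesRef> result = classifier.assignClass("text of the new document");
String assignedClass = result.getAssignedClass().utf8ToString();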

Cheers


>
> On 12 October 2015 at 16:11, Alessandro Benedetti <
> benedetti.ale...@gmail.com> wrote:
>
> > Hi Yewint,
> > >
> > > The sample test code inside seems like that classifier read the whole
> > index
> > > db to train the model everytime when classification happened for
> > > inputDocument. or am I misunderstanding something here?
> >
> >
> > I would suggest you to take a look to a couple of articles I wrote last
> > summer about the Classification in Lucene and Solr :
> >
> >
> >
> http://alexbenedetti.blogspot.co.uk/2015/07/lucene-document-classification.html
> >
> >
> >
> http://alexbenedetti.blogspot.co.uk/2015/07/solr-document-classification-part-1.html
> >
> > Basically, your misunderstanding is that this module works as a standard
> > classifier, which is not the case.
> > Lucene classification doesn't train a model over time; the index is your
> > model.
> > It uses the index data structures to perform the classification processes
> > (kNN and Simple Bayes are the algorithms I explored at that time).
> > Basically, the algorithms access the term frequencies and document
> > frequencies stored in the inverted index.
> >
> > Having a big index will have an effect because we are, of course, querying
> > the index, but not because we are building a model.
> >
> > +1 on all Tommaso's observations!
> >
> > Cheers
> >
> >
> >
> > On 10 October 2015 at 20:36, Yewint Ko  wrote:
> >
> > > Hi
> > >
> > > I am trying to use SimpleNaiveBayesClassifier in my solr project.
> > Currently
> > > looking at its test base ClassificationTestBase.java.
> > >
> > > The sample test code inside seems like that classifier read the whole
> > index
> > > db to train the model everytime when classification happened for
> > > inputDocument. or am I misunderstanding something here? If i had a
> large
> > > index db, will it impact performance?
> > >
> > > protected void checkCorrectClassification(Classifier<T> classifier,
> > >     String inputDoc, T expectedResult, Analyzer analyzer,
> > >     String textFieldName, String classFieldName, Query query) throws Exception {
> > >
> > >   AtomicReader atomicReader = null;
> > >
> > >   try {
> > >
> > >     populateSampleIndex(analyzer);
> > >
> > >     atomicReader = SlowCompositeReaderWrapper.wrap(indexWriter.getReader());
> > >
> > >     classifier.train(atomicReader, textFieldName, classFieldName, analyzer, query);
> > >
> > >     ClassificationResult<T> classificationResult = classifier.assignClass(inputDoc);
> > >
> > >     assertNotNull(classificationResult.getAssignedClass());
> > >
> > >     assertEquals("got an assigned class of " + classificationResult.getAssignedClass(),
> > >         expectedResult, classificationResult.getAssignedClass());
> > >
> > >     assertTrue("got a not positive score " + classificationResult.getScore(),
> > >         classificationResult.getScore() > 0);
> > >
> > >   } 

Re: Can I use tokenizer twice ?

2015-10-14 Thread Steve Rowe
Hi,

Analyzers must have exactly one tokenizer, no more and no less.

You could achieve what you want by copying to another field and defining a 
separate analyzer for each.  One would create shingles, and the other edge 
ngrams.  
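
A rough sketch of what I mean, reusing your filter settings (all field and
type names here are made up):

<field name="text" type="text_general" indexed="true" stored="true"/>
<field name="text_shingles" type="shingle_text" indexed="true" stored="false"/>
<field name="text_edge" type="edge_text" indexed="true" stored="false"/>
<copyField source="text" dest="text_shingles"/>
<copyField source="text" dest="text_edge"/>

<fieldType name="shingle_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
  </analyzer>
</fieldType>

<fieldType name="edge_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
  </analyzer>
</fieldType>

Each analyzer keeps its single tokenizer, and each copy of the data gets one
of the two filter chains.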

Steve

> On Oct 14, 2015, at 11:58 AM, vit  wrote:
> 
> I have Solr 4.2
> I need to do the following:
> 
> 1. white space tokenize
> 2. create shingles
> 3. use EdgeNGramFilter for each word in shingles, but not in a shingle as a
> string
> 
> So can I do this?
> 
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
>   outputUnigrams="false" outputUnigramsIfNoShingles="true"/>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
> <filter class="solr.EdgeNGramFilterFactory" maxGramSize="25"/>
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438.html
> Sent from the Solr - User mailing list archive at Nabble.com.



Re: AutoComplete Feature in Solr

2015-10-14 Thread Salman Ansari
Actually, what you mentioned, Alessandro, is something interesting for me. I
am looking to boost the ranking of some suggestions based on some dynamic
criteria (let's say how frequently they are used). Do I need to update the
boost field each time I request the suggestion (to capture the frequency)?
If you can direct me to an article that explains this with some scenarios
of using boost, that would be appreciated.
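
(From what I've read so far, the newer SuggestComponent seems to let a
document field act as the suggestion weight, something like the sketch below,
where popularity_l is a hypothetical numeric field I would have to keep
updated myself. I'm not sure whether that fits frequently changing weights:)

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">mySuggester</str>
    <str name="lookupImpl">FuzzyLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">entity_autocomplete</str>
    <str name="weightField">popularity_l</str>
    <str name="suggestAnalyzerFieldType">string</str>
  </lst>
</searchComponent>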

Regards,
Salman


On Wed, Oct 14, 2015 at 11:49 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> Using the suggester feature you can, in some cases, rank the suggestions based
> on an additional numeric field.
> That's not your use case, though: you actually want to use a search handler
> with a well-defined schema that will allow you, for example, to query on an
> edge-ngram token-filtered field, applying a geo distance boost function.
>
> This is what I would use, and it would work fine with your applied filter
> queries as well (reducing the space of suggestions).
>
> Cheers
>
> On 14 October 2015 at 05:09, William Bell  wrote:
>
> > We want to use suggester but also want to show those results closest to
> my
> > lat,long... Kinda combine suggester and bq=geodist()
> >
> > On Mon, Oct 12, 2015 at 2:24 PM, Salman Ansari 
> > wrote:
> >
> > > Hi,
> > >
> > > I have been trying to get the autocomplete feature in Solr working with
> > no
> > > luck up to now. First I read that "suggest component" is the
> recommended
> > > way as in the below article (and this is the exact functionality I am
> > > looking for, which is to autocomplete multiple words)
> > >
> > >
> >
> http://blog.trifork.com/2012/02/15/different-ways-to-make-auto-suggestions-with-solr/
> > >
> > > Then I tried implementing suggest as described in the following
> articles
> > in
> > > this order
> > > 1) https://wiki.apache.org/solr/Suggester#SearchHandler_configuration
> > > 2) http://solr.pl/en/2010/11/15/solr-and-autocomplete-part-2/  (I
> > > implemented suggesting phrases)
> > > 3)
> > >
> > >
> >
> http://stackoverflow.com/questions/18132819/how-to-have-solr-autocomplete-on-whole-phrase-when-query-contains-multiple-terms
> > >
> > > With no luck, after implementing each article when I run my query as
> > > http://[MySolr]:8983/solr/entityStore114/suggest?spellcheck.q=Barack
> > >
> > >
> > >
> > > I get
> > > <response>
> > > <lst name="responseHeader">
> > > <int name="status">0</int>
> > > <int name="QTime">0</int>
> > > </lst>
> > > </response>
> > >
> > >  Although I have an entry for Barack Obama in my index. I am posting my
> > > Solr configuration as well
> > >
> > > <searchComponent name="suggest" class="solr.SpellCheckComponent">
> > >  <lst name="spellchecker">
> > >   <str name="name">suggest</str>
> > >   <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
> > >   <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FSTLookup</str>
> > >   <str name="field">entity_autocomplete</str>
> > >   <str name="buildOnCommit">true</str>
> > >  </lst>
> > > </searchComponent>
> > >
> > > <requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
> > >  <lst name="defaults">
> > >   <str name="spellcheck">true</str>
> > >   <str name="spellcheck.dictionary">suggest</str>
> > >   <str name="spellcheck.count">10</str>
> > >   <str name="spellcheck.collate">true</str>
> > >   <str name="spellcheck.onlyMorePopular">false</str>
> > >  </lst>
> > >  <arr name="components">
> > >   <str>suggest</str>
> > >  </arr>
> > > </requestHandler>
> > >
> > > It looks like a very simple job, but even after following so many
> > articles,
> > > I could not get it right. Any comment will be appreciated!
> > >
> > > Regards,
> > > Salman
> > >
> >
> >
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Grouping facets: Possible to get facet results for each Group?

2015-10-14 Thread Peter Sturge
Yes, you are right about that - I've used pivots before and they do need to
be used judiciously.
Fortunately, we only ever use single-value fields, as that gives some good
advantages in a heavily sharded environment.
Our document structure is, by its very nature, always flat, so it could be
an impediment to nested facets, but I don't know enough about them to know
for sure.
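
(For reference, the sort of request we experimented with looked roughly like
...&facet=true&facet.pivot=group_field,value_field
with our real field names in place of the made-up ones here.)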
Thanks,
Peter


On Wed, Oct 14, 2015 at 9:44 AM, Alessandro Benedetti <
benedetti.ale...@gmail.com> wrote:

> mmm, let's say that nested facets are a subset of pivot facets:
> while pivot faceting works with the classic flat document structure,
> sub-facets work with any nested structure.
> So be careful about pivot faceting on a flat document with multi-valued
> fields, because you lose the relation across the different fields' values.
>
> Cheers
>
> On 13 October 2015 at 18:06, Peter Sturge  wrote:
>
> > Hi,
> > Thanks for your response.
> > I did have a look at pivots, and they could work in a way. We're still on
> > Solr 4.3, so I'll have to wait for sub-facets - but they sure look pretty
> > cool!
> > Peter
> >
> >
> > On Tue, Oct 13, 2015 at 12:30 PM, Alessandro Benedetti <
> > benedetti.ale...@gmail.com> wrote:
> >
> > > Can you model your business domain with Solr nested docs? In that case
> > > you can use Yonik's article about nested facets.
> > >
> > > Cheers
> > >
> > > On 13 October 2015 at 05:05, Alexandre Rafalovitch  >
> > > wrote:
> > >
> > > > Could you use the new nested facets syntax?
> > > > http://yonik.com/solr-subfacets/
> > > >
> > > > Regards,
> > > >Alex.
> > > > 
> > > > Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> > > > http://www.solr-start.com/
> > > >
> > > > On 11 October 2015 at 09:51, Peter Sturge 
> > > wrote:
> > > > > Been trying to coerce Group faceting to give some faceting back for
> > > each
> > > > > group, but maybe this use case isn't catered for in Grouping? :
> > > > >
> > > > > So the Use Case is this:
> > > > > Let's say I do a grouped search that returns say, 9 distinct
> groups,
> > > and
> > > > in
> > > > > these groups are various numbers of unique field values that need
> > > > faceting
> > > > > - but the faceting needs to be within each group:
> > > >
> > >
> > >
> > >
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card - http://about.me/alessandro_benedetti
> > > Blog - http://alexbenedetti.blogspot.co.uk
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> > >
> >
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>


Re: Can I use tokenizer twice ?

2015-10-14 Thread vitaly bulgakov
Steve,
> You could achieve what you want by copying to another field and defining a
> separate analyzer for each. One would create shingles, and the other edge
> ngrams.

Could you please elaborate on this? I am not sure I understand how to do it
using copyField.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-I-use-tokenizer-twice-tp4234438p4234503.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Run Solr 5.3.0 as a Service on Windows using NSSM

2015-10-14 Thread Adrian Liew
Hi,

I am trying to implement some scripting that detects whether all the
ZooKeepers in a cluster have started, and then restarts the Solr servers. Has
anyone achieved this yet through scripting?
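
Something along these lines is the kind of wrapper I have in mind: a
completely untested sketch that just polls ZooKeeper's client port before
launching Solr. It assumes ZooKeeper on localhost:2181, Solr under
C:\search\solr-5.3.1, and PowerShell 4.0+ for Test-NetConnection.

REM wait-for-zk.cmd - block until ZooKeeper answers on its client port, then start Solr
:wait_zk
powershell -Command "if (Test-NetConnection -ComputerName localhost -Port 2181 -InformationLevel Quiet) { exit 0 } else { exit 1 }"
if errorlevel 1 (
    timeout /t 5 /nobreak >nul
    goto wait_zk
)
C:\search\solr-5.3.1\bin\solr.cmd start -f -p 8983 -z localhost:2181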

I also saw there is the ZookeeperClient available for .NET via a NuGet
package. Not sure whether this could also be used to check if a ZooKeeper
node is running.

Any thoughts?

Regards,
Adrian

-Original Message-
From: Anders Thulin [mailto:anders.thu...@comintelli.com] 
Sent: Wednesday, October 14, 2015 11:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM

Did you add the -f param to run it in the foreground?
I noticed that the Solr service was restarted indefinitely when running it as a
background service.
It's also needed so that the Windows service can be stopped.

This test worked well here (on Windows 2012):

REM Test for running solr 5.3.1 as a windows service
C:\nssm\nssm64.exe install "Solr 5.3.1" C:\search\solr-5.3.1\bin\solr.cmd "start -f -p 8983"

On 8 October 2015 at 04:34, Zheng Lin Edwin Yeo 
wrote:

> Hi Adrian and Upayavira,
>
> It works fine when I start Solr outside NSSM.
> As for the NSSM, so far I haven't tried the automatic startup yet. I 
> start the services for ZooKeeper and Solr in NSSM manually from the 
> Windows Component Services, so the ZooKeeper will have been started 
> before I start Solr.
>
> I'll also try to write the script for Solr that can check it can 
> access Zookeeper before attempting to start Solr.
>
> Regards,
> Edwin
>
>
> On 7 October 2015 at 19:16, Upayavira  wrote:
>
> > Wrap your script that starts Solr with one that checks it can access 
> > Zookeeper before attempting to start Solr, that way, once ZK starts, 
> > Solr will come up. Then, hand *that* script to NSSM.
> >
> > And finally, when one of you has got a setup that works with NSSM 
> > starting Solr via the default bin\solr.cmd script, create a patch 
> > and upload it to JIRA. It would be a valuable thing for Solr to have 
> > a
> > *standard* way to start Solr on Windows as a service. I recall 
> > checking the NSSM license and it wouldn't be an issue to include it 
> > within Solr - or to have a script that assumes it is installed.
> >
> > Upayavira
> >
> > On Wed, Oct 7, 2015, at 11:49 AM, Adrian Liew wrote:
> > > Hi Edwin,
> > >
> > > You may want to try explore some of the configuration properties 
> > > to configure in zookeeper.
> > >
> > >
> >
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkMulitS
> erverSetup
> > >
> > > My recommendation is to try running your batch files outside of NSSM so it
> > > is easier to debug and observe what you see from the command window. I don't
> > > think ZK and Solr can be automated well on startup using NSSM, due to the
> > > fact that the ZK services need to be running before you start up the Solr
> > > services. I just had a conversation with Shawn on this topic. NSSM cannot
> > > do the magic startup in a cluster setup. For that, you may need to write
> > > custom scripting to get it right.
> > >
> > > Back to your original issue, I guess it is worth exploring timeout 
> > > values. Then again, I will leave the real Solr experts to chip in 
> > > their thoughts.
> > >
> > > Best regards,
> > >
> > > Adrian Liew
> > >
> > >
> > > -Original Message-
> > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> > > Sent: Wednesday, October 7, 2015 1:40 PM
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM
> > >
> > > Hi Adrian,
> > >
> > > I've waited for more than 5 minutes, and most of the time when I
> > > refresh it says that the page cannot be found. Once or twice the
> > > main Admin page loaded, but none of the cores were loaded.
> > >
> > > I have 20 cores which I'm loading. The cores are of various sizes, but the
> > > largest one is 38GB. Others range from 10GB to 15GB, and there are
> > > some which are less than 1GB.
> > >
> > > My overall core size is about 200GB.
> > >
> > > Regards,
> > > Edwin
> > >
> > >
> > > On 7 October 2015 at 12:11, Adrian Liew 
> wrote:
> > >
> > > > Hi Edwin,
> > > >
> > > > I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up 
> > > > Solr with a base standalone installation.
> > > >
> > > > You may have to give Solr some time to bootstrap things and wait 
> > > > for the page to reload. Are you still seeing the page after 1 
> > > > minute or
> so?
> > > >
> > > > What are your core sizes? And how many cores are you trying to load?
> > > >
> > > > Best regards,
> > > > Adrian
> > > >
> > > > -Original Message-
> > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> > > > Sent: Wednesday, October 7, 2015 11:46 AM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Run Solr 5.3.0 as a Service on Windows using NSSM
> > > >
> > > > Hi,
> > > >
> > > > I tried to follow this to 

Re: Run Solr 5.3.0 as a Service on Windows using NSSM

2015-10-14 Thread Zheng Lin Edwin Yeo
Hi Anders,

Yes, I did put the -f param in to run it in the foreground.
I put start -f -p 8983 in the Arguments parameter in the NSSM service
installer.

Is that the correct place for Solr 5.3.0? I did it the same way for
Solr 5.1 and it was working then. I'm using Windows 8.1.

Regards,
Edwin


On 14 October 2015 at 23:44, Anders Thulin 
wrote:

> Did you add the -f param to run it in the foreground?
> I noticed that the Solr service was restarted indefinitely when running it
> as a background service.
> It's also needed so that the Windows service can be stopped.
>
> This test worked well here (on Windows 2012):
>
> REM Test for running solr 5.3.1 as a windows service
> C:\nssm\nssm64.exe install "Solr 5.3.1" C:\search\solr-5.3.1\bin\solr.cmd
> "start -f -p 8983"
>
> On 8 October 2015 at 04:34, Zheng Lin Edwin Yeo 
> wrote:
>
> > Hi Adrian and Upayavira,
> >
> > It works fine when I start Solr outside NSSM.
> > As for the NSSM, so far I haven't tried the automatic startup yet. I
> start
> > the services for ZooKeeper and Solr in NSSM manually from the Windows
> > Component Services, so the ZooKeeper will have been started before I
> start
> > Solr.
> >
> > I'll also try to write the script for Solr that can check it can access
> > Zookeeper before attempting to start Solr.
> >
> > Regards,
> > Edwin
> >
> >
> > On 7 October 2015 at 19:16, Upayavira  wrote:
> >
> > > Wrap your script that starts Solr with one that checks it can access
> > > Zookeeper before attempting to start Solr, that way, once ZK starts,
> > > Solr will come up. Then, hand *that* script to NSSM.
> > >
> > > And finally, when one of you has got a setup that works with NSSM
> > > starting Solr via the default bin\solr.cmd script, create a patch and
> > > upload it to JIRA. It would be a valuable thing for Solr to have a
> > > *standard* way to start Solr on Windows as a service. I recall checking
> > > the NSSM license and it wouldn't be an issue to include it within Solr
> -
> > > or to have a script that assumes it is installed.
> > >
> > > Upayavira
> > >
> > > On Wed, Oct 7, 2015, at 11:49 AM, Adrian Liew wrote:
> > > > Hi Edwin,
> > > >
> > > > You may want to try explore some of the configuration properties to
> > > > configure in zookeeper.
> > > >
> > > >
> > >
> >
> http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html#sc_zkMulitServerSetup
> > > >
> > > > My recommendation is to try running your batch files outside of NSSM so it
> > > > is easier to debug and observe what you see from the command window. I don't
> > > > think ZK and Solr can be automated well on startup using NSSM, due to the
> > > > fact that the ZK services need to be running before you start up the Solr
> > > > services. I just had a conversation with Shawn on this topic. NSSM cannot
> > > > do the magic startup in a cluster setup. For that, you may need to write
> > > > custom scripting to get it right.
> > > >
> > > > Back to your original issue, I guess it is worth exploring timeout
> > > > values. Then again, I will leave the real Solr experts to chip in
> their
> > > > thoughts.
> > > >
> > > > Best regards,
> > > >
> > > > Adrian Liew
> > > >
> > > >
> > > > -Original Message-
> > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> > > > Sent: Wednesday, October 7, 2015 1:40 PM
> > > > To: solr-user@lucene.apache.org
> > > > Subject: Re: Run Solr 5.3.0 as a Service on Windows using NSSM
> > > >
> > > > Hi Adrian,
> > > >
> > > > I've waited for more than 5 minutes, and most of the time when I refresh
> > > > it says that the page cannot be found. Once or twice the main Admin
> > > > page loaded, but none of the cores were loaded.
> > > >
> > > > I have 20 cores which I'm loading. The cores are of various sizes, but the
> > > > largest one is 38GB. Others range from 10GB to 15GB, and there are
> > > > some which are less than 1GB.
> > > >
> > > > My overall core size is about 200GB.
> > > >
> > > > Regards,
> > > > Edwin
> > > >
> > > >
> > > > On 7 October 2015 at 12:11, Adrian Liew 
> > wrote:
> > > >
> > > > > Hi Edwin,
> > > > >
> > > > > I have setup NSSM on Solr 5.3.0 in an Azure VM and can start up
> Solr
> > > > > with a base standalone installation.
> > > > >
> > > > > You may have to give Solr some time to bootstrap things and wait
> for
> > > > > the page to reload. Are you still seeing the page after 1 minute or
> > so?
> > > > >
> > > > > What are your core sizes? And how many cores are you trying to
> load?
> > > > >
> > > > > Best regards,
> > > > > Adrian
> > > > >
> > > > > -Original Message-
> > > > > From: Zheng Lin Edwin Yeo [mailto:edwinye...@gmail.com]
> > > > > Sent: Wednesday, October 7, 2015 11:46 AM
> > > > > To: solr-user@lucene.apache.org
> > > > > Subject: Run Solr 5.3.0 as a Service on Windows using NSSM
> > > > >
> > > > > Hi,
> > > > >
> > > > > I tried to follow this to start my Solr as a service 

Re: partial search EdgeNGramFilterFactory

2015-10-14 Thread Brian Narsi
Thank you, Erick. Yes, it was the default search field.

So for the following SellerName:

1) cardinal healthcare products
2) cardinal healthcare
3) postoperative cardinal healthcare
4) surgical cardinal products

My requirement is:
q=SellerName:cardinal - all 4 records returned
q=SellerName:healthcare - 1,2,3 returned
q=SellerName:surgical cardinal - 4 returned
q=SellerName:cardinal healthcare - 1,2,3 returned
q=SellerName:products - 1,4 returned
q=SellerName:car - nothing returned
q=SellerName:card - all 4 returned

How should I set up my fieldtype?
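
For what it's worth, below is the direction I was considering based on your
suggestion, but gramming individual words rather than the whole field via
KeywordTokenizer. It is completely untested; minGramSize="4" is picked only
so that "car" misses and "card" hits, and I suspect the multi-word cases
(e.g. "cardinal healthcare" excluding record 4) would additionally need
q.op=AND or phrase handling:

<fieldType name="text_word_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="4" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>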

Thanks


On Wed, Oct 14, 2015 at 1:14 PM, Erick Erickson 
wrote:

> try adding debug=true to your query. The query
> q=SellerName:cardinal he
> actually parses as
> q=SellerName:cardinal defaultSearchField:he
>
> so I suspect you're getting hits on the default search field.
>
> I'm not sure EdgeNGram is what you want here though.
> That only grams individual tokens, so CARDINAL is grammed
> totally separately from HEALTH. You might consider
> a different tokenizer, say KeywordTokenizer and LowerCaseFilter
> followed by edgeNGram to treat the whole thing as a unit. You'd have
> to take some care to make sure you escaped spaces to get
> the whole thing through the query parser though.
>
> Best,
> Erick
>
> On Wed, Oct 14, 2015 at 11:03 AM, Brian Narsi  wrote:
> > I have the following fieldtype in my schema:
> >
> > <fieldType name="..." class="solr.TextField" positionIncrementGap="100">
> >   <analyzer type="index">
> >     <tokenizer class="..."/>
> >     <filter class="solr.EdgeNGramFilterFactory" minGramSize="..." maxGramSize="25"/>
> >   </analyzer>
> >   <analyzer type="query">
> >     <tokenizer class="..."/>
> >   </analyzer>
> > </fieldType>
> >
> > and the following field:
> > <field name="SellerName" type="..." indexed="true" stored="true" required="true" multiValued="false" />
> >
> > With the following data:
> >  SellerName:CARDINAL HEALTH
> >
> > When I do the following search
> >
> > q=SellerName:cardinal
> >
> > I get back the results with SellerName: CARDINAL HEALTH (correct)
> >
> > or I do the search
> >
> > q=SellerName:cardinal he
> >
> > I get back the results with SellerName: CARDINAL HEALTH (correct)
> >
> > But when I do the search
> >
> > q=SellerName:cardinal hea
> >
> > I am getting the results back with SellerName:INTEGRA RADIONICS
> >
> > Why is that?
> >
> > I need it to continue to return the correct results with CARDINAL HEALTH.
> > How do I make that happen?
> >
> > Thanks in advance,
>