Facets for fields in subdocuments with block join, is it possible?
Hello, I'm testing block join in Solr 4.6.1 and wondering: is it possible to get facets for fields in subdocuments, with the number of hits based on ROOT documents? See the example below:

<doc>
  <documentPart>ROOT</documentPart>
  <text>testing 123</text>
  <title>title</title>
  <group>GRP</group>
  <subdocument>
    <field3>khat</field3>
    <field4>7000</field4>
    <field5>purchase</field5>
  </subdocument>
  <subdocument>
    <field3>cannabis</field3>
    <field4>500</field4>
    <field5>sale</field5>
  </subdocument>
</doc>

My query looks like this:

solrQuery.setQuery("text:testing");
solrQuery.setFilterQueries("{!parent which=\"documentPart:ROOT\"}field3:khat");
solrQuery.setFacet(true);
solrQuery.addFacetField("group", "field5");

This does not give me any facets for the subdocument fields, so I'm thinking: could a solution be to execute a second query to get the facets for the subdocuments, joining from parent to child with a {!child of=} query, like this:

solrQuery.setQuery("{!child of=\"documentPart:ROOT\"}text:testing");
solrQuery.setFilterQueries("field3:khat");
solrQuery.setFacet(true);
solrQuery.addFacetField("field5", "field4", "field3");

The problem with this method is that the facet counts will be based on subdocuments and not on ROOT/parent documents... Is there a silver bullet for this kind of requirement?

Yours faithfully, Henning Solberg
Re: Group.Facet issue in Sharded Solr Setup
Quick follow-up on my question below: is anyone using group.facet in a sharded Solr setup? Based on further testing, the group.facet counts don't seem reliable at all for the less popular items in the facet list. -- View this message in context: http://lucene.472066.n3.nabble.com/Group-Facet-issue-in-Sharded-Solr-Setup-tp4116077p4116635.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Facets for fields in subdocuments with block join, is it possible?
Hello Henning, There is no open-source facet component for the child level of block join. There isn't even an open JIRA for this, so I don't think that approach helps.

On 11.02.2014 12:22, Henning Ivan Solberg h...@lovdata.no wrote: Hello, I'm testing block join in Solr 4.6.1 and wondering: is it possible to get facets for fields in subdocuments, with the number of hits based on ROOT documents?
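Pending such a component, one way to picture the client-side workaround is to roll child-field values up to their parent and count each value at most once per parent. The sketch below is illustrative only (plain Java over an already-fetched parent-to-children map, not SolrJ API):

```java
import java.util.*;

// Sketch: parent-level counts for a child field. Each parent maps to the
// child-field values of its subdocuments; a value is counted once per parent,
// no matter how many children of that parent carry it.
public class ParentLevelFacets {
    static Map<String, Integer> parentCounts(Map<String, List<String>> childValuesByParent) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> childValues : childValuesByParent.values()) {
            // dedupe within one parent so counts are per-parent, not per-child
            for (String v : new HashSet<>(childValues)) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, List<String>> byParent = new HashMap<>();
        byParent.put("doc1", Arrays.asList("purchase", "sale"));
        byParent.put("doc2", Arrays.asList("sale", "sale")); // two children, same value
        System.out.println(parentCounts(byParent)); // "sale" counts once for doc2
    }
}
```

This is exactly the parent-based counting the {!child of=} facets fail to give you; the cost of fetching all children for the result set is the obvious downside.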
Set up embedded Solr container and cores programmatically to read their configs from the classpath
Hi, I have an application with an embedded Solr instance (and I want to keep it embedded). So far I have been setting up my Solr installation programmatically, using folder paths to specify where the specific container or core configs are. I have used the CoreContainer methods createAndLoad and create with File arguments, and this works fine.

However, now I want to change this so that all configuration files are loaded from certain locations via the classloader, but I have not been able to get this to work. E.g. I want to have my Solr config located in the classpath at my/base/package/solr/conf and the core configs at my/base/package/solr/cores/core1/conf, my/base/package/solr/cores/core2/conf, etc. Is this possible at all? Looking through the source code, it seems that specifying classpath resources in such a qualified way is not supported, but I may be wrong.

I could get this to work for the container by supplying my own implementation of SolrResourceLoader that allows a base path to be specified for the resources to be loaded. (I first thought that would happen already when specifying instanceDir accordingly, but looking at the code it does not: for resources loaded through the classloader, instanceDir is not prepended.) However, I am then stuck with the loading of the cores' resources, as the respective code (see org.apache.solr.core.CoreContainer#createFromLocal) instantiates a SolrResourceLoader internally.

Thanks for any help with this (be it a clarification that it is not possible). Robert
How to Learn Linked Configuration for SolrCloud at Zookeeper
Hi; I've written code that can upload a file to Zookeeper for SolrCloud. Currently I have many configurations in Zookeeper for SolrCloud. I want to update the synonyms.txt file, so I need to know the currently linked configuration (I will update the synonyms.txt file under the appropriate configuration folder). How can I find it out? Thanks; Furkan KAMACI
Re: How to Learn Linked Configuration for SolrCloud at Zookeeper
For a particular collection or core? There should be a collection.configName property specified for the core or collection, which tells you which ZK config directory is being used. Alan Woodward www.flax.co.uk

On 11 Feb 2014, at 11:49, Furkan KAMACI wrote: Hi; I've written code that can upload a file to Zookeeper for SolrCloud.
Re: How to Learn Linked Configuration for SolrCloud at Zookeeper
I am looking for it for a particular collection.

2014-02-11 13:55 GMT+02:00 Alan Woodward a...@flax.co.uk: For a particular collection or core? There should be a collection.configName property specified for the core or collection, which tells you which ZK config directory is being used.
Re: How to Learn Linked Configuration for SolrCloud at Zookeeper
Hi; OK, I've checked the source code and implemented this:

public String readConfigName(SolrZkClient zkClient, String collection)
    throws KeeperException, InterruptedException {
  String configName = null;
  String path = ZkStateReader.COLLECTIONS_ZKNODE + "/" + collection;
  LOGGER.info("Load collection config from: " + path);
  byte[] data = zkClient.getData(path, null, null, true);
  if (data != null) {
    ZkNodeProps props = ZkNodeProps.load(data);
    configName = props.getStr(CONFIGNAME_PROP);
  }
  if (configName != null && !zkClient.exists(CONFIGS_ZKNODE + "/" + configName, true)) {
    LOGGER.error("Specified config does not exist in ZooKeeper: " + configName);
    throw new ZooKeeperException(SolrException.ErrorCode.SERVER_ERROR,
        "Specified config does not exist in ZooKeeper: " + configName);
  }
  return configName;
}

So I can get the linked configuration name. Thanks; Furkan KAMACI

2014-02-11 13:57 GMT+02:00 Furkan KAMACI furkankam...@gmail.com: I am looking for it for a particular collection.
Re: Lowering query time
I'd like to thank you for lending a hand on my query time problem with SolrCloud. By switching to a single-shard-with-replicas setup, I've reduced my query time to 18 msec. My full ingestion of 300k+ documents went down from 2 hours 50 minutes to 1 hour 40 minutes. There are some code changes going in that should help a bit as well. Big thanks to everyone who had suggestions.

On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I suspect faceting is the issue here. The actual query you have shown seems to bring back a single document (or a single set of documents for a product): fq=id:(320403401). On the other hand, you are asking for 4 field facets: facet.field=q_virtualCategory_ss facet.field=q_brand_s facet.field=q_color_s facet.field=q_category_ss AND 2 range facets, both clustered/grouped: facet.range=daysSinceStart_i facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000). And for all facets you have asked to bring back ALL of the results: facet.limit=-1. Plus, you are doing a complex sort: sort=popularity_i desc,popularity_i desc. So, you are probably spending quite a bit of time counting (especially in a sharded setup) and then quite a bit more sending the response back. I would check the size of the result document (HTTP result) and see how large it is. Maybe you don't need all of the stuff that's coming back. I assume you are not actually querying Solr from the client's machine (that is, I hope it is inside your data centre, close to your web server); otherwise I would say to look at automatic content compression as well to minimize on-wire document size. Finally, if your documents have many stored fields (stored=true in schema.xml) but you only return small subsets of them during search, you could look into using the enableLazyFieldLoading flag in the solrconfig. Regards, Alex. P.s. As others said, you don't seem to have too many documents. Perhaps you want replication instead of sharding for improved performance.
Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book)

On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin alexey_kozhemia...@epam.com wrote: Btw, timing for distributed requests is broken at the moment; it doesn't combine values from requests to shards. I'm working on a patch. https://issues.apache.org/jira/browse/SOLR-3644

-----Original Message----- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, February 04, 2014 22:00 To: solr-user@lucene.apache.org Subject: Re: Lowering query time Add the debug=true parameter to some test queries and look at the timing section to see which search components are taking the time. Traditionally, highlighting for large documents was a top culprit. Are you returning a lot of data or field values? Sometimes reducing the amount of data processed can help. Any multivalued fields with lots of values? -- Jack Krupansky

-----Original Message----- From: Joel Cohen Sent: Tuesday, February 4, 2014 1:43 PM To: solr-user@lucene.apache.org Subject: Re: Lowering query time 1. We are faceting. I'm not a developer, so I'm not quite sure how we're doing it. How can I measure? 2. I'm not sure how we'd force this kind of document partitioning. I can see how my shards are partitioned by looking at the clusterstate.json from Zookeeper, but I don't have a clue on how to get documents into specific shards. Would I be better off with fewer shards, given the small size of my indexes?

On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley yo...@heliosearch.com wrote: On Tue, Feb 4, 2014 at 12:12 PM, Joel Cohen joel.co...@bluefly.com wrote: I'm trying to get the query time down to ~15 msec. Anyone have any tuning recommendations? I guess it depends on what the slowest part of the query currently is. If you are faceting, it's often that.
Also, it's often a big win if you can somehow partition documents such that requests can normally be serviced from a single shard. -Yonik http://heliosearch.org - native off-heap filters and fieldcache for solr

-- joel cohen, senior system engineer e joel.co...@bluefly.com p 212.944.8000 x276 bluefly, inc. 42 w. 39th st. new york, ny 10018 www.bluefly.com http://www.bluefly.com/?referer=autosig | *fly since 2013...*
Urgent Help. Best Way to have multiple OR Conditions for same field in SOLR
Hi, I am new to Solr. We have CRM data for contacts and companies, numbering in the millions, and we have switched to Solr for fast search results.

PROBLEM: We have large inclusion and exclusion lists with names of companies or contacts. Ex: Include or Exclude: Company A, Company B, Company C, ... Company n, where n can run into the thousands. What would be the best way to do this kind of query using Solr?

WHAT I HAVE TRIED: Setting q = field_name:(companyA OR companyB ... OR companyN). This works only for a list of 400 odd entries.

Looking forward to assistance on this. Thank You, Rajeev.

-- View this message in context: http://lucene.472066.n3.nabble.com/Urgent-Help-Best-Way-to-have-multiple-OR-Conditions-for-same-field-in-SOLR-tp4116681.html Sent from the Solr - User mailing list archive at Nabble.com.
solr-query with NOT and OR operator
Hi, my Solr request contains the following filter query: fq=((-(field1:value1)))+OR+(field2:value2). I expect Solr to deliver documents matching ((-(field1:value1))) as well as documents matching (field2:value2). But Solr delivers only the documents that are the result of (field2:value2). I do receive several documents if I request only ((-(field1:value1))). Thanks! Johannes
Re: solr-query with NOT and OR operator
http://wiki.apache.org/solr/CommonQueryParameters#debugQuery and http://wiki.apache.org/solr/CommonQueryParameters#explainOther usually help a lot.

On Tue, Feb 11, 2014 at 7:57 PM, Johannes Siegert johannes.sieg...@marktjagd.de wrote: Hi, my Solr request contains the following filter query: fq=((-(field1:value1)))+OR+(field2:value2).

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
Re: Tf-Idf for a specific query
Hi Erick, Slower queries for getting facets can be tolerated, as long as they don't affect those without facets. The requirement is for a separate query which can get me both term vectors and facet counts. One issue I am facing is that, for a search query, I only want the term vectors and facet counts, but not the results/docs. If I set rows=0, then term vectors are not returned. Could you suggest some way to achieve the above? It would also be helpful to have a way to get the aggregate TF of a term (across all docs in the query). Regards, David

On Sat, Feb 8, 2014 at 10:49 AM, Erick Erickson erickerick...@gmail.com wrote: David: If you're, say, faceting on fields with lots of unique values, this will be quite expensive. No idea whether you can tolerate slower queries or not, just sayin'. Erick

On Fri, Feb 7, 2014 at 5:35 PM, David Miller davthehac...@gmail.com wrote: Thanks Mikhail, It seems that this was what I was looking for. Being new to this, I wasn't aware of such a use of facets. Now I can probably combine the term vectors and facets to fit my scenario. Regards, Dave

On Fri, Feb 7, 2014 at 2:43 PM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: David, I can imagine that DF for a resultset is facets! On Fri, Feb 7, 2014 at 11:26 PM, David Miller davthehac...@gmail.com wrote: Hi Mikhail, The DF seems to be based on the entire document set. What I require is based on the results of a single query. Suppose my Solr query returns a set of 50K documents from a superset of 10 million documents; I need to calculate the DF based on just those 50K documents. But currently it seems to be calculated on the entire doc set. So, is there any way to get the DF or IDF based only on the docs returned by the query?
Regards, Dave

On Fri, Feb 7, 2014 at 5:15 AM, Mikhail Khludnev mkhlud...@griddynamics.com wrote: Hello Dave, you can get DF from http://wiki.apache.org/solr/TermsComponent (invert it yourself); then, for a certain term, you can get the number of occurrences per document via http://wiki.apache.org/solr/FunctionQuery#tf

On Fri, Feb 7, 2014 at 3:58 AM, David Miller davthehac...@gmail.com wrote: Hi guys, I need to obtain the tf-idf score from Solr for a certain set of documents. But the catch is that I need the IDF (or DF) to be calculated on the documents returned by the specific query and not on the entire corpus. Please give me some hint on whether Solr has this feature, or whether I can use the Lucene API directly to achieve this. Thanks in advance, Dave

-- Sincerely yours Mikhail Khludnev Principal Engineer, Grid Dynamics http://www.griddynamics.com mkhlud...@griddynamics.com
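If Solr won't compute DF over just the result subset, it can be done client-side once you have the terms of each returned document (e.g. from term vectors). A minimal sketch of that subset-restricted DF/IDF, assuming the per-document term sets have already been fetched (names are illustrative):

```java
import java.util.*;

// Sketch: document frequency restricted to a result subset. Given the set of
// terms present in each returned document, count in how many of those
// documents each term appears, ignoring the rest of the corpus.
public class SubsetDf {
    static Map<String, Integer> df(List<Set<String>> termsPerDoc) {
        Map<String, Integer> df = new HashMap<>();
        for (Set<String> terms : termsPerDoc) {
            for (String t : terms) {
                df.merge(t, 1, Integer::sum);
            }
        }
        return df;
    }

    // IDF over the subset, using the classic 1 + ln(N / df) form
    static double idf(int subsetSize, int docFreq) {
        return 1.0 + Math.log((double) subsetSize / docFreq);
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
            new HashSet<>(Arrays.asList("solr", "facet")),
            new HashSet<>(Arrays.asList("solr")));
        System.out.println(df(docs)); // "solr" appears in both docs, "facet" in one
    }
}
```

For 50K documents this costs one pass over their term vectors, which is the price of not having corpus-independent statistics server-side.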
Re: solr-query with NOT and OR operator
With so many parentheses in there, I wonder what you are really trying to do. Try expressing your query in simple English first so that we can understand your goal. But generally, a purely negative nested query must have a *:* term to apply the exclusion against: fq=((*:* -(field1:value1)))+OR+(field2:value2). -- Jack Krupansky

-----Original Message----- From: Johannes Siegert Sent: Tuesday, February 11, 2014 10:57 AM To: solr-user@lucene.apache.org Subject: solr-query with NOT and OR operator Hi, my Solr request contains the following filter query: fq=((-(field1:value1)))+OR+(field2:value2).
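The behavior Jack describes can be modeled with plain set operations: OR collects the matches of each clause, and a clause that is only an exclusion matches nothing on its own unless it is anchored against the full document set with *:*. A toy model (not Lucene code) of the two filter queries:

```java
import java.util.*;

// Toy model of why "(-A) OR B" returns only B's matches, while
// "(*:* -A) OR B" returns the expected union.
public class NegativeClauseDemo {
    static Set<Integer> union(Set<Integer> a, Set<Integer> b) {
        Set<Integer> r = new HashSet<>(a);
        r.addAll(b);
        return r;
    }

    static Set<Integer> minus(Set<Integer> all, Set<Integer> excluded) {
        Set<Integer> r = new HashSet<>(all);
        r.removeAll(excluded);
        return r;
    }

    public static void main(String[] args) {
        Set<Integer> allDocs = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        Set<Integer> matchesA = new HashSet<>(Arrays.asList(1, 2)); // field1:value1
        Set<Integer> matchesB = new HashSet<>(Arrays.asList(2, 3)); // field2:value2

        // "(-A) OR B": the pure-negative clause contributes nothing on its own.
        Set<Integer> broken = union(Collections.<Integer>emptySet(), matchesB);

        // "(*:* -A) OR B": the exclusion is applied against all documents first.
        Set<Integer> fixed = union(minus(allDocs, matchesA), matchesB);

        System.out.println(broken); // only B's docs
        System.out.println(fixed);  // docs not matching A, plus B's docs
    }
}
```

The `broken` set is exactly what Johannes observed: only the (field2:value2) results come back.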
Re: Lowering query time
Hmmm, I'm still a little puzzled, BTW. 300K documents, unless they're huge, shouldn't be taking 100 minutes. I can index 11M documents (a Wikipedia dump) on my laptop in 45 minutes, for instance. Of course that's a single core, not cloud, and no replicas... So possibly it's on the data acquisition side? Is your Solr CPU pegged? YMMV of course. Erick

On Tue, Feb 11, 2014 at 6:40 AM, Joel Cohen joel.co...@bluefly.com wrote: I'd like to thank you for lending a hand on my query time problem with SolrCloud.
Re: Urgent Help. Best Way to have multiple OR Conditions for same field in SOLR
Right, 10K Boolean clauses are not very efficient. You can actually up the limit here, but still... Consider a post filter; here's a place to start: http://lucene.apache.org/solr/4_3_1/solr-core/org/apache/solr/search/PostFilter.html Best, Erick

On Tue, Feb 11, 2014 at 6:47 AM, rajeev.nadgauda rajeev.nadga...@leadenrich.com wrote: Hi, I am new to Solr. We have CRM data for contacts and companies, numbering in the millions, and we have switched to Solr for fast search results.
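To illustrate the idea behind Erick's post-filter suggestion (this is not actual Solr PostFilter code, just the core data-structure choice it rests on): instead of a query with thousands of OR clauses, keep the include/exclude list in a hash set and test each candidate document with an O(1) lookup.

```java
import java.util.*;

// Illustration of the post-filter idea: a large include/exclude company list
// held in a HashSet, tested per candidate document. A real Solr PostFilter
// would apply this check in its collect() method after the main query matches.
public class CompanyListFilter {
    private final Set<String> companies;
    private final boolean exclude;

    CompanyListFilter(Collection<String> companies, boolean exclude) {
        this.companies = new HashSet<>(companies);
        this.exclude = exclude;
    }

    boolean accept(String companyName) {
        boolean inList = companies.contains(companyName);
        return exclude ? !inList : inList;
    }

    public static void main(String[] args) {
        CompanyListFilter include =
            new CompanyListFilter(Arrays.asList("Company A", "Company B"), false);
        System.out.println(include.accept("Company A")); // in the include list
        System.out.println(include.accept("Company Z")); // not in the list
    }
}
```

Newer Solr releases also grew a {!terms} query parser aimed at exactly this many-values-on-one-field case, but for the 4.6-era setup in this thread a post filter is the suggested route.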
Re: solr-query with NOT and OR operator
Solr/Lucene is not strictly Boolean logic; this trips up a lot of people. Excellent blog on the subject here: http://searchhub.org/dev/2011/12/28/why-not-and-or-and-not/ Best, Erick

On Tue, Feb 11, 2014 at 8:22 AM, Jack Krupansky j...@basetechnology.com wrote: With so many parentheses in there, I wonder what you are really trying to do.
Re: Is 'optimize' necessary for a 45-segment Solr 4.6 index?
On 2/11/2014 3:27 AM, Jäkel, Guido wrote: Dear Shawn, On 2/9/2014 11:41 PM, Arun Rangarajan wrote: I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an 'out of memory' error. Is optimize really necessary, since I read that Lucene is able to handle multiple segments well now?

It seems I'm currently running into the same problem while migrating from Solr 1.4 to Solr 4.6.1. I run into OOM problems -- after running a full, fresh re-index of our catalogue data -- while optimizing an ~80GB core on a 16GB JVM. After about one hour the heap explodes within a minute while creating the compound file _5b2.cfs. How do I deal with this? Does it happen because there are too many small segments (about 30 at 1-4GB each) before the optimize? It seems they are limited to this size by the defaults of the TieredMergePolicy. And, of course: is optimize deprecated? Because it takes about an hour to reach the point of the problem, any hints or explanations will help me save a lot of time!

Replying to a privately sent email on this thread: I can't be sure that there are no memory leaks in Solr's program code, but it is a rare thing, and I'm running 4.6.1 on a large system with a smaller heap than yours without problems, so a memory leak is unlikely. My setup DOES do index optimizes. I have two guesses; it could be either or both. They are similar but not identical. There might be something else entirely, but these are the most likely:

One guess is that you don't have enough RAM, leading to a performance issue that compounds itself. Adding the optimize pushes the system over a threshold, everything slows down enough that the system tries to do too much simultaneously, and it uses all the heap. Assuming there's nothing else running on the machine, with an 80GB index and a 16GB heap, a perfectly ideal server for this index would have 96GB of RAM. You might be able to get really good performance with 48GB, but more would be better. If it were me, I don't think I'd try it with less than 64GB. http://wiki.apache.org/solr/SolrPerformanceProblems#RAM

The other guess is that your Solr config and your request/index characteristics are resulting in a lot of heap usage, so when you add an optimize on top of it, 16GB is not enough. http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks, Shawn
Re: solr-query with NOT and OR operator
Hi Jack, thanks! fq=((*:* -(field1:value1)))+OR+(field2:value2) is the solution. Johannes

On 11.02.2014 17:22, Jack Krupansky wrote: With so many parentheses in there, I wonder what you are really trying to do.

-- Johannes Siegert, Software Developer. Phone: 0351 - 418 894 -73 Fax: 0351 - 418 894 -99 E-Mail: johannes.sieg...@marktjagd.de Xing: https://www.xing.com/profile/Johannes_Siegert2 Website: http://www.marktjagd.de Blog: http://blog.marktjagd.de Facebook: http://www.facebook.com/marktjagd Twitter: http://twitter.com/Marktjagd __ Marktjagd GmbH | Schützenplatz 14 | D - 01067 Dresden. Managing Director: Jan Großmann. Registered in Dresden | Amtsgericht Dresden | HRB 28678
Re: Is 'optimize' necessary for a 45-segment Solr 4.6 index?
Dear Shawn, Thanks for your reply. For now, I did the merges in steps with the maxSegments param (using HOST:PORT/CORE/update?optimize=true&maxSegments=10). First I merged the 45 segments down to 10, and then from 10 to 5. (Merging from 5 to 2 again caused an out-of-memory exception.) Now I have a 5-segment index with all segments roughly of equal size. I will try using that and see if it is good enough for us.

On Sun, Feb 9, 2014 at 11:22 PM, Shawn Heisey s...@elyograg.org wrote: On 2/9/2014 11:41 PM, Arun Rangarajan wrote: I have a 28 GB Solr 4.6 index with 45 segments. Optimize failed with an 'out of memory' error. Is optimize really necessary, since I read that Lucene is able to handle multiple segments well now? I have had indexes with more than 45 segments because of the merge settings that I use. My large index shards are about 16GB at the moment. Out-of-memory errors are very rare because I use a fairly large heap, at 6GB for a machine that hosts three of these large shards. When I was still experimenting with my memory settings, I did see occasional out-of-memory errors during normal segment merging. Increasing your heap size is pretty much required at this point. I've condensed some very basic information about heap sizing here: http://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap As for whether optimizing on 4.x is necessary: I do not have any hard numbers for you, but I can tell you that an optimized index does seem noticeably faster than one that is freshly built and has a large number of relatively large segments. I optimize my index shards on a schedule, but it is relatively infrequent -- one large shard per night. Most of the time what I have is one really large segment and a bunch of super-small segments, and that does not seem to suffer from performance issues compared to a fully optimized index. The situation is different right after a fresh rebuild, which produces a handful of very large segments and a bunch of smaller segments of varying sizes.
Interesting but probably irrelevant details: Although I don't use mergeFactor any more, the TieredMergePolicy settings that I use are equivalent to a mergeFactor of 35. I chose this number back in the 1.4.1 days because it resulted in synchronicity between merges and Lucene segment names when LogByteSizeMergePolicy was still in use. Segments _0 through _z would be merged into segment _10, and so on. Thanks, Shawn
handleSelect=true with SolrCloud
I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper requestHandler. I know handleSelect=true is deprecated now, but it'd be pretty handy for testing to be able to be backwards compatible, at least until some time after the initial release. So in my SolrCloud configuration, I set requestDispatcher handleSelect="true" and deleted the /select requestHandler as suggested here: http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29 However, my /solr/collection1/select?qt=foo query throws an "unknown handler: null" error with this configuration. Has anyone successfully tried handleSelect=true with the collections api? Thanks.
boost group doclist members
Without falling into the x/y problem area, I'll explain what I want to do: I would like to group my result set by a field, f1, and within each group, I'd like to boost the score of the most appropriate member of the group so it appears first in the doc list. The most appropriate member is defined by the content of other fields (e.g., f2, f3). So basically, I'd like to boost based on the values in fields f2 and f3. If there is a better way to achieve this, I'm all ears. But I was thinking this could be achieved by using a function query as the sortspec to group.sort. Example content:

<doc>
  <field name="f1">4181770</field> <!-- integer -->
  <field name="f2">x_val</field> <!-- text -->
  <field name="f3">100</field> <!-- integer -->
</doc>
<doc>
  <field name="f1">4181770</field>
  <field name="f2">y_val</field>
  <field name="f3">100</field>
</doc>
<doc>
  <field name="f1">4181770</field>
  <field name="f2">z_val</field>
  <field name="f3">100</field>
</doc>

All 3 of the above documents will be grouped into a doclist with groupValue=4181770. My question is then: how do I make the document with f2=y_val appear first in the doclist? I've been playing with group.field=f1 group.sort=query({!dismax qf=f2 bq=f2:y_val^100}) asc ... but I'm getting: org.apache.solr.common.SolrException: Can't determine a Sort Order (asc or desc) in sort spec 'query({!dismax qf=f2 bq=f2:y_val^100.0}) asc', pos=14. Can anyone point to some examples of this? thanks David
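[One hedged idea, not verified against this exact case: the sort-spec parser appears to be tripping over the spaces inside query(...). Solr function queries support parameter dereferencing, so moving the embedded query into its own request parameter keeps the sort spec itself free of internal spaces. A sketch, with an illustrative parameter name (gsq); note also that putting the highest-scoring member first would mean sorting desc, not asc:

```
group=true&group.field=f1
&group.sort=query($gsq) desc
&gsq={!dismax qf=f2 bq=f2:y_val^100}
```

This is only a sketch of the dereferencing pattern, not a tested solution for the grouping case above.]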
Re: handleSelect=true with SolrCloud
On 2/11/2014 10:21 AM, Jeff Wartes wrote: I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper requestHandler. I know handleSelect=true is deprecated now, but it'd be pretty handy for testing to be able to be backwards compatible, at least until some time after the initial release. So in my SolrCloud configuration, I set requestDispatcher handleSelect="true" and deleted the /select requestHandler as suggested here: http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29 However, my /solr/collection1/select?qt=foo query throws an "unknown handler: null" error with this configuration. Has anyone successfully tried handleSelect=true with the collections api? I'm pretty sure that if you don't have a handler named /select, then you need to have default=true as an attribute on one of your other handler definitions. See line 715 of the example solrconfig.xml for Solr 3.5: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/example/solr/conf/solrconfig.xml?view=annotate Thanks, Shawn
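[For reference, the kind of handler definition Shawn describes looks roughly like this in solrconfig.xml; this is a sketch modeled on the 3.x example config, and exact attribute placement may differ across versions:

```xml
<!-- With handleSelect="true" and no /select handler, requests that
     don't name a handler fall back to the one marked default="true". -->
<requestDispatcher handleSelect="true" />

<requestHandler name="standard" class="solr.SearchHandler" default="true">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
  </lst>
</requestHandler>
```
]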
Re: handleSelect=true with SolrCloud
Got it in one. Thanks! On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote: On 2/11/2014 10:21 AM, Jeff Wartes wrote: I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper requestHandler. I know handleSelect=true is deprecated now, but it'd be pretty handy for testing to be able to be backwards compatible, at least until some time after the initial release. So in my SolrCloud configuration, I set requestDispatcher handleSelect="true" and deleted the /select requestHandler as suggested here: http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29 However, my /solr/collection1/select?qt=foo query throws an "unknown handler: null" error with this configuration. Has anyone successfully tried handleSelect=true with the collections api? I'm pretty sure that if you don't have a handler named /select, then you need to have default=true as an attribute on one of your other handler definitions. See line 715 of the example solrconfig.xml for Solr 3.5: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/example/solr/conf/solrconfig.xml?view=annotate Thanks, Shawn
Re: USER NAME Baruch Labunski
Hello Wiki admin, I would like to add some value links. Can you please add me? My user name is Baruch Labunski. Thank You, Baruch! On Thursday, January 16, 2014 2:12:32 PM, Baruch bar...@rogers.com wrote: Hello Wiki admin, I would like to add some value links. Can you please add me? My user name is Baruch Labunski. Thank You, Baruch!
Re: Lowering query time
It's a custom ingestion process. It does a big DB query and then inserts stuff in batches. The batch size is tuneable. On Tue, Feb 11, 2014 at 11:23 AM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I'm still a little puzzled BTW. 300K documents, unless they're huge, shouldn't be taking 100 minutes. I can index 11M documents on my laptop (Wikipedia dump) in 45 minutes, for instance. Of course that's a single core, not cloud and not replicas... So possibly it's on the data acquisition side? Is your Solr CPU pegged? YMMV of course. Erick On Tue, Feb 11, 2014 at 6:40 AM, Joel Cohen joel.co...@bluefly.com wrote: I'd like to thank you for lending a hand on my query time problem with SolrCloud. By switching to a single shard with replicas setup, I've reduced my query time to 18 msec. My full ingestion of 300k+ documents went down from 2 hours 50 minutes to 1 hour 40 minutes. There are some code changes that are going in that should help a bit as well. Big thanks to everyone that had suggestions. On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I suspect faceting is the issue here. The actual query you have shown seem to bring back a single document (or a single set of document for a product): fq=id:(320403401) On the other hand, you are asking for 4 field facets: facet.field=q_virtualCategory_ss facet.field=q_brand_s facet.field=q_color_s facet.field=q_category_ss AND 2 range facets, both clustered/grouped: facet.range=daysSinceStart_i facet.range=activePrice_l (e.g. f.activePrice_l.facet.range.gap=5000) And for all facets you have asked to bring back ALL of the results: facet.limit=-1 Plus, you are doing a complex sort: sort=popularity_i desc,popularity_i desc So, you are probably spending quite a bit of time counting (especially in a shared setup) and then quite a bit more sending the response back. I would check the size of the result document (HTTP result) and see how large it is.
Maybe you don't need all of the stuff that's coming back. I assume you are not actually querying Solr from the client's machine (that is, I hope it is inside your data centre close to your web server), otherwise I would say to look at automatic content compression as well to minimize on-wire document size. Finally, if your documents have many stored fields (stored="true" in schema.xml) but you only return small subsets of them during search, you could look into using the enableLazyFieldLoading flag in the solrconfig. Regards, Alex. P.S. As others said, you don't seem to have too many documents. Perhaps you want replication instead of sharding for improved performance. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin alexey_kozhemia...@epam.com wrote: Btw, timing for distributed requests is broken at this moment; it doesn't combine values from requests to shards. I'm working on a patch. https://issues.apache.org/jira/browse/SOLR-3644 -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, February 04, 2014 22:00 To: solr-user@lucene.apache.org Subject: Re: Lowering query time Add the debug=true parameter to some test queries and look at the timing section to see which search components are taking the time. Traditionally, highlighting for large documents was a top culprit. Are you returning a lot of data or field values? Sometimes reducing the amount of data processed can help. Any multivalued fields with lots of values? -- Jack Krupansky -Original Message- From: Joel Cohen Sent: Tuesday, February 4, 2014 1:43 PM To: solr-user@lucene.apache.org Subject: Re: Lowering query time 1. We are faceting. I'm not a developer so I'm not quite sure how we're doing it. How can I measure? 2.
I'm not sure how we'd force this kind of document partitioning. I can see how my shards are partitioned by looking at the clusterstate.json from Zookeeper, but I don't have a clue on how to get documents into specific shards. Would I be better off with fewer shards given the small size of my indexes? On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley yo...@heliosearch.com wrote: On Tue, Feb 4, 2014 at 12:12 PM, Joel Cohen joel.co...@bluefly.com wrote: I'm trying to get the query time down to ~15 msec. Anyone have any tuning recommendations? I guess it depends on what the slowest part of the query currently is. If you are faceting, it's often that.
Re: handleSelect=true with SolrCloud
Jeff, I believe the shards.qt parameter is what you're looking for. For example, when using the /elevate handler with SolrCloud I use the following url to tell Solr to use the /elevate handler on the shards: http://localhost:8983/solr/collection1/elevate?q=ipod&wt=json&indent=true&shards.qt=/elevate Joel Bernstein Search Engineer at Heliosearch On Tue, Feb 11, 2014 at 1:01 PM, Jeff Wartes jwar...@whitepages.com wrote: Got it in one. Thanks! On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote: On 2/11/2014 10:21 AM, Jeff Wartes wrote: I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper requestHandler. I know handleSelect=true is deprecated now, but it'd be pretty handy for testing to be able to be backwards compatible, at least until some time after the initial release. So in my SolrCloud configuration, I set requestDispatcher handleSelect="true" and deleted the /select requestHandler as suggested here: http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29 However, my /solr/collection1/select?qt=foo query throws an "unknown handler: null" error with this configuration. Has anyone successfully tried handleSelect=true with the collections api? I'm pretty sure that if you don't have a handler named /select, then you need to have default=true as an attribute on one of your other handler definitions. See line 715 of the example solrconfig.xml for Solr 3.5: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/example/solr/conf/solrconfig.xml?view=annotate Thanks, Shawn
RE: handleSelect=true with SolrCloud
Hi Jeff, this is not about elevation; I am asking along the lines of relevancy / boost / score. Select productid from products where SKU = 101 Select Productid from products where ManufactureSKU = 101 Select Productid from product where SKU Like '101%' Select Productid from Product where ManufactureSKU like '101%' Select Productid from product where Name Like '101%' Select Productid from Product where Description like '%101%' Is there any way for Solr to search the exact-match, starts-with, and match-anywhere cases in a single Solr query? -Original Message- From: Joel Bernstein [mailto:joels...@gmail.com] Sent: Tuesday, February 11, 2014 3:11 PM To: solr-user@lucene.apache.org Subject: Re: handleSelect=true with SolrCloud Jeff, I believe the shards.qt parameter is what you're looking for. For example when using the /elevate handler with SolrCloud I use the following url to tell Solr to use the /elevate handler on the shards: http://localhost:8983/solr/collection1/elevate?q=ipod&wt=json&indent=true&shards.qt=/elevate Joel Bernstein Search Engineer at Heliosearch On Tue, Feb 11, 2014 at 1:01 PM, Jeff Wartes jwar...@whitepages.com wrote: Got it in one. Thanks! On 2/11/14, 9:50 AM, Shawn Heisey s...@elyograg.org wrote: On 2/11/2014 10:21 AM, Jeff Wartes wrote: I'm working on a port of a Solr service to SolrCloud. (Targeting v4.6.0 at present.) The old query style relied on using /solr/select?qt=foo to select the proper requestHandler. I know handleSelect=true is deprecated now, but it'd be pretty handy for testing to be able to be backwards compatible, at least until some time after the initial release. So in my SolrCloud configuration, I set requestDispatcher handleSelect="true" and deleted the /select requestHandler as suggested here: http://wiki.apache.org/solr/SolrRequestHandler#Old_handleSelect.3Dtrue_Resolution_.28qt_param.29 However, my /solr/collection1/select?qt=foo query throws an "unknown handler: null" error with this configuration.
Has anyone successfully tried handleSelect=true with the collections api? I'm pretty sure that if you don't have a handler named /select, then you need to have default=true as an attribute on one of your other handler definitions. See line 715 of the example solrconfig.xml for Solr 3.5: http://svn.apache.org/viewvc/lucene/dev/branches/lucene_solr_3_5/solr/example/solr/conf/solrconfig.xml?view=annotate Thanks, Shawn
Solr Autosuggest - Strange issue with leading numbers in query
I have a strange issue with Autosuggest. Whenever I query for a keyword along with leading numbers, it returns the suggestion corresponding to the alphabetic part (ignoring the numbers). I was under the assumption that it would return an empty result back. I am not sure what I am doing wrong. Can someone help?

*Query:*

/autocomplete?qt=/lucid&req_type=auto_complete&spellcheck.maxCollations=10&q=12342343243242ga&spellcheck.count=10

*Result:*

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="ga">
        <int name="numFound">1</int>
        <int name="startOffset">15</int>
        <int name="endOffset">17</int>
        <arr name="suggestion">
          <str>galaxy</str>
        </arr>
      </lst>
      <str name="collation">12342343243242galaxy</str>
    </lst>
  </lst>
</response>

*My field configuration is as below:*

<fieldType class="solr.TextField" name="textSpell_word" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords_autosuggest.txt"/>
  </analyzer>
</fieldType>

*SolrConfig.xml*

<searchComponent class="solr.SpellCheckComponent" name="autocomplete">
  <lst name="spellchecker">
    <str name="name">autocomplete</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">autocomplete_word</str>
    <str name="storeDir">autocomplete</str>
    <str name="buildOnCommit">true</str>
    <float name="threshold">.005</float>
  </lst>
</searchComponent>

<requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/autocomplete">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">autocomplete</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.count">10</str>
    <str name="spellcheck.onlyMorePopular">false</str>
  </lst>
  <arr name="components">
    <str>autocomplete</str>
  </arr>
</requestHandler>

--
View this message in context:
http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751.html Sent from the Solr - User mailing list archive at Nabble.com.
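[A small sketch of what appears to be happening, judging only from the response shown above (not from the spellchecker internals): the component treats just the trailing alphabetic token "ga" as the word to correct, and the collation is assembled by splicing the suggestion back into the original query string:

```python
# Sketch: reproduce the collation seen in the response above.
# The suggester corrected only the trailing "ga" token to "galaxy";
# the numeric prefix passed through untouched.
query = "12342343243242ga"
corrected_token = "galaxy"  # the suggestion returned in the response

# Replace the corrected span (here, the trailing "ga") with the suggestion.
collation = query[:-len("ga")] + corrected_token
print(collation)
```
]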
Re: Indexing question on individual field update
Eric, Thanks for your reply. I should have given a better context. I'm currently running an incremental crawl daily on this particular source and indexing the documents. The incremental crawl looks for any change since the last crawl date, based on the document publish date. But there's no way for me to know if a document has been deleted. To ensure that, I run a full crawl on a weekend, which basically re-indexes the entire content. After the full index is over, I call a purge script, which deletes any content which is more than 24 hours old, based on the indextimestamp field. The issue with atomic update is that it doesn't alter the indextimestamp field. So even if I run a full crawl with atomic updates, the timestamp will stick to its old value. Unfortunately, I can't rely on another date field coming from the source, as they are not consistent. That translates to the fact that I can't remove stale content. Let me know if I'm missing something here. - Thanks, Shamik -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116757.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr server requirements for 100+ million documents
Hi Otis, Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1 for Zookeeper. Is that correct? Thanks, Susheel -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Friday, January 24, 2014 5:21 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Hi Susheel, Like Erick said, it's impossible to give precise recommendations, but making a few assumptions and combining them with experience (+ a licked finger in the air): * 3 servers * 32 GB * 2+ CPU cores * Linux Assuming docs are not bigger than a few KB, that they are not being reindexed over and over, that you don't have a search rate higher than a few dozen QPS, assuming your queries are not a page long, etc. assuming best practices are followed, the above should be sufficient. I hope this helps. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi, Currently we are indexing 10 million documents from a database (10 db data entities); index size is around 8 GB on a Windows virtual box. Indexing in one shot takes 12+ hours, while indexing in parallel in separate cores and merging them together takes 4+ hours. We are looking to scale to 100+ million documents and looking for recommendations on server requirements on the below parameters for a production environment. There can be 200+ users performing search at the same time. No of physical servers (considering solr cloud) Memory requirement Processor requirement (# cores) Linux as OS as opposed to Windows Thanks in advance. Susheel
Re: Solr server requirements for 100+ million documents
Hi Susheel, No, we wouldn't want to go with just 1 ZK. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Tue, Feb 11, 2014 at 5:18 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Otis, Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1 for Zookeeper. Is that correct? Thanks, Susheel -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Friday, January 24, 2014 5:21 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Hi Susheel, Like Erick said, it's impossible to give precise recommendations, but making a few assumptions and combining them with experience (+ a licked finger in the air): * 3 servers * 32 GB * 2+ CPU cores * Linux Assuming docs are not bigger than a few KB, that they are not being reindexed over and over, that you don't have a search rate higher than a few dozen QPS, assuming your queries are not a page long, etc. assuming best practices are followed, the above should be sufficient. I hope this helps. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi, Currently we are indexing 10 million documents from a database (10 db data entities); index size is around 8 GB on a Windows virtual box. Indexing in one shot takes 12+ hours, while indexing in parallel in separate cores and merging them together takes 4+ hours. We are looking to scale to 100+ million documents and looking for recommendations on server requirements on the below parameters for a production environment. There can be 200+ users performing search at the same time. No of physical servers (considering solr cloud) Memory requirement Processor requirement (# cores) Linux as OS as opposed to Windows Thanks in advance. Susheel
RE: Solr server requirements for 100+ million documents
Thanks, Otis, for the quick reply. So for ZK do you recommend separate servers, and if so, how many for an initial SolrCloud cluster setup? -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Tuesday, February 11, 2014 4:21 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Hi Susheel, No, we wouldn't want to go with just 1 ZK. :) Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Tue, Feb 11, 2014 at 5:18 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi Otis, Just to confirm, the 3 servers you mean here are 2 for shards/nodes and 1 for Zookeeper. Is that correct? Thanks, Susheel -Original Message- From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] Sent: Friday, January 24, 2014 5:21 PM To: solr-user@lucene.apache.org Subject: Re: Solr server requirements for 100+ million documents Hi Susheel, Like Erick said, it's impossible to give precise recommendations, but making a few assumptions and combining them with experience (+ a licked finger in the air): * 3 servers * 32 GB * 2+ CPU cores * Linux Assuming docs are not bigger than a few KB, that they are not being reindexed over and over, that you don't have a search rate higher than a few dozen QPS, assuming your queries are not a page long, etc. assuming best practices are followed, the above should be sufficient. I hope this helps. Otis -- Performance Monitoring * Log Analytics * Search Analytics Solr & Elasticsearch Support * http://sematext.com/ On Fri, Jan 24, 2014 at 1:10 PM, Susheel Kumar susheel.ku...@thedigitalgroup.net wrote: Hi, Currently we are indexing 10 million documents from a database (10 db data entities); index size is around 8 GB on a Windows virtual box. Indexing in one shot takes 12+ hours, while indexing in parallel in separate cores and merging them together takes 4+ hours.
We are looking to scale to 100+ million documents and looking for recommendations on server requirements on the below parameters for a production environment. There can be 200+ users performing search at the same time. No of physical servers (considering solr cloud) Memory requirement Processor requirement (# cores) Linux as OS as opposed to Windows Thanks in advance. Susheel
Re: Indexing question on individual field update
Ok, I was wrong here. I can always set the indextimestamp field with the current time (NOW) for every atomic update. On a similar note, is there any performance constraint with updates compared to adds? -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116772.html Sent from the Solr - User mailing list archive at Nabble.com.
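[Setting the timestamp during an atomic update might look like the sketch below, which just builds the JSON payload for the /update endpoint. The field names and id are made up, and whether "NOW" is interpreted as date math depends on indextimestamp being a date field:

```python
import json

# Sketch of an atomic-update document: the "set" modifier replaces a
# field's value, and "NOW" is Solr date math for the current time
# (valid only for date-typed fields).
update_doc = {
    "id": "doc-123",                   # made-up unique key
    "indextimestamp": {"set": "NOW"},  # refresh the timestamp on each update
}
payload = json.dumps([update_doc])     # /update accepts a JSON array of docs
print(payload)
```
]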
Re: Solr server requirements for 100+ million documents
ZK needs a quorum to stay functional, so 3 servers handle one failure and 5 handle 2 node failures. If you run Solr with 1 replica per shard, then stick to 3 ZK nodes. If you use 2 replicas, use 5 ZK nodes.
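[The majority rule svante describes is simple integer arithmetic; a minimal sketch of the mapping from ensemble size to tolerated failures:

```python
# ZooKeeper stays available while a strict majority (quorum) of the
# ensemble is up, so an ensemble of n nodes tolerates floor((n-1)/2)
# simultaneous node failures.
def tolerated_failures(ensemble_size):
    return (ensemble_size - 1) // 2

for n in (1, 3, 5):
    print(n, "nodes ->", tolerated_failures(n), "failure(s) tolerated")
```

Note this quorum math depends only on the ZooKeeper ensemble size, not on the Solr replica count.]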
Replica node down but zookeeper clusterstate not updated
Solr = 4.6.1, attached solrcloud admin console view Zookeeper 3.4.5 = 3 node ensemble In my test setup, I have a 3 node SolrCloud setup with 2 shards. Today we had a power failure and all nodes went down. I started the 3 node ZooKeeper ensemble first, then followed with the 3 node SolrCloud. One replica's IP address was changed due to dynamic IP allocation, but the ZooKeeper clusterstate was not updated with the new IP address; it was still holding the old IP address for that bad node. Do I need to manually update the clusterstate in ZooKeeper? What are my options if this happens in production? Bad node: old IP: 10.249.132.35 (still exists in ZooKeeper) new IP: 10.249.133.10 Log from Node1: 11:26:25,242 INFO [STDOUT] 49170786 [Thread-2-EventThread] INFO org.apache.solr.common.cloud.ZkStateReader - A cluster state change: WatchedEvent state:SyncConnected type:NodeDataChanged path:/clusterstate.json, has occurred - updating... (live nodes size: 3) 11:26:41,072 INFO [STDOUT] 49186615 [RecoveryThread] INFO org.apache.solr.cloud.ZkController - publishing core=genre_shard1_replica1 state=recovering 11:26:41,079 INFO [STDOUT] 49186622 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Error while trying to recover.
core=genre_shard1_replica1:org.apache.solr.client.solrj.SolrServerException: Server refused connection at: http://10.249.132.35:8080/solr 11:26:41,079 INFO [STDOUT] at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:496) 11:26:41,079 INFO [STDOUT] at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197) 11:26:41,079 INFO [STDOUT] at org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:221) 11:26:41,079 INFO [STDOUT] at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:367) 11:26:41,079 INFO [STDOUT] at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:244) 11:26:41,079 INFO [STDOUT] Caused by: org.apache.http.conn.HttpHostConnectException: Connection to http://10.249.132.35:8080 refused 11:27:14,036 INFO [STDOUT] 49219580 [RecoveryThread] ERROR org.apache.solr.cloud.RecoveryStrategy - Recovery failed - trying again... (9) core=geo_shard1_replica1 11:27:14,037 INFO [STDOUT] 49219581 [RecoveryThread] INFO org.apache.solr.cloud.RecoveryStrategy - Wait 600.0 seconds before trying to recover again (10) 11:27:14,958 INFO [STDOUT] 49220498 [Thread-40] INFO org.apache.solr.common.cloud.ZkStateReader - Updating cloud state from ZooKeeper... Log from bad node with new ip address: 11:06:29,551 INFO [STDOUT] 6234 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - Enough replicas found to continue.
11:06:29,552 INFO [STDOUT] 6236 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - I may be the new leader - try and sync 11:06:29,554 INFO [STDOUT] 6237 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.SyncStrategy - Sync replicas to http://10.249.132.35:8080/solr/venue_shard2_replica2/ 11:06:29,555 INFO [STDOUT] 6239 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.update.PeerSync - PeerSync: core=venue_shard2_replica2 url=http://10.249.132.35:8080/solr START replicas=[ http://10.249.132.56:8080/solr/venue_shard2_replica1/] nUpdates=100 11:06:29,556 INFO [STDOUT] 6240 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.update.PeerSync - PeerSync: core=venue_shard2_replica2 url=http://10.249.132.35:8080/solr DONE. We have no versions. sync failed. 11:06:29,556 INFO [STDOUT] 6241 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.SyncStrategy - Leader's attempt to sync with shard failed, moving to the next candidate 11:06:29,558 INFO [STDOUT] 6241 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - We failed sync, but we have no versions - we can't sync in that case - we were active before, so become leader anyway 11:06:29,559 INFO [STDOUT] 6243 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.cloud.ShardLeaderElectionContext - I am the new leader: http://10.249.132.35:8080/solr/venue_shard2_replica2/ shard2 11:06:29,561 INFO [STDOUT] 6245 [coreLoadExecutor-4-thread-10] INFO org.apache.solr.common.cloud.SolrZkClient - makePath: /collections/venue/leaders/shard2 11:06:29,577 INFO [STDOUT] 6261 [Thread-2-EventThread] INFO org.apache.solr.update.PeerSync - PeerSync: core=event_shard2_replica2 url=http://10.249.132.35:8080/solr Received 18 versions from 10.249.132.56:8080/solr/event_shard2_replica1/ 11:06:29,578 INFO [STDOUT] 6263 [Thread-2-EventThread] INFO org.apache.solr.update.PeerSync - PeerSync: core=event_shard2_replica2 url=http://10.249.132.35:8080/solr Requesting
updates from 10.249.132.56:8080/solr/event_shard2_replica1/ n=10 versions=[1457764666067386368, 1456709993140060160, 1456709989863260160, 1456709986075803648, 1456709971758546944, 1456709179685208064, 1456709137524064256,
Re: Indexing question on individual field update
On 2/11/2014 2:37 PM, shamik wrote: Eric, Thanks for your reply. I should have given a better context. I'm currently running an incremental crawl daily on this particular source and indexing the documents. The incremental crawl looks for any change since the last crawl date, based on the document publish date. But there's no way for me to know if a document has been deleted. To ensure that, I ran a full crawl on a weekend, which basically re-indexes the entire content. After the full index is over, I call a purge script, which deletes any content which is more than 24 hours old, based on the indextimestamp field. The issue with atomic update is that it doesn't alter the indextimestamp field. So even if I run a full crawl with atomic updates, the timestamp will stick to its old value. Unfortunately, I can't rely on another date field coming from the source as they are not consistent. That translates to the fact that I can't remove stale content. One possibility is this: When you send the atomic update to Solr, include a new value for the indextimestamp field. Another option: You can write a custom update processor plugin for Solr. When the custom code is used, it will be executed on each incoming document. Depending on what it finds in the update request, it can make appropriate changes, like updating indextimestamp. You can do pretty much anything. http://wiki.apache.org/solr/UpdateRequestProcessor Writing an update processor in Java typically gives the best results in terms of flexibility and performance, but there is also a way to use other programming languages: http://wiki.apache.org/solr/ScriptUpdateProcessor Thanks, Shawn
Re: Solr server requirements for 100+ million documents
Whether you use the same machines as Solr or separate machines is a matter suited to taste. If you are the CTO, then you should make this decision. If not, inform management that risk conditions are greater when you share function and control on a single piece of hardware. A single failure of a replica + zookeeper node will be more impactful than a single failure of a replica *or* a zookeeper node. Let them earn the big bucks to make the risk decision. The good news is, zookeeper hardware can be extremely lightweight for Solr Cloud. Commodity hardware should work just fine…and thus scaling to 5 nodes for zookeeper is not that hard at all. Jason On Feb 11, 2014, at 3:00 PM, svante karlsson s...@csi.se wrote: ZK needs a quorum to keep functional so 3 servers handles one failure. 5 handles 2 node failures. If you Solr with 1 replica per shard then stick to 3 ZK. If you use 2 replicas use 5 ZK
Re: Solr server requirements for 100+ million documents
On 2/11/2014 3:28 PM, Susheel Kumar wrote: Thanks, Otis for quick reply. So for ZK do you recommend separate servers and if so how many for initial Solr cloud cluster setup. In a minimal 3-server setup, all servers would run zookeeper and two of them would also run Solr. With this setup, you can survive the failure of any of those three machines, even if it dies completely. If the third machine is only running zookeeper, two fast CPU cores and 2GB of RAM would be plenty. For 100 million documents, I would personally recommend at least 8 CPU cores on the machines running Solr, ideally provided by at least two separate physical CPUs. Otis recommended 32GB of RAM as a starting point. You would very likely want more. One copy of my 90 million document index uses two servers to run all the shards. Because I have two copies of the index, I have four servers. Each server has 64GB of RAM. This is **NOT** running SolrCloud, but if it were, I would have zookeeper running on three of those servers. Thanks, Shawn
Re: FuzzyLookupFactory with exactMatchFirst not giving the exact match.
I've tried the new SuggestComponent; however, it doesn't work quite as expected. It returns the full field value rather than a list of corrections for the specific term. I can see how SuggestComponent would be excellent for phrase suggestions and document lookups, but it doesn't seem to be suitable for per-word spelling suggestions. Correct me if I'm wrong. I'm taking another look at solr.SpellCheckComponent. I've switched on `spellcheck.extendedResults` but the response `correctlySpelled` is always false, regardless of other settings. It seems it's an example of SOLR-4278. In that ticket James Dyer says: You can tell if the user's keywords exist in the index on a term-by-term basis by specifying spellcheck.extendedResults=true. Then look under each <lst name="ORIG_KEYWORD"> for <int name="origFreq">0</int>. This would suit me perfectly - but `origFreq` does not appear in the response at all. I'm looking at that code, but tracing down how the token frequency is added is leading me down a deep and dark rabbit hole :). Am I missing something basic here? On Tue, Feb 11, 2014 at 3:59 PM, Areek Zillur areek...@gmail.com wrote: Don't worry about the analysis chain; I realized you are using the spellcheck component for suggestions. The suggestion gets returned from the Lucene layer, but unfortunately the Spellcheck component strips the suggestion out, as it is mainly built for spell checking (when the query token == suggestion, spelling is correct, so why suggest it!). You can try out the SuggestComponent (SOLR-5378); it does the right thing in this situation. On Mon, Feb 10, 2014 at 9:30 PM, Areek Zillur areek...@gmail.com wrote: That should not be the case. Maybe the analysis chain of 'text_spell' is doing something before the key hits the suggester (you want to use something like KeywordTokenizerFactory)? Also, maybe specify the queryAnalyzerFieldType in the suggest component config? 
you might want to do something similar to this solr-config: ( https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test-files/solr/collection1/conf/solrconfig-phrasesuggest.xml ) [look at the suggest_analyzing component] and schema: ( https://github.com/apache/lucene-solr/blob/trunk/solr/core/src/test-files/solr/collection1/conf/schema-phrasesuggest.xml ) [look at the phrase_suggest field type]. On Mon, Feb 10, 2014 at 8:44 PM, Hamish Campbell hamish.campb...@koordinates.com wrote: Same issue with AnalyzingLookupFactory - I'll get autocomplete suggestions but not the original query. On Tue, Feb 11, 2014 at 1:57 PM, Areek Zillur areek...@gmail.com wrote: The FuzzyLookupFactory should accept all the same options as the AnalyzingLookupFactory ( http://lucene.apache.org/solr/4_2_1/solr-core/org/apache/solr/spelling/suggest/fst/AnalyzingLookupFactory.html ). [FuzzySuggester is a direct subclass of AnalyzingSuggester in Lucene.] Have you tried exactMatchFirst with the AnalyzingLookupFactory? Does AnalyzingLookup have the same problem with the exactMatchFirst option? On Mon, Feb 10, 2014 at 6:00 PM, Hamish Campbell hamish.campb...@koordinates.com wrote: Looking at: http://lucene.apache.org/solr/4_2_1/solr-core/org/apache/solr/spelling/suggest/fst/FuzzyLookupFactory.html It seems that exactMatchFirst is not a valid option for FuzzyLookupFactory. Potential workarounds? On Mon, Feb 10, 2014 at 5:04 PM, Hamish Campbell hamish.campb...@koordinates.com wrote: Hi all, I've got a FuzzyLookupFactory spellchecker with exactMatchFirst enabled. A query like tes will return test and testing, but a query for test will *not* return test even though it is clearly in the dictionary. Why would this be? 
Relevant config follows:

  <searchComponent class="solr.SpellCheckComponent" name="suggest">
    <lst name="spellchecker">
      <str name="name">suggest</str>
      <!-- Implementation -->
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.fst.FuzzyLookupFactory</str>
      <!-- Properties -->
      <bool name="preserveSep">false</bool>
      <bool name="exactMatchFirst">true</bool>
      <str name="suggestAnalyzerFieldType">text_spell</str>
      <float name="threshold">0.005</float>
      <!-- Do not build on each commit, bad for performance. See cron.
      <str name="buildOnCommit">false</str>
      -->
      <!-- Source -->
      <str name="field">suggest</str>
    </lst>
  </searchComponent>
  <requestHandler name="/suggest" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">suggest</str>
      <str name="spellcheck.onlyMorePopular">true</str>
      <str
Re: FuzzyLookupFactory with exactMatchFirst not giving the exact match.
Ah, I think the term frequency is only available for the spellcheckers rather than the suggesters, so I tried a DirectSolrSpellChecker. This gave me good spelling suggestions for misspelt terms, but if the term is spelled correctly I again get no term information and correctlySpelled is false. Back to square one. On Wed, Feb 12, 2014 at 12:37 PM, Hamish Campbell hamish.campb...@koordinates.com wrote: I've tried the new SuggestComponent; however, it doesn't work quite as expected. ...
Solr performance on a very huge data set
Hello Dear, I have 1000 GB of data that I want to index. Assume I have enough space for storing the indexes on a single machine. *I would like to get an idea of Solr search performance on a huge data set. Do I need to use shards to improve Solr search efficiency, or is it OK to search without sharding?* I will use SolrCloud for high availability and fault tolerance, with the help of ZooKeeper. -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-performance-on-a-very-huge-data-set-tp4116792.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: USER NAME Baruch Labunski
Baruch: Is that your Wiki ID? We need that. But sure, we'll be happy to add you to the list... On Tue, Feb 11, 2014 at 11:03 AM, Baruch bar...@rogers.com wrote: Hello Wiki admin, I would like to add some valuable links. Can you please add me? My user name is Baruch Labunski. Thank You, Baruch! On Thursday, January 16, 2014 2:12:32 PM, Baruch bar...@rogers.com wrote: Hello Wiki admin, I would like to add some valuable links. Can you please add me? My user name is Baruch Labunski. Thank You, Baruch!
Re: Lowering query time
So my guess is you're spending by far the largest portion of your time doing the DB query(ies), which makes sense. On Tue, Feb 11, 2014 at 11:50 AM, Joel Cohen joel.co...@bluefly.com wrote: It's a custom ingestion process. It does a big DB query and then inserts stuff in batches. The batch size is tuneable. On Tue, Feb 11, 2014 at 11:23 AM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, I'm still a little puzzled BTW. 300K documents, unless they're huge, shouldn't be taking 100 minutes. I can index 11M documents on my laptop (Wikipedia dump) in 45 minutes, for instance. Of course that's a single core, not cloud and not replicas... So possibly it's on the data acquisition side? Is your Solr CPU pegged? YMMV of course. Erick On Tue, Feb 11, 2014 at 6:40 AM, Joel Cohen joel.co...@bluefly.com wrote: I'd like to thank you for lending a hand on my query time problem with SolrCloud. By switching to a single-shard-with-replicas setup, I've reduced my query time to 18 msec. My full ingestion of 300k+ documents went down from 2 hours 50 minutes to 1 hour 40 minutes. There are some code changes going in that should help a bit as well. Big thanks to everyone that had suggestions. On Tue, Feb 4, 2014 at 8:11 PM, Alexandre Rafalovitch arafa...@gmail.com wrote: I suspect faceting is the issue here. The actual query you have shown seems to bring back a single document (or a single set of documents for a product): fq=id:(320403401) On the other hand, you are asking for 4 field facets: facet.field=q_virtualCategory_ss facet.field=q_brand_s facet.field=q_color_s facet.field=q_category_ss AND 2 range facets, both clustered/grouped: facet.range=daysSinceStart_i facet.range=activePrice_l (e.g. 
f.activePrice_l.facet.range.gap=5000) And for all facets you have asked to bring back ALL of the results: facet.limit=-1 Plus, you are doing a complex sort: sort=popularity_i desc,popularity_i desc So, you are probably spending quite a bit of time counting (especially in a sharded setup) and then quite a bit more sending the response back. I would check the size of the result document (HTTP result) and see how large it is. Maybe you don't need all of the stuff that's coming back. I assume you are not actually querying Solr from the client's machine (that is, I hope it is inside your data centre close to your web server); otherwise I would say to look at automatic content compression as well to minimize on-wire document size. Finally, if your documents have many stored fields (stored="true" in schema.xml) but you only return small subsets of them during search, you could look into using the enableLazyFieldLoading flag in solrconfig. Regards, Alex. P.s. As others said, you don't seem to have too many documents. Perhaps you want replication instead of sharding for improved performance. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Feb 5, 2014 at 6:31 AM, Alexey Kozhemiakin alexey_kozhemia...@epam.com wrote: Btw, timing for distributed requests is broken at the moment; it doesn't combine values from requests to shards. I'm working on a patch. https://issues.apache.org/jira/browse/SOLR-3644 -Original Message- From: Jack Krupansky [mailto:j...@basetechnology.com] Sent: Tuesday, February 04, 2014 22:00 To: solr-user@lucene.apache.org Subject: Re: Lowering query time Add the debug=true parameter to some test queries and look at the timing section to see which search components are taking the time. Traditionally, highlighting for large documents was a top culprit. 
Are you returning a lot of data or field values? Sometimes reducing the amount of data processed can help. Any multivalued fields with lots of values? -- Jack Krupansky -Original Message- From: Joel Cohen Sent: Tuesday, February 4, 2014 1:43 PM To: solr-user@lucene.apache.org Subject: Re: Lowering query time 1. We are faceting. I'm not a developer so I'm not quite sure how we're doing it. How can I measure? 2. I'm not sure how we'd force this kind of document partitioning. I can see how my shards are partitioned by looking at the clusterstate.json from Zookeeper, but I don't have a clue on how to get documents into specific shards. Would I be better off with fewer shards given the small size of my indexes? On Tue, Feb 4, 2014 at 12:32 PM, Yonik Seeley
Re: Solr Autosuggest - Strange issue with leading numbers in query
Hmmm, the example you post seems correct to me; the returned suggestion is really close to the term. What are you expecting here? The example is inconsistent with "it returns the suggestion corresponding to the alphabets (ignoring the numbers)". It looks like it's considering the numbers just fine, which is what makes the returned suggestion close to the term, I think. Best, Erick On Tue, Feb 11, 2014 at 1:01 PM, Developer bbar...@gmail.com wrote: I have a strange issue with Autosuggest. Whenever I query for a keyword along with leading numbers, it returns the suggestion corresponding to the alphabets (ignoring the numbers). I was under the assumption that it would return an empty result. I am not sure what I am doing wrong. Can someone help?

*Query:*

  /autocomplete?qt=/lucid&req_type=auto_complete&spellcheck.maxCollations=10&q=12342343243242ga&spellcheck.count=10

*Result:*

  <response>
    <lst name="responseHeader">
      <int name="status">0</int>
      <int name="QTime">1</int>
    </lst>
    <lst name="spellcheck">
      <lst name="suggestions">
        <lst name="ga">
          <int name="numFound">1</int>
          <int name="startOffset">15</int>
          <int name="endOffset">17</int>
          <arr name="suggestion">
            <str>galaxy</str>
          </arr>
        </lst>
        <str name="collation">12342343243242galaxy</str>
      </lst>
    </lst>
  </response>

*My field configuration is as below:*

  <fieldType class="solr.TextField" name="textSpell_word" positionIncrementGap="100">
    <analyzer>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory" enablePositionIncrements="true" ignoreCase="true" words="stopwords_autosuggest.txt"/>
    </analyzer>
  </fieldType>

*SolrConfig.xml*

  <searchComponent class="solr.SpellCheckComponent" name="autocomplete">
    <lst name="spellchecker">
      <str name="name">autocomplete</str>
      <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
      <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
      <str name="field">autocomplete_word</str>
      <str name="storeDir">autocomplete</str>
      <str name="buildOnCommit">true</str>
      <float name="threshold">.005</float>
    </lst>
  </searchComponent>
  <requestHandler class="org.apache.solr.handler.component.SearchHandler" name="/autocomplete">
    <lst name="defaults">
      <str name="spellcheck">true</str>
      <str name="spellcheck.dictionary">autocomplete</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.onlyMorePopular">false</str>
    </lst>
    <arr name="components">
      <str>autocomplete</str>
    </arr>
  </requestHandler>

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Autosuggest-Strange-issue-with-leading-numbers-in-query-tp4116751.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Indexing question on individual field update
Update and add are basically the same thing if there's an existing document. There will be some performance consequence, since you're getting the stored fields on the server as opposed to getting the full input from the external source and handing it to Solr. However, I know of at least one situation where the atomic update rate is sky-high and it works, so I wouldn't worry about it unless and until I saw a problem. Best, Erick On Tue, Feb 11, 2014 at 3:03 PM, Shawn Heisey s...@elyograg.org wrote: On 2/11/2014 2:37 PM, shamik wrote: Erick, thanks for your reply. I should have given better context. I'm currently running an incremental crawl daily on this particular source and indexing the documents. The incremental crawl looks for any change since the last crawl date, based on the document publish date. But there's no way for me to know if a document has been deleted. To ensure that, I run a full crawl on a weekend, which basically re-indexes the entire content. After the full index is over, I call a purge script, which deletes any content that is more than 24 hours old, based on the indextimestamp field. The issue with atomic update is that it doesn't alter the indextimestamp field. So even if I run a full crawl with atomic updates, the timestamp will stick to its old value. Unfortunately, I can't rely on another date field coming from the source, as they are not consistent. That translates to the fact that I can't remove stale content. One possibility is this: When you send the atomic update to Solr, include a new value for the indextimestamp field. Another option: You can write a custom update processor plugin for Solr. When the custom code is used, it will be executed on each incoming document. Depending on what it finds in the update request, it can make appropriate changes, like updating indextimestamp. You can do pretty much anything. 
http://wiki.apache.org/solr/UpdateRequestProcessor Writing an update processor in Java typically gives the best results in terms of flexibility and performance, but there is also a way to use other programming languages: http://wiki.apache.org/solr/ScriptUpdateProcessor Thanks, Shawn
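[Editor's note] The update-processor route Shawn describes could look roughly like this in solrconfig.xml. Solr ships a TimestampUpdateProcessorFactory that fills a field with NOW when the incoming document lacks it; the chain name here is invented, and whether it fires the way you need on atomic updates depends on where it sits relative to the distributed processor, so treat this as a starting sketch, not a drop-in fix:

```xml
<updateRequestProcessorChain name="set-indextimestamp">
  <!-- sets indextimestamp to NOW if the incoming document has no value for it -->
  <processor class="solr.TimestampUpdateProcessorFactory">
    <str name="fieldName">indextimestamp</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory"/>
  <processor class="solr.RunUpdateProcessorFactory"/>
</updateRequestProcessorChain>
```

The chain would then be selected per request with update.chain=set-indextimestamp, or made the default on the update handler.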
Re: Solr performance on a very huge data set
Can't answer that; there are just too many variables. Here's a helpful resource: http://searchhub.org/dev/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ Best, Erick On Tue, Feb 11, 2014 at 5:23 PM, neerajp neeraj_star2...@yahoo.com wrote: Hello Dear, I have 1000 GB of data that I want to index. ...
Re: Need feedback: Browsing and searching solr-user list emails
Hi Durgam, You are asking a hard question. Yes, the idea looks interesting as an experiment. Possibly even useful in some ways. And I love the fact that you are eating your own dogfood (running Solr). And the interface looks nice (I guess this is your hosted Nimeyo offering underneath). Yet, I am having trouble seeing it stick around long term. Here are my reasons: *) This offering feels like an inverse of StackExchange. SE is a primary source of data, and they actually get most of their search traffic from Google. This proposal has the data coming from somewhere else and is trying to add a search on top of it. *) Furthermore, the SE voting/participation is heavily gamified, and they spend a lot of time and manpower keeping the balance of that gamification vs. abuse. I think it is a lot harder to provide incentives to vote in your approach. *) There are other dogfood-eating search websites. http://search-lucene.com/ is one of them. *) There are also other mailing-list navigational websites with the gateway ability to post messages in. They suck, both in interface and in monetisation around the interface. In fact, they feel like SPAM farms similar to those republishing Wikipedia. I am not saying this is relevant to your effort directly, but it is an issue related to discovery of a good search website in the sea of bad ones. search-lucene, for example, is discoverable because it is one of the search engines on the Apache website. Even then, it took me (at least) a very long time to discover it. *) In general, discoverability is a b*tch (try to multiterm this, Solr! :-) as you need very significant traction for people to use your site before it becomes useful to more people. A bit of a catch-22. Again, SE did it by having a large audience on StackOverflow and then branching off into topics that people on SO were also interested in. And even that was an issue (see area51 for how they do it). 
You have people (who read the mailing list), but are they the people who need to search the archives? I think the mailing list is more of a 'flow' interface to most of the people. *) You have Google Analytics - did you get much traction yet? I suspect not, given the lack of replies on the mailing list. I would step back and evaluate: *) Who specifically is the target audience? I, for example, do star some posts on the mailing list because they are just so good that I will want to refer to them later. But even then, I would have no incentive right now to do it in public. Nor would I do the 3-4 steps necessary to go from an email I like to some alternative interface to find the same email again just to vote for it. And how do I find my voted emails later? Requiring an account (to track) is even harder to swallow. *) Again, who specifically is the target audience? Is it beginners? Intermediates? Advanced? What are the pain points of those different groups you are trying to solve? *) What can you offer to the first user before the voting actually works (bootstrap phase)? Pure search? Others do that already. *) How would people find your service (SEO, etc.)? *) Why are you doing it? It may not be a lot of effort to set it up, but to actually grow any crowd-sourced resource is a significant task. What does this build towards that will make it sustainable for you? And I really hope it is not page ads. *) From Nimeyo's home page, you are targeting enterprises; are you sure the offering maps to a public resource with a dynamic, transient audience in the same way? Now, if you do want to help the Solr community, that would be great. I am trying to do that in my own way and really welcome anybody trying to assist beyond their own needs. Grow the community, and so on. Here is an example of how I thought of the above issues myself: *) I just released the full list of UpdateRequestProcessor Factories ( http://www.solr-start.com/update-request-processor/4.6.1/ ). 
*) This is information that anybody can discover for themselves, but it takes a lot of searching and clicking and getting lost. I discovered that problem on my own when writing my Solr book, and it stuck with me as a problem to be solved. So I solved it (in a very basic way for this version), and I have more similar things on the way. *) My target audience, just as with my book, is people trying to skill up from beginner to intermediate. My goal is to reduce the barrier of entry to more advanced Solr knowledge. *) My SEO (we'll see if it works) is to provide information that does not exist anywhere else in one place and to be discoverable when people search for the particular names of URPs. *) I also have an incentive to keep it going (version 4.7, 4.8, other resources) because I want people to be on my mailing list for when I do the next REALLY exciting Solr project (a Github-based interactive Solr training would be a strong hint). So these resources are my bootstrapping strategy as well. Now, there are plenty of other things that can be done to assist the Solr community. Some of them would
Re: Join Scoring
Hi Anand. Solr's JOIN query, {!join}, produces constant scores. It's simpler, faster, and more memory efficient (particularly the worst-case memory use) to implement the JOIN query without scoring, so that's why. Of course, you might want it to score and pay whatever penalty is involved. For that you'll need to write a Solr QueryParser that might use Lucene's join module, which has scoring variants. I've taken this approach before. You asked a specific question about the purpose of JoinScorer when it doesn't actually score. Lucene's Query produces a Weight, which in turn produces a Scorer that is a DocIdSetIterator plus it returns a score. So Queries have to have a Scorer to match any document, even if the score is always 1. Solr does indeed have a lot of caching; that may be in play here when comparing against a quick attempt at using Lucene directly. In particular, the matching documents are likely to end up in Solr's DocumentCache. Returning stored fields that come back in search results is one of the more expensive things Lucene/Solr does. I also think you noted that the fields on documents from the from side of the query are not available to be returned in search results, just the to side. Yup; that's true. To remedy this, you might write a Solr SearchComponent that adds fields from the from side. That could be tricky to do; it would probably need to re-run the from-side query, but filtered to the matching top-N documents being returned. ~ David anand chandak wrote: Resending, if somebody can please respond. Thanks, Anand On 2/5/2014 6:26 PM, anand chandak wrote: Hi, Having a question on join scoring: why doesn't the Solr join query return the scores? Looking at the code, I see there's a JoinScorer defined in the JoinQParserPlugin class. If it's not used for scoring, where is it actually used? 
Also, to evaluate the performance of the Solr join plugin vs. Lucene's JoinUtil, I fired the same join query against the same data set and the same schema, and in the results I am always seeing the QTime for Solr much lower than Lucene's. What is the reason behind this? Solr doesn't return scores; could that cause so much difference? My guess is Solr has a very sophisticated caching mechanism and that might be coming into play - is that true? Or is there a difference in the way the JOIN happens in the two approaches? If I understand correctly, both implementations use a two-pass approach - first collecting all the terms from the fromField, and then returning all documents that have matching terms in the toField. If somebody can throw some light, I would highly appreciate it. Thanks, Anand - Author: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book -- View this message in context: http://lucene.472066.n3.nabble.com/Join-Scoring-tp4115539p4116818.html Sent from the Solr - User mailing list archive at Nabble.com.
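[Editor's note] For reference, the scoring variants David mentions live in Lucene's join module (org.apache.lucene.search.join.JoinUtil). A custom QParser wrapping it would do roughly the following - pseudocode sketch against the Lucene 4.x API, untested; the variable names are made up:

```
// inside a custom QParserPlugin's parse():
//   fromQuery = the user's query on the "from" side
//   searcher  = the core's IndexSearcher
joinQuery = JoinUtil.createJoinQuery(
    "fromField",     // join field on the matched ("from") documents
    false,           // multipleValuesPerDocument
    "toField",       // join field on the documents to return
    fromQuery,
    searcher,
    ScoreMode.Max)   // Max / Avg / Total give real scores; None is constant
return joinQuery
```

The ScoreMode argument is what {!join} effectively lacks: it controls how the from-side hit scores are aggregated onto the to-side documents.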
Re: Join Scoring
Thanks David, really helpful response. You mentioned that if we have to add scoring support in Solr, a possible approach would be a custom QueryParser, perhaps using Lucene's join module. Curious: is it possible instead to enhance Solr's existing JoinQParserPlugin and add the scoring support in the same class? Do you think that is feasible and recommended? If yes, what would it take in terms of code changes, any pointers? Thanks, Anand On 2/12/2014 10:31 AM, David Smiley (@MITRE.org) wrote: Hi Anand. Solr's JOIN query, {!join}, produces constant scores. ...
Re: Spatial Score by overlap area
Hi, BBoxStrategy is still only in "trunk" (not the 4x branch). And furthermore, the Solr portion, a FieldType, is over in Spatial-Solr-Sandbox: https://github.com/ryantxu/spatial-solr-sandbox/blob/master/LSE/src/main/java/org/apache/solr/spatial/pending/BBoxFieldType.java It should be quite easy to port to 4x and put independently into a JAR file plug-in to Solr 4. It's lacking better tests, and until your question I hadn't seen interest from users. Ryan McKinley ported it from GeoServer. ~ David On 2/10/14, 12:53 AM, geoport tb.rost...@gmail.com wrote: Hi, I am using Solr 4.6 and I've indexed bounding boxes. Now I want to test the area-overlap sorting described at http://de.slideshare.net/lucenerevolution/lucene-solr-4-spatial-extended-deep-dive (slide 23). Does anyone have an example for me? Thanks for helping me. -- View this message in context: http://lucene.472066.n3.nabble.com/Spatial-Score-by-overlap-area-tp4116439.html Sent from the Solr - User mailing list archive at Nabble.com.
Unable to index mysql table
Hi, I downloaded Solr and, without any changes to the directory structure, I just followed the Solr wiki and tried to import a MySQL table, but was unable to do so. Actually, I'm using the directory as-is in the example folder, but copied the contrib jar files and lib tags here and there where required. Please help me index my MySQL table. NOTE: I'm using a remote Linux server via ssh and am able to start the Solr server. --- Regards *Tarun Sharma*
Re: Unable to index mysql table
What does "unable to do" actually translate to? Are you having trouble writing a particular config file? Are you getting an error message? Are you getting only some of the data in? Tell us exactly where you are stuck. Better, google first for exactly what you are stuck on; maybe it's already been answered. Regards, Alex. Personal website: http://www.outerthoughts.com/ LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch - Time is the quality of nature that keeps events from happening all at once. Lately, it doesn't seem to be working. (Anonymous - via GTD book) On Wed, Feb 12, 2014 at 12:52 PM, Tarun Sharma tarunsharma1...@gmail.com wrote: Hi, I downloaded Solr and, without any changes to the directory structure, I just followed the Solr wiki and tried to import a MySQL table, but was unable to do so. ...
Re: Indexing question on individual field update
Thanks Erick and Shawn, appreciate your help. -- View this message in context: http://lucene.472066.n3.nabble.com/Indexing-question-on-individual-field-update-tp4116605p4116831.html Sent from the Solr - User mailing list archive at Nabble.com.