[ANNOUNCEMENT] Luke 6.2.1 released

2016-09-30 Thread Tomoko Uchida
Download the release zip here:

https://github.com/DmitryKey/luke/releases/tag/luke-6.2.1

This release upgrades to Lucene 6.2.1 (#67).
Enjoy!


Re: JSON Facet "allBuckets" behavior

2016-09-30 Thread Yonik Seeley
On Tue, Sep 27, 2016 at 12:20 PM, Karthik Ramachandran wrote:
> While performing JSON faceting with "allBuckets" and "mincount", I am not sure
> whether I am expecting the wrong result or there is a bug.
>
> By definition, the "allBuckets" response represents the union of all of the
> buckets.
[...]
> I was wondering why the result is not this, since I have "mincount:2".

allBuckets means all of the buckets before limiting or filtering (i.e.
mincount filtering).

-Yonik
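Yonik's distinction can be illustrated with a small simulation (the term counts and mincount value here are hypothetical; the real aggregation happens inside Solr's JSON Facet module):

```python
from collections import Counter

# Hypothetical per-term counts for a facet field, before any filtering.
buckets = Counter({"red": 5, "blue": 1, "green": 2})
mincount = 2

# allBuckets represents the union of ALL buckets, computed before
# limit/mincount filtering is applied.
all_buckets_count = sum(buckets.values())

# The visible bucket list, by contrast, is filtered by mincount.
visible = {term: n for term, n in buckets.items() if n >= mincount}

print(all_buckets_count)  # 8 (includes the "blue" bucket that mincount hides)
print(sorted(visible))    # ['green', 'red']
```

So with mincount:2 the response omits the low-count buckets, but allBuckets still reflects every document that fell into any bucket.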


Re: Archiving documents

2016-09-30 Thread hairymcclarey
You can also look at sharding options for SolrCloud. For example, with implicit
sharding you can choose a routing field, and SolrCloud will index your docs
into shards based on that field. You could have two shards (and also replicate
your main shard, if you want, for distributed searches and fault tolerance), or
even split your main and archive data into several shards, depending on size
and general requirements. You can then very easily search your main shard(s) by
adding shards=my_main_shard, or the entire collection by omitting that
parameter. I'm looking at this for time-series data where I'll have roughly a
shard per year, so my routing field would be the year; it may make sense to do
some "manual" work to merge older shards, but I'm not sure about this yet.
Alternatively, you can use a composite key to be more explicit about whether
you place your docs in the archive, using the prefix of the key to denote
main/archive; you'd have the same options for searching as above. With this
approach you'd need to do some re-indexing as you move documents in and out of
the archive. It sounds like you'd need something like this, because you want
more control over whether a doc is in the archive or not.
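The composite-key idea can be sketched as follows (the `main`/`archive` prefixes and doc ids are made up; in Solr the compositeId router hashes the part before `!` to pick a shard, and moving a document really does mean deleting it under one key and re-adding it under the other):

```python
def route_id(doc_id: str, archived: bool) -> str:
    """Build a compositeId-style key; the prefix before '!' drives shard placement."""
    prefix = "archive" if archived else "main"
    return f"{prefix}!{doc_id}"

def move_to_archive(doc_id: str) -> tuple[str, str]:
    """Archiving requires re-indexing: delete under the old key, add under the new."""
    return (route_id(doc_id, archived=False), route_id(doc_id, archived=True))

old_key, new_key = move_to_archive("doc42")
print(old_key, "->", new_key)  # main!doc42 -> archive!doc42
```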

 

-Original Message-
From: Vasu Y [mailto:vya...@gmail.com] 
Sent: 29 September 2016 14:55
To: solr-user@lucene.apache.org
Subject: Archiving documents

Hi,
We would like to archive documents based on some criteria (for example, those
that were not modified for more than a year, or are least used) in order to
reduce storage requirements.
I would like to hear some of the best practices followed.

How about having a main collection and, optionally, an archive collection (or
one or more archive collections) to which we move documents (at regular
intervals) from the main collection based on some criteria (least used,
modified date, etc.), and providing a flag during search to indicate whether to
include archived documents in the search or not?

Thanks,
Vasu

   

RE: Highlighting brings in irrelevant words

2016-09-30 Thread Bade, Vidya (Sagar)
Forgot to include: all three fields used for highlighting are configured as
follows:





Thank You,
:Sagar

-Original Message-
From: Bade, Vidya (Sagar) [mailto:vb...@webmd.net] 
Sent: Friday, September 30, 2016 2:12 PM
To: solr-user@lucene.apache.org
Subject: Highlighting brings in irrelevant words

Hi,

I am using Solr 4.10.2; the following is my request:

defType=edismax
fl=id,title,description,link_title
qf=title description keywords
pf=title description
hl=true
hl.fl=title,description,link_title
hl.q=lupus
q=lupus

I have two records in the index, both about lupus. When I query using the
above, the highlights that are returned include not just the query term but
also other irrelevant terms, as shown below. The only synonym I have for lupus
is "lupus,SLE,systemic lupus erythematosus", and I don't have the words
"Symptoms" or "Sleep" in the synonyms file.

Can anyone tell me what I am doing wrong or how to fix this issue?

"highlighting":{
"07":{
  "title":["What Are the Symptoms of Lupus?"],
  "description":["Medical guide to the symptoms of lupus."],
  "link_title":["Lupus: The Symptoms and Signs"]},
"09":{
  "title":["Lupus and Sleep"],
  "description":["Join this team to get the tips you need to sleep better 
while living with lupus."],
  "link_title":["Lupus and Sleep"]}}

Thank You,
Sagar


Highlighting brings in irrelevant words

2016-09-30 Thread Bade, Vidya (Sagar)
Hi,

I am using Solr 4.10.2; the following is my request:

defType=edismax
fl=id,title,description,link_title
qf=title description keywords
pf=title description
hl=true
hl.fl=title,description,link_title
hl.q=lupus
q=lupus

I have two records in the index, both about lupus. When I query using the
above, the highlights that are returned include not just the query term but
also other irrelevant terms, as shown below. The only synonym I have for lupus
is "lupus,SLE,systemic lupus erythematosus", and I don't have the words
"Symptoms" or "Sleep" in the synonyms file.

Can anyone tell me what I am doing wrong or how to fix this issue?

"highlighting":{
"07":{
  "title":["What Are the Symptoms of Lupus?"],
  "description":["Medical guide to the symptoms of lupus."],
  "link_title":["Lupus: The Symptoms and Signs"]},
"09":{
  "title":["Lupus and Sleep"],
  "description":["Join this team to get the tips you need to sleep better 
while living with lupus."],
  "link_title":["Lupus and Sleep"]}}

Thank You,
Sagar


Re: slow updates/searches

2016-09-30 Thread Rallavagu

Hi Erick,

Yes, apparently there is work to do with phrase queries. As I continued
to debug, I noticed that a multi-word phrase query is CPU bound, as it
certainly works "hard". Are there any optimizations to consider?


On 9/29/16 8:14 AM, Erick Erickson wrote:

bq: The QTimes increase as the number of words in a phrase increase

Well, there's more work to do as the # of words increases, and if you
have large slops there's more work yet.

Best,
Erick

On Wed, Sep 28, 2016 at 5:54 PM, Rallavagu  wrote:

Thanks Erick.

I have added queries for "firstSearcher" and "newSearcher". After startup,
pmap shows well-populated mmap entries, and QTimes are better than before.

However, phrase queries (edismax with pf2) are still sluggish. The QTimes
increase as the number of words in a phrase increases. None of the mmap
"warming" seems to have any impact on this. Am I missing anything? Thanks.

On 9/24/16 5:20 PM, Erick Erickson wrote:


Hmm..

About <1>: Yep, GC is one of the "more art than science" bits of
Java/Solr. Sigh.

About <2>: that's what autowarming is about. Particularly the
filterCache and queryResultCache. My guess is that you have the
autowarm count on those two caches set to zero. Try setting it to some
modest number like 16 or 32. The whole _point_ of those parameters is
to smooth out these kinds of spikes. Additionally, the newSearcher
event (also in solrconfig.xml) is explicitly intended to allow you to
hard-code queries that fill the internal caches as well as the mmap OS
memory from disk, people include facets, sorts and the like in that
event. It's fired every time a new searcher is opened (i.e. whenever
you commit and open a new searcher)...

FirstSearcher is for restarts. The difference is that newSearcher
presumes Solr has been running for a while and the autowarm counts
have something to work from. OTOH, when you start Solr there's no
history to autowarm from, so firstSearcher can be quite a bit more complex
than newSearcher. Practically, most people just copy newSearcher into
firstSearcher on the assumption that restarting Solr is pretty
rare.
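As a sketch, a newSearcher event entry in solrconfig.xml looks like the following (the queries are placeholders; mirror your own common queries, facets, and sorts):

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- hypothetical warming queries; replace with your real ones -->
    <lst><str name="q">*:*</str><str name="sort">date desc</str></lst>
    <lst><str name="q">solr</str><str name="facet">true</str>
         <str name="facet.field">category</str></lst>
  </arr>
</listener>
```

A firstSearcher listener takes the same shape with event="firstSearcher".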

about <3> MMap stuff will be controlled by the OS I think. I actually
worked with a much more primitive system at one point that would be
dog-slow during off-hours. Someone wrote an equivalent of a cron job
to tickle the app upon occasion to prevent periodic slowness.

for a nauseating set of details about hard and soft commits, see:

https://lucidworks.com/blog/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick


On Sat, Sep 24, 2016 at 11:35 AM, Rallavagu  wrote:




On 9/22/16 5:59 AM, Shawn Heisey wrote:



On 9/22/2016 5:46 AM, Muhammad Zahid Iqbal wrote:



Did you find any solution to slow searches? As far as I know jetty
container default configuration is bit slow for large production
environment.




This might be true for the default configuration that comes with a
completely stock jetty downloaded from eclipse.org, but the jetty
configuration that *Solr* ships with is adequate for just about any Solr
installation.  The Solr configuration may require adjustment as the
query load increases, but the jetty configuration usually doesn't.

Thanks,
Shawn



It turned out to be a "sequence of performance testing sessions" in order to
locate the slowness. Though I am not completely done with it, here are my
findings so far. We are using an NRT configuration (warm-up count of 0 for
caches and NRTCachingDirectoryFactory for the index directory).

1. Essentially, Solr searches (particularly with edismax and relevance)
generate a lot of "garbage" that causes GC activity to kick in more often.
This becomes even more pronounced when facets are included, and has a huge
impact on QTimes (I have a 12g heap and configured 6g for NewSize).

2. After a fresh restart (or core reload), when searches are performed Solr
will initially "populate" mmap entries, and this adds to total QTimes
(I have made sure that index files are cached at the filesystem layer using
vmtouch - https://hoytech.com/vmtouch). When I run the same test again with
mmap entries populated from previous tests, it shows improved QTimes
relative to the previous test.

3. It seems the populated mmap entries are flushed away after a certain idle
time (not sure if this is controlled by Solr or the underlying OS). This makes
subsequent searches fetch from "disk" (even though the disk items are
cached by the OS).

So, what I am gonna try next is to tune the field(s) used for facets to reduce
the index size if possible. I am not sure it will have an impact, but I will
also attempt to change the caches, even though they will be invalidated after
a softCommit (every 10 minutes in my case).

Any other tips/clues/suggestions are welcome. Thanks.





Re: Configuring a custom Analyzer for the SynonymFilter

2016-09-30 Thread Raf
Just to bring this to a conclusion: I have finally solved my issue by
creating a custom Analyzer for use with the SynonymFilter.
It is not as "declarative" as I would have hoped, but at least it works :)

Greetings,
*Raf*
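An alternative to a custom Analyzer class, implicit in the discussion below, is to pre-lemmatize the synonyms file offline so its entries already match the lemmatized index terms. A minimal sketch; the `lemmatize` function is a stub standing in for the real Italian lemmatization filter, and the `_n` suffix convention comes from the examples below:

```python
def lemmatize(word: str) -> str:
    # Stub: the real filter would reduce any inflected form to its
    # part-of-speech-tagged lemma (e.g. "spettacoli" -> "spettacolo_n").
    lemmas = {"evento": "evento_n", "festa": "festa_n", "spettacolo": "spettacolo_n"}
    return lemmas.get(word, word)

def preprocess_synonyms(lines):
    """Rewrite each comma-separated synonyms.txt line into lemmatized form."""
    out = []
    for line in lines:
        terms = [t.strip() for t in line.split(",") if t.strip()]
        out.append(",".join(lemmatize(t) for t in terms))
    return out

print(preprocess_synonyms(["evento,festa,spettacolo"]))
# ['evento_n,festa_n,spettacolo_n']
```

The rewritten file can then be used with a plain SynonymFilter configuration, with no custom analysis of the synonym entries needed at load time.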

On Wed, Sep 28, 2016 at 9:26 AM, Raf  wrote:

> On Wed, Sep 28, 2016 at 3:21 AM, Alexandre Rafalovitch wrote:
>
>> Before you go down this rabbit hole, are you actually sure this does
>> what you think it does?
>>
>> As far as I can tell, that parameter is for analyzing/parsing the
>> synonym entries in the synonym file. Not the incoming search queries
>> or text actually being indexed.
>
>
>
> Yes, this is exactly what I am looking for.
>
> I have already customized my indexing and query analyzer for that field,
> by using a custom filter that performs lemmatization for the Italian
> language.
> Hence, the tokens I have in my index (or in the parsed query) are something
> like evento_n (event -> noun) or mangiare_v (eat -> verb).
>
> Now I would like to define synonyms without having to know the "lemma"
> form.
>
> For example, I would like to have in my synonyms file:
> evento,festa,spettacolo
> and make the *SynonymFilter* analyzer transform them in
> *evento_n,festa_n,spettacolo_n*
>
> This way, a query like *myField:spettacoli* (the plural form of
> *spettacolo*) would be analyzed as *myField:(spettacolo_n evento_n
> festa_n)*.
>
>
>
>> Did you get it to work with the simpler configuration?
>>
>
> Yes, I carried out an experiment using the standard Lucene ItalianAnalyzer
> class (both at indexing and query time and for the SynonymFilter) and it
> works the way I was expecting. Unfortunately I cannot use this analyzer
> because I have to apply my custom lemmatization filter.
>
> Therefore, I am confident I can achieve my desired result by defining a
> custom Analyzer class, but I would have preferred to be able to alter the
> filter chain just modifying the *schema.xml* file.
>
> Is there an alternative way to achieve the same result I am not seeing?
>
> Thank you very much for your help.
>
>
> Bye,
> *Raf*
>
>
>
>>
>> Just double checking.
>>
>> Regards,
>>Alex.
>> 
>> Newsletter and resources for Solr beginners and intermediates:
>> http://www.solr-start.com/
>>
>>
>> On 27 September 2016 at 22:45, Raf  wrote:
>> > On Tue, Sep 27, 2016 at 4:22 PM, Alexandre Rafalovitch <
>> arafa...@gmail.com>
>> > wrote:
>> >
>> >> Looking at the code (on GitHub is easiest), it can take either
>> >> analyzer or tokenizer but definitely not any chain definitions. This
>> >> seems to be the same all the way to 6.2.1.
>> >>
>> >
>> > Thanks for your answer Alex.
>> >
>> > Does anyone know if there exists a viable alternative to make it
>> > configurable inside the schema.xml instead of defining a custom Java class?
>> >
>> > I was thinking about something like:
>> >
>> > * defining the *analyzer* outside of the *field* element, giving it a
>> name:
>> > 
>> >
>> >
>> >
>> > 
>> >
>> > * referring to it inside the *SynonymFilter* definition by its name:
> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> > ignoreCase="true" expand="true" analyzer="myAnalyzer"/>
>> >
>> > Unfortunately I have not found anything like this inside the Solr
>> > documentation.
>> > Is it possible to achieve something like that or the only solution is
>> > writing a custom Java class for each combination filter I need to use
>> for
>> > synonyms analysis?
>> >
>> > Thanks.
>> >
>> > Bye,
>> > *Raffaella*
>> >
>> >
>> > 
>> >> Newsletter and resources for Solr beginners and intermediates:
>> >> http://www.solr-start.com/
>> >>
>> >>
>> >> On 27 September 2016 at 21:10, Raf  wrote:
>> >> > Hi,
>> >> > is it possible to configure a custom analysis for synonyms the same
>> way
>> >> we
>> >> > do for index/query field analysis?
>> >> >
>> >> > Reading the *SynonymFilter* documentation[0], I have found I can
>> specify
>> >> a
>> >> > custom analyzer by writing its class name.
>> >> >
>> >> > Example:
>> >> > 
>> >> >   
>> >> > 
>> >> > 
>> >> > 
>> >> > 
>> >> > > >> synonyms="synonyms.txt"
>> >> > ignoreCase="true" expand="true"
>> >> > analyzer="org.apache.lucene.analysis.it.ItalianAnalyzer"/>
>> >> >   
>> >> > 
>> >> >
>> >> >
>> >> > What I would like to achieve, instead, it is something like this:
>> >> > 
>> >> >   
>> >> > 
>> >> > 
>> >> > 
>> >> > 
>> >> > <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
>> >> > ignoreCase="true" expand="true">
>> >> >   
>> >> >   
>> >> >   
>> >> >   
>> >> > 
>> >> > 
>> >> >   
>> >> > 
>> >> >
>> >> >
>> >> > I have tried to configure it this way, but it does not work.
>> >> > I do not get any configuration error, but the custom analyzer is not
>> >> > applied to synonyms.
>> >> >
>> >> > Is it possible to achieve this result by configuration or am I
>> forced to
>> >> > write a custom Analyzer class?
>> >> >
>> >> > I am currently using Solr 5.2.1.
>> >> > At the mome

Re: Archiving documents

2016-09-30 Thread Shawn Heisey
On 9/29/2016 6:55 AM, Vasu Y wrote:
> We would like to archive documents based on some criteria (for example,
> those that were not modified for more than a year, or are least used) in
> order to reduce storage requirements.
> I would like to hear some of the best practices followed.
>
> How about having a main collection and, optionally, an archive collection
> (or one or more archive collections) to which we move documents (at regular
> intervals) from the main collection based on some criteria (least used,
> modified date, etc.), and providing a flag during search to indicate whether
> to include archived documents in the search or not?

As long as the collections are using compatible schemas and configs, the
general idea here should work.

If this is SolrCloud, you can create a collection alias that can search
multiple collections.

If it's not SolrCloud, you can still do a distributed search using the
"shards" parameter, but it will be slightly more complicated to set up.

If both schemas have a boolean field for the archive flag, with
documents in the main collection having "false" in that field and
documents in the archive collection having "true" in that field, then
you can include a filter for that flag in your search to limit the
search to one collection or the other.  I think that's probably the best
approach.

Thanks,
Shawn
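Shawn's flag-based approach can be sketched as a small parameter builder (the field name `archived` and the alias name are assumptions for illustration, not part of the original thread):

```python
def archive_search_params(q: str, include_archive: bool) -> dict:
    """Build query parameters for a search over a combined main+archive alias."""
    params = {"q": q, "collection": "main_and_archive"}  # hypothetical alias name
    if not include_archive:
        # Both schemas carry a boolean 'archived' field, so a single filter
        # query restricts the distributed search to live documents only.
        params["fq"] = "archived:false"
    return params

print(archive_search_params("lupus", include_archive=False))
```

Because the flag value is constant per collection, the filter is cheap and cacheable on each side of the alias.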



Re: solr deployment help

2016-09-30 Thread Shawn Heisey
On 9/29/2016 1:24 AM, Ken Fan wrote:
> Let me introduce myself: my name is Ken Fan and I work at an IT company in
> Indonesia. I want to deploy Solr 6.2.0 to OpenShift Online. On OpenShift
> Online I used jbossews-2.0, and after that I uploaded the Solr webapp to the
> OpenShift Online webapps directory.

We cannot be sure that deployment into JBoss will work.  The only
officially supported deployment option is the included Jetty. 
Deployment into *all* third-party containers has never really worked,
usually for reasons that we cannot control.

Solr 5.0 had the first service installation script, and was the second
version with a startup script.  It was released February 2015, over a
year ago.  For over a year, developers have been working under the
assumption that they only need to support one container.  It's very
possible that deployment in third-party containers has gotten less
reliable than it was in earlier versions.

https://wiki.apache.org/solr/WhyNoWar

You can continue to try and deploy into JBoss, but you are on your own
to make it work.  I recommend deploying the included Jetty with our
installation script onto UNIX-like operating systems.  For
internet-facing services, we strongly recommend that Solr is *not*
directly reachable by end users, but only by the applications and
administrators that require access.

I urge you to try a small deployment with the script into a dev
environment, and see if your problem remains.  If it does, then please
describe the problem so we can help you with it.

Thanks,
Shawn



Re: Connect to SolrCloud using proxy in SolrJ

2016-09-30 Thread Mikhail Khludnev
I even tried to run zkCli with socksProxyHost/Port - no luck. But then some
digging revealed a chilling truth: it won't ever work, because the ZooKeeper
Java client uses NIO, which doesn't support proxies at all, in contrast to the
good old java.net API. No way.

On Thu, Sep 29, 2016 at 12:06 PM, Mikhail Khludnev  wrote:

> Zookeeper clients connect on tcp not http. Perhaps SOCKS proxy might help,
> but I don't know exactly.
>
> On Thu, Sep 29, 2016 at 11:55 AM, Preeti Bhat 
> wrote:
>
>> Hi Vincenzo,
>>
>> Yes, I have tried using the https protocol. We are not able to connect
>> to the ZooKeepers.
>>
>> I am getting the below error message.
>>
>> Could not connect to ZooKeeper zkHost within 1 ms
>>
>> Thanks and Regards,
>> Preeti Bhat
>>
>> -Original Message-
>> From: Vincenzo D'Amore [mailto:v.dam...@gmail.com]
>> Sent: Thursday, September 29, 2016 1:57 PM
>> To: solr-user@lucene.apache.org
>> Subject: Re: Connect to SolrCloud using proxy in SolrJ
>>
>> Hi,
>>
>> Not sure, but have you tried adding a proxy configuration for https?
>>
>> System.setProperty("https.proxyHost", ProxyHost);
>> System.setProperty("https.proxyPort", ProxyPort);
>>
>>
>> Bests,
>> Vincenzo
>>
>> On Thu, Sep 29, 2016 at 10:12 AM, Preeti Bhat 
>> wrote:
>>
>> > HI All,
>> >
>> > Pinging this again. Could someone please advise.
>> >
>> >
>> > Thanks and Regards,
>> > Preeti Bhat
>> >
>> > From: Preeti Bhat
>> > Sent: Wednesday, September 28, 2016 7:14 PM
>> > To: solr-user@lucene.apache.org
>> > Subject: Connect to SolrCloud using proxy in SolrJ
>> >
>> > Hi All,
>> >
>> > I am trying to connect to SolrCloud using the ZooKeeper host
>> > string in my Java application (CloudSolrClient). I am able to connect
>> > to SolrCloud when no proxy settings are needed, but when trying to
>> > connect using proxy settings, I am getting the error below.
>> >
>> >
>> > 1)  Without Proxy
>> >
>> > System.setProperty("javax.net.ssl.keyStore", keyStore);
>> >
>> > System.setProperty("javax.net.ssl.keyStorePassword", keyStorePsswd);
>> >
>> > System.setProperty("javax.net.ssl.trustStore", trustStore);
>> >
>> > System.setProperty("javax.net.ssl.trustStorePassword",
>> > trustStorePsswd);
>> >
>> >
>> > HttpClientBuilder builder = HttpClientBuilder.create();
>> > builder.useSystemProperties();
>> > CloseableHttpClient httpclient = builder.build();
>> > cloud_client = new CloudSolrClient(zkHost, httpclient);
>> > @SuppressWarnings("rawtypes")
>> > SolrRequest req = new QueryRequest();
>> > req.setBasicAuthCredentials(UserName, Password);
>> > cloud_client.request(req, collectionName);
>> > cloud_client.setDefaultCollection(collectionName);
>> >
>> > 2)  With Proxy code
>> >
>> > System.setProperty("javax.net.ssl.keyStore", keyStore);
>> >
>> > System.setProperty("javax.net.ssl.keyStorePassword", keyStorePsswd);
>> >
>> > System.setProperty("javax.net.ssl.trustStore", trustStore);
>> >
>> > System.setProperty("javax.net.ssl.trustStorePassword", trustStorePsswd);
>> > System.setProperty("http.proxyHost", ProxyHost);
>> > System.setProperty("http.proxyPort", ProxyPort);
>> > HttpHost proxy = new HttpHost(ProxyHost, Integer.parseInt(ProxyPort));
>> > RequestConfig defaultRequestConfig = RequestConfig.custom()
>> >     .setProxy(new HttpHost(ProxyHost, Integer.parseInt(ProxyPort)))
>> >     .build();
>> >
>> > HttpGet httpget = new HttpGet();
>> > httpget.setConfig(defaultRequestConfig);
>> >
>> > HttpClientBuilder builder = HttpClientBuilder.create();
>> > builder.useSystemProperties();
>> > CloseableHttpClient httpclient = builder.build();
>> > httpclient.execute(proxy, httpget);
>> > cloud_client = new CloudSolrClient(zkHost, httpclient);
>> > @SuppressWarnings("rawtypes")
>> > SolrRequest req = new QueryRequest();
>> > req.setBasicAuthCredentials(UserName, Password);
>> > cloud_client.request(req, collectionName);
>> > cloud_client.setDefaultCollection(collectionName);
>> >
>> >
>> > Please note that the ZooKeepers are already whitelisted in the proxy
>> > under the TCP protocol, and the Solr servers under the https protocol.
>> > Could someone please advise on this??
>> >
>> > Thanks and Regards,
>> > Preeti Bhat
>> >
>> >
>> >
>> > NOTICE TO RECIPIENTS: This communication may contain confidential
>> > and/or privileged information. If you are not the intended recipient
>> > (or have received this communication in error) please notify the
>> > sender and it-supp...@shoregrp.com immediately, and destroy this
>> > communication. Any unauthorized copying, disclosure or distribution of
>> > the material in this communication is strictly forbidden. Any views or
>> > opinions presented in this email are solely those of the author and do
>> > not necessarily represent those of the company. Finally, the recipient
>> > should check this email and any attachments for the presence of
>> > viruses. The company accepts no liability for any damage caused by any
>> virus transmitted by this 

Re: How to know if SOLR indexing is completed prorammatically

2016-09-30 Thread subinalex
Thanks a lot, Christian.
Let me explore that.


:)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-know-if-SOLR-indexing-is-completed-prorammatically-tp4298799p4298807.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to know if SOLR indexing is completed prorammatically

2016-09-30 Thread Christian Ortner
Hi,

the admin console is backed by a JSON API. You can run the same requests it
uses programmatically. You can find them easily by checking the networking tab
of your browser's debug tools.

Regards,
Chris

On Fri, Sep 30, 2016 at 10:29 AM, subinalex  wrote:

> Hi Guys,
>
> We are running back-to-back Solr indexing batch jobs. We need to ensure
> that a triggered batch indexing run has completed before starting the next.
>
> I know we can check the status by viewing the 'Logging' and 'CoreAdmin'
> pages of the Solr admin console.
>
> But we need to find this out programmatically, and based on this trigger
> the next Solr indexing batch job.
>
>
> Please help with this.
>
>
> :)
>
>
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/How-to-know-if-SOLR-indexing-is-completed-prorammatically-tp4298799.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Possible change of tie breaker behavior between Solr 4.4 and Solr 6

2016-09-30 Thread Christian Ortner
Hello everyone,

We're in the process of upgrading a service from Solr 4.4 to Solr 6. While
comparing result quality between the two versions, I found that a result's
phrase query score now contains only the highest-scoring field's score. In
Solr 4.4, the sum of all matching fields' scores was added to the total score.

This behavior looks similar to how a tie breaker set to >0 impacts regular
queries, although it does not appear to be documented. Is it possible that
such a change was introduced between 4.4 and 6? I checked the change logs and
documentation to no avail.

Thanks!
Chris
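For reference, the DisMax-style combination the poster suspects can be written down directly; with tie=0 only the best field's score survives, while tie=1 reduces to the plain sum described for Solr 4.4 (the field scores here are made-up numbers for illustration):

```python
def dismax_score(field_scores, tie=0.0):
    """DisjunctionMaxQuery-style combination: the best field dominates;
    `tie` (0..1) controls how much the remaining fields contribute."""
    best = max(field_scores)
    return best + tie * (sum(field_scores) - best)

scores = [2.0, 1.0, 0.5]
print(dismax_score(scores, tie=0.0))  # 2.0 -> max only, as observed in Solr 6
print(dismax_score(scores, tie=1.0))  # 3.5 -> plain sum, as described for 4.4
```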


How to know if SOLR indexing is completed prorammatically

2016-09-30 Thread subinalex
Hi Guys,

We are running back-to-back Solr indexing batch jobs. We need to ensure that
a triggered batch indexing run has completed before starting the next.

I know we can check the status by viewing the 'Logging' and 'CoreAdmin' pages
of the Solr admin console.

But we need to find this out programmatically, and based on this trigger the
next Solr indexing batch job.


Please help with this.


:)



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-know-if-SOLR-indexing-is-completed-prorammatically-tp4298799.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: how to sampling search result

2016-09-30 Thread Renaud Delbru
Some people in the Elasticsearch community use random scoring [1]
to sample a document subset from the search results. Maybe something
similar could be implemented for Solr?

There are probably more efficient sampling solutions than this one, but
this one is likely the most straightforward to implement.


[1] 
https://www.elastic.co/guide/en/elasticsearch/guide/current/random-scoring.html


--
Renaud Delbru
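The deterministic random-scoring trick referenced above can be sketched like this: hash a per-query seed together with each document id, so every document gets a stable pseudo-random score and the same seed reproduces the same sample on re-query (the seed and doc ids below are made up):

```python
import hashlib

def sample(doc_ids, seed: str, fraction: float):
    """Keep roughly `fraction` of the result set, deterministically per seed."""
    kept = []
    for doc_id in doc_ids:
        digest = hashlib.sha256(f"{seed}:{doc_id}".encode()).digest()
        score = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
        if score < fraction:
            kept.append(doc_id)
    return kept

docs = [f"doc{i}" for i in range(1000)]
subset = sample(docs, seed="query-123", fraction=0.5)
# len(subset) is close to 500, and identical across runs for the same seed.
```

Facets and stats would then be computed over the kept subset only; changing the seed draws a fresh sample.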

On 27/09/16 15:57, googoo wrote:

Hi,

Is it possible to sample based on the "search result"?
For example: run the query first, and the search result returns 1 million
documents. With random sampling, 50% (500K) of the documents would be used
for facets and stats.

The sampling needs to be based on the "search result".

Thanks,
Yongtao



--
View this message in context: 
http://lucene.472066.n3.nabble.com/how-to-sampling-search-result-tp4298269.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: solr hierarchical search hyponyms hypernyms

2016-09-30 Thread Andrea Gazzarini

Hi Francesco,

On 29/09/16 10:47, marcyborg wrote:

Hi Andrea,
Thanks very much for your complete reply.
You're right, I'm new about Solr, so I'm sorry if'm asking trivial
questions, or I'not exaustive in my questions!

About the scenario, I try to explain it:
I have to load the thesaurus in Solr core, and the user would be able to
query that thesaurus, when searching a keyword.
Getting into the details: I search a keyword T, if this T has BT and/or NT,
I'd like to retrive that terms, and show that.
If you mean "the user enters term T; in the thesaurus it is associated with a
BT and NT1, NT2; I want to expand the search using all those terms", then I
think the most trivial thing you can do is a simple standalone Java program
(or whatever language you prefer) that loads the thesaurus and converts it
into the plain synonyms format. Then you can see in the Solr reference guide
[1] how to configure that (if I remember well, the default configuration
already has this set up; it is just a matter of replacing the default
synonyms.txt file included in the example).


I suggest you start by reading the reference guide and then go deeper into
the synonyms topic, which can be very tricky (the "pain in the ass" Hoss
mentioned in his answer).


Best,
Andrea

[1] 
https://cwiki.apache.org/confluence/display/solr/Filter+Descriptions#FilterDescriptions-SynonymFilter
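The standalone conversion program Andrea describes could be sketched like this (the thesaurus data structure and terms are hypothetical; the output uses the explicit `a => b,c` rule syntax of synonyms.txt):

```python
def thesaurus_to_synonyms(entries):
    """entries: {term: {"BT": [...], "NT": [...]}} -> one-way synonym rules.

    Each term expands at query time to itself plus its broader (BT)
    and narrower (NT) terms."""
    rules = []
    for term, rels in sorted(entries.items()):
        related = rels.get("BT", []) + rels.get("NT", [])
        if related:
            rules.append(f"{term} => {','.join([term] + related)}")
    return rules

thesaurus = {"dog": {"BT": ["animal"], "NT": ["puppy", "hound"]}}
print(thesaurus_to_synonyms(thesaurus))
# ['dog => dog,animal,puppy,hound']
```

Writing the returned rules to synonyms.txt and referencing that file from the query-time SynonymFilter gives the BT/NT expansion without any custom analysis code inside Solr.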

I hope this clarifies the scenario!

Ciao,
Francesco



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-hierarchical-search-hyponyms-hypernyms-tp4298385p4298569.html
Sent from the Solr - User mailing list archive at Nabble.com.