The time that init.d script waits before shutdown should be configurable

2015-11-09 Thread Yago Riveiro
The time that init.d script waits before shutdown should be configurable

Five seconds is not enough for all my shards to notify their shutdown, and the
process ends up being terminated with a kill command.

I think solr.in.sh should have an entry to configure how long to wait before
resorting to a kill command.
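As a sketch, such an entry in solr.in.sh might look like the following (the variable name is illustrative, not an existing setting in the version under discussion):

```sh
# Hypothetical solr.in.sh entry: seconds to wait for a graceful
# shutdown before the stop script falls back to kill -9.
SOLR_STOP_WAIT=180
```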



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/The-time-that-init-d-script-waits-before-shutdown-should-be-configurable-tp4239143.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: SolrSpatial conversion error

2015-11-09 Thread Gangl, Michael E (398H)
Can anyone help with this error? It's not an issue with the WKT itself, as I
can easily convert the shape to a Java object using the JTS API without error.

From: Michael Gangl
Date: Thursday, November 5, 2015 at 3:40 PM
To: "solr-user@lucene.apache.org"
Subject: SolrSpatial conversion error

I'm processing some satellite coverage data and storing it in Solr to search
by geographical region. I can create correct WKT that passes validity checks
when created, but when I output it to WKT and then ingest it into Solr, it
looks like some string-to-digit conversion errors are happening:

2015-11-05 23:24:03.272 ERROR (qtp1125757038-18) [   x:l2ssCore] 
o.a.s.c.SolrCore org.apache.solr.common.SolrException: Couldn't parse shape 
'POLYGON ((39.42654 86.82489, -22.74477 87.94481, -51.87799 87.34623, -70.80492 
86.02579, -80.82939 84.22955, -87.55906 81.48592, -91.99886 77.37768, -94.95214 
71.18504, -109.15262 71.1237, -122.03073 70.07185, -132.71886 68.30231, 
-143.40538 65.33532, -159.34148 70.66631, -180 73.53569, -180 90, 180 90, 180 
73.53569, 157.67432 73.89309, 154.67627 78.65489, 149.71222 82.05602, 142.35925 
84.34942, 131.24057 85.93911, 89.5779 87.4869, 39.42654 86.82489))' because: 
com.vividsolutions.jts.geom.TopologyException: side location conflict [ 
(39.426539, 86.82489, NaN) ]

The conflict point (39.426539, 86.82489, NaN) isn't in the original WKT, so
it looks like it's being created or synthesized somewhere within Solr. Has
anyone run into this issue before? Are there configuration options that can
help prevent this situation?
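One option that may help (an assumption based on the Spatial4j/JTS integration, not verified against this setup) is the validationRule attribute on the spatial field type, which can ask JTS to repair invalid shapes at index time instead of throwing a TopologyException:

```xml
<!-- Sketch of an RPT spatial field type using the JTS context factory.
     validationRule="repairBuffer0" attempts to repair topology conflicts
     rather than failing; attribute support depends on the Spatial4j
     version bundled with your Solr. -->
<fieldType name="location_rpt"
           class="solr.SpatialRecursivePrefixTreeFieldType"
           spatialContextFactory="com.spatial4j.core.context.jts.JtsSpatialContextFactory"
           validationRule="repairBuffer0"
           geo="true" distErrPct="0.025" maxDistErr="0.001" units="degrees"/>
```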


Full stack trace:

l2ss-solr_1| 2015-11-05 23:24:03.270 INFO  (qtp1125757038-18) [   
x:l2ssCore] o.a.s.u.p.LogUpdateProcessor [l2ssCore] webapp=/solr path=/update 
params={wt=javabin&version=2} {} 0 157
l2ss-solr_1| 2015-11-05 23:24:03.272 ERROR (qtp1125757038-18) [   
x:l2ssCore] o.a.s.c.SolrCore org.apache.solr.common.SolrException: Couldn't 
parse shape 'POLYGON ((39.42654 86.82489, -22.74477 87.94481, -51.87799 
87.34623, -70.80492 86.02579, -80.82939 84.22955, -87.55906 81.48592, -91.99886 
77.37768, -94.95214 71.18504, -109.15262 71.1237, -122.03073 70.07185, 
-132.71886 68.30231, -143.40538 65.33532, -159.34148 70.66631, -180 73.53569, 
-180 90, 180 90, 180 73.53569, 157.67432 73.89309, 154.67627 78.65489, 
149.71222 82.05602, 142.35925 84.34942, 131.24057 85.93911, 89.5779 87.4869, 
39.42654 86.82489))' because: com.vividsolutions.jts.geom.TopologyException: 
side location conflict [ (39.426539, 86.82489, NaN) ]
l2ss-solr_1| at 
org.apache.solr.schema.AbstractSpatialFieldType.parseShape(AbstractSpatialFieldType.java:236)
l2ss-solr_1| at 
org.apache.solr.schema.AbstractSpatialFieldType.createFields(AbstractSpatialFieldType.java:201)
l2ss-solr_1| at 
org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:48)
l2ss-solr_1| at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:123)
l2ss-solr_1| at 
org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:83)
l2ss-solr_1| at 
org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:237)
l2ss-solr_1| at 
org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:163)
l2ss-solr_1| at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
l2ss-solr_1| at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
l2ss-solr_1| at 
org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:79)
l2ss-solr_1| at 

Re: Boost query at search time according set of roles with least performance impact

2015-11-09 Thread Alessandro Benedetti
ehehe, your request is kinda delicate:
1) "I can't store the payload at index time"
2) "Passing all the weights at query time is not an option"

So you seem to exclude all the possible solutions ...
Anyway, just thinking out loud, have you tried the edismax query parser and
the boost query feature?

1) The first strategy is the one you would prefer to avoid:
you define the AuthorRole field, then use the Boost Query parameter to boost
your roles differently:
AuthorRole:"ADMIN"^100 AuthorRole:"ARCHITECT"^50 etc ...
If you have 20 roles, the query could become unreadable.

2) You index the "weight" for the role in the original document.
Then you use a Boost Function according to your requirement (using that
"weight" field).

Hope this helps,

Cheers

e.g. from the Solr wiki
The bq (Boost Query) Parameter

The bq parameter specifies an additional, optional, query clause that will
be added to the user's main query to influence the score. For example, if
you wanted to add a relevancy boost for recent documents:
q=cheese
bq=date:[NOW/DAY-1YEAR TO NOW/DAY]

You can specify multiple bq parameters. If you want your query to be parsed
as separate clauses with separate boosts, use multiple bq parameters.
The bf (Boost Functions) Parameter

The bf parameter specifies functions (with optional boosts) that will be
used to construct FunctionQueries which will be added to the user's main
query as optional clauses that will influence the score. Any function
supported natively by Solr can be used, along with a boost value. For
example:
recip(rord(myfield),1,2,3)^1.5

Specifying functions with the bf parameter is essentially just shorthand
for using the bq param combined with the {!func} parser.

For example, if you want to show the most recent documents first, you could
use either of the following:
bf=recip(rord(creationDate),1,1000,1000)
  ...or...
bq={!func}recip(rord(creationDate),1,1000,1000)
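The second strategy above could be sketched like this, assuming a hypothetical numeric roleWeight field indexed on each document:

```
q=design document
defType=edismax
bf=field(roleWeight)
```

Here field(roleWeight) adds the stored weight to the relevancy score, so the boost values live in the index rather than in a 20-clause query; reindexing is needed when the weights change, which is the trade-off against query-time configurability.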

On 6 November 2015 at 16:44, Andrea Roggerone <
andrearoggerone.o...@gmail.com> wrote:

> Hi all,
> I am working on a mechanism that applies additional boosts to documents
> according to the role covered by the author. For instance we have
>
> CEO|5 Architect|3 Developer|1 TeamLeader|2
>
> keeping in mind that an author could cover multiple roles (e.g. for a
> design document, a Team Leader could be also a Developer).
>
> I am aware that it is possible to implement a function that leverages
> payloads; however, the weights need to be configurable, so I can't store the
> payload at index time.
> Passing all the weights at query time is not an option as we have more than
> 20 roles and query readability and performance would be heavily affected.
>
> Do we have any "out of the box mechanism" in Solr to implement the
> described behavior? If not, what other options do we have?
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: DELETEREPLICA command shouldn't delete de last replica of a shard

2015-11-09 Thread Yago Riveiro
I raised a JIRA with this, SOLR-8257



-
Best regards
--
View this message in context: 
http://lucene.472066.n3.nabble.com/DELETEREPLICA-command-shouldn-t-delete-de-last-replica-of-a-shard-tp4239054p4239139.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Solr Search: Access Control / Role based security

2015-11-09 Thread Scott Stults
Susheel,

This is perfectly fine for simple use-cases and has the benefit that the
filterCache will help things stay nice and speedy. Apache ManifoldCF goes a
bit further and ties back to your authentication and authorization
mechanism:

http://manifoldcf.apache.org/release/trunk/en_US/concepts.html#ManifoldCF+security+model


k/r,
Scott

On Thu, Nov 5, 2015 at 2:26 PM, Susheel Kumar  wrote:

> Hi,
>
> I have seen a couple of use cases where we want to restrict the results of
> a search based on the role of a user.  For example:
>
> - if user role is admin, any document from the search result will be
> returned
> - if user role is manager, only documents intended for managers will be
> returned
> - if user role is worker, only documents intended for workers will be
> returned
>
> Typical practice is to tag the documents with the roles (using a
> multi-valued field) during indexing, and then during search append a filter
> query to restrict results based on roles.
>
> Wondering if there is any other better way out there and if this common
> requirement should be added as a Solr feature/plugin.
>
> The current security plugins are more towards making Solr APIs/resources
> secure, not towards securing/controlling data during search.
>
> https://cwiki.apache.org/confluence/display/solr/Authentication+and+Authorization+Plugins
>
>
> Please share your thoughts.
>
> Thanks,
> Susheel
>
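
The filter-query approach described above can be sketched like this, with a hypothetical multi-valued "roles" field:

```
# At index time each document carries the roles allowed to see it:
#   roles: ["manager", "worker"]
# At query time the application appends a filter matching the user's role;
# each distinct fq value is cached independently in the filterCache:
fq=roles:worker
# A manager sees manager and worker documents:
fq=roles:(manager OR worker)
```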



-- 
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com


Security Problems

2015-11-09 Thread 马柏樟
Hi,

After I configure authentication with the Basic Authentication Plugin and
authorization with the Rule-Based Authorization Plugin, how can I prevent
strangers from reaching my Solr instance in a browser? For example, if a
stranger visits http://(my host):8983, the browser should pop up a window
saying "the server http://(my host):8983 requires a username and password".
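
For reference, later Solr releases added a blockUnknown option to the Basic Authentication Plugin that does exactly this - rejecting all unauthenticated requests (hedged: availability depends on the Solr version; the hash/salt below are placeholders):

```json
{
  "authentication": {
    "class": "solr.BasicAuthPlugin",
    "blockUnknown": true,
    "credentials": { "solr": "<sha256(password+salt) hash> <salt>" }
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin"
  }
}
```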

Convert output response xml into input xml format using xslt

2015-11-09 Thread davidphilip cherian
Has anyone written a sample XSLT (and would be willing to share it) that
converts Solr's output response XML into its input format, in order to
repost/reindex it?

Thanks
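
A minimal sketch of such a stylesheet might look like the following (an assumption, not a tested transform; fields Solr generates itself, such as _version_ or score, would likely need to be excluded):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch: turn a Solr XML query response back into update XML
     (<add><doc><field .../></doc></add>). -->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/response">
    <add>
      <xsl:for-each select="result/doc">
        <doc>
          <!-- single-valued fields (str, int, date, ...) -->
          <xsl:for-each select="*[not(self::arr)]">
            <field name="{@name}"><xsl:value-of select="."/></field>
          </xsl:for-each>
          <!-- multi-valued fields: one <field> per array entry -->
          <xsl:for-each select="arr/*">
            <field name="{../@name}"><xsl:value-of select="."/></field>
          </xsl:for-each>
        </doc>
      </xsl:for-each>
    </add>
  </xsl:template>
</xsl:stylesheet>
```

Note this only works if all fields you need are stored in the index.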


Costs/benefits of DocValues

2015-11-09 Thread Demian Katz
Hello,

I have a legacy Solr schema that I would like to update to take advantage of 
DocValues. I understand that by adding "docValues=true" to some of my fields, I 
can improve sorting/faceting performance. However, I have a couple of questions:


1.)Will Solr always take proper advantage of docValues when it is turned 
on, or will I gain greater performance by turning off stored/indexed in 
situations where only docValues are necessary (e.g. a sort-only field)?

2.)Will adding docValues to a field introduce significant performance 
penalties for non-docValues uses of that field, beyond the obvious fact that 
the additional data will consume more disk and memory?

I'm asking this question because the existing schema has some multi-purpose 
fields, and I'm trying to determine whether I should just add "docValues=true" 
wherever it might help, or if I need to take a more thoughtful approach and 
potentially split some fields with copyFields, etc. This is particularly 
significant because my schema makes use of some dynamic field suffixes, and I'm 
not sure if I need to add new suffixes to differentiate docValues/non-docValues 
fields, or if it's okay to turn on docValues across the board "just in case."

Apologies if these questions have already been answered - I couldn't find a 
totally clear answer in the places I searched.

Thanks!

- Demian
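
(For reference, the sort-only case in question 1 might look something like this in schema.xml - field names here are hypothetical:)

```xml
<!-- Sort-only field: no inverted index, not stored, just the
     docValues column used for sorting. -->
<field name="title_sort" type="string" indexed="false" stored="false"
       docValues="true"/>
<!-- Multi-purpose field: searched/filtered, returned, and faceted. -->
<field name="format" type="string" indexed="true" stored="true"
       docValues="true"/>
```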


Re: Costs/benefits of DocValues

2015-11-09 Thread Erick Erickson
bq: But if we are keeping the indexed=true, then docValues=true will STILL
use at least as much memory however efficient docValues are
themselves, right?

AFAIK, kinda. The big difference is that with docValues="false", you're
building these structures in the JVM whereas with docValues="true",
the structures are at least partially in the OS memory thus relieving
the pressure on Java's heap, GC and the rest.

On Mon, Nov 9, 2015 at 9:06 AM, Alexandre Rafalovitch
 wrote:
> Thank you Yonik.
>
> So I would probably advise then to "keep your indexed=true" and think
> about _adding_ docValues when there is a memory pressure or when there
> is clear performance issue for the ...specific... uses.
>
> But if we are keeping the indexed=true, then docValues=true will STILL
> use at least as much memory however efficient docValues are
> themselves, right? Or will something that is normally loaded and use
> memory will stay unloaded in this combination scenario?
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:57, Yonik Seeley  wrote:
>> On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
>>  wrote:
>>> I thought docValues were per segment, so the price of un-inversion was
>>> effectively paid on each commit for all the segments, as opposed to
>>> just the updated one.
>>
>> Both the field cache (i.e. uninverting indexed values) and docValues
>> are mostly per-segment (I say mostly because some uses still require
>> building a global ord map).
>>
>> But even when things are mostly per-segment, you hit major segment
>> merges and the cost of un-inversion (when you aren't using docValues)
>> is non-trivial.
>>
>>> I admit I also find the story around docValues to be very confusing at
>>> the moment. Especially on the interplay with "indexed=false".
>>
>> You still need "indexed=true" for efficient filters on the field.
>> Hence if you're faceting on a field and want to use docValues, you
>> probably want to keep the "indexed=true" on the field as well.
>>
>> -Yonik
>>
>>
>>> It would
>>> make a VERY good article to have this clarified somehow by people in
>>> the know.
>>>
>>> Regards,
>>>Alex.
>>> 
>>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>>> http://www.solr-start.com/
>>>
>>>
>>> On 9 November 2015 at 11:04, Yonik Seeley  wrote:
 On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz  
 wrote:
> I understand that by adding "docValues=true" to some of my fields, I can 
> improve sorting/faceting performance.

 I don't think this is true in the general sense.
 docValues are built at index-time, so what you will save is initial
 un-inversion time (i.e. the first time a field is used after a new
 searcher is opened).
 After that point, docValues may be slightly slower.

 The other advantage of docValues is memory use... much/most of it is
 essentially "off-heap", being memory-mapped from disk.  This cuts down
 on memory issues and helps reduce longer GC pauses.

 docValues are good in general, and I think we should default to them
 more for Solr 6, but they are not better in all ways.

> However, I have a couple of questions:
>
>
> 1.)Will Solr always take proper advantage of docValues when it is 
> turned on

 Yes.

> , or will I gain greater performance by turning off stored/indexed in 
> situations where only docValues are necessary (e.g. a sort-only field)?
>
> 2.)Will adding docValues to a field introduce significant performance 
> penalties for non-docValues uses of that field, beyond the obvious fact 
> that the additional data will consume more disk and memory?

 No, it's a separate part of the index.

 -Yonik


> I'm asking this question because the existing schema has some 
> multi-purpose fields, and I'm trying to determine whether I should just 
> add "docValues=true" wherever it might help, or if I need to take a more 
> thoughtful approach and potentially split some fields with copyFields, 
> etc. This is particularly significant because my schema makes use of some 
> dynamic field suffixes, and I'm not sure if I need to add new suffixes to 
> differentiate docValues/non-docValues fields, or if it's okay to turn on 
> docValues across the board "just in case."
>
> Apologies if these questions have already been answered - I couldn't find 
> a totally clear answer in the places I searched.
>
> Thanks!
>
> - Demian


Re: Costs/benefits of DocValues

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 12:06 PM, Alexandre Rafalovitch
 wrote:
> Thank you Yonik.
>
> So I would probably advise then to "keep your indexed=true" and think
> about _adding_ docValues when there is a memory pressure or when there
> is clear performance issue for the ...specific... uses.
>
> But if we are keeping the indexed=true, then docValues=true will STILL
> use at least as much memory however efficient docValues are
> themselves, right? Or will something that is normally loaded and use
> memory will stay unloaded in this combination scenario?

Think about it this way: for something like sorting, we need a column
for fast docid->value lookup.
Enabling docValues means building this column at index time.  At
search time, it gets memory mapped, just like most other parts of the
index.  The required memory is off-heap... the OS needs to keep the
file in its buffer cache for good performance.
If docValues aren't enabled, this means that we need to build the
column on-the-fly on-heap (i.e. FieldCache entry is built from
un-inverting the indexed values).

An indexed field by itself only takes up disk space, just like
docValues.  Of course for searches to be fast, off-heap RAM (in the
form of OS buffer cache / disk cache) is still needed.

-Yonik


Re: OpenNLP plugin or similar NER software for Solr ??? !!!

2015-11-09 Thread simon
https://github.com/OpenSextant/SolrTextTagger/

We're using it for country tagging successfully.

On Wed, Nov 4, 2015 at 3:10 PM, Doug Turnbull <
dturnb...@opensourceconnections.com> wrote:

> David Smiley had a place name and general tagging engine that for the life
> of me I can't find.
>
> It didn't do NER for you (I'm not sure you want to do this in the search
> engine) but it helps you tag entities in a search engine based on a
> predefined list. At least that's what I remember.
>
> On Wed, Nov 4, 2015 at 3:05 PM,  wrote:
>
> > Hi everyone,
> >
> > I need to install a plugin to extract Location (Country/State/City) from
> > free text documents - any professional advice? Does OpenNLP really do
> > the job? Is it English only? US only? Or does it cover worldwide place
> > names?
> > Could someone help me with this job - installation, configuration,
> > model-training etc?
> >
> > Please help. Kind regards, Christian
> >  Christian Fotache Tel: 0728.297.207 Fax: 0351.411.570
> >
> >
> >  From: Upayavira 
> >  To: solr-user@lucene.apache.org
> >  Sent: Tuesday, November 3, 2015 12:13 PM
> >  Subject: Re: language plugin
> >
> > Looking at the code, this is not going to work without modifications to
> > Solr (or at least a custom component).
> >
> > The atomic update code is closely embedded into the Solr
> > DistributedUpdateProcessor, which expands the atomic update into a full
> > document and then posts it to the shards.
> >
> > You need to do the update expansion before your lang detect processor,
> > but there is no gap between them.
> >
> > From my reading of the code, you could create an AtomicUpdateProcessor
> > that simply expands updates, and insert that before the
> > LangDetectUpdateProcessor.
> >
> > Upayavira
> >
> > On Tue, Nov 3, 2015, at 06:38 AM, Chaushu, Shani wrote:
> > > Hi
> > > When I make an atomic update - set field - also on the content field and
> > > also on another field, the language field becomes generic. Meaning, it
> > > doesn't work on the set field, only on the first insert. Even if the
> > > language was detected the first time, it just becomes generic after the
> > > update.
> > > Any idea?
> > >
> > > The chain is
> > >
> > > <updateRequestProcessorChain>
> > >   <processor class="org.apache.solr.update.processor.LangDetectLanguageIdentifierUpdateProcessorFactory">
> > >     <str name="langid.fl">title,content,text</str>
> > >     <str name="langid.langField">language_t</str>
> > >     <str name="langid.langsField">language_all_t</str>
> > >     <str name="langid.fallback">generic</str>
> > >     <bool name="langid.overwrite">false</bool>
> > >     <float name="langid.threshold">0.8</float>
> > >   </processor>
> > >   <processor class="solr.RunUpdateProcessorFactory" />
> > > </updateRequestProcessorChain>
> > >
> > >
> > > Thanks,
> > > Shani
> > >
> > >
> > >
> > >
> > > -Original Message-
> > > From: Jack Krupansky [mailto:jack.krupan...@gmail.com]
> > > Sent: Thursday, October 29, 2015 17:04
> > > To: solr-user@lucene.apache.org
> > > Subject: Re: language plugin
> > >
> > > Are you trying to do an atomic update without the content field? If so,
> > > it sounds like Solr needs an enhancement (bug fix?) so that language
> > > detection would be skipped if the input field is not present. Or maybe
> > > that could be an option.
> > >
> > >
> > > -- Jack Krupansky
> > >
> > > On Thu, Oct 29, 2015 at 3:25 AM, Chaushu, Shani <
> shani.chau...@intel.com
> > >
> > > wrote:
> > >
> > > > Hi,
> > > >  I'm using solr language detection plugin on field name "content"
> > > > (solr 4.10, plugin
> LangDetectLanguageIdentifierUpdateProcessorFactory)
> > > > When I'm indexing for the first time it works fine, but if I want to
> > > > set one field again (regardless of whether it's the content or not) it
> > > > goes to its default language. If I'm setting another field I would like
> > > > the language to stay the way it was before, and I don't want to insert
> > > > all the content again. Is there an option to set the plugin so that it
> > > > won't calculate the language again? (Setting langid.overwrite to false
> > > > didn't work.)
> > > >
> > > > Thanks,
> > > > Shani
> > > >
> > > >
> > > > -
> > > > Intel Electronics Ltd.
> > > >
> > > > This e-mail and any attachments may contain confidential material for
> > > > the sole use of the intended recipient(s). Any review or distribution
> > > > by others is strictly prohibited. If you are not the intended
> > > > recipient, please contact the sender and delete all copies.
> > > >
> >
> >
> >
> >
>
>
>
>
> --
> *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections
> , LLC | 240.476.9983
> Author: Relevant Search 

SqlEntityProcessor is too unstable

2015-11-09 Thread Yangrui Guo
Hello

I've been trying to index IMDB data from MySQL with no success yet. The
problem is with the data import handler. When I specify
"SqlEntityProcessor", DIH either skips rows entirely, doesn't start
importing at all, or produces results that are not searchable. I also tried
setting batchSize to -1, but the result count was less than the row count in
MySQL. I checked the memory used, but it was far less than the entire heap.
Has anyone been in this situation before?
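
For comparison, a minimal DIH data-config for streaming MySQL rows might look like this (table, column, and credential values are hypothetical; batchSize="-1" tells the MySQL JDBC driver to stream results row by row instead of buffering them):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost:3306/imdb"
              user="solr" password="..." batchSize="-1"/>
  <document>
    <!-- SqlEntityProcessor is the default processor for SQL sources -->
    <entity name="title" processor="SqlEntityProcessor"
            query="SELECT id, title, year FROM titles">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
      <field column="year" name="year"/>
    </entity>
  </document>
</dataConfig>
```

Remember that documents are not searchable until a commit is issued (or DIH finishes with commit=true), which is one common reason imported rows appear to be missing.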

Yangrui


Re: Costs/benefits of DocValues

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz  wrote:
> I understand that by adding "docValues=true" to some of my fields, I can 
> improve sorting/faceting performance.

I don't think this is true in the general sense.
docValues are built at index-time, so what you will save is initial
un-inversion time (i.e. the first time a field is used after a new
searcher is opened).
After that point, docValues may be slightly slower.

The other advantage of docValues is memory use... much/most of it is
essentially "off-heap", being memory-mapped from disk.  This cuts down
on memory issues and helps reduce longer GC pauses.

docValues are good in general, and I think we should default to them
more for Solr 6, but they are not better in all ways.

> However, I have a couple of questions:
>
>
> 1.)Will Solr always take proper advantage of docValues when it is turned 
> on

Yes.

> , or will I gain greater performance by turning off stored/indexed in 
> situations where only docValues are necessary (e.g. a sort-only field)?
>
> 2.)Will adding docValues to a field introduce significant performance 
> penalties for non-docValues uses of that field, beyond the obvious fact that 
> the additional data will consume more disk and memory?

No, it's a separate part of the index.

-Yonik


> I'm asking this question because the existing schema has some multi-purpose 
> fields, and I'm trying to determine whether I should just add 
> "docValues=true" wherever it might help, or if I need to take a more 
> thoughtful approach and potentially split some fields with copyFields, etc. 
> This is particularly significant because my schema makes use of some dynamic 
> field suffixes, and I'm not sure if I need to add new suffixes to 
> differentiate docValues/non-docValues fields, or if it's okay to turn on 
> docValues across the board "just in case."
>
> Apologies if these questions have already been answered - I couldn't find a 
> totally clear answer in the places I searched.
>
> Thanks!
>
> - Demian


Re: Arabic analyser

2015-11-09 Thread Jack Krupansky
Use an index-time (but not query time) synonym filter with a rule like:

Abd Allah,Abdallah

This will index the combined word in addition to the separate words.
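
In schema terms, an index-time-only synonym setup might look like the sketch below (field type and synonyms file names are hypothetical):

```xml
<fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
  <!-- Synonyms applied only at index time, so "Abd Allah" also indexes
       the combined form "Abdallah". -->
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms_ar.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <!-- No synonym filter at query time. -->
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```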

-- Jack Krupansky

On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem 
wrote:

> Hello,
>
> We are indexing Arabic content and facing a problem for tokenizing multi
> terms phrases like 'عبد الله' 'Abd Allah', so users will search for
> 'عبدالله' 'Abdallah' without space and need to get the results of 'عبد
> الله' with space. We are using StandardTokenizer.
>
>
> Is there any configurations to handle this case?
>
> Thank you,
> Mahmoud
>


Re: Costs/benefits of DocValues

2015-11-09 Thread Alexandre Rafalovitch
Thank you Yonik.

So I would probably advise then to "keep your indexed=true" and think
about _adding_ docValues when there is a memory pressure or when there
is clear performance issue for the ...specific... uses.

But if we are keeping indexed=true, then docValues=true will STILL
use at least as much memory, however efficient docValues themselves
are, right? Or will something that is normally loaded and using
memory stay unloaded in this combination scenario?

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 9 November 2015 at 11:57, Yonik Seeley  wrote:
> On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
>  wrote:
>> I thought docValues were per segment, so the price of un-inversion was
>> effectively paid on each commit for all the segments, as opposed to
>> just the updated one.
>
> Both the field cache (i.e. uninverting indexed values) and docValues
> are mostly per-segment (I say mostly because some uses still require
> building a global ord map).
>
> But even when things are mostly per-segment, you hit major segment
> merges and the cost of un-inversion (when you aren't using docValues)
> is non-trivial.
>
>> I admit I also find the story around docValues to be very confusing at
>> the moment. Especially on the interplay with "indexed=false".
>
> You still need "indexed=true" for efficient filters on the field.
> Hence if you're faceting on a field and want to use docValues, you
> probably want to keep the "indexed=true" on the field as well.
>
> -Yonik
>
>
>> It would
>> make a VERY good article to have this clarified somehow by people in
>> the know.
>>
>> Regards,
>>Alex.
>> 
>> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
>> http://www.solr-start.com/
>>
>>
>> On 9 November 2015 at 11:04, Yonik Seeley  wrote:
>>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz  
>>> wrote:
 I understand that by adding "docValues=true" to some of my fields, I can 
 improve sorting/faceting performance.
>>>
>>> I don't think this is true in the general sense.
>>> docValues are built at index-time, so what you will save is initial
>>> un-inversion time (i.e. the first time a field is used after a new
>>> searcher is opened).
>>> After that point, docValues may be slightly slower.
>>>
>>> The other advantage of docValues is memory use... much/most of it is
>>> essentially "off-heap", being memory-mapped from disk.  This cuts down
>>> on memory issues and helps reduce longer GC pauses.
>>>
>>> docValues are good in general, and I think we should default to them
>>> more for Solr 6, but they are not better in all ways.
>>>
 However, I have a couple of questions:


 1.)Will Solr always take proper advantage of docValues when it is 
 turned on
>>>
>>> Yes.
>>>
 , or will I gain greater performance by turning off stored/indexed in 
 situations where only docValues are necessary (e.g. a sort-only field)?

 2.)Will adding docValues to a field introduce significant performance 
 penalties for non-docValues uses of that field, beyond the obvious fact 
 that the additional data will consume more disk and memory?
>>>
>>> No, it's a separate part of the index.
>>>
>>> -Yonik
>>>
>>>
 I'm asking this question because the existing schema has some 
 multi-purpose fields, and I'm trying to determine whether I should just 
 add "docValues=true" wherever it might help, or if I need to take a more 
 thoughtful approach and potentially split some fields with copyFields, 
 etc. This is particularly significant because my schema makes use of some 
 dynamic field suffixes, and I'm not sure if I need to add new suffixes to 
 differentiate docValues/non-docValues fields, or if it's okay to turn on 
 docValues across the board "just in case."

 Apologies if these questions have already been answered - I couldn't find 
 a totally clear answer in the places I searched.

 Thanks!

 - Demian


Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-09 Thread Shawn Heisey
On 11/6/2015 8:39 PM, Yonik Seeley wrote:
> On Fri, Nov 6, 2015 at 10:20 PM, Shawn Heisey  wrote:
>>  Is there a decent API for getting uniqueKey?
> Not off the top of my head.
> I deeply regret making it configurable and not just using "id" ;-)

By poking around in the admin UI with Firebug, I found something that
will work for me to get the uniqueKey field name:

// Query the Luke request handler for schema details, then pull the
// uniqueKey field name out of the response:
SolrQuery uniqueKeyQuery = new SolrQuery();
uniqueKeyQuery.setRequestHandler("/admin/luke");
uniqueKeyQuery.set("show", "schema");
QueryResponse rsp = client.query(coreName, uniqueKeyQuery);
String uniqueKey = (String)
    rsp.getResponse().findRecursive("schema", "uniqueKeyField");

Thanks,
Shawn



Re: Costs/benefits of DocValues

2015-11-09 Thread Alexandre Rafalovitch
I thought docValues were per segment, so the price of un-inversion was
effectively paid on each commit for all the segments, as opposed to
just the updated one.

I admit I also find the story around docValues to be very confusing at
the moment. Especially on the interplay with "indexed=false". It would
make a VERY good article to have this clarified somehow by people in
the know.

Regards,
   Alex.

Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
http://www.solr-start.com/


On 9 November 2015 at 11:04, Yonik Seeley  wrote:
> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz  
> wrote:
>> I understand that by adding "docValues=true" to some of my fields, I can 
>> improve sorting/faceting performance.
>
> I don't think this is true in the general sense.
> docValues are built at index-time, so what you will save is initial
> un-inversion time (i.e. the first time a field is used after a new
> searcher is opened).
> After that point, docValues may be slightly slower.
>
> The other advantage of docValues is memory use... much/most of it is
> essentially "off-heap", being memory-mapped from disk.  This cuts down
> on memory issues and helps reduce longer GC pauses.
>
> docValues are good in general, and I think we should default to them
> more for Solr 6, but they are not better in all ways.
>
>> However, I have a couple of questions:
>>
>>
>> 1.)Will Solr always take proper advantage of docValues when it is turned 
>> on
>
> Yes.
>
>> , or will I gain greater performance by turning off stored/indexed in 
>> situations where only docValues are necessary (e.g. a sort-only field)?
>>
>> 2.)Will adding docValues to a field introduce significant performance 
>> penalties for non-docValues uses of that field, beyond the obvious fact that 
>> the additional data will consume more disk and memory?
>
> No, it's a separate part of the index.
>
> -Yonik
>
>
>> I'm asking this question because the existing schema has some multi-purpose 
>> fields, and I'm trying to determine whether I should just add 
>> "docValues=true" wherever it might help, or if I need to take a more 
>> thoughtful approach and potentially split some fields with copyFields, etc. 
>> This is particularly significant because my schema makes use of some dynamic 
>> field suffixes, and I'm not sure if I need to add new suffixes to 
>> differentiate docValues/non-docValues fields, or if it's okay to turn on 
>> docValues across the board "just in case."
>>
>> Apologies if these questions have already been answered - I couldn't find a 
>> totally clear answer in the places I searched.
>>
>> Thanks!
>>
>> - Demian


Re: Costs/benefits of DocValues

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 11:19 AM, Alexandre Rafalovitch
 wrote:
> I thought docValues were per segment, so the price of un-inversion was
> effectively paid on each commit for all the segments, as opposed to
> just the updated one.

Both the field cache (i.e. uninverting indexed values) and docValues
are mostly per-segment (I say mostly because some uses still require
building a global ord map).

But even when things are mostly per-segment, you hit major segment
merges and the cost of un-inversion (when you aren't using docValues)
is non-trivial.

> I admit I also find the story around docValues to be very confusing at
> the moment. Especially on the interplay with "indexed=false".

You still need "indexed=true" for efficient filters on the field.
Hence if you're faceting on a field and want to use docValues, you
probably want to keep the "indexed=true" on the field as well.

-Yonik
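To make that concrete, a schema.xml sketch of the two cases (field and type names are made up for illustration):

```xml
<!-- Sketch only: a facet field that keeps indexed="true" so filter
     queries stay efficient, while docValues="true" serves faceting/sorting. -->
<field name="author" type="string" indexed="true" stored="true" docValues="true"/>

<!-- A sort-only field that is never filtered on or returned could, in
     principle, rely on docValues alone. -->
<field name="title_sort" type="string" indexed="false" stored="false" docValues="true"/>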


> It would
> make a VERY good article to have this clarified somehow by people in
> the know.
>
> Regards,
>Alex.
> 
> Solr Analyzers, Tokenizers, Filters, URPs and even a newsletter:
> http://www.solr-start.com/
>
>
> On 9 November 2015 at 11:04, Yonik Seeley  wrote:
>> On Mon, Nov 9, 2015 at 10:55 AM, Demian Katz  
>> wrote:
>>> I understand that by adding "docValues=true" to some of my fields, I can 
>>> improve sorting/faceting performance.
>>
>> I don't think this is true in the general sense.
>> docValues are built at index-time, so what you will save is initial
>> un-inversion time (i.e. the first time a field is used after a new
>> searcher is opened).
>> After that point, docValues may be slightly slower.
>>
>> The other advantage of docValues is memory use... much/most of it is
>> essentially "off-heap", being memory-mapped from disk.  This cuts down
>> on memory issues and helps reduce longer GC pauses.
>>
>> docValues are good in general, and I think we should default to them
>> more for Solr 6, but they are not better in all ways.
>>
>>> However, I have a couple of questions:
>>>
>>>
>>> 1.)Will Solr always take proper advantage of docValues when it is 
>>> turned on
>>
>> Yes.
>>
>>> , or will I gain greater performance by turning off stored/indexed in 
>>> situations where only docValues are necessary (e.g. a sort-only field)?
>>>
>>> 2.)Will adding docValues to a field introduce significant performance 
>>> penalties for non-docValues uses of that field, beyond the obvious fact 
>>> that the additional data will consume more disk and memory?
>>
>> No, it's a separate part of the index.
>>
>> -Yonik
>>
>>
>>> I'm asking this question because the existing schema has some multi-purpose 
>>> fields, and I'm trying to determine whether I should just add 
>>> "docValues=true" wherever it might help, or if I need to take a more 
>>> thoughtful approach and potentially split some fields with copyFields, etc. 
>>> This is particularly significant because my schema makes use of some 
>>> dynamic field suffixes, and I'm not sure if I need to add new suffixes to 
>>> differentiate docValues/non-docValues fields, or if it's okay to turn on 
>>> docValues across the board "just in case."
>>>
>>> Apologies if these questions have already been answered - I couldn't find a 
>>> totally clear answer in the places I searched.
>>>
>>> Thanks!
>>>
>>> - Demian


Re: The time that init.d script waits before shutdown should be configurable

2015-11-09 Thread Upayavira
Yago,

I think a JIRA has been raised for this. I'd encourage you to hunt it
down and make a patch.

Upayavira

On Mon, Nov 9, 2015, at 03:09 PM, Yago Riveiro wrote:
> The time that init.d script waits before shutdown should be configurable
> 
> Five seconds is not enough for all my shards to be notified of the
> shutdown, and the process ends with a kill command
> 
> I think that solr.in.sh should have an entry to configure the time to
> wait before using a kill command
> 
> 
> 
> -
> Best regards
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/The-time-that-init-d-script-waits-before-shutdown-should-be-configurable-tp4239143.html
> Sent from the Solr - User mailing list archive at Nabble.com.
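
If/when this lands, the solr.in.sh entry might look something like the following sketch (the variable name is purely an assumption; check the JIRA/patch for the real one):

```sh
# Hypothetical solr.in.sh setting: seconds the stop script waits for a
# graceful shutdown before falling back to kill (the name is an assumption).
SOLR_STOP_WAIT=30
```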


Re: Arabic analyser

2015-11-09 Thread Mahmoud Almokadem
Thanks Jack, 

This is a good solution, but we have more combinations that I think can’t be 
handled as synonyms, like every word starting with ‘عبد’ ‘Abd’ or ‘أبو’ ‘Abo’. 
When using the Standard tokenizer on ‘أبو بكر’ ‘Abo Bakr’, it will be tokenized 
into ‘أبو’ and ‘بكر’, and the filters will be applied to each separate term.

Is there an available tokenizer that can treat ‘أبو *’ or ‘عبد *’ as a single term?

Thanks,
Mahmoud 


> On Nov 9, 2015, at 5:47 PM, Jack Krupansky  wrote:
> 
> Use an index-time (but not query time) synonym filter with a rule like:
> 
> Abd Allah,Abdallah
> 
> This will index the combined word in addition to the separate words.
> 
> -- Jack Krupansky
> 
> On Mon, Nov 9, 2015 at 4:48 AM, Mahmoud Almokadem 
> wrote:
> 
>> Hello,
>> 
>> We are indexing Arabic content and facing a problem for tokenizing multi
>> terms phrases like 'عبد الله' 'Abd Allah', so users will search for
>> 'عبدالله' 'Abdallah' without space and need to get the results of 'عبد
>> الله' with space. We are using StandardTokenizer.
>> 
>> 
>> Is there any configurations to handle this case?
>> 
>> Thank you,
>> Mahmoud
>> 
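
A sketch of the index-time-only synonym setup Jack describes (the fieldType name and synonyms file name are illustrative, and the synonyms file would contain a rule like `Abd Allah,Abdallah`):

```xml
<!-- Illustrative fieldType: synonyms applied only at index time, so
     "Abd Allah" is also indexed as the combined form "Abdallah". -->
<fieldType name="text_ar_names" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="ar-synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
  </analyzer>
</fieldType>
```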



Solr Suggester with Geo?

2015-11-09 Thread William Bell
http://lucidworks.com/blog/solr-suggester/


Wondering if anyone has used these new techniques with a boost on
geodist() inverted, so the rows that get returned that are closest
come back first.


We are still using edge n-grams, since we have not figured out how to
boost the results by geospatial distance.


Anyone have thoughts?




-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Costs/benefits of DocValues

2015-11-09 Thread Mikhail Khludnev
On Mon, Nov 9, 2015 at 6:55 PM, Demian Katz 
wrote:

> I have a legacy Solr schema that I would like to update to take advantage
> of DocValues. I understand that by adding "docValues=true" to some of my
> fields, I can improve sorting/faceting performance.


Demian,
If an index has many segments (let's say more than 5 or 10), docValues
faceting performance is prohibitive with the old facet.field=... approach.
You either need to wait for Solr 5.4 (see
https://issues.apache.org/jira/browse/SOLR-7730) or switch to JSON Facets.
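
For reference, a minimal JSON Facets request replacing a facet.field call might look like this (collection and field names are illustrative, and it obviously needs a running Solr):

```
curl http://localhost:8983/solr/collection1/query -d '
q=*:*&rows=0&
json.facet={
  categories : { type: terms, field: cat, limit: 10 }
}'
```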


-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





Re: Solr Suggester with Geo?

2015-11-09 Thread Sameer Maggon
Have you looked at the spatial extensions for Solr? If you are indexing
lat/lon along with your documents, you can compute the distance from the
origin and use that distance as one of the boost factors affecting the
score. Typically, use cases like this combine the geo score with other
factors, as a pure sort by geo distance might not give you the most
relevant results.

E.g., when searching for "sushi restaurants" near Santa Monica, CA, you
might not want the "thai restaurants" that are closest to you (the local
search use case).

https://cwiki.apache.org/confluence/display/solr/Spatial+Search
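
As a sketch of that idea with edismax (the sfield name and point are made up), distance can be folded into the score via a boost function instead of a hard sort; `recip(geodist(),2,200,20)` is the boost shape shown in the spatial docs:

```
q=sushi restaurants&defType=edismax&qf=name&
sfield=latlon&pt=34.0195,-118.4912&
boost=recip(geodist(),2,200,20)
```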

Thanks,
-- 
*Sameer Maggon*
www.measuredsearch.com 
Fully Managed Solr-as-a-Service | Solr Consulting | Solr Support



On Mon, Nov 9, 2015 at 11:18 AM, William Bell  wrote:

> http://lucidworks.com/blog/solr-suggester/
>
>
> Wondering if anyone has uses these new techniques with a boost on
> geodist() inverted? So the rows that get returned that are closest
> need to come back first.
>
>
> We are still using Edge Grams since we have not figured out how to
> boost the results on geo spatial.
>
>
> Anyone have thoughts?
>
>
>
>
> --
> Bill Bell
> billnb...@gmail.com
> cell 720-256-8076
>


Re: Solr Suggester with Geo?

2015-11-09 Thread William Bell
Yeah we have that working today. But the issue is we want to use
http://lucidworks.com/blog/solr-suggester/

And you cannot do a boost with that right?



On Mon, Nov 9, 2015 at 12:41 PM, Sameer Maggon 
wrote:

> Have you looked at the Spatial extensions for Solr? If you are indexing
> Lat/Lon along with your documents, you can compute the distance from the
> origin & use that distance as one of the boost factors to affect the score.
> Typically, use cases around that combine the geo score with other factors
> as a pure sort by geo score might not give you the relevant results.
>
> e.g. typing to search for "sushi restaurants" near Santa Monica, CA - you
> might not want "thai restaurants" that are closest to you. (Local Search
> use case)
>
> https://cwiki.apache.org/confluence/display/solr/Spatial+Search
>
> Thanks,
> --
> *Sameer Maggon*
> www.measuredsearch.com 
> Fully Managed Solr-as-a-Service | Solr Consulting | Solr Support
>
>
>
> On Mon, Nov 9, 2015 at 11:18 AM, William Bell  wrote:
>
> > http://lucidworks.com/blog/solr-suggester/
> >
> >
> > Wondering if anyone has uses these new techniques with a boost on
> > geodist() inverted? So the rows that get returned that are closest
> > need to come back first.
> >
> >
> > We are still using Edge Grams since we have not figured out how to
> > boost the results on geo spatial.
> >
> >
> > Anyone have thoughts?
> >
> >
> >
> >
> > --
> > Bill Bell
> > billnb...@gmail.com
> > cell 720-256-8076
> >
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


Re: Solr Suggester with Geo?

2015-11-09 Thread Sameer Maggon
Looking through the code and some example suggesters, it seems that,
theoretically, one can write a GeoSuggester and provide that as the lookup
implementation (lookupImpl) that would factor in the geo score, or extend
the SolrSuggester to support spatial extensions in the same spirit as
"filters" are supported today.

Sameer.

On Mon, Nov 9, 2015 at 11:47 AM, William Bell  wrote:

> Yeah we have that working today. But the issue is we want to use
> http://lucidworks.com/blog/solr-suggester/
>
> And you cannot do a boost with that right?
>
>
>
> On Mon, Nov 9, 2015 at 12:41 PM, Sameer Maggon 
> wrote:
>
> > Have you looked at the Spatial extensions for Solr? If you are indexing
> > Lat/Lon along with your documents, you can compute the distance from the
> > origin & use that distance as one of the boost factors to affect the
> score.
> > Typically, use cases around that combine the geo score with other factors
> > as a pure sort by geo score might not give you the relevant results.
> >
> > e.g. typing to search for "sushi restaurants" near Santa Monica, CA - you
> > might not want "thai restaurants" that are closest to you. (Local Search
> > use case)
> >
> > https://cwiki.apache.org/confluence/display/solr/Spatial+Search
> >
> > Thanks,
> > --
> > *Sameer Maggon*
> > www.measuredsearch.com 
> > Fully Managed Solr-as-a-Service | Solr Consulting | Solr Support
> >
> >
> >
> > On Mon, Nov 9, 2015 at 11:18 AM, William Bell 
> wrote:
> >
> > > http://lucidworks.com/blog/solr-suggester/
> > >
> > >
> > > Wondering if anyone has uses these new techniques with a boost on
> > > geodist() inverted? So the rows that get returned that are closest
> > > need to come back first.
> > >
> > >
> > > We are still using Edge Grams since we have not figured out how to
> > > boost the results on geo spatial.
> > >
> > >
> > > Anyone have thoughts?
> > >
> > >
> > >
> > >
> > > --
> > > Bill Bell
> > > billnb...@gmail.com
> > > cell 720-256-8076
> > >
> >
>


Re: Parent/Child (Nested Document) Faceting

2015-11-09 Thread Mikhail Khludnev
Yonik,

I wonder is there a plan or a vision for something like
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-reverse-nested-aggregation.html
under JSON facets?

Thanks

On Sun, Jun 14, 2015 at 4:02 AM, Yonik Seeley  wrote:

> Hey Folks, I'd love some feedback on the interface for nested document
> faceting (or rather switching facet domains to/from parent/child).
>
> See the bottom of this blog:
> http://yonik.com/solr-nested-objects/
>
> Issue #1: How to specify that one should change domains before faceting?
>
> I originally started out with a new facet type (like query facet, but
> switches domains).
> So if you started out querying a child of type book, you would first
> do a "blockParent" facet to map the domain to parents, and then put
> the actual facet you wanted as a sub-facet.
>
> q=book_review:xx  /* query some child-doc of book */
> json.facet=
>   {  // NOTE: this was my first pass... not the current interface
> books : {
>   type: blockParent,
>   parentFilter : "type:book"
>   facet : {
> authors : {
>   type : terms,
>   field : author
> }
>  }
>   }
>
> Although having a separate facet type to map domains is logically very
> clean, it does introduce an additional level of indentation which may
> not be desired.
>
> So then I thought about including domain switching operations under a
> "domain" directive in the facet itself:
>
> json.facet=
> {  // current form a domain switching facet
>   authors : {
> type: terms,
> field: author,
> domain : {blockParent:"type:book"}
>   }
> }
>
> I envision some future other options for "domain" including the
> ability to reset the domain with another query (ignoring your parent
> domain), or adding additional filters to the domain before faceting,
> or normal (non-block) joins.
>
> Issue #2: Naming
>
> I avoided toParent and toChild because people could be confused into
> thinking they would work on any sort of parent/child relationship (i.e.
> other than nested documents).
>
> I used "blockParent" and "blockChildren" because I was thinking about
> block join.
> One alternative that might be better could be "nested" (i.e. nestedParent).
>
> Pluralization:
> I picked the singular for blockParent and plural for blockChildren
> since a single block as one parent and multiple children.  But you
> could think about it in other ways since we're mapping a set of
> documents at a time (i.e. both could be pluralized).
>
> Options:
> nestedParent, nestedChildren   // current option
> nestedParents, nestedChildren // both plural
> nestedChild, nestedParent// both singular
>
> Feedback appreciated!
>
> -Yonik
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics





[DIH] deltaQuery has no column to resolve to declared primary key pk='id'

2015-11-09 Thread Hangu Choi
Hi,
I'm stuck in DIH...

Full import is fine,
and delta import was also fine before I added deltaQuery and parentDeltaQuery
to the 'auth' entity.

thank you for any help..



Regards,
Hangu


solr search relevancy

2015-11-09 Thread Dhanesh Radhakrishnan
Hi,
Can anybody help me resolve an issue with Solr search relevancy?
The problem is that when somebody searches for "Bank", it displays other
businesses related to this phrase.
For e.g. it shows "Blood bank" and "Power bank" as the first results.
To resolve this, we implemented proximity search at the end of the phrase
to improve relevancy and boost the field. This eliminated the issue in
search, and irrelevant results went down to the last sections.
For e.g.: when a user searches "Bank" he gets banks in the top results, not
the "Blood banks". This was resolved with proximity in phrases and boosting.

http://stackoverflow.com/questions/12070016/solr-how-to-boost-score-for-early-matches

http://localhost:8983/solr/localbusiness/select?q=((name
:"bank")^300+OR+((categoryPrefixed:"_CATEGORY_PREFIXED_+bank"~1)^300+AND+(categoryPrefixed:"_CATEGORY_PREFIXED_+bank"~2)^200+AND+(categoryPrefixed:"_CATEGORY_PREFIXED_+bank"~3)^100)+OR+
(tag:"bank")^30+OR+(address:"bank")^5)
&start=0&rows=10&wt=json&indent=true


But the actual problem occurs when we sort the search results.

There is a specific requirement from the client that "Premium" listings
should display as top results. For that we have a field packageWeight in
the Solr schema with values like 10, 20, 30 for free, basic and premium
respectively.




And now when we perform this sorting, we get some irrelevant results at
the top, but they are Premium listings.
How this happens is that
in the Solr schema there is a field "tag". A tag is a small summary, or
words used in relation to the business, that can provide a quick
overview of the business.
A search performed on "tag" carries a comparatively very low boost.




In the Solr index, there is a premium business named "Mobile Store" which
is tagged with the keyword "Power Bank".
When we search "Bank" without sorting we get the relevant results first,
but when we sort the results by the field packageWeight, this doc comes first.

Is there any way to resolve this issue?
Or at least, is there any way to remove certain fields from the sort, but
not from the search?


Regards
dhanesh s r

-- 



Re: solr search relevancy

2015-11-09 Thread Emir Arnautovic

Hi Dhanesh,
Several things you could try:
* when you are searching for "bank" you are actually searching for a
tag/category, yet in your query you are boosting name by 300 while tag is 30.
* you should not sort on the premium content weight - you can instead use
boost query clauses to prefer premium content
* use the elevation component in case you want to explicitly list some
results for some queries
* take a look at the edismax query parser instead of building your own query
- it gives you nice features you could use here: boost fields, minimum
terms match, boost queries...
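
For example, instead of sort=packageWeight desc, a boost query can nudge premium listings upward while relevancy still dominates (the boost values here are illustrative; field names are taken from the thread):

```
q=bank&defType=edismax&
qf=name^300 categoryPrefixed^100 tag^30 address^5&
bq=packageWeight:30^50 packageWeight:20^10
```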


Regards,
Emir

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



On 09.11.2015 11:50, Dhanesh Radhakrishnan wrote:

Hi,
Can anybody help me to resolve an issues with solr search relevancy.
Problem is that when somebody search "Bank", it displays some other
business related to this phrase.
For Eg it shows  "Blood bank" and "Power bank" as the first results.
To resolve this, we implemented the proximity search at the end of the
phrase for getting the relevancy and boost the field.This eliminate the
issue in search, and irrelevant results goes down to the last sections
For Eg : When a user search "Bank" he gets banks on the top results, not
the "Blood banks".This resolved with proximity in phrases and boosting.

http://stackoverflow.com/questions/12070016/solr-how-to-boost-score-for-early-matches

http://localhost:8983/solr/localbusiness/select?q=((name
:"bank")^300+OR+((categoryPrefixed:"_CATEGORY_PREFIXED_+bank"~1)^300+AND+(categoryPrefixed:"_CATEGORY_PREFIXED_+bank"~2)^200+AND+(categoryPrefixed:"_CATEGORY_PREFIXED_+bank"~3)^100)+OR+
(tag:"bank")^30+OR+(address:"bank")^5)
=0=10=json=true


But the actual problem occurred when we sort the search result.

There is a specific requirement from client that the "Premium" listings
should display as top results.For that we have field packageWeight in solr
schema with values  like 10, 20, 30 for free, basic and premium
consecutively.




And now when we perform this sorting, we get some irrelevant results to
top, but its Premium listing.
How this happened is that
In solr schema, there is a field  "tag". A tag is a small summary or words
that used related to the business and can be implemented to provide a quick
overview of the business.
When a search perform based on  "Tag" which is comparatively very low boost.




In solr doc, there is a premium business named "Mobile Store" and which is
tagged with a keyword "Power Bank".
When we search "Bank" without sorting we are getting relevant results first.
But when we  sort result with field packageWeight, this doc comes first.

Is there any way to resolve this issue??
Or at least is there any way to remove certain fields from the sort, but
not from search.


Regards
dhanesh s r



child document faceting returning empty buckets

2015-11-09 Thread Yangrui Guo
Hello

I followed Yonik's blog regarding faceting on child documents, and my curl
command is posted below:

curl http://localhost:8983/solr/movie_shard1_replica1/query -d '
q={!parent which="content_type:parent"}+movie&
json.facet={
movies:{
type:terms,
field:actor,
domain:{blockChildren:"content_type:children"}
}
}'

But I got an empty list of buckets from the response. The count number was
equivalent to number of parent docs. Is there anything wrong with my query?

 "facets":{
"count":2412762,
"movies":{
  "buckets":[]}}}

Yangrui Guo


Re: solr-8983-console.log is huge

2015-11-09 Thread CrazyDiamond
i use solr cloud. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-8983-console-log-is-huge-tp4238613p4239100.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: child document faceting returning empty buckets

2015-11-09 Thread Yonik Seeley
On Mon, Nov 9, 2015 at 7:30 PM, Yangrui Guo  wrote:
> Just solved the problem by changing blockChildren:"content_type:children"
> to blockParent:"content_type:children".

Unless you're dealing with multiple levels, you may be using the wrong
content_type value.
That query should always define the full set of parents for both
blockChildren and blockParents.

-Yonik


Re: child document faceting returning empty buckets

2015-11-09 Thread Yangrui Guo
Just solved the problem by changing blockChildren:"content_type:children"
to blockParent:"content_type:children". Does SolrJ support JSON faceting as
well?

Yangrui

On Mon, Nov 9, 2015 at 2:39 PM, Yangrui Guo  wrote:

> Hello
>
> I followed Yonik's blog regarding faceting on child document and my curl
> command is posted below:
>
> curl http://localhost:8983/solr/movie_shard1_replica1/query -d '
> q={!parent which="content_type:parent"}+movie&
> json.facet={
> movies:{
> type:terms,
> field:actor,
> domain:{blockChildren:"content_type:children"}
> }
> }'
>
> But I got an empty list of buckets from the response. The count number was
> equivalent to number of parent docs. Is there anything wrong with my query?
>
>  "facets":{
> "count":2412762,
> "movies":{
>   "buckets":[]}}}
>
> Yangrui Guo
>
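
On the SolrJ question: as far as I know there is no dedicated JSON Facet builder in SolrJ 5.x, but you can set json.facet as a raw request parameter. A sketch (assumes a running SolrCloud, and uses the parent filter in the domain as Yonik suggests rather than a child filter):

```java
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: JSON faceting from SolrJ 5.x by passing json.facet as a raw
// request parameter (cluster address and collection are assumptions).
CloudSolrClient client = new CloudSolrClient("localhost:9983");
client.setDefaultCollection("movie");
SolrQuery query = new SolrQuery("{!parent which=\"content_type:parent\"}+movie");
query.setRows(0);
query.set("json.facet",
    "{ movies: { type: terms, field: actor,"
  + "  domain: { blockChildren: \"content_type:parent\" } } }");
QueryResponse rsp = client.query(query);
Object facets = rsp.getResponse().get("facets");  // raw "facets" section
```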


Re: No live SolrServers available to handle this request

2015-11-09 Thread wilanjar .
Hi Erick,

Thanks for your response.
You're right, my node is running properly and the graph is green.
We solved it by removing the index data in the collection and reindexing.

Thanks

On Fri, Nov 6, 2015 at 11:02 PM, Erick Erickson 
wrote:

> The host may be running well, but my bet is that
> you have an error in the schema.xml file so it's
> no longer valid XML and the core did not load.
>
> So while the solr instance is up and running, no
> core using that schema is running, thus no
> live servers.
>
> Look at the admin UI, cloud>>graph view and
> if the collection you're trying to operate on is
> not green, then that's probably the issue.
>
> Otherwise look through the Solr log file and
> you should see some exceptions that may
> point the way.
>
> Best,
> Erick
>
> On Thu, Nov 5, 2015 at 11:58 PM, wilanjar .  wrote:
> > Hi All,
> >
> > I'm very new to handling SolrCloud.
> > I've changed the schema.xml, adding a field to index, but after reloading
> > the collection we got an error in the logs: "No live SolrServers available
> > to handle this request".
> >
> > I have checked SolrCloud from localhost on each node and it is running well.
> > I'm using Solr version 4.10.4, Lucene version 4.10.4,
> > Tomcat 8.0.27,
> > ZooKeeper 3.4.6.
> >
> > I have been googling but have not found a solution yet.
> >
> > Thank you.
>


Arabic analyser

2015-11-09 Thread Mahmoud Almokadem
Hello,

We are indexing Arabic content and facing a problem tokenizing multi-term
phrases like 'عبد الله' 'Abd Allah': users will search for
'عبدالله' 'Abdallah' without a space and need to get the results for 'عبد
الله' with a space. We are using the StandardTokenizer.


Is there any configurations to handle this case?

Thank you,
Mahmoud


Re: Solr results relevancy / scoring

2015-11-09 Thread Emir Arnautovic
To get an answer for why 15 matches, you can use the field analysis screen
for index/query and see that "15%" is probably tokenized as both 15 and 15%.

Emir

On 06.11.2015 20:22, Erick Erickson wrote:

I'm not sure what question you're asking. You say
that you have debugged the query and the score for 15 is
higher than the ones below it. What's surprising about that?

Are you saying you don't understand how the score is
calculated? Or that the output when adding debugQuery=true
is inconsistent, or what?

Best,
Erick

On Fri, Nov 6, 2015 at 11:04 AM, Brian Narsi  wrote:

I have a situation where.

User search query

q=15%

Solr results contain several documents that are

15%
15%
15%
15%
15 (why?)
15%
15%

I have debugged the query and can see that the score for 15 is higher than
the ones below it.

Why is that? Where can I read in detail about how the scoring is being done?

Thanks


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Solr results relevancy / scoring

2015-11-09 Thread Alessandro Benedetti
I agree with Emir, and I would like to ask whether the norms are ignored or not.
If they are not ignored and 15 is one of the search tokens, I can expect a
high score for a doc containing just "15", because the norm value will be
quite high (as the field contains basically exactly the query term).

Cheers

On 9 November 2015 at 10:01, Emir Arnautovic 
wrote:

> To get answer for why 15, you can use field analysis for index/query and
> see that "15%" is probably tokenized and as both 15 and 15%.
>
> Emir
>
>
> On 06.11.2015 20:22, Erick Erickson wrote:
>
>> I'm not sure what question you're asking. You say
>> that you have debugged the query and the score for 15 is
>> higher than the ones below it. What's surprising about that?
>>
>> Are you saying you don't understand how the score is
>> calculated? Or that the output when adding debugQuery=true
>> is inconsistent, or what?
>>
>> Best,
>> Erick
>>
>> On Fri, Nov 6, 2015 at 11:04 AM, Brian Narsi  wrote:
>>
>>> I have a situation where.
>>>
>>> User search query
>>>
>>> q=15%
>>>
>>> Solr results contain several documents that are
>>>
>>> 15%
>>> 15%
>>> 15%
>>> 15%
>>> 15 (why?)
>>> 15%
>>> 15%
>>>
>>> I have debugged the query and can see that the score for 15 is higher
>>> than
>>> the ones below it.
>>>
>>> Why is that? Where can I read in detail about how the scoring is being
>>> done?
>>>
>>> Thanks
>>>
>>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
>


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England