Pagination issue when grouping

2017-05-29 Thread Nguyen Manh Tien
Hello,

I group search results by a field (with high cardinality).
I paginate the search page using the number of groups, via the param group.ngroups=true.
But that causes high CPU usage, so I turned it off.

Without ngroups=true, I can't get the number of groups, so pagination is not
correct because I must use numFound.

It always misses some last pages; the reason is that some results were already
collapsed into groups on previous pages.

For example, a search returns 11 results, but 2 of those results belong to
1 group, so there are 10 groups (which I don't know in advance because I set
ngroups=false). With 11 results, pagination displays 2 pages, but page 2
has 0 results.

Has anyone faced a similar issue and found a workaround?

Thanks,
Tien
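
A common low-cost workaround is a look-ahead fetch: instead of deriving a page
count from numFound, request one more group than the page needs and use the
overflow only to decide whether a "next" link exists. A minimal SolrJ sketch
(the collection, query, and field names are hypothetical):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class GroupedPager {
  public static void main(String[] args) throws Exception {
    int pageSize = 10;
    int page = 2; // 1-based page number
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

    SolrQuery q = new SolrQuery("some query");
    q.set("group", true);
    q.set("group.field", "group_id");   // the high-cardinality field
    q.set("group.ngroups", false);      // keep the expensive group count off
    q.setStart((page - 1) * pageSize);  // with grouping, start/rows page through groups
    q.setRows(pageSize + 1);            // ask for one extra group as a look-ahead

    QueryResponse rsp = solr.query(q);
    int returned = rsp.getGroupResponse().getValues().get(0).getValues().size();
    boolean hasNextPage = returned > pageSize; // overflow group => another page exists
    // Render only the first pageSize groups; show "next" iff hasNextPage.
    solr.close();
  }
}

This avoids ngroups entirely; the trade-off is that you can offer previous/next
style paging but not an exact total page count.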


Re: Unable to enrich UIMA annotated results to Solr fields

2017-05-29 Thread aruninfo100
I was able to resolve the issue. I was passing the extracted text content of
each document to Solr for indexing after converting it to lowercase (I did this
for a different purpose). When the original content (without converting to
lowercase) was indexed, the annotated entities were enriched into the
respective Solr fields.
I noticed this when I was analyzing the texts using CVD (UIMA's CAS Visual
Debugger).

Thanks and Regards,
Arun






Re: TLog for non-Solrcloud scenario

2017-05-29 Thread Nawab Zada Asad Iqbal
Thanks Erick, that summary is very helpful.


Nawab


On Mon, May 29, 2017 at 1:39 PM, Erick Erickson 
wrote:

> Yeah, it's a bit confusing. I made Yonik and Mark take me through the
> process in detail in order to write that blog, misunderstandings my
> fault of course ;)
>
> bq: This makes me think that at the time of soft-commit,
> the documents in preceding update requests are already flushed (might not
> be on the disk yet, but the JVM has handed over the responsibility to the
> operating system)
>
> True. Soft commits aren't about the tlog at all; they just make docs that
> are already indexed visible to searchers. Soft commits don't have any
> effect on the segment files either.
>
> Back to your original question:
>
> bq: Does it mean that flush protects against JVM crash but not power failure?
> While fsync will protect against both scenarios.
>
> In a word, "yes". In practice, the only time people will do an fsync
> (which you can specify when you commit) is in situations where they
> need to guard against the remote possibility that the bits would be
> lost if the power went out during that very short interval. And you
> have a one-replica system (assuming SolrCloud). And you don't have a
> tlog (see below).
>
> bq: If the JVM crashes or there is a loss of power, changes that
> occurred after the last *hard commit* will be lost."
>
> OK, there's a distinction depending on whether the tlog is enabled or not.
> There's nothing at all that _requires_ the tlog. So you have two
> scenarios:
>
> 1> tlog not enabled. In this scenario the above is completely true.
> Unless and until the hard commit is performed, documents sent to the
> index are lost if there's a power outage or you kill Solr harshly. A
> hard commit will close all open segments so the state of the index is
> consistent. When Solr starts up it only "knows" about segments that
> were closed by a hard commit.
>
> 2> tlog enabled. In this scenario, upon restart the Solr node will
> replay from the tlog any docs received after the last hard commit (this
> is where the flush/fsync discussion pertains), so no data successfully
> written to the tlog will be lost. Note that Solr doesn't "know" about
> the unclosed segments in this case either. But you don't care, since
> any docs in those segments are re-indexed from the tlog.
>
> One implication here is that if you do _not_ hard commit, your tlogs
> will grow without limit. Which is one of the reasons you can specify
> openSearcher=false for hard commits, so you can commit frequently,
> preserving your index without having to replay and without worrying
> about the expense of opening new searchers.
>
> Best,
> Erick
>
> On Mon, May 29, 2017 at 12:47 PM, Nawab Zada Asad Iqbal
>  wrote:
> > Thanks Erick,
> >
> > I have read different documents in this area and I am getting confused
> > due to overloaded/"reused" terms.
> >
> > E.g., in that lucidworks page, the flow for an indexing request is
> > explained as follows. This makes me think that at the time of soft-commit,
> > the documents in preceding update requests are already flushed (might not
> > be on the disk yet, but the JVM has handed over the responsibility to the
> > operating system), even if we don't flush as part of the soft-commit itself.
> >
> > "After all the leaders have responded, the originating node replies to
> the
> > client. At this point,
> >
> > *all documents have been flushed to the tlog for all the nodes in the
> > cluster!"*
> >
> > On Mon, May 29, 2017 at 7:57 AM, Erick Erickson 
> > wrote:
> >
> >> There's a long post here on this that might help:
> >>
> >> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
> >>
> >> Short form: soft commit doesn't flush tlogs, does not start a new
> >> tlog, does not close segments, does not open new segments.
> >>
> >> Hard commit does all of these things.
> >>
> >> Best,
> >> Erick
> >>
> >> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal <khi...@gmail.com>
> >> wrote:
> >> > Hi,
> >> >
> >> > SolrCloud document 
> >> > mentions:
> >> >
> >> > "The sync can be tunable e.g. flush vs fsync by default can protect
> >> against
> >> > JVM crashes but not against power failure and can be much faster "
> >> >
> >> > Does it mean that flush protects against JVM crash but not power failure?
> >> > While fsync will protect against both scenarios.
> >> >
> >> >
> >> > Also, this NRT help
> >> > <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
> >> > explains soft commit as:
> >> > "A *soft commit* is much faster since it only makes index changes visible
> >> > and does not fsync index files or write a new index descriptor. If the JVM
> >> > crashes or there is a loss of power, changes that occurred after the
> >> > last *hard commit* will be lost."
> >> >
> >> > This is a little confusing, as a soft-commit will only happen after a
> >> > tlog entry is flushed. Isn't it? Or does the tlog work differently for
> >> > SolrCloud and non-SolrCloud configurations?
> >> >
> >> > Thanks
> >> > Nawab

Re: TLog for non-Solrcloud scenario

2017-05-29 Thread Erick Erickson
Yeah, it's a bit confusing. I made Yonik and Mark take me through the
process in detail in order to write that blog, misunderstandings my
fault of course ;)

bq: This makes me think that at the time of soft-commit,
the documents in preceding update requests are already flushed (might not
be on the disk yet, but the JVM has handed over the responsibility to the
operating system)

True. Soft commits aren't about the tlog at all; they just make docs that
are already indexed visible to searchers. Soft commits don't have any
effect on the segment files either.

Back to your original question:

bq: Does it mean that flush protects against JVM crash but not power failure?
While fsync will protect against both scenarios.

In a word, "yes". In practice, the only time people will do an fsync
(which you can specify when you commit) is in situations where they
need to guard against the remote possibility that the bits would be
lost if the power went out during that very short interval. And you
have a one-replica system (assuming SolrCloud). And you don't have a
tlog (see below).

bq: If the JVM crashes or there is a loss of power, changes that
occurred after the last *hard commit* will be lost."

OK, there's a distinction depending on whether the tlog is enabled or not.
There's nothing at all that _requires_ the tlog. So you have two
scenarios:

1> tlog not enabled. In this scenario the above is completely true.
Unless and until the hard commit is performed, documents sent to the
index are lost if there's a power outage or you kill Solr harshly. A
hard commit will close all open segments so the state of the index is
consistent. When Solr starts up it only "knows" about segments that
were closed by a hard commit.

2> tlog enabled. In this scenario, upon restart the Solr node will
replay from the tlog any docs received after the last hard commit (this
is where the flush/fsync discussion pertains), so no data successfully
written to the tlog will be lost. Note that Solr doesn't "know" about
the unclosed segments in this case either. But you don't care, since
any docs in those segments are re-indexed from the tlog.

One implication here is that if you do _not_ hard commit, your tlogs
will grow without limit. Which is one of the reasons you can specify
openSearcher=false for hard commits, so you can commit frequently,
preserving your index without having to replay and without worrying
about the expense of opening new searchers.

Best,
Erick
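
To make the distinction concrete, here is a minimal SolrJ sketch (the
collection name is hypothetical; in production the usual approach is an
autoCommit block with openSearcher=false in solrconfig.xml rather than
explicit client-side commits):

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class CommitSketch {
  public static void main(String[] args) throws Exception {
    HttpSolrClient solr =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build();

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    solr.add(doc);

    // Soft commit (args: waitFlush, waitSearcher, softCommit): makes the doc
    // visible to searchers, but does not fsync segment files or truncate tlogs.
    solr.commit(true, true, true);

    // Hard commit: closes segments and fsyncs index files, which also lets
    // Solr roll over the tlog, bounding replay time after a crash.
    solr.commit(true, true, false);

    solr.close();
  }
}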

On Mon, May 29, 2017 at 12:47 PM, Nawab Zada Asad Iqbal
 wrote:
> Thanks Erick,
>
> I have read different documents in this area and I am getting confused due
> to overloaded/"reused" terms.
>
> E.g., in that lucidworks page, the flow for an indexing request is
> explained as follows. This makes me think that at the time of soft-commit,
> the documents in preceding update requests are already flushed (might not
> be on the disk yet, but the JVM has handed over the responsibility to the
> operating system), even if we don't flush as part of the soft-commit itself.
>
> "After all the leaders have responded, the originating node replies to the
> client. At this point,
>
> *all documents have been flushed to the tlog for all the nodes in the
> cluster!"*
>
> On Mon, May 29, 2017 at 7:57 AM, Erick Erickson 
> wrote:
>
>> There's a long post here on this that might help:
>>
>> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Short form: soft commit doesn't flush tlogs, does not start a new
>> tlog, does not close segments, does not open new segments.
>>
>> Hard commit does all of these things.
>>
>> Best,
>> Erick
>>
>> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal 
>> wrote:
>> > Hi,
>> >
>> > SolrCloud document 
>> > mentions:
>> >
>> > "The sync can be tunable e.g. flush vs fsync by default can protect
>> against
>> > JVM crashes but not against power failure and can be much faster "
>> >
>> > Does it mean that flush protects against JVM crash but not power failure?
>> > While fsync will protect against both scenarios.
>> >
>> >
>> > Also, this NRT help
>> > <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
>> > explains soft commit as:
>> > "A *soft commit* is much faster since it only makes index changes visible
>> > and does not fsync index files or write a new index descriptor. If the JVM
>> > crashes or there is a loss of power, changes that occurred after the
>> > last *hard commit* will be lost."
>> >
>> > This is a little confusing, as a soft-commit will only happen after a tlog
>> > entry is flushed. Isn't it? Or does the tlog work differently for
>> > SolrCloud and non-SolrCloud configurations?
>> >
>> >
>> > Thanks
>> > Nawab
>>


Re: Spread SolrCloud across two locations

2017-05-29 Thread Jan Høydahl
> I believe that my solution isolates manual change to two ZK nodes in DC2,
> while yours requires config change to 1 in DC2 and manual start/stop of 1 in
> DC1.

Answering my own statement here. Turns out that in order to flip the observer
bit for one ZK node, you need to touch the config of all the others. Quote from
https://zookeeper.apache.org/doc/trunk/zookeeperObservers.html: "Secondly, in
every server config file, you must add :observer to the server definition line
of each Observer".

So basically you would do some sed magic or keep two versions of configs around
and switch between them to toggle. I wonder what happens once you have modified
the config of one ZK but not the rest: will they be confused and start printing
warnings to the logs? Is it safe to do the observer flip as a rolling update,
or should you take down the whole ZK ensemble for it?

And if you lose DC1, then modify the remaining ZKs in DC2 to form their own
quorum, then when the DC1 ZKs come up again their config is not aware that some
nodes in DC2 have changed from observer to follower - is there a risk that DC1
starts operating alone once it comes up again, or will it try repeatedly to
contact the ZK nodes in DC2 and then discover that they disagree about the
voting authority of some nodes, or what?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

Re: TLog for non-Solrcloud scenario

2017-05-29 Thread Nawab Zada Asad Iqbal
Thanks Erick,

I have read different documents in this area and I am getting confused due
to overloaded/"reused" terms.

E.g., in that lucidworks page, the flow for an indexing request is
explained as follows. This makes me think that at the time of soft-commit,
the documents in preceding update requests are already flushed (might not
be on the disk yet, but the JVM has handed over the responsibility to the
operating system), even if we don't flush as part of the soft-commit itself.

"After all the leaders have responded, the originating node replies to the
client. At this point,

*all documents have been flushed to the tlog for all the nodes in the
cluster!"*

On Mon, May 29, 2017 at 7:57 AM, Erick Erickson 
wrote:

> There's a long post here on this that might help:
>
> https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Short form: soft commit doesn't flush tlogs, does not start a new
> tlog, does not close segments, does not open new segments.
>
> Hard commit does all of these things.
>
> Best,
> Erick
>
> On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal 
> wrote:
> > Hi,
> >
> > SolrCloud document 
> > mentions:
> >
> > "The sync can be tunable e.g. flush vs fsync by default can protect
> against
> > JVM crashes but not against power failure and can be much faster "
> >
> > Does it mean that flush protects against JVM crash but not power failure?
> > While fsync will protect against both scenarios.
> >
> >
> > Also, this NRT help
> > <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
> > explains soft commit as:
> > "A *soft commit* is much faster since it only makes index changes visible
> > and does not fsync index files or write a new index descriptor. If the JVM
> > crashes or there is a loss of power, changes that occurred after the
> > last *hard commit* will be lost."
> >
> > This is a little confusing, as a soft-commit will only happen after a tlog
> > entry is flushed. Isn't it? Or does the tlog work differently for
> > SolrCloud and non-SolrCloud configurations?
> >
> >
> > Thanks
> > Nawab
>


Re: StandardDirectoryReader.java:: applyAllDeletes, writeAllDeletes

2017-05-29 Thread Michael McCandless
If you are not using NRT readers then the applyAllDeletes/writeAllDeletes
boolean values are completely unused (and should have no impact on your
performance).

Mike McCandless

http://blog.mikemccandless.com
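
To make the two paths concrete, here is a minimal Lucene sketch (assuming the
6.x API, where the three-argument DirectoryReader.open() is public) contrasting
an NRT reader, where these flags matter, with a commit-then-open reader, where
they are irrelevant:

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class NrtReaderSketch {
  public static void main(String[] args) throws Exception {
    Directory dir = new RAMDirectory(); // in-memory index, just for illustration
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()));

    // NRT path: applyAllDeletes=true resolves recent deletes for this reader;
    // writeAllDeletes=true would additionally force the tombstone bitsets to
    // disk (the LUCENE-5438 replication use case).
    DirectoryReader nrtReader = DirectoryReader.open(writer, true, false);

    // Non-NRT path: commit, then open from the directory. The two flags above
    // play no role here.
    writer.commit();
    DirectoryReader committedReader = DirectoryReader.open(dir);

    nrtReader.close();
    committedReader.close();
    writer.close();
    dir.close();
  }
}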

On Sun, May 28, 2017 at 8:34 PM, Nawab Zada Asad Iqbal 
wrote:

> After reading some more code, it seems that if we are sure that there are no
> deletes in this segment/index, then setting applyAllDeletes and
> writeAllDeletes both to false will achieve something similar to what I was
> getting in 4.5.0.
>
> However, after I read the comment on IndexWriter's DirectoryReader
> getReader(boolean applyAllDeletes, boolean writeAllDeletes), it seems that
> this method is particular to NRT. Since we are not using soft commits, can
> this change actually improve our performance during full reindex?
>
>
> Thanks
> Nawab
>
>
>
>
>
>
>
>
>
> On Sun, May 28, 2017 at 2:16 PM, Nawab Zada Asad Iqbal 
> wrote:
>
>> Thanks Michael and Shawn for the detailed response. I was later able to
>> pull the full history using gitk, and found the commits behind this patch.
>>
>> Mike:
>>
>> So, in solr 4.5.0, some earlier developer added code and config to
>> set applyAllDeletes to false when we reindex all the data. At the moment,
>> I am not sure about the performance gain from this.
>>
>> 
>>
>>
>> I am investigating whether this change is still needed in 6.5.1,
>> or whether this can be achieved by any other configuration.
>>
>> For now, we are not planning to use NRT and solrCloud.
>>
>>
>> Thanks
>> Nawab
>>
>> On Sun, May 28, 2017 at 9:26 AM, Michael McCandless <
>> luc...@mikemccandless.com> wrote:
>>
>>> Sorry, yes, that commit was one of many on a feature branch I used to
>>> work on LUCENE-5438, which added near-real-time index replication to
>>> Lucene.  Before this change, Lucene's replication module required a commit
>>> in order to replicate, which is a heavy operation.
>>>
>>> The writeAllDeletes boolean option asks Lucene to move all recent
>>> deletes (tombstone bitsets) to disk while opening the NRT (near-real-time)
>>> reader.
>>>
>>> Normally Lucene won't always do that, and will instead carry the bitsets
>>> in memory from writer to reader, for reduced refresh latency.
>>>
>>> What sort of custom changes do you have in this part of Lucene?
>>>
>>> Mike McCandless
>>>
>>> http://blog.mikemccandless.com
>>>
>>> On Sat, May 27, 2017 at 10:35 PM, Nawab Zada Asad Iqbal <
>>> khi...@gmail.com> wrote:
>>>
 Hi all

 I am looking at the following change in lucene-solr which doesn't mention any
 JIRA. How can I know more about it?

 "1ae7291 Mike McCandless on 1/24/16 at 3:17 PM current patch"

 Specifically, I am interested in what 'writeAllDeletes' does in the
 following method. Let me know if it is a very stupid question and I should
 have done something else before emailing here.

 static DirectoryReader open(IndexWriter writer, SegmentInfos infos,
 boolean applyAllDeletes, boolean writeAllDeletes) throws IOException {

 Background: We are running Solr 4.5 and upgrading to 6.5.1. We have
 some custom code in this area, which we need to merge.


 Thanks

 Nawab

>>>
>>>
>>
>


Re: Spread SolrCloud across two locations

2017-05-29 Thread Jan Høydahl
> In my setup once DC1 comes back up make sure you start only two nodes.

And if you start all three in DC1, you have 3+3 voting; what would then happen?
Any chance of state corruption?
I believe that my solution isolates manual change to two ZK nodes in DC2, while
yours requires config change to 1 in DC2 and manual start/stop of 1 in DC1.

> Add another server in either DC1 or DC2, in a separate rack, with independent 
> power etc. As Shawn says below, install the third ZK there. You would satisfy 
> most of your requirements that way.


Well, that’s not up to me to decide; it’s the customer environment that sets
the constraints, and they currently have 2 independent geo locations. And Solr
is just a dependency of some other app they need to install, so I doubt that
they are very happy to start adding racks or independent power/network for this
alone. Of course, if they already have such redundancy within one of the DCs,
placing a 3rd ZK there is an ideal solution with probably good enough HA. If
not, I’m looking for the next-best low-friction approach that is software-only.

Thanks for the input all!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 26 May 2017 at 21:27, Pushkar Raste  wrote:
> 
> In my setup once DC1 comes back up make sure you start only two nodes.
> 
> Now bring down the original observer and make it observer again.
> 
> Bring back the third node
> 
> 
> 
> It seems like a lot of work compared to Jan's setup, but you get 5 voting
> members instead of 3 in the normal situation.
> 
> On May 26, 2017 10:56 AM, "Susheel Kumar"  wrote:
> 
>> Thanks Pushkar, makes sense. Trying to understand the difference between
>> your setup and Jan's proposed setup.
>> 
>> - Seems like when DC1 goes down, in your setup we have to bounce *one* from
>> observer to non-observer, while in Jan's setup *two* observers to
>> non-observers. Is there anything else I am missing?
>> 
>> - When DC1 comes back - with your setup we need to bounce the one
>> non-observer to observer to have a 5-node quorum, otherwise there are 3 + 3
>> observers, while with Jan's setup, if we don't take any action when DC1
>> comes back, we are still operational with a 5-node quorum. Isn't it? Or am
>> I missing something?
>> 
>> 
>> 
>> On Fri, May 26, 2017 at 10:07 AM, Pushkar Raste 
>> wrote:
>> 
>>> Damn,
>>> Math is hard
>>> 
>>> DC1 : 3 non observers
>>> DC2 : 2 non observers
>>> 
>>> 3 + 2 = 5 non observers
>>> 
>>> Observers don't participate in voting = non observers participate in
>> voting
>>> 
>>> 5 non observers = 5 votes
>>> 
>>> In addition to the 2 non observer, DC2 also has an observer, which as you
>>> pointed out does not participate in the voting.
>>> 
>>> We still have 5 voting nodes.
>>> 
>>> 
>>> Think of the observer as a standby name node in Hadoop 1.x, where some
>>> intervention is needed if the primary name node goes down.
>>> 
>>> 
>>> I hope my math makes sense
>>> 
>>> On May 26, 2017 9:04 AM, "Susheel Kumar"  wrote:
>>> 
>>> From the ZK documentation, observers do not participate in voting, so
>>> Pushkar, when you said 5 nodes participate in voting, what exactly do you mean?
>>> 
>>> -- Observers are non-voting members of an ensemble which only hear the
>>> results of votes, not the agreement protocol that leads up to them.
>>> 
>>> Per the ZK documentation, 3.4 includes observers; does that mean Jan's
>>> thought experiment is practically possible?
>>> 
>>> 
>>> On Fri, May 26, 2017 at 3:53 AM, Rick Leir  wrote:
>>> 
 Jan, Shawn, Susheel
 
 First steps first. First, let's do a fault-tolerant cluster, then maybe a
 _geographically_ fault-tolerant cluster.
 
 Add another server in either DC1 or DC2, in a separate rack, with
 independent power etc. As Shawn says below, install the third ZK there. You
 would satisfy most of your requirements that way.
 
 cheers -- Rick
 
 
 On 2017-05-23 12:56 PM, Shawn Heisey wrote:
 
> On 5/23/2017 10:12 AM, Susheel Kumar wrote:
> 
>> Hi Jan, FYI - Since last year, I have been running a Solr 6.0 cluster in
>> one of lower env with 6 shards/replica in dc1 & 6 shard/replica in dc2
>> (each shard replicated cross data center) with 3 ZK in dc1 and 2 ZK in dc2.
>> (I didn't have the availability of 3rd data center for ZK so went with only
>> 2 data center with above configuration) and so far no issues. Its been
>> running fine, indexing, replicating data, serving queries etc. So in my
>> test, setting up single cluster across two zones/data center works without
>> any issue when there is no or very minimal latency (in my case around 30ms
>> one way
>> 
> 
> With that setup, if dc2 goes down, you're all good, but if dc1 goes down,
> you're not.
> 
> There aren't enough ZK servers in dc2 to maintain quorum when dc1 is
> unreachable, and SolrCloud is going to go read-only. Queries would most
> likely keep working, but updates would fail.

Re: TLog for non-Solrcloud scenario

2017-05-29 Thread Erick Erickson
There's a long post here on this that might help:

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Short form: soft commit doesn't flush tlogs, does not start a new
tlog, does not close segments, does not open new segments.

Hard commit does all of these things.

Best,
Erick

On Sun, May 28, 2017 at 3:59 PM, Nawab Zada Asad Iqbal  wrote:
> Hi,
>
> SolrCloud document 
> mentions:
>
> "The sync can be tunable e.g. flush vs fsync by default can protect against
> JVM crashes but not against power failure and can be much faster "
>
> Does it mean that flush protects against JVM crash but not power failure?
> While fsync will protect against both scenarios.
>
>
> Also, this NRT help
> <https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching>
> explains soft commit as:
> "A *soft commit* is much faster since it only makes index changes visible
> and does not fsync index files or write a new index descriptor. If the JVM
> crashes or there is a loss of power, changes that occurred after the last
> *hard commit* will be lost."
>
> This is a little confusing, as a soft-commit will only happen after a tlog
> entry is flushed. Isn't it? Or does the tlog work differently for SolrCloud
> and non-SolrCloud configurations?
>
>
> Thanks
> Nawab


Re: java.lang.NullPointerException in json facet hll function

2017-05-29 Thread jpereira
Hi,

Any updates on this issue? I am using Solr 6.3 and I have hit this same
bug...

Thanks
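
For context, the hll function under discussion is the JSON Facet API's
approximate distinct-count aggregation (HyperLogLog). A minimal request shape
(field name hypothetical) looks like:

q=*:*&rows=0&json.facet={ unique_users : "hll(user_id)" }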





Re: eDisMax with nested documents

2017-05-29 Thread Rick Leir

Michael,

I assume you were trying this out in the Solr Admin Query tab? With the
debug=true flag set? What do you see in the debug output about scoring?


You might end up using a Boost Function to get the desired ranking. 
Google "solr multiple boost functions".


cheers -- Rick
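
One variant worth trying for the combined-score problem below (a sketch, not
tested against this index; it assumes the standard lucene parser's nested
_query_ syntax and parameter dereferencing via v=$qq): make both the parent and
child clauses explicit nested queries sharing one user-query parameter, so each
contributes a score that the top-level OR combines:

 'defType' => 'lucene',
 'qq' => 'TEST',
 'q' => '_query_:"{!edismax mm=1 qf='id^10.0 title^50.0 subtitle^40.0
author^20.0' v=$qq}" _query_:"{!parent which=is_parent:true
score=Max}{!dismax qf='content' v=$qq}"^20',
 'fl' => 'id,doc_id,title,page_number,score',
 'sort' => 'score DESC'

With both clauses optional under the default OR, a parent matching both gets
the sum of the two scores.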


On 2017-05-29 03:43 AM, Moritz Michael wrote:

Hello,


I'm new to this list and have a question regarding nested documents with
scoring through eDisMax.

We do have an index of e-books that contains the metadata of the e-book and
each page as a single document:

- Book 1
   - Page 1
   - Page 2
   - Page 3
   - ...
   - Page n

The parent documents contain a field is_parent:true, an id and a doc_id. For
parent documents, the id is the same as the doc_id; for the child documents the
id is {doc_id}_{page_number}. The doc_id is always the same for the parent
and its children (that's how we know which child belongs to which parent).

So now the challenge is to offer a fulltext search that uses eDisMax to
find the best result for a search term from the parent's metadata and the
children's content, and combines them into one score value.

The current query looks like this:

 'fl' => 
'id,doc_id,main_type,title,subtitle,publishing_date,page_number,author,score',
 'defType' => 'edismax',
 'stopwords' => 'true',
 'mm' => '1',
 'qf' => 'id^10.0 title^50.0 subtitle^40.0 author^20.0',
 'q' => 'TEST _query_:"{!parent which=is_parent:true
score=Max}{!dismax qf='content' v='TEST'}"^20',
 'sort' => 'score DESC'


But it seems the score of the _query_ is not added to the score of the main
query. But, if I do this without the "TEST" term in the beginning: 'q'
=> '_query_:"{!parent
which=is_parent:true score=Max}{!dismax qf='content' v='TEST'}"^20', I get
a score value - so somehow it seems the score is there but I'm not sure how
to use it properly.

Is there another option to do a nested search with combined scores?

Before the question comes up why we did not use the nested documents
feature of SOLR: we created this index years ago using SOLR 4.6 and this
feature didn't exist at that time. We now used the IndexUpdater tools of
SOLR 5 and SOLR 6 to upgrade our index to SOLR 6.

We weren't able to create our fulltext search in the past, because SOLR's
BlockJoin feature did not support score calculation at all, but that seems
to have changed in SOLR 5, so we want to give it another try. Every hint in
the right direction would be helpful.

Thank you!

Moritz





eDisMax with nested documents

2017-05-29 Thread Moritz Michael
Hello,


I'm new to this list and have a question regarding nested documents with
scoring through eDisMax.

We do have an index of e-books that contains the metadata of the e-book and
each page as a single document:

   - Book 1
  - Page 1
  - Page 2
  - Page 3
  - ...
  - Page n

The parent documents contain a field is_parent:true, an id and a doc_id. For
parent documents, the id is the same as the doc_id; for the child documents the
id is {doc_id}_{page_number}. The doc_id is always the same for the parent
and its children (that's how we know which child belongs to which parent).

So now the challenge is to offer a fulltext search that uses eDisMax to
find the best result for a search term from the parent's metadata and the
children's content, and combines them into one score value.

The current query looks like this:

'fl' => 
'id,doc_id,main_type,title,subtitle,publishing_date,page_number,author,score',
'defType' => 'edismax',
'stopwords' => 'true',
'mm' => '1',
'qf' => 'id^10.0 title^50.0 subtitle^40.0 author^20.0',
'q' => 'TEST _query_:"{!parent which=is_parent:true
score=Max}{!dismax qf='content' v='TEST'}"^20',
'sort' => 'score DESC'


But it seems the score of the _query_ is not added to the score of the main
query. But, if I do this without the "TEST" term in the beginning: 'q'
=> '_query_:"{!parent
which=is_parent:true score=Max}{!dismax qf='content' v='TEST'}"^20', I get
a score value - so somehow it seems the score is there but I'm not sure how
to use it properly.

Is there another option to do a nested search with combined scores?

Before the question comes up why we did not use the nested documents
feature of SOLR: we created this index years ago using SOLR 4.6 and this
feature didn't exist at that time. We now used the IndexUpdater tools of
SOLR 5 and SOLR 6 to upgrade our index to SOLR 6.

We weren't able to create our fulltext search in the past, because SOLR's
BlockJoin feature did not support score calculation at all, but that seems
to have changed in SOLR 5, so we want to give it another try. Every hint in
the right direction would be helpful.

Thank you!

Moritz