Re: solr suggester.rebuild takes forever and eventually runs out of memory on production

2020-07-24 Thread Sebastian Riemer
Oh, I am sorry, I totally forgot to mention our solr version, it's 7.7.3.

-----Original Message-----
From: Sebastian Riemer [mailto:s.rie...@littera.eu]
Sent: Friday, 24 July 2020 09:53
To: solr-user@lucene.apache.org
Subject: solr suggester.rebuild takes forever and eventually runs out of memory
on production



solr suggester.rebuild takes forever and eventually runs out of memory on production

2020-07-24 Thread Sebastian Riemer
Dear mailing list community,

we are having trouble starting the suggester build on one of our production
servers.


1. We execute the required query with the suggest.build parameter.

2. Solr appears to take up the task of recreating the suggester index (we see the CPU rise significantly).

3. It takes forever to build (and seems to never finish!).

4. Sometimes the Linux OOM killer strikes and usually picks the Solr process and kills it.

5. During the rebuild, calling the suggester results in a "suggester not built" exception.

6. Restarting the Solr service has no effect; it just continues the rebuild.

How long should this task take, given that our index currently holds
approximately 7.2 million documents in a parent/child structure?
Is it possible to query the progress of the suggest.build task after it was
started?
How can we tell whether the suggest.build task is still running or has
finished?

Which factors have the most significant impact on the duration of the rebuild
process, given that we use the config below? (Let me know if you need
additional information.)
Can we speed up the process somehow?
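
A rebuild on a suggest handler configured as in this thread is typically triggered with a request along these lines (host, core name and handler path are assumptions here):

```
http://localhost:8983/solr/<core>/suggest?suggest=true&suggest.dictionary=infixSuggester&suggest.build=true
```

Note that the build request blocks until the build finishes (or the request times out); as far as I know there is no dedicated endpoint for querying the progress of a running build.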

Best regards,
Sebastian
Solrconfig.xml (the XML element names were stripped by the archive; restored here from the standard SuggestComponent configuration, so treat the parameter names as a best-effort reconstruction):

<searchComponent name="suggest" class="solr.SuggestComponent">
  <lst name="suggester">
    <str name="name">infixSuggester</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="indexPath">infix_suggestions</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">SUGGEST</str>
    <str name="suggestAnalyzerFieldType">textSuggest</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnStartup">false</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="solr.SearchHandler" startup="lazy">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.dictionary">infixSuggester</str>
    <!-- a boolean parameter whose name was lost in the archive: true -->
    <str name="suggest.count">500</str>
    <!-- a boolean parameter whose name was lost in the archive: true -->
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Kind regards,
Sebastian Riemer, BSc


Re: How to negate numeric range query - or - how to get records NOT matching a certain numeric range

2020-01-27 Thread Sebastian Riemer
Dear community!

It works as suggested, either using

"-u_lastLendingDate_combined_ls_ns:[8610134693 TO 8611935823]"

or

"NOT u_lastLendingDate_combined_ls_ns:[8610134693 TO 8611935823]"

It seems that additional bracketing (as in the next line) does not harm my
query, but I will eliminate it as it is unnecessary and possibly wrong.

"!u_lastLendingDate_combined_ls_ns:([8610134693 TO 8611935823])"

Regarding purely negative query parts: thanks, I had already thought of that!
Glad I already have a positive part with "u_id_s:[* TO *]".

I am not sure what my initial mistake was; either the positioning of the
negation keyword was wrong, or I did not spell the "NOT" keyword in uppercase.

Thank you all for your time and effort,  have a nice day!


Sebastian

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 24 January 2020 21:51
To: solr-user@lucene.apache.org
Subject: Re: How to negate numeric range query - or - how to get records NOT
matching a certain numeric range

On 1/24/2020 9:04 AM, David Hastings wrote:
> just tried "fq":"NOT year:[1900 TO 2000]" on my data set and it also
> worked as expected, mind if I ask why:
> (u_lastLendingDate_combined_ls_ns:([8610134693 TO 8611935823]))
> 
> there are ()'s around your range query?

I think David is correct here about the parentheses causing a problem. 
If that query is working without the negation, that's a little odd.  I do know 
the parentheses should not be there.

Purely negative queries in Lucene do not actually work.  The problem with them 
is that if you start with nothing and then subtract something, you end up with 
nothing.

When the query being negated is very simple, Solr is able to detect the problem 
and internally fix it before running the query.  If there is ANY complexity to 
it at all, Solr cannot do this, and it won't work.  It is likely that adding 
parentheses around the range as you have makes the query complex enough that 
this detection doesn't work.

The fully correct way to write a negated version of the query above is:

*:* -u_lastLendingDate_combined_ls_ns:[8610134693 TO 8611935823]

This is a starting point of all documents, subtracting documents where the 
field falls within the specified range.  You could replace the minus sign with 
"AND NOT " for the same effect.

Thanks,
Shawn


How to negate numeric range query - or - how to get records NOT matching a certain numeric range

2020-01-24 Thread Sebastian Riemer
Hi all!

Consider a query containing fq-params like this:

"fq":["tenant_id:1",
"u_markedAsDeleted_b:false",
"u_id_s:[* TO *]",
"(u_lastLendingDate_combined_ls_ns:([8610134693 TO 8611935823]))"]

This gives me a list of users, having a last lending date (somewhat encoded as 
long) in that given numeric range.

Now, I'd like to get a list of users *NOT* having a last lending date in that
given numeric range.

I've tried adding NOT and ! to the respective fq-query-part without success.


Additional info: the field is of type long (TrieLongField) and it is 
multiValued="true"

An example of the full query-string would be:

start=0=50=tenant_id:1=u_markedAsDeleted_b:false=u_id_s:[* TO 
*]=*:*=true=true=count=1=u_userName_cp_s
 desc=u_userName_cp_s^20 u_displayName_cp_s^20  text^2 text_en text_de 
text_it=u_userName_cp_s^100 u_displayName_cp_s^20  text^10=100%

Thank you for your input and a nice weekend to all of you!

Please let me know if I did not share vital details!

Mit freundlichen Grüßen
Sebastian Riemer, BSc


LITTERA Software & Consulting GmbH
A-6060 Hall i.T., Haller Au 19a
Phone: +43(0) 50 765 000, Fax: +43(0) 50 765 118
Registered office: Hall i.T., registered with the Commercial Court of Innsbruck,
company register no. FN 295807k, managing partner: Albert Unterkircher

D-80637 München, Landshuter Allee 8-10
Phone: +49(0) 89 919 29 122, Fax: +49(0) 89 919 29 123
Registered office: München, registered with the Local Court of Munich
under HRB 103698, managing director: Albert Unterkircher
E-Mail: off...@littera.eu
Homepage: www.littera.eu

This communication may contain information that is legally privileged, 
confidential or exempt from disclosure.  If you are not the intended recipient, 
please note that any dissemination, distribution, or copying of this 
communication is strictly prohibited.  Anyone who receives this message in 
error should notify the sender immediately by telephone or by return e-mail and 
delete this communication entirely from his or her computer.



Rename field in all documents from `i_itemNumber_l` to `i_itemNumber_cp_l`

2019-09-16 Thread Sebastian Riemer
Dear mailing list,

I would like to know:

Is there some simple way to rename a field in all documents in my solr index?

I am using a dynamic schema definition, and I've introduced some new 
copyField-instructions. Those make it necessary to reindex all documents. It 
would help me a great deal to be able to rename a specific field from:

`i_itemNumber_l` to `i_itemNumber_cp_l`

I don't really mind reindexing all documents, but that takes some time, and
having my (old) documents return NULL as the value for the field
`i_itemNumber_cp_l` breaks a lot of stuff.

So if there _IS_ a way to rename that field, it would help tremendously. By the
way, I am using Solr 6.5.1 and I use SolrJ in my application layer.
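
There is no in-place field rename in a Lucene/Solr index; short of a full reindex, one workaround is to populate the new field per document via atomic updates, provided the schema meets the atomic-update requirements. A sketch of such an update body (the id and value are illustrative, not from the thread):

```json
[
  { "id": "123", "i_itemNumber_cp_l": { "set": 4711 } }
]
```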

Best regards and as always,

Thank you so much for any input!


Yours,
Sebastian

Kind regards,
Sebastian Riemer, BSc





Re: Upgrade 6.2.1 to 7.5.0 - "Connection evictor" Threads not closed

2018-11-30 Thread Sebastian Riemer
Dear Jason,

Thank you for your response! I'm happy to tell you that we resolved our issue
(for now, by downgrading to 6.5 on both the SolrJ client and server side).

We were formerly executing queries by creating our HttpSolrClient this way (for 
each Query):

SolrClient client = new HttpSolrClient(urlString);

I'm not sure but I guess this way of creating a client is deprecated now? 
Anyhow a colleague changed this to:

SolrClient client = new HttpSolrClient.Builder(urlString).build();

Again, this is done _for each query_ we execute. 

As we discovered by now (like here 
http://lucene.472066.n3.nabble.com/6-6-gt-7-5-SolrJ-seeing-many-quot-Connection-evictor-quot-Threads-td4410488.html)
 this is not the correct way to do it. Instead we should only create one client 
per core and reuse it.
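
A minimal sketch of that reuse pattern (URL and lifecycle handling are assumptions, not the poster's actual code):

```java
// Build the client once, e.g. at application startup, and share it.
SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycore").build();

// ... reuse the same instance for every query ...

// Close it exactly once, on application shutdown, so its connection
// pool and its "Connection evictor" thread are released.
client.close();
```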

Interestingly enough, with 6.5 the HttpSolrClient.Builder way of creating a new
client for every query does not seem to create new threads for each query (or
it closes them automatically, or reuses existing ones; I don't know).

Maybe it was a bad idea in the first place, to create a new client for every 
query.

Anyways, thanks a lot for your answer - we'll definitely have to revisit the 
way we create the HttpSolrClient and check for the recommended way of doing so.

All the best,

Sebastian


-----Original Message-----
From: Jason Gerlowski [mailto:gerlowsk...@gmail.com]
Sent: Monday, 26 November 2018 17:55
To: solr-user@lucene.apache.org
Subject: Re: Upgrade 6.2.1 to 7.5.0 - "Connection evictor" Threads not closed

Hey Sebastian,

As for how Solr/SolrJ compatibility is handled, the story for SolrJ looks a lot 
like the story for Solr itself - major version changes can introduce breaking 
changes, so it is best to avoid using SolrJ 6.x with Solr 7.x.  In practice I 
think changes that break Solr/SolrJ compatibility are relatively rare though, 
so it might be possible if your hand is forced.

As for the behavior you described...I think I understand what you're 
describing, but to make sure:  Are the "connection-evictor" threads 
accumulating in your client application, on the Solr server itself, or both?

I suspect you're seeing this in your client code.  If so, it'd really help us 
to help you if you could provide some more details on how you're using SolrJ.  
Can you share a small snippet (JUnit test?) that reproduces the problem?  How 
are you creating the SolrClient you're using to send requests?  Which 
SolrClient implementation(s) are you using?  Are you providing your own 
HttpClient, or letting SolrClient create its own?  It'll be much easier for 
others to help with a little more detail there.

Best,

Jason

On Fri, Nov 23, 2018 at 10:38 AM Sebastian Riemer  wrote:


Upgrade 6.2.1 to 7.5.0 - "Connection evictor" Threads not closed

2018-11-23 Thread Sebastian Riemer
Hi,

we've recently changed our Solr-Version from 6.2.1 to 7.5.0, and since then, 
whenever we execute a query on solr, a new thread is being created and never 
closed.

These threads are all labelled "Connection evictor", and they gather until a
critical mass is reached and either the OS cannot create any more threads, or
an out-of-memory error is produced.

First I thought that the cause might be that we were using a higher SolrJ
version than our Solr server (by mistakenly forgetting to upgrade the server
version too):

So we had for SolrJ: 7.4.0


<dependency>
    <groupId>org.apache.solr</groupId>
    <artifactId>solr-solrj</artifactId>
    <version>7.4.0</version>
</dependency>


And for Solr-Server:  6.2.1

But now I have just installed the newest Solr server version 7.5.0, and I
still see an additional thread being created and never released with each Solr
search performed.

When downgrading SolrJ to 6.2.1 I can verify, that no new threads are created 
when doing a solr search.

What do you think about this? Are there any known pitfalls? Maybe I missed some 
crucial changes necessary when upgrading to 7.5.0?

What about differing versions of SolrJ and Solr server? As far as I recall the
docs, one major-version difference in either direction should be OK.

Thanks for all your feedback,

Yours sincerely

Sebastian Riemer


Storing multiple dates for a doc, filter on a date range and get count of matches within date range

2018-04-26 Thread Sebastian Riemer
Consider this situation,

I've got documents, for which I'll have to store multiple dates, those could be 
access dates for example, or maybe "downloaded at"-dates or something similar.

So, a document might look like this:

{id:"1", name:"apache-solr-ref-guide-7.3.pdf", 
downloaded_at:{"2018-01-01T00:00:00Z", "2018-01-02T00:00:00Z", 
"2018-03-16T00:00:00Z", "2018-03-17T00:00:00Z"}

My question is, how would I write a query which will return all documents 
downloaded between i.e. 2018-02-01 and 2018-04-12 and provides additionally the 
count of downloads within the given date range of this document?

So considering an index consisting only of the example doc provided above, I'd 
like to get a result like this:

result: 1 document, with name:"apache-solr-ref-guide-7.3.pdf", 
count_of_downloads:2 (2, since within the "downloaded_at"-field, two out of the 
four dates lie within the filtered date range)

I've thought of two possible approaches on how I'd need to store information in 
order to be able to execute such a query, but I am not sure any of these would 
actually make such a query possible.

A) store the dates in a multivalued field and see if I can both, filter the 
multivalued field on a date range and somehow, maybe using some function query, 
can also obtain the count of matches within the multivalued field
or B) introduce these dates as nested child documents, filtering on that date 
range and somehow get the number of matching child documents into the result

I guess filtering for that given date range will be easy, but how about getting 
the count of matches within the multivalued field, respectively the count of 
matching child documents into the result?
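
Approach B could be modelled roughly like this, with each download as a nested child document (field names are illustrative, not a tested solution):

```json
{
  "id": "1",
  "doc_type": "parent",
  "name": "apache-solr-ref-guide-7.3.pdf",
  "_childDocuments_": [
    { "id": "1-d1", "doc_type": "download", "download_date": "2018-01-01T00:00:00Z" },
    { "id": "1-d2", "doc_type": "download", "download_date": "2018-03-16T00:00:00Z" }
  ]
}
```

A block-join query such as {!parent which=doc_type:parent}download_date:[... TO ...] would then select parents with at least one download in the range; the per-parent match count would still have to come from child faceting or a second query.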

Best regards, and as always thanks for reading!

Sebastian


Re: Navigation/Paging

2018-03-14 Thread Sebastian Riemer
Dear Shawn,

thank you so much for taking the time for this detailed answer! It helps me 
very much and I'm very grateful.

1) As you've suggested, we already load the data for detail pages from our 
relational db, just using the documentId from Solr to look it up. 
2) Our index size won't ever reach millions of records, as is common in other
users' scenarios. Having 6 documents as a search result is currently the
maximum a single client can ever get when not specifying _any_ filter
criteria.

-> I'll have to think about whether to prevent the user from deep paging into
big search results, or just take a possible performance hit (as you've pointed
out, a typical user usually won't page further than a couple of pages). The
same goes for jumping to the very end of a search result. Currently I kind of
like this feature, so I'll try to keep it in.

For retrieving the previous/next documentId if I'm on the start/end of the 
current page, I'll use the approach you (and Rick) suggested -thanks!
 
Best wishes,

Sebastian

-----Original Message-----
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Wednesday, 14 March 2018 00:19
To: solr-user@lucene.apache.org
Subject: Re: Navigation/Paging

On 3/13/2018 10:26 AM, Sebastian Riemer wrote:
> However, now we want to introduce a similar navigation in our detail views, 
> where only ever one document is displayed. Again, the navigation bar looks 
> like this:
>
> << First   < Prev   1 - 15 of 62181   Next >   Last >>
>
> But now, Prev / Next shall open up the previous / next _document_ instead of 
> the next page. The same goes for First and Last, it shall open the first / 
> last _document_ not the page.
>
> Our first approach to this was to simply add the param "fl=id" so we only get 
> the IDs of documents and set page size to ALL (i.e. no restriction on param 
> "rows"). That way, it was easy to extract the current document id from the 
> result list, and check which id was preceding and succeeding the current id, 
> as well as getting the very first id and the very last id, in order to render 
> the navigation bar.
>
> This lead to solr being heavily under load since it must load 62181 documents 
> (in this example) in order to return the ids. I somehow thought this would be 
> easy for solr to do, but it isn't.

This will indeed be very slow.  And you only have 62181 documents in your 
result set, which is pretty easy for Solr to handle.  For a search that has 100 
million results, this approach is *impossible*.  I do have searches like this 
on my index, and my index is not all that big compared to some of the indexes 
that the community has built.

> Our second approach was, to simply keep the same value for params "start" and 
> "rows" since the user is always selecting a document from the list - thus the 
> selected document already is within the page. However, the edge cases are, 
> the selected document is the very first on the page or the very last one, 
> thus the previous or next document id is not within the page result from solr 
> -> I guess this we could handle by simply checking and sending a second query 
> where the param "start" would be adjusted accordingly.

Detail pages often include information that you do not want to store in Solr.  
A well-tuned Solr install will have responses that contain everything that the 
application needs to build a search result grid, but for really detailed 
information, the application should probably be using the id information 
received from Solr to go to the main data repository and retrieve full details.

Additionally, you should not allow the user to navigate to the last page or to 
navigate to the last document, or even a page/document anywhere near the end of 
the resultset.  The reason for this is that really high start values are a 
serious performance killer.  61K is definitely a start value high enough to see 
performance drops.  If the user tries to page too deeply into results, your 
application should simply refuse to go any further.  For comparison purposes -- 
the last time I checked how deeply Google would let me go into a search result, 
I could get to page 39, but no further.  The number of results for my search 
was MILLIONS, but Google wouldn't let me view them all.  The performance issues 
for deep paging are universal for search engines, especially when it is 
possible to jump to an arbitrary page number.
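
Where results only need to be walked sequentially rather than jumped into at an arbitrary offset, Solr's cursorMark avoids the cost of high start values; the standard pattern (the sort must include the uniqueKey field) looks like:

```
q=*:*&rows=50&sort=id asc&cursorMark=*
```

Each response contains a nextCursorMark value, which is passed as cursorMark in the following request instead of increasing start.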

I recommend limiting how many results a user can page through to about
5,000 or 10,000.  If there are 50 results per page, this allows them to get to at
least page 99.  In general, most users of search engines will never go deeper 
than 

Re: Navigation/Paging

2018-03-14 Thread Sebastian Riemer
Hi Rick,

thanks for pointing this out - that's the solution I was thinking about too

"... -> I guess this we could handle by 
>simply checking and sending a second query where the param "start"
>would be adjusted accordingly ..."

Just checking if there are other options,

Thanks again!

Sebastian

Sebastien
Can you not just handle this in your Javascript? Your request will always get 
15 rows, start=0 then start=15 and so on. In the details view you only show one 
of the documents of course, and when the user is viewing the last of 15 and  
clicks next, you will request the next 15.
When viewing the first of the 15, click previous, you will request the previous 
15. 
Am I missing something here?
Rick

On March 13, 2018 12:26:18 PM EDT, Sebastian Riemer <s.rie...@littera.eu> wrote:

--
Sorry for being brief. Alternate email is rickleir at yahoo dot com 


Navigation/Paging

2018-03-13 Thread Sebastian Riemer
Hi,

In our web app, when displaying result lists from solr,  we've successfully 
introduced paging via the params 'start' and 'rows' and it's working quite well.

Our navigation in list screens look like this:


<< First   < Prev   1 - 15 of 62181   Next >   Last >>

One can navigate to the first page, previous page, next page and last page. All 
is done via adapting the param "start" accordingly by simply adding the page 
size.

However, now we want to introduce a similar navigation in our detail views, 
where only ever one document is displayed. Again, the navigation bar looks like 
this:

<< First   < Prev   1 - 15 of 62181   Next >   Last >>

But now, Prev / Next shall open up the previous / next _document_ instead of 
the next page. The same goes for First and Last, it shall open the first / last 
_document_ not the page.

Our first approach to this was to simply add the param "fl=id" so we only get 
the IDs of documents and set page size to ALL (i.e. no restriction on param 
"rows"). That way, it was easy to extract the current document id from the 
result list, and check which id was preceding and succeeding the current id, as 
well as getting the very first id and the very last id, in order to render the 
navigation bar.

This led to Solr being heavily under load, since it must load 62181 documents
(in this example) in order to return the ids. I somehow thought this would be
easy for Solr to do, but it isn't.

Our second approach was, to simply keep the same value for params "start" and 
"rows" since the user is always selecting a document from the list - thus the 
selected document already is within the page. However, the edge cases are, the 
selected document is the very first on the page or the very last one, thus the 
previous or next document id is not within the page result from solr -> I guess 
this we could handle by simply checking and sending a second query where the 
param "start" would be adjusted accordingly.

However I would not know how to retrieve the id of the very first document and 
the very last document (except for executing separate queries with I guess 
start=0, rows=1 and start=62181 and rows=1)

TL,DR:
For any query and a documentId (of which it is known it is within the query 
result), what is a simple and efficient enough way, to get the following 
navigational information:

-  Previous document Id

-  Next document id

-  First document id

-  Last document id

Can this sort of requirement be handled within one Solr query? Should I use
cursorMark in this scenario?

Best regards,

Sebastian



SolrClient.queryAndStreamResponse - QueryResponse should be used with care

2017-02-22 Thread Sebastian Riemer
Dear solr users,

I am considering switching from SolrClient.execute to
SolrClient.queryAndStreamResponse, because I want to display the progress of
query execution.
I've found http://stackoverflow.com/a/15810200/2747410 which seems to be a good 
starting point for me.

However, the docs for SolrClient.queryAndStreamResponse state that:

"Although this function returns a 'QueryResponse' it should be used with care
since it excludes anything that was passed to callback. Also note that
future version may pass even more info to the callback and may not return
the results in the QueryResponse."

Since I heavily depend on the QueryResponse-Object in the calling code which 
executes the query, I would really like to keep that as result to work with. 
Does anyone know what exactly is being passed to the callback (and thus is not 
included in the QueryResponse)?

I currently use the following from the QueryResponse-Object:

* getGroupResponse()

* getResults()

* getFacetField()

* getFacetFields()
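
For reference, a minimal sketch of the streaming variant (the collection name is a placeholder; method names follow the SolrJ StreamingResponseCallback API):

```java
SolrQuery query = new SolrQuery("*:*");
client.queryAndStreamResponse("mycore", query, new StreamingResponseCallback() {
    @Override
    public void streamDocListInfo(long numFound, long start, Float maxScore) {
        // Called once with the result header - usable for a progress display.
    }
    @Override
    public void streamSolrDocument(SolrDocument doc) {
        // Called per document as it is read off the wire; these documents
        // go to the callback rather than into the returned QueryResponse.
    }
});
```

This suggests the plain result documents are what gets diverted to the callback, while sections such as grouping and facets may still arrive in the QueryResponse; that is an assumption worth verifying against your Solr version.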

Best regards,

Sebastian



Atomic updates to increase single field bulk updates?

2017-02-15 Thread Sebastian Riemer
Dear solr users,

when updating documents in bulk (i.e. 40,000 documents at once) and only
changing the value of a single boolean flag, I currently re-index all 40,000
objects in full. However, obtaining all the relevant information for each
object from the database is relatively costly.

I now wonder whether, in this situation, it would be a good idea to implement
a single-field update routine using atomic updates. In that case I could skip
the lookups in the relational database, since the only information needed
would be the new value for that boolean flag and the list of those 40,000
document ids.

I am aware of the requirements for using atomic updates, but as I understand
it, those would not have a big impact on performance, only a slight increase
in index size?

What is your opinion on that?

Thanks for your input, have a nice evening!

Sebastian



AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Thanks @Toke,  for pointing out these options. I'll have a read about 
expungeDeletes. 

That makes it sound all the more as if having Solr filter out 0-counts is a 
good idea and I should handle my use case outside of Solr.

Thanks again,
Sebastian

On Fri, 2017-01-13 at 14:19 +, Sebastian Riemer wrote:
> the second search should have been this:
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&facet=on&q=*:*&start=0&rows=0&wt=json
> (or in other words, give me all documents having value "1" for field
> "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the 
> facet.fields result-count list?

Qualified guess (I don't know the JSON faceting code in detail):
The list of possible facet values is extracted from the DocValues structure in 
the segment files, without respect to documents marked as deleted. At some 
point you had one or more documents with m_mediaType_s:1, which were later 
deleted.

If your index is not too large, you can verify this by optimizing down to 1 
segment, which will remove all traces of deleted documents (unless the index is 
already 1 segment).

If you cannot live with the false terms, committing with expungeDeletes=true 
should do the trick, although it is likely to make your indexing process a lot 
heavier.

The reason for this inaccuracy is that it is quite heavy to verify whether a 
docvalue is referenced by a document: Each time one or more documents in a 
segment are deleted, all references from all documents in that segment would 
have to be checked to create a correct mapping.
As this only affects mincount=0 combined with your use case where _all_ 
documents with a certain docvalue are deleted, my guess is that it is seen as 
too much of an edge case to handle.
--
Toke Eskildsen, Royal Danish Library
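
As an aside: Solr returns facet_fields as a flat [value, count, value, count,
...] list, so pairing it up client-side, and spotting the zero-count docvalue
leftovers discussed here, takes only a few lines. A plain-Python sketch (the
sample counts mirror the facet output quoted in this thread):

```python
def facet_pairs(flat):
    """Pair up Solr's flat facet list: [v1, c1, v2, c2, ...] -> [(v1, c1), ...]."""
    return list(zip(flat[0::2], flat[1::2]))

def zero_count_values(flat):
    """Values that only survive as docvalue leftovers (count 0)."""
    return [value for value, count in facet_pairs(flat) if count == 0]

flat = ["2", 25561, "3", 19027, "1", 0]
```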



AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Nice, thank you very much for your explanation!

>> Solr returns all fields as facet result where there was some value at 
some time as long as the documents are somewhere in the index, even when 
they're marked as deleted. So there must have been a document with 
m_mediaType_s=1. Even if all these documents are deleted already, its values 
still appear in the facet result.

I did not know about that! That makes perfect sense. I am quite sure there has 
been a time when that field contained the value "1". What's more, now that I 
have rebuilt my index, the value "1" is no longer present in the facet.field 
result.

I'll think about how to deal with my situation then, maybe it would be better 
to keep solr filtering out 0-count facet-fields and insert the filterquery 
leading to 0 results into the select-dropdown "manually".
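
That "manual" insertion can be as small as this (a sketch, assuming the facet
result already has the zero counts filtered out, so the user's selected value
may be missing from it):

```python
def dropdown_options(facet_pairs, selected=None):
    """Facet counts plus the active filter value, shown with count 0 if gone."""
    options = dict(facet_pairs)
    if selected is not None and selected not in options:
        options[selected] = 0   # keep the user's active filter visible
    return options

opts = dropdown_options([("ebook", 10)], selected="book")
```

That way the dropdown always explains why the result list is empty.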

-Ursprüngliche Nachricht-
Von: Michael Kuhlmann [mailto:k...@solr.info] 
Gesendet: Freitag, 13. Januar 2017 15:43
An: solr-user@lucene.apache.org
Betreff: Re: FacetField-Result on String-Field contains value with count 0?

Then I don't understand your problem. Solr already does exactly what you want.

Maybe the problem is different: I assume that there never was a value of "1" in 
the index, leading to your confusion.

Solr returns all fields as facet result where there was some value at some time 
as long as the documents are somewhere in the index, even when they're 
marked as deleted. So there must have been a document with m_mediaType_s=1. 
Even if all these documents are deleted already, its values still appear in the 
facet result.

This holds true until segments get merged so that all deleted documents are 
pruned. So if you send a forceMerge request, chances are good that "1" won't 
come up any more.

-Michael

Am 13.01.2017 um 15:36 schrieb Sebastian Riemer:
> Hi Bill,
>
> Thanks, that's actually where I come from. But I don't want to exclude values 
> leading to a count of zero.
>
> Background to this: A user searched for mediaType "book" which gave him 10 
> results. Now some other task/routine whatever changes all those 10 books to 
> be say 10 ebooks, because the type has been incorrect. The user makes a 
> refresh, still looking for "book" gets 0 results (which is expected) and 
> because we rule out facet.fields having count 0, I don't get back the 
> selected mediaType "book" and thus I cannot select this value in the 
> select-dropdown-filter for the mediaType. This leads to confusion for the 
> user, since he has no results, but doesn't see that it's because of he still 
> has that mediaType-filter set to a value "books" which now actually leads to 
> 0 results.
>
> -Ursprüngliche Nachricht-
> Von: billnb...@gmail.com [mailto:billnb...@gmail.com]
> Gesendet: Freitag, 13. Januar 2017 15:23
> An: solr-user@lucene.apache.org
> Betreff: Re: AW: FacetField-Result on String-Field contains value with count 
> 0?
>
> Set mincount to 1
>
> Bill Bell
> Sent from mobile
>
>
>> On Jan 13, 2017, at 7:19 AM, Sebastian Riemer <s.rie...@littera.eu> wrote:
>>
>> Pardon me,
>> the second search should have been this: 
>> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&facet=on&q=*:*&start=0&rows=0&wt=json
>> (or in other words, give me all 
>> documents having value "1" for field "m_mediaType_s")
>>
>> Since this search gives zero results, why is it included in the facet.fields 
>> result-count list?
>>
>> 
>>
>> Hi,
>>
>> Please help me understand: 
>> http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json
>>  returns:
>>
>> "facet_counts":{
>>"facet_queries":{},
>>"facet_fields":{
>>  "m_mediaType_s":[
>>"2",25561,
>>"3",19027,
>>"10",1966,
>>"11",1705,
>>"12",1067,
>>"4",1056,
>>"5",291,
>>"8",68,
>>"13",2,
>>"6",2,
>>"7",1,
>>"9",1,
>>"1",0]},
>>"facet_ranges":{},
>>"facet_intervals":{},
>>"facet_heatmaps":{}}}
>>
>> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&facet=on&q=*:*&start=0&rows=0&wt=json
>>
>> -> "response":{"numFound":25561,"start":0,"docs":[]
>>
>> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&facet=on&q=*:*&start=0&rows=0&wt=json
>>
>> -> "response":{"numFound":0,"start":0,"docs":[]
>>
>> So why does the search for facet.field even contain the value "1", if it 
>> does not exist?
>>
>> And why does it e.g. not contain
>> "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0
>>
>> Best regards,
>> Sebastian
>>
>> Additional info, field m_mediaType_s is a string;
>> <field name="m_mediaType_s" type="string" indexed="true" stored="true" />
>> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
>>



AW: AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Hi Bill,

Thanks, that's actually where I come from. But I don't want to exclude values 
leading to a count of zero.

Background to this: A user searched for mediaType "book" which gave him 10 
results. Now some other task/routine whatever changes all those 10 books to be 
say 10 ebooks, because the type has been incorrect. The user makes a refresh, 
still looking for "book" gets 0 results (which is expected) and because we rule 
out facet.fields having count 0, I don't get back the selected mediaType "book" 
and thus I cannot select this value in the select-dropdown-filter for the 
mediaType. This leads to confusion for the user, since he has no results, but 
doesn't see that it's because of he still has that mediaType-filter set to a 
value "books" which now actually leads to 0 results.

-Ursprüngliche Nachricht-
Von: billnb...@gmail.com [mailto:billnb...@gmail.com] 
Gesendet: Freitag, 13. Januar 2017 15:23
An: solr-user@lucene.apache.org
Betreff: Re: AW: FacetField-Result on String-Field contains value with count 0?

Set mincount to 1

Bill Bell
Sent from mobile


> On Jan 13, 2017, at 7:19 AM, Sebastian Riemer <s.rie...@littera.eu> wrote:
> 
> Pardon me,
> the second search should have been this: 
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&facet=on&q=*:*&start=0&rows=0&wt=json
> (or in other words, give me all 
> documents having value "1" for field "m_mediaType_s")
> 
> Since this search gives zero results, why is it included in the facet.fields 
> result-count list?
> 
> 
> 
> Hi,
> 
> Please help me understand: 
> http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json
>  returns:
> 
> "facet_counts":{
>"facet_queries":{},
>"facet_fields":{
>  "m_mediaType_s":[
>"2",25561,
>"3",19027,
>"10",1966,
>"11",1705,
>"12",1067,
>"4",1056,
>"5",291,
>"8",68,
>"13",2,
>"6",2,
>"7",1,
>"9",1,
>"1",0]},
>"facet_ranges":{},
>"facet_intervals":{},
>"facet_heatmaps":{}}}
> 
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&facet=on&q=*:*&start=0&rows=0&wt=json
> 
> -> "response":{"numFound":25561,"start":0,"docs":[]
> 
> http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&facet=on&q=*:*&start=0&rows=0&wt=json
> 
> -> "response":{"numFound":0,"start":0,"docs":[]
> 
> So why does the search for facet.field even contain the value "1", if it does 
> not exist?
> 
> And why does it e.g. not contain 
> "SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0
> 
> Best regards,
> Sebastian
> 
> Additional info, field m_mediaType_s is a string;
> <field name="m_mediaType_s" type="string" indexed="true" stored="true" />
> <fieldType name="string" class="solr.StrField" sortMissingLast="true" />
> 


AW: FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Pardon me, 
the second search should have been this: 
http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%221%22&facet=on&q=*:*&start=0&rows=0&wt=json
 
(or in other words, give me all documents having value "1" for field 
"m_mediaType_s")

Since this search gives zero results, why is it included in the facet.fields 
result-count list?



Hi,

Please help me understand: 
http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json
 returns:

"facet_counts":{
"facet_queries":{},
"facet_fields":{
  "m_mediaType_s":[
"2",25561,
"3",19027,
"10",1966,
"11",1705,
"12",1067,
"4",1056,
"5",291,
"8",68,
"13",2,
"6",2,
"7",1,
"9",1,
"1",0]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&facet=on&q=*:*&start=0&rows=0&wt=json

-> "response":{"numFound":25561,"start":0,"docs":[]

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&facet=on&q=*:*&start=0&rows=0&wt=json

-> "response":{"numFound":0,"start":0,"docs":[]

So why does the search for facet.field even contain the value "1", if it does 
not exist?

And why does it e.g. not contain 
"SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0

Best regards,
Sebastian

Additional info, field m_mediaType_s is a string;
<field name="m_mediaType_s" type="string" indexed="true" stored="true" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />



FacetField-Result on String-Field contains value with count 0?

2017-01-13 Thread Sebastian Riemer
Hi,

Please help me understand: 
http://localhost:8983/solr/wemi/select?facet.field=m_mediaType_s&facet=on&indent=on&q=*:*&wt=json
 returns:

"facet_counts":{
"facet_queries":{},
"facet_fields":{
  "m_mediaType_s":[
"2",25561,
"3",19027,
"10",1966,
"11",1705,
"12",1067,
"4",1056,
"5",291,
"8",68,
"13",2,
"6",2,
"7",1,
"9",1,
"1",0]},
"facet_ranges":{},
"facet_intervals":{},
"facet_heatmaps":{}}}

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%222%22&facet=on&q=*:*&start=0&rows=0&wt=json

-> "response":{"numFound":25561,"start":0,"docs":[]

http://localhost:8983/solr/wemi/select?fq=m_mediaType_s:%220%22&facet=on&q=*:*&start=0&rows=0&wt=json

-> "response":{"numFound":0,"start":0,"docs":[]

So why does the search for facet.field even contain the value "1", if it does 
not exist?

And why does it e.g. not contain 
"SomeReallyCrazyOtherValueWhichLikeValue"1"DoesNotExistButLetsIncludeItInTheFacetFieldsResultListAnywaysWithCountZero" : 0

Best regards,
Sebastian

Additional info, field m_mediaType_s is a string;
<field name="m_mediaType_s" type="string" indexed="true" stored="true" />
<fieldType name="string" class="solr.StrField" sortMissingLast="true" />




AW: Search for ISBN-like identifiers

2017-01-05 Thread Sebastian Riemer
Thank you very much for taking the time to help me!

I'll definitely have a look at the link you've posted.

@ShawnHeisey Thanks too for shedding light on the wildcard behaviour!

Allow me one further question:
- Assuming that I define a separate field for storing the ISBNs, using the 
awesome analyzer provided by Mr. Bill Dueber. How do I get that field copied 
into my general text field, which is used by my QuickSearch-Input? Won't that 
field be processed again by the analyser defined on the text field?
- Should I alternatively add more fields to the q parameter? As for now, I 
always have set q=text:<searchterm>, but I guess one could 
try something like 
q=text:<searchterm> +isbnspeciallookupfield:<searchterm>

I don't really know about that last idea though, since the searches are 
probably OR-combined, which is not what I'd like to have.

A third option would be to pre-process, in my application, the decision of 
where to look in Solr. I.e. everything matching a regex of only numbers and 
hyphens with length 13 -> don't query on field text; use field 
isbnspeciallookupfield instead
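
That routing check can be sketched in a few lines (plain Python for
illustration; the field names are the ones from this mail, and I interpret the
"length 13" test as counting the digits once the hyphens are stripped):

```python
import re

_ISBN_CHARS = re.compile(r"^[0-9-]+$")   # only digits and hyphens allowed

def target_field(term):
    """Route ISBN-13-looking input to the dedicated field, everything else to text."""
    if _ISBN_CHARS.match(term) and len(term.replace("-", "")) == 13:
        return "isbnspeciallookupfield"
    return "text"
```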


Many thanks again, and have a nice day!
Sebastian


-Ursprüngliche Nachricht-
Von: Erik Hatcher [mailto:erik.hatc...@gmail.com] 
Gesendet: Donnerstag, 5. Januar 2017 19:10
An: solr-user@lucene.apache.org
Betreff: Re: Search for ISBN-like identifiers

Sebastian -

There’s some precedent out there for ISBN’s.  Bill Dueber and the 
UMICH/code4lib folks have done amazing work, check it out here -

https://github.com/mlibrary/umich_solr_library_filters 
<https://github.com/mlibrary/umich_solr_library_filters>

  - Erik


> On Jan 5, 2017, at 5:08 AM, Sebastian Riemer <s.rie...@littera.eu> wrote:
> 
> Hi folks,
> 
> 
> TL;DR: Is there an easy way, to copy ISBNs with hyphens to the general text 
> field, respectively configure the analyser on that field, so that a search 
> for the hyphenated ISBN returns exactly the matching document?
> 
> Long version:
> I've defined a field "text" of type "text_general", where I copy all 
> my other fields to, to be able to do a "quick search" where I set 
> q=text
> 
> The definition of the type text_general is like this:
> 
> 
> 
> <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
>   <analyzer type="index">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
>   <analyzer type="query">
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
>     <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>   </analyzer>
> </fieldType>
> 
> 
> I now face the problem, that searching for a book with 
> text:978-3-8052-5094-8* does not return the single result I expect. 
> However searching for text:9783805250948* instead returns a result. 
> Note, that I am adding a wildcard at the end automatically, to further 
> broaden the resultset. Note also, that it does not seem to matter 
> whether I put backslashes in front of the hyphen or not (to be exact, 
> when sending via SolrJ from my application, I put in the backslashes, 
> but I don't see a difference when using SolrAdmin as I guess SolrAdmin 
> automatically inserts backslashes if needed?)
> 
> When storing ISBNs, I do store them twice, once with hyphens 
> (978-3-8052-5094-8) and once without (9783805250948). A pure phrase search on 
> both those values return also the single document.
> 
> I learned that the StandardTokenizer splits up values from fields at index 
> time, and I've also learned that I can use the solrAdmin analysis and the 
> debugQuery to help understand what is going on. From the analysis screen I 
> see, that given the value 9783805250948 at index-time and 9783805250948* 
> query-time both leads to an unchanged value 9783805250948 at the end.
> When given the value 978-3-8052-5094-8 for "Field Value (Index)" and 
> 978-3-8052-5094-8* for "Field Value (Query)"  I can see how the ISBN is 
> tokenized into 5 parts. Again, the values match on both sides (Index and 
> Query).
> 
> How does the left side correlate with the right side? My guess: The left side 
> means, "Values stored in field text will be tokenized while indexing as shown 
> here on the left". The right side means, "When querying on the field text, 
> I'll tokenize the entered value like this, and see if I find something on the 
> index" Is this correct?
> 
> Another question: when querying and investigating the single document in 
> solrAdmin, the contents I see In the column text represents the _stored_ 
> value of the field text, right?
> And am I correct that this actually has nothing to do, with what is actually 
> stored in  the index for searching?
> 
> When storing the value 978-3-805

Search for ISBN-like identifiers

2017-01-05 Thread Sebastian Riemer
Hi folks,


TL;DR: Is there an easy way, to copy ISBNs with hyphens to the general text 
field, respectively configure the analyser on that field, so that a search for 
the hyphenated ISBN returns exactly the matching document?

Long version:
I've defined a field "text" of type "text_general", where I copy all my other 
fields to, to be able to do a "quick search" where I set q=text:<searchterm>

The definition of the type text_general is like this:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>


I now face the problem, that searching for a book with text:978-3-8052-5094-8* 
does not return the single result I expect. However searching for 
text:9783805250948* instead returns a result. Note, that I am adding a wildcard 
at the end automatically, to further broaden the resultset. Note also, that it 
does not seem to matter whether I put backslashes in front of the hyphen or not 
(to be exact, when sending via SolrJ from my application, I put in the 
backslashes, but I don't see a difference when using SolrAdmin as I guess 
SolrAdmin automatically inserts backslashes if needed?)

When storing ISBNs, I do store them twice, once with hyphens 
(978-3-8052-5094-8) and once without (9783805250948). A pure phrase search on 
both those values return also the single document.

I learned that the StandardTokenizer splits up values from fields at index 
time, and I've also learned that I can use the solrAdmin analysis and the 
debugQuery to help understand what is going on. From the analysis screen I see, 
that given the value 9783805250948 at index-time and 9783805250948* query-time 
both leads to an unchanged value 9783805250948 at the end.
When given the value 978-3-8052-5094-8 for "Field Value (Index)" and 
978-3-8052-5094-8* for "Field Value (Query)"  I can see how the ISBN is 
tokenized into 5 parts. Again, the values match on both sides (Index and Query).

How does the left side correlate with the right side? My guess: The left side 
means, "Values stored in field text will be tokenized while indexing as shown 
here on the left". The right side means, "When querying on the field text, I'll 
tokenize the entered value like this, and see if I find something on the index" 
Is this correct?

Another question: when querying and investigating the single document in 
solrAdmin, the contents I see In the column text represents the _stored_ value 
of the field text, right?
And am I correct that this actually has nothing to do, with what is actually 
stored in  the index for searching?

When storing the value 978-3-8052-5094-8, are only the tokenized values stored 
for search, or is the "whole word" also stored? Is there a way to actually see 
all the values which are stored for search?
When searching text:" 978-3-8052-5094-8" I get the single result, so I guess 
the value as a whole must also be stored in the index for searching?

One more thing which confuses me:
Searching for text: 978-3-8052-5094-8 gives me 72 results, because it leads to 
searching for "parsedquery_toString":"text:978 text:3 text:8052 text:5094 
text:8",
but searching for text: 978-3-8052-5094-8* gives me 0 results, this leads to 
"parsedquery_toString":"text:978-3-8052-5094-8*",

Why is the appended wildcard changing the behaviour so radically? I'd rather 
expect to get something like "parsedquery_toString":"text:978 text:3 text:8052 
text:5094 text:8*",  and thus even more results.
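
Since wildcard terms bypass the analyzer, one pragmatic client-side workaround
is to normalize the user's input to the unhyphenated form (which, per the
above, is indexed as a single token) before appending the wildcard. A sketch
of that idea, not of anything Solr does itself:

```python
def wildcard_isbn_query(user_input):
    """Wildcard terms skip analysis, so strip hyphens to hit the indexed token."""
    return "text:" + user_input.replace("-", "") + "*"
```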

Btw. I've found and read an interesting blog about storing ISBNs and alikes 
here: 
http://robotlibrarian.billdueber.com/2012/03/solr-field-type-for-numericish-ids/
 However, I already store my ISBN also in a separate field, of type string, 
which works fine when I use this field for searching.

Best regards, sorry for the enormously long question and thank you for 
listening.

Sebastian


Easy way to preserve Solr Admin form input

2016-12-27 Thread Sebastian Riemer
Hi,

is there an easy way to preserve the query data I input in SolrAdmin?

E.g. when debugging a query, I often have the desire to reopen the current 
query in solrAdmin in a new browser tab to make slight adaptations to the query 
without losing the original query.  What happens instead is the form is opened 
blank in the new tab and I have to manually copy/paste the entered form values.

This is not such a big problem, when I only use the "Raw Query Parameters" 
field, but editing something in that tiny input is a real pain ...

I wonder how others come around this?

Sebastian



SolrJ bulk indexing documents - HttpSolrClient vs. ConcurrentUpdateSolrClient

2016-11-18 Thread Sebastian Riemer
Hi all,

I am looking to improve indexing speed when loading many documents as part of 
an import. I am using the SolrJ-Client and currently I add the documents 
one-by-one using HttpSolrClient and  its method add(SolrInputDocument doc, int 
commitWithinMs).

My first step would be to change that to use add(Collection<SolrInputDocument> 
docs, int commitWithinMs) instead, which I expect would already improve 
performance.
Does it matter which method I use? Besides the method taking a 
Collection<SolrInputDocument> there is also one that takes an 
Iterator<SolrInputDocument> ... and what about ConcurrentUpdateSolrClient? 
Should I use it for bulk indexing instead of HttpSolrClient?

Currently we are on version 5.5.0 of solr, and we don't run SolrCloud, i.e. 
only one instance etc.
Indexing 39657 documents (which result in a core size of appr. 127MB) took 
about 10 minutes with the one-by-one approach.
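
Whichever add() overload is used, the batching itself is trivial; sending
those ~40,000 docs in chunks of 1,000 would mean about 40 requests instead of
~40,000. A sketch (plain Python for illustration; the batch size is just a
starting point to tune):

```python
def batches(docs, size=1000):
    """Chunk a doc list so each add() call sends one batch instead of one doc."""
    return [docs[i:i + size] for i in range(0, len(docs), size)]

chunks = batches(list(range(39657)), 1000)
```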

Best regards and thanks for any suggestions,

Sebastian Riemer



AW: AW: group.facet=true and facet on field of type int -> org.apache.solr.common.SolrException: Exception during facet.field

2016-07-19 Thread Sebastian Riemer
Hi Tomás!

Many thanks for responding - I agree, I'd say 
https://issues.apache.org/jira/browse/SOLR-7495 is definitely  the same issue. 
I am working around that issue by using a STR-Field and copyField.

Thanks again,

Sebastian

-Ursprüngliche Nachricht-
Von: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com] 
Gesendet: Dienstag, 19. Juli 2016 12:59
An: solr-user@lucene.apache.org
Betreff: Re: AW: group.facet=true and facet on field of type int -> 
org.apache.solr.common.SolrException: Exception during facet.field

Hi Sebastian,
This looks like https://issues.apache.org/jira/browse/SOLR-7495

On Jul 19, 2016 3:46 AM, "Sebastian Riemer" <s.rie...@littera.eu> wrote:

> May I respectfully refer again to a question I posted last week?
>
> Thank you very much and a nice day to you all!
>
> Sebastian
> -
>
>
>
>
>
>
>
> Hi all,
>
> Tested on Solr 6.1.0 (as well as 5.4.0 and 5.5.0) using the "techproducts"
> example the following query throws the same exception as in my 
> original
> question:
>
> To reproduce:
> 1) set up the techproducts example:
> solr start -e techproducts -noprompt
> 2) go to Solr Admin:
> http://localhost:8983/solr/#/techproducts/query
> 3) in "Raw Query Parameters" enter:
>
> group=true&group.facet=true&indent=true&group.field=manu_id_s&facet=true&facet.field=popularity
> 4) Hit "Execute Query"
>
> [..]
> "error":{
> "metadata":[
>   "error-class","org.apache.solr.common.SolrException",
>   "root-error-class","java.lang.IllegalStateException"],
> "msg":"Exception during facet.field: popularity",
> "trace":"org.apache.solr.common.SolrException: Exception during
> facet.field: popularity\r\n\tat
> org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$50(Sim
> pleFacets.java:739)\r\n\tat 
> org.apache.solr.request.SimpleFacets$$Lambda$37/2022187546.call(Unknow
> n
> Source)\r\n\tat
> java.util.concurrent.FutureTask.run(FutureTask.java:266)\r\n\tat
> org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:672)\
> r\n\tat 
> org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.
> java:748)\r\n\tat 
> org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetC
> omponent.java:321)\r\n\tat 
> org.apache.solr.handler.component.FacetComponent.process(FacetComponen
> t.java:265)\r\n\tat 
> org.apache.solr.handler.component.SearchHandler.handleRequestBody(Sear
> chHandler.java:293)\r\n\tat 
> org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandle
> rBase.java:156)\r\n\tat 
> org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)\r\n\tat
> org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)\r\
> n\tat 
> org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)\r\n\t
> at 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:257)\r\n\tat 
> org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter
> .java:208)\r\n\tat 
> org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletH
> andler.java:1668)\r\n\tat 
> org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:
> 581)\r\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> va:143)\r\n\tat 
> org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java
> :548)\r\n\tat 
> org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandle
> r.java:226)\r\n\tat 
> org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandle
> r.java:1160)\r\n\tat 
> org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:5
> 11)\r\n\tat 
> org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler
> .java:185)\r\n\tat 
> org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler
> .java:1092)\r\n\tat 
> org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.ja
> va:141)\r\n\tat 
> org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(Conte
> xtHandlerCollection.java:213)\r\n\tat
> org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerColle
> ction.java:119)\r\n\tat 
> org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.
> java:134)\r\n\tat 
> org.eclipse.jetty.server.Server.handle(Server.java:518)\r\n\tat
> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)\r\n\
> tat 
> org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java
> :244)\r\n\tat 
> org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(Abstr

AW: group.facet=true and facet on field of type int -> org.apache.solr.common.SolrException: Exception during facet.field

2016-07-19 Thread Sebastian Riemer
earcher.search(IndexSearcher.java:660)\r\n\tat 
org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:473)\r\n\tat 
org.apache.solr.request.SimpleFacets.getGroupedCounts(SimpleFacets.java:638)\r\n\tat
 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:443)\r\n\tat
 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:380)\r\n\tat
 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$50(SimpleFacets.java:733)\r\n\t...
 37 more\r\n",
"code":500}}


Could anyone please explain if this is expected behaviour, point out what I am 
doing wrong, or confirm that this is not expected behaviour?

Many thanks and best regards,
Sebastian

-Ursprüngliche Nachricht-
Von: Sebastian Riemer [mailto:s.rie...@littera.eu] 
Gesendet: Freitag, 8. Juli 2016 14:55
An: solr-user@lucene.apache.org
Betreff: group.facet=true and facet on field of type int -> 
org.apache.solr.common.SolrException: Exception during facet.field

Hi all,

are there any limitations in regard to retrieval of facet information, when 
grouping?

When I send the following query, where the field to facet on ("m_pt_14_s_ns") is 
of type "string", everything works fine.

"q": "*:*",
"facet.field": "m_pt_14_s_ns",
"indent": "true",
"group.facet": "true",
"fq": "m_id_l:[* TO *]",
"wt": "json",
"facet": "true",
"group.field": "m_id_l",
"group": "true",

 [..]
"facet_fields": {
  "m_pt_14_s_ns": [
"zahlr. Ill.",
7,
"Ill.",
4,
"Ill., graph. Darst.",
2,
"zahlr. Ill. (z.T. farb.))",
   2,
"überw. Ill.",
1,
"zahlr. Cartoons jetzt durchgehend zweifarbig illustriert",
1,
"zahlr. Ill., Kt.",
1,
"zahlr. Ill., graph. Darst.",
1,
"überw. Ill.",
1
  ]
},

When I try the same with a field of type “int” (m_pt_27_i_ns), I get the 
following exception:

"q": "*:*",
"facet.field": "m_pt_27_i_ns",
"indent": "true",
"group.facet": "true",
"fq": "m_id_l:[* TO *]",
"wt": "json",
"facet": "true",
"group.field": "m_id_l",
"group": "true",

 [..]
 "error": {
"metadata": [
  "error-class",
  "org.apache.solr.common.SolrException",
  "root-error-class",
  "java.lang.IllegalStateException"
],
"msg": "Exception during facet.field: m_pt_27_i_ns",
"trace": "org.apache.solr.common.SolrException: Exception during 
facet.field: m_pt_27_i_ns\r\n\tat 
org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:700)\r\n\tat 
org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:685)\r\n\tat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)\r\n\tat 
org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:639)\r\n\tat 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:710)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:294)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:256)\r\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)\r\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\r\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)\r\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)\r\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScop

AW: group.facet=true and facet on field of type int -> org.apache.solr.common.SolrException: Exception during facet.field

2016-07-12 Thread Sebastian Riemer
upedCounts(SimpleFacets.java:638)\r\n\tat
 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:443)\r\n\tat
 
org.apache.solr.request.SimpleFacets.getTermCounts(SimpleFacets.java:380)\r\n\tat
 
org.apache.solr.request.SimpleFacets.lambda$getFacetFieldCounts$50(SimpleFacets.java:733)\r\n\t...
 37 more\r\n",
"code":500}}


Could anyone please explain if this is expected behaviour, point out what I am 
doing wrong, or confirm that this is not expected behaviour?

Many thanks and best regards,
Sebastian

-Ursprüngliche Nachricht-
Von: Sebastian Riemer [mailto:s.rie...@littera.eu] 
Gesendet: Freitag, 8. Juli 2016 14:55
An: solr-user@lucene.apache.org
Betreff: group.facet=true and facet on field of type int -> 
org.apache.solr.common.SolrException: Exception during facet.field

Hi all,

are there any limitations in regard to retrieval of facet information, when 
grouping?

When I send the following query, where the field to facet on ("m_pt_14_s_ns") is 
of type "string", everything works fine.

"q": "*:*",
"facet.field": "m_pt_14_s_ns",
"indent": "true",
"group.facet": "true",
"fq": "m_id_l:[* TO *]",
"wt": "json",
"facet": "true",
"group.field": "m_id_l",
"group": "true",

 [..]
"facet_fields": {
  "m_pt_14_s_ns": [
"zahlr. Ill.",
7,
"Ill.",
4,
"Ill., graph. Darst.",
2,
"zahlr. Ill. (z.T. farb.))",
   2,
"überw. Ill.",
1,
"zahlr. Cartoons jetzt durchgehend zweifarbig illustriert",
1,
"zahlr. Ill., Kt.",
1,
"zahlr. Ill., graph. Darst.",
1,
"überw. Ill.",
1
  ]
},

When I try the same with a field of type “int” (m_pt_27_i_ns), I get the 
following exception:

"q": "*:*",
"facet.field": "m_pt_27_i_ns",
"indent": "true",
"group.facet": "true",
"fq": "m_id_l:[* TO *]",
"wt": "json",
"facet": "true",
"group.field": "m_id_l",
"group": "true",

 [..]
 "error": {
"metadata": [
  "error-class",
  "org.apache.solr.common.SolrException",
  "root-error-class",
  "java.lang.IllegalStateException"
],
"msg": "Exception during facet.field: m_pt_27_i_ns",
"trace": "org.apache.solr.common.SolrException: Exception during 
facet.field: m_pt_27_i_ns\r\n\tat 
org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:700)\r\n\tat 
org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:685)\r\n\tat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)\r\n\tat 
org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:639)\r\n\tat 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:710)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:294)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:256)\r\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)\r\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\r\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)\r\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)\r\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandl

group.facet=true and facet on field of type int -> org.apache.solr.common.SolrException: Exception during facet.field

2016-07-08 Thread Sebastian Riemer
Hi all,

Are there any limitations regarding the retrieval of facet information when 
grouping?

When I send the following query, where the field to facet on ("m_pt_14_s_ns") is 
of type "string", everything works fine.

"q": "*:*",
"facet.field": "m_pt_14_s_ns",
"indent": "true",
"group.facet": "true",
"fq": "m_id_l:[* TO *]",
"wt": "json",
"facet": "true",
"group.field": "m_id_l",
"group": "true",

 [..]
"facet_fields": {
  "m_pt_14_s_ns": [
"zahlr. Ill.",
7,
"Ill.",
4,
"Ill., graph. Darst.",
2,
"zahlr. Ill. (z.T. farb.))",
   2,
"überw. Ill.",
1,
"zahlr. Cartoons jetzt durchgehend zweifarbig illustriert",
1,
"zahlr. Ill., Kt.",
1,
"zahlr. Ill., graph. Darst.",
1,
"überw. Ill.",
1
  ]
},

When I try the same with a field of type “int” (m_pt_27_i_ns), I get the 
following exception:

"q": "*:*",
"facet.field": "m_pt_27_i_ns",
"indent": "true",
"group.facet": "true",
"fq": "m_id_l:[* TO *]",
"wt": "json",
"facet": "true",
"group.field": "m_id_l",
"group": "true",

 [..]
 "error": {
"metadata": [
  "error-class",
  "org.apache.solr.common.SolrException",
  "root-error-class",
  "java.lang.IllegalStateException"
],
"msg": "Exception during facet.field: m_pt_27_i_ns",
"trace": "org.apache.solr.common.SolrException: Exception during 
facet.field: m_pt_27_i_ns\r\n\tat 
org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:700)\r\n\tat 
org.apache.solr.request.SimpleFacets$3.call(SimpleFacets.java:685)\r\n\tat 
java.util.concurrent.FutureTask.run(FutureTask.java:266)\r\n\tat 
org.apache.solr.request.SimpleFacets$2.execute(SimpleFacets.java:639)\r\n\tat 
org.apache.solr.request.SimpleFacets.getFacetFieldCounts(SimpleFacets.java:710)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.getFacetCounts(FacetComponent.java:294)\r\n\tat
 
org.apache.solr.handler.component.FacetComponent.process(FacetComponent.java:256)\r\n\tat
 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:272)\r\n\tat
 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:155)\r\n\tat
 org.apache.solr.core.SolrCore.execute(SolrCore.java:2082)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:670)\r\n\tat 
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:458)\r\n\tat 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:225)\r\n\tat
 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:183)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1652)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:585)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)\r\n\tat
 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:577)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:223)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1127)\r\n\tat
 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:515)\r\n\tat
 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1061)\r\n\tat
 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat
 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:215)\r\n\tat
 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:110)\r\n\tat
 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:97)\r\n\tat
 org.eclipse.jetty.server.Server.handle(Server.java:499)\r\n\tat 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:310)\r\n\tat 
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:257)\r\n\tat
 
org.eclipse.jetty.io.AbstractConnection$2.run(AbstractConnection.java:540)\r\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:635)\r\n\tat
 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:555)\r\n\tat
 java.lang.Thread.run(Thread.java:745)\r\nCaused by: 
java.lang.IllegalStateException: unexpected docvalues type NUMERIC for field 
'm_pt_27_i_ns' (expected=SORTED). Use UninvertingReader or index with 
docvalues.\r\n\tat 
org.apache.lucene.index.DocValues.checkField(DocValues.java:208)\r\n\tat 
org.apache.lucene.index.DocValues.getSorted(DocValues.java:264)\r\n\tat 
org.apache.lucene.search.grouping.term.TermGroupFacetCollector$SV.doSetNextReader(TermGroupFacetCollector.java:129)\r\n\tat
 
org.apache.lucene.search.SimpleCollector.getLeafCollector(SimpleCollector.java:33)\r\n\tat
 
org.apache.solr.request.SimpleFacets$1.getLeafCollector(SimpleFacets.java:601)\r\n\tat
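The root-error-class in the trace above ("unexpected docvalues type NUMERIC for field 'm_pt_27_i_ns' (expected=SORTED)") points at the underlying limitation: group.facet=true runs through TermGroupFacetCollector, which expects SORTED (string-like) doc values. A hedged workaround sketch, assuming a schema.xml in the style of the dynamic fields used in this thread (the field name m_pt_27_facet_s is made up for illustration):

```xml
<!-- Sketch only: copy the int field into a string field and facet on the
     copy. Field and type names are illustrative, not from the thread. -->
<field name="m_pt_27_facet_s" type="string" indexed="true" stored="false"
       docValues="true"/>
<copyField source="m_pt_27_i_ns" dest="m_pt_27_facet_s"/>
```

After reindexing, faceting with group.facet=true on m_pt_27_facet_s should avoid the IllegalStateException; the facet keys are then strings, so any numeric ordering of the counts has to be handled client-side.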

"Block join faceting is allowed with ToParentBlockJoinQuery only"

2016-07-06 Thread Sebastian Riemer
Hi,

Please consider the following three queries:


(1)this works:

{
"responseHeader": {
"status": 0,
"QTime": 5,
"params": {
  "q": "(type_s:wemi AND {!parent which='type_s:wemi'v='-type_s:wemi AND 
(((text:(Moby*'})",
  "facet.field": "m_mainAuthority_s",
  "indent": "true",
  "fq": "m_id_l:[* TO *]",
  "wt": "json",
  "facet": "true",
  "child.facet.field": [
"corporateBodyContainer_name_t_ns_fac",
"personContainer_name_t_ns_fac"
  ],
  "_": "1467801413472"
}
  },

(2)this also works:

"responseHeader": {

"status": 0,

"QTime": 0,

"params": {

  "q": "(((text:(Moby*(type_s:wemi AND {!parent 
which='type_s:wemi'v='-type_s:wemi AND (((text:(Moby*'})",

  "indent": "true",

  "fq": "m_id_l:[* TO *]",

  "wt": "json",

  "_": "1467801481986"

}

  },



(3)this does not:

{

  "responseHeader": {

"status": 400,

"QTime": 3,

"params": {

  "q": "(((text:(Moby*(type_s:wemi AND {!parent 
which='type_s:wemi'v='-type_s:wemi AND (((text:(Moby*'})",

  "facet.field": "m_mainAuthority_s",

  "indent": "true",

  "fq": "m_id_l:[* TO *]",

  "wt": "json",

  "facet": "true",

  "child.facet.field": [

"corporateBodyContainer_name_t_ns_fac",

"personContainer_name_t_ns_fac"

  ],

  "_": "1467801452826"

}

  },


(1) returns parent documents where the child document contains the term 
"Moby*", including facets on a parent doc field AND facets on child doc fields 
(Nice!)

(2) returns parent documents where either the parent document or the 
child document contains the term "Moby*" (Hell yea!)

(3) fails with the error message "Block join faceting is allowed with 
ToParentBlockJoinQuery only" (Nay :()

So I want both: the possibility to search for a term in all fields of the 
parent and the child docs AND to receive the facet counts for fields of the 
parent AND the child. Is what I long for possible, and if so, could you please 
point me in the right direction?

Many thanks,
Sebastian
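The error in (3) comes from the block-join facet component: child.facet.field only works when the top-level query is a single {!parent} (ToParentBlockJoinQuery) clause, which the extra ORed parent-side clause in (3) breaks. A hedged sketch of one workaround, keeping q as the pure block-join query (the parameter values follow the garbled queries above, so treat them as illustrative):

```
q={!parent which='type_s:wemi' v='-type_s:wemi AND text:(Moby*)'}
&fq=m_id_l:[* TO *]
&facet=true
&facet.field=m_mainAuthority_s
&child.facet.field=corporateBodyContainer_name_t_ns_fac
&child.facet.field=personContainer_name_t_ns_fac
```

To also match the term on parent fields, one option is a second request with q=type_s:wemi AND text:(Moby*) and merging the facet counts client-side; in later Solr versions the JSON Facet API with block-join domains is another route.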


How to best serialize/deserialize a SolrInputDocument?

2016-06-30 Thread Sebastian Riemer
Hi,

I am looking for a way to serialize a SolrInputDocument.

I want to store the serialized document in a MySQL table.

Later I want to deserialize that document and send it to the Solr server.

Currently I am looking at org.apache.solr.client.solrj.request.UpdateRequest 
and JavaBinUpdateRequestCodec. There are two methods, marshal and unmarshal 
which look like I could use for that purpose.

I'd simply create an UpdateRequest, add the document to it, call marshal, save 
the OutputStream somehow in the MySQL table. When retrieving I pass the value 
from the MySQL as InputStream to the unmarshal method, get my UpdateRequest 
object, iterate the contained SolrInputDocument and send it to the server.

Am I on the right track, or is there a better approach?

The background to this is, that we want backup the generated documents which 
are indexed with solr. So if a client restores a backup, that MySQL table with 
the serialized documents can be used to rebuild the index as quickly as 
possible.

Thanks,
Sebastian
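The marshal/unmarshal route looks like the right track. One practical wrinkle: JavaBinUpdateRequestCodec.marshal produces binary output, which is safest to store in a BLOB column, or Base64-encoded if a text column must be used. A minimal, runnable sketch of the storage round-trip, using a placeholder byte array where the real codec output would go (the marshal/unmarshal calls named in the comments are the SolrJ methods discussed above; everything else is illustrative):

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

public class MarshalRoundTrip {

    // Placeholder for the bytes JavaBinUpdateRequestCodec.marshal(updateRequest, out)
    // would write to a ByteArrayOutputStream.
    static byte[] marshalledBytes() {
        return "javabin-payload".getBytes(StandardCharsets.UTF_8);
    }

    // Encode the binary payload so it survives storage in a MySQL TEXT column.
    static String toColumnValue(byte[] marshalled) {
        return Base64.getEncoder().encodeToString(marshalled);
    }

    // Decode the stored column value back into the original bytes.
    static byte[] fromColumnValue(String column) {
        return Base64.getDecoder().decode(column);
    }

    public static void main(String[] args) {
        byte[] original = marshalledBytes();
        String stored = toColumnValue(original);   // value to INSERT into MySQL
        byte[] restored = fromColumnValue(stored); // value read back via SELECT
        // restored would be wrapped in a ByteArrayInputStream and handed to
        // JavaBinUpdateRequestCodec.unmarshal(in, handler) to rebuild the request.
        System.out.println(Arrays.equals(original, restored));
    }
}
```

Using a BLOB column instead makes the Base64 step unnecessary, at the cost of a slightly less inspectable table.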




AW: How many cores is too many cores?

2016-06-21 Thread Sebastian Riemer
Thanks for your respone Erick!

Currently we are trying to keep things simple so we don't use SolrCloud.

I'll give it a look; the configuration seems easy, however testing with many 
clients in parallel does not.

Thanks again,
Sebastian

-Ursprüngliche Nachricht-
Von: Erick Erickson [mailto:erickerick...@gmail.com] 
Gesendet: Dienstag, 21. Juni 2016 01:52
An: solr-user <solr-user@lucene.apache.org>
Betreff: Re: How many cores is too many cores?

Sebastian:

It Depends (tm). Solr can handle this, but there are caveats. Is this SolrCloud 
or not? Each core will consume some resources, and there are some JIRAs out 
there specifically about that many cores in SolrCloud.
If your problem space works with the LotsOfCores, start here:
https://cwiki.apache.org/confluence/display/solr/Format+of+solr.xml
and
https://cwiki.apache.org/confluence/display/solr/Defining+core.properties
The idea is that if your access pattern is
> sign on
> ask some questions
> go away
you can configure it so that only N cores are loaded at any one time.
Theoretically you can have a huge number of cores (I've tested with
15,000) defined, but only say 100 active at a time.

There are also options you can specify that cause a core to not be loaded until 
requested, but not aged out.

The 1,500-core case will keep Solr from coming up until all of the cores have 
been opened, which can be lengthy. You can define the number of threads that 
run in parallel to open the cores, but the default is unlimited, so you can 
run out of threads (really, memory).

So the real answer is "it's not insane, but you really need to test it 
operationally and tweak a bunch of settings before making your decision"

Best,
Erick

On Mon, Jun 20, 2016 at 12:49 PM, Sebastian Riemer <s.rie...@littera.eu> wrote:
> Hi,
>
> Currently I have a single solr server handling 5 cores which differ in the 
> content they provide.
>
> However, each of them might hold data for many different clients/customers. 
> Let's say for example one day there might be 300 different clients each 
> storing their data in those 5 cores.
>
> Every client can make backups of his data and import that data back into our 
> system. That however, makes re-indexing all of his documents in the cores 
> necessary, which A) is very slow at the moment since fetching the data from 
> MySQL-DB is slow and B) would slow down searches for all other clients while 
> the reindexing is taking place, right?
>
> Now my idea would be:
>
> What if each client gets his own 5 cores? Then instead of re-indexing I could 
> simply copy back the solr-index files (which I copied while making the 
> backup) into his core-directories, right?
>
> That would lead to about 5 x 300 cores, equals 1500 cores.
>
> Am I insane by thinking that way?
>
> Best regards,
> Sebastian
>
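The transient-core settings Erick describes can be sketched roughly as follows; the option names (transient, loadOnStartup, transientCacheSize) are the documented core.properties and solr.xml ones, the values are illustrative:

```
# core.properties for each per-client core (values illustrative):
name=client42_books
transient=true        # core may be unloaded when the transient cache is full
loadOnStartup=false   # do not open the core until it is first requested

# solr.xml: cap how many transient cores stay loaded at once
<solr>
  <int name="transientCacheSize">100</int>
</solr>
```

With this in place, startup no longer waits on all 1,500 cores, and the memory footprint is bounded by the cache size rather than the total core count.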


How many cores is too many cores?

2016-06-20 Thread Sebastian Riemer
Hi,

Currently I have a single solr server handling 5 cores which differ in the 
content they provide.

However, each of them might hold data for many different clients/customers. 
Let's say for example one day there might be 300 different clients each storing 
their data in those 5 cores.

Every client can make backups of his data and import that data back into our 
system. That, however, makes re-indexing all of his documents in the cores 
necessary, which A) is very slow at the moment, since fetching the data from 
the MySQL DB is slow, and B) would slow down searches for all other clients 
while the reindexing is taking place, right?

Now my idea would be:

What if each client gets his own 5 cores? Then instead of re-indexing I could 
simply copy back the solr-index files (which I copied while making the backup) 
into his core-directories, right?

That would lead to about 5 x 300 cores, equals 1500 cores.

Am I insane by thinking that way?

Best regards,
Sebastian



Get documents having a boolean field:false or not having the field at all

2016-05-19 Thread Sebastian Riemer
Hi,

I've introduced a new boolean field "is_deleted_b_ns" on my objects which I 
index with Solr. I am using dynamic field definitions ("b" indicating Boolean, 
"ns" for "not stored").

Since the field did not exist while the index was built, none of my documents 
currently has that field indexed.

My queries from now on must always include this new boolean field: they either 
ask the index for is_deleted_b_ns:false or for is_deleted_b_ns:true. However, 
since the field is not yet indexed, both queries return 0 results.

I see two ways I could go from here:

1)  Either rebuild the whole index so that all documents index this newly 
added field as well (time consuming) and then above queries will return the 
expected results.

2)  I think I could broaden my query by OR-combining 
is_deleted_b_ns:false with -is_deleted_b_ns:[* TO *]

That would mean "give me the documents where the flag is false or where it does 
not exist at all"

Doing 1) is OK for now, since this is a big change and we're not in production 
yet. Doing 2) feels kind of bad, since I don't know whether it's a big 
performance hit. I also don't like it because it means reacting to the current 
state of the index in my program code: someday the index will be up to date 
again, and then I'd be left with this broader query logic that is no longer 
needed.

However 1) will be a problem when we are in production someday. Sure, we won't 
have changes that big all time to the index schema but one never knows.

What's your opinion on this? Maybe there is another option as well?

Best regards,
Sebastian
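For option 2), the OR-combined clause can live in a single filter query; note also that a plain negation already matches documents where the field is absent, which may be the simplest form. A hedged sketch (field name follows the dynamic-field naming above):

```
# "false or missing", the pure-negative clause wrapped with *:* so it
# is valid on its own:
fq=is_deleted_b_ns:false OR (*:* -is_deleted_b_ns:[* TO *])

# shorter equivalent: everything that is not explicitly flagged deleted
fq=-is_deleted_b_ns:true
```

The second form relies on Solr accepting a pure-negative top-level clause in fq. Both forms stay correct after the full reindex, so the broader logic would not have to be removed later.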

Mit freundlichen Grüßen
Sebastian Riemer, BSc





AW: How to find out if index contains orphaned child documents

2016-05-10 Thread Sebastian Riemer
Sorry for the double post. Formatting got lost too :(

Whenever I mention the field "type" I actually mean "type_s".


-Ursprüngliche Nachricht-
Von: Sebastian Riemer [mailto:s.rie...@littera.eu] 
Gesendet: Dienstag, 10. Mai 2016 11:47
An: solr-user@lucene.apache.org
Betreff: How to find out if index contains orphaned child documents

Hi all,



I have the suspicion that my index might contain orphaned child documents 
because a query restricting to a field on a child document field returns two 
parent documents where I only expect one document to match the query. As I 
cannot figure out any obvious reason why the second document is returned, I 
suspect something is going wrong elsewhere. (See the query link and the result 
in very small font at the end of mail).



Therefore I would like to know whether there is a simple way to find out if my 
index contains orphaned child documents?



In my index I have parent documents which are marked through field 
"type_s:wemi" and I have child documents (amongst other) marked through field 
"type:cat_title". They share the same ID in a field called "wemiId”.



So I guess I would have to phrase a query like “are there any documents with a 
type_s other than wemi for which there are no documents with type wemi having 
the same wemiId?”



If you need further information I am happy to provide, thanks for your help!



Sebastian
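A hedged sketch of such an orphan-detection query using the (non-block) join query parser, assuming wemiId is indexed on both parents and children as described: it keeps documents that are not parents and whose wemiId does not join to any type_s:wemi document.

```
q=*:* -type_s:wemi -_query_:"{!join from=wemiId to=wemiId v='type_s:wemi'}"
&rows=0
```

A nonzero numFound would indicate orphaned children. The join can be expensive on a high-cardinality field like wemiId, so it is worth running against a test copy of the index first.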





Query in multiple formats:



http://localhost:8983/solr/wemi/select?q=*:*=client_id:1=cat_db_id:4294967297=m_id_l:[*
 TO *]=(type_s:wemi AND {!parent which='type_s:wemi'v='(((type_s:cat_title 
AND titles_name_t_ns:("Neuland unter den 
Sandalen"'})=0=15=json=true



http://localhost:8983/solr/wemi/select?q=*%3A*=client_id%3A1=cat_db_id%3A4294967297=m_id_l%3A%5B*+TO+*%5D=(type_s%3Awemi+AND+%7B!parent+which%3D%27type_s%3Awemi%27v%3D%27(((type_s%3Acat_title+AND+titles_name_t_ns%3A(%22Neuland+unter+den+Sandalen%22%27%7D)=0=15=json=true



start=0

=15

=client_id:1

=cat_db_id:4294967297

=m_id_l:[* TO *]

=(type_s:wemi AND {!parent which='type_s:wemi'v='(((type_s:cat_title AND 
titles_name_t_ns:("Neuland unter den Sandalen"'})

=*:*

=true

=true

=1

=true

=true

=true

=m_id_l

=m_id_l desc

={!ex=m_mt_0 key=m_mt_0}m_mediaType_lang_2_s



Result of the query:

(to verify that the result is strange, look for the text “Neuland unter den 
Sandalen”, which seems to occur in only one of the two documents)



{

  "responseHeader":{

"status":0,

"QTime":15,

"params":{

  "q":"*:*",

  "indent":"true",

  "start":"0",

  "fq":["client_id:1",

"cat_db_id:4294967297",

"m_id_l:[* TO *]",

"(type_s:wemi AND {!parent which='type_s:wemi'v='(((type_s:cat_title 
AND titles_name_t_ns:(\"Neuland unter den Sandalen\"'})"],

  "rows":"15",

  "wt":"json"}},

  "response":{"numFound":2,"start":0,"docs":[

  {

"type_s":"wemi",

"text":["wemi",

  "4294985955",

  "Work",

  "Werk",

  "Opera",

  "",

  "",

  "Neuland unter den Sandalen ; Müller, Christoph",

  "Müller, Christoph",

  "Neuland unter den Sandalen",

  "4294984086",

  "Neuland unter den Sandalen",

  "Expression",

  "Expression",

  "Espressione",

  "",

  "",

  "Neuland unter den Sandalen",

  "German",

  "Deutsch",

  "Tedesco",

  "German",

  "German",

  "TEXT",

  "4294985990",

  "Neuland unter den Sandalen ; Müller, Christoph",

  "Neuland unter den Sandalen",

  "Book",

  "Buch",

  "Libro",

  "",

  "",

  "Müller, Christoph",

  "Verlagsangaben Angaben aus der Verlagsmeldung \n\n \n\n  Bete, 
arbeite und brich auf! : Ein Benediktiner auf dem Jakobsweg / von Christoph 
Müller \n\n \nWas ein Ordensmann auf dem Jakobsweg erlebt: \nZum \"Ora et 
Labora\" gesellt sich bei Benediktinerpater Christoph das Pilgern hinzu. 
Zunächst per Fahrrad, später auf Schusters Rappen, erlebt er Freud- und 
Leidvolles bis Santiago. Gute Beobachtungsgabe, Sinn für Situationskomik und 
die benediktinische Spiritualität, die immer wiede

How to find out if index contains orphaned child documents

2016-05-10 Thread Sebastian Riemer
Hi all,



I have the suspicion that my index might contain orphaned child documents 
because a query restricting to a field on a child document field returns two 
parent documents where I only expect one document to match the query. As I 
cannot figure out any obvious reason why the second document is returned, I 
suspect something is going wrong elsewhere. (See the query link and the result 
in very small font at the end of mail).



Therefore I would like to know whether there is a simple way to find out if my 
index contains orphaned child documents?



In my index I have parent documents which are marked through field 
"type_s:wemi" and I have child documents (amongst other) marked through field 
"type:cat_title". They share the same ID in a field called "wemiId”.



So I guess I would have to phrase a query like “are there any documents with a 
type_s other than wemi for which there are no documents with type wemi having 
the same wemiId?”



If you need further information I am happy to provide, thanks for your help!



Sebastian





Query in multiple formats:



http://localhost:8983/solr/wemi/select?q=*:*=client_id:1=cat_db_id:4294967297=m_id_l:[*
 TO *]=(type_s:wemi AND {!parent which='type_s:wemi'v='(((type_s:cat_title 
AND titles_name_t_ns:("Neuland unter den 
Sandalen"'})=0=15=json=true



http://localhost:8983/solr/wemi/select?q=*%3A*=client_id%3A1=cat_db_id%3A4294967297=m_id_l%3A%5B*+TO+*%5D=(type_s%3Awemi+AND+%7B!parent+which%3D%27type_s%3Awemi%27v%3D%27(((type_s%3Acat_title+AND+titles_name_t_ns%3A(%22Neuland+unter+den+Sandalen%22%27%7D)=0=15=json=true



start=0

=15

=client_id:1

=cat_db_id:4294967297

=m_id_l:[* TO *]

=(type_s:wemi AND {!parent which='type_s:wemi'v='(((type_s:cat_title AND 
titles_name_t_ns:("Neuland unter den Sandalen"'})

=*:*

=true

=true

=1

=true

=true

=true

=m_id_l

=m_id_l desc

={!ex=m_mt_0 key=m_mt_0}m_mediaType_lang_2_s



Result of the query:

(to verify that the result is strange, look for the text “Neuland unter den 
Sandalen”, which seems to only occur in one of the two documents)



{

  "responseHeader":{

"status":0,

"QTime":15,

"params":{

  "q":"*:*",

  "indent":"true",

  "start":"0",

  "fq":["client_id:1",

"cat_db_id:4294967297",

"m_id_l:[* TO *]",

"(type_s:wemi AND {!parent which='type_s:wemi'v='(((type_s:cat_title 
AND titles_name_t_ns:(\"Neuland unter den Sandalen\"'})"],

  "rows":"15",

  "wt":"json"}},

  "response":{"numFound":2,"start":0,"docs":[

  {

"type_s":"wemi",

"text":["wemi",

  "4294985955",

  "Work",

  "Werk",

  "Opera",

  "",

  "",

  "Neuland unter den Sandalen ; Müller, Christoph",

  "Müller, Christoph",

  "Neuland unter den Sandalen",

  "4294984086",

  "Neuland unter den Sandalen",

  "Expression",

  "Expression",

  "Espressione",

  "",

  "",

  "Neuland unter den Sandalen",

  "German",

  "Deutsch",

  "Tedesco",

  "German",

  "German",

  "TEXT",

  "4294985990",

  "Neuland unter den Sandalen ; Müller, Christoph",

  "Neuland unter den Sandalen",

  "Book",

  "Buch",

  "Libro",

  "",

  "",

  "Müller, Christoph",

  "Verlagsangaben Angaben aus der Verlagsmeldung \n\n \n\n  Bete, 
arbeite und brich auf! : Ein Benediktiner auf dem Jakobsweg / von Christoph 
Müller \n\n \nWas ein Ordensmann auf dem Jakobsweg erlebt: \nZum \"Ora et 
Labora\" gesellt sich bei Benediktinerpater Christoph das Pilgern hinzu. 
Zunächst per Fahrrad, später auf Schusters Rappen, erlebt er Freud- und 
Leidvolles bis Santiago. Gute Beobachtungsgabe, Sinn für Situationskomik und 
die benediktinische Spiritualität, die immer wieder durchscheint, machen diesen 
Pilgerbericht zu einem niveauvollen Leseerlebnis.",

  "1",

  "UNSPECIFIED",

  "Christoph Müller",

  "UNMEDIATED",

  "Ill., Kt.",

  "German",

  "Deutsch",

  "Tedesco",

  "German",

  "German",

  "205 S.",

  "4294985812",

  "4294985990",

  "4294967297",

  "2016-05-10T00:00:00Z",

  "Mü",

  "18449",

  "false",

  "1",

  "Available",

  "Verfügbar",

  "Disponibile",

  "",

  "",

  "true",

  "http://;],

"wemiId":"4294985955429498408642949859904294985812",

"id":"4294985955429498408642949859904294985812",

"w_id_l":4294985955,

"w_mediaType_lang_1_s":"Work",

"w_mediaType_lang_2_s":"Werk",

"w_mediaType_lang_3_s":"Opera",


AW: Duplicate Document IDs when updating parent document with child document

2016-03-09 Thread Sebastian Riemer
Hi,

To describe my problem briefly, instead of just linking to the test 
application: using SolrJ I do the following:

1) Create a new document as a parent and commit
SolrInputDocument parentDoc = new SolrInputDocument();
parentDoc.addField("id", "parent_1");
parentDoc.addField("name_s", "Sarah Connor");
parentDoc.addField("blockJoinId", "1");
solrClient.add(parentDoc);
solrClient.commit();

2) Create a new document with the same unique-id as in 1) with a child document 
appended
SolrInputDocument parentDocUpdating = new SolrInputDocument();
parentDocUpdating.addField("id", "parent_1");
parentDocUpdating.addField("name_s", "Sarah Connor");
parentDocUpdating.addField("blockJoinId", "1");

SolrInputDocument childDoc = new SolrInputDocument();
childDoc.addField("id", "child_1");
childDoc.addField("name_s", "John Connor");
childDoc.addField("blockJoinId", "1");

parentDocUpdating.addChildDocument(childDoc);
solrClient.add(parentDocUpdating);
solrClient.commit();

3) This results in 2 documents with id="parent_1" in the Solr index.

Is this normal behaviour? I thought the existing document would be updated 
instead of a new document with the same id being created.

For a full working test application, please see the original message.

Best regards,
Sebastian

-Original Message-
From: Sebastian Riemer [mailto:s.rie...@littera.eu] 
Sent: Tuesday, 8 March 2016 20:05
To: solr-user@lucene.apache.org
Subject: Duplicate Document IDs when updating parent document with child 
document

Hi,

I have created a simple Java application which illustrates this issue.

I am using Solr version 5.5.0 and SolrJ.

Here is a link to the github repository: 
https://github.com/sebastianriemer/SolrDuplicateTest

The issue I am facing is also described by another person on Stack Overflow: 
http://stackoverflow.com/questions/34253178/solr-doesnt-overwrite-duplicated-uniquekey-entries

I would love it if any of you could run the test on your machine and give me 
feedback.

If you have any questions, do not hesitate to write to me.

Many thanks in advance and best regards,

Sebastian Riemer





Duplicate Document IDs when updating parent document with child document

2016-03-08 Thread Sebastian Riemer
Hi,

I have created a simple Java application which illustrates this issue.

I am using Solr version 5.5.0 and SolrJ.

Here is a link to the github repository: 
https://github.com/sebastianriemer/SolrDuplicateTest

The issue I am facing is also described by another person on Stack Overflow: 
http://stackoverflow.com/questions/34253178/solr-doesnt-overwrite-duplicated-uniquekey-entries

I would love it if any of you could run the test on your machine and give me 
feedback.

If you have any questions, do not hesitate to write to me.

Many thanks in advance and best regards,

Sebastian Riemer