Javadocs are not linkable
Hi,

I recently noticed that the Solr javadocs hosted on lucene.apache.org are not linkable, as the "package-list" file is not downloadable. Is this on purpose?

$ curl https://lucene.apache.org/solr/8_4_0/solr-solrj/package-list
<title>301 Moved Permanently</title>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://lucene.apache.org/solr/8_4_0/solr-solrj/package-list/">here</a>.</p>

It's the same issue with older versions. My maven build fails with:

MavenReportException: Error while generating Javadoc:
[ERROR] Exit code: 1 - javadoc: error - Error fetching URL: https://lucene.apache.org/solr/8_3_0/solr-solrj/

kind regards

Thomas
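Until the package-list is reachable again, one possible workaround is javadoc's offline-link support in the maven-javadoc-plugin. This is an untested sketch; it assumes you manually download a copy of package-list into a directory inside your project:

    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-javadoc-plugin</artifactId>
      <configuration>
        <offlineLinks>
          <offlineLink>
            <url>https://lucene.apache.org/solr/8_4_0/solr-solrj/</url>
            <!-- local directory containing the downloaded package-list -->
            <location>${project.basedir}/src/javadoc/solr-solrj</location>
          </offlineLink>
        </offlineLinks>
      </configuration>
    </plugin>

This maps to javadoc's -linkoffline option, so the build no longer fetches the package-list over HTTP at all.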
Re: Memory Leak in 7.3 to 7.4
Hi,

my final verdict: the cause is the upgrade to Tika 1.17. If I downgrade just the Tika libraries back to 1.16 and keep the rest of Solr 7.4.0, the heap usage after about 85% of the index process and a manual trigger of the garbage collector is about 60-70 MB (That low!!!)

My problem now is that we have several setups that trigger this reliably, but there is no simple test case that "fails" if Tika 1.17 or 1.18 is used. I also do not know if the error is inside Tika or inside the glue code that makes Tika usable in Solr. Should I file an issue for this?

kind regards,

Thomas

> On 02.08.2018 at 12:06, Thomas Scheffler wrote:
>
> Hi,
>
> we noticed a memory leak in a rather small setup: 40,000 metadata documents
> with nearly as many files that have "literal.*" fields with them. While
> 7.2.1 had brought some tika issues (due to a beta version), the real
> problems started to appear with version 7.3.0 and are currently unresolved
> in 7.4.0. Memory consumption has gone through the roof. Where previously
> 512MB heap was enough, now 6G aren't enough to index all files.
> I am now at a point where I can track this down to the libraries in
> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries
> shipped with 7.2.1 the problem disappears. As most files are PDF documents
> I tried updating pdfbox to 2.0.11 and tika to 1.18, with no solution to the
> problem. I will next try to downgrade these single libraries back to 2.0.6
> and 1.16 to see if they are the source of the memory leak.
>
> In the meantime I would like to know if anybody else has experienced the
> same problems?
>
> kind regards,
>
> Thomas
Re: Memory Leak in 7.3 to 7.4
Hi,

Solr ships with a script that handles OOM errors and produces a log file for every case, with content like this:

Running OOM killer script for process 9015 for Solr on port 28080
Killed process 9015

This script works ;-)

kind regards

Thomas

> On 02.08.2018 at 12:28, Vincenzo D'Amore wrote:
>
> Not clear if you had experienced an OOM error.
>
> On Thu, Aug 2, 2018 at 12:06 PM Thomas Scheffler <
> thomas.scheff...@uni-jena.de> wrote:
>
>> Hi,
>>
>> we noticed a memory leak in a rather small setup: 40,000 metadata
>> documents with nearly as many files that have "literal.*" fields with
>> them. While 7.2.1 had brought some tika issues (due to a beta version),
>> the real problems started to appear with version 7.3.0 and are currently
>> unresolved in 7.4.0. Memory consumption has gone through the roof. Where
>> previously 512MB heap was enough, now 6G aren't enough to index all files.
>> I am now at a point where I can track this down to the libraries in
>> solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries
>> shipped with 7.2.1 the problem disappears. As most files are PDF documents
>> I tried updating pdfbox to 2.0.11 and tika to 1.18, with no solution to
>> the problem. I will next try to downgrade these single libraries back to
>> 2.0.6 and 1.16 to see if they are the source of the memory leak.
>>
>> In the meantime I would like to know if anybody else has experienced the
>> same problems?
>>
>> kind regards,
>>
>> Thomas
>>
>
> --
> Vincenzo D'Amore
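(For reference: bin/solr wires this script up through the JVM's OnOutOfMemoryError hook, roughly as below. The exact paths and arguments are install-specific assumptions:

-XX:OnOutOfMemoryError="/opt/solr/bin/oom_solr.sh 28080 /var/solr/logs"

The script then kills the Solr process and writes the log quoted above, so a crashed node can be restarted cleanly instead of limping on with a corrupted heap.)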
Memory Leak in 7.3 to 7.4
Hi,

we noticed a memory leak in a rather small setup: 40,000 metadata documents with nearly as many files that have "literal.*" fields with them. While 7.2.1 had brought some tika issues (due to a beta version), the real problems started to appear with version 7.3.0 and are currently unresolved in 7.4.0. Memory consumption has gone through the roof. Where previously 512MB heap was enough, now 6G aren't enough to index all files.

I am now at a point where I can track this down to the libraries in solr-7.4.0/contrib/extraction/lib/. If I replace them all by the libraries shipped with 7.2.1 the problem disappears. As most files are PDF documents I tried updating pdfbox to 2.0.11 and tika to 1.18, with no solution to the problem. I will next try to downgrade these single libraries back to 2.0.6 and 1.16 to see if they are the source of the memory leak.

In the meantime I would like to know if anybody else has experienced the same problems?

kind regards,

Thomas
Re: 7.3 appears to leak
Hi,

we noticed the same problems here in a rather small setup: 40,000 metadata documents with nearly as many files that have "literal.*" fields with them. While 7.2.1 had brought some tika issues, the real problems started to appear with version 7.3.0 and are currently unresolved in 7.4.0. Memory consumption has gone through the roof. Where previously 512MB heap was enough, now 6G aren't enough to index all files.

kind regards,

Thomas

> On 04.07.2018 at 15:03, Markus Jelsma wrote:
>
> Hello Andrey,
>
> I didn't think of that! I will try it when I have the courage again,
> probably next week or so.
>
> Many thanks,
> Markus
>
> -----Original message-----
>> From: Kydryavtsev Andrey
>> Sent: Wednesday 4th July 2018 14:48
>> To: solr-user@lucene.apache.org
>> Subject: Re: 7.3 appears to leak
>>
>> If it is not possible to find a resource leak by code analysis and there
>> are no better ideas, I can suggest a brute force approach:
>> - Clone Solr's sources from the appropriate branch
>>   https://github.com/apache/lucene-solr/tree/branch_7_3
>> - Log every searcher holder's increment/decrement operation in a way that
>>   catches every caller's name (use Thread.currentThread().getStackTrace()
>>   or something)
>>   https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
>> - Build custom artefacts and upload them to prod
>> - After the memory leak has happened, analyse the logs to see which part
>>   of the functionality doesn't decrement the searcher after the counter
>>   was incremented. If searchers are leaked, there should be such code, I
>>   guess.
>>
>> This is not something someone would like to do, but it is what it is.
>>
>> Thank you,
>>
>> Andrey Kudryavtsev
>>
>> On 03.07.2018 at 14:26, Markus Jelsma wrote:
>>> Hello Erick,
>>>
>>> Even the silliest ideas may help us, but unfortunately this is not the
>>> case. All our Solr nodes run binaries from the same source from our
>>> central build server, with the same libraries thanks to provisioning.
>>> Only schema and config are different, but the directive is the same all
>>> over.
>>>
>>> Are there any other ideas, speculations, whatever, on why only our main
>>> text collection leaks a SolrIndexSearcher instance on commit since 7.3.0
>>> and every version up?
>>>
>>> Many thanks,
>>> Markus
>>>
>>> -----Original message-----
>>>> From: Erick Erickson
>>>> Sent: Friday 29th June 2018 19:34
>>>> To: solr-user
>>>> Subject: Re: 7.3 appears to leak
>>>>
>>>> This is truly puzzling then, I'm clueless. It's hard to imagine this is
>>>> lurking out there and nobody else notices, but you've eliminated the
>>>> custom code. And this is also very peculiar:
>>>>
>>>> * it occurs only in our main text search collection, all other
>>>>   collections are unaffected;
>>>> * despite what I said earlier, it is so far unreproducible outside
>>>>   production, even when mimicking production as well as we can;
>>>>
>>>> Here's a tedious idea. Restart Solr with the -v option, I _think_ that
>>>> shows you each and every jar file Solr loads. Is it "somehow" possible
>>>> that your main collection is loading some jar from somewhere that's
>>>> different than you expect? 'cause silly ideas like this are all I can
>>>> come up with.
>>>> Erick
>>>>
>>>> On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma wrote:
>>>>> Hello Erick,
>>>>>
>>>>> The custom search handler doesn't interact with SolrIndexSearcher;
>>>>> this is really all it does:
>>>>>
>>>>> public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
>>>>>   super.handleRequestBody(req, rsp);
>>>>>
>>>>>   if (rsp.getToLog().get("hits") instanceof Integer) {
>>>>>     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer) rsp.getToLog().get("hits")));
>>>>>   }
>>>>>   if (rsp.getToLog().get("hits") instanceof Long) {
>>>>>     rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long) rsp.getToLog().get("hits")));
>>>>>   }
>>>>> }
>>>>>
>>>>> I am not sure this qualifies as one more to go.
>>>>>
>>>>> Re: compiler warnings on resources, yes! These, and tests failing due
>>>>> to resource leaks, have always warned me when I forgot to release
>>>>> something or decrement a reference. But except for the above method
>>>>> (and the token filters, which I really can't disable) that is all that
>>>>> is left.
>>>>>
>>>>> I am quite desperate about this problem, so although I am unwilling to
>>>>> disable stuff, I can do it if I must. But I see no reason, yet, to
>>>>> remove the search handler or the token filter stuff. I mean, how could
>>>>> those leak a SolrIndexSearcher?
>>>>>
>>>>> Let me know :)
>>>>>
>>>>> Many thanks!
>>>>> Markus
>>>>>
>>>>> -----Original message-----
>>>>>> From: Erick Erickson
>>>>>> Sent: Friday 29th June 2018 18:46
>>>>>> To: solr-user
>>>>>> Subject: Re: 7.3 appears to leak
>>>>>>
>>>>>> bq. The only custom
filter groups
Hi,

I have metadata and files indexed in Solr. All documents have a different id, of course, but they share the same value for "returnId" if they belong to the same metadata document that describes a bunch of files (1:n).

When I start a search, I usually use grouping instead of join queries, to keep the information where the hit occurred. Now here it's getting tricky: I want to filter out groups depending on a field that is only available on metadata documents: visibility.

I want to search in Solr like: "Find all documents containing 'foo', grouped by returnId, where the metadata visibility is 'public'". So it should find any 'foo' files, but only display the result if the corresponding metadata document's field visibility='public'. Faceting should also use just the information inside the groups.

Can I give Solr some information for 'fq' and 'facet.*' to work with my setup? I am still using Solr 4.10.5.

kind regards

Thomas
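A sketch of how the filter might be expressed with the join query parser (available in 4.10), assuming metadata and file documents live in the same index and both carry returnId. The field names are taken from the question above; the rest is untested:

    SolrQuery query = new SolrQuery("foo");
    query.set("group", true);
    query.set("group.field", "returnId");
    // keep only documents whose returnId also occurs on a
    // metadata document with visibility:public
    query.addFilterQuery("{!join from=returnId to=returnId}visibility:public");
    query.setFacet(true);
    query.addFacetField("returnId");

Because fq is applied before grouping and faceting, both the groups and the facet counts would then only see documents belonging to a group whose metadata document is public.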
Integration Tests with SOLR 5
Hi,

I noticed that not only does Solr not deliver a WAR file anymore, it also advises against providing a custom WAR file for deployment, as future versions may depend on custom jetty features.

Until 4.10 we were able to provide a WAR file with all the plug-ins we need, for easier installs. The same WAR file was used together with a web application WAR running integration tests, to check if all application details still work. We used the cargo-maven2-plugin and different servlet containers for testing. I think this is quite a common thing to do with continuous integration.

Now I wonder if anyone has a similar setup with integration tests running against Solr 5.

- No artifacts can be used, so no local repository cache is present
- How to deploy your schema.xml, stopwords, solr plug-ins etc. for testing in an isolated environment?
- What does the maven boilerplate code look like?

Any ideas would be appreciated.

Kind regards,

Thomas
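One possible direction, as a minimal sketch rather than a definitive recipe: run Solr inside the test JVM via SolrJ's EmbeddedSolrServer, keeping the solr home with schema.xml and stopwords under the test resources. This needs solr-core on the test classpath; the paths and core name below are assumptions:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.common.SolrInputDocument;
    import org.apache.solr.core.CoreContainer;

    // solr home checked into src/test/resources, containing solr.xml and a
    // core directory with conf/schema.xml, stopwords, plug-in jars etc.
    CoreContainer container = new CoreContainer("src/test/resources/solr-home");
    container.load();
    SolrClient solr = new EmbeddedSolrServer(container, "mycore");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "it-test-1");
    solr.add(doc);
    solr.commit();
    // ... run assertions via solr.query(...), then:
    solr.close();

This sidesteps the missing WAR entirely: no servlet container, no port, and every CI run gets its own isolated index directory.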
grouping of multivalued fields
Hi,

I have a special case of grouping on a multivalued field, and I wonder if this is possible with Solr. I have a field foo that is in general multivalued. But for a restricted set of documents this field has one value or is not present, so normally grouping should work. Sadly, Solr fails fast, and I wonder if there is some way to specify grouping by the first|any|last|min|max (all the same here) value of foo.

regards,

Thomas
Re: grouping of multivalued fields
On 21.05.2014 15:07, Joel Bernstein wrote:
> You may want to investigate the group.func option. This would allow you to
> plug in your own logic to return the group-by key. I don't think there is
> an existing function that does exactly what you need, so you may have to
> write a custom function.

I thought of max(foo) for it, but sadly it does not work on multivalued fields either. I'll wait for other suggestions and start looking at custom functions (did not know that option existed) in parallel.

Thanks,

Thomas
Re: trigger delete on nested documents
On 19.05.2014 19:25, Mikhail Khludnev wrote:
> Thomas,
> The vanilla way to override a block is to send it with the same unique-key
> (I guess it's id in your case; btw, don't you have a unique-key defined in
> the schema?), but it must have at least one child. It seems like an
> analysis issue to me: https://issues.apache.org/jira/browse/SOLR-5211
> While a block is indexed, the special field _root_, equal to the
> unique-key, is added across the whole block (caveat: it's not stored by
> default). At least you can issue
>   <delete><query>_root_:PK_VAL</query></delete>
> to wipe the whole block.

Thank you for your insight. It sure helps a lot in understanding. The '_root_' field was new to me on this rather poorly documented feature of Solr. It helps already if I perform single updates and deletes on the index.

BUT: If I delete by a query, this results in a mess:

1.) request all IDs returned by that query
2.) fire a giant delete query with id:(id1 OR .. OR idn) _root_:(id1 OR .. OR idn)

Before every update of single documents I have to fire a delete request. This turns into a mess when updating in batch mode:

1.) remove a chunk of 100 documents and their nested documents (see above)
2.) index the chunk of 100 documents

All information for that is available on the Solr side. Can I configure some hook that is executed on the Solr server so that I do not have to change all applications? This would at least save these extra network transfers.

After the big work to migrate from plain Lucene to Solr, I really require proper nested document support. ElasticSearch seems to support it (http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html) but I am afraid of another migration. ElasticSearch even hides the nested documents in queries, which seems nice, too. Does anyone have information on how nested document support will evolve in future releases of Solr?

kind regards,

Thomas

On 19.05.2014 10:37, Thomas Scheffler <thomas.scheff...@uni-jena.de> wrote:
> Hi,
>
> I plan to use nested documents to group some of my fields:
>
> <doc>
>   <field name="id">art0001</field>
>   <field name="title">My first article</field>
>   <doc>
>     <field name="id">art0001-foo</field>
>     <field name="name">Smith, John</field>
>     <field name="role">author</field>
>   </doc>
>   <doc>
>     <field name="id">art0001-bar</field>
>     <field name="name">Power, Max</field>
>     <field name="role">reviewer</field>
>   </doc>
> </doc>
>
> This way I can ask for any documents that are reviewed by Max Power.
> However, to simplify updates and deletes I want to ensure that nested
> documents are deleted automatically on update and delete of the parent
> document. Has anyone had to deal with this problem and found a solution?
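For a single document, the delete-then-reindex can be expressed in SolrJ roughly like this (a sketch; 'solr' stands for an HttpSolrServer instance, 'wholeBlock' for the rebuilt parent document, and the id is illustrative):

    // wipe the whole block (parent plus children), then re-add it
    solr.deleteByQuery("_root_:art0001 OR id:art0001");
    solr.add(wholeBlock); // parent document with all child documents attached
    solr.commit();

The id clause covers older documents indexed before _root_ existed; for blocks indexed with _root_ the first clause alone suffices.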
Re: trigger delete on nested documents
On 20.05.2014 14:11, Jack Krupansky wrote:
> To be clear, you cannot update a single document of a nested document in
> place - you must reindex the whole block, parent and all children. This is
> because this feature relies on the underlying Lucene block join feature
> that requires that the documents be contiguous, and updating a single
> child document would make it discontiguous with the rest of the block of
> documents.
> Just update the block by resending the entire block of documents.
> For a previous discussion of this limitation:
> http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html

This is totally clear to me, and I want a nested document not to be accessible without its root context. But there is no way, it seems, to delete the whole block by the id of the root document, and no way to update the root document that removes the stale data from the index. Normal Solr behavior is to automatically delete old documents with the same ID. I expect this behavior for the other documents in this block, too.

Anyway, to make things clear I issued a JIRA request and tried to explain it more carefully there: https://issues.apache.org/jira/browse/SOLR-6096

regards

Thomas
trigger delete on nested documents
Hi,

I plan to use nested documents to group some of my fields:

<doc>
  <field name="id">art0001</field>
  <field name="title">My first article</field>
  <doc>
    <field name="id">art0001-foo</field>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
  <doc>
    <field name="id">art0001-bar</field>
    <field name="name">Power, Max</field>
    <field name="role">reviewer</field>
  </doc>
</doc>

This way I can ask for any documents that are reviewed by Max Power. However, to simplify updates and deletes I want to ensure that nested documents are deleted automatically on update and delete of the parent document. Has anyone had to deal with this problem and found a solution?

regards,

Thomas
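In SolrJ the same block can be built with addChildDocument (a sketch; 'solr' is assumed to be a SolrServer instance):

    SolrInputDocument article = new SolrInputDocument();
    article.addField("id", "art0001");
    article.addField("title", "My first article");

    SolrInputDocument author = new SolrInputDocument();
    author.addField("id", "art0001-foo");
    author.addField("name", "Smith, John");
    author.addField("role", "author");
    article.addChildDocument(author); // indexed as part of the parent's block

    solr.add(article);
    solr.commit();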
Re: trigger delete on nested documents
On 19.05.2014 08:38, Walter Underwood wrote:
> Solr does not support nested documents. --wunder

It does, since 4.5: http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/common/SolrInputDocument.html#addChildDocuments(java.util.Collection)

But this feature is rather poorly documented and has some caveats: http://blog.griddynamics.com/2013/09/solr-block-join-support.html

regards,

Thomas

> On May 18, 2014, at 11:36 PM, Thomas Scheffler <thomas.scheff...@uni-jena.de> wrote:
>
> Hi,
>
> I plan to use nested documents to group some of my fields:
>
> <doc>
>   <field name="id">art0001</field>
>   <field name="title">My first article</field>
>   <doc>
>     <field name="id">art0001-foo</field>
>     <field name="name">Smith, John</field>
>     <field name="role">author</field>
>   </doc>
>   <doc>
>     <field name="id">art0001-bar</field>
>     <field name="name">Power, Max</field>
>     <field name="role">reviewer</field>
>   </doc>
> </doc>
>
> This way I can ask for any documents that are reviewed by Max Power.
> However, to simplify updates and deletes I want to ensure that nested
> documents are deleted automatically on update and delete of the parent
> document. Has anyone had to deal with this problem and found a solution?
Re: range types in SOLR
On 03.03.2014 19:12, David W. Smiley wrote:
> The main reference for this approach is here:
> http://wiki.apache.org/solr/SpatialForTimeDurations
> Hoss's illustrations he developed for the meetup presentation are great.
> However, there are bugs in the instructions; specifically, it's important
> to slightly buffer the query and choose an appropriate maxDistErr. Also,
> it's preferable to use the rectangle range-query style of spatial query
> (e.g. field:["minX minY" TO "maxX maxY"]) as opposed to using
> "Intersects(minX minY maxX maxY)". There's no technical difference, but
> the latter is deprecated and will eventually be removed from Solr 5 /
> trunk.
> All this said, recognize this is a bit of a hack (one that works well).
> There is a good chance a more ideal implementation approach is going to be
> developed this year.

Thank you. Having a working example is great, but having a practical working example that hides this implementation detail would be even better. I would like to store 2014-03-04T07:05:12,345Z, 2014-03-04, 2014-03 and 2014 into one field and make queries on that field. Currently I have to normalize all of them to the first format (inventing information), which is only the crudest approximation. Normalizing them to ranges would be the best in my opinion. Then a query like date:2014 would hit all of them, as would date:[2014-01 TO 2014-03].

kind regards,

Thomas
Re: SOLRJ and SOLR compatibility
On 27.02.2014 09:15, Shawn Heisey wrote:
> On 2/27/2014 12:49 AM, Thomas Scheffler wrote:
>>> What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible
>>> that I'm completely ignorant here, but I have not heard of any.
>>
>> Actually, bug reports are arriving that sound like "Unknown type 19".
>
> Aha! I found it! It was caused by the change applied for SOLR-5658, fixed
> in 4.7.0 (just released) by SOLR-5762. Just my luck that there's a bug bad
> enough to contradict what I told you.
>
> https://issues.apache.org/jira/browse/SOLR-5658
> https://issues.apache.org/jira/browse/SOLR-5762
>
> I've added a comment that will help users find SOLR-5762 with a search for
> "Unknown type 19". If you use SolrJ 4.7.0, compatibility should be better.

Hi,

I am sorry to inform you that SolrJ 4.7.0 faces the same issue with Solr 4.5.1. I received a client stack trace this morning and am still waiting for log output from the server:

--
ERROR unable to submit tasks
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException: Unknown type 19
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
        at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
--

There is not much information in that stack trace, I know. I'll send further information when I receive more. In the meantime I have asked our customer not to upgrade the Solr server (which would resolve the issue), so we can dig deeper.

kind regards,

Thomas
Re: SOLRJ and SOLR compatibility
On 04.03.2014 07:21, Thomas Scheffler wrote:
> On 27.02.2014 09:15, Shawn Heisey wrote:
>> On 2/27/2014 12:49 AM, Thomas Scheffler wrote:
>>>> What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible
>>>> that I'm completely ignorant here, but I have not heard of any.
>>>
>>> Actually, bug reports are arriving that sound like "Unknown type 19".
>>
>> Aha! I found it! It was caused by the change applied for SOLR-5658, fixed
>> in 4.7.0 (just released) by SOLR-5762. Just my luck that there's a bug
>> bad enough to contradict what I told you.
>>
>> https://issues.apache.org/jira/browse/SOLR-5658
>> https://issues.apache.org/jira/browse/SOLR-5762
>>
>> I've added a comment that will help users find SOLR-5762 with a search
>> for "Unknown type 19". If you use SolrJ 4.7.0, compatibility should be
>> better.
>
> Hi,
>
> I am sorry to inform you that SolrJ 4.7.0 faces the same issue with Solr
> 4.5.1. I received a client stack trace this morning and am still waiting
> for log output from the server:

Here we go for the server side (4.5.1):

Mrz 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
Information: [clausthal_test] webapp=/solr path=/select params={fl=*,score&sort=mods.dateIssued+desc&q=%2BobjectType:mods+%2Bcategory:clausthal_status\:published&wt=javabin&version=2&rows=3} hits=186 status=0 QTime=2

Mrz 03, 2014 2:39:38 PM org.apache.solr.update.processor.LogUpdateProcessor finish
Information: [clausthal_test] webapp=/solr path=/update params={wt=javabin&version=2} {} 0 0

Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: java.lang.RuntimeException: Unknown type 19
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
        at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
        at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
        at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
        at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
        at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
        at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:744)

Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: null:java.lang.RuntimeException: Unknown type 19
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139
range types in SOLR
Hi,

I am in need of range types in Solr, similar to PostgreSQL's: https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf

My schema should allow approximate dates and queries on them. When having a single such date per document, one can split this information into two separate fields. But this is not an option if the date field is multivalued; one has to split the document into two or more documents. I wonder if it has to be so complicated.

Does somebody know if Solr already supports range types? If not, how difficult would it be to implement? Is anybody in need of range types, too?

kind regards,

Thomas
Re: range types in SOLR
On 01.03.14 18:24, Erick Erickson wrote:
> I'm not clear what you're really after here. Solr certainly supports
> ranges, things like time:[* TO date_spec] or date_field:[date_spec TO
> date_spec] etc. There's also a really creative use of spatial (of all
> things) to, say, answer questions involving multiple dates per record.
> Imagine, for instance, employees with different hours on different days.
> You can use spatial to answer questions like "which employees are
> available on Wednesday between 4PM and 8PM". And if none of this is
> relevant, how about you give us some use-cases? This could well be an XY
> problem.

Hi,

let's try this example to show the problem. You have an old text that was written in two periods of time:

1.) 2nd half of the 13th century: 1250-1299
2.) Beginning of the 18th century: 1700-1715

If you are searching for texts that were written between 1300 and 1699, the document described above should not be a hit. If you make start date and end date multivalued, this results in:

start: [1250, 1700]
end: [1299, 1715]

A search for documents written between 1300 and 1699 would be:

(+start:[1300 TO 1699] +end:[1300 TO 1699])
(+start:[* TO 1300] +end:[1300 TO *])
(+start:[* TO 1699] +end:[1700 TO *])

You see that the document above would obviously be hit by (+start:[* TO 1300] +end:[1300 TO *]). Hope you see the problem.

This problem is the same whenever you face multiple ranges in one field of a document. For every duplication you need to create a separate Solr document. If you have two such fields in one document, field one with n values and field two with m values, you are forced to create m*n documents. This fact and the rather unhandy query (see above) are my motivation to ask for range types like in PostgreSQL, where the problem is solved.

regards,

Thomas
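To connect this with the spatial approach David mentioned earlier in the thread, here is a sketch of how SpatialForTimeDurations would encode this example (assuming a spatial field named 'duration'; per the earlier caveats, a slight query buffer and a suitable maxDistErr are still needed). Each period becomes one point with x = start year and y = end year, and the overlap condition "start <= 1699 AND end >= 1300" becomes a rectangle query:

    indexed points: (1250, 1299) and (1700, 1715)

    fq=duration:["-9999 1300" TO "1699 9999"]

Neither point falls inside that rectangle (the first ends too early, the second starts too late), so the document is correctly not a hit, while a period like 1400-1450 would be. Because both periods live in one multivalued field, no document duplication is needed.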
SOLRJ and SOLR compatibility
Hi,

I am a developer of a repository framework. We rely on the fact that "SolrJ generally maintains backwards compatibility, so you can use a newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1] This statement is not even true for bugfix releases like 4.6.0 to 4.6.1 (SolrJ 4.6.1 with Solr 4.6.0).

We use SolrInputDocument from SolrJ to index our documents (javabin). But as framework developers we are in no position to force our users to update their Solr server that often. Instead, with every new version we want to update just the SolrJ library we ship with, to enable the latest features if the user wishes.

When I send a query to a request handler, I can attach a version parameter to tell Solr which version of the response format I expect. Is there such a configuration when indexing SolrInputDocuments? I did not find it so far.

Kind regards,

Thomas

[1] https://wiki.apache.org/solr/Solrj
Re: SOLRJ and SOLR compatibility
On 27.02.2014 08:04, Shawn Heisey wrote:
> On 2/26/2014 11:22 PM, Thomas Scheffler wrote:
>> I am a developer of a repository framework. We rely on the fact that
>> "SolrJ generally maintains backwards compatibility, so you can use a
>> newer SolrJ with an older Solr, or an older SolrJ with a newer Solr." [1]
>> This statement is not even true for bugfix releases like 4.6.0 to 4.6.1
>> (SolrJ 4.6.1 with Solr 4.6.0).
>>
>> We use SolrInputDocument from SolrJ to index our documents (javabin). But
>> as framework developers we are in no position to force our users to
>> update their Solr server that often. Instead, with every new version we
>> want to update just the SolrJ library we ship with, to enable the latest
>> features if the user wishes.
>>
>> When I send a query to a request handler, I can attach a version
>> parameter to tell Solr which version of the response format I expect. Is
>> there such a configuration when indexing SolrInputDocuments? I did not
>> find it so far.
>
> What problems have you seen with mixing 4.6.0 and 4.6.1? It's possible
> that I'm completely ignorant here, but I have not heard of any.

Actually, bug reports are arriving that sound like "Unknown type 19". I am currently not able to reproduce it myself with server versions 4.5.0, 4.5.1 and 4.6.0 when using SolrJ 4.6.1.

It sounds like the same issue as described here: http://lucene.472066.n3.nabble.com/After-upgrading-indexer-to-SolrJ-4-6-1-o-a-solr-servlet-SolrDispatchFilter-Unknown-type-19-td4116152.html

The solution there was to upgrade the server to version 4.6.1. This helped here, too. Out there it is a very unpopular decision: some users have large Solr installs and stick to a certain (4.x) version. They want upgrades from us, but upgrading company-wide Solr installations is out of their scope.

Is this a known SolrJ issue that is fixed in version 4.7.0?

kind regards,

Thomas
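(One client-side workaround that may help with such version skew, sketched from memory and not verified against every version combination: switch SolrJ from the javabin wire format to XML for both updates and responses. Both classes are standard SolrJ 4.x API; XML is slower than javabin but much more tolerant across versions:

    HttpSolrServer server = new HttpSolrServer("http://localhost:8983/solr/core1");
    // send updates as XML instead of javabin
    server.setRequestWriter(new org.apache.solr.client.solrj.request.RequestWriter());
    // parse responses as XML instead of javabin
    server.setParser(new org.apache.solr.client.solrj.impl.XMLResponseParser());

The "Unknown type 19" failure happens while the server decodes a javabin update, so avoiding javabin for updates sidesteps it entirely.)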
weak documents
Hi,

I am relatively new to Solr and I am looking for a neat way to implement "weak documents" with Solr: whenever a document is updated or deleted, all its dependent documents should be removed from the index. In other words, they exist only as long as the document they refer to exists, in the specific version at which they were indexed. On update they will be indexed after their master document.

I would like to have some kind of "dependsOn" field that carries the uniqueKey value of the master document. Can this be done efficiently with Solr? I need this technique because on update and on delete I don't know how many dependent documents exist in the Solr index. Especially for batch index processes, I need a more efficient way than querying before every update or delete.

kind regards,

Thomas
Re: weak documents
On 27.11.2013 09:58, Paul Libbrecht wrote:
> Thomas,
> our experience with Curriki.org is that evaluating what I call the related
> documents is a procedure that needs access to the complete content and
> thus is run at the DB level and not the Solr level. For example, if a user
> changes a part of his name, we need to reindex all of his resources. Sure,
> we could try to run a Solr query for this, and maybe add index fields for
> it, but we felt it better to run this on the index-trigger side: the thing
> in our (XWiki) wiki which listens to changes and requests the reindexing
> of a few documents (including deletions).
> For maintenance operations, the same issue has appeared. So, if the
> indexer or listener or Solr has been down for a few minutes or hours, we'd
> need to reindex not only all changed documents but all changed documents
> and their related documents.
> If you are able to work through your solution so that it is Solr-only,
> writing down all depends-on relations at index time, it means you would
> index-update all inversely related documents every time something changes.
> For the relation above (documents of a user), it means the user's
> documents need reindexing every time a new document is added. I wonder if
> this makes a scale difference.

I think both use-cases differ a bit. At index time of my master document I have all information about the dependent documents ready. So instead of committing one document I commit, let's say, four.

Here is a more detailed use-case: I have metadata in 1 to n languages that describes a document (e.g. a journal article). I commit a master document in a specified default language to Solr, and one document for every language I have metadata for. If a user adds or removes metadata (e.g. an abstract in French), there is one document more or one document less in Solr. So their number changes, and I don't want stale data to be kept in the index.

A similar use case: I have article documents with authors, and I create author documents for every article. If someone adds or removes an author, I need to track that change. These dumb author documents are used for an alphabetical person index and hold a unique field that is used to group them, but they exist only as long as their master documents do.

My two use-cases are quite similar, so I would like this "weak documents" functionality somehow. Solr knows that if a document is added with id=foo, it has to replace a document that matches id:foo. If I can change this behavior to dependsOn:foo, I am done. :-D

regards

Thomas
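Until something like that exists, the usual pattern is a delete-by-query per master, sent in the same batch as the re-add. A minimal SolrJ sketch, assuming the dependsOn field from above ('solr', 'masterId', 'masterDoc' and 'dependentDocs' are illustrative names):

    // drop the master's old dependents in one shot; no lookup query needed,
    // since deleteByQuery doesn't care how many dependents currently exist
    solr.deleteByQuery("dependsOn:" + masterId);
    solr.add(masterDoc);        // re-add the master (replaces by uniqueKey)
    solr.add(dependentDocs);    // re-add the current dependent documents
    solr.commit();              // one commit covers the whole batch

This keeps batch indexing at two requests per chunk (deletes, then adds) instead of a query-then-delete round trip per master document.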