Javadocs are not linkable

2020-02-27 Thread Thomas Scheffler
Hi,

I recently noticed that the SOLR javadocs hosted by lucene are not linkable as 
the „package-list“ file is not downloadable. Is this on purpose?

$ curl https://lucene.apache.org/solr/8_4_0/solr-solrj/package-list

<html><head>
<title>301 Moved Permanently</title>
</head><body>
<h1>Moved Permanently</h1>
<p>The document has moved <a href="https://lucene.apache.org/solr/8_4_0/solr-solrj/package-list/">here</a>.</p>
</body></html>

It’s the same issue with older versions. My maven build fails with:

MavenReportException: Error while generating Javadoc:
[ERROR] Exit code: 1 - javadoc: error - Error fetching URL: 
https://lucene.apache.org/solr/8_3_0/solr-solrj/

kind regards

Thomas

Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Thomas Scheffler
Hi,

my final verdict is that the culprit is the upgrade to Tika 1.17. If I downgrade
just the Tika libraries back to 1.16 and keep the rest of SOLR 7.4.0, the heap
usage after about 85 % of the index process and a manual trigger of the garbage
collector is about 60-70 MB (that low!!!)

My problem now is that we have several setups that trigger this reliably, but
there is no simple test case that „fails“ if Tika 1.17 or 1.18 is used. I also
do not know if the error is inside Tika or inside the glue code that makes Tika
usable in SOLR.

Should I file an issue for this?

kind regards,

Thomas


> On 02.08.2018 at 12:06, Thomas Scheffler wrote:
> 
> Hi,
> 
> we noticed a memory leak in a rather small setup: 40,000 metadata documents
> with nearly as many files that have „literal.*“ fields attached. While 7.2.1
> brought some Tika issues (due to a beta version), the real problems
> started to appear with version 7.3.0 and are currently unresolved in 7.4.0.
> Memory consumption is through the roof. Where previously a 512 MB heap was
> enough, now 6 GB isn't enough to index all files.
> I am now at a point where I can track this down to the libraries in
> solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries
> shipped with 7.2.1, the problem disappears. As most files are PDF documents I
> tried updating PDFBox to 2.0.11 and Tika to 1.18 with no solution to the
> problem. I will next try to downgrade these individual libraries back to 2.0.6
> and 1.16 to see if they are the source of the memory leak.
> 
> In the meantime I would like to know whether anybody else has experienced the
> same problems?
> 
> kind regards,
> 
> Thomas






Re: Memory Leak in 7.3 to 7.4

2018-08-02 Thread Thomas Scheffler
Hi,

SOLR ships with a script that handles OOM errors and produces a log file for
every case, with content like this:

Running OOM killer script for process 9015 for Solr on port 28080
Killed process 9015

This script works ;-)

kind regards

Thomas



> On 02.08.2018 at 12:28, Vincenzo D'Amore wrote:
> 
> Not clear if you had experienced an OOM error.
> 
> On Thu, Aug 2, 2018 at 12:06 PM Thomas Scheffler <
> thomas.scheff...@uni-jena.de> wrote:
> 
>> Hi,
>> 
>> we noticed a memory leak in a rather small setup: 40,000 metadata
>> documents with nearly as many files that have „literal.*“ fields attached.
>> While 7.2.1 brought some Tika issues (due to a beta version), the real
>> problems started to appear with version 7.3.0 and are currently
>> unresolved in 7.4.0. Memory consumption is through the roof. Where previously
>> a 512 MB heap was enough, now 6 GB isn't enough to index all files.
>> I am now at a point where I can track this down to the libraries in
>> solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries
>> shipped with 7.2.1, the problem disappears. As most files are PDF documents
>> I tried updating PDFBox to 2.0.11 and Tika to 1.18 with no solution to the
>> problem. I will next try to downgrade these individual libraries back to 2.0.6
>> and 1.16 to see if they are the source of the memory leak.
>> 
>> In the meantime I would like to know whether anybody else has experienced the
>> same problems?
>> 
>> kind regards,
>> 
>> Thomas
>> 
> 
> 
> --
> Vincenzo D'Amore






Memory Leak in 7.3 to 7.4

2018-08-02 Thread Thomas Scheffler
Hi,

we noticed a memory leak in a rather small setup: 40,000 metadata documents
with nearly as many files that have „literal.*“ fields attached. While 7.2.1
brought some Tika issues (due to a beta version), the real problems started to
appear with version 7.3.0 and are currently unresolved in 7.4.0. Memory
consumption is through the roof. Where previously a 512 MB heap was enough, now
6 GB isn't enough to index all files.
I am now at a point where I can track this down to the libraries in
solr-7.4.0/contrib/extraction/lib/. If I replace them all with the libraries
shipped with 7.2.1, the problem disappears. As most files are PDF documents I
tried updating PDFBox to 2.0.11 and Tika to 1.18 with no solution to the
problem. I will next try to downgrade these individual libraries back to 2.0.6
and 1.16 to see if they are the source of the memory leak.

In the meantime I would like to know whether anybody else has experienced the
same problems?

kind regards,

Thomas




Re: 7.3 appears to leak

2018-07-16 Thread Thomas Scheffler
Hi,

we noticed the same problems here in a rather small setup: 40,000 metadata
documents with nearly as many files that have „literal.*“ fields attached. While
7.2.1 brought some Tika issues, the real problems started to appear with
version 7.3.0 and are currently unresolved in 7.4.0. Memory consumption is
through the roof. Where previously a 512 MB heap was enough, now 6 GB isn't
enough to index all files.

kind regards,

Thomas

> On 04.07.2018 at 15:03, Markus Jelsma wrote:
> 
> Hello Andrey,
> 
> I didn't think of that! I will try it when i have the courage again, probably 
> next week or so.
> 
> Many thanks,
> Markus
> 
> 
> -Original message-
>> From:Kydryavtsev Andrey 
>> Sent: Wednesday 4th July 2018 14:48
>> To: solr-user@lucene.apache.org
>> Subject: Re: 7.3 appears to leak
>> 
>> If it is not possible to find a resource leak by code analysis and there are 
>> no better ideas, I can suggest a brute force approach:
>> - Clone Solr's sources from appropriate branch 
>> https://github.com/apache/lucene-solr/tree/branch_7_3
>> - Log every searcher's holder increment/decrement operation in a way to 
>> catch every caller name (use Thread.currentThread().getStackTrace() or 
>> something) 
>> https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
>> - Build custom artefacts and upload them on prod
>> - After memory leak happened - analyse logs to see what part of 
>> functionality doesn't decrement searcher after counter was incremented. If 
>> searchers are leaked - there should be such code I guess.
>> 
>> This is not something someone would like to do, but it is what it is.
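A minimal, self-contained sketch of what such instrumentation could look like.
This is a hypothetical, simplified stand-in for the RefCounted class linked
above, not a drop-in patch; the point is only to log every incref/decref
together with the calling stack trace:

import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified counterpart of org.apache.solr.util.RefCounted that
// logs every reference count change together with the caller's stack trace.
public abstract class LoggedRefCounted<T> {

  protected final T resource;
  protected final AtomicInteger refcount = new AtomicInteger();

  public LoggedRefCounted(T resource) {
    this.resource = resource;
  }

  public void incref() {
    System.err.printf("incref -> %d by %s%n", refcount.incrementAndGet(), caller());
  }

  public void decref() {
    int count = refcount.decrementAndGet();
    System.err.printf("decref -> %d by %s%n", count, caller());
    if (count == 0) {
      close();
    }
  }

  public T get() {
    return resource;
  }

  // Called when the last reference is released, e.g. to close a searcher.
  protected abstract void close();

  private static String caller() {
    return Arrays.toString(Thread.currentThread().getStackTrace());
  }
}

Grepping such a log for increments that never get a matching decrement should
point to the functionality that holds on to a searcher.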
>> 
>> 
>> 
>> Thank you,
>> 
>> Andrey Kudryavtsev
>> 
>> 
>> 03.07.2018, 14:26, "Markus Jelsma" :
>>> Hello Erick,
>>> 
>>> Even the silliest ideas may help us, but unfortunately this is not the 
>>> case. All our Solr nodes run binaries from the same source from our central 
>>> build server, with the same libraries thanks to provisioning. Only schema 
>>> and config are different, but the  directive is the same all over.
>>> 
>>> Are there any other ideas, speculations, whatever, on why only our main 
>>> text collection leaks a SolrIndexSearcher instance on commit since 7.3.0 
>>> and every version up?
>>> 
>>> Many thanks?
>>> Markus
>>> 
>>> -Original message-
  From:Erick Erickson 
  Sent: Friday 29th June 2018 19:34
  To: solr-user 
  Subject: Re: 7.3 appears to leak
 
  This is truly puzzling then, I'm clueless. It's hard to imagine this
  is lurking out there and nobody else notices, but you've eliminated
  the custom code. And this is also very peculiar:
 
  * it occurs only in our main text search collection, all other
  collections are unaffected;
  * despite what i said earlier, it is so far unreproducible outside
  production, even when mimicking production as good as we can;
 
  Here's a tedious idea. Restart Solr with the -v option, I _think_ that
  shows you each and every jar file Solr loads. Is it "somehow" possible
  that your main collection is loading some jar from somewhere that's
  different than you expect? 'cause silly ideas like this are all I can
  come up with.
 
  Erick
 
  On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
   wrote:
  > Hello Erick,
  >
  > The custom search handler doesn't interact with SolrIndexSearcher, this is
  > really all it does:
  >
  >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
  >     super.handleRequestBody(req, rsp);
  >
  >     if (rsp.getToLog().get("hits") instanceof Integer) {
  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer) rsp.getToLog().get("hits")));
  >     }
  >     if (rsp.getToLog().get("hits") instanceof Long) {
  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long) rsp.getToLog().get("hits")));
  >     }
  >   }
  >
  > I am not sure this qualifies as one more to go.
  >
  > Re: compiler warnings on resources, yes! This and tests failing due to
  > resource leaks have always warned me when I forgot to release something or
  > decrement a reference. But except for the above method (and the token
  > filters, which I really can't disable) that is all that is left.
  >
  > I am quite desperate about this problem, so although I am unwilling to
  > disable stuff, I can do it if I must. But I see no reason, yet, to remove
  > the search handler or the token filter stuff; I mean, how could those leak
  > a SolrIndexSearcher?
  >
  > Let me know :)
  >
  > Many thanks!
  > Markus
  >
  > -Original message-
  >> From:Erick Erickson 
  >> Sent: Friday 29th June 2018 18:46
  >> To: solr-user 
  >> Subject: Re: 7.3 appears to leak
  >>
  >> bq. The only custom 

filter groups

2016-07-04 Thread Thomas Scheffler

Hi,

I have metadata and files indexed in solr. All have a different id of 
course, but they share the same value for "returnId" if they belong to the same 
metadata document that describes a bunch of files (1:n).


When I start a search, I usually use grouping instead of join queries to 
keep the information about where the hit occurred.


Now this is where it gets tricky. I want to filter out groups depending on 
a field that is only available on metadata documents: visibility.


I want to search in solr like: "Find all documents containing 'foo' 
grouped by returnId, where the metadata visibility is 'public'"


So it should find any 'foo' files but only display the result if the 
corresponding metadata document's field visibility='public'.


Faceting also uses just the information inside groups. Can I give SOLR 
some information for 'fq' and 'facet.*' to work with my setup?


I am still using SOLR 4.10.5
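One way this might be expressed, sketched with SolrJ against 4.10 and not
verified against the schema described above (URL, core name and facet field are
placeholders): combine grouping on returnId with a join filter query, so that
only documents whose group has a public metadata document survive; faceting is
then computed on the same filtered set.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// Sketch: restrict groups to those whose metadata document is public by
// joining on returnId.
public class GroupFilterSketch {
  public static void main(String[] args) throws SolrServerException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrQuery query = new SolrQuery("foo");
    query.set("group", true);
    query.set("group.field", "returnId");
    // Keep only documents whose returnId also occurs on a metadata document
    // with visibility:public.
    query.addFilterQuery("{!join from=returnId to=returnId}visibility:public");
    query.addFacetField("someFacetField");

    QueryResponse rsp = solr.query(query);
    System.out.println(rsp.getGroupResponse().getValues());
  }
}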

kind regards

Thomas


Integration Tests with SOLR 5

2015-02-24 Thread Thomas Scheffler

Hi,

I noticed that SOLR not only no longer delivers a WAR file but also advises 
against providing a custom WAR file for deployment, as future versions may 
depend on custom Jetty features.


Until 4.10 we were able to provide a WAR file with all the plug-ins we 
need for easier installs. The same WAR file was used together with a 
web application WAR to run integration tests and to check that all 
application details still work. We used the cargo-maven2-plugin and 
different servlet containers for testing. I think this is quite a common 
thing to do with continuous integration.


Now I wonder if anyone has a similar setup with integration tests 
running against SOLR 5.


- No artifacts can be used, so no local repository cache is present
- How to deploy your schema.xml, stopwords, solr plug-ins etc. for 
testing in an isolated environment (one possible approach is sketched below)

- What does the Maven boilerplate code look like?

Any ideas would be appreciated.
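For the isolated-environment part, one option is to skip the servlet container
entirely and run Solr in-process from the test JVM. A rough, untested sketch
(solr home path, core name and the exact constructors are assumptions and
differ between 5.x releases; solr-core and solr-solrj must be on the test
classpath):

import java.io.File;

import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.solr.core.CoreContainer;

// Sketch: point a CoreContainer at a test solr home (solr.xml, conf/schema.xml,
// stopwords, plug-in jars) kept under src/test/resources and talk to it via the
// normal SolrJ API.
public class EmbeddedSolrIT {
  public static void main(String[] args) throws Exception {
    String solrHome = new File("src/test/resources/solr").getAbsolutePath();
    CoreContainer container = new CoreContainer(solrHome);
    container.load();
    EmbeddedSolrServer solr = new EmbeddedSolrServer(container, "collection1");

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "it-0001");
    solr.add(doc);
    solr.commit();
    // ... run assertions against the index here ...

    solr.close();
  }
}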

Kind regards,

Thomas


grouping of multivalued fields

2014-05-21 Thread Thomas Scheffler

Hi,

I have a special case of grouping multivalued fields and I wonder if 
this is possible with SOLR.


I have a field foo that is generally multivalued. But for a restricted 
set of documents this field has one value or is not present. So normally 
grouping should work.


Sadly SOLR fails fast, and I wonder if there is some way to specify 
grouping by the first|any|last|min|max (all the same here) value of foo.


regards,

Thomas


Re: grouping of multivalued fields

2014-05-21 Thread Thomas Scheffler

On 21.05.2014 at 15:07, Joel Bernstein wrote:

You may want to investigate the group.func option. This would allow you to
plug in your own logic to return the group by key. I don't think there is
an existing function that does exactly what you need so you may have to
write a custom function.


I thought of max(foo) for it, but sadly it does not work on multivalued 
fields either. I will wait for other suggestions and start looking at custom 
functions (I did not know that option existed) in parallel.


Thanks,

Thomas


Re: trigger delete on nested documents

2014-05-20 Thread Thomas Scheffler

On 19.05.2014 at 19:25, Mikhail Khludnev wrote:

Thomas,

Vanilla way to override a block is to send it with the same unique-key (I
guess it's id in your case; btw, don't you have a unique-key defined in the
schema?), but it must have at least one child. It seems like an analysis issue
to me: https://issues.apache.org/jira/browse/SOLR-5211

While the block is indexed, the special field _root_, equal to the unique-key,
is added across the whole block (caveat: it's not stored by default). At
least you can issue

<delete><query>_root_:PK_VAL</query></delete>

to wipe the whole block.


Thank you for your insight. It sure helps a lot in 
understanding. The '_root_' field was new to me in this rather poorly 
documented feature of SOLR. It already helps when I perform single updates 
and deletes on the index. BUT:


If I delete by a query this results in a mess:

1.) request all IDs returned by that query
2.) fire a giant delete query with id:(id1 OR .. OR idn) _root_:(id1 OR 
.. OR idn)


Before every update of single documents I have to fire a delete request.

This turns into a mess, when updating in batch mode:
1.) remove chunk of 100 documents and nested documents (see above)
2.) index chunk of 100 documents

All information for that is available on SOLR side. Can I configure some 
hook that is executed on SOLR-Server so that I do not have to change all 
applications? This would at least save these extra network transfers.
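For reference, the client-side delete discussed above can at least be a single
delete-by-query per block. A small SolrJ sketch (ids and URL are examples),
relying on the implicit _root_ field described earlier:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

// Sketch: remove an old parent together with all of its children in one
// request before re-indexing the fresh block.
public class DeleteBlockSketch {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String rootId = "art0001";
    solr.deleteByQuery("id:\"" + rootId + "\" OR _root_:\"" + rootId + "\"");
    // ... add the complete replacement block here ...
    solr.commit();
  }
}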


After the big effort of migrating from plain Lucene to SOLR I really require 
proper nested document support. Elasticsearch seems to support it 
(http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/mapping-nested-type.html) 
but I am afraid of another migration. Elasticsearch even hides the 
nested documents in queries, which seems nice, too.


Does anyone have information on how nested document support will evolve in 
future releases of SOLR?


kind regards,

Thomas




On 19.05.2014 at 10:37, Thomas Scheffler thomas.scheff...@uni-jena.de wrote:


Hi,

I plan to use nested documents to group some of my fields

<doc>
  <field name="id">art0001</field>
  <field name="title">My first article</field>
  <doc>
    <field name="id">art0001-foo</field>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
  <doc>
    <field name="id">art0001-bar</field>
    <field name="name">Power, Max</field>
    <field name="role">reviewer</field>
  </doc>
</doc>

This way I can ask for any documents that are reviewed by Max Power. However,
to simplify updates and deletes I want to ensure that nested documents are
deleted automatically on update and delete of the parent document.
Has anyone had to deal with this problem and found a solution?


Re: trigger delete on nested documents

2014-05-20 Thread Thomas Scheffler

On 20.05.2014 at 14:11, Jack Krupansky wrote:

To be clear, you cannot update a single document of a nested document
in place - you must reindex the whole block, parent and all children.
This is because this feature relies on the underlying Lucene block
join feature that requires that the documents be contiguous, and
updating a single child document would make it discontiguous with the
rest of the block of documents.

Just update the block by resending the entire block of documents.

For a previous discussion of this limitation:
http://lucene.472066.n3.nabble.com/block-join-and-atomic-updates-td4117178.html


This is totally clear to me, and I want nested documents not to be 
accessible without their root context.


It seems there is no way to delete the whole block by the id of the root 
document. There is no way to update the root document that removes the 
stale data from the index. Normal SOLR behavior is to automatically 
delete old documents with the same ID. I expect this behavior for the other 
documents in this block, too.


Anyway, to make things clear I filed a JIRA issue and tried to explain 
it more carefully there:


https://issues.apache.org/jira/browse/SOLR-6096

regards

Thomas


trigger delete on nested documents

2014-05-19 Thread Thomas Scheffler

Hi,

I plan to use nested documents to group some of my fields

<doc>
  <field name="id">art0001</field>
  <field name="title">My first article</field>
  <doc>
    <field name="id">art0001-foo</field>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
  <doc>
    <field name="id">art0001-bar</field>
    <field name="name">Power, Max</field>
    <field name="role">reviewer</field>
  </doc>
</doc>

This way I can ask for any documents that are reviewed by Max Power. 
However, to simplify updates and deletes I want to ensure that nested 
documents are deleted automatically on update and delete of the parent 
document.

Has anyone had to deal with this problem and found a solution?
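For reference, a SolrJ sketch of how such a block could be indexed; the whole
parent/child block is always sent (and later re-sent) together. URL and values
mirror the XML above:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: build the parent document, attach the child documents, and send the
// block in one add request.
public class NestedDocSketch {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");

    SolrInputDocument article = new SolrInputDocument();
    article.addField("id", "art0001");
    article.addField("title", "My first article");

    SolrInputDocument author = new SolrInputDocument();
    author.addField("id", "art0001-foo");
    author.addField("name", "Smith, John");
    author.addField("role", "author");

    SolrInputDocument reviewer = new SolrInputDocument();
    reviewer.addField("id", "art0001-bar");
    reviewer.addField("name", "Power, Max");
    reviewer.addField("role", "reviewer");

    article.addChildDocument(author);
    article.addChildDocument(reviewer);

    solr.add(article);
    solr.commit();
  }
}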

regards,

Thomas


Re: trigger delete on nested documents

2014-05-19 Thread Thomas Scheffler

On 19.05.2014 at 08:38, Walter Underwood wrote:

Solr does not support nested documents.  -- wunder


It does since 4.5:

http://lucene.apache.org/solr/4_5_0/solr-solrj/org/apache/solr/common/SolrInputDocument.html#addChildDocuments(java.util.Collection)

But this feature is rather poorly documented and has some caveats:

http://blog.griddynamics.com/2013/09/solr-block-join-support.html

regards,

Thomas


On May 18, 2014, at 11:36 PM, Thomas Scheffler
thomas.scheff...@uni-jena.de wrote:

Hi,

I plan to use nested documents to group some of my fields

<doc>
  <field name="id">art0001</field>
  <field name="title">My first article</field>
  <doc>
    <field name="id">art0001-foo</field>
    <field name="name">Smith, John</field>
    <field name="role">author</field>
  </doc>
  <doc>
    <field name="id">art0001-bar</field>
    <field name="name">Power, Max</field>
    <field name="role">reviewer</field>
  </doc>
</doc>

This way I can ask for any documents that are reviewed by Max Power.
However, to simplify updates and deletes I want to ensure that nested
documents are deleted automatically on update and delete of the
parent document. Has anyone had to deal with this problem and
found a solution?


Re: range types in SOLR

2014-03-03 Thread Thomas Scheffler

On 03.03.2014 at 19:12, David W. Smiley wrote:

The main reference for this approach is here:
http://wiki.apache.org/solr/SpatialForTimeDurations


Hoss’s illustrations he developed for the meetup presentation are great.
However, there are bugs in the instruction — specifically it’s important
to slightly buffer the query and choose an appropriate maxDistErr.  Also,
it’s more preferable to use the rectangle range query style of spatial
query (e.g. field:[“minX minY” TO “maxX maxY”] as opposed to using
“Intersects(minX minY maxX maxY)”.  There’s no technical difference but
the latter is deprecated and will eventually be removed from Solr 5 /
trunk.

All this said, recognize this is a bit of a hack (one that works well).
There is a good chance a more ideal implementation approach is going to be
developed this year.


Thank you,

having a working example is great, but having a practically working 
example that hides this implementation detail would be even better.


I would like to store:

2014-03-04T07:05:12.345Z, 2014-03-04, 2014-03 and 2014 into one field 
and make queries on that field.


Currently I have to normalize all of them to the first format (inventing 
information). That is the worst approximation. Normalizing them to a 
range would be best in my opinion. Then a query like date:2014 would 
hit all of them, as would date:[2014-01 TO 2014-03].
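To make the intended normalization concrete, a small sketch (plain java.time,
no Solr specifics) that expands an imprecise value into the inclusive
[start, end] bounds such a range type would have to store:

import java.time.Instant;
import java.time.LocalDate;
import java.time.Year;
import java.time.YearMonth;
import java.time.ZoneOffset;

// Sketch: expand an imprecise date string into inclusive [start, end] bounds.
public class DateRangeSketch {

  static Instant[] toRange(String value) {
    if (value.matches("\\d{4}")) {                     // e.g. "2014"
      Year y = Year.parse(value);
      return bounds(y.atDay(1), y.atMonth(12).atEndOfMonth());
    }
    if (value.matches("\\d{4}-\\d{2}")) {              // e.g. "2014-03"
      YearMonth ym = YearMonth.parse(value);
      return bounds(ym.atDay(1), ym.atEndOfMonth());
    }
    if (value.matches("\\d{4}-\\d{2}-\\d{2}")) {       // e.g. "2014-03-04"
      LocalDate d = LocalDate.parse(value);
      return bounds(d, d);
    }
    Instant exact = Instant.parse(value);              // full timestamp
    return new Instant[] { exact, exact };
  }

  static Instant[] bounds(LocalDate from, LocalDate to) {
    return new Instant[] {
        from.atStartOfDay(ZoneOffset.UTC).toInstant(),
        to.plusDays(1).atStartOfDay(ZoneOffset.UTC).toInstant().minusMillis(1)
    };
  }

  public static void main(String[] args) {
    for (Instant bound : toRange("2014-03")) {
      System.out.println(bound);  // 2014-03-01T00:00:00Z and 2014-03-31T23:59:59.999Z
    }
  }
}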


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

On 27.02.2014 at 09:15, Shawn Heisey wrote:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually, bug reports reach me that sound like

Unknown type 19


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for Unknown type 19.

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR 
4.5.1. I received a client stack trace this morning and am still waiting 
for log output from the server:


--
ERROR unable to submit tasks
org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
Unknown type 19
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:495)
at
org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:199)
at
org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
--

There is not much information in that stack trace, I know.
I'll send further information when I receive more. In the meantime I have 
asked our customer not to resolve the issue by upgrading the SOLR server, 
so we can dig deeper.


kind regards,

Thomas


Re: SOLRJ and SOLR compatibility

2014-03-03 Thread Thomas Scheffler

On 04.03.2014 at 07:21, Thomas Scheffler wrote:

On 27.02.2014 at 09:15, Shawn Heisey wrote:

On 2/27/2014 12:49 AM, Thomas Scheffler wrote:

What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually, bug reports reach me that sound like

Unknown type 19


Aha!  I found it!  It was caused by the change applied for SOLR-5658,
fixed in 4.7.0 (just released) by SOLR-5762.  Just my luck that there's
a bug bad enough to contradict what I told you.

https://issues.apache.org/jira/browse/SOLR-5658
https://issues.apache.org/jira/browse/SOLR-5762

I've added a comment that will help users find SOLR-5762 with a search
for Unknown type 19.

If you use SolrJ 4.7.0, compatibility should be better.


Hi,

I am sorry to inform you that SolrJ 4.7.0 faces the same issue with SOLR
4.5.1. I received a client stack trace this morning and am still waiting
for log output from the server:


Here we go for the server side (4.5.1):

Mrz 03, 2014 2:39:26 PM org.apache.solr.core.SolrCore execute
Information: [clausthal_test] webapp=/solr path=/select
params={fl=*,score&sort=mods.dateIssued+desc&q=%2BobjectType:mods+%2Bcategory:clausthal_status\:published&wt=javabin&version=2&rows=3}
hits=186 status=0 QTime=2
Mrz 03, 2014 2:39:38 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
Information: [clausthal_test] webapp=/solr path=/update
params={wt=javabin&version=2} {} 0 0
Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:131)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:221)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:116)
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:186)
at
org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:112)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:158)
at
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:99)
at
org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:58)
at
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1859)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:703)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:406)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:195)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
at
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
at
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:744)

Mrz 03, 2014 2:39:38 PM org.apache.solr.common.SolrException log
Schwerwiegend: null:java.lang.RuntimeException: Unknown type 19
at
org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:228)
at
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:139

range types in SOLR

2014-03-01 Thread Thomas Scheffler

Hi,

I am in the need of range types in SOLR - similar to PostgreSQL:
https://wiki.postgresql.org/images/7/73/Range-types-pgopen-2012.pdf

My schema should allow approximate dates and queries on them. With 
a single such date per document one can split this information 
into two separate fields. But this is not an option if the date field is 
multivalued. One has to split the document into two or more documents.


I wonder if it has to be so complicated. Does somebody know whether SOLR 
already supports range types? If not, how difficult would it be to 
implement? Is anybody else in need of range types, too?


kind regards,

Thomas


Re: range types in SOLR

2014-03-01 Thread Thomas Scheffler

On 01.03.2014 at 18:24, Erick Erickson wrote:

I'm not clear what you're really after here.

Solr certainly supports ranges, things like time:[* TO date_spec] or
date_field:[date_spec TO date_spec] etc.


There's also a really creative use of spatial (of all things) to, say
answer questions involving multiple dates per record. Imagine, for
instance, employees with different hours on different days. You can
use spatial to answer questions like which employees are available
on Wednesday between 4PM and 8PM.

And if none of this is relevant, how about you give us some
use-cases? This could well be an XY problem.


Hi,

let's try this example to show the problem. You have some old text that 
was written in two periods of time:

1.) 2nd half of the 13th century: 1250-1299
2.) Beginning of the 18th century: 1700-1715
2.) Beginning of 18th century: - 1700-1715

If you are searching for texts that were written between 1300 and 1699, then 
the document described above should not be a hit.


If you make start date and end date multiple this results in:

start: [1250, 1700]
end: [1299, 1715]

A search for documents written between 1300-1699 would be:

(+start:[1300 TO 1699] +end:[1300 TO 1699]) (+start:[* TO 1300] +end:[1300 
TO *]) (+start:[* TO 1699] +end:[1700 TO *])


You see that the document above would obviously be hit by (+start:[* TO 
1300] +end:[1300 TO *]).


Hope you see the problem. The problem is the same whenever you face 
multiple ranges in one field of a document. For every combination you 
need to create a separate SOLR document: if you have two such fields in 
one document, field one with n values and field two with m values, 
you are forced to create m*n documents. This fact and the rather unhandy 
query (see above) are my motivation to ask for range types like in PostgreSQL, 
where the problem is solved.


regards,

Thomas


SOLRJ and SOLR compatibility

2014-02-26 Thread Thomas Scheffler

Hi,

I am one of the developers of a repository framework. We rely on the fact that 
SolrJ generally maintains backwards compatibility, so you can use a 
newer SolrJ with an older Solr, or an older SolrJ with a newer Solr. [1]


This statement is not even true for bugfix releases like 4.6.0 - 4.6.1. 
(SOLRJ 4.6.1, SOLR 4.6.0)


We use SolrInputDocument from SOLRJ to index our documents (javabin). 
But as framework developers we are not in a position to force our users to 
update their SOLR server that often. Instead, with every new version we only 
want to update the SOLRJ library we ship with, to enable the latest 
features if the user wishes.


When I send a query to a request handler I can attach a version 
parameter to tell SOLR which version of the response format I expect.


Is there such a configuration when indexing SolrInputDocuments? I have not 
found one so far.
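Not an answer to the version-parameter question, but one mitigation that comes
to mind (unverified, and it gives up the javabin efficiency): force the client
onto the XML transport for updates and responses, so the binary codec version
never comes into play. Sketch with SolrJ 4.x; the URL is an example:

import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.impl.XMLResponseParser;
import org.apache.solr.client.solrj.request.RequestWriter;

// Sketch: send updates as XML and parse responses as XML instead of javabin.
public class XmlTransportSketch {
  public static void main(String[] args) {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    solr.setRequestWriter(new RequestWriter());  // XML-based update requests
    solr.setParser(new XMLResponseParser());     // XML response parsing
    // ... index SolrInputDocuments as usual ...
  }
}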


Kind regards,

Thomas

[1] https://wiki.apache.org/solr/Solrj


Re: SOLRJ and SOLR compatibility

2014-02-26 Thread Thomas Scheffler

On 27.02.2014 at 08:04, Shawn Heisey wrote:

On 2/26/2014 11:22 PM, Thomas Scheffler wrote:

I am one developer of a repository framework. We rely on the fact, that
SolrJ generally maintains backwards compatibility, so you can use a
newer SolrJ with an older Solr, or an older SolrJ with a newer Solr. [1]

This statement is not even true for bugfix releases like 4.6.0 - 4.6.1.
(SOLRJ 4.6.1, SOLR 4.6.0)

We use SolrInputDocument from SOLRJ to index our documents (javabin).
But as framework developer we are not in a role to force our users to
update their SOLR server such often. Instead with every new version we
want to update just the SOLRJ library we ship with to enable latest
features, if the user wishes.

When I send a query to a request handler I can attach a version
parameter to tell SOLR which version of the response format I expect.

Is there such a configuration when indexing SolrInputDocuments? I did
not find it so far.


What problems have you seen with mixing 4.6.0 and 4.6.1?  It's possible
that I'm completely ignorant here, but I have not heard of any.


Actually, bug reports reach me that sound like

Unknown type 19

I am currently not able to reproduce it myself with server versions 
4.5.0, 4.5.1 and 4.6.0 when using SolrJ 4.6.1.


It sounds like the same issue as described here:

http://lucene.472066.n3.nabble.com/After-upgrading-indexer-to-SolrJ-4-6-1-o-a-solr-servlet-SolrDispatchFilter-Unknown-type-19-td4116152.html

The solution there was to upgrade the Server to version 4.6.1.

This helped here, too, but out there it is a very unpopular decision. Some 
users have large SOLR installs and stick to a certain (4.x) version. They 
want upgrades from us, but upgrading company-wide SOLR installations is 
out of their scope.


Is that a known SOLRJ issue that is fixed in version 4.7.0?

kind regards,

Thomas


weak documents

2013-11-27 Thread Thomas Scheffler

Hi,

I am relatively new to SOLR and I am looking for a neat way to implement 
weak documents with SOLR.


Whenever a document is updated or deleted, all its dependent documents 
should be removed from the index. In other words, they exist only as long as 
the document they referred to when they were indexed exists - in that 
specific version. On update they will be indexed after their master 
document.


I would like to have some kind of dependsOn field that carries the 
uniqueKey value of the master document.


Can this be done efficiently with SOLR?

I need this technique because on update and on delete I don't know how 
many dependent documents exist in the SOLR index. Especially for batch 
index processes, I need a more efficient way than querying before every 
update or delete.
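To make the intended semantics concrete, a client-side SolrJ sketch (field
names, ids and URL are examples). It avoids the search, but still needs the
extra delete request per master document, which is exactly what I would like
to push into the server:

import java.io.IOException;

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

// Sketch: drop all dependents of a master document, then re-add the master
// together with its current dependents.
public class WeakDocumentUpdateSketch {
  public static void main(String[] args) throws SolrServerException, IOException {
    HttpSolrServer solr = new HttpSolrServer("http://localhost:8983/solr/collection1");
    String masterId = "doc-42";

    // 1. remove every dependent document of the master
    solr.deleteByQuery("dependsOn:\"" + masterId + "\"");

    // 2. re-add the master plus its current dependents
    SolrInputDocument master = new SolrInputDocument();
    master.addField("id", masterId);
    solr.add(master);

    SolrInputDocument dependent = new SolrInputDocument();
    dependent.addField("id", masterId + "-en");
    dependent.addField("dependsOn", masterId);
    solr.add(dependent);

    solr.commit();
  }
}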


kind regards,

Thomas


Re: weak documents

2013-11-27 Thread Thomas Scheffler

On 27.11.2013 at 09:58, Paul Libbrecht wrote:

Thomas,

our experience with Curriki.org is that evaluating what I call the
related documents is a procedure that needs access to the complete
content and thus is run at the DB level and not at the Solr level.

For example, if a user changes a part of its name, we need to reindex
all of his resources. Sure we could try to run a solr query for this,
and maybe add index fields for it, but we felt it better to run this
on the index-trigger side, the thing in our (XWiki) wiki which
listens to changes and requests the reindexing of a few documents
(including deletions).

For the maintenance operation, the same issue has appeared. So, if
the indexer or listener or solr has been down for a few minutes or
hours, we'd need to reindex not only all changed documents but all
changed documents and their related documents.

If you are able to work through your solution that would be
solr-only,  to write down all depends-on at index time, it means you
would index-update all inverse related documents every time that
changes. For the relation above (documents of a user), it means the
user documents needs reindexing every time a new document is added. I
wonder if this makes a scale difference.


I think both use cases differ a bit. At index time of my master document 
I have all the information about dependent documents ready. So instead of 
committing one document I commit, let's say, four.


In your case you have to query to get all documents of a user first.

Here is a more detailed use-case. I have metadata in 1 to n languages to 
describe a document (e.g. journal article).


I commit a master document in a specified default language to SOLR and 
one document for every language I have metadata for. If a user adds or 
removes metadata (e.g. an abstract in French) there is one document more or 
one document less in SOLR. So their number changes, and I do not want stale 
data to be kept in the index.


A similar use case: I have article documents with authors. I create 
author documents for every article. If someone adds or removes an 
author I need to track that change. These dumb author documents are 
used for an alphabetical person index and hold a unique field that is 
used to group them, but these documents exist only as long as their 
master documents do.


My two use cases are quite similar, so I would like this weak 
documents functionality somehow.


SOLR knows that if a document is added with id=foo it has to replace a 
document that matches id:foo. If I could change this behavior to match on 
dependsOn:foo I would be done. :-D
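If the replace-by-dependsOn behaviour has to live inside the server, a custom
UpdateRequestProcessor would be the usual extension point. A rough, untested
sketch against the 4.x processor API (the dependsOn field name and the wiring
of the processor chain in solrconfig.xml are assumptions):

import java.io.IOException;

import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.SolrQueryResponse;
import org.apache.solr.update.AddUpdateCommand;
import org.apache.solr.update.DeleteUpdateCommand;
import org.apache.solr.update.processor.UpdateRequestProcessor;
import org.apache.solr.update.processor.UpdateRequestProcessorFactory;

// Sketch: before a master document is (re)indexed, delete every document that
// declared it in its dependsOn field, then continue with the normal add.
public class DeleteDependentsProcessorFactory extends UpdateRequestProcessorFactory {

  @Override
  public UpdateRequestProcessor getInstance(SolrQueryRequest req,
      SolrQueryResponse rsp, UpdateRequestProcessor next) {
    return new UpdateRequestProcessor(next) {
      @Override
      public void processAdd(AddUpdateCommand cmd) throws IOException {
        Object id = cmd.getSolrInputDocument().getFieldValue("id");
        if (id != null) {
          DeleteUpdateCommand delete = new DeleteUpdateCommand(cmd.getReq());
          delete.setQuery("dependsOn:\"" + id + "\"");
          super.processDelete(delete); // drop stale dependents first
        }
        super.processAdd(cmd);         // then index the master document
      }
    };
  }
}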


regards

Thomas