Re: docValue vs. analyzer

2018-04-26 Thread Uwe Reh
Hi Erick, Thank you for the hint about SortableTextField. This really seems to be the type I was looking for. UpdateProcessors could be a workaround, but I don't like them. For me they are neither fish nor fowl (neither internal nor external). Uwe On 19.04.2018 at 18:38, Erick

wired behavior of own tokenizer

2018-04-26 Thread Uwe Reh
Hi, I'm trying to write my own tokenizer for Solr 7. So far everything seems to be fine: - the tokenizer compiles - the tokenizer is instantiated fine by its factory - the tokenizer seems to do its work when tested with the GUI ("../solr/#/collection/analysis") BUT - the expected result

docValue vs. analyzer

2018-04-19 Thread Uwe Reh
Hi, I'm stuck in a dead end. My task is to map individual ids in order to group them. So far, so simple: * copyField 'id' -> 'groupId' * use a SynonymFilter on 'groupId' Now, I had the idea to improve the performance of grouping with 'docValues'. Unfortunately, this leads to a contradiction: *

Re: CVE-2017-12629 which versions are vulnerable?

2017-10-16 Thread Uwe Reh
Sorry, I missed the post from Florian Gleixner: >Re: Several critical vulnerabilities discovered in Apache Solr (XXE & RCE) On 16.10.2017 at 16:52, Uwe Reh wrote: Hi, I'm still using V4.10. Is this version also vulnerable to http://openwall.com/lists/oss-security/2017/10/13/1 ? Uwe

CVE-2017-12629 which versions are vulnerable?

2017-10-16 Thread Uwe Reh
Hi, I'm still using V4.10. Is this version also vulnerable to http://openwall.com/lists/oss-security/2017/10/13/1 ? Uwe

Re: CDCR (Solr6.x) does not start

2016-07-08 Thread Uwe Reh
configuration seems correct, see my comments below. On 28/06/16 15:36, Uwe Reh wrote: 9. Start CDCR http://SOURCE:s_port/solr/scoll/cdcr?action=start=json {"responseHeader":{"status":0,"QTime":13},"status":["process","started","buf

Re: CDCR (Solr6.x) does not start (logfile)

2016-06-29 Thread Uwe Reh
Hi, trying to get more information, I restarted the SOURCE node and watched the log. For each shard I got the following triple: WARN org.apache.solr.handler.CdcrRequestHandler - Action LASTPROCESSEDVERSION sent to non-leader replica @ scoll:shard1 ERROR

CDCR (Solr6.x) does not start

2016-06-28 Thread Uwe Reh
Hi, I'm trying to get CDCR to run, but I can't even trigger any communication between SOURCE and TARGET. It seems to be a small but grave misunderstanding. I've tested a lot of variants but now I'm blind on this point. If anyone could give me a hint, I would appreciate it. Uwe Test setting:

Re: Solr 6 CDCR does not work

2016-06-07 Thread Uwe Reh
Hi Adam, maybe it's my poor English, but I'm confused. I've taken Renault's quote as a hint to activate autocommit on the target cluster, or at least to do frequent manual commits, to see the replicated documents. Now you wrote disabling autocommit helps. Could you please clarify this

relaxed vs. improved validation in solr.TrieDateField

2016-04-29 Thread Uwe Reh
Hi, doing some migration tests (4.10 to 6.0) I noticed an improved validation of TrieDateField. Syntactically correct but impossible days are rejected now. (stack trace at the end of the mail) Examples: - '1997-02-29T00:00:00Z' - '2006-06-31T00:00:00Z' - '2000-00-00T00:00:00Z' The first two
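The three example values can be checked outside Solr. The following is a minimal sketch, assuming Python's standard-library `datetime` parser as a stand-in for a strict validator; it is not the TrieDateField code, and the helper name `is_real_date` is hypothetical.

```python
from datetime import datetime

def is_real_date(value: str) -> bool:
    """Return True only if the value names a day that exists in the calendar."""
    try:
        # strptime rejects month 00, day 00 and days beyond the month's length.
        datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
        return True
    except ValueError:
        return False

# The three values from the message, plus one genuine leap day for contrast.
for value in (
    "1997-02-29T00:00:00Z",
    "2006-06-31T00:00:00Z",
    "2000-00-00T00:00:00Z",
    "2000-02-29T00:00:00Z",
):
    print(value, is_real_date(value))
```

A check like this could be run over the source data before a migration to find the records a stricter validator would reject.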

Re: faceting is unusable slow since upgrade to 5.3.0

2015-10-08 Thread Uwe Reh
Sorry for the delay. I had an ugly flu. SOLR-7730 seems to work fine. Using docValues with Solr 5.4.0-2015-09-29_08-29-55 1705813 makes my faceted queries fast again. (90ms vs. 2ms) :-) Thanks Uwe On 27.09.2015 at 20:32, Mikhail Khludnev wrote: On Sun, Sep 27, 2015 at 2:00 PM, Uwe

Re: Scramble data

2015-10-08 Thread Uwe Reh
Hi, my suggestions are probably too simple, because they are not a real protection of privacy. But maybe one fits your needs. Most simple: Declare your 'hidden' fields just as "indexed=true stored=false", the data will be used for searching, but the fields are not listed in the query

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-27 Thread Uwe Reh
invert INFO: UnInverted multi-valued field {*field=nomejornal*,memSize=827108,tindexSize=40,time=16,phase1=4,*nTerms=15,bigTerms=0*,termInstances=750,uses=0} Those heavy requests, do they find more than half of docs, eg hits>maxdoc/2 ? Thanks for your input! On Thu, Sep 24, 2015 at 11:38 A

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-27 Thread Uwe Reh
Hi Mikhail, thanks for the hint, and "no" it wasn't obvious to me. :-) But I think for us it's better to remain at 4.10.3 and observe the evolution of SOLR-8096. When 5.4 with SOLR-7730 is released, I will start to use docValues. Going this way seems more straightforward to me. Uwe

Re: Different ports for search and upload request

2015-09-25 Thread Uwe Reh
On 25.09.2015 at 00:05, Siddhartha Singh Sandhu wrote: *Never did this. *But how about this crazy idea: Take an Amazon EFS and share it between 2 EC2. I think you are on the right track. Imho this requirement should be solved externally. Option 1: Hide your Solr node behind an HTTP proxy

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-25 Thread Uwe Reh
On 25.09.2015 at 05:16, Yonik Seeley wrote: I did some performance benchmarks and opened an issue. It's bad. https://issues.apache.org/jira/browse/SOLR-8096 Hi Yonik, thanks a lot for your investigation. Using the JSON Facet API is fast and seems to be a usable workaround for new

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Uwe Reh
On 23.09.2015 at 10:02, Mikhail Khludnev wrote: ... Accelerating non-DV facets is not so clear so far. Please show profiler snapshot for non-DV facets if you wish to go this way. Hi, attached is a VisualVM profile of a simplified query (just one facet) run several times:

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-24 Thread Uwe Reh
On 22.09.2015 at 18:10, Walter Underwood wrote: Faceting on an author field is almost always a bad idea. Or at least a slow, expensive idea. Hi Wunder, in a technical context, the 'author' facet may be suboptimal. In our business (library services) it's a core feature. Yes the facet is

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-22 Thread Uwe Reh
On 22.09.2015 at 02:12, Joel Bernstein wrote: Have you looked at your Solr instance with a cpu profiler like YourKit? It would be useful to see the hotspots which should be really obvious with 20 second response times. No, until now I have done no profiling. I thought the unused

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-22 Thread Uwe Reh
The exact version as shown by the UI is: - solr-impl 5.3.0 1696229 - noble - 2015-08-17 17:10:43 - lucene-impl 5.3.0 1696229 - noble - 2015-08-17 16:59:03 Unfortunately my skills in debugging are limited. So I'm not sure about a 'deeper caller stack'. Did you mean the attached snapshot from

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-22 Thread Uwe Reh
Here is my attempt to detect some hot spots with VisualVM. Environment: A newly started node with ~15 times the query: http://yxz/solr/hebis/select/?q=darwin=true=1=30=material_access=department_3=rvk_facet=author_facet=material_brief=language==count=all=true Ordered by self time

Re: faceting is unusable slow since upgrade to 5.3.0 (missing attachment)

2015-09-22 Thread Uwe Reh
virtualvm_snapshot_solr5.3_facetting.csv Description: MS-Excel spreadsheet

faceting is unusable slow since upgrade to 5.3.0

2015-09-21 Thread Uwe Reh
Hi, our bibliographic index (~20M entries) runs fine with Solr 4.10.3. With Solr 5.3, faceted searching is constantly incredibly slow (~ 20 seconds). Output of 'debugQuery': 17705.0 2.0 17590.0 !! 111.0 The 'fieldValueCache' seems to be unused (no inserts nor lookups) in Solr 5.3. In Solr

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-21 Thread Uwe Reh
On 21.09.2015 at 15:16, Shalin Shekhar Mangar wrote: Can you post your complete facet request as well as the schema definition of the field on which you are faceting? Query:

Re: Understanding the Debug explanations for Query Result Scoring/Ranking

2014-07-24 Thread Uwe Reh
Hi, to get an idea of the meaning of all these numbers, have a look at http://explain.solr.pl. I like this tool, it's great. Uwe On 25.07.2014 00:45, O. Olson wrote: Hi, If you add /*debug=true*/ to the Solr request /(and wt=xml if your current output is not XML)/, you would get a

Re: Solr Fields Multilingue

2014-06-30 Thread Uwe Reh
On 30.06.2014 16:57, benjelloun wrote: AllChamp that don't do analyzer and filter. any idea? Example: I search for : AllChamp:presenton -- num result=0 AllChamp:présenton -- num result=1 Hi Anass, any analyzer means any modification (no ICU-Normalisation). copyField

Re: Two solr instances access common index

2014-06-26 Thread Uwe Reh
Hi, with the lock type 'simple' I have three instances (different JREs, GC-Problem) running on the same files. You should use this option only for a read-only system. Otherwise it's easy to corrupt the index. Maybe you should have a look at replication or SolrCloud. Uwe On 26.06.2014 11:25,

Re: Distributed search with Terms Component and Solr Cloud.

2014-01-24 Thread Uwe Reh
Hi Ryan, just take a look at the thread TermsComponent/SolrCloud. Setting your parameters as default in solrconfig.xml should help. Uwe On 13.01.2014 20:24, Ryan Fox wrote: Hello, I am running Solr 4.6.0. I am experiencing some difficulties using the terms component across multiple

Re: Error when creating collection in Solr 4.6

2014-01-20 Thread Uwe Reh
Hi, I had the same problem. In my case the error was a copy/paste typo in my solr.xml: <str name="genericCoreNodeNames">${genericCoreNodeNames:true}</str> !^! Ouch! With the type 'bool' instead of 'str' it definitely works better. ;-) Uwe On 28.11.2013 08:53, lansing wrote:

Re: How to index X™ as #8482; (HTML decimal entity)

2013-11-20 Thread Uwe Reh
What about having a simple charfilter in the analyzer queue for indexing *and* searching, e.g. <charFilter class="solr.PatternReplaceFilterFactory" pattern="™" replacement="#8482;" /> or <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt" /> Uwe On 19.11.2013 23:46,

Re: [ANNOUNCE] Apache Solr Reference Guide 4.5 Available

2013-11-19 Thread Uwe Reh
On 18.11.2013 14:39, Furkan KAMACI wrote: Atlassian Jira has two options by default: exporting to PDF and exporting to Word. I see, 'Word' isn't optimal for a reference guide. But OO can handle 'doc' and has epub plugins. Would it be possible to offer the documentation also as 'doc(x)' barefaced

Re: [ANNOUNCE] Apache Solr Reference Guide 4.5 Available

2013-11-19 Thread Uwe Reh
on it: https://issues.apache.org/jira/browse/SOLR-5467. On Tue, Nov 19, 2013 at 6:34 AM, Uwe Reh r...@hebis.uni-frankfurt.de wrote: On 18.11.2013 14:39, Furkan KAMACI wrote: Atlassian Jira has two options by default: exporting to PDF and exporting to Word. I see, 'Word' isn't optimal

Re: [ANNOUNCE] Apache Solr Reference Guide 4.5 Available

2013-11-18 Thread Uwe Reh
I'd like to read the guide as e-paper. Is there a way to obtain the document in EPUB or ODT format? Trying to convert the PDF with Calibre wasn't very satisfying. :-( Uwe On 05.10.2013 14:19, Steve Rowe wrote: The Lucene PMC is pleased to announce the release of the Apache Solr

SolrCloud: read only node

2013-11-04 Thread Uwe Reh
Hi, as a service provider for libraries we run a small cloud (1 collection, 1 shard, 3 replicas). To improve the local reliability we want to offer the possibility to set up their own local replicas. As far as I know, this can be easily done just by adding a new node to the cloud. But the external

Re: SolrCloud: read only node

2013-11-04 Thread Uwe Reh
F***, this is the answer I was afraid of. ;-) I had hoped there might be something similar to http://zookeeper.apache.org/doc/trunk/zookeeperObservers.html. Nevertheless, thank you. Uwe On 04.11.2013 14:14, Erick Erickson wrote: In this situation, I'd consider going with the older

SOLR-3076 for beginners?

2013-03-08 Thread Uwe Reh
Hi, blockjoin seems to be a really cool feature. Unfortunately I'm too dumb to get the patch running. I don't even know what to do :-( Is there an example, a howto or a cookbook anywhere, other than using elasticsearch or bare lucene? Uwe

Re: Nested function query must use ....

2013-02-02 Thread Uwe Reh
Hi Jack, thanks a lot for the hint. On 02.02.2013 00:46, Jack Krupansky wrote: I've updated the example on the Function Query wiki that you may have copied: http://wiki.apache.org/solr/FunctionQuery#exists Thanks again, because the wiki page was really my starting point. Uwe

Nested function query must use ....

2013-02-01 Thread Uwe Reh
Hi, this should be easy, but I'm too blind to find the correct syntax (Solr 4.1). Problem: I have some documents in the index that, because of their structure, tend to get too high scores. These documents are easy to identify and I want to boost the others to get a fair ranking. Could anyone give my

Re: Tokenized keywords

2013-01-21 Thread Uwe Reh
Hi, probably my note is nonsense. But sometimes one is blind and not able to see simple things anymore. Is this the query you are looking for? q=modified:(search+for+Laptops)&fl=original,modified Sorry if my suggestion is too trivial. Uwe On 21.01.2013 09:17, Romita Saha wrote: Hi, I

Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Uwe Reh
Hi Mark, one entry in my long list of self-made problems is: I did the commit before the ConcurrentUpdateSolrServer had finished. Since the ConcurrentUpdateSolrServer is asynchronous, it's very easy to create a race condition. Make sure that your program is waiting () before it's doing the
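The race described here is generic to any asynchronous sender. The following is a minimal sketch in plain Python, not SolrJ: the queue and worker thread stand in for an asynchronous update client, the list copy stands in for the commit, and the name `index_all` is hypothetical.

```python
import queue
import threading

def index_all(docs):
    """Send docs through a background worker and 'commit' only after the queue has drained."""
    pending = queue.Queue()
    indexed = []

    def worker():
        while True:
            doc = pending.get()
            if doc is None:          # sentinel: stop the worker
                pending.task_done()
                break
            indexed.append(doc)      # stands in for the real update request
            pending.task_done()

    thread = threading.Thread(target=worker)
    thread.start()
    for doc in docs:
        pending.put(doc)

    pending.join()                   # block until every queued doc is processed
    committed = list(indexed)        # stands in for the commit; safe only after join()

    pending.put(None)
    thread.join()
    return committed

print(index_all(["doc1", "doc2", "doc3"]))
```

Taking the snapshot before `pending.join()` would reproduce the symptom in the thread title: documents still sitting in the queue are missing from what gets committed.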

Re: Missing documents with ConcurrentUpdateSolrServer (vs. HttpSolrServer) ?

2013-01-17 Thread Uwe Reh
Hi Shawn, don't panic. For 'historical' reasons, like comparing the different subclasses of SolrServer, I have an HttpSolrServer for queries and commits. I've never tried to use the CUSS for anything else than adding documents. As I wrote, it was a homemade problem and not a bug. Sometimes

Re: Results in same or different fields

2013-01-15 Thread Uwe Reh
Hi, maybe it helps to have a closer look at the other params of edismax. http://wiki.apache.org/solr/ExtendedDisMax#pf_.28Phrase_Fields.29 'mm=2' will be too strong, but the usage of pf, pf2, and pf3 is likely your solution. Uwe On 15.01.2013 10:15, Gastone Penzo wrote: Hi, i'm using

Re: theory of sets

2013-01-14 Thread Uwe Reh
On 08.01.2013 10:26, Uwe Reh wrote: OK, OK, I will try it again with dynamic fields. NO! Dynamic fields are nice, but not for my problem. :-( I got more than *52* new fields. I was wrong, the impact on searching is really reasonable. But have you ever used the Admin's Schema Browser

Re: POST query with non-ASCII to solr using httpclient wont work

2013-01-14 Thread Uwe Reh
Hi Jie, maybe there is a simple solution. When we used Tomcat as the servlet container for Solr, I noticed similar problems. Even with the hints from the Solr wiki about Unicode and Tomcat, I wasn't able to fix this. So we switched back to Jetty; queries like q=allfields2%3A能力 are reliable now.
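For comparison, here is how such a query value looks when it is fully percent-encoded on the client side. This is a minimal sketch using Python's standard library, assuming the client sends UTF-8; whether the servlet container decodes the bytes as UTF-8 is exactly the server-side setting discussed in this thread.

```python
from urllib.parse import quote, unquote

raw_query = "allfields2:能力"

# quote() encodes the string as UTF-8 and percent-escapes every byte
# that is not an unreserved URL character, including ':'.
encoded = quote(raw_query)
print("q=" + encoded)  # prints q=allfields2%3A%E8%83%BD%E5%8A%9B

# Decoding with the same charset restores the original query.
assert unquote(encoded) == raw_query
```

If the server decodes the same bytes with a different charset (for example ISO-8859-1), the two CJK characters arrive as six unrelated characters and the query matches nothing.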

Re: retrieving latest document **only**

2013-01-11 Thread Uwe Reh
On 10.01.2013 11:54, jmozah wrote: I need a query that matches only the most recent ones... Because my stats depend on it.. But I have a requirement to show **only** the latest documents and the stats along with it.. What do you want? 'the most recent ones' or '**only** the latest' ?

Re: Hotel Searches

2013-01-09 Thread Uwe Reh
Hi, maybe I'm thinking too simply again. Nevertheless, here is an idea to solve the question. The basic thought is to get rid of the range query. Have: - a textfield 'vacant_days'. Instead of ISO-Dates just simple dates in the form mmdd - a dynamic field 'price_*', You can add the tariff for

Re: theory of sets

2013-01-08 Thread Uwe Reh
OK, OK, I will try it again with dynamic fields. Maybe the problem was something else. All statements sound reasonable. Even Lisheng's thoughts about the impact of too many fields on memory consumption should not be the problem for a JVM with 32G RAM and almost no GC. Please give me

Re: fieldtype for name

2013-01-08 Thread Uwe Reh
Hi Michael, in our index of bibliographic metadata, we see the need for at least three fields: - name_facet: String as type, because the facet should represent the original inverted format from our data. - name: TextField for searching. This field is heavily analyzed to match different

Re: theory of sets (first solution)

2013-01-07 Thread Uwe Reh
Hi, I found my own hack. It's based on a free interpretation of the function strdist(). Have: - one multivalued field 'part_of' - one unique field 'groupsort' Index each item: For each group membership: add groupid to 'part_of' concat groupid and sortstring to new string

Re: custom solr sort

2013-01-07 Thread Uwe Reh
On 06.01.2013 02:32, andy wrote: I want to custom solr sort and pass solr param from client to solr server, Hi Andy, not an answer to your question, but maybe another approach to solve your initial question. Instead of writing a new SearchComponent I decided to (mis)use the function

Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Uwe Reh
Hi Jack, thank you for the hint. Since I already have a solrj client to do the preprocessing, mapping to sort fields isn't my problem. I will try to explain better in my reply to Erick. Uwe (Sorry for the late reaction) On 30.08.2012 16:04, Jack Krupansky wrote: You can also use a Field

Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Uwe Reh
On 31.08.2012 13:35, Erick Erickson wrote: ... what would the correct behavior be for sorting on a multivalued field Hi Erick, in general you are right, the question with multivalued fields is which value is the reference. But there are thousands of cases where this question is implicit

Re: Sorting on mutivalued fields still impossible?

2013-01-07 Thread Uwe Reh
Hi, as I just wrote in my reply to the similar suggestion from Jack, I'm not looking for a way to preprocess my data. My question is, why do I need two redundant fields to sort a multivalued field ('date_max' and 'date_min' for 'date')? For me it's just a waste of space, poisoning the

Re: theory of sets

2013-01-07 Thread Uwe Reh
Hi Robi, thank you for the contribution. It's exciting to read that your index isn't contaminated by the number of fields. I can't exclude other mistakes, but my first experience with extensive use of dynamic fields was very poor response times. Even though I found another solution,

Re: indexing cpu utilization

2013-01-04 Thread Uwe Reh
Hi Mark, SOLR-3929 rocks! A nightly build of 4.1 with maxIndexingThreads configured to 24, takes 80% to 100% of the cpu resources :-) Thank you, Otis and Gora mpstat 10 CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl 00 0 13 607 241 234 78 100

Re: indexing cpu utilization

2013-01-03 Thread Uwe Reh
Hi, thank you for the hints. On 3 January 2013 05:55, Mark Miller markrmil...@gmail.com wrote: 32 cores eh? You probably have to raise some limits to take advantage of that. 32 cores isn't that much anymore. You can buy amd servers from Supermicro with two sockets and 32G of ram for less than

theory of sets

2013-01-03 Thread Uwe Reh
Hi, I'm looking for a tricky solution of a common problem. I have to handle a lot of items and each could be a member of several groups. - OK, just add a field called 'member_of' No that's not enough, because each group is sorted and each member has a sortstring for this group. - OK, still

Re: indexing cpu utilization

2013-01-02 Thread Uwe Reh
Hi, while trying to optimize our indexing workflow I reached the same end point that Gabriel Shen described in his mail. My Solr server won't utilize more than 40% of the computing power. I made some tests, but I'm not able to find the bottleneck. Could anybody help to solve this quest? At

Re: indexing cpu utilization (attachement)

2013-01-02 Thread Uwe Reh
On 02.01.2013 22:39, Uwe Reh wrote: To get an idea what's going on, I've done some statistics with visualvm. (see attachment) Damn, the list server strips attachments. You'll find the screen shot at http://fantasio.rz.uni-frankfurt.de/solrtest/HotSpot.gif uwe

Re: Where is ISOLatin1AccentFilterFactory (Solr4)?

2013-01-02 Thread Uwe Reh
Hi, I like the best of both worlds: <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-specials.txt" /> Mask some specials like C++ to cplusplus or C# to csharp ... <tokenizer class="solr.ICUTokenizerFactory" /> Tokenize and identify on Unicode whitespace and charsets <filter
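The reason for masking before tokenizing can be shown with a small stand-alone sketch. It is plain Python, not Solr: the `SPECIALS` table is a hypothetical stand-in for mapping-specials.txt built from the two examples in the message, and the `\w+` regex is a crude stand-in for the ICU tokenizer.

```python
import re

# Hypothetical mapping table, modeled on the two examples in the message.
SPECIALS = {"C++": "cplusplus", "C#": "csharp"}

def map_specials(text: str) -> str:
    """Character-level replacement applied before tokenizing."""
    # Replace longer keys first so a short key never clips a longer one.
    for source in sorted(SPECIALS, key=len, reverse=True):
        text = text.replace(source, SPECIALS[source])
    return text

def tokenize(text: str) -> list:
    """Crude word tokenizer: punctuation such as '+' and '#' is dropped."""
    return re.findall(r"\w+", text.lower())

text = "C++ and C# books"
print(tokenize(text))                # ['c', 'and', 'c', 'books']
print(tokenize(map_specials(text)))  # ['cplusplus', 'and', 'csharp', 'books']
```

Without the mapping step both language names collapse to the same token 'c'; with it they stay distinct and searchable.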

Sorting on mutivalued fields still impossible?

2012-08-29 Thread Uwe Reh
Hi, just to be sure: is there still no way to sort by multivalued fields? ...sort=max(datefield) desc Is there no smarter option than creating additional single-valued fields just for sorting, e.g. datefield_max and datefield_min? Uwe
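The workaround named in the question, extra single-valued fields filled at index time, can be sketched on the client side as follows. This is a minimal illustration in plain Python with documents as dicts; the helper name `add_sort_fields` is hypothetical, and it assumes the dates are ISO-8601 strings in the same time zone, which compare correctly as plain text.

```python
def add_sort_fields(doc: dict, field: str = "datefield") -> dict:
    """Return a copy of doc with <field>_min and <field>_max derived from a multivalued field."""
    values = doc.get(field, [])
    enriched = dict(doc)
    if values:  # leave documents without any value untouched
        enriched[field + "_min"] = min(values)
        enriched[field + "_max"] = max(values)
    return enriched

doc = {
    "id": "1",
    "datefield": ["2012-08-29T00:00:00Z", "2010-07-26T00:00:00Z", "2011-01-01T00:00:00Z"],
}
print(add_sort_fields(doc))
```

The sort would then use datefield_max or datefield_min instead of the multivalued field, at the cost of the redundant storage the question is complaining about.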

Re: Paoding analyzer with solr for chinese

2012-08-09 Thread Uwe Reh
Hi Rajani, I'm not really familiar with this paoding tokenizer, but it seems a bit old. We are using the CJKBigramFilter (like in the example of Solr 4.0 alpha), which should be equivalent or even better and it works. <analyzer> <tokenizer class="solr.ICUTokenizerFactory" /> <filter

Two questions on spellchecking

2012-08-06 Thread Uwe Reh
Hi, even though I read a lot, none of my spellchecker configurations works really well. I reached a dead end. Maybe someone could help, to solve my challenges. - How can I get case sensitive suggestions, independent of the given case in the query? - How to configure a 'did you mean'

Re: Can't find org.apache.solr.client.solrj.embedded

2010-07-30 Thread Uwe Reh
Sorry, I had inspected the ...core.jar three times, without recognizing the package. I was really blind. =8-) thanks Uwe On 26.07.2010 20:48, Chris Hostetter wrote: : where is a Jar, containing org.apache.solr.client.solrj.embedded? Classes in the embedded package are useless w/o the rest

Can't find org.apache.solr.client.solrj.embedded

2010-07-26 Thread Uwe Reh
Hello experts, where can I find a JAR containing org.apache.solr.client.solrj.embedded? I miss this package in 'apache-solr-solrj-1.4.[01].jar'. Also I can't find any other sources than http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/src/webapp/src/org/apache/solr/client/solrj/embedded/ , which