Analysis page broken on trunk?

2014-01-08 Thread Markus Jelsma
Hi - it seems the analysis page is broken on trunk and it looks like our 4.5 and 4.6 builds are unaffected. Can anyone on trunk confirm this? Markus

RE: Analysis page broken on trunk?

2014-01-08 Thread Markus Jelsma
@lucene.apache.org Subject: Re: Analysis page broken on trunk? Hey Markus i'm not up to date with the latest changes, but if you can describe how to reproduce it, i can try to verify that? -Stefan On Wednesday, January 8, 2014 at 12:44 PM, Markus Jelsma wrote: Hi - it seems

RE: Simple payloads example not working

2014-01-13 Thread Markus Jelsma
Check the bytes property: http://lucene.apache.org/core/4_0_0/core/org/apache/lucene/util/BytesRef.html#bytes @Override public float scorePayload(int doc, int start, int end, BytesRef payload) { if (payload != null) { return PayloadHelper.decodeFloat(payload.bytes); } return
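One caveat when reading `payload.bytes` directly: a BytesRef is a slice of a possibly shared array, so the valid data starts at `payload.offset`, which is not necessarily 0. Below is a minimal plain-Java sketch of that idea, assuming the payload is a 4-byte big-endian float (to my knowledge the layout PayloadHelper writes). The class and method names are hypothetical and are not Lucene API.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in for offset-aware payload decoding; not part of Lucene.
final class PayloadFloatSketch {

    // Decode a big-endian float that starts at the given offset in the array.
    static float decodeFloat(byte[] bytes, int offset) {
        // wrap() positions the buffer at offset; ByteBuffer is big-endian by default.
        return ByteBuffer.wrap(bytes, offset, 4).getFloat();
    }

    // Encode a float as 4 big-endian bytes.
    static byte[] encodeFloat(float value) {
        return ByteBuffer.allocate(4).putFloat(value).array();
    }

    public static void main(String[] args) {
        // Simulate a shared buffer where the payload lives at offset 4.
        byte[] shared = new byte[8];
        System.arraycopy(encodeFloat(2.5f), 0, shared, 4, 4);
        System.out.println("decoded at offset 4: " + decodeFloat(shared, 4));
        System.out.println("decoded at offset 0: " + decodeFloat(shared, 0));
    }
}
```

In Lucene itself the equivalent would presumably be the two-argument `PayloadHelper.decodeFloat(payload.bytes, payload.offset)`.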

RE: Analysis page broken on trunk?

2014-01-13 Thread Markus Jelsma
fields w/ values from the example docs and that looks pretty okay to me, no change noticed on that. Can you share a screenshot or something like that? And perhaps Input, Fields/Fieldtype which doesn't work for you? -Stefan On Wednesday, January 8, 2014 at 2:24 PM, Markus Jelsma

RE: Analysis page broken on trunk?

2014-01-13 Thread Markus Jelsma
the example docs and that looks pretty okay to me, no change noticed on that. Can you share a screenshot or something like that? And perhaps Input, Fields/Fieldtype which doesn't work for you? -Stefan On Wednesday, January 8, 2014 at 2:24 PM, Markus Jelsma wrote: Hi - You will see

RE: Simple payloads example not working

2014-01-14 Thread Markus Jelsma
Strange, is it really floats you are inserting as payload? We use payloads too but write them via PayloadAttribute in custom token filters as float. -Original message- From:michael.boom my_sky...@yahoo.com Sent: Tuesday 14th January 2014 11:59 To: solr-user@lucene.apache.org

RE: Indexing URLs from websites

2014-01-16 Thread Markus Jelsma
in the LinkDB together with the index-anchor plugin to write the anchor field in your Solrindex. Any help is appreciated! Thanks! Markus Jelsma Wrote: You need to use the invertlinks command to build a database with docs with inlinks and anchors. Then use the index-anchor plugin when

RE: Indexing URLs from websites

2014-01-16 Thread Markus Jelsma
to be created? I really appreciate the help! Thank you very much! -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, January 16, 2014 5:45 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites -Original message

RE: Indexing URLs from websites

2014-01-16 Thread Markus Jelsma
not exist: file:/.../crawl/linkdb/parse_text Along with a Java stacktrace Those linkdb folders are not being created. -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, January 16, 2014 10:44 AM To: solr-user@lucene.apache.org Subject: RE

RE: Indexing URLs from websites

2014-01-17 Thread Markus Jelsma
of the link database, keeping only the highest quality links. /description /property So change the property, rebuild the linkdb and try reindexing once again :) -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Thursday, January 16, 2014 11:08 AM

RE: Indexing URLs from websites

2014-01-20 Thread Markus Jelsma
direction on this? Thank you so much for sticking with me on this - I really appreciate your help! -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Friday, January 17, 2014 6:46 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites

RE: Solr middle-ware?

2014-01-21 Thread Markus Jelsma
Hi - We use Nginx to expose the index to the internet. It comes down to putting some limitations on input parameters and on-the-fly rewrite of queries using embedded Perl scripting. Limitations and rewrites are usually just a bunch of regular expressions, so it is not that hard. Cheers Markus

RE: Indexing URLs from websites

2014-01-21 Thread Markus Jelsma
, and  /documents/Article 1.pdf How can I get these URLs? -Original Message- From: Markus Jelsma [mailto:markus.jel...@openindex.io] Sent: Monday, January 20, 2014 9:08 AM To: solr-user@lucene.apache.org Subject: RE: Indexing URLs from websites Well it is hard to get a specific anchor because

AIOOBException on trunk since 21st or 22nd build

2014-01-22 Thread Markus Jelsma
Hi - this likely belongs to an existing open issue. We're seeing the stuff below on a build of the 22nd. Until just now we used builds of the 20th and didn't have the issue. This is either a bug or did some data format in Zookeeper change? Until now only two cores of the same shard through the

RE: AIOOBException on trunk since 21st or 22nd build

2014-01-23 Thread Markus Jelsma
22nd January 2014 18:56 To: solr-user solr-user@lucene.apache.org Subject: Re: AIOOBException on trunk since 21st or 22nd build Looking at the list of changes on the 21st and 22nd, I don’t see a smoking gun. - Mark On Jan 22, 2014, 11:13:26 AM, Markus Jelsma markus.jel

RE: AIOOBException on trunk since 21st or 22nd build

2014-01-23 Thread Markus Jelsma
On Jan 22, 2014, 11:13:26 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - this likely belongs to an existing open issue. We're seeing the stuff below on a build of the 22nd. Until just now we used builds of the 20th and didn't have the issue. This is either a bug or did some data

RE: Solr Related Search Suggestions

2014-01-28 Thread Markus Jelsma
Query Recommendations using Query Logs in Search Engines http://personales.dcc.uchile.cl/~churtado/clustwebLNCS.pdf Very interesting paper and section 2.1 covers related work plus references. In our first attempt we did it even simpler, by finding for each query other top queries by inspecting

Re: Solr Nutch

2014-01-28 Thread Markus Jelsma
Short answer, you can't. rashmi maheshwari maheshwari.ras...@gmail.com wrote: Thanks All for quick response. Today I crawled a webpage using nutch. This page have many links. But all anchor tags have href=# and javascript is written on onClick event of each anchor tag to open a new page. So

LUCENE-5388 AbstractMethodError

2014-01-29 Thread Markus Jelsma
Hi, We have a development environment running trunk but have custom analyzers and token filters built on 4.6.1. Now the constructors have changed somewhat and stuff breaks. Here's a consumer trying to get a TokenStream from an Analyzer object doing TokenStream stream =

RE: Sentence Detection for Highlighting

2014-02-04 Thread Markus Jelsma
Boundary scanner using Java's break iterator: http://wiki.apache.org/solr/HighlightingParameters#hl.boundaryScanner -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Tuesday 4th February 2014 12:03 To: solr-user@lucene.apache.org Subject: Sentence Detection for

RE: Inconsistency between Leader and replica in solr cloud

2014-02-24 Thread Markus Jelsma
Yes, that issue is fixed. We are on trunk and seeing it happen again. Kill some nodes when indexing, trigger OOM or reload the collection and you are in trouble again. -Original message- From:Yago Riveiro yago.rive...@gmail.com Sent: Monday 24th February 2014 14:54 To:

RE: How To Test SolrCloud Indexing Limits

2014-02-27 Thread Markus Jelsma
Something must be eating your memory in your solrcloud indexer in Nutch. We have our own SolrCloud indexer in Nutch and it uses extremely little memory. You either have a leak or your batch size is too large. -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent:

RE: Id As URL for Solrj

2014-03-04 Thread Markus Jelsma
You are not escaping the Lucene query parser special characters: + - && || ! ( ) { } [ ] ^ " ~ * ? : \ / -Original message- From:Furkan KAMACI furkankam...@gmail.com Sent: Tuesday 4th March 2014 16:57 To: solr-user@lucene.apache.org Subject: Id As URL for Solrj Hi; This maybe
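A minimal sketch of such escaping in plain Java, for when a URL is used as the id in a query. The class name and the example URL are hypothetical; the character set is assumed to match the classic query parser's list. In a real SolrJ client, `ClientUtils.escapeQueryChars` should do the same job.

```java
// Hypothetical helper that backslash-escapes query parser special characters.
final class QueryEscapeSketch {

    // Characters assumed to be special to the Lucene classic query parser.
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix every special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build an id query for a URL-valued key.
        System.out.println("id:" + escape("http://example.com/a?b=1"));
    }
}
```

Wrapping the value in double quotes is another option, but then any quote or backslash inside the value still needs escaping.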

RE: IDF maxDocs / numDocs

2014-03-12 Thread Markus Jelsma
Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in idfExplain but there's also a docCount(). We use docCount in all our custom similarities, also because it allows you to have multiple languages in one index where one is much larger than the other. The small language
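To illustrate why the choice between maxDoc and docCount matters, here is a small sketch using the classic TF-IDF formula idf = 1 + ln(numDocs / (docFreq + 1)). The class name and the document counts are made up for illustration: one large index in which only a small fraction of the documents has the field in question.

```java
// Hypothetical illustration of IDF computed against maxDoc versus docCount.
final class IdfSketch {

    // Classic TF-IDF style idf; assumed formula, not copied from any specific release.
    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    public static void main(String[] args) {
        long maxDoc = 1_000_000;  // all documents in the index (made-up number)
        long docCount = 10_000;   // documents that actually have the field (made-up number)
        long docFreq = 100;       // documents containing the term

        System.out.println("idf using maxDoc:   " + idf(docFreq, maxDoc));
        System.out.println("idf using docCount: " + idf(docFreq, docCount));
    }
}
```

With maxDoc, every term of the smaller language looks rare relative to the whole index, so its IDF is inflated; docCount scales the statistic to the documents that actually carry the field.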

RE: IDF maxDocs / numDocs

2014-03-13 Thread Markus Jelsma
for the number of docs as this won't change dramatically often.. steve On Wed, Mar 12, 2014 at 11:18 AM, Markus Jelsma markus.jel...@openindex.io wrote: Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in idfExplain but there's also a docCount(). We use docCount

Re: Bug with OpenJDK on Ubuntu - affects Solr users

2014-03-26 Thread Markus Jelsma
Hi - as far as i know it has never been a good idea to run Lucene on OpenJDK 6 at all. Only either Oracle Java 6 or higher or OpenJDK 7. On Wednesday, March 26, 2014 06:54:41 PM Nigel Sheridan-Smith wrote: Hi all, This is a bit of a 'heads up'. We have recently come across this bug on

Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with k1 set to zero in your schema. Walter Underwood wun...@wunderwood.org wrote: And here is another peculiarity of short text fields. The movie New York, New York should not be twice as relevant for the query new

Re: omitNorms and very short text fields

2014-04-01 Thread Markus Jelsma
Yes, that will work. And combined with your other question scores will always be equal even if cinderella or chuck occur more than once in one document. Walter Underwood wun...@wunderwood.org wrote: Just double-checking my understanding of omitNorms. For very short text fields like personal

Re: Re: tf and very short text fields

2014-04-01 Thread Markus Jelsma
Also, if i remember correctly, k1 set to zero for BM25 automatically omits norms in the calculation. So that's easy to play with without reindexing. Markus Jelsma markus.jel...@openindex.io wrote: Yes, override TFIDFSimilarity and emit 1f in tf(). You can also use BM25 with k1 set to zero
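The effect of k1 = 0 can be checked against the textbook BM25 term-frequency component, freq * (k1 + 1) / (freq + k1 * (1 - b + b * docLen / avgDocLen)). A minimal sketch follows, with hypothetical names and made-up lengths; this is the textbook formula, not a copy of Lucene's BM25Similarity.

```java
// Hypothetical illustration of the BM25 term-frequency component.
final class Bm25TfSketch {

    static double tfPart(double freq, double k1, double b, double docLen, double avgDocLen) {
        // Length normalization only enters through this k1-scaled term.
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return freq * (k1 + 1.0) / (freq + norm);
    }

    public static void main(String[] args) {
        // With k1 = 0, neither the term frequency nor the document length matters.
        System.out.println(tfPart(1, 0.0, 0.75, 3, 10));
        System.out.println(tfPart(2, 0.0, 0.75, 300, 10));
        // With a typical k1, a higher frequency scores higher.
        System.out.println(tfPart(1, 1.2, 0.75, 3, 10));
        System.out.println(tfPart(2, 1.2, 0.75, 3, 10));
    }
}
```

With k1 = 0 the expression collapses to freq / freq, so any matching document contributes only its IDF, regardless of how often the term occurs or how long the field is.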

Re: Re: solr 4.2.1 index gets slower over time

2014-04-01 Thread Markus Jelsma
You may want to increase reclaimDeletesWeight for TieredMergePolicy from 2 to 3 or 4. By default it may keep too many deleted or updated docs in the index. This can increase index size by 50%! Dmitry Kan solrexp...@gmail.com wrote: Elisabeth, Yes, I believe you are right in that the deletes

RE: tf and very short text fields

2014-04-04 Thread Markus Jelsma
On Apr 1, 2014, at 12:30 PM, Markus Jelsma markus.jel...@openindex.io wrote: Also, if i remember correctly, k1 set to zero for BM25 automatically omits norms in the calculation. So that's easy to play with without reindexing. Markus Jelsma markus.jel...@openindex.io wrote: Yes

RE: Strange relevance scoring

2014-04-08 Thread Markus Jelsma
Hi - the thing you describe is possible when your set up uses SpanFirstQuery. But to be sure what's going on you should post the debug output. -Original message- From:John Nielsen j...@mcb.dk Sent: Tuesday 8th April 2014 11:03 To: solr-user@lucene.apache.org Subject: Strange

Re: Fails to index if unique field has special characters

2014-04-11 Thread Markus Jelsma
Well, this is somewhat of a problem if you have URLs as uniqueKey that contain exclamation marks. Isn't it an idea to allow those to be escaped and thus ignored by CompositeIdRouter? On Friday, April 11, 2014 11:43:31 AM Cool Techi wrote: Thanks, that was helpful. Regards,Rohit

RE: Topology of Solr use

2014-04-17 Thread Markus Jelsma
This may help a bit: https://wiki.apache.org/solr/PublicServers   -Original message- From:Olivier Austina olivier.aust...@gmail.com Sent:Thu 17-04-2014 18:16 Subject:Topology of Solr use To:solr-user@lucene.apache.org; Hi All, I would to have an idea about Solr usage: number of users,

Re: Boost Search results

2014-04-18 Thread Markus Jelsma
Hi, replicating full-featured search engine behaviour is not going to work with nutch and solr out of the box. You are missing a thousand features such as proper main content extraction, deduplication, classification of content and hub or link pages, and much more. These things are possible to

Re: Re: PostingHighlighter complains about no offsets

2014-05-03 Thread Markus Jelsma
Hello michael, you are not on lucene 4.8? https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-5111 Michael Sokolov msoko...@safaribooksonline.com wrote: For posterity, in case anybody follows this thread, I tracked the problem down to WordDelimiterFilter; apparently it creates

RE: permissive mm value and efficient spellchecking

2014-05-14 Thread Markus Jelsma
Elisabeth, i think you are looking for SOLR-3211 that introduced spellcheck.collateParam.* to override e.g. dismax settings. Markus   -Original message- From:elisabeth benoit elisaelisael...@gmail.com Sent:Wed 14-05-2014 14:01 Subject:permissive mm value and efficient spellchecking

RE: Solr + SPDY

2014-05-15 Thread Markus Jelsma
Hi Harsh,   Does SPDY provide lower latency than HTTP/1.1 with KeepAlive or is it encryption that you're after?   Markus   -Original message- From:harspras prasadta...@outlook.com Sent:Tue 13-05-2014 05:38 Subject:Re: Solr + SPDY To:solr-user@lucene.apache.org; Hi Vinay, I have been

RE: Edismax should, should not, exact match operators

2014-06-10 Thread Markus Jelsma
http://wiki.apache.org/solr/ExtendedDisMax#Query_Syntax   -Original message- From:michael.boom my_sky...@yahoo.com Sent:Tue 10-06-2014 13:15 Subject:Edismax should, should not, exact match operators To:solr-user@lucene.apache.org; On google a user can query using operators like + or - and

RE: Recommended ZooKeeper topology in Production

2014-06-10 Thread Markus Jelsma
Yes, always use three or a higher odd number of machines. It is best to have them on dedicated machines and unless the cluster is very large three small VPS machines with 512 MB RAM suffice.   -Original message- From:Gili Nachum gilinac...@gmail.com Sent:Tue 10-06-2014 08:58
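The reason for an odd ensemble size is simple majority arithmetic: ZooKeeper needs more than half of the ensemble up to keep a quorum. A minimal sketch, with hypothetical class and method names:

```java
// Hypothetical illustration of majority-quorum arithmetic for an ensemble of n nodes.
final class QuorumSketch {

    // Smallest number of nodes that forms a strict majority.
    static int quorum(int n) {
        return n / 2 + 1;
    }

    // Number of nodes that may fail while a majority still remains.
    static int tolerated(int n) {
        return n - quorum(n);
    }

    public static void main(String[] args) {
        for (int n = 3; n <= 5; n++) {
            System.out.println(n + " nodes: quorum " + quorum(n)
                    + ", tolerated failures " + tolerated(n));
        }
    }
}
```

A fourth node raises the quorum to three without raising the number of tolerated failures, which is why going from three to four machines buys nothing.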

RE: docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Markus Jelsma
Hi - did you perhaps update one of those documents? -Original message- From:Apoorva Gaurav apoorva.gau...@myntra.com Sent: Tuesday 17th June 2014 16:58 To: solr-user@lucene.apache.org Subject: docFreq coming to be more than 1 for unique id field Hello All, We are using solr

RE: docFreq coming to be more than 1 for unique id field

2014-06-17 Thread Markus Jelsma
Yes, it is unique but they are not immediately purged, only when `optimized` or forceMerge or during regular segment merges. The problem is that they keep messing with the statistics. -Original message- From:Apoorva Gaurav apoorva.gau...@myntra.com Sent: Tuesday 17th June 2014 17:16

Re: Unable to start solr 4.8

2014-06-19 Thread Markus Jelsma
Hi - remove the lock file in your solr/collection_name/data/index.*/ directory. Markus On Thursday, June 19, 2014 04:10:51 AM atp wrote: Hi experts, i have configured solrcloud, on three machines , zookeeper started with no errors, tomcat log also no errors , solr log also no errors

RE: How much free disk space will I need to optimize my index

2014-06-25 Thread Markus Jelsma
-Original message- From:johnmu...@aol.com johnmu...@aol.com Sent: Wednesday 25th June 2014 20:13 To: solr-user@lucene.apache.org Subject: How much free disk space will I need to optimize my index Hi, I need to de-fragment my index. My question is, how much free disk

RE: unable to start solr instance

2014-06-30 Thread Markus Jelsma
(Too many open files) Try raising the limit from probably 1024 to 4k-16k or so. -Original message- From:Niklas Langvig niklas.lang...@globesoft.com Sent: Monday 30th June 2014 17:09 To: solr-user@lucene.apache.org Subject: unable to start solr instance Hello, We havet o solr

RE: NPE when using facets with the MLT handler.

2014-07-02 Thread Markus Jelsma
Hi, i don't think this is ever going to work with the MLT Handler, you should use the regular SearchHandler instead. -Original message- From:SafeJava T t...@safejava.com Sent: Monday 30th June 2014 17:52 To: solr-user@lucene.apache.org Subject: NPE when using facets with the MLT

RE: Memory Leaks in solr 4.8.1

2014-07-02 Thread Markus Jelsma
Hi, you can safely ignore this, it is shutting down anyway. Just don't reload the app a lot of times without actually restarting Tomcat. -Original message- From:Aman Tandon amantandon...@gmail.com Sent: Wednesday 2nd July 2014 7:22 To: solr-user@lucene.apache.org Subject: Memory

RE: Disable Regular Expression Support

2014-07-03 Thread Markus Jelsma
Hi, you can escape the surrounding slashes in your front-end. Markus -Original message- From:Markus Schuch markus_sch...@web.de Sent: Thursday 3rd July 2014 20:53 To: solr-user@lucene.apache.org Subject: Disable Regular Expression Support Hi Solr Community, we migrate from

RE: Any Solr consultants available??

2014-07-24 Thread Markus Jelsma
Hahaha thanks wunder, made me laugh! -Original message- From:Walter Underwood wun...@wunderwood.org Sent: Thursday 24th July 2014 2:07 To: solr-user@lucene.apache.org Subject: Re: Any Solr consultants available?? When I see job postings like this, I have to assume they were

RE: crawling all links of same domain in nutch in solr

2014-07-29 Thread Markus Jelsma
Hi - use the domain URL filter plugin and list the domains, hosts or TLD's you want to restrict the crawl to. -Original message- From:Vivekanand Ittigi vi...@biginfolabs.com Sent: Tuesday 29th July 2014 7:17 To: solr-user@lucene.apache.org Subject: crawling all links of same

RE: Solr substring search yields all indexed results

2014-08-04 Thread Markus Jelsma
Don't use N-grams at query time. -Original message- From:prem1980 prem1...@gmail.com Sent: Monday 4th August 2014 17:47 To: solr-user@lucene.apache.org Subject: Solr substring search yields all indexed results To do a substring search, I have added a new fieldType - Text with

RE: NGramTokenizer influence to length normalization?

2014-08-08 Thread Markus Jelsma
All tokens produced still have the same position as their initial position, so no. -Original message- From:Johannes Siegert johannes.sieg...@marktjagd.de Sent: Friday 8th August 2014 11:11 To: solr-user@lucene.apache.org Subject: NGramTokenizer influence to length

RE: Solr cloud performance degradation with billions of documents

2014-08-13 Thread Markus Jelsma
Hi - You are running mapred jobs on the same nodes as Solr runs right? The first thing i would think of is that your OS file buffer cache is abused. The mappers read all data, presumably residing on the same node. The mapper output and shuffling part would take place on the same node, only the

RE: Announcing Splainer -- Open Source Solr Sandbox

2014-08-27 Thread Markus Jelsma
Yeah, very cool. Since this is all just client side, how about integrating it in Solr's UI? Also, it seems to assume `id` is the ID field, which is not always true. -Original message- From:david.w.smi...@gmail.com david.w.smi...@gmail.com Sent: Friday 22nd August 2014 19:42 To:

RE: Query ReRanking question

2014-09-05 Thread Markus Jelsma
Hi - You can already achieve this by boosting on the document's recency. The result set won't be exactly ordered by date but you will get the most relevant and recent documents on top. Markus -Original message- From:Ravi Solr ravis...@gmail.com mailto:ravis...@gmail.com Sent:

RE: Problem deploying solr-4.10.0.war in Tomcat

2014-09-17 Thread Markus Jelsma
Yes, this is a nasty error. You have not set up logging libraries properly: https://cwiki.apache.org/confluence/display/solr/Configuring+Logging -Original message- From:phi...@free.fr phi...@free.fr Sent: Wednesday 17th September 2014 11:51 To: solr-user@lucene.apache.org Subject:

RE: How does KeywordRepeatFilterFactory help giving a higher score to an original term vs a stemmed term

2014-09-24 Thread Markus Jelsma
Hi - but this makes no sense, they are scored as equals, except for tiny differences in TF and IDF. What you would need is something like a stemmer that preserves the original token and gives a 1 payload to the stemmed token. The same goes for filters like decompounders and accent folders that

RE: Best practice for KStemFilter query or index or both?

2014-09-25 Thread Markus Jelsma
Hi - most filters should be used both sides, especially stemmers, accent foldings and obviously lowercasing. Synonyms only on one side, depending on how you want to utilize them. Markus -Original message- From:eShard zim...@yahoo.com Sent: Thursday 25th September 2014 22:23 To:

RE: Flexible search field analyser/tokenizer configuration

2014-09-29 Thread Markus Jelsma
Yes, it appeared in 4.8 but you could use PatternReplaceFilterFactory to simulate the same behavior. Markus -Original message- From:PeterKerk petervdk...@hotmail.com Sent: Monday 29th September 2014 21:08 To: solr-user@lucene.apache.org Subject: Re: Flexible search field

RE: Solr query field (qf) conditional boost

2014-09-29 Thread Markus Jelsma
Hi - you need to use function queries via the bf parameter. The function exists() and in some cases query() will do the conditional work, depending on your use case. Markus -Original message- From:Shamik Bandopadhyay sham...@gmail.com Sent: Monday 29th September 2014 21:30 To:

RE: Solr query field (qf) conditional boost

2014-09-29 Thread Markus Jelsma
Hi - check the def() and if() functions, they can have embedded functions such as exists() and query(). You can use those to apply the main query to the productline field if author has some value. I cannot give a concrete example because i don't have an environment to fiddle around with. If

RE: If I can a field from text_ws to text do I need to drop and reindex or just reindex?

2014-10-03 Thread Markus Jelsma
Hi - you don't need to erase the data directory, you can just reindex, but make sure you overwrite all documents. -Original message- From:Wayne W waynemailingli...@gmail.com Sent: Friday 3rd October 2014 11:55 To: solr-user@lucene.apache.org Subject: If I can a field from text_ws

RE: search query text field with Comma

2014-10-06 Thread Markus Jelsma
Hi - you are probably using the WhitespaceTokenizer without a WordDelimiterFilter. Consider using the StandardTokenizer or add the WordDelimiterFilter. Markus -Original message- From:EXTERNAL Taminidi Ravi (ETI, Automotive-Service-Solutions) external.ravi.tamin...@us.bosch.com

Re: Weird Problem (possible bug?) with german stemming and wildcard search

2014-10-07 Thread Markus Jelsma
Hi - you should not use wild cards for autocompletion, Lucene has far better tools for making very good autocompletion, also, since a wild card is a multi term query, they are not passed through your configured query time analyzer. Some other comments: - you use a porter stemmer but you should

RE: NullPointerException for ExternalFileField when key field has no terms

2014-10-08 Thread Markus Jelsma
Hi - yes it is worth a ticket as the javadoc says it is ok: http://lucene.apache.org/solr/4_10_1/solr-core/org/apache/solr/schema/ExternalFileField.html -Original message- From:Matthew Nigl matthew.n...@gmail.com Sent: Wednesday 8th October 2014 14:48 To: solr-user@lucene.apache.org

WhitespaceTokenizer to consider incorrectly encoded c2a0?

2014-10-08 Thread Markus Jelsma
Hi, For some crazy reason, some users somehow manage to substitute a perfectly normal space with a badly encoded non-breaking space, properly URL encoded this then becomes %C2%A0 and depending on the encoding you use to view you probably see Â followed by a space. For example: Because c2a0 is
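The c2a0 bytes are simply the UTF-8 encoding of U+00A0 (no-break space), and Java's `Character.isWhitespace` deliberately excludes that character, which as far as I know is the check WhitespaceTokenizer relies on. A minimal sketch in plain Java; the class name and the `normalizeSpaces` helper are hypothetical, and in Solr the replacement would more likely be done with a char filter at analysis time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of why a no-break space does not split tokens.
final class NbspSketch {

    // Hex dump of the UTF-8 bytes of a string.
    static String hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Replace no-break spaces with ordinary spaces before tokenizing.
    static String normalizeSpaces(String s) {
        return s.replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        System.out.println("UTF-8 bytes of NBSP: " + hex("\u00A0"));
        System.out.println("isWhitespace: " + Character.isWhitespace('\u00A0'));
        System.out.println("isSpaceChar:  " + Character.isSpaceChar('\u00A0'));
        System.out.println("normalized:   " + normalizeSpaces("new\u00A0york"));
    }
}
```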

RE: WhitespaceTokenizer to consider incorrectly encoded c2a0?

2014-10-08 Thread Markus Jelsma
/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On 8 October 2014 09:59, Markus Jelsma markus.jel...@openindex.io wrote: Hi, For some crazy reason, some users somehow manage

RE: does one need to reindex when changing similarity class

2014-10-09 Thread Markus Jelsma
Hi - no you don't have to, although maybe if you changed on how norms are encoded. Markus -Original message- From:elisabeth benoit elisaelisael...@gmail.com Sent: Thursday 9th October 2014 12:26 To: solr-user@lucene.apache.org Subject: does one need to reindex when changing

RE: per field similarity not working with solr 4.2.1

2014-10-09 Thread Markus Jelsma
Hi - it should work, not seeing your implementation in the debug output is a known issue. -Original message- From:elisabeth benoit elisaelisael...@gmail.com Sent: Thursday 9th October 2014 12:22 To: solr-user@lucene.apache.org Subject: per field similarity not working with solr

RE: per field similarity not working with solr 4.2.1

2014-10-09 Thread Markus Jelsma
with solr 4.2.1 Thanks for the information! I've been struggling with that debug output. Any other way to know for sure my similarity class is being used? Thanks again, Elisabeth 2014-10-09 13:03 GMT+02:00 Markus Jelsma markus.jel...@openindex.io: Hi - it should work, not seeing your

RE: does one need to reindex when changing similarity class

2014-10-13 Thread Markus Jelsma
participate in indexing. -- Jack Krupansky -Original Message- From: Markus Jelsma Sent: Thursday, October 9, 2014 6:59 AM To: solr-user@lucene.apache.org Subject: RE: does one need to reindex when changing similarity class Hi - no you don't have to, although maybe if you

Re: Recovering from Out of Mem

2014-10-14 Thread Markus Jelsma
And don't forget to set the proper permissions on the script, the tomcat or jetty user. Markus On Tuesday 14 October 2014 13:47:47 Boogie Shafer wrote: a really simple approach is to have the OOM generate an email e.g. 1) create a simple script (call it java_oom.sh) and drop it in your

Re: Recovering from Out of Mem

2014-10-14 Thread Markus Jelsma
This will do: kill -9 `ps aux | grep -v grep | grep tomcat6 | awk '{print $2}'`. pkill should also work. On Tuesday 14 October 2014 07:02:03 Yago Riveiro wrote: Boogie, Any example for java_error.sh script? — /Yago Riveiro On Tue, Oct 14, 2014 at 2:48 PM, Boogie Shafer

RE: update external file

2014-10-23 Thread Markus Jelsma
You either need to upload them and issue the reload command, or download them from the machine, and then issue the reload command. There is no REST support for it (yet) like the synonym filter, or was it stop filter? Markus -Original message- From:Michael Sokolov

RE: Stopwords in shingles suggester

2014-10-27 Thread Markus Jelsma
You do not want stopwords in your shingles? Then put the stopword filter on top of the shingle filter. Markus -Original message- From:O. Klein kl...@octoweb.nl Sent: Monday 27th October 2014 13:56 To: solr-user@lucene.apache.org Subject: Stopwords in shingles suggester Is there a

RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread Markus Jelsma
It is an ancient issue. One of the major contributors to the issue was resolved some versions ago but we are still seeing it sometimes too, there is nothing to see in the logs. We ignore it and just reindex. -Original message- From:S.L simpleliving...@gmail.com Sent: Monday 27th

RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread Markus Jelsma
-threaded indexing and SolrCloud 4.10.1 replicas out of synch. I'm curious, could you elaborate on the issue and the partial fix? Thanks! On 10/27/14 11:31, Markus Jelsma wrote: It is an ancient issue. One of the major contributors to the issue was resolved some versions ago but we

RE: Heavy Multi-threaded indexing and SolrCloud 4.10.1 replicas out of synch.

2014-10-27 Thread Markus Jelsma
goes to, because of huge amount of discrepancy between the replicas. Thank you for confirming that it is a known issue, I was thinking I was the only one facing this due to my set up. On Mon, Oct 27, 2014 at 11:31 AM, Markus Jelsma markus.jel...@openindex.io wrote: It is an ancient

Re: SolrCloud config question and zookeeper

2014-10-28 Thread Markus Jelsma
On Tuesday 28 October 2014 10:42:11 Bernd Fehling wrote: Thanks for the explanations. My idea about 4 zookeepers is a result of having the same software (java, zookeeper, solr, ...) installed on all 4 servers. But yes, I don't need to start a zookeeper on the 4th server. 3 other machines

RE: MoreLikeThis filter by score threshold

2015-02-03 Thread Markus Jelsma
Hi - sure you can, using the frange parser as a filter: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-FunctionRangeQueryParser http://lucene.apache.org/solr/4_10_3/solr-core/org/apache/solr/search/FunctionRangeQParserPlugin.html But this is very much not

RE: Score results by only the highest scoring term

2015-02-03 Thread Markus Jelsma
Either use the MaxScoreQueryParser [1] or set tie to zero when using a DisMax parser. [1]: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-MaxScoreQueryParser -Original message- From:Burgmans, Tom tom.burgm...@wolterskluwer.com Sent: Tuesday 3rd
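For reference, the DisMax combination rule is score = max + tie * (sum of the other clause scores). A minimal sketch of that arithmetic, with a hypothetical class name and made-up per-field scores:

```java
// Hypothetical illustration of how the DisMax tie parameter combines clause scores.
final class DisMaxTieSketch {

    static double score(double tie, double... clauseScores) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : clauseScores) {
            max = Math.max(max, s);
            sum += s;
        }
        // The best clause counts fully; the rest are scaled by the tie breaker.
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        System.out.println("tie=0.0: " + score(0.0, 2.0, 1.5, 0.5));
        System.out.println("tie=1.0: " + score(1.0, 2.0, 1.5, 0.5));
    }
}
```

With tie set to zero only the highest-scoring clause contributes; with tie set to one the result is the plain sum.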

RE: low qps with high load averages on solrcloud

2015-02-04 Thread Markus Jelsma
We recently upgraded our cloud from 4.8 to 4.10.3, the only config we updated was the luceneMatchVersion. Response times were very stable prior to the upgrade, but are quite erratic since the upgrade, and rising. I still have to check all the resolved issues but something went very wrong

RE: Lucene cosine similarity score for more like this query

2015-02-02 Thread Markus Jelsma
Hi - MoreLikeThis is not based on cosine similarity. The idea is that rare terms - high IDF - are extracted from the source document, and then used to build a regular Query(). That query follows the same rules as regular queries, the rules of your similarity implementation, which is TFIDF by

RE: Hit Highlighting and More Like This

2015-02-02 Thread Markus Jelsma
Hi - you can use the MLT query parser in Solr 5.0 or patch 4.10.x https://issues.apache.org/jira/browse/SOLR-6248 -Original message- From:Tim Hearn timseman...@gmail.com Sent: Saturday 31st January 2015 0:31 To: solr-user@lucene.apache.org Subject: Hit Highlighting and More Like

RE: Question regarding SolrIndexSearcher implementation

2015-02-02 Thread Markus Jelsma
From memory: there are different methods in SolrIndexSearcher for a reason. It has to do with paging and sorting. Whenever you sort on a simple field, you can easily start at a specific offset. The problem with sorting on score is that the score has to be calculated for all documents matching the query.

RE: More Like This similarity tuning

2015-02-04 Thread Markus Jelsma
Well, maxqt is easy, it is just the number of terms that compose your query. MinTF is a strange parameter, rare terms have a low DF and most usually not a high TF, so i would keep it at 1. MinDF is more useful, it depends entirely on the size of your corpus. If you have a lot of user-generated

RE: MoreLikeThis filter by score threshold

2015-02-04 Thread Markus Jelsma
similar documents by score threshold. Please correct me if I am wrong. Thank you very much. Regards. On Feb 3, 2015 7:00 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - sure you can, using the frange parser as a filter: https://cwiki.apache.org/confluence/display/solr

RE: MoreLikeThis filter by score threshold

2015-02-04 Thread Markus Jelsma
. In this case I think it is reasonable to filter similar documents by score threshold. Please correct me if I am wrong. Thank you very much. Regards. On Feb 3, 2015 7:00 PM, Markus Jelsma markus.jel...@openindex.io wrote: Hi - sure you can, using the frange parser as a filter: https

Re: OutOfMemoryError for PDF document upload into Solr

2015-01-16 Thread Markus Jelsma
Tika 1.6 has PDFBox 1.8.4, which has memory issues, eating excessive RAM! Either upgrade to Tika 1.7 (out now) or manually use the PDFBox 1.8.8 dependency. M. On Friday 16 January 2015 15:21:55 Charlie Hull wrote: On 16/01/2015 04:02, Dan Davis wrote: Why re-write all the document

RE: American /British Dictionary for solr-4.10.2

2015-02-12 Thread Markus Jelsma
There are no dictionaries that sum up all possible conjugations, using a heuristics based normalizer would be more appropriate. There are nevertheless some good sources to start: Contains lots of useful spelling issues, incl. british/american/canadian/australian http://grammarist.com/spelling

RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
We have seen an increase between 4.8.1 and 4.10. -Original message- From:Dmitry Kan solrexp...@gmail.com Sent: Tuesday 17th February 2015 11:06 To: solr-user@lucene.apache.org Subject: unusually high 4.10.2 vs 4.3.1 RAM consumption Hi, We are currently comparing the RAM

RE: unusually high 4.10.2 vs 4.3.1 RAM consumption

2015-02-17 Thread Markus Jelsma
17, 2015 at 12:12 PM, Markus Jelsma markus.jel...@openindex.io wrote: We have seen an increase between 4.8.1 and 4.10. -Original message- From:Dmitry Kan solrexp...@gmail.com Sent: Tuesday 17th February 2015 11:06 To: solr-user@lucene.apache.org Subject: unusually high

Distributed unit tests and SSL doesn't have a valid keystore

2015-01-12 Thread Markus Jelsma
Hi - in a small Maven project depending on Solr 4.10.3, unit tests that extend BaseDistributedSearchTestCase randomly fail with "SSL doesn't have a valid keystore", and leave a lot of zombie threads. We have a solrtest.keystore file lying around, but where to put it? Thanks, Markus

RE: Extending solr analysis in index time

2015-01-12 Thread Markus Jelsma
Hi - You mention having a list with important terms, then using payloads would be the most straightforward i suppose. You still need a custom similarity and custom query parser. Payloads work for us very well. M -Original message- From:Ahmet Arslan iori...@yahoo.com.INVALID Sent:

RE: Distributed unit tests and SSL doesn't have a valid keystore

2015-01-13 Thread Markus Jelsma
might know offhand. You might just want to use @SuppressSSL on the tests :) - Mark On Mon Jan 12 2015 at 8:45:11 AM Markus Jelsma markus.jel...@openindex.io wrote: Hi - in a small Maven project depending on Solr 4.10.3, running unit tests that extend BaseDistributedSearchTestCase randomly

RE: multiple patterns in solr.PatternTokenizerFactory

2015-02-09 Thread Markus Jelsma
You can split into all groups by specifying group=-1. -Original message- From:Nivedita nivedita.pa...@tcs.com Sent: Monday 9th February 2015 12:08 To: solr-user@lucene.apache.org Subject: multiple patterns in solr.PatternTokenizerFactory Can I give multiple patterns in
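To illustrate the difference between the two modes, here is a rough sketch using plain java.util.regex rather than Solr itself. With group=-1 the pattern acts as a delimiter, much like String.split; with group set to 0 or higher, each match contributes the given capture group as a token. Several delimiters can usually be folded into one pattern with a character class or alternation. The class and method names are made up for this example, and dropping empty pieces is a choice of the sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain regex illustration of the two modes; this is not the Solr tokenizer.
class PatternGroupSketch {

    // group = -1: the pattern is a delimiter, the text between matches becomes the tokens.
    static List<String> splitMode(String pattern, String input) {
        List<String> tokens = new ArrayList<>();
        for (String piece : Pattern.compile(pattern).split(input)) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // group >= 0: every match yields one token, taken from the given capture group.
    static List<String> groupMode(String pattern, int group, String input) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile(pattern).matcher(input);
        while (m.find()) {
            String token = m.group(group);
            if (token != null && !token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Two delimiters (comma and semicolon) combined in one character class.
        System.out.println(splitMode("[,;]\\s*", "a, b;c"));
        // Extract the quoted parts via capture group 1.
        System.out.println(groupMode("'([^']+)'", 1, "aaa 'bbb' 'ccc'"));
    }
}
```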

RE: Upgrading Solr 4.7.2 to 4.10.3

2015-02-10 Thread Markus Jelsma
Well, the CHANGES.txt is filled with just the right information you need :) -Original message- From:Elan Palani elan.pal...@kaybus.com Sent: Tuesday 10th February 2015 22:30 To: solr-user@lucene.apache.org Subject: Upgrading Solr 4.7.2 to 4.10.3 Team.. Planning to Upgrade

RE: Relevancy : Keyword stuffing

2015-03-16 Thread Markus Jelsma
Hello - setting (e)dismax' tie breaker to 0 or much lower than the default would `solve` this for now. Markus -Original message- From:Mihran Shahinian slowmih...@gmail.com Sent: Monday 16th March 2015 16:29 To: solr-user@lucene.apache.org Subject: Relevancy : Keyword stuffing Hi all,
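For reference, a small sketch of why the tie breaker matters here. A disjunction-max score is the best per-field score plus tie times the sum of the other field scores, so with tie=0 a document that repeats the keyword across many fields gains nothing beyond its best field. The per-field scores below are invented numbers and the class is only an illustration, not Solr code.

```java
// Illustrates the disjunction-max combination: max + tie * (sum of the other field scores).
class DisMaxTieSketch {

    // Assumes non-negative per-field scores.
    static double disMaxScore(double[] fieldScores, double tie) {
        double max = 0.0;
        double sum = 0.0;
        for (double s : fieldScores) {
            max = Math.max(max, s);
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        double[] stuffed = {2.0, 1.0, 0.5}; // hypothetical scores: keyword present in three fields
        System.out.println(disMaxScore(stuffed, 0.0)); // only the best field counts
        System.out.println(disMaxScore(stuffed, 1.0)); // plain sum over all fields
    }
}
```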

RE: Relevancy : Keyword stuffing

2015-03-16 Thread Markus Jelsma
Hello - Chris' suggestion is indeed a good one but it can be tricky to properly configure the parameters. Regarding position information, you can override dismax to have it use SpanFirstQuery. It allows for setting strict boundaries from the front of the document to a given position. You can

RE: Distributed IDF performance

2015-03-18 Thread Markus Jelsma
Anshum, Jack - don't any of you have a cluster at hand to get some real results on this? After testing the actual functionality for quite some time while the final patch was in development, we have not had the chance to work on performance tests. We are still on Solr 4.10 and have to port
