Re: SOLR 1.4 and Lucene 3.0.3 index problem

2011-02-01 Thread Peter Karich
Solr 1.4.x uses Lucene 2.9.x; you could try the trunk, which uses Lucene 3.0.3 and should be compatible, if I'm correct. Regards, Peter. I have the exact opposite problem where Luke won't even load the index but Solr starts fine. I believe there are major differences between the two indexes

Re: Search for social networking sites

2011-01-21 Thread Peter Karich
First, it's more Solandra now (although the project is still named Lucandra) ;) Second, it can help because data which is written to the index is immediately (configurable) available for search. Solandra is distributed + real time Solr, with no changes required on the client side (be it SolrJ or

Re: [POLL] Where do you get Lucene/Solr from? Maven? ASF Mirrors?

2011-01-18 Thread Peter Karich
Am 18.01.2011 22:33, schrieb Steven A Rowe: [] ASF Mirrors (linked in our release announcements or via the Lucene website) [x] Maven repository (whether you use Maven, Ant+Ivy, Buildr, etc.) [x] I/we build them from source via an SVN/Git checkout.

Re: verifying that an index contains ONLY utf-8

2011-01-13 Thread Peter Karich
take a look also into icu4j which is one of the contrib projects ... converting on the fly is not supported by Solr but should be relatively easy in Java. Also scanning is relatively simple (accept only a range). Detection too: http://www.mozilla.org/projects/intl/chardet.html We've created an
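
A minimal Java sketch of the 'scanning' idea (strict UTF-8 decoding; the class and method names are made up for illustration):

    import java.nio.ByteBuffer;
    import java.nio.charset.CharacterCodingException;
    import java.nio.charset.Charset;
    import java.nio.charset.CharsetDecoder;
    import java.nio.charset.CodingErrorAction;

    public class Utf8Check {
        // returns true if the bytes decode as strict UTF-8
        static boolean isValidUtf8(byte[] bytes) {
            CharsetDecoder dec = Charset.forName("UTF-8").newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT);
            try {
                dec.decode(ByteBuffer.wrap(bytes));
                return true;
            } catch (CharacterCodingException e) {
                return false;
            }
        }
    }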

Re: Input raw log file

2011-01-12 Thread Peter Karich
Dinesh, it will stay 'real time' even if you convert it. Converting should be done in the millisecond range, if at all measurable (e.g. if you apply streaming). Beware: to use the real-time features you'll need the latest trunk of Solr IMHO. I've done similar log-feeding stuff here (with code!):

Re: verifying that an index contains ONLY utf-8

2011-01-12 Thread Peter Karich
converting on the fly is not supported by Solr but should be relatively easy in Java. Also scanning is relatively simple (accept only a range). Detection too: http://www.mozilla.org/projects/intl/chardet.html We've created an index from a number of different documents that are supplied by third

Exciting Solr Use Cases

2011-01-12 Thread Peter Karich
Hi all! Would you mind writing about your Solr project if it has an uncommon approach or if it is somehow exciting? I would like to extend my list for a new blog post. Examples I have in mind at the moment are: loggly (real time + big index), solandra (nice solr + cassandra combination), haiti

Re: Luke for inspecting indexes on remote solr servers?

2011-01-04 Thread Peter Karich
Am 04.01.2011 21:43, schrieb Ahmet Arslan: Is that supported? Pointer(s) to how to do it? perhaps http://wiki.apache.org/solr/LukeRequestHandler ? or via ssh u...@host -X ;-)
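
For reference, a typical LukeRequestHandler call looks like this (host, port and core path are placeholders):

    http://remotehost:8080/solr/admin/luke?numTerms=10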

Re: Removing deleted terms from spellchecker index

2010-12-29 Thread Peter Karich
how did you remove the term? In the spellcheck file? Did you rebuild the spellcheck index? Regards, Peter. Hi, I have configured spellchecker in solrconfig.xml and it is working fine for existing terms. However, if I delete a term, it is still being returned as a suggestion from the
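
A rebuild can be triggered with a one-off request like the following (handler path and dictionary setup depend on your solrconfig.xml):

    http://localhost:8983/solr/select?q=*:*&spellcheck=true&spellcheck.build=true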

Re: White space in facet values

2010-12-22 Thread Peter Karich
you should try fq=Product:Electric Guitar How do I handle facet values that contain whitespace? Say I have a field Product that I want to facet on. A value for Product could be Electric Guitar. How should I handle the white space in Electric Guitar during indexing? What about when I
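
Note that an unquoted value with whitespace would be parsed as two clauses; assuming Product is a non-tokenized string field, the filter query should be quoted:

    fq=Product:"Electric Guitar"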

Re: solr equiv of : SELECT count(distinct(field)) FROM index WHERE length(field) > 0 AND other_criteria

2010-12-22 Thread Peter Karich
facet=true&facet.field=field // SELECT count(distinct(field)) fq=field:[* TO *] // WHERE length(field) > 0 q=other_criteriaA&fq=other_criteriaB // AND other_criteria advantage: you can look into several fields at one time when adding another facet.field disadvantage: you get the counts split by

Re: Solr (and mabye Java?) version numbering systems

2010-12-17 Thread Peter Karich
the current stable release is 1.4.1 (before there was 1.4) it has nothing to do with Java's version numbers! (own release cycle) the next release will be 3.x: https://svn.apache.org/repos/asf/lucene/dev/branches/branch_3x/ and then 4.x (current trunk):

Re: Rebuild Spellchecker based on cron expression

2010-12-13 Thread Peter Karich
Building on optimize is not possible as index optimization is done on the master and the slaves don't even run an optimize but only fetch the optimized index. isn't the spellcheck index replicated to the slaves too? -- http://jetwick.com open twitter search

Re: Solr JVM performance issue after 2 days

2010-12-07 Thread Peter Karich
Hi Hamid, try to avoid autowarming when indexing (see solrconfig.xml: the caches' autowarmCount + newSearcher + maxWarmingSearchers). If you need to query and index at the same time, then probably you'll need one read-only core and one for writing with no autowarming configured. See:
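
A sketch of the solrconfig.xml settings meant here (the cache sizes and counts are only example values):

    <maxWarmingSearchers>2</maxWarmingSearchers>
    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>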

Re: Solr JVM performance issue after 2 days

2010-12-07 Thread Peter Karich
Am 07.12.2010 13:01, schrieb Hamid Vahedi: Hi Peter Thanks a lot for reply. Actually I need real time indexing and query at the same time. Here told: You can run multiple Solr instances in separate JVMs, with both having their solr.xml configured to use the same index folder. Now Q1: I'm

Re: How to get all the search results?

2010-12-06 Thread Peter Karich
for dismax just pass an empty query (q=) or none at all Hello, shouldn't that query syntax be *:* ? Regards, -- Savvas. On 6 December 2010 16:10, Solr User solr...@gmail.com wrote: Hi, First off thanks to the group for guiding me to move from default search handler to dismax. I have a
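
With the dismax handler *:* is not interpreted as a match-all query; the usual trick is to leave q empty and set q.alt instead. A sketch, assuming the default select handler:

    /select?defType=dismax&q.alt=*:*&rows=10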

Re: Taxonomy and Faceting

2010-12-06 Thread Peter Karich
I'm unsure but maybe you mean something like clustering? Then Carrot2 can do this (at index time I think): http://search.carrot2.org/stable/search?query=jetwick&view=visu (There is a plugin for solr) Or do you already know the categories of your docs? E.g. you already have a category tree and

Re: Solr Got Exceptions When schema.xml is Changed

2010-12-04 Thread Peter Karich
QueryElevationComponent requires the schema to have a uniqueKeyField implemented using StrField. You should use the type StrField ('string') for the field used in uniqueKeyField
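
A schema.xml sketch of what is meant (the field name 'id' is just an example):

    <field name="id" type="string" indexed="true" stored="true" required="true"/>
    ...
    <uniqueKey>id</uniqueKey>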

Re: Restrict access to localhost

2010-12-02 Thread Peter Karich
for 1) use the tomcat configuration in conf/server.xml: <Connector address="127.0.0.1" port="8080" ... for 2) if they have direct access to solr either insert a middleware layer or create a write lock ;-) Hello all, 1) I want to restrict access to Solr only in localhost. How to achieve that? 2)

Re: distributed architecture

2010-12-01 Thread Peter Karich
Hi, also take a look at solandra: https://github.com/tjake/Lucandra/tree/solandra I don't have it in prod yet but regarding administration overhead it looks very promising. And you'll get some other neat features like (soft) real time for free. So it's the same as A) + C) + X) - Y) ;-)

Re: entire farm fails at the same time with OOM issues

2010-12-01 Thread Peter Karich
also try to minimize maxWarmingSearchers to 1(?) or 2. And decrease cache usage (especially autowarming) if possible at all. But again: only if it doesn't affect performance ... Regards, Peter. On Tue, Nov 30, 2010 at 6:04 PM, Robert Petersen rober...@buy.com wrote: My question is this.

Re: SOLR for Log analysis feasibility

2010-11-30 Thread Peter Karich
take a look into this: http://vimeo.com/16102543 for that amount of data it isn't that easy :-) We are looking into building a reporting feature and investigating solutions which will allow us to search though our logs for downloads, searches and view history. Each log item is relatively

Re: How to generate tag cloud in SOLR?

2010-11-23 Thread Peter Karich
Hi, another way is to use facets for the tagcloud as we did it in jetwick. Every document then needs a tag field (multivalued). See: https://github.com/karussell/Jetwick/blob/master/src/main/java/de/jetwick/ui/TagCloudPanel.java for an example with wicket and SolrJ. With that you could also
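
A sketch of the facet request behind such a tag cloud (the field name 'tag' is an assumption taken from the description):

    /select?q=*:*&rows=0&facet=true&facet.field=tag&facet.limit=50&facet.mincount=1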

Jetwick Twitter Search now Open Source

2010-11-22 Thread Peter Karich
Jetwick is now available under the Apache 2 license: http://www.pannous.info/2010/11/jetwick-is-now-open-source/ Regards, Peter. PS: features http://www.pannous.info/products/jetwick-twitter-search/ installation https://github.com/karussell/Jetwick/wiki for devs

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-19 Thread Peter Karich
Hi, the final solution is explained here in context: http://mail-archives.apache.org/mod_mbox/lucene-dev/201011.mbox/%3caanlktimatgvplph_mgfbsughdoedc8tc2brrwxhid...@mail.gmail.com%3e If you are using Solr branch_3x or trunk, you can turn this off by setting autoGeneratePhraseQueries to
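
On branch_3x/trunk the switch sits on the field type, roughly like this (the type name is an example):

    <fieldType name="text" class="solr.TextField" positionIncrementGap="100"
               autoGeneratePhraseQueries="false">
      ...
    </fieldType>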

Re: Spell-Check Component Functionality

2010-11-18 Thread Peter Karich
Hi Rajani, some notes: * try spellcheck.q=curst or completely without spellcheck.q but with q * compared to the normal q parameter spellcheck.q can have a different analyzer/tokenizer and is used if present * do not do spellcheck.build=true for every request (creating the spellcheck index

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich
Hi Peter! * I believe the NRT patches are included in the 4.x trunk. I don't think there's any support as yet in 3x (uses features in Lucene 3.0). I'll investigate how much effort it is to update to solr4 * For merging, I'm talking about commits/writes. If you merge while commits are going

WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, I am going crazy but which config is necessary to include the missing doc 2? I have: doc1 tw:aBc doc2 tw:abc Now a query aBc returns only doc 1 although when I try doc2 from admin/analysis.jsp then the term text 'abc' of the index gets highlighted as intended. I even indexed a simple

Re: Possibilities of (near) real time search with solr

2010-11-18 Thread Peter Karich
Does yours need to be once a day? no, I only thought you use one day :-) so you don't or do you have 31 shards? having a look at Solr Cloud or Katta - could be useful here in dynamically allocating shards. ah, thx! I will take a look at it (after trying solr4)! Regards, Peter.

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Hi, Please add preserveOriginal=1 to your WDF [1] definition and reindex (or just try with the analysis page). but it is already there!? <filter class="solr.WordDelimiterFilterFactory" protected="protwords.txt" generateWordParts="1" generateNumberParts="1" catenateAll="0"

Re: WordDelimiterFilterFactory + CamelCase query

2010-11-18 Thread Peter Karich
Peter, I recently had this issue, and I had to set splitOnCaseChange=0 to keep the word delimiter filter from doing what you describe. Can you try that and see if it helps? - Ken Hi Ken, yes this would solve my problem, but then I would lose a match for 'SuperMario' if I query 'mario',

Re: sort desc and out of memory exception

2010-11-17 Thread Peter Karich
You are applying the sort against a (tokenized) text field? You should rather sort against a number or a string field, probably using the copyField directive. Regards, Peter. hi all: I configure a solr application and there is a field of type text, and some kind like this 123456, that is a
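
A sketch of the copyField approach (field names are placeholders):

    <field name="code"      type="text"   indexed="true" stored="true"/>
    <field name="code_sort" type="string" indexed="true" stored="false"/>
    <copyField source="code" dest="code_sort"/>

and then sort with sort=code_sort asc instead of sorting on the tokenized field.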

Re: Solr context search

2010-11-17 Thread Peter Karich
take a look if the 'more like this' handler can solve your problem. Hi. I wonder is it possible in built-in way to make context search in Solr? I have about 50k documents (mainly 'name' of char(150)), so i receive a content of a page and should show found documents. Of course i
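
If the MoreLikeThis handler is registered at /mlt in solrconfig.xml, a request could look like this (field names and values are placeholders):

    /mlt?q=id:12345&mlt.fl=name&mlt.mintf=1&mlt.mindf=1&rows=10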

Re: Possibilities of (near) real time search with solr

2010-11-16 Thread Peter Karich
Hi Peter, thanks for your response. I will dig into the sharding stuff asap :-) This may have changed recently, but the NRT stuff - e.g. per-segment commits etc. is for the latest Solr 4 trunk only. Do I need to turn something 'on'? Or do you know wether the NRT patches are documented

Possibilities of (near) real time search with solr

2010-11-15 Thread Peter Karich
Hi, I wanted to provide my indexed docs (tweets) relatively fast: so 1 to 10 sec or even 30 sec would be ok. At the moment I am using the read-only core scenario described here (point 5)* with a commit frequency of 180 seconds, which was fine until some days ago. (I am using solr1.4.1) Now the

Re: Tuning Solr caches with high commit rates (NRT)

2010-11-15 Thread Peter Karich
Just in case someone is interested: I put the emails of Peter Sturge with some minor edits in the wiki: http://wiki.apache.org/solr/NearRealtimeSearchTuning I found myself searching the thread again and again ;-) Feel free to add and edit content! Regards, Peter. Hi Erik, I thought this

Re: Tuning Solr caches with high commit rates (NRT)

2010-11-15 Thread Peter Karich
in 1.4, and generally no longer takes a lot of memory -- for facets with many unique values, method fc in fact should take less than enum, I think? Peter Karich wrote: Just in case someone is interested: I put the emails of Peter Sturge with some minor edits in the wiki: http

Re: Tuning Solr caches with high commit rates (NRT)

2010-11-15 Thread Peter Karich
=enum is still valid in Solr 1.4+. The fc facet.method was changed significantly in 1.4, and generally no longer takes a lot of memory -- for facets with many unique values, method fc in fact should take less than enum, I think? Peter Karich wrote: Just in case someone is interested: I put

Re: How to Facet on a price range

2010-11-05 Thread Peter Karich
take a look here http://stackoverflow.com/questions/33956/how-to-get-facet-ranges-in-solr-results I am able to facet on a particular field because I have index on that field. But I am not sure how to facet on a price range when I have the exact price in the 'price' field. Can anyone help
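
In Solr 1.4 this is usually done with facet.query ranges (field name and bounds are examples; later versions add a dedicated range faceting parameter):

    /select?q=*:*&rows=0&facet=true
       &facet.query=price:[* TO 100]
       &facet.query=price:[100 TO 200]
       &facet.query=price:[200 TO *]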

Re: Using setStart in solrj

2010-11-04 Thread Peter Karich
Hi Ron, how do I know what the starting row is? Always 0. especially if the original SolrQuery object has them all? That's the point: solr will normally cache it for you. This is your friend: <queryResultWindowSize>40</queryResultWindowSize> <!-- Maximum number of documents to cache for any entry
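
For completeness, paging with SolrJ only needs start and rows per page; a sketch (URL and query are placeholders):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery query = new SolrQuery("some words");
    query.setStart(40);  // offset of the first row to return (0-based)
    query.setRows(20);   // page size
    QueryResponse rsp = server.query(query);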

Re: Optimize Index

2010-11-04 Thread Peter Karich
what you can try: maxSegments=2 or more, as a 'partial' optimize. If the index is so large that optimizes are taking longer than desired or using more disk space during optimization than you can spare, consider adding the maxSegments parameter to the optimize command. In the XML message, this
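
i.e. an update message like the following posted to /update (the value 2 is just an example):

    <optimize maxSegments="2"/>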

Re: Testing/packaging question

2010-11-04 Thread Peter Karich
Hi, don't know if the python package provides one but solrj offers to start solr embedded (EmbeddedSolrServer) and setting up different schema + config is possible. for this see: https://karussell.wordpress.com/2010/06/10/how-to-test-apache-solrj/ if you need an 'external solr' (via jetty
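
A minimal SolrJ sketch of the embedded setup in the 1.4-era style (the solr home path is a placeholder, exception handling omitted):

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    System.setProperty("solr.solr.home", "src/test/resources/solr"); // placeholder path
    CoreContainer.Initializer initializer = new CoreContainer.Initializer();
    CoreContainer container = initializer.initialize();
    EmbeddedSolrServer server = new EmbeddedSolrServer(container, ""); // "" = default core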

Re: Which is faster -- delete or update?

2010-11-01 Thread Peter Karich
From the user perspective I wouldn't delete it, because the down-vote could be a mistake or spam or something, and up-voting can resurrect it. It could also be wise to keep the docs to see which content (from which users?) is down-voted, to spot spam accounts. From the dev perspective

Re: problem of solr replcation's speed

2010-10-31 Thread Peter Karich
we have an identical-sized index and it takes ~5 minutes It takes about one hour to replicate a 6G index for solr in my env. But my network can transfer files at about 10-20M/s using scp. So solr's http replication is too slow; is that normal or am I doing something wrong?

Feeding Solr with its own Logs

2010-10-27 Thread Peter Karich
In case someone is interested: http://karussell.wordpress.com/2010/10/27/feeding-solr-with-its-own-logs/ a lot of TODOs but: it is working. I could also imagine that this kind of example would be suited for an intro-tutorial, because it covers dynamic fields, rapid solr prototyping, filter and

Re: command line to check if Solr is up running

2010-10-26 Thread Peter Karich
Hi Xin, from the wiki: http://wiki.apache.org/solr/SolrConfigXml The URL of the ping query is /admin/ping You can also check (via wget) the number of documents. it might look like a rusty hack but it works for me: wget -T 1 -q "http://localhost:8080/solr/select?q=*:*" -O - | tr '/'
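
The ping check itself can be as simple as the following (path and port are assumptions):

    wget -q "http://localhost:8080/solr/admin/ping" -O -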

Re: Does Solr reload schema.xml dynamically?

2010-10-26 Thread Peter Karich
Hi, See this: http://wiki.apache.org/solr/CoreAdmin#RELOAD Solr will also load the new configuration (without restarting the webapp) on the slaves when using replication: http://wiki.apache.org/solr/SolrReplication Regards, Peter. Hi Everybody, If I change my schema.xml, do I have to
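
i.e. a request like this against the CoreAdmin handler (core name and port are placeholders):

    http://localhost:8983/solr/admin/cores?action=RELOAD&core=core0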

After java replication: field not found exception on slaves

2010-10-26 Thread Peter Karich
Hi, we had the following problem: we added a field to schema.xml and fed our master with the new data. After that, querying on the master is fine. But when we replicated (solr1.4.0) to our slaves, all slaves said they cannot find the new field (the standard exception for missing fields). And that

Re: how can i use solrj binary format for indexing?

2010-10-18 Thread Peter Karich
Hi, you can try to parse the xml via Java yourself and then push the SolrInputDocuments via SolrJ to solr. Setting the format to binary + using the streaming update processor should improve performance, but I am not sure... and performant (+less mem!) reading of xml in Java is another topic ... ;-)
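
A sketch of the SolrJ side using the streaming update server with the javabin request writer instead of XML (URL, queue size and thread count are placeholders):

    import org.apache.solr.client.solrj.impl.BinaryRequestWriter;
    import org.apache.solr.client.solrj.impl.StreamingUpdateSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    StreamingUpdateSolrServer server =
        new StreamingUpdateSolrServer("http://localhost:8983/solr", 100, 4);
    server.setRequestWriter(new BinaryRequestWriter()); // send docs as javabin, not XML

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", "1");
    doc.addField("text", "hello");
    server.add(doc);
    server.commit();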

Re: API for using Multi cores with SolrJ

2010-10-18 Thread Peter Karich
I asked this myself ... here could be some pointers: http://lucene.472066.n3.nabble.com/SolrJ-and-Multi-Core-Set-up-td1411235.html http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-in-Single-Core-td475238.html Hi everyone, I'm trying to write some code for creating and using multi cores.

Re: weighted facets

2010-10-15 Thread Peter Karich
Hi, answering my own question(s). Result grouping could be the solution as I explained here: https://issues.apache.org/jira/browse/SOLR-385 http://www.cs.cmu.edu/~ddash/papers/facets-cikm.pdf (the file is dated to Aug 2008) yonik implemented this here:

Re: Upgrade to Solr 1.4, very slow at start up when loading all cores

2010-10-14 Thread Peter Karich
just a blind shot (didn't read the full thread): what is your maxWarmingSearchers setting? For large indices we set it to 2 (maximum). Regards, Peter. just update on this issue... we turned off the new/first searchers (upgrade to Solr 1.4.1), and ran benchmark tests, there is no noticeable

Re: NPE for a MLT query on a missing doc due to null facet_counts in solrj

2010-10-13 Thread Peter Karich
Should I create a JIRA ticket? already there: https://issues.apache.org/jira/browse/SOLR-2005 we should provide a patch though ... Regards, Peter. With solrj doing a more like this query for a missing document: /mlt?q=docId:SomeMissingId always throws a null pointer exception: Caused

Re: About setting solrconfig.xml

2010-10-13 Thread Peter Karich
Hi Jason, Hi, all. I got some question about solrconfig.xml. I have 10 fields in a document for index. (Suppose that field names are f1, f2, ... , f10.) Some user will want to search in field f1 and f5. Another user will want to search in field f2, f3 and f7. I am going to use dismax

Re: using score to find high confidence duplicates

2010-10-13 Thread Peter Karich
Hi, are you using moreLikeThis for that feature? I have no suggestion for a reliable threshold; I think this depends on the domain you are operating in and is IMO only solvable with a heuristic. It also depends on fields, boosts, ... It could be that there is a 'score gap' between duplicates and

Re: StatsComponent and multi-valued fields

2010-10-12 Thread Peter Karich
I'm not sure ... just reading it yesterday night ... but isn't the unapplied patch from Harish https://issues.apache.org/jira/secure/attachment/12400054/SOLR-680.patch what you want? Regards, Peter. Running 1.4.1. I'm able to execute stats queries against multi-valued fields, but when given

Re: Replication and CPU

2010-10-12 Thread Peter Karich
Hi Olivier, maybe the slave replicates after startup? check replication status here: http://localhost/solr/admin/replication/index.jsp what is your poll frequency (could you paste the replication part)? Regards, Peter. Hello, I setup a server for the replication of Solr. I used 2 cores and
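
For reference, the slave side of the replication handler with the poll interval looks roughly like this (master URL and interval are examples):

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="slave">
        <str name="masterUrl">http://master:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>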

Re: Replication and CPU

2010-10-12 Thread Peter Karich
Hi Olivier, the index size is relatively big and you enabled replication after startup: <str name="replicateAfter">startup</str> This could explain why the slave is replicating from the very beginning. Are the index versions/generations the same? (via command or admin/replication) If not, the slaves

weighted facets

2010-10-11 Thread Peter Karich
Hi, I need a feature which is well explained by Mr Goll at this site ** So, it then would be nice to do sth. like: facet.stats=sum(fieldX)&facet.stats.sort=fieldX And the output (sorted against the sum output) can look sth. like this: <lst name="facet_counts"> <lst name="facet_fields"> <lst

Re: multi level faceting

2010-10-09 Thread Peter Karich
Hi, there are two relatively similar solutions for this problem. I will describe one of them: * create a multivalued string field called 'category' * you have a category tree, so make sure a document gets not only the leaf category, but all categories (name or id) up to the root * now facet
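
A sketch of the indexing side (field and category names are made up): a document in 'electronics/cameras/dslr' would get all path prefixes as values of the multivalued 'category' field, and drill-down is then a plain filter query:

    category: electronics
    category: electronics/cameras
    category: electronics/cameras/dslr

    /select?q=*:*&facet=true&facet.field=category&fq=category:"electronics/cameras"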

Re: multi level faceting

2010-10-06 Thread Peter Karich
Hi, there is a solution without the patch. Here it should be explained: http://www.lucidimagination.com/blog/2010/08/11/stumped-with-solr-chris-hostetter-of-lucene-pmc-at-lucene-revolution/ If not, I will do so on 9.10.2010 ;-) Regards, Peter. I've a similar problem with a project I'm working on

Re: multi level faceting

2010-10-05 Thread Peter Karich
also take a look at: http://wiki.apache.org/solr/HierarchicalFaceting + SOLR-64, SOLR-792 + http://markmail.org/message/jxbw2m5a6zq5jhlp Regards, Peter. Take a look at Mastering the Power of Faceted Search with Chris Hostetter (http://www.lucidimagination.com/solutions/webcasts/faceting). I

Re: Best way to check Solr index for completeness

2010-09-29 Thread Peter Karich
How long does it take to get 1000 docs? Why not ensure this while indexing? I think besides your suggestion or the suggestion of Luke there is no other way... Regards, Peter. Hello, What would be the best way to check Solr index against original system (Database) to make sure index is up to

Re: Autocomplete: match words anywhere in the token

2010-09-24 Thread Peter Karich
Jonathan, this field described by Chantal: 2.) create an additional field that uses the String type with the same content (use copyField to fill either) can be multivalued. Or what did you mean? BTW: The nice thing about facet.prefix is that you can add an arbitrary (filter)
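
The facet.prefix idea in request form (field names and the filter are placeholders):

    /select?q=*:*&rows=0&facet=true&facet.field=suggest&facet.prefix=so&facet.limit=10&fq=user:peter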

Re: Help: java.lang.OutOfMemoryError: PermGen space

2010-09-20 Thread Peter Karich
see http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error and the links there. There seems to be no good solution :-/ The only reliable solution is a restart before you run out of PermGen space (use jvisualvm to monitor). And try to increase

Re: Full text search in facet scope

2010-09-16 Thread Peter Karich
Hi, if you index your doc with text='operating system' with an additional keyword field='linux' (of type string, can be multivalued) then solr faceting should be what you want: solr/select?q=*:*&facet=true&facet.field=keyword&rows=10 or rows=0 depending on your needs Does this help? Regards,

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-14 Thread Peter Karich
, as the RO instance is simply another shard in the pack. On Sun, Sep 12, 2010 at 8:46 PM, Peter Karich peat...@yahoo.de wrote: Peter, thanks a lot for your in-depth explanations! Your findings will be definitely helpful for my next performance improvement tests :-) Two questions: 1. How

Re: Tuning Solr caches with high commit rates (NRT)

2010-09-12 Thread Peter Karich
Peter, thanks a lot for your in-depth explanations! Your findings will be definitely helpful for my next performance improvement tests :-) Two questions: 1. How would I do that: or a local read-only instance that reads the same core as the indexing instance (for the latter, you'll need

Re: Autocomplete with Filter Query

2010-09-10 Thread Peter Karich
Hi there, I don't know if my idea is perfect but it seems to work ok in my twitter-search prototype: http://www.jetwick.com (keep in mind it is a vhost and only one fat index, no sharding, etc... so performance isn't perfect ;-)) That said, type in 'so' and you will get 'soldier', 'solar', ...

Re: How to enable Unicode Support in Solr

2010-09-06 Thread Peter Karich
Hi, Solr is only able to handle unicode (UTF-8). Make really sure that you push it into the index in the correct encoding. See my (accepted ;-)) answer: http://stackoverflow.com/questions/3086367/how-to-view-the-xml-documents-sent-to-solr/3088515#3088515 Regards, Peter. I have an index that

Re: Purpose of SolrDocument.java

2010-09-03 Thread Peter Karich
aaah okay. so SolrDocument is never used in normal search? it's only for other solr plugins? SolrDocument is under org.apache.solr.common which is for the solr-solrj.jar and not available for the solr-core.jar see e.g.:

Re: java.lang.OutOfMemoryError: PermGen space when reopening solr server

2010-09-02 Thread Peter Karich
Hi, that issue is not really related to solr. See this: http://stackoverflow.com/questions/88235/how-to-deal-with-java-lang-outofmemoryerror-permgen-space-error Increasing the max perm size (-XX:MaxPermSize=128m) does not really solve this issue but you will see fewer errors :-) I have written a mini

Re: solr working...

2010-08-26 Thread Peter Karich
Hi! What do you mean? You want a quickstart? Then see http://lucene.apache.org/solr/tutorial.html (But I thought you already got solr working (from previous threads)!?) Or do you want to know if solr is running? Then try the admin view: http://localhost:8080/solr/admin/ Regards, Peter. Hi

Re: solr

2010-08-21 Thread Peter Karich
Hi Ankita, first: thanks for trying apache solr. does all the data to be indexed have to be in the exampledocs folder? No. And there are several ways to push data into solr: via indexing, dataimporthandler, solrj, ... I know that getting comfortable with a new project is a bit complicated at

Re: queryResultCache has no hits for date boost function

2010-08-18 Thread Peter Karich
Thanks a lot Yonik! Rounding makes sense. Is there a date math for the 'LAST_COMMIT'? Peter. On Tue, Aug 17, 2010 at 6:29 PM, Peter Karich peat...@yahoo.de wrote: my queryResultCache has no hits. But if I am removing one line from the bf section in my dismax handler all is fine. Here
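
'Rounding' here means coarsening NOW in the boost function so the function (and thus the cache key) stays constant for a while, e.g. a sketch based on the SolrRelevancyFAQ formula:

    recip(ms(NOW/HOUR,date),3.16e-11,1,1)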

Re: queryResultCache has no hits for date boost function

2010-08-18 Thread Peter Karich
Hi Yonik, would you point me to the Java classes where solr handles a commit or an optimize and then the date math definitions? Regards, Peter. On Wed, Aug 18, 2010 at 4:34 PM, Peter Karich peat...@yahoo.de wrote: Thanks a lot Yonik! Rounding makes sense. Is there a date math

Re: queryResultCache has no hits for date boost function

2010-08-18 Thread Peter Karich
forgot to say: thanks again! Now the cache gets hits! Regards, Peter. On Wed, Aug 18, 2010 at 4:34 PM, Peter Karich peat...@yahoo.de wrote: Thanks a lot Yonik! Rounding makes sense. Is there a date math for the 'LAST_COMMIT'? No - but it's an interesting idea! -Yonik http

Re: OutOfMemoryErrors

2010-08-17 Thread Peter Karich
Is there a way to verify that I have added it correctly? on linux you can do ps -elf | grep Boot and see if the java command has the parameters added. @all: why and when do you get those OOMs? while querying? which queries in detail? Regards, Peter.

Re: Search document design problem

2010-08-17 Thread Peter Karich
Hi Wenca, I am not sure whether my information here is really helpful for you, sorry if not ;-) I want only hotels that have a room with 2 beds and the room has a package with all-inclusive boarding and a price lower than 400. you should tell us what you want to search and what you want to filter. Do you want only

Re: OutOfMemoryErrors

2010-08-17 Thread Peter Karich
is just 5-6 GB yet that particular error is seldom observed... (SEVERE ERROR : JAVA HEAP SPACE , OUT OF MEMORY ERROR ) I could see one lock file generated in the data/index path just after this error. On Tue, Aug 17, 2010 at 4:49 PM, Peter Karich peat...@yahoo.de wrote

Re: Search document design problem

2010-08-17 Thread Peter Karich
. I am new to Solr so excuse me if I don't use the right terminology yet, but I hope that my description of the use case is quite clear now. ;-) Thanks Wenca On 17.8.2010 13:46, Peter Karich wrote: Hi Wenca, I am not sure whether my information here is really helpful for you, sorry

queryResultCache has no hits for date boost function

2010-08-17 Thread Peter Karich
Hi all, my queryResultCache has no hits. But if I am removing one line from the bf section in my dismax handler all is fine. Here is the line: recip(ms(NOW,date),3.16e-11,1,1) According to http://wiki.apache.org/solr/SolrRelevancyFAQ#How_can_I_boost_the_score_of_newer_documents this should be

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
Hi Robert! Since the example given was http being slow, its worth mentioning that if queries are one word urls [for example http://lucene.apache.org] these will actually form slow phrase queries by default. do you mean that http://lucene.apache.org will be split up into http lucene

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
<filter class="solr.CommonGramsQueryFilterFactory" words="new400common.txt"/> </analyzer> </fieldType> Tom -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Tuesday, August 10, 2010 3:32 PM To: solr-user@lucene.apache.org Subject: Re: Improve Query Time For Large Index

Re: Improve Query Time For Large Index

2010-08-12 Thread Peter Karich
words list. (Details on CommonGrams here: http://www.hathitrust.org/blogs/large-scale-search/slow-queries-and-common-words-part-2) Tom Burton-West -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Tuesday, August 10, 2010 9:54 AM To: solr-user

Re: Analysing SOLR logfiles

2010-08-12 Thread Peter Karich
I wonder too that there isn't a special tool which analyzes solr logfiles (e.g. parses qtime, the parameters q, fq, ...), because there are some other open source log analyzers out there: http://yaala.org/ http://www.mrunix.net/webalizer/ Another free tool is newrelic.com (you will

Improve Query Time For Large Index

2010-08-10 Thread Peter Karich
Hi, I have 5 million small documents/tweets (= ~3GB) and the slave index replicates itself from master every 10-15 minutes, so the index is optimized before querying. We are using solr 1.4.1 (patched with SOLR-1624) via SolrJ. Now the search speed is slow: 2s for common terms which hit more than

Re: Improve Query Time For Large Index

2010-08-10 Thread Peter Karich
) Tom Burton-West -Original Message- From: Peter Karich [mailto:peat...@yahoo.de] Sent: Tuesday, August 10, 2010 9:54 AM To: solr-user@lucene.apache.org Subject: Improve Query Time For Large Index Hi, I have 5 Million small documents/tweets (= ~3GB) and the slave index replicates

Re: Migrating from Lucene 2.9.1 to Solr 1.4.0 - Performance issues under heavy load

2010-08-04 Thread Peter Karich
Ophir, this sounds a bit strange: CommonsHttpSolrServer.java, line 416 takes about 95% of the application's total search time Is this only for heavy load? Some other things: * with lucene you accessed the indices with MultiSearcher in a LAN, right? * did you look into the logs of the

Re: Is there a better for solor server side loadbalance?

2010-08-04 Thread Peter Karich
The default solr solution is client-side load balancing. Is there a solution that provides server-side load balancing? No. Most of us stick a HTTP load balancer in front of multiple Solr servers. E.g. mod_jk is a very easy solution (maybe too simple/stupid?) for a load balancer, but it

Re: Solr Indexing slows down

2010-08-02 Thread Peter Karich
to be reopened, and this happens on commit. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Peter Karich peat...@yahoo.de To: solr-user@lucene.apache.org Sent: Fri, July 30, 2010 6:19

Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
before the warmup queries from the previous commit have done their magic, you might be getting into a death spiral. HTH Erick On Thu, Jul 29, 2010 at 7:02 AM, Peter Karich peat...@yahoo.de wrote: Hi, I am indexing a solr 1.4.0 core and commiting gets slower and slower. Starting from 3-5

Re: Programmatically retrieving numDocs (or any other statistic)

2010-07-30 Thread Peter Karich
Both approaches are ok, I think. (although I don't know the python API) BTW: If you query q=*:* then add rows=0 to avoid some traffic. Regards, Peter. I want to programmatically retrieve the number of indexed documents. I.e., get the value of numDocs. The only two ways I've come up with are
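
With SolrJ the same check is only a few lines (the server URL is a placeholder):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    CommonsHttpSolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");
    SolrQuery q = new SolrQuery("*:*");
    q.setRows(0); // we only need the count, not the documents
    long numDocs = server.query(q).getResults().getNumFound();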

Re: Solr searching performance issues, using large documents

2010-07-30 Thread Peter Karich
Hi Peter :-), did you already try other values for hl.maxAnalyzedChars=2147483647 ? Also regular expression highlighting is more expensive, I think. What does the 'fuzzy' variable mean? If you use this to query via ~someTerm instead of someTerm then you should try the trunk of solr which is a lot

Re: Solr Indexing slows down

2010-07-30 Thread Peter Karich
is pretty frequent for Solr. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ - Original Message From: Peter Karich peat...@yahoo.de To: solr-user@lucene.apache.org Sent: Fri, July 30, 2010 4:06:48 PM

Solr Indexing slows down

2010-07-29 Thread Peter Karich
Hi, I am indexing a solr 1.4.0 core and committing gets slower and slower. Starting from 3-5 seconds for ~200 documents and ending with over 60 seconds after 800 commits. Then, if I reload the index, it is as fast as before! And today I have read a similar thread [1] and indeed: if I set

Re: slave index is bigger than master index

2010-07-29 Thread Peter Karich
Hi Muneeb, I fear you'll have no chance: replicating an index will use more disc space on the slave nodes. Of course, you could minimize disc usage AFTER the replication via the 'optimize-hack'. But are you sure the reason the slave node dies is disc limitations? Try to observe the

Re: slave index is bigger than master index

2010-07-27 Thread Peter Karich
We have three dedicated servers for solr, two for slaves and one for master, all with linux/debian packages installed. I understand that replication does always copies over the index in an exact form as in master index directory (or it is supposed to do that at least), and if the master

Re: how to Protect data

2010-07-26 Thread Peter Karich
Hi Girish, I am not aware of such a thing. But you could use a middleware to avoid certain fields from being retrieved via the 'fl' parameter: http://wiki.apache.org/solr/CommonQueryParameters#fl E.g. for your customers the query looks like q=hello&fl=title and for your admin the query looks like

Re: slave index is bigger than master index

2010-07-26 Thread Peter Karich
did you try an optimize on the slave too? Yes I always run an optimize whenever I index on master. In fact I just ran an optimize command an hour ago, but it didn't make any difference.
