Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
* 1st question (ls from index directory)
solr 1.4
-rw-r--r-- 1 user user 2180582 Nov 30 07:26 _3g1_cf.del
-rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt
-rw-r--r-- 1 user user 139556724 Nov 28 17:57 _3g1.fdx
-rw-r--r-- 1 user user 4963 Nov 28 17:56 _3g1.fnm
-rw-r--r-- 1 user

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
I attach a chart which presents CPU usage. Solr 3.5 uses almost all CPU (left side of chart). At the beginning of the chart there was about 60 rps and about 100 rps (before turning off Solr 3.5). Then 1.4 was turned on with 100 rps. -- Pawel On Wed, Nov 30, 2011 at 9:07 AM, Pawel Rog

Re: Splitting Words but retaining offsets

2011-11-30 Thread lboutros
I think this is what you are looking for: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory Ludovic. - Jouve France. -- View this message in context: http://lucene.472066.n3.nabble.com/Splitting-Words-but-retaining-offsets-tp3546104p3547977.html Sent

Re: how to apply fuzzy search with slop

2011-11-30 Thread vrpar...@gmail.com
Thanks Erick, I have downloaded ComplexPhraseQueryParser from the link you gave, ran Maven package to create a jar file, added it to the WEB-INF/lib folder, generated a war file and deployed it to the JBoss server. I also added the QueryParser to the solrconfig.xml file. Now when I do a normal search it works fine, but

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
I made a thread dump. The most active threads have a trace like this:
471003383@qtp-536357250-245 - Thread t@270
java.lang.Thread.State: RUNNABLE
at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
at

Re: Don't snowball depending on terms

2011-11-30 Thread Rob Brown
I guess I could do a bit of pre-processing, look for any words that are quoted, and search in a different field for those. How is a query like this formulated? q=unstemmed:perl or java&q=stemmed:manager -- IntelCompute Web Design and Online Marketing http://www.intelcompute.com -Original

Re: how to apply fuzzy search with slop

2011-11-30 Thread Erick Erickson
I have no idea whether it will work with 1.4, although I haven't looked at the underlying code. I actually doubt it. There's an entry in newer solrconfig.xml files, luceneMatchVersion, that is referenced by that code and that just doesn't exist in the 1.4 code frame. I strongly recommend you

Re: Don't snowball depending on terms

2011-11-30 Thread Erick Erickson
You can't have multiple q clauses (as opposed to fq clauses). You could form something like q=unstemmed:perl or java&fq=stemmed:manager or q=+(unstemmed:perl or java) +stemmed:manager. BTW, this fragment of the query probably doesn't do what you expect: unstemmed:perl or java would be parsed as
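A minimal sketch of the first form suggested above: one q parameter plus an fq parameter, URL-encoded for a request. The helper name `build_select_query` is hypothetical and is not part of Solr or of any client library; only the standard library is used, and nothing is sent to a server.

```python
from urllib.parse import urlencode, parse_qs


def build_select_query(q, filters=()):
    """Return a URL-encoded query string with exactly one q and zero or more fq parameters."""
    # A request carries a single main query (q); restrictions go into fq clauses.
    params = [("q", q)]
    params.extend(("fq", f) for f in filters)
    return urlencode(params)


# Field names and terms are taken from the message above.
query_string = build_select_query("unstemmed:perl or java", ["stemmed:manager"])
print(query_string)
```

Keeping the parameters as a list of pairs, rather than a dict, makes it possible to repeat fq while still passing q only once.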

Re: make fuzzy search for phrase

2011-11-30 Thread meghana
I installed ComplexPhraseQueryParser as suggested by you from https://issues.apache.org/jira/browse/SOLR-1604. After adding the latest version of it, I am getting the error HTTP Status 500 - luceneMatchVersion java.lang.NoSuchFieldError:

Terms Component with documents marked for deletion

2011-11-30 Thread qwamci
I have been playing around with the Terms Component in Solr and hit a situation I do not understand. When indexing documents and then updating them, the TermsComponent does not always have the correct count. Specifically, when updating a document, the TermsComponent keeps track of the former version

Re: Don't snowball depending on terms

2011-11-30 Thread Robert Brown
Boosts can be included there too, can't they? So is this valid? q=+(stemmed^2:perl or stemmed^3:java) +unstemmed^5:development manager Is it possible to have different boosts on the same field, btw? We currently search across 5 fields anyway, so my queries are gonna start getting messy. :-/

Re: Seek past EOF

2011-11-30 Thread Ruben Chadien
Happened again…. I got 3 directories in my index dir 4096 Nov 4 09:31 index.2004083156 4096 Nov 21 10:04 index.2021090440 4096 Nov 30 14:55 index.2029024919 As you can see, the first two are old and also empty; the last one, from today, contains 9 files, none of which are 0 size

Re: Don't snowball depending on terms

2011-11-30 Thread Erick Erickson
First, watch the syntax <G>: q=+(stemmed:perl^2 or stemmed:java^3) +unstemmed:development manager^5 It is a bit confusing to see the dismax stuff, where the boost is put on the field name, but that's not how the queries are formed. BTW, have you looked at edismax queries? You can

Re: PatternTokenizer failure

2011-11-30 Thread Jay Luker
On Tue, Nov 29, 2011 at 9:37 AM, Michael Kuhlmann k...@solarier.de wrote: Jay, I think the problem is this: You're checking whether the character preceding the array of at least one whitespace is not a hyphen. However, when you've more than one whitespace, like this: foo- \n bar then
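The quoted reply is cut off, so the following only illustrates the kind of check it describes. It is a sketch that assumes the split pattern is "one or more whitespace characters not preceded by a hyphen", written here as `(?<!-)\s+`; the actual pattern from the thread is not shown above. Python's `re` module is used as a stand-in for the tokenizer.

```python
import re

# Assumed pattern: a run of whitespace whose preceding character is not a hyphen.
SPLIT_PATTERN = re.compile(r"(?<!-)\s+")


def tokenize(text):
    """Split text on the assumed pattern and drop empty pieces."""
    return [piece for piece in SPLIT_PATTERN.split(text) if piece]


# With a single whitespace after the hyphen, the lookbehind blocks the split.
single = tokenize("foo- bar")

# With several whitespace characters, the match can start at the second one,
# whose preceding character is a space rather than a hyphen.
multi = tokenize("foo- \n bar")

print(single)
print(multi)
```

The lookbehind is evaluated at the position where the match starts, not at the start of the whole whitespace run, which is why the second input still gets split.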

Re: Don't snowball depending on terms

2011-11-30 Thread Robert Brown
Thanks Erick, This is a required feature since we're swapping out an existing search engine for Solr - users have saved searches that need to behave the same. I'll look into the edismax stuff, that's the handler we're using anyway. --- IntelCompute Web Design Local Online Marketing

Leaving certain tokens intact during indexing

2011-11-30 Thread Marian Steinbach
I have documents containing tokens of a certain format in arbitrary positions, like this: ... blah blahblah AB/1234/5678 blah blah blahblah ... I would like to enable usual keyword searching within these documents. In addition, I'd also like to enable users to find AB/1234/5678, ideally

Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
I have documents containing tokens of a certain format in arbitrary positions, like this: ... blah blahblah AB/1234/5678 blah blah blahblah ... I would like to enable usual keyword searching within these documents. In addition, I'd also like to enable users to find AB/1234/5678, ideally

Re: Don't snowball depending on terms

2011-11-30 Thread Erick Erickson
Ahhh, I hate making a new implementation match all of the old behavior, but sometimes ya' just got no choice. I *swear* that there's a JIRA with an approach to creating a filter for this situation, but I can't find it Best Erick On Wed, Nov 30, 2011 at 9:19 AM, Robert Brown

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Erick Erickson
There are about a zillion tokenizers; for what you're describing, WhitespaceTokenizerFactory is a good candidate. See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters for a partial list, and it has links to the authoritative docs. Best Erick On Wed, Nov 30, 2011 at 9:23 AM, Marian

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
Thanks for the quick response! Are you saying that I should extend WhitespaceTokenizerFactory to create my own? Or should I simply use it? Because, I guess tokenizing on spaces wouldn't be enough. I would need tokenizing on slashes in other positions, just not within strings matching

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Erick Erickson
Well, it depends (tm). No, in your case WhitespaceTokenizer wouldn't work, although it did satisfy your initial statement. You could consider PatternTokenizerFactory, but take a look at the link I provided, and follow it to the javadocs to see if there are better matches. Best Erick On Wed, Nov

Re: Terms Component with documents marked for deletion

2011-11-30 Thread lboutros
Hi, you have to use the 'expungeDeletes' additional parameter: http://wiki.apache.org/solr/UpdateXmlMessages and depending on the version of Solr you are using, you perhaps have to use a merge policy like the LogByteSizeMergePolicy. See: https://issues.apache.org/jira/browse/SOLR-2725
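As a rough illustration of the parameter mentioned above, the sketch below builds an XML commit message carrying the expungeDeletes attribute. The helper name `build_commit_message` is hypothetical; how and where the message is posted (typically the update handler of your Solr instance) is left out and depends on your setup.

```python
import xml.etree.ElementTree as ET


def build_commit_message(expunge_deletes=True):
    """Return an XML commit message, optionally asking Solr to expunge deleted documents."""
    commit = ET.Element("commit")
    if expunge_deletes:
        # Attribute name as documented on the UpdateXmlMessages wiki page.
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit, encoding="unicode")


message = build_commit_message()
print(message)
```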

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
Hi Marian, Extending the StandardTokenizer(Factory) java class is not the way to go if you want to change its behavior. StandardTokenizer is generated from a JFlex http://jflex.de/ specification, so you would need to modify the specification to include your special slash-containing-word rule,

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
That's pretty helpful, thanks! Especially since I didn't understand so far that I could use a filter like PatternReplaceCharFilterFactory both as a charFilter and as a filter. In the meantime I had figured out another alternative, involving WordDelimiterFilterFactory. But I had to use

RE: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Steven A Rowe
Note that my example does not actually use PatternReplaceCharFilterFactory twice - the second one is actually a PatternReplaceFilterFactory - note that Char isn't present in the second one. CharFilters operate before tokenizers, and regular filters operate after tokenizers. Steve
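The ordering described above can be sketched as a small simulation in plain Python. This is not Solr code and not the configuration from the thread: the three functions, the placeholder character and the regular expressions are all assumptions chosen to mimic a char filter that protects codes such as AB/1234/5678, a tokenizer that splits on whitespace and slashes, and a token filter that restores the slashes afterwards.

```python
import re


def char_filter(text):
    """Runs on the raw text, before tokenizing: hide slashes inside AB/1234/5678-style codes."""
    return re.sub(r"\b([A-Z]{2})/(\d{4})/(\d{4})\b", r"\1_\2_\3", text)


def tokenizer(text):
    """Split on whitespace and on any slashes that are still present."""
    return [token for token in re.split(r"[\s/]+", text) if token]


def token_filter(tokens):
    """Runs on each token, after tokenizing: put the slashes back into protected codes."""
    return [re.sub(r"^([A-Z]{2})_(\d{4})_(\d{4})$", r"\1/\2/\3", token) for token in tokens]


def analyze(text):
    # Order matters: char filter first, then tokenizer, then token filter.
    return token_filter(tokenizer(char_filter(text)))


tokens = analyze("blah a/b AB/1234/5678 blah")
print(tokens)
```

Swapping the order would not work: once the tokenizer has split on slashes, a filter that sees one token at a time can no longer recognize the full code.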

Re: Leaving certain tokens intact during indexing and search

2011-11-30 Thread Marian Steinbach
Got me right when Solr reported the error on restart :) Thanks! 2011/11/30 Steven A Rowe sar...@syr.edu Note that my example does not actually use PatternReplaceCharFilterFactory twice - the second one is actually a PatternReplaceFilterFactory - note that Char isn't present in the second one.

mysolr python client

2011-11-30 Thread Marco Martinez
Hi all, For anyone interested, recently I've been using a new Solr client for Python. It's easy and pretty well documented. If you're interested, its site is: http://mysolr.redtuna.org/ bye! Marco Martínez Bautista http://www.paradigmatecnologico.com Avenida de Europa, 26. Ática 5. 3ª

Re: InvalidTokenOffsetsException when using MappingCharFilterFactory, DictionaryCompoundWordTokenFilterFactory and Highlighting

2011-11-30 Thread Jay Luker
I am having a similar issue with OffsetExceptions during highlighting. In all of the explanations and bug reports I'm reading there is a mention this is all the result of a problem with HTMLStripCharFilter. But my analysis chains don't (that I'm aware of) make use of HTMLStripCharFilter, so can

Solr indexing custom fields

2011-11-30 Thread VladislavLysov
Hello!!! I have a question. How do I make sure that when I add a file with a specific field, only a part of the field is kept in the index, not the entire field? For example, the field contains the text value VALUE / value text TEXT / text format FORMAT / format, but in the index I want to save only the

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Mikhail Khludnev
Hello, I spot the difference in the number of segments (4 vs 14). For me it explains the increased query time and CPU load, especially because you don't utilize filters via fq=, only q= in your queries. The first thing you need is to make the length of the segment chains the same. The first clue

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Simon Willnauer
I wonder if you have an explicitly configured merge policy? In Solr 1.4, i.e. Lucene 2.9, LogMergePolicy was the default, but in 3.5 TieredMergePolicy is used by default. This could explain the differences segment-wise, since from what I understand you are indexing the same data on 1.4 and 3.5? simon

Re: Seek past EOF

2011-11-30 Thread Simon Willnauer
can you give us some details about what filesystem you are using? simon On Wed, Nov 30, 2011 at 3:07 PM, Ruben Chadien ruben.chad...@aspiro.com wrote: Happened again…. I got 3 directories in my index dir 4096 Nov  4 09:31 index.2004083156 4096 Nov 21 10:04 index.2021090440 4096

Re: when using group=true facet numbers are incorrect

2011-11-30 Thread O. Klein
Yonik Seeley-2-2 wrote On Mon, Nov 7, 2011 at 8:55 PM, Chris Hostetter <hossman_lucene@> wrote: : I understand that's a valid thing for faceting to do, I was just wondering : if there's any way to get it to do the faceting on the groups returned. : Otherwise I guess I'll need to

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Chris Hostetter
: I tried to use index from 1.4 (load was the same as on index from 3.5) : but there was problem with synchronization with master (invalid : javabin format) : Then I built new index on 3.5 with luceneMatchVersion LUCENE_35 why would you need to re-replicate from the master? You already have a

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Chris Hostetter
: I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu : (left side of chart). FWIW: The mailing list software filters out most attachments (there are some exceptions for certain text mime types) -Hoss

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
http://imageshack.us/photo/my-images/838/cpuusage.png/ On Wed, Nov 30, 2011 at 9:18 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu : (left side of chart). FWIW: The mailing list software filters out most attachments

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
On Wed, Nov 30, 2011 at 9:05 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : I tried to use index from 1.4 (load was the same as on index from 3.5) : but there was problem with synchronization with master (invalid : javabin format) : Then I built new index on 3.5 with luceneMatchVersion

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Darren Govoni
Monitoring this thread makes me ask whether there are standardized performance benchmarks for Solr, such that they are run and published with each new release. This would affirm its performance under known circumstances, which people can try in their own environments and

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Yonik Seeley
On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog pawelro...@gmail.com wrote:
at org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1144)
at

Re: Solr 3.5 very slow (performance)

2011-11-30 Thread Pawel Rog
Yes, it works. Thanks a lot. But I still don't understand why that option was efficient in Solr 1.4 but not in Solr 3.5. On Wed, Nov 30, 2011 at 11:01 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog pawelro...@gmail.com wrote:        at

Blog you might find interesting

2011-11-30 Thread Erick Erickson
At the risk of committing a gaffe, I recently did a blog post about queries and multi term aware capabilities newly added to Solr. The short form is that the recurring problem of wildcard queries (and some other types, e.g. range) not automatically lower-casing (or accent folding or a few others)

is there a way using 1.4 index at 4.0 trunk?

2011-11-30 Thread Jason, Kim
Hello, I'm using Solr version 1.4. I want to use a plugin from the trunk version, but I got an IndexFormatTooOldException when running trunk against the old-version index. Is there a way to use a 1.4 index with 4.0 trunk? Thanks, Jason -- View this message in context:

Re: is there a way using 1.4 index at 4.0 trunk?

2011-11-30 Thread Lance Norskog
No, you will have to upgrade your index. See the wiki for more information. (To my knowledge, you should be able to drop in your 1.4 (.1?) schema.xml and re-index.) On Wed, Nov 30, 2011 at 6:44 PM, Jason, Kim hialo...@gmail.com wrote: Hello, I'm using solr 1.4 version. I want to use some