* 1st question (ls from index directory)
solr 1.4
-rw-r--r-- 1 user user 2180582 Nov 30 07:26 _3g1_cf.del
-rw-r--r-- 1 user user 5190652802 Nov 28 17:57 _3g1.fdt
-rw-r--r-- 1 user user 139556724 Nov 28 17:57 _3g1.fdx
-rw-r--r-- 1 user user 4963 Nov 28 17:56 _3g1.fnm
-rw-r--r-- 1 user
I attach a chart which presents CPU usage. Solr 3.5 uses almost all of the CPU
(left side of chart).
At the beginning of the chart there were about 60 rps, and about 100 rps
just before Solr 3.5 was turned off. Then Solr 1.4 was turned on with
100 rps.
--
Pawel
On Wed, Nov 30, 2011 at 9:07 AM, Pawel Rog
I think this is what you are looking for :
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PositionFilterFactory
Ludovic.
-
Jouve
France.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Splitting-Words-but-retaining-offsets-tp3546104p3547977.html
Sent
Thanks Erick,
I downloaded ComplexPhraseQueryParser from the link you gave, ran the Maven
package step to create the jar file,
added it to the WEB-INF/lib folder, generated the war file and deployed it to the JBoss
server.
I also added the QueryParser to the solrconfig.xml file.
Now when I do a normal search, it works fine but
I made a thread dump. The most active threads have a trace like this:
471003383@qtp-536357250-245 - Thread t@270
java.lang.Thread.State: RUNNABLE
at
org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
at
I guess I could do a bit of pre-processing, look for any words that are
quoted, and search in a different field for those.
How is a query like this formulated?
q=unstemmed:perl or java&q=stemmed:manager
--
IntelCompute
Web Design and Online Marketing
http://www.intelcompute.com
-Original
I have no idea whether it will work with 1.4, as I haven't looked at the
underlying code. I actually doubt it. There's an entry in newer solrconfig.xml
files, luceneMatchVersion, that is referenced by that code and that just
doesn't exist in the 1.4 code base.
I strongly recommend you
You can't have multiple q clauses (as opposed to fq clauses).
You could form something like
q=unstemmed:perl or java&fq=stemmed:manager
or
q=+(unstemmed:perl or java) +stemmed:manager
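As a rough sketch of the first form, the two clauses could be sent as separate q and fq request parameters. The snippet below only builds the query string with Python's standard library; the host, port and /solr/select path are assumptions for illustration and do not come from this thread.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; adjust to your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_select_url(q, fq):
    """Return a select URL with one main query (q) and one filter query (fq)."""
    params = [("q", q), ("fq", fq)]
    # urlencode escapes ':' and spaces so the values survive the HTTP request.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_select_url("unstemmed:perl or java", "stemmed:manager")
print(url)
```

The query text is passed through exactly as written in the message above, so any parsing surprises in the q clause itself still apply.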
BTW, this fragment of the query probably doesn't do
what you expect:
unstemmed:perl or java
would be parsed as
I installed ComplexPhraseQueryParser, as you suggested, from
https://issues.apache.org/jira/browse/SOLR-1604
After adding the latest version of it, I am getting this error:
HTTP Status 500 - luceneMatchVersion java.lang.NoSuchFieldError:
I have been playing around with the TermsComponent in Solr and hit a situation I
do not understand.
When indexing documents and then updating them, the TermsComponent does not
always have the correct count. Specifically, when updating a document, the
TermsComponent keeps track of the former version
Boosts can be included there too, can't they?
So is this valid?
q=+(stemmed^2:perl or stemmed^3:java) +unstemmed^5:development manager
Is it possible to have different boosts on the same field, btw?
We currently search across 5 fields anyway, so my queries are gonna
start getting messy. :-/
Happened again….
I got 3 directories in my index dir
4096 Nov 4 09:31 index.2004083156
4096 Nov 21 10:04 index.2021090440
4096 Nov 30 14:55 index.2029024919
As you can see, the first two are old and also empty. The last one, from
today, contains 9 files, none of which are 0 size.
First, watch the syntax <G>
q=+(stemmed:perl^2 or stemmed:java^3) +unstemmed:development manager^5
although it is a bit confusing to see the dismax stuff where the boost
is put on the field name, but that's not how the queries are formed.
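A minimal sketch of that rule, where the boost is attached to the term and not to the field name. The helper name boosted_term is hypothetical and only formats strings; it does not talk to Solr.

```python
def boosted_term(field, term, boost):
    """Format one clause with the boost attached to the term: field:term^boost."""
    return f"{field}:{term}^{boost}"


# Two clauses on the same field, each with its own boost.
clauses = [
    boosted_term("stemmed", "perl", 2),
    boosted_term("stemmed", "java", 3),
]
print(" ".join(clauses))
```

Because the boost belongs to each clause, the same field can carry different boosts for different terms.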
BTW, have you looked at edismax queries? You can
On Tue, Nov 29, 2011 at 9:37 AM, Michael Kuhlmann k...@solarier.de wrote:
Jay,
I think the problem is this:
You're checking whether the character preceding the array of at least one
whitespace is not a hyphen.
However, when you've more than one whitespace, like this:
foo- \n bar
then
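A sketch of the case described so far, assuming a stand-in pattern (not Jay's actual regex, which is not shown here): whitespace that is not preceded by a hyphen, written as a negative lookbehind. With more than one whitespace character after the hyphen, the lookbehind rejects only the first one, so a match can still begin at the second.

```python
import re

# Hypothetical stand-in pattern: one or more whitespace characters
# that are not immediately preceded by a hyphen.
pattern = re.compile(r"(?<!-)\s+")

text = "foo- \n bar"

# The space right after the hyphen is rejected by the lookbehind,
# but the newline after it is preceded by a space, so it matches.
match = pattern.search(text)
print(repr(match.group()), match.start())

parts = pattern.split(text)
print(parts)
```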
Thanks Erick,
This is a required feature since we're swapping out an existing search
engine for Solr - users have saved searches that need to behave the
same.
I'll look into the edismax stuff, that's the handler we're using
anyway.
---
IntelCompute
Web Design Local Online Marketing
I have documents containing tokens of a certain format in arbitrary
positions, like this:
... blah blahblah AB/1234/5678 blah blah blahblah ...
I would like to enable usual keyword searching within these documents. In
addition, I'd also like to enable users to find AB/1234/5678, ideally
Ahhh, I hate making a new implementation match all of the old behavior, but
sometimes ya' just got no choice.
I *swear* that there's a JIRA with an approach to creating a filter for
this situation, but I can't find it
Best
Erick
On Wed, Nov 30, 2011 at 9:19 AM, Robert Brown
There's about a zillion tokenizers, for what you're describing
WhitespaceTokenizerFactory is a good candidate.
See: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
for a partial list, and it has links to the authoritative docs.
Best
Erick
On Wed, Nov 30, 2011 at 9:23 AM, Marian
Thanks for the quick response!
Are you saying that I should extend WhitespaceTokenizerFactory to create my
own? Or should I simply use it?
Because, I guess tokenizing on spaces wouldn't be enough. I would need
tokenizing on slashes in other positions, just not within strings matching
Well, it depends (tm). No, in your case WhitespaceTokenizer wouldn't work,
although it did satisfy your initial statement.
You could consider PatternTokenizerFactory, but take a look at the
link I provided, and follow it to the javadocs to see if there are
better matches.
Best
Erick
On Wed, Nov
Hi,
you have to use the 'expungeDeletes' additional parameter:
http://wiki.apache.org/solr/UpdateXmlMessages
and depending on the version of Solr you are using, you perhaps have to use
a merge policy like the LogByteSizeMergePolicy.
See : https://issues.apache.org/jira/browse/SOLR-2725
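As an illustration, a commit message carrying the expungeDeletes attribute could be built and posted roughly like this. The update URL is an assumption (adjust host, port and core path to your setup), and send_commit is a hypothetical helper that is defined but not called here.

```python
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical update endpoint; adjust to your own installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_commit_message(expunge_deletes=True):
    """Build an XML <commit/> update message, optionally with expungeDeletes."""
    commit = ET.Element("commit")
    if expunge_deletes:
        commit.set("expungeDeletes", "true")
    return ET.tostring(commit)  # bytes, ready to be used as a request body


def send_commit(message, url=SOLR_UPDATE_URL):
    """POST the message to the update handler and return the HTTP status."""
    request = urllib.request.Request(
        url, data=message, headers={"Content-Type": "text/xml"}
    )
    with urllib.request.urlopen(request) as response:
        return response.status


message = build_commit_message()
print(message.decode("utf-8"))
```

Whether deleted documents are actually merged away may still depend on the configured merge policy, as noted above.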
Hi Marian,
Extending the StandardTokenizer(Factory) java class is not the way to go if you
want to change its behavior.
StandardTokenizer is generated from a JFlex http://jflex.de/ specification,
so you would need to modify the specification to include your special
slash-containing-word rule,
That's pretty helpful, thanks! Especially since I didn't understand so far
that I could use a filter like PatternReplaceCharFilterFactory both as a
charFilter and as a filter.
In the meantime I had figured out another alternative,
involving WordDelimiterFilterFactory. But I had to
use
Note that my example does not actually use PatternReplaceCharFilterFactory
twice - the second one is actually a PatternReplaceFilterFactory - note that
Char isn't present in the second one.
CharFilters operate before tokenizers, and regular filters operate after
tokenizers.
Steve
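The ordering can be pictured with a plain-Python analogy. This is not Solr code: the three functions below only imitate a char filter, a tokenizer and a token filter, and the pattern for the special tokens (two letters, four digits, four digits) is an assumption based on the single AB/1234/5678 example in this thread.

```python
import re

# Assumed shape of the special tokens, e.g. AB/1234/5678.
SPECIAL = re.compile(r"\b([A-Z]{2})/(\d{4})/(\d{4})\b")
PLACEHOLDER = "_SLASH_"


def char_filter(text):
    """Runs on the raw text, before tokenizing: hide slashes in special tokens."""
    return SPECIAL.sub(rf"\1{PLACEHOLDER}\2{PLACEHOLDER}\3", text)


def tokenize(text):
    """Split on whitespace and on any slash that is still visible."""
    return [token for token in re.split(r"[\s/]+", text) if token]


def token_filter(tokens):
    """Runs on each token, after tokenizing: restore the hidden slashes."""
    return [token.replace(PLACEHOLDER, "/") for token in tokens]


def analyze(text):
    return token_filter(tokenize(char_filter(text)))


print(analyze("blah AB/1234/5678 foo/bar"))
```

Swapping the first and last steps would not work: once the tokenizer has split on the slashes, a later filter sees only the fragments.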
Got me right when Solr reported the error on restart :) Thanks!
2011/11/30 Steven A Rowe sar...@syr.edu
Note that my example does not actually use PatternReplaceCharFilterFactory
twice - the second one is actually a PatternReplaceFilterFactory - note
that Char isn't present in the second one.
Hi all,
For anyone interested, recently I've been using a new Solr client for
Python. It's easy and pretty well documented. If you're interested its site
is: http://mysolr.redtuna.org/
bye!
Marco Martínez Bautista
http://www.paradigmatecnologico.com
Avenida de Europa, 26. Ática 5. 3ª
I am having a similar issue with OffsetExceptions during highlighting.
In all of the explanations and bug reports I'm reading there is a
mention this is all the result of a problem with HTMLStripCharFilter.
But my analysis chains don't (that I'm aware of) make use of
HTMLStripCharFilter, so can
Hello!!!
I have a question. When I add a document with a specific field, how can I make
sure that only part of that field is indexed, not the entire field?
For example, the field contains the text value VALUE / value text
TEXT / text format FORMAT / format, but in the index I want to save only
the
Hello,
I spotted the difference in the number of segments (4 vs 14). To me, that
explains the increased query time and CPU load, especially because you
don't use filters via fq=, only q=, in your queries.
The first thing you need to do is make the length of the segment chains the same. The
first clue
I wonder if you have an explicitly configured merge policy? In Solr 1.4
(i.e. Lucene 2.9) LogMergePolicy was the default, but in 3.5
TieredMergePolicy is used by default. This could explain the
differences segment-wise, since from what I understand you are indexing
the same data on 1.4 and 3.5?
simon
can you give us some details about what filesystem you are using?
simon
On Wed, Nov 30, 2011 at 3:07 PM, Ruben Chadien ruben.chad...@aspiro.com wrote:
Happened again….
I got 3 directories in my index dir
4096 Nov 4 09:31 index.2004083156
4096 Nov 21 10:04 index.2021090440
4096
Yonik Seeley-2-2 wrote
On Mon, Nov 7, 2011 at 8:55 PM, Chris Hostetter
<hossman_lucene@> wrote:
: I understand that's a valid thing for faceting to do, I was just wondering
: if there's any way to get it to do the faceting on the groups returned.
: Otherwise I guess I'll need to
: I tried to use index from 1.4 (load was the same as on index from 3.5)
: but there was problem with synchronization with master (invalid
: javabin format)
: Then I built new index on 3.5 with luceneMatchVersion LUCENE_35
why would you need to re-replicate from the master?
You already have a
: I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu
: (left side of chart).
FWIW: The mailing list software filters out most attachments (there are
some exceptions for certain text mime types)
-Hoss
http://imageshack.us/photo/my-images/838/cpuusage.png/
On Wed, Nov 30, 2011 at 9:18 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: I attach chart which presents cpu usage. Solr 3.5 uses almost all cpu
: (left side of chart).
FWIW: The mailing list software filters out most attachments
On Wed, Nov 30, 2011 at 9:05 PM, Chris Hostetter
hossman_luc...@fucit.org wrote:
: I tried to use index from 1.4 (load was the same as on index from 3.5)
: but there was problem with synchronization with master (invalid
: javabin format)
: Then I built new index on 3.5 with luceneMatchVersion
Monitoring this thread makes me ask whether there are
standardized performance benchmarks for Solr,
such that they are run and published with each new release. This would
affirm its performance under known circumstances,
which people can try in their own environments and
On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog pawelro...@gmail.com wrote:
at
org.apache.solr.search.SolrIndexSearcher.getDocSet(SolrIndexSearcher.java:702)
at
org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:1144)
at
Yes, it works. Thanks a lot.
But I still don't understand why that option was efficient in Solr 1.4
but not in Solr 3.5.
On Wed, Nov 30, 2011 at 11:01 PM, Yonik Seeley
yo...@lucidimagination.com wrote:
On Wed, Nov 30, 2011 at 7:08 AM, Pawel Rog pawelro...@gmail.com wrote:
at
At the risk of committing a gaffe, I recently did a blog post about
queries and multi term aware capabilities newly added to Solr. The
short form is that the recurring problem of wildcard queries (and some
other types, e.g. range) not automatically lower-casing (or accent
folding or a few others)
Hello,
I'm using Solr 1.4.
I want to use a plugin from the trunk version,
but I got an IndexFormatTooOldException when trunk opened the old-version index.
Is there a way to use a 1.4 index with 4.0 trunk?
Thanks,
Jason
--
View this message in context:
No, you will have to upgrade your index. See the wiki for more information.
(To my knowledge, you should be able to drop in your 1.4 (.1?) schema.xml
and re-index.)
On Wed, Nov 30, 2011 at 6:44 PM, Jason, Kim hialo...@gmail.com wrote:
Hello,
I'm using solr 1.4 version.
I want to use some