--- On Sun, 4/18/10, Grant Ingersoll gsing...@apache.org wrote:
Sure, but I'm biased. ;-) Hopefully, you will find it
useful, but choose the one that best fits your needs (and
let me know if you need help assessing that.)
Thanks for the explanation Grant.
What is the advantage of
Lance,
I can submit and extract PDF contents using Solr and SolrJ, as I indicated
earlier.
I've made 'id' a mandatory field, and I had to supply its value while
submitting (request.addParams(literal.id,url)).
If I put multiple files/streams in the request, then I can't set 'id' this
way as the
Hello,
Is it a problem if I use *copyField* for some fields and not for others? In my
query, I have both kinds of fields: the ones mentioned in copyField and ones that
are not copied to a common destination. Will this cause an anomaly in my search
results? I am seeing some weird behavior.
Thanks,
Hi Ranveer,
The error in the facet counts is caused by the tokenized field that
you are using. If you want to facet on the whole string, use a
fieldType that doesn't split the field into tokens, like the string field.
Regards,
Marco Martínez Bautista
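To illustrate the point about tokenized fields, here is a minimal Python sketch (not Solr code). The sample values and the lowercase-and-split "analyzer" are assumptions made up for the illustration; it only shows why facet counts on a tokenized field are per token rather than per whole value.

```python
from collections import Counter

def facet_counts(values, tokenized):
    """Count facet values; a tokenized field contributes each distinct token once per document."""
    counts = Counter()
    for value in values:
        if tokenized:
            # Assumed analysis: lowercase and split on whitespace.
            terms = set(value.lower().split())
        else:
            # A string field keeps the whole value as a single term.
            terms = {value}
        counts.update(terms)
    return counts

# Hypothetical field values from three documents.
docs = ["New York", "New York", "New Jersey"]
tokenized = facet_counts(docs, tokenized=True)
as_string = facet_counts(docs, tokenized=False)
```

With the tokenized field the facet "new" is counted for all three documents, while the string field yields one facet per complete value.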
Hello,
I am confused about the proper usage of the Boolean operators AND, OR and NOT.
Could somebody please provide an easy-to-understand explanation?
Thanks,
Sandhya
Andy, I think it is important to know what a stemmer really is.
It reduces words to their infinitives. Those infinitives are not always the
real infinitive; for the system, however, the result counts as an infinitive,
since all its derivatives can be reduced to the same form.
That's a stemmer.
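A toy sketch in Python of what this means. The suffix list and the minimum stem length are assumptions chosen for the example; this is not the Snowball/Porter algorithm, only an illustration that every derivative maps to the same form, even when that form is not a real word.

```python
# Assumed suffix list, checked in order; a real stemmer uses far more rules.
SUFFIXES = ("ing", "ed", "es", "s")

def toy_stem(word):
    """Strip the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

walk_forms = [toy_stem(w) for w in ["walk", "walks", "walked", "walking"]]
argue_forms = {toy_stem(w) for w in ["argued", "argues", "arguing"]}
```

All forms of "walk" reduce to "walk", and the forms of "argue" all reduce to "argu", which is not a dictionary word but is consistent, and that is all the index needs.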
Hello Sandhya,
title: star AND wars NOT sdi
This query will match every document where star *and* wars occur but
*not* the term sdi (SDI = Strategic Defense Initiative; the media often
used the term star wars to describe the project).
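The Boolean operators can be pictured as set operations on the documents that contain each term. The following Python sketch uses four made-up titles and assumes every term is matched against the title field; it is an illustration of the semantics, not of how Lucene evaluates queries internally.

```python
# Hypothetical documents: id -> title.
docs = {
    1: "star wars episode four",
    2: "star wars sdi missile program",
    3: "star trek",
    4: "clone wars",
}

def matching(term):
    """Return the ids of all documents whose title contains the term."""
    return {doc_id for doc_id, title in docs.items() if term in title.split()}

# star AND wars NOT sdi: intersection, then set difference.
and_not = (matching("star") & matching("wars")) - matching("sdi")
# star OR wars: union.
either = matching("star") | matching("wars")
```

Here only document 1 satisfies the AND/NOT query, while the OR query returns all four documents.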
title: star OR wars
This query will match
Hello Sandhya,
please, show us your schema.xml, so that we can have a look whether
something might be wrong there.
However, if the source of a copyField is description and the destination
is description_stemmed, you can query both: description and
description_stemmed. There will be no error.
-
Hi,
I have the following filter for a field named myText
<filter class="solr.SnowballPorterFilterFactory" language="English"
protected="protwords.txt"/>
This enables stemming, I guess.
My questions are:
1) Can I disable stemming for the same field at the query time?
2) Do I need to copyField the
Hello!
If you want to have both a non-stemmed and a stemmed field, you should
use copyField.
Even if it were possible to disable the Snowball filter at
query time, you would still have stemmed tokens written in the index.
Hi,
I have the following filter for a field named myText
Naga,
1) Yes, it is possible.
<fieldType name="myText" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<filter class="solr.SnowballPorterFilterFactory"
language="English" protected="protwords.txt"/>
</analyzer>
<analyzer type="query">
Hello!
MitchK posted the right solution; my post may be confusing ;( Sorry
for that.
Hello!
If you want to have both a non-stemmed and a stemmed field, you should
use copyField.
Even if it were possible to disable the Snowball filter at
query time, you would still have stemmed
Thank you Mitch! I will try that.
regards,
Naga
-Original Message-
From: MitchK [mailto:mitc...@web.de]
Sent: Monday, April 19, 2010 2:35 PM
To: solr-user@lucene.apache.org
Subject: Re: Stemming - disable at query time - reg.
Naga,
1) Yes, it is possible.
fieldType name=myText
Thank You Mitch.
I have a query mentioned below : (my defaultOperator is set to AND)
(field1 : This is a good string AND field2 : This is a good string AND field3 :
This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR
field4 : HTMLDocument) AND field5 : doc)
This is
Hi Mitch,
I have defined my field like:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
ignoreCase="true"
Also, one of the fields here, *field3* is a dynamic field. All the other fields
except this field, are copied into text with copyField.
Thanks,
Sandhya
-Original Message-
From: Sandhya Agarwal [mailto:sagar...@opentext.com]
Sent: Monday, April 19, 2010 2:55 PM
To:
I need to perform a wildcard search in a phrase query. I have 2 documents
containing the text "how do impair" and "how to improve". I want to be able to
find both documents by searching for (how to im*). There is a provision in
Lucene which allows me to perform this operation using SpanWildcardQuery and
Hey All
I have 2 cores which have been used with Tika to index files.
I would like to do one query on both at once, as I will be searching the
attr_content field.
If I test each core I get 1 and 17 results, but when trying with shards I
get just 17 results.
Here is my example query
Hi Naga,
I think you should add the same filter to the query configuration:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
<analyzer type="index">
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.StopFilterFactory"
Hi Grant,
I tried the command line of Tika 0.7 (the newest), and it parsed the file. I
believe Solr 1.4 contains version 0.4 of Tika.
Do you suggest upgrading to the new Tika? Can I upgrade only Tika in Solr 1.4,
or do I need to wait until Solr ships with the new Tika?
Thanks.
On Sun, Apr 18, 2010 at 11:24 PM,
Praveen Agrawal wrote:
Hi Grant,
I tried command line of Tika v-0.7(newest), and it parsed the file.. I
believe Solr1.4 contains 0.4 version of Tika.
Do you suggest to upgrade to new Tika? Can i upgrade only tika in Solr-1.4?
or i need to wait till Solr ships with new Tika?
Thanks.
Solr
I want to build a function expression for the 'bf' parameter of a dismax
request handler, to boost a document if it is referenced by other documents.
I.e. the more often a document is referenced, the higher the boost.
Something like
Hi, I am using Solr running on Windows Server 2008 32-bit.
I added about 100 million articles to Solr without setting the stored attribute
(only the document id is stored); the index files are about 164 GB.
When I query without sorting, it returns doc ids in a few ms, but when
I add a sort command, I get
Regarding stemmers, I ditched them altogether a long time ago in favor
of a dictionary of morphologies of all known words (for any given
language). A simple lookup of any word morphology thus produces the set,
including the correct stem.
Works great. 100% of the time.
Just a tip from me.
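As a rough Python sketch of the lookup idea described above: the tiny word list here is made up for the example (a real dictionary of morphologies would be loaded from an external resource), and the function name is hypothetical.

```python
# Hypothetical morphology dictionary: lemma -> all known forms.
MORPHOLOGY = {
    "walk": {"walk", "walks", "walked", "walking"},
    "good": {"good", "better", "best"},
}

# Reverse index so any form can be looked up directly.
FORM_TO_LEMMA = {
    form: lemma for lemma, forms in MORPHOLOGY.items() for form in forms
}

def lookup(word):
    """Return (lemma, all forms) for a known word, or None for an unknown one."""
    lemma = FORM_TO_LEMMA.get(word.lower())
    if lemma is None:
        return None
    return lemma, MORPHOLOGY[lemma]

known = lookup("Walked")
irregular = lookup("better")
unknown = lookup("frobnicate")
```

Irregular forms such as "better" resolve correctly because nothing is derived by rule, but a word missing from the dictionary simply yields no result, so some fallback is needed for slang or new words.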
On
Hello..
I didn't find anything about my problem...
how can I replace an ampersand at index time?
my autosuggest words contain ampersands. how can I replace this sign (&)
???
PatternReplaceCharFilterFactory ??
how is to use this Factory ?
or RegexTransformer ???
thx for ya help ;)
Thanks for the explanation Mitch.
You're right. There can't be universal stemmers.
What about multi-language stemmers? I'm mostly interested in English, Spanish,
German, French, Italian. Are there any stemmers that would handle those
languages?
If not, what's the recommended way to deal with
Thanks for the tip.
Are there any publicly available dictionary of morphologies that I could use?
Or did you build your own one?
--- On Mon, 4/19/10, Darren Govoni dar...@ontrenet.com wrote:
From: Darren Govoni dar...@ontrenet.com
Subject: Re: LucidWorks Solr
To:
maybe of interest to those doing geo-search in solr?
paul
Begin forwarded message:
From: Gavin McArdle gavin.mcar...@ucd.ie
Date: 19 April 2010 14:46:05 GMT+02:00
To: dbwo...@cs.wisc.edu
Subject: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies
Reply-To:
Dear All,
We're pleased to announce the 3.3.0 release of Carrot2, which significantly
improves the scalability of the clustering algorithms (up to 7 times faster
clustering in the case of the STC algorithm) and fixes a number of minor issues.
Release notes:
hey.
sorry for this ... stupid question ;)
when I perform an import of my data I use some filters. how can I really
be sure that Solr used my configured filters and analyzers?
when I search in Solr, the result looks 100% like before the import.
thx =)
Analyzers/Tokenizers/TokenFilters operate on the text that gets
indexed. Stored text remains exactly as you sent it in.
Erik
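A small Python sketch of this distinction. The toy analyzer (lowercase, split on whitespace and hyphens) is an assumption for the example and not a real Solr analysis chain; the point is only that the stored value and the indexed tokens are kept separately.

```python
def analyze(text):
    """Toy analyzer: lowercase, then split on whitespace and hyphens."""
    return text.lower().replace("-", " ").split()

def index_document(text):
    """Keep the stored value untouched; only the indexed tokens are analyzed."""
    return {"stored": text, "indexed": analyze(text)}

doc = index_document("Kamera-Wasserwaage")
```

A search result shows `doc["stored"]`, which is identical to the input, so looking at returned documents cannot reveal whether the filters ran; only the indexed terms can.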
On Apr 19, 2010, at 9:53 AM, stockii wrote:
hey.
sry for this ... stupid question ;)
when i perform an import from my data is use some filters. how can i
Hi,
could you provide at least some information? Usually you
can be 100% sure that Solr uses the configuration it is
provided with.
Cheers,
Sven
--On Monday, 19 April 2010 05:53 -0800 stockii st...@shopgate.com wrote:
hey.
sry for this ... stupid question ;)
when i perform an import
There have been some open source ones. I don't have the links handy at
this moment[1]. But I parsed through the electronic dictionary and
generated a database of each word and its morphologies. I got tired of
lame stemmers that were wrong half the time. Computers are fast enough to
do lookups on
okay.
as an example: I want to check whether WordDelimiterFilterFactory works
correctly. And I want to experiment with substring search using EdgeNGram...
I have a problem with this string: Kamera-Wasserwaage ...
so I think Solr should split it like this:
Kamera-Wasserwaage
- Kamera
- Wasserwaage
On 19.04.2010 16:09, stockii wrote:
so i want to see how it is indexed.
Go to the admin panel, open the schema browser, and set the number of
shown tokens to 1 or something.
-Michael
If you're submitting this:
field1 : This is a good string
then you're searching in field1 ONLY for This. The tokens is,
a, good and string are being searched against your default
search field as defined in your schema.
Have you tried parenthesizing?
Try the SOLR admin page for looking at
oha, yes thx but
we have 800 000 items ... to find the right in this way ? XD
--
View this message in context:
http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729749.html
Sent from the Solr - User mailing list archive at Nabble.com.
On 19.04.2010 16:29, stockii wrote:
oha, yes thx but
we have 800 000 items ... to find the right in this way ? XD
Then use the TermsComponent: http://wiki.apache.org/solr/TermsComponent
-Michael
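For example, a terms request could be built like this in Python. This is a sketch under assumptions: the `/terms` handler path, the field name `text` and the prefix `wasser` are placeholders for illustration, and the handler must be configured in your solrconfig.xml; `terms.fl`, `terms.prefix` and `terms.limit` are TermsComponent parameters.

```python
from urllib.parse import urlencode

def terms_url(base, field, prefix, limit=20):
    """Build a TermsComponent request listing indexed terms that start with a prefix."""
    params = {
        "terms": "true",
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only return terms starting with this prefix
        "terms.limit": limit,    # maximum number of terms to return
    }
    return f"{base}/terms?{urlencode(params)}"

# Assumed base URL, field name and prefix.
url = terms_url("http://localhost:8983/solr", "text", "wasser")
```

This way you look up only the terms you care about instead of browsing all 800 000 items.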
I didn't find anything about my problem...
how can I replace an ampersand at index time?
my autosuggest words contain ampersands. how can I
replace this sign (&)
???
Easiest way is to use MappingCharFilterFactory before your tokenizer.
<charFilter class="solr.MappingCharFilterFactory"
I need to perform wildcard search in phrase query. I have 2
documents
containing text how do impair and how to improve. I
want to be able to
search both documents by searching (how to im*). There is a
provision in
lucene which allows me to perform this operation using
SpanWildcardQuery and
hello,
we want to index and search in our intranet documents.
the field body contains html-tags.
in our schema.xml we have a fieldType text_de (see at the end of this mail)
which uses charFilter solr.HTMLStripCharFilterFactory with index.
so this is no problem. the text is put into the index
I'm setting up my Solr index to be updated every x minutes.
Does Solr cache the result of a search, and then when next time the same search
is requested, it'd recognize that the Index has not changed and therefore just
return the previous result from cache without processing the search again?
I'm setting up my Solr index to be
updated every x minutes.
Does Solr cache the result of a search, and then when next
time the same search is requested, it'd recognize that the
Index has not changed and therefore just return the previous
result from cache without processing the search
In addition to Alejandro's posting, I would say that you don't need to
specify separate analyzers for index time and query time, since it *seems*
(maybe I am wrong) that you want to use the same functionality at index and
query time.
Hope this helps
- Mitch
we want to index and search in our intranet documents.
the field body contains html-tags.
in our schema.xml we have a fieldType text_de (see at the
end of this mail) which uses charFilter
solr.HTMLStripCharFilterFactory with index.
so this is no problem. the text is put into the index
Hi everybody:
I have a big problem with the amount of memory Solr is using on a server.
I start Solr with the java -jar start.jar command on an Ubuntu server;
the start.jar process is using 7 GB of memory on the server, and this
considerably affects the performance of the server.
I would
Erick,
I am a little bit confused, because I wasn't aware of this fact (and have
never noticed any wrong behaviour... maybe because I used the
dismax-handler).
How should I search for
field1: This is a good string
without doing something like
field1:this field1:is ... ?
If I quote the whole
Hi everybody:
I have a big problem with solr in a server with the memory
size it is using,
I am setting up Solr with java -jar start.jar command in
an ubuntu server,
the process start.jar is using 7Gb of memory in the
server and it is
affecting considerably the performance of the
I am curious:
The idea behind a stemmer is not that it produces the correct infinitive for
a given word. The idea is that it always produces the same infinitive for any
derivative of the word.
What happens if there is an unknown word? For example something like
slang? How does your solution
How should Solr know that Wasserwaage consists of Wasser and Waage?
You are looking for an extra filter like
DictionaryCompoundWordTokenFilter.
Kind regards
- Mitch
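The difference between the two steps can be sketched in Python. This is a toy model, not the real filter: the three-word dictionary, the minimum part length and the two-part split are assumptions for the example, and the real compound filter behaves differently in detail (for instance, it keeps the original token as well).

```python
# Hypothetical decompounding dictionary.
DICTIONARY = {"kamera", "wasser", "waage"}

def word_delimiter(token):
    """Split on the hyphen, which needs no knowledge of the language."""
    return [part for part in token.split("-") if part]

def decompound(word, dictionary=DICTIONARY, min_len=3):
    """Split a compound into two dictionary words, if such a split exists."""
    lower = word.lower()
    for i in range(min_len, len(lower) - min_len + 1):
        head, tail = lower[:i], lower[i:]
        if head in dictionary and tail in dictionary:
            return [head, tail]
    return [lower]

parts = word_delimiter("Kamera-Wasserwaage")
compound = decompound("Wasserwaage")
simple = decompound("Kamera")
```

The hyphen split works on any input, but splitting Wasserwaage only succeeds because both parts are listed in the dictionary.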
stockii wrote:
okay.
as an example: I want to check whether WordDelimiterFilterFactory works
correctly. And I want to experiment
I have just read the post, but it doesn't say whether the memory problems
are associated with starting Solr that way. The Jetty web server is used when
I start Solr that way, so I supposed that memory problems should not happen,
because Jetty must manage the way the memory is used.
Then are you
This is a little bit of hijacking going on here, but
It's algorithmic. That is, there isn't a list of variants that
stem to the same infinitive, and your statement
always the same infinitive for any derivative of the word
isn't quite what happens.
Stemmers will always produce the same
yes, that's what I'm saying to my boss...
but I just found another solution ;)
-
I use EdgeNGram only for my product names and search with an OR operator in
my default text field and in the productname field. so I find all
substrings :D
If you want to limit the memory used by the Java process, you could use
java -XmxNg -jar start.jar
where N is the maximum heap size in gigabytes for the JVM running the Jetty
container (for example, -Xmx2g).
On Mon, Apr 19, 2010 at 10:05 PM, Ariel isaacr...@gmail.com wrote:
I have just read the post, but it doesn't say if the problems with
And what is the recommended maximum memory size I should use? Is there a
recommended value?
Regards.
On Mon, Apr 19, 2010 at 12:44 PM, Geek Gamer geek4...@gmail.com wrote:
If you want to limit the memory used by the Java process, you could use
java -XmxNg -jar start.jar
where N is the maximum heap size
And what is the recommended maximum
memory size I should use? Is there a
recommended value?
What is your index size?
Yes, you are right, thank you Erick.
I missed this point and thought only of common cases, not of special ones.
However, one can combine the mentioned solutions and different stem-filters
in different fields, so that one can be quite (not absolutely) sure, that in
most of all cases the
Wasn't there a good posting on lucidworks.com?
The title was something like deadly sins or so.
There are some good suggestions on things like that :).
Kind regards
- Mitch
--
View this message in context:
http://n3.nabble.com/Big-problem-with-solr-in-an-official-server-tp730049p730168.html
Any ideas about my below Q ?
Lee
Begin forwarded message:
From: Lee Smith l...@weblee.co.uk
Date: 19 April 2010 11:19:45 GMT+01:00
To: solr-user@lucene.apache.org
Subject: Query 2 Cores
Reply-To: solr-user@lucene.apache.org
Hey All
I have 2 cores which have been used with tika to do
On 4/19/2010 11:09 AM, Lee Smith wrote:
http://localhost8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test
Is this the correct way to query 2 cores at once ?
This should do what you want:
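(The reply is cut off in this archive, so what follows is not the original answer.) As a hedged sketch in Python: in Solr distributed search, the shards parameter lists every core to be searched, including the one that receives the request, as a comma-separated list. The host, core names and query below are taken from the question; the helper name is hypothetical.

```python
from urllib.parse import urlencode

def sharded_select_url(host, cores, query):
    """Build a select URL whose shards parameter lists every core to search."""
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep ':', '/' and ',' unescaped so the URL stays readable.
    params = urlencode({"shards": shards, "q": query}, safe=":/,")
    return f"http://{host}/solr/{cores[0]}/select?{params}"

url = sharded_select_url("localhost:8983", ["core1", "core2"], "attr_content:test")
```

Listing only core2 in shards would explain getting just the 17 results from that core.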
My use requires more correct processing of language than what you define
as a stemmer. My experience with stemmers is that even for some words
without a stem, they make a new word out of it. I consider those false
positives.
My approach is based on the need to recognize that walk, walked, walking
This is a little bit of hijacking going on here, but
You are right. Accept my regrets.
It's algorithmic. That is, there isn't a list of variants that
stem to the same infinitive, and your statement
always the same infinitive for any derivative of the word
isn't quite what happens.
no big deal, just wanted to mention.
On Mon, Apr 19, 2010 at 1:24 PM, dar...@ontrenet.com wrote:
This is a little bit of hijacking going on here, but
You are right. Accept my regrets.
It's algorithmic. That is, there isn't a list of variants that
stem to the same infinitive, and
hello *, I'm having issues with the synonym filter altering token offsets.
my input text is
saturday night live
it is tokenized by the whitespace tokenizer, yielding 3 tokens
[saturday, 0,8], [night, 9, 14], [live, 15,19]
on indexing these are passed through a synonym filter that has this line
Andy,
This will help with smooth injection of your multilingual documents into Solr
(multilingual either in the sense of 1 doc containing fields in multiple
languages or 1 index containing documents in different languages):
http://sematext.com/products/multilingual-indexer/index.html
Re
Andy,
This will help with smooth injection of your multilingual
documents into Solr (multilingual either in the sense of 1
doc containing fields in multiple languages or 1 index
containing documents in different languages):
http://sematext.com/products/multilingual-indexer/index.html
Did you try parenthesizing:
field1:(This is a good string)
You can try lots of things easily by going to
http://localhost:8983/solr/admin/form.jsp
and clicking the debug enable checkbox...
HTH
Erick
On Mon, Apr 19, 2010 at 12:23 PM, MitchK mitc...@web.de wrote:
Erick,
I am a little bit
Careful though... the Solr admin page is for *analysis* testing, not
query parsing. I saw that mentioned earlier too. To test query
parsing, submit your query to http://localhost:8983/solr/select?q=your_query&debugQuery=true
and look at the parsed query output.
Erik
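A short Python sketch for building such a debug URL with proper escaping. The base URL is the one used in this thread; the query string is the example discussed earlier, and the helper name is made up.

```python
from urllib.parse import urlencode

def debug_query_url(query, base="http://localhost:8983/solr"):
    """Build a select URL with debugQuery=true so the parsed query is returned."""
    return f"{base}/select?{urlencode({'q': query, 'debugQuery': 'true'})}"

url = debug_query_url("field1:(This is a good string)")
```

urlencode escapes the colon, parentheses and spaces, so the query reaches Solr unchanged.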
On Apr 19,
Hmmm, I *thought* I saw the XML response with the parsed query in it, did I
miss the details *again*?
Erick
On Mon, Apr 19, 2010 at 7:15 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
Careful though... the Solr admin page is for *analysis* testing, not query
parsing. I saw that mentioned
Ah sorry... my bad. You're right. I thought you were referring to
the admin analysis.jsp page, but I misread and replied too quickly.
You're spot on, Erick.
Erik
On Apr 19, 2010, at 7:21 PM, Erick Erickson wrote:
Hmmm, I *thought* I saw the XML response with the parsed query in
I have the following text field:
<fieldType name="text" class="solr.TextField" omitNorms="false">
<analyzer>
<tokenizer class="solr.WhitespaceTokenizerFactory"/>
<filter class="solr.WordDelimiterFilterFactory"
generateWordParts="1"
generateNumberParts="1"
Same general question about highlighting the full word sunglasses when I
search for glasses. Is this possible?
Thanks
--
View this message in context:
http://n3.nabble.com/Highlighting-apostrophe-tp731155p731305.html
Sent from the Solr - User mailing list archive at Nabble.com.
Yes, both have same filters, so we can avoid specifying analyzer type.
- Naga
-Original Message-
From: MitchK [mailto:mitc...@web.de]
Sent: Monday, April 19, 2010 9:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Stemming - disable at query time - reg.
Additionally to Alejandro's
Thanks Erick. Using parentheses works.
With parentheses, the query q=field1:(this is a good string) is parsed as
follows:
+field1:this +field1:good +field1:string
Is that OK to do?
Thanks,
Sandhya
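The effect of the parentheses can be modelled with a toy Python function. It is not the real Lucene query parser; the default field name text, the stopword set (inferred from the parsed output above, where is and a are missing) and the AND default operator are assumptions taken from this thread.

```python
# Assumption: "is" and "a" are removed as stopwords, as the parsed output suggests.
STOPWORDS = {"is", "a"}

def parse(query, default_field="text"):
    """Toy model of field scoping in the Lucene query syntax; not the real parser."""
    field, sep, rest = query.partition(":")
    if not sep:
        field, rest = default_field, query
    field, rest = field.strip(), rest.strip()
    if rest.startswith("(") and rest.endswith(")"):
        # Parentheses scope every word to the named field.
        clauses = [(field, word) for word in rest[1:-1].split()]
    else:
        # Without parentheses only the first word belongs to the named field.
        words = rest.split()
        clauses = [(field, w) for w in words[:1]]
        clauses += [(default_field, w) for w in words[1:]]
    # "+" marks a required clause, matching a default operator of AND.
    return [f"+{f}:{w.lower()}" for f, w in clauses if w.lower() not in STOPWORDS]

scoped = parse("field1:(this is a good string)")
unscoped = parse("field1 : This is a good string")
```

In the unscoped form only the first word is searched in field1; the remaining words go to the default field.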
-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent:
I'm using the Solr 1.4 distribution, with Solr Cell. Can I update only Tika
to the new version in the Solr 1.4 distribution? If yes, is there any guide?
Thanks.
On Mon, Apr 19, 2010 at 4:36 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:
Praveen Agrawal wrote:
Hi Grant,
I tried command line of Tika v-0.7(newest), and