Re: LucidWorks Solr

2010-04-19 Thread Andy
--- On Sun, 4/18/10, Grant Ingersoll gsing...@apache.org wrote: Sure, but I'm biased. ;-) Hopefully, you will find it useful, but choose the one that best fits your needs (and let me know if you need help assessing that.) Thanks for the explanation Grant. What is the advantage of

Re: Autofill 'id' field with the URL of files posted to Solr?

2010-04-19 Thread pk
Lance, I can submit and extract PDF contents using Solr and SolrJ, as I indicated earlier. I've made 'id' a mandatory field and I had to submit its value while submitting (request.addParams(literal.id,url)). If I put multiple files/streams in the request, then I can't put 'id' this way as the

Query regarding copyField

2010-04-19 Thread Sandhya Agarwal
Hello, Is it a problem if I use *copyField* for some fields and not for others? In my query, I have both fields, the ones mentioned in copyField and ones that are not copied to a common destination. Will this cause an anomaly in my search results? I am seeing some weird behavior. Thanks,

Re: Facet count problem

2010-04-19 Thread Marco Martinez
Hi Ranveer, The error in the facet counts is caused by the tokenized field that you are using. If you want to facet on the whole string, use a fieldType that doesn't split the field into tokens, like the string field. Regards, Marco Martínez Bautista

Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Hello, I am confused about the proper usage of the Boolean operators, AND, OR and NOT. Could somebody please provide me an easy to understand explanation. Thanks, Sandhya

Re: LucidWorks Solr

2010-04-19 Thread MitchK
Andy, I think it is important to know what a stemmer really is. It reduces words to their infinitives. Those infinitives are not always the real infinitive, but for the system it is an infinitive, since all its derivatives can be reduced to the same form. That's a stemmer.

Re: Help using boolean operators

2010-04-19 Thread MitchK
Hello Sandhya, title: star AND wars NOT sdi This query will match every document where star *and* wars occur but *not* the term sdi (SDI = Strategic Defense Initiative = in the media there was often the term star wars used to describe the project). title: star OR wars This query will match
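The semantics described above can be tried out on a toy index. This is only a sketch in Python, not how Solr evaluates queries: the three documents are invented for illustration, each title is reduced to a set of lowercase terms, and field scoping is ignored entirely.

```python
# Toy illustration only: each document is a set of lowercase title terms.
# The documents are made up for this example.
docs = {
    "doc1": {"star", "wars", "episode"},
    "doc2": {"star", "wars", "sdi"},
    "doc3": {"star", "trek"},
}

def match_and_not(terms, required, prohibited):
    """True if every required term occurs and no prohibited term does."""
    return required <= terms and not (prohibited & terms)

def match_or(terms, optional):
    """True if at least one of the optional terms occurs."""
    return bool(optional & terms)

# star AND wars NOT sdi
and_not_hits = sorted(
    name for name, terms in docs.items()
    if match_and_not(terms, {"star", "wars"}, {"sdi"})
)

# star OR wars
or_hits = sorted(
    name for name, terms in docs.items()
    if match_or(terms, {"star", "wars"})
)

print(and_not_hits)  # ['doc1']
print(or_hits)       # ['doc1', 'doc2', 'doc3']
```

AND narrows the result to documents containing both terms, NOT then removes the one mentioning sdi, and OR accepts any document with at least one of the terms.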

Re: Query regarding copyField

2010-04-19 Thread MitchK
Hello Sandhya, please, show us your schema.xml, so that we can have a look whether something might be wrong there. However, if the source of a copyField is description and the destination is description_stemmed, you can query both: description and description_stemmed. There will be no error. -

Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Hi, I have the following filter for a field named myText: <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/> This enables stemming, I guess. My questions are: 1) Can I disable stemming for the same field at query time? 2) Do I need to copyField the

Re: Stemming - disable at query time - reg.

2010-04-19 Thread Rafał Kuć
Hello! If you want to have both a non-stemmed and a stemmed field, you should use copyField. Even if it were possible to disable the Snowball filter at query time, you would still have stemmed tokens written in the index. Hi, I have the following filter for a field named myText

Re: Stemming - disable at query time - reg.

2010-04-19 Thread MitchK
Naga, 1) Yes, it is possible. <fieldType name="myText" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/> </analyzer> <analyzer type="query">
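What happens when stemming is applied at index time but not at query time can be sketched in a few lines of Python. The toy_stem function below is a hypothetical stand-in that strips a few English suffixes; it is not the Snowball algorithm, and the analyze helper is likewise invented for illustration.

```python
def toy_stem(token):
    """Hypothetical stand-in for a real stemmer: strips a few English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) > len(suffix) + 2:
            return token[: -len(suffix)]
    return token

def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally stem each token."""
    tokens = text.lower().split()
    return [toy_stem(t) if stem else t for t in tokens]

# Index-time analysis with stemming: only the stems end up in the index.
indexed_terms = set(analyze("walking walked walks", stem=True))

# The same query word analyzed with and without stemming.
query_stemmed = analyze("walking", stem=True)
query_raw = analyze("walking", stem=False)

stemmed_hit = all(t in indexed_terms for t in query_stemmed)
raw_hit = all(t in indexed_terms for t in query_raw)

print(indexed_terms)  # {'walk'}
print(stemmed_hit)    # True
print(raw_hit)        # False
```

Because only stems are written to the index, an unstemmed query term matches only when it happens to equal a stem. That is the reason a separate non-stemmed field fed by copyField is needed for exact-form matching.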

Re: Stemming - disable at query time - reg.

2010-04-19 Thread Rafał Kuć
Hello! MitchK posted the right solution, my post can be confusing ;( Sorry for that. Hello! If you want to have both a non-stemmed and a stemmed field, you should use copyField. Even if it were possible to disable the Snowball filter at query time, you would have stemmed

RE: Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Thank you Mitch! I will try that. regards, Naga -Original Message- From: MitchK [mailto:mitc...@web.de] Sent: Monday, April 19, 2010 2:35 PM To: solr-user@lucene.apache.org Subject: Re: Stemming - disable at query time - reg. Naga, 1) Yes, it is possible. <fieldType name="myText"

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thank You Mitch. I have a query mentioned below : (my defaultOperator is set to AND) (field1 : This is a good string AND field2 : This is a good string AND field3 : This is a good string AND (field4 : ASCIIDocument OR field4 : BinaryDocument OR field4 : HTMLDocument) AND field5 : doc) This is

RE: Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Hi Mitch, I have defined my field like: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory" ignoreCase="true"

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Also, one of the fields here, *field3* is a dynamic field. All the other fields except this field, are copied into text with copyField. Thanks, Sandhya -Original Message- From: Sandhya Agarwal [mailto:sagar...@opentext.com] Sent: Monday, April 19, 2010 2:55 PM To:

Wildcard search in phrase query using spanquery

2010-04-19 Thread Maddy.Jsh
I need to perform wildcard search in phrase query. I have 2 documents containing text how do impair and how to improve. I want to be able to search both documents by searching (how to im*). There is a provision in lucene which allows me to perform this operation using SpanWildcardQuery and

Query 2 Cores

2010-04-19 Thread Lee Smith
Hey All, I have 2 cores which have been used with Tika to index files. I would like to do one query on both at once, as I will be searching the attr_content field. If I do a test on each core I get 1 and 17 results, but trying with shards I just get 17 results. Here is my example query

Re: Stemming - disable at query time - reg.

2010-04-19 Thread Alejandro Marqués Rodríguez
Hi Naga, I think you should add the same filter to the query configuration: <fieldType name="text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.StopFilterFactory"

Re: Solr throws TikaException while parsing sample PDF

2010-04-19 Thread Praveen Agrawal
Hi Grant, I tried the command line of Tika v0.7 (newest), and it parsed the file. I believe Solr 1.4 contains version 0.4 of Tika. Do you suggest upgrading to the new Tika? Can I upgrade only Tika in Solr 1.4, or do I need to wait until Solr ships with the new Tika? Thanks. On Sun, Apr 18, 2010 at 11:24 PM,

Re: Solr throws TikaException while parsing sample PDF

2010-04-19 Thread Koji Sekiguchi
Praveen Agrawal wrote: Hi Grant, I tried the command line of Tika v0.7 (newest), and it parsed the file. I believe Solr 1.4 contains version 0.4 of Tika. Do you suggest upgrading to the new Tika? Can I upgrade only Tika in Solr 1.4, or do I need to wait until Solr ships with the new Tika? Thanks. Solr

Howto build a function query using the 'query' function

2010-04-19 Thread Villemos, Gert
I want to build a function expression for a dismax request handler 'bf' field, to boost a document if it is referenced by other documents. I.e. the more often a document is referenced, the higher the boost. Something like

OutOfMemoryError when using query with sort

2010-04-19 Thread Hamid Vahedi
Hi, I am using Solr running on Windows Server 2008 32-bit. I added about 100 million articles into Solr without setting the stored attribute (only the document id is stored; the index size is about 164 GB). When I query without sort, it returns doc ids in a few ms, but when I add a sort command, I get

Re: LucidWorks Solr

2010-04-19 Thread Darren Govoni
Regarding stemmers, I ditched them altogether a long time ago in favor of a dictionary of morphologies of all known words (for any given language). A simple lookup of any word morphology thus produces the set, including the correct stem. Works great. 100% of the time. Just a tip from me. On

Ampersand in searchstring. how to replace ?

2010-04-19 Thread stockii
Hello.. I didn't find anything about my problem... How can I replace an ampersand at index time? My autosuggest words have ampersands. How can I replace this sign (&)? PatternReplaceCharFilterFactory? How do I use this factory? Or RegexTransformer? Thanks for your help ;) -- View

Re: LucidWorks Solr

2010-04-19 Thread Andy
Thanks for the explanation Mitch. You're right. There can't be universal stemmers. What about multi-language stemmers? I'm mostly interested in English, Spanish, German, French, Italian. Are there any stemmers that would handle those languages? If not, what's the recommended way to deal with

Re: LucidWorks Solr

2010-04-19 Thread Andy
Thanks for the tip. Is there any publicly available dictionary of morphologies that I could use? Or did you build your own? --- On Mon, 4/19/10, Darren Govoni dar...@ontrenet.com wrote: From: Darren Govoni dar...@ontrenet.com Subject: Re: LucidWorks Solr To:

Fwd: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies

2010-04-19 Thread Paul Libbrecht
maybe of interest to those doing geo-search in solr? paul Begin forwarded message: From: Gavin McArdle gavin.mcar...@ucd.ie Date: 19 April 2010 14:46:05 GMT+02:00 To: dbwo...@cs.wisc.edu Subject: [Dbworld] Survey on Web Geo-Spatial Open-Source Technologies Reply-To:

[ANN] Carrot2 3.3.0 released

2010-04-19 Thread Stanislaw Osinski
Dear All, We're pleased to announce the 3.3.0 release of Carrot2 which significantly improves the scalability of the clustering algorithms (up to 7 times faster clustering in the case of the STC algorithm) and fixes a number of minor issues. Release notes:

is solr ignored my filters ?

2010-04-19 Thread stockii
hey. Sorry for this ... stupid question ;) When I perform an import of my data I use some filters. How can I really be sure that Solr used my configured filters and analyzer? When I search in Solr the result looks 100% like before an import. thx =)

Re: is solr ignored my filters ?

2010-04-19 Thread Erik Hatcher
Analyzers/Tokenizers/TokenFilters operate on the text that gets indexed. Stored text remains exactly as you sent it in. Erik On Apr 19, 2010, at 9:53 AM, stockii wrote: hey. sry for this ... stupid question ;) when i perform an import from my data is use some filters. how can i

Re: is solr ignored my filters ?

2010-04-19 Thread Sven Maurmann
Hi, could you provide at least some information? Usually you can be 100% sure that Solr uses the configuration it is provided with. Cheers, Sven --On Monday, 19 April 2010 05:53 -0800 stockii st...@shopgate.com wrote: hey. sry for this ... stupid question ;) when i perform an import

Re: LucidWorks Solr

2010-04-19 Thread darren
There have been some open source ones. I don't have the links handy at this moment[1]. But I parsed through the electronic dictionary and generated a database of each word and its morphologies. I got tired of lame stemmers that were wrong half the time. Computers are fast enough to do lookups on

Re: is solr ignored my filters ?

2010-04-19 Thread stockii
okay. as example. I want to check if WordDelimiterFactory works correctly. And I want to experiment with substring search using EdgeNGram... I have a problem with this string: Kamera-Wasserwaage ... so I think Solr should split it like this. Kamera-Wasserwaage - Kamera - Wasserwaage

Re: is solr ignored my filters ?

2010-04-19 Thread Michael Kuhlmann
On 19.04.2010 16:09, stockii wrote: so i want to see how it is indexed. Go to the admin panel, open the schema browser, and set the number of shown tokens to 1 or something. -Michael

Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
If you're submitting this: field1 : This is a good string then you're searching in field1 ONLY for This. The tokens is, a, good and string are being searched against your default search field as defined in your schema. Have you tried parenthesizing? Try the SOLR admin page for looking at
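The field-scoping rule described above can be mimicked with a small Python sketch. It is a deliberately simplified model and not the Lucene query parser: it only understands field:term and field:(term term ...) written without spaces around the colon, and the default field name text is an assumption made for the example.

```python
import re

# Simplified grammar: field:(a b c) | field:term | bare term
_PATTERN = re.compile(r"(\w+):\(([^)]*)\)|(\w+):(\S+)|(\S+)")

def assign_fields(query, default_field="text"):
    """Return (field, term) pairs showing which field each term is searched in."""
    pairs = []
    for m in _PATTERN.finditer(query):
        if m.group(1):
            # field:(...) applies the field to every term inside the parentheses
            pairs.extend((m.group(1), term) for term in m.group(2).split())
        elif m.group(3):
            # field:term applies the field to that single term only
            pairs.append((m.group(3), m.group(4)))
        else:
            # a bare term falls back to the default search field
            pairs.append((default_field, m.group(5)))
    return pairs

unscoped = assign_fields("field1:This is a good string")
scoped = assign_fields("field1:(This is a good string)")

print(unscoped)
print(scoped)
```

In the first call only This is bound to field1 and the remaining four terms go to the default field; in the second call all five terms are bound to field1.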

Re: is solr ignored my filters ?

2010-04-19 Thread stockii
oha, yes thx but we have 800 000 items ... to find the right in this way ? XD -- View this message in context: http://n3.nabble.com/is-solr-ignored-my-filters-tp729646p729749.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: is solr ignored my filters ?

2010-04-19 Thread Michael Kuhlmann
On 19.04.2010 16:29, stockii wrote: oha, yes thx but we have 800 000 items ... to find the right in this way ? XD Then use the TermsComponent: http://wiki.apache.org/solr/TermsComponent -Michael
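A TermsComponent request is just a URL with a few terms.* parameters, so it can be assembled as in the Python sketch below. The /terms handler path, the host and port, and the field name productname are assumptions about the local setup; terms.fl, terms.prefix and terms.limit are the component's documented parameters.

```python
from urllib.parse import urlencode

def terms_url(field, prefix, limit=20, base="http://localhost:8983/solr/terms"):
    """Build a TermsComponent URL listing indexed terms that start with a prefix.

    The base URL assumes a request handler registered at /terms.
    """
    params = {
        "terms.fl": field,       # field whose indexed terms are listed
        "terms.prefix": prefix,  # only terms starting with this prefix
        "terms.limit": limit,    # maximum number of terms returned
    }
    return base + "?" + urlencode(params)

# "productname" is a hypothetical field name used for illustration.
url = terms_url("productname", "kamera")
print(url)
```

Restricting the listing with a prefix avoids paging through all terms of a large index when only one word, such as Kamera-Wasserwaage, needs to be checked.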

Re: Ampersand in searchstring. how to replace ?

2010-04-19 Thread Ahmet Arslan
I didn't find anything about my problem... How can I replace an ampersand at index time? My autosuggest words have ampersands. How can I replace this sign (&)? The easiest way is to use MappingCharFilterFactory before your tokenizer. <charFilter class="solr.MappingCharFilterFactory"

Re: Wildcard search in phrase query using spanquery

2010-04-19 Thread Ahmet Arslan
I need to perform wildcard search in phrase query. I have 2 documents containing text how do impair and how to improve. I want to be able to search both documents by searching (how to im*). There is a provision in lucene which allows me to perform this operation using SpanWildcardQuery and

best practice handling html content

2010-04-19 Thread Markus.Rietzler
hello, we want to index and search in our intranet documents. the field body contains html-tags. in our schema.xml we have a fieldType text_de (see at the end of this mail) which uses charFilter solr.HTMLStripCharFilterFactory with index. so this is no problem. the text is put into the index

Caching of search results, caching proxy

2010-04-19 Thread Andy
I'm setting up my Solr index to be updated every x minutes. Does Solr cache the result of a search, and then when next time the same search is requested, it'd recognize that the Index has not changed and therefore just return the previous result from cache without processing the search again?

Re: Caching of search results, caching proxy

2010-04-19 Thread Ahmet Arslan
I'm setting up my Solr index to be updated every x minutes. Does Solr cache the result of a search, and then when next time the same search is requested, it'd recognize that the Index has not changed and therefore just return the previous result from cache without processing the search

Re: Stemming - disable at query time - reg.

2010-04-19 Thread MitchK
In addition to Alejandro's post, I would say that you don't need to specify separate analyzers for index time and query time, since it *seems* (maybe I am wrong) like you want to use the same functionality at index and query time. Hope this helps - Mitch

Re: best practice handling html content

2010-04-19 Thread Ahmet Arslan
we want to index and search in our intranet documents. the field body contains html-tags. in our schema.xml we have a fieldType text_de (see at the end of this mail) which uses charFilter solr.HTMLStripCharFilterFactory with index. so this is no problem. the text is put into the index

Big problem with solr in an official server.

2010-04-19 Thread Ariel
Hi everybody: I have a big problem with the amount of memory Solr is using on a server. I start Solr with the java -jar start.jar command on an Ubuntu server; the start.jar process is using 7 GB of memory, and it is considerably affecting the performance of the server. I would

Re: Help using boolean operators

2010-04-19 Thread MitchK
Erick, I am a little bit confused, because I wasn't aware of this fact (and have never noticed any wrong behaviour... maybe because I used the dismax-handler). How should I search for field1: This is a good string without doing something like field1:this field1:is ... ? If I quote the whole

Re: Big problem with solr in an official server.

2010-04-19 Thread Ahmet Arslan
Hi everybody: I have a big problem with solr in a server with the memory size it is using, I am setting up Solr with java -jar start.jar command in an ubuntu server, the process start.jar is using 7Gb of  memory in the server and it is affecting considerably the performance of the

Re: LucidWorks Solr

2010-04-19 Thread MitchK
I am curious: The idea behind a stemmer is not that it produces the correct infinitive for a given word. The idea is that it always produces the same infinitive for any derivative of the word. What happens if there is an unknown word? For example something like slang? How does your solution

Re: is solr ignored my filters ?

2010-04-19 Thread MitchK
How should Solr know that Wasserwaage consists of Wasser and Waage? You are looking for an extra filter like DictionaryCompoundWordTokenFilter. Kind regards - Mitch stockii wrote: okay. as example. i want to check if WordDelimiterFactory works correct. And i want to experimant

Re: Big problem with solr in an official server.

2010-04-19 Thread Ariel
I have just read the post, but it doesn't say whether the memory problems are associated with starting Solr that way. The Jetty web server is used when I start Solr that way, so I supposed that memory problems should not happen, because Jetty should manage the way memory is used. Then are you

Re: LucidWorks Solr

2010-04-19 Thread Erick Erickson
This is a little bit of hijacking going on here, but It's algorithmic. That is, there isn't a list of variants that stem to the same infinitive, and your statement always the same infinitive for any derivative of the word isn't quite what happens. Stemmers will always produce the same

Re: is solr ignored my filters ?

2010-04-19 Thread stockii
yes, that's what I'm saying to my boss... but I found another solution just now ;) - I use EdgeNGram only for my product names and search with an OR operator in my default text field and in the productname field. So I find all substrings :D

Re: Big problem with solr in an official server.

2010-04-19 Thread Geek Gamer
if you want to limit the memory used by the java process you could use java -Xmx<N>g, where N is the number of gigabytes you want to limit the Jetty container to. On Mon, Apr 19, 2010 at 10:05 PM, Ariel isaacr...@gmail.com wrote: I have just read the post, but it doesn't said if the problems with

Re: Big problem with solr in an official server.

2010-04-19 Thread Ariel
And what is the recommended maximum memory size I should use? Is there a recommended value? Regards. On Mon, Apr 19, 2010 at 12:44 PM, Geek Gamer geek4...@gmail.com wrote: if you want to limit the use of memory by the java process you could use java -Xmx<N>g where N is the amount of memory

Re: Big problem with solr in an official server.

2010-04-19 Thread Ahmet Arslan
And what is the recommended max size memory I should use ??? Is there anyone recommended ??? What is your index size?

Re: LucidWorks Solr

2010-04-19 Thread MitchK
Yes, you are right, thank you Erick. I've lost this point and thought only of common cases, not of special ones. However, one can combine the mentioned solutions and different stem-filters in different fields, so that one can be quite (not absolutely) sure, that in most of all cases the

Re: Big problem with solr in an official server.

2010-04-19 Thread MitchK
Wasn't there a good posting on lucidworks.com? The title was something like deadly sins or so. There are some good suggestions on things like that :). Kind regards - Mitch

Fwd: Query 2 Cores

2010-04-19 Thread Lee Smith
Any ideas about my question below? Lee Begin forwarded message: From: Lee Smith l...@weblee.co.uk Date: 19 April 2010 11:19:45 GMT+01:00 To: solr-user@lucene.apache.org Subject: Query 2 Cores Reply-To: solr-user@lucene.apache.org Hey All I have 2 cores which have been used with tika to do

Re: Fwd: Query 2 Cores

2010-04-19 Thread Shawn Heisey
On 4/19/2010 11:09 AM, Lee Smith wrote: http://localhost:8983/solr/core1/select?shards=localhost:8983/solr/core2&q=attr_content:test Is this the correct way to query 2 cores at once ? This should do what you want:
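For reference, a distributed request normally lists every core that should contribute results in the shards parameter, including the core that receives the request. The Python sketch below builds such a URL under that assumption; the helper name is invented, and the host, core names and query are taken from the question.

```python
from urllib.parse import urlencode

def sharded_select_url(host, cores, query):
    """Build a select URL that queries all given cores via the shards parameter.

    The first core receives the request; every core, including the first,
    is listed in shards so that each one contributes to the merged result.
    """
    shards = ",".join(f"{host}/solr/{core}" for core in cores)
    # Keep ':', '/' and ',' readable instead of percent-encoding them.
    params = urlencode({"shards": shards, "q": query}, safe=":/,")
    return f"http://{host}/solr/{cores[0]}/select?{params}"

url = sharded_select_url("localhost:8983", ["core1", "core2"], "attr_content:test")
print(url)
```

With only core2 in the shards list, the merged result would contain just core2's documents, which would be consistent with seeing 17 results instead of 18.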

Re: LucidWorks Solr

2010-04-19 Thread darren
My use requires a more correct processing of language than what you define as a stemmer. My experience with stemmers is that even for some words without a stem, they make a new word from it. I consider those false positives. My approach is based on the need to recognize that walk, walked, walking

Re: LucidWorks Solr

2010-04-19 Thread darren
This is a little bit of hijacking going on here, but You are right. Accept my regrets. It's algorithmic. That is, there isn't a list of variants that stem to the same infinitive, and your statement always the same infinitive for any derivative of the word isn't quite what happens.

Re: LucidWorks Solr

2010-04-19 Thread Erick Erickson
no big deal, just wanted to mention. On Mon, Apr 19, 2010 at 1:24 PM, dar...@ontrenet.com wrote: This is a little bit of hijacking going on here, but You are right. Accept my regrets. It's algorithmic. That is, there isn't a list of variants that stem to the same infinitive, and

synonym filter and offsets

2010-04-19 Thread Joe Calderon
hello *, I'm having issues with the synonym filter altering token offsets. My input text is saturday night live; it is tokenized by the whitespace tokenizer, yielding 3 tokens [saturday, 0,8], [night, 9, 14], [live, 15,19]. On indexing these are passed through a synonym filter that has this line
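The offsets quoted above can be reproduced with a few lines of Python. This is only a sketch of whitespace tokenization with start and end character offsets, not Solr's WhitespaceTokenizer, and it says nothing about what the synonym filter does afterwards.

```python
import re

def whitespace_tokens(text):
    """Split on whitespace and return (token, start_offset, end_offset) triples."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

tokens = whitespace_tokens("saturday night live")
print(tokens)  # [('saturday', 0, 8), ('night', 9, 14), ('live', 15, 19)]
```

The end offset is exclusive, so text[start:end] returns the token itself. Highlighting relies on these offsets, which is why a filter that alters them shows up as misplaced highlights.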

Re: LucidWorks Solr

2010-04-19 Thread Otis Gospodnetic
Andy, This will help with smooth injection of your multilingual documents into Solr (multilingual either in the sense of 1 doc containing fields in multiple languages or 1 index containing documents in different languages): http://sematext.com/products/multilingual-indexer/index.html Re

Re: LucidWorks Solr

2010-04-19 Thread Andy
Andy, This will help with smooth injection of your multilingual documents into Solr (multilingual either in the sense of 1 doc containing fields in multiple languages or 1 index containing documents in different languages):   http://sematext.com/products/multilingual-indexer/index.html

Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
Did you try parenthesizing: field1:(This is a good string) You can try lots of things easily by going to http://localhost:8983/solr/admin/form.jsp and clicking the debug enable checkbox... HTH Erick On Mon, Apr 19, 2010 at 12:23 PM, MitchK mitc...@web.de wrote: Erick, I am a little bit

Re: Help using boolean operators

2010-04-19 Thread Erik Hatcher
Careful though... the Solr admin page is for *analysis* testing, not query parsing. I saw that mentioned earlier too. To test query parsing, submit your query to http://localhost:8983/solr/select?q=your_query&debugQuery=true and look at the parsed query output. Erik On Apr 19,
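A query containing colons, spaces and parentheses has to be URL-encoded before it is put into such a request. The Python sketch below builds the debug URL; the helper name is invented for illustration and the base URL is the one from the message above.

```python
from urllib.parse import urlencode

def debug_query_url(query, base="http://localhost:8983/solr/select"):
    """Build a select URL with debugQuery=true and a URL-encoded q parameter."""
    return base + "?" + urlencode({"q": query, "debugQuery": "true"})

url = debug_query_url("field1:(this is a good string)")
print(url)
# http://localhost:8983/solr/select?q=field1%3A%28this+is+a+good+string%29&debugQuery=true
```

The parsed form of the query appears in the debug section of the response, which makes it easy to see which field each term was assigned to.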

Re: Help using boolean operators

2010-04-19 Thread Erick Erickson
Hmmm, I *thought* I saw the XML response with the parsed query in it, did I miss the details *again*? Erick On Mon, Apr 19, 2010 at 7:15 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Careful though... the Solr admin page is for *analysis* testing, not query parsing. I saw that mentioned

Re: Help using boolean operators

2010-04-19 Thread Erik Hatcher
Ah sorry... my bad. You're right. I thought you were referring to the admin analysis.jsp page, but I misread and replied too quickly. You're spot on, Erick. Erik On Apr 19, 2010, at 7:21 PM, Erick Erickson wrote: Hmmm, I *thought* I saw the XML response with the parsed query in

Highlighting apostrophe

2010-04-19 Thread Blargy
I have the following text field: <fieldType name="text" class="solr.TextField" omitNorms="false"> <analyzer> <tokenizer class="solr.WhitespaceTokenizerFactory"/> <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1"

Re: Highlighting apostrophe

2010-04-19 Thread Blargy
Same general question about highlighting the full word sunglasses when I search for glasses. Is this possible? Thanks

RE: Stemming - disable at query time - reg.

2010-04-19 Thread Naga Darbha
Yes, both have same filters, so we can avoid specifying analyzer type. - Naga -Original Message- From: MitchK [mailto:mitc...@web.de] Sent: Monday, April 19, 2010 9:44 PM To: solr-user@lucene.apache.org Subject: Re: Stemming - disable at query time - reg. Additionally to Alejandro's

RE: Help using boolean operators

2010-04-19 Thread Sandhya Agarwal
Thanks Erick. Using parentheses works. With parentheses, the query q=field1:(this is a good string) is parsed as follows: +field1:this +field1:good +field1:string Is that OK? Thanks, Sandhya -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent:

Re: Solr throws TikaException while parsing sample PDF

2010-04-19 Thread Praveen Agrawal
I'm using the Solr 1.4 distribution, with Solr Cell. Can I update only Tika to the new version in the Solr 1.4 distribution? If yes, is there any guide? Thanks. On Mon, Apr 19, 2010 at 4:36 PM, Koji Sekiguchi k...@r.email.ne.jp wrote: Praveen Agrawal wrote: Hi Grant, I tried command line of Tika v-0.7(newest), and