locks in solr
Hi, is there any article that explains the locks in Solr? There is some info in solrconfig.xml which says that you can set the lock type to none (NoLockFactory), single (SingleInstanceLockFactory), native (NativeFSLockFactory), or simple (SimpleFSLockFactory), which locks every time we create a new file. Suppose my index dir has the following files: _2s.fdt, _2t.fnm, _2u.nrm, _2v.tii, _2x.fdt, _2y.fnm, _2z.nrm, _30.tii, _2s.fdx, _2t.frq, _2u.prx, _2v.tis, _2x.fdx, _2y.frq, _2z.prx, _30.tis, _2s.fnm, _2t.nrm, _2u.tii, _2w.fdt, _2x.fnm, _2y.nrm, _2z.tii, segments_2s, _2s.frq, _2t.prx, _2u.tis, _2w.fdx, _2x.frq, _2y.prx, _2z.tis, segments.gen

1.) I assume for each of these files there is a lock. Please correct me if I am wrong.
2.) What are the different lock types in terms of reads/writes/updates?
3.) Can we have a document-level locking scheme?
4.) We would like to know the best way to handle multiple simultaneous writes to the index.

Thanks a ton, Raakhi
Multicore - Post xml to core0, core1 or core2
Hello, at the moment I am trying to create a Solr instance with more than one core. I use Solr 1.4 and multicore, and it runs :-) But I don't know how to post an XML file into one of my cores. At the moment I use java -jar post.jar *.xml. Now I want to fill the core0 index with core0*.xml and core1 with core1*.xml. But how? I can't find anything about that in the wiki. King
Re: Multicore - Post xml to core0, core1 or core2
try this:

java -Durl=http://localhost:8983/solr/core0/update -jar post.jar *.xml

On Wed, Nov 25, 2009 at 3:23 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: [original question quoted above, trimmed] -- Noble Paul | Principal Engineer | AOL | http://aol.com
Re: Multicore - Post xml to core0, core1 or core2
Thanks, it works really fine. Maybe you have an idea how to search in core0 and core1? I want to search in all cores, or only in 2 of the 3 cores.
Sending Tika parse result to Solr
Hello, I want to send the Tika parse results of my data to my Solr server. My file server is not my Solr server, so Solr Cell is not an option for me. In Lucene I can pass my Reader object (as the result of the parsing) to a Lucene Document for indexing. Is this also possible with Solr? Or is there another or better way to do this? I'm using SolrJ for the connection. Regards, Daniel
Buggy search Solr1.4 Multicore
Hi... I have a problem with Solr. I try it with 3 cores, and it starts. I can search, but I only get results when I search for exactly the whole field. I mean, the field contains: Dell Widescreen Ultra. When I search for name:Widescreen I get nothing. With name:"Dell Widescreen Ultra" I get the file. With name:Dell* I get the file. Now I created copyFields and searched only for Dell*, and I get it, but Widescreen still returns nothing. What is wrong with the index? I want to be able to search for each word in each field! Please help me
Re: Buggy search Solr1.4 Multicore
Hello!

[original question trimmed, see above]

I assume your name field type is string, right? If it is, then change it to text; it should work as you would like. -- Regards, Rafał Kuć
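A minimal sketch of that change in schema.xml. The field name comes from the question; the other attributes are assumptions, and a reindex is required after changing the type:

<!-- before: string is not analyzed, so only exact whole-field matches work -->
<field name="name" type="string" indexed="true" stored="true"/>

<!-- after: text is tokenized, so individual words like Widescreen can match -->
<field name="name" type="text" indexed="true" stored="true"/>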
Help on this parsed query
I have the text analyzer defined as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

When I search on the field "simple" (of the above field type) for the term peRsonal, I expect it to search as simple:personal simple:pe simple:rsonal. Instead the debug output says:

<str name="rawquerystring">simple:peRsonal</str>
<str name="querystring">simple:peRsonal</str>
<str name="parsedquery">MultiPhraseQuery(simple:"(person pe) rsonal")</str>
<str name="parsedquery_toString">simple:"(person pe) rsonal"</str>

What is this MultiPhraseQuery? Why is this a phrase query instead of a simple query?

Regards
Revas
Solr 1.4 search in more than one Core
Hello, I am trying to search in more than one core. I searched in the wiki, but I can't find a way to search in 2 of the 3 cores, or a way to search in all cores. Maybe someone of you has tried the same and can help me?
Re: Sending Tika parse result to Solr
On Nov 25, 2009, at 5:32 AM, Daniel Knapp wrote: [original question trimmed, see above]

You can't pass your Reader object, but I have opened https://issues.apache.org/jira/browse/SOLR-1526 to provide a SolrJ client-side equivalent of Solr Cell. If you'd like to contribute a patch, that would be great. Basically, you just need to have your Handler override create SolrInputDocuments (batches, that is) and then send them to Solr. Using the Streaming server may also fit well with this model. -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
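A minimal client-side sketch of that approach, assuming SolrJ 1.4 and Tika on the client's classpath; the field names ("id", "content") and the Solr URL are illustrative assumptions, not from the thread:

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.sax.BodyContentHandler;

public class TikaToSolr {
  public static void main(String[] args) throws Exception {
    SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

    File file = new File(args[0]);
    BodyContentHandler handler = new BodyContentHandler();
    InputStream in = new FileInputStream(file);
    try {
      // AutoDetectParser picks the right Tika parser for the file type
      new AutoDetectParser().parse(in, handler, new Metadata());
    } finally {
      in.close();
    }

    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", file.getName());
    doc.addField("content", handler.toString()); // the extracted plain text
    server.add(doc);
    server.commit();
  }
}

For many files, the streaming server Grant mentions (StreamingUpdateSolrServer in SolrJ 1.4) can replace CommonsHttpSolrServer so the adds are batched over a few persistent connections.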
Re: how to do partial word searches?
Hi Erick, thanks for the links. I read both of them and I still have no idea what to do; lots of back and forth, but I didn't see any solution in it. One person talked about indexing the field in reverse and doing an OR on it; this might work, I guess. thanks Joel

On Nov 24, 2009, at 9:12 PM, Erick Erickson wrote: Copying from Erik Hatcher: See http://issues.apache.org/jira/browse/SOLR-218 - Solr currently does not have leading wildcard support enabled. There's a pretty extensive recent exchange on this; see the thread on the user's list titled "leading and trailing wildcard query". Best Erick

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund jnyl...@yahoo.com wrote: Hi, I saw some older postings on this, but didn't see a resolution. I have a field called title, and I would like to be able to find partial word matches within the title. For example: http://localhost:8983/solr/select?q=textTitle:%22*sulli*%22 I would expect it to find: <str name="textTitle">the daily dish | by andrew sullivan</str> but it doesn't. It does find "sully" (which is fine with me also as a bonus), but it doesn't seem to get any of the partial word stuff. Oddly enough, before I lowercased the title the wildcard matching seemed to work a bit better; it just didn't deal with the case-sensitive query. At first I had mixed-case titles, and I read that wildcards don't work with mixed case, so I created another field that is a lowered version of the title, called textTitle; it is of type text. Is it possible with Solr to achieve what I am trying to do, and if so, how? If not, anything closer than what I have? thanks Joel
Re: Implementing phrase autopop up
On Tue, Nov 24, 2009 at 11:58 PM, darniz rnizamud...@edmunds.com wrote: I created a field exactly as the Lucid blog says:

<field name="autocomp" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true"/>

with the following field type configuration:

<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="25"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Now when I query I get the correct phrases. For example, if I search for autocomp:"how to" I get all the correct phrases, like "How to find a car", "How to find a mechanic", "How to choose the right insurance company", etc., which is good. Now I have two questions.

1) Is it necessary to give the query in quotes? My gut feeling is yes, since if you don't quote, I get phrases beginning with "How" followed by some other words, like "How can", etc.

Yes, since we want to do phrase searches on n-grams.

2) If I search for a word, for example "choose", it gives me nothing. I was expecting to see a result, considering there is the word "choose" in the phrase "How to choose the right insurance company". I might look more at the documentation, but do you have anything to advise?

EdgeNGram creates n-grams from the starting or the ending edge, therefore you can't match words in the middle of a phrase. Try using NGramFilterFactory instead. -- Regards, Shalin Shekhar Mangar.
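A hedged sketch of that change: replace the index-side EdgeNGram filter with NGramFilterFactory, which produces grams from every position rather than just the edge (gram sizes copied from the question; a reindex is required):

<analyzer type="index">
  <tokenizer class="solr.KeywordTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- n-grams from all positions, so "choose" can match mid-phrase -->
  <filter class="solr.NGramFilterFactory" minGramSize="1" maxGramSize="25"/>
</analyzer>

Expect a noticeably larger index than with edge grams, since every substring window gets indexed.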
Re: Buggy search Solr1.4 Multicore
If Rafal's response doesn't help (but it's sure where I'd look first; it sounds like you're using a field type that's not tokenized), then could you post the relevant parts of your config file that define the field and the analyzers used at *both* query and index time? Best Erick On Wed, Nov 25, 2009 at 6:35 AM, Jörg Agatz joerg.ag...@googlemail.com wrote: [original question trimmed, see above]
Re: Help on this parsed query
I think because if it wasn't a phrase query you'd be matching on the broken-up parts of the word *wherever* they were in your field, e.g. pe and rsonal could be separated by any number of other tokens and you'd still get a match. HTH Erick P.S. I was a bit confused by your asterisks; it took me a while to figure out that you'd added them by hand for emphasis and weren't sending wildcards through. On Wed, Nov 25, 2009 at 6:43 AM, revas revas...@gmail.com wrote: [analyzer config and debug output trimmed, see the original message above]
Re: SolrPlugin Guidance
On Tue, Nov 24, 2009 at 11:04 PM, Vauthrin, Laurent laurent.vauth...@disney.com wrote: Our team is trying to make a Solr plugin that needs to parse/decompose a given query into potentially multiple queries. The idea is that we're trying to abstract a complex schema (with different document types) from the users so that their queries can be simpler. So basically, we're trying to do the following: 1. Decompose query A into query B and query C 2. Send query B to all shards and plug query B's results into query C 3. Send query C to all shards and pass the results back to the client. I started trying to implement this by subclassing SearchHandler but realized that I would not have access to HttpCommComponent. Then I tried to replicate the SearchHandler class but realized that I might not have access to fields I would need in ShardResponse. So I figured I should step back and get advice from the mailing list now :). What is the best plugin point for decomposing a query into multiple queries so that all resultant queries can be sent to each shard?

All queries are sent to all shards? If yes, it sounds like a job for a custom QParser. -- Regards, Shalin Shekhar Mangar.
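A rough sketch of what that plugin point looks like in Solr 1.4. The class name and the decompose() helper are placeholders for the application-specific rewriting, not a real implementation:

import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.search.Query;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;

public class DecomposingQParserPlugin extends QParserPlugin {
  public void init(NamedList args) {}

  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      public Query parse() throws ParseException {
        // Rewrite "query A" into the Lucene query that should actually
        // be executed (queries B/C in the original mail).
        return decompose(getString());
      }
    };
  }

  private Query decompose(String userQuery) {
    // placeholder for the application-specific rewriting
    throw new UnsupportedOperationException("not implemented");
  }
}

It would be registered in solrconfig.xml with <queryParser name="decompose" class="DecomposingQParserPlugin"/> and invoked as q={!decompose}your query.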
Re: locks in solr
On Wed, Nov 25, 2009 at 3:05 PM, Rakhi Khatwani rkhatw...@gmail.com wrote: [file listing trimmed, see the original question above]

1.) I assume for each of these files there is a lock. Please correct me if I am wrong.

No. The index directory has one lock. Individual files are not locked separately.

2.) What are the different lock types in terms of reads/writes/updates?

Locks are only used for preventing more than one IndexWriter (or Solr instance/core) from writing to the same index. They do not prevent reads. They also do not prevent multiple writes from the same Solr core (there is some synchronization, but it has nothing to do with these locks).

3.) Can we have a document-level locking scheme?

No. I think you have grossly misunderstood the purpose of locks in Solr.

4.) We would like to know the best way to handle multiple simultaneous writes to the index.

With one Solr instance, you can do writes concurrently without a problem. -- Regards, Shalin Shekhar Mangar.
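For reference, a sketch of where this one lock is configured: in solrconfig.xml, inside the index settings (value names as listed in the stock 1.4 example config; check your own copy):

<mainIndex>
  <!-- one of: single, native, simple, none -->
  <lockType>native</lockType>
</mainIndex>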
Re: why is XMLWriter declared as final?
On Wed, Nov 25, 2009 at 3:33 AM, Matt Mitchell goodie...@gmail.com wrote: Is there any reason the XMLWriter is declared as final? I'd like to extend it for a special case but can't. The other writers (ruby, php, json) are not final. I don't think it needs to be final. Maybe it is final because it wasn't designed to be extensible. Please open a jira issue. -- Regards, Shalin Shekhar Mangar.
Re: Solr 1.4 search in more than one Core
On Wed, Nov 25, 2009 at 5:39 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: [original question trimmed, see above]

You need to provide the URLs of the cores in a distributed search request. It will make HTTP calls to the specified cores, but there is no way around that right now. http://wiki.apache.org/solr/DistributedSearch Why do you want to search across cores on the same Solr? -- Regards, Shalin Shekhar Mangar.
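A hedged example of such a request, with host, port, and core names assumed from the earlier posts in this thread:

http://localhost:8983/solr/core0/select?q=dell&shards=localhost:8983/solr/core0,localhost:8983/solr/core1

The shards parameter lists which cores to query, so searching 2 of the 3 cores is just a matter of leaving one out of the list.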
Re: ExternalRequestHandler and ContentStreamUpdateRequest usage
Grant, can you assist? I am going clueless as to why it's not indexing the content of the file. I have provided schema and code info below/in previous threads. Do I need to explicitly add param("content", ...) into ContentStreamUpdateRequest? Which I don't think is the right thing to do. Please advise. Let me know if you need anything else. Appreciate your help. Thanks,

javaxmlsoapdev wrote: Following is the Luke response. <lst name="fields"/> is empty. Can someone assist to find out why the file content isn't being indexed?

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="index">
    <int name="numDocs">0</int>
    <int name="maxDoc">0</int>
    <int name="numTerms">0</int>
    <long name="version">1259085661332</long>
    <bool name="optimized">false</bool>
    <bool name="current">true</bool>
    <bool name="hasDeletions">false</bool>
    <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
    <date name="lastModified">2009-11-24T18:01:01Z</date>
  </lst>
  <lst name="fields"/>
  <lst name="info">
    (Luke key legend trimmed)
    <str name="NOTE">Document Frequency (df) is not updated when a document is marked for deletion. df values include deleted documents.</str>
  </lst>
</response>

javaxmlsoapdev wrote: I was able to configure the /docs index separately from my db data index. Still, I am seeing the same behavior where it only puts the doc name and its size in the content field (I have renamed the field to content in this new schema). Below are the only two fields I have in schema.xml:

<field name="key" type="slong" indexed="true" stored="true" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>

Following is the updated code from the test case:

File fileToIndex = new File("file.txt");
ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978");
up.setParam("literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
NamedList list = server.request(up);
assertNotNull("Couldn't upload .txt", list);
QueryResponse rsp = server.query(new SolrQuery("*:*"));
assertEquals(1, rsp.getResults().getNumFound());
System.out.println(rsp.getResults().get(0).getFieldValue("content"));

Also, from the Solr admin UI, only when I search for doc123.txt does it return the following response. Not sure why it's not indexing the file's content into the content attribute.

<result name="response" numFound="1" start="0">
  <doc>
    <arr name="content">
      <str>702</str>
      <str>text/plain</str>
      <str>doc123.txt</str>
      <str> </str>
    </arr>
    <long name="key">8978</long>
  </doc>
</result>

Any idea? Thanks,

javaxmlsoapdev wrote: http://machinename:port/solr/admin/luke gives me a 404 error, so it seems it's not able to find Luke. I am reusing a schema which is used for indexing another entity from a database, which has no relevance to documents. That was my next question: what do I put in a schema if my documents don't need any column mappings or anything? Plus I want to keep the file documents index separate from the database entity index. What's the best way to do this?

thanks, Grant Ingersoll-6 wrote: On Nov 23, 2009, at 5:33 PM, javaxmlsoapdev wrote: *:* returns me 1 count, but when I search for a specific word (which was part of the .txt file I indexed before) it doesn't return anything. I don't have Luke set up on my end.

http://localhost:8983/solr/admin/luke should give you some info.

let me see if I can set that up quickly, but otherwise do you see anything I am missing in the solrconfig mapping or something?

What's your schema look like and how are you querying?

which maps document content to the wrong attribute? thanks, Grant Ingersoll-6 wrote: On Nov 23, 2009, at 5:04 PM, javaxmlsoapdev wrote: The following code is from my test case where it tries to index a file (of type .txt):

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(fileToIndex);
up.setParam("literal.key", "8978"); // key is the uniqueId
up.setParam("ext.literal.docName", "doc123.txt");
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);

The test case doesn't give me any error and I think it's indexing the file? but when I search for a
Re: Solr 1.4 search in more than one Core
Why do you want to search across cores on the same Solr? -- Regards, Shalin Shekhar Mangar.

I only need multi-indexing, but I found no other way to import other indexes. I have some old indexes from another project and want to use them in Solr. If I use one index it works, but I have a lot of indexes, so I need to find a way to search in more than one index, and therefore more than one core.
Re: how to do partial word searches?
Confession: I haven't had occasion to use the n-gram thingy, but here's the theory. And note that Solr has n-gram tokenizers available. Using a 2-gram example for "sullivan", the n-grams indexed would be these tokens: su, ul, ll, li, iv, va, an. Then at query time in your example, "sulli" would be broken up into su, ul, ll and li. Which, when searched as a phrase, would then match your field. The expense, of course, is that your index is larger (but surprisingly not as much as you'd think), but your queries are much faster. That's the theory anyway; the practice is left as an exercise for the reader <g>. The folks on the *Lucene* user's list generously provided quite an explication of what wildcards are all about; look for a thread titled "I just don't get wildcards at all" from around 2006. It's a nice background for what the underlying problem is, and some of the Solr tokenizers are realizing some of this, I think. The state of the art has progressed considerably since then, but the underlying issues are still there. Sorry I can't be more help here. Erick

On Wed, Nov 25, 2009 at 8:18 AM, Joel Nylund jnyl...@yahoo.com wrote: [previous exchange trimmed, see above]
Re: how to do partial word searches?
Hi, if you are using Solr 1.4, I think you might want to try the type text_rev (look in the example schema.xml). Unless I am mistaken, this will enable leading wildcard support for that field. It doesn't do any stemming, which I think might be making your wildcards behave weird. It also enables reverse wildcard support, so some of your substring matches will be faster.

On Tue, Nov 24, 2009 at 7:51 PM, Joel Nylund jnyl...@yahoo.com wrote: [original question trimmed, see above] -- Robert Muir rcm...@gmail.com
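For reference, a trimmed sketch of what makes text_rev work in the 1.4 example schema; the real definition has more filters and may differ in attribute values, so treat this as an approximation and check your own schema.xml:

<fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- indexes each token normally and reversed, so leading wildcards
         like *sulli can be rewritten into fast prefix queries -->
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>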
Re: why is XMLWriter declared as final?
OK thanks Shalin. Matt On Wed, Nov 25, 2009 at 8:48 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: [previous message trimmed, see above]
RE: Index Splitter
You can't really use this if you have an optimized index, right? -Original Message- From: Koji Sekiguchi [mailto:k...@r.email.ne.jp] Sent: Tuesday, November 24, 2009 6:57 PM To: solr-user@lucene.apache.org Subject: Re: Index Splitter Giovanni Fernandez-Kincade wrote: Hi, I've heard about a tool that can be used to split Lucene indexes, for cases where you want to break up a large index into shards. Do you know where I can find it? Any observations/recommendations about its use? This seems promising but I'm not sure if there is anything more mature out there: http://blog.foofactory.fi/2008/01/regenerating-equally-sized-shards-from.html Thanks, Gio. There are IndexSplitter and MultiPassIndexSplitter tools in 3.0. https://issues.apache.org/jira/browse/LUCENE-1959 I'd written an article about them before: http://lucene.jugem.jp/?eid=344 It is Japanese but I think you can read out how to use them from command lines... Koji -- http://www.rondhuit.com/en/
Re: Index Splitter
Giovanni Fernandez-Kincade wrote: You can't really use this if you have an optimized index, right? For optimized index, I think you can use MultiPassIndexSplitter. Koji -- http://www.rondhuit.com/en/
Re: how is score computed with hsin functionquery?
Grant Ingersoll-6 wrote: ... Yep. Also note that I added deg() and rad() functions, but for the most part it is probably better to do the conversion during indexing. ... Thanks Grant. I hadn't seen the deg and rad functions. Conversion would be difficult since I typically work with degrees. Once I get a bit more experienced with the Solr code, maybe I can contribute a degree version of hsin :-)
Where to put ExternalRequestHandler and Tika jars
My SOLR_HOME=/home/solr_1_4_0/apache-solr-1.4.0/example/solr/conf in tomcat.sh. POI, PDFBox, Tika, and related jars are under /home/solr_1_4_0/apache-solr-1.4.0/lib. When I try to index files using the SolrJ API as follows, I don't see the content of the file being indexed. It only indexes the file size (bytes) and file type into the content field. See the schema definition below as well.

ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
up.addFile(file);
up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
server.request(up);

schema.xml has the following:

<field name="issueKey" type="slong" indexed="true" stored="true" required="true"/>
<field name="content" type="text" indexed="true" stored="true" multiValued="true"/>
<defaultSearchField>content</defaultSearchField>

And solrconfig.xml has:

<requestHandler name="/update/extract" class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <str name="map.content">content</str>
    <str name="defaultField">content</str>
  </lst>
</requestHandler>

The Luke response is below, which displays the correct count (7) of indexed documents but no real content in the index. In the Tomcat logs I don't see any errors or anything. Unless I am going blind, I don't see anything missing in the setup. Can anyone advise? Do I need to include the Tika jars in Tomcat's deployed solr/lib or under /example/lib in SOLR_HOME?

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">28</int>
  </lst>
  <lst name="index">
    <int name="numDocs">7</int>
    <int name="maxDoc">7</int>
    <int name="numTerms">25</int>
    <long name="version">1259164190261</long>
    <bool name="optimized">false</bool>
    <bool name="current">true</bool>
    <bool name="hasDeletions">false</bool>
    <str name="directory">org.apache.lucene.store.NIOFSDirectory:org.apache.lucene.store.NIOFSDirectory@/home/tomcat-solr/bin/docs/data/index</str>
    <date name="lastModified">2009-11-25T15:50:03Z</date>
  </lst>
  <lst name="fields">
    <lst name="content">
      <str name="type">text</str>
      <str name="schema">ITSM--</str>
      <str name="index">ITS--</str>
      <int name="docs">7</int>
      <int name="distinct">18</int>
      <lst name="topTerms">
        <int name="text">3</int>
        <int name="applic">3</int>
        <int name="msword">3</int>
        <int name="applicationmsword">3</int>
        <int name="plain">2</int>
        <int name="textplain">2</int>
        <int name="70144">1</int>
        <int name="453">1</int>
        <int name="2370">1</int>
        <int name="html">1</int>
      </lst>
    </lst>
    <lst name="issueKey">
      <str name="type">slong</str>
      <str name="schema">I-SO-l</str>
      <str name="index">I-SO-</str>
      <int name="docs">7</int>
      <int name="distinct">7</int>
    </lst>
  </lst>
  (histograms and the Luke key legend trimmed)
</response>
Looking for Best Practices: Analyzers vs. UpdateRequestProcessors?
Hello, are there any general criteria when to use Analyzers to implement an indexing function and when it is better to use UpdateRequestProcessors? The main difference I found in the documentation was that UpdateRequestProcessors are able to manipulate several fields at once (create, read, update, delete), while Analyzers operate on the contents of a single field at once. Is that correct so far? Are there more experiences helping to decide which type of module to use implementing indexing modules? Are there differences in processing performance? Is one of the two APIs easier to learn/debug etc? If you have any Best Practices with that I would be very interested to hear about those. Andreas P.S. My experience with Search Engines is mainly with FAST where one uses Stages in a Pipeline no matter which feature to implement.
Re: Index Splitter
Koji Sekiguchi wrote: Giovanni Fernandez-Kincade wrote: You can't really use this if you have an optimized index, right? For optimized index, I think you can use MultiPassIndexSplitter.

Correct - MultiPassIndexSplitter can handle any index - optimized or not, with or without deletions, etc. The cost for this flexibility is that it needs to read index files multiple times (hence multi-pass). -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
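For reference, the tool is run from the command line. A sketch under the assumption that the class ships in the Lucene contrib/misc jar of your build; check the tool's own usage output for the exact jar name and options:

java -cp lucene-core.jar:lucene-misc.jar \
  org.apache.lucene.index.MultiPassIndexSplitter \
  -out /path/to/shards -num 4 /path/to/source/index

This reads the source index several times and writes the requested number of roughly equal-sized parts under the output directory.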
Re: Solr 1.4 search in more than one Core
I think no, because there is a crawler for fulltext indexing that permanently updates the indexes. When there is a crawler for documents, office files etc., then I can switch to Solr totally.
Re: Where to put ExternalRequestHandler and Tika jars
I had to include the Tika and related parsing jars into tomcat/webapps/solr/WEB-INF/lib. This was an embarrassing mistake; apologies for all the noise. Thanks,
Batch file upload using solrJ API
Is there an API to upload files over one connection, versus looping through all the files and creating a new ContentStreamUpdateRequest for each file? The latter, as expected, doesn't work if there are a large number of files, and quickly runs into memory problems. Please advise. Thanks,
Re: converting over from sphinx
: way. In particular, I'm doing phrase searching into a corpus of : descriptions, such as I need help with a foo where I have a bunch of foo: : a foo is a subset of a bar often used to create briznatzes, etc. : : With Sphinx, I could convert I need help with a foo into *need* *help* : *with* *foo* and get pretty nice matches. With Solr, my understanding is : that you can only do wildcard matches on the suffix. In addition, stemming : only happens on non-wildcard terms. So, my first thought would be to convert : I need help with a foo into need need* help help* with with* foo foo*.

First off, we need to make sure we have all our terminology in sync -- I'm not very familiar with Sphinx, so I'm not sure what types of vernacular are used there to describe various things, but in Solr/Lucene you have options regarding how you want text to be analyzed when it's indexed -- this analysis is what converts an arbitrary stream of characters into Terms that get indexed. At query time, it's very easy to match on terms, or boolean combinations of terms, and sequential phrases of terms -- you only need wildcard-type functionality if you want to provide a wildcard expression that could match more than one individual term.

In your specific example, if you just configured a basic whitespace tokenizer when you indexed your documents (ie: foo: a foo is a subset of a bar often used to create briznatzes) then at query time any of the individual words (foo, bar, etc...) would match that document. Likewise, a phrase query like "need help with foo" would match that text if you defined some stop words (like need and with) and specified a small amount of slop on your phrase queries.

The point is: there are a lot of different ways to use Solr, and the terminology you are used to with Sphinx may not map exactly to some of the terminology you'll see in the Solr docs/configs -- so please feel free to ask.

-Hoss
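A small illustration of the phrase-plus-slop idea (the field name and stopword list are assumptions): with need, with, and a configured as stopwords and the field tokenized on whitespace, a query like

description:"I need help with a foo"~2

reduces to the remaining terms, and the ~2 slop lets the phrase still match even though the dropped stopwords leave gaps between the term positions -- no wildcards required.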
Re: error with multicore CREATE action
: : Are there any use cases for CREATE where the instance directory : *doesn't* yet exist? I ask because I've noticed that Solr will create : an instance directory for me sometimes with the CREATE command. In ... : I guess when you try to add documents and an IndexWriter is opened, the data : directory is created if it does not exist. Since it calls File#mkdirs, all : parent directories are also created. I don't think Solr creates those : directories by itself.

Shalin: I'm confused, wasn't this one of the original use cases for the CREATE command as part of the LotsOfCores work you and Noble have been pushing forward? I thought one of the goals was that a user could have a single solrconfig.xml+schema.xml on disk somewhere, and then at run time use the CREATE command to cause many, many new cores to be created (each with a new/unique instanceDir).

If that isn't intended (and therefore: not handled well) then we should probably make the CREATE command test for the existence of the specified instanceDir and error if it doesn't already exist -- otherwise a typo in an instanceDir file path could lead to some really unexpected behavior.

-Hoss
Re: why is XMLWriter declared as final?
: I don't think it needs to be final. Maybe it is final because it wasn't : designed to be extensible. Please open a jira issue.

It really wasn't, and it probably shouldn't be ... there is another thread currently in progress (in response to SOLR-1592) about this. Given how kludgy the entire API is, I'd really prefer it not be made un-final ... it would need some serious overhaul/review to make it possible to subclass in a sensical way, and coming up with a new API is likely to make a lot more sense than trying to retrofit that one.

-Hoss
Re: why is XMLWriter declared as final?
Hey Hoss, +1. I think we need to overhaul the whole API, even in light of the incremental progress I've been proposing and patching, etc., lately. I think it's good to do that incrementally, though, rather than all at once, especially considering Solr is in 1.5-dev trunk stage atm. Cheers, Chris On 11/25/09 11:33 AM, Chris Hostetter hossman_luc...@fucit.org wrote: [previous message trimmed, see above] ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: [SolrResourceLoader] Unable to load cached class-name
: : I've deployed the contents of dist/ into JBoss's lib directory for the : server I'm running and I've also copied the contents of lib/ into

Please be specific ... what is dist/, what is lib/? ... If you are talking about the top-level dist and lib directories in a Solr release, then those should *not* be copied into any directory for JBoss. Everything you need to access core Solr features is available in the solr.war -- that is all you need to run the Solr application. The only reason to ever copy any jars around when dealing with Solr is to load plugins (ie: your own, or things included in the contrib directory of a Solr release), and even then they should go in the special lib directory inside your Solr home directory so they are loaded by the appropriate classloader -- not in the top-level class loader of your servlet container.

: [SolrResourceLoader] Unable to load cached class-name : : org.apache.solr.search.FastLRUCache for shortname : : solr.FastLRUCachejava.lang.ClassNotFoundException: : org.apache.solr.search.FastLRUCache

This is most likely because you have duplicate copies of (all of) the Solr classes at various classloader levels -- the copies in the solr.war, and the copies you've put into the JBoss lib dir. Having both can cause problems like this because of the rules involved with hierarchical classloaders.

-Hoss
Re: why is XMLWriter declared as final?
Interesting. Well, just to clarify my intentions a bit, I'll quickly explain what I was trying to do. I'm using the MLT component, but because some of my stored fields are really big, I don't need (or want) all of the fields for my MLT docs in the response. I want my MLT docs to have only 2 fields, but I need my main docs' fl to have all fields. So a simple override of the XMLWriter writeNamedList method would do the trick. All you have to do is check if the name == moreLikeThis. If so, process the docs and specify a different field list. If not, just call super(). Worked like a charm, but oh well. I really only need the Ruby response anyway, so I'll move on to that. I'm glad this spurred some interest though. -- It'd be great to let components have control over their fl value instead of having a global fl value for all doc lists within a writer. Matt On Wed, Nov 25, 2009 at 2:33 PM, Chris Hostetter hossman_luc...@fucit.org wrote: [previous message trimmed, see above]
Re: locks in solr
: Is there any article which explains the locks in Solr? : There is some info in solrconfig.xml which says that you can set the lock : type to none (NoLockFactory), single (SingleInstanceLockFactory), : native (NativeFSLockFactory) and simple (SimpleFSLockFactory) which locks every time we : create a new file.

FYI: That's not at all what the SimpleFSLockFactory does. Index locking is a pretty low-level Lucene concept -- there isn't really anything Solr-specific about it. 90% of all Solr users shouldn't need to worry about it, ever. The only time it becomes an issue is if you are planning on doing something extremely advanced dealing with the Lucene index files directly. If that's the case: your best bet is to read the Locking code and APIs in Lucene, and ask your questions on the java-us...@lucene mailing list.

-Hoss
Re: error with multicore CREATE action
On Thu, Nov 26, 2009 at 12:43 AM, Chris Hostetter hossman_luc...@fucit.org wrote: [previous message trimmed, see above]

Yes, that is correct, but those changes are not in trunk right now. We're planning to spend some time in the next few weeks splitting that big patch into smaller ones, adding tests, and pushing them into trunk. LotsOfCores still needs LotsOfWork :) -- Regards, Shalin Shekhar Mangar.
Re: PatternTokenizer question
: I think the answer to my question is contained in the wiki when discussing : the SynonymFilter, The Lucene QueryParser tokenizes on white space before : giving any text to the Analyzer. This would indeed explain what I am : getting. Next question - can I avoid that behavior?

It's the nature of the Lucene query parser -- whitespace is a metacharacter that provides instructions to the parser, just like '+', '\', '"', etc. You could always use a quoted string (so the parser treats all of your input as one phrase) or try the field QParser (which is essentially the same thing as using a quoted phrase, but doesn't require the quotes or respect any of the other escape characters).

-Hoss
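A quick illustration of the two suggestions (the field name is an assumption):

q=myfield:"Foo Bar"
q={!field f=myfield}Foo Bar

The {!field} form hands the entire remaining input to the field's analyzer as a single value, so whitespace is no longer treated as a query-parser metacharacter.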
param version and diferences in /admin/ping response
Hi everyone! Can anyone tell me what's the meaning of the param version? There isn't anything about it in the Solr documentation. When I invoke the /admin/ping URL, if the version value is between 0 and 2.1, the response looks like this:

<response>
  <responseHeader>
    <status>0</status>
    <QTime>5</QTime>
    <lst name="params">
      <str name="echoParams">all</str>
      <str name="rows">10</str>
      <str name="echoParams">all</str>
      <str name="q">solrpingquery</str>
      <str name="qt">standard</str>
      <str name="version">2.1</str>
    </lst>
  </responseHeader>
  <str name="status">OK</str>
</response>

And when the version value is anything outside that range, the response looks like this:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>
    <lst name="params">
      <str name="echoParams">all</str>
      <str name="rows">10</str>
      <str name="echoParams">all</str>
      <str name="q">solrpingquery</str>
      <str name="qt">standard</str>
    </lst>
  </lst>
  <str name="status">OK</str>
</response>

Thanks. Regards, Nestor Oviedo
Re: param version and diferences in /admin/ping response
: Hi everyone! : Can anyone tell me what's the meaning of the param version ?? There : isn't anything about it in the Solr documentation. http://wiki.apache.org/solr/XMLResponseFormat#A.27version.27 -Hoss
Re: solr/jetty not working for anything other than localhost
first, check what port 8983 is bound to - it should be listening on all interfaces:

netstat -an | grep 8983

You should see:

tcp 0 0 0.0.0.0:8983 0.0.0.0:* LISTEN

-Simon

On Wed, Nov 25, 2009 at 3:55 PM, Joel Nylund jnyl...@yahoo.com wrote: Hi, if I try to use any other hostname, jetty doesn't work: it gives a blank page, and if I telnet to the server/port it just disconnects. I tried editing the scripts.conf to change the hostname; that didn't seem to help. For example, I tried editing my /etc/hosts file and added 127.0.0.1 solriscool, then:

ping solriscool
PING solriscool (127.0.0.1): 56 data bytes
64 bytes from 127.0.0.1: icmp_seq=0 ttl=64 time=0.055 ms
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.095 ms

sh-3.2# telnet solriscool 8983
Trying 127.0.0.1...
Connected to solriscool.
Escape character is '^]'.
GET / HTTP/1.1
Connection closed by foreign host.

telnet localhost 8983
Trying ::1...
Connected to localhost.
Escape character is '^]'.
GET /solr HTTP/1.1
Host: localhost
HTTP/1.1 302 Found
Location: http://localhost/solr/
Content-Length: 0
Server: Jetty(6.1.3)

any ideas? thanks Joel
Re: solr/jetty not working for anything other than localhost
I see:

tcp46 0 0 *.8983 *.* LISTEN
tcp4 0 0 127.0.0.1.8983 *.* LISTEN

thanks Joel On Nov 25, 2009, at 5:21 PM, simon wrote: [previous message trimmed, see above]
Re: solr/jetty not working for anything other than localhost
On Wed, Nov 25, 2009 at 5:27 PM, Joel Nylund jnyl...@yahoo.com wrote: [netstat output quoted above]

Not the same version of linux/netstat as mine, but I'd guess that the second line is the key to the problem - it looks as though TCP over IPv4 is only listening on the localhost interface, which is a network configuration issue. What does the Solr log say after it's started - there should be a line like:

INFO: Started SelectChannelConnector @ 0.0.0.0:8983

-Simon
Re: solr/jetty not working for anything other than localhost
yes, it says:

2009-11-25 18:08:59.967::INFO: Started SocketConnector @ 0.0.0.0:8983

running on OS X. thanks Joel On Nov 25, 2009, at 6:00 PM, simon wrote: [previous messages trimmed, see above]
Re: Deduplication in 1.4
Hey Otis, Yep, I realized this myself after playing some with the dedupe feature yesterday. So it does look like field collapsing is what I need, pretty much. Any idea on how close it is to being production-ready? Thanks, -Chak

Otis Gospodnetic wrote: Hi, As far as I know, the point of deduplication in Solr ( http://wiki.apache.org/solr/Deduplication ) is to detect a duplicate document before indexing it, in order to avoid duplicates in the index in the first place. What you are describing is closer to the field collapsing patch in SOLR-236. Otis -- Sematext is hiring -- http://sematext.com/about/jobs.html?mls Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR

- Original Message From: KaktuChakarabati jimmoe...@gmail.com To: solr-user@lucene.apache.org Sent: Tue, November 24, 2009 5:29:00 PM Subject: Deduplication in 1.4

Hey, I've been trying to find some documentation on using this feature in 1.4, but the wiki page is a little sparse. In specific, here's what I'm trying to do: I have a field, say 'duplicate_group_id', that I'll populate based on some offline document deduplication process I have. All I want is for Solr to compute a 'duplicate_signature' field based on this one at update time, so that when I search for documents later, all documents with the same original 'duplicate_group_id' value will be rolled up (e.g. I'll just get the first one that came back according to relevancy). I enabled the deduplication processor and put it into the updater, but I'm not seeing any difference in the returned results (i.e. results with the same duplicate_id are returned separately). Is there anything I need to supply at query time for this to take effect? What should the behaviour be? Is there any working example of this? Anything will be helpful. Thanks, Chak
Date ranges for indexes constructed outside Solr
I'm working on an application that will build indexes directly using the Lucene API, but will expose them to clients using Solr. I'm seeing plenty of documentation on how to support date range fields in Solr, but it all assumes that you are inserting documents through Solr rather than merging already-generated indexes. Where can I find details about the Lucene-level field operations that can be used to generate date fields that Solr will work with? In particular, date resolution settings are unclear. On a similar note: how much of schema.xml is relevant in cases where Solr is not performing insertions? Obviously defaultSearchField is, as is the solrQueryParser defaultOperator attribute, but it seems like most of the field declarations might not matter. thanks, Phil
Re: Where to put ExternalRequestHandler and Tika jars
Hi! Does your example finally work? I index the data with SolrJ and I have the same problem and could not retrieve the file data. On Wed, Nov 25, 2009 at 3:41 PM, javaxmlsoapdev vika...@yahoo.com wrote: [previous message trimmed, see above]
Re: Trouble Configuring WordDelimiterFilterFactory
Hello, I would really appreciate any inputs/suggestions on this. Thank you. On Tue, Nov 24, 2009 at 10:59 PM, Rahul R rahul.s...@gmail.com wrote: Hello, In our application we have a catch-all field (the 'text' field) which is configured as the default search field. Now this field will have a combination of numbers, alphabets, special characters etc. I have a requirement wherein the WordDelimiterFilterFactory does not work on numbers, especially those with decimal points. Accuracy of results with relevance to numerical data is quite important, so if the text field of a document has data like "Bridge-Diode 3.55 Volts", I want to make sure that a search for 355 or 35.5 does not retrieve this document. So I found the following setting for the WordDelimiterFilterFactory to work for me (for the most part):

<filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="0" catenateWords="1" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0" splitOnNumerics="0" preserveOriginal="1"/>

I am using the same setting for both index and query. Now the only problem is if I have data like .355. With the above setting, the analysis JSP shows me that WordDelimiterFilterFactory is creating term texts as both .355 and 355, so a search for .355 retrieves documents containing both .355 and 355. A search for 355 also has the same effect. I noticed that when the entry for the WordDelimiterFilterFactory was completely removed (both index and query), the above problem was resolved. But this seems too harsh a measure. Is there a way by which I can prevent the WordDelimiterFilterFactory from acting on numerical data entirely?

Regards, Rahul