SOLR 1.4: how to configure the improved chinese analyzer?
Hello, is there any existing FAQ or HowTo on how to setup the improved (and new?) chinese analyzer on Solr 1.4? I'd appreciate any help you may provide on this. Thanks, -- View this message in context: http://old.nabble.com/SOLR-1.4%3A-how-to-configure-the-improved-chinese-analyzer--tp26706709p26706709.html Sent from the Solr - User mailing list archive at Nabble.com.
Selection of returned fields - dynamic fields?
Hi Guys,

We need to eliminate one of our stored fields from the Solr response to reduce traffic, as it is very bulky and not used externally. I have been experimenting with both fl=FIELDNAME and addField(FIELDNAME) from SolrJ, and have found it is possible to achieve this effect for fixed fields by starting with an empty list and adding the field names explicitly in the request.

Unfortunately this does not seem to work for dynamic fields - fl=PREFIX* does not return anything, and neither does fl=*POSTFIX. What seems to be missing from Solr is a removeField(FIELDNAME) method in SolrJ, or a fl=-FIELDNAME query parameter to remove the fixed field.

Is such a feature planned, or is there a workaround that I have missed?

Regards, Ian.
Re: indexing XML with solr example webapp - out of java heap space
the post.jar does not stream; use curl if you are using *nix. --Noble

On Wed, Dec 9, 2009 at 12:28 AM, Feroze Daud fero...@zillow.com wrote: Hi! I downloaded Solr and am trying to index an XML file. This XML file is huge (500M). When I try to index it using the post.jar tool in example\exampledocs, I get an out of Java heap space error in the SimplePostTool application. Any ideas how to fix this? Passing in -Xms1024M does not fix it. Feroze.

-- Noble Paul | Systems Architect | AOL | http://aol.com
DIH solrconfig
Hi All

There seems to be a massive difference between the solrconfig in the DIH example and the one in the normal example? Would I be correct in saying that if I were to add the dataimport request handler to solrconfig.xml, that's all I will need? ie:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

Is this correct?

Lee
Re: Solr Cell and Spellchecking.
What do your schema and config look like for the various relevant pieces?

On Dec 8, 2009, at 8:04 PM, Michael Boyle wrote: Following Erik Hatcher's post about using SolrCell and acts_as_solr { http://www.lucidimagination.com/blog/2009/02/17/acts_as_solr_cell/ }, I have been able to index a rich document stream and retrieve its id. No worries. However, I have the SpellCheckComponent set up to build on commit (buildOnCommit=true). Alas, the rich document text is not being added to the spellchecker dictionary. Is there something special I need to do within solrconfig.xml or within the acts_as_solr ruby classes? - thanks in advance for any ideas - Mike Boyle

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: DIH solrconfig
On Wed, Dec 9, 2009 at 3:34 PM, Lee Smith l...@weblee.co.uk wrote: Hi All There seems to be a massive difference between the solrconfig in the DIH example and the one in the normal example? Would I be correct in saying that if I were to add the dataimport request handler to solrconfig.xml, that's all I will need? ie:

<requestHandler name="/dataimport" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">db-data-config.xml</str>
  </lst>
</requestHandler>

Is this correct? Lee

yep, this is all you need.

-- Noble Paul | Systems Architect | AOL | http://aol.com
Hi. What Configuration we require?
Hi

To run Solr 1.3.0 with a data/index directory size of 11GB, 80 lakh documents, 11 lakh read requests and 30 thousand writes: every month the index directory size increases by about 200MB. Please suggest: what type of configuration (CPU, RAM, hard disk) does the server require to make Solr stable?

Thanks, Kalidoss.m
Re: Re: Solr Cell and Spellchecking.
I just resolved the issue (fresh coffee == good)! In my schema, I had added:

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

but missed the copyField definitions. Adding these:

<copyField source="text" dest="a_spell"/>
<copyField source="text" dest="a_spellPhrase"/>

and a restart, and everything is working properly. Thanks for the reply and for LucidImagination -- the only reason I have been able to get Solr integrated into our ruby app. -Mike
RE: Facet query with special characters
Hi, Thanks for your help and answers. I believe I have isolated the issue, and yes, it was 'schema/write'-related. Basically, the issue was this: all indexing is performed via SolrJ objects (to an EmbeddedSolrServer instance), and this was ported over from 'raw' Lucene java indexing code. When I moved over to SolrJ, I hadn't realized that the schema.xml file will then affect all writes for the given type. Once I sorted out my schema properly and reindexed, queries started behaving as expected. Thank you very much for your excellent insight - I'm quite new to Solr, so it's really great to have an expert show me the error of my ways. I had only recently discovered the power of debugQuery=true - awesomely good! Many thanks again, Peter

Date: Tue, 8 Dec 2009 09:35:31 -0800 From: hossman_luc...@fucit.org To: solr-user@lucene.apache.org Subject: RE: Facet query with special characters

: Note that I am (supposed to be) indexing/searching without analysis : tokenization (if that's the correct term) - i.e. field values like : 'pds-comp.domain' shouldn't be (and I believe aren't) broken up as in : 'pds', 'comp' 'domain' etc. (e.g. using the 'text_ws' fieldtype). ... : What would be your opinion on the best way to index/analyze/not-analyze such fields?

a whitespace tokenizer is probably the best bet, but in order to be certain what's going on, you would need to look at a few things (and if you wanted help from other people, you would need to post those things) that i mentioned before

: check your analysis configuration for this fieldtype, in particular look : at what debugQuery produces for your parsed query, and look at what : analysis.jsp says it will do at query time with the input string : pds-comp.domain ... because it sounds like you have a disconnect between : how the text is indexed and how it is searched.

...so what does your schema look like, what is the output from debugQuery, what is the output from analysis.jsp, etc...
-Hoss
Re: SOLR 1.4: how to configure the improved chinese analyzer?
hello, in order to use the smart chinese analyzer with Solr 1.4 (it is not yet included), you need to go get the lucene-smartcn jar file from lucene-2.9.1.zip and put this jar file in your solr lib directory. then you can define a field type similar to the greek example in schema.xml:

<!-- One can also specify an existing Analyzer class that has a default constructor via the class attribute on the analyzer element
<fieldType name="text_greek" class="solr.TextField">
  <analyzer class="org.apache.lucene.analysis.el.GreekAnalyzer"/>
</fieldType>
-->

except you need to use org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer

On Wed, Dec 9, 2009 at 3:27 AM, Fer-Bj fernando.b...@gmail.com wrote: Hello, is there any existing FAQ or HowTo on how to setup the improved (and new?) chinese analyzer on Solr 1.4? I'd appreciate any help you may provide on this. Thanks,

-- Robert Muir rcm...@gmail.com
Logging
I'm trying to import data with DIH (mysql). All my SQLs are good, having been tested manually. When I run full import, ie: http://localhost:8983/solr/dataimport?command=full-import I get my XML result but nothing is being imported and it rolls back. I set DIH logging to FINE and re-ran, but I can't seem to find detailed logs. I'm looking at the log in example/logs/ but it's just giving basic logs still? How can I find out what's going on?? Thank you if you can advise. Lee
Facet across second level of hierarchy - Dismax Request Handler - Best practice?
Hello, i want a second level of hierarchy in my facets (as seen here: http://www.lucidimagination.com/search/?q=) My RequestHandler is the following:

<requestHandler name="estudy" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="q.alt">*:*</str>
    <str name="rows">10</str>
    <str name="qf">
      courseid^1.0 module^1.0 vorname^1.0 nachname^1.0 email^1.0
      postauthor^1.0 posttext^1.0 posttime^1.0 threadtopic^1.0
      threadauthor^1.0 ShortName^1.0 title^1.0 content^1.0
      doc_name^1.0 doc_content^1.0 doc_author^1.0 doc_contenttype^1.0
    </str>
  </lst>
</requestHandler>

the example at the end of the wiki page about HierarchicalFaceting doesn't work (i think because of the level1_s:A - this doesn't work on a dismax requesthandler, right?)

http://localhost:8983/solr/select?q=*:*&rows=0&facet=on&facet.field=level2_s&fq=level1_s:A&facet.mincount=1

i want to do the following: faceting on module (file, forum...) and for example if file is checked, faceting on doc_contenttype (msword, pdf, ...). what is the best practice for that? is there a built-in functionality? thanks in advance. Regards, Daniel
how to use boost factor
Hi, While searching (querying) Solr, how can we achieve the following scenario? Search priority should be in the following order:

1. Genre
2. nowplaying
3. Stationname
4. Keywords

Say I am searching for rock: it should search in the genre field first, then nowplaying, then stationname, and then the keyword fields. What would the query be? Thanks in advance Prakash
atypical MLT use-case
This is somewhat of an odd use-case for MLT. Basically I'm using it for near-duplicate detection (I'm not using the built-in dup detection for a variety of reasons). While this might sound like an okay idea, the problem lies in the order in which things happen. Ideally, duplicate detection would prevent me from adding a document to my index which is already there (or at least partially there). However, more-like-this only works on documents which are *already* in the index. Ideally what I would be able to do is: post an xml document to solr, and receive a MLT response (the same kind of MLT response I would receive had the document been in Solr already, and queried with id=#{id}&mlt=true). Is anybody aware of how I could achieve this functionality leveraging existing handlers? If not I will bump over to solr-dev and see if this is a tractable problem. Thanks in advance, Mike
Re: how to use boost factor
I don't quite understand what you mean by priority. Are you clear about the difference between boosting and sorting? If you're sure you want to boost, have you seen: http://wiki.apache.org/solr/DisMaxRequestHandler#bq_.28Boost_Query.29

Best, Erick

On Wed, Dec 9, 2009 at 11:05 AM, Doddamani, Prakash prakash.doddam...@corp.aol.com wrote: Hi, While searching (querying) the solr, how can we achieved following scenario. Search priority should be in the following order: 1. Genre 2. nowplaying 3. Stationname 4. Keywords Say I am searching for rock it should search in genre field first and then nowplaying then stationname and then keyword fields, what would be the query Thanks in advance Prakash
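Not from the thread itself, but one way the question's "priority" could be expressed, if boosting really is what is wanted, is descending per-field boosts in a dismax qf parameter. A sketch in Python of building such a request; the field names come from the question, while the weights and URL are purely illustrative:

```python
from urllib.parse import urlencode

# Descending qf boosts express the genre > nowplaying > stationname > keywords
# priority from the question. The actual weight values are an assumption.
params = {
    "q": "rock",
    "defType": "dismax",
    "qf": "genre^8 nowplaying^4 stationname^2 keywords^1",
}

query_string = urlencode(params)
# Would be appended to something like http://localhost:8983/solr/select?
```

With dismax, a document's score is driven by its best-matching field, so under these weights a hit on genre outranks a hit on keywords. Whether boosting (rather than sorting) is the right tool is exactly Erick's question above.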
Concurrent access to EmbeddedSolrServer
Hi there, I'm about to start implementing some code which will access a Solr instance via a ThreadPool concurrently. I've been looking at the solrj API docs (particularly http://lucene.apache.org/solr/api/index.html?org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html) and I just want to make sure what I have in mind makes sense. The Javadoc is a bit sparse, so I thought I'd ask a couple of questions here.

1) I'm assuming that EmbeddedSolrServer can be accessed concurrently by several threads at once for add, delete and query operations (on the SolrServer parent interface). Is that right? I don't have to enforce single-threaded access?
2) What happens if multiple threads simultaneously call commit?
3) What happens if multiple threads simultaneously call optimize?
4) Both commit and optimise have optional parameters called waitFlush and waitSearcher. These are undocumented in the Javadoc. What do they signify?

Thanks in advance for any help. Cheers Jon
Re: atypical MLT use-case
the solr 1.4 book says you can do this. usages of mlt: "As a request handler with an external input document: What if you want similarity results based on something that isn't in the index? A final option that Solr supports is returning MLT results based on text data sent to the MLT handler (through HTTP POST). For example, if you were to send a text file to the handler, then Solr's MLT handler would return the documents in the index that are most similar to it. This is atypical but an interesting option nonetheless." not sure about the details of how, though, as i haven't used mlt myself.

On 09/12/09 17:27, Mike Anderson wrote: This is somewhat of an odd use-case for MLT. Basically I'm using it for near-duplicate detection (I'm not using the built in dup detection for a variety of reasons). While this might sound like an okay idea, the problem lies in the order of which things happen. Ideally, duplicate detection would prevent me from adding a document to my index which is already there (or at least partially there). However, more like this only works on documents which are *already* in the index. Ideally what I would be able to do is: post an xml document to solr, and receive a MLT response (the same kind of MLT response I would receive had the document been in Solr already, and queried with id=#{id}&mlt=true). Is anybody aware of how I could achieve this functionality leveraging existing handlers? If not I will bump over to solr-dev and see if this is a tractable problem. Thanks in advance, Mike
SolrQuerySyntax : Types of Range Queries in Solr 1.4
Hi Guys, In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and exclusive range searches with square and curly brackets respectively. However, when I looked at the SolrQuerySyntax page, only inclusive range searches are illustrated. http://wiki.apache.org/solr/SolrQuerySyntax

Illustrative example: There is a field in the index named 'year' and it contains the following values: 2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010. year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 [inclusive with square brackets]. year:{2005 TO 2009} will only match 2006, 2007, 2008 {exclusive with curly brackets}. The bounds are not included.

Is there any other page on the wiki where there are examples of exclusive range searches with curly brackets? If not I would like to know so that I can add some examples to the wiki. Thanks. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
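The bracket semantics described above can be mimicked with a tiny filter over the example values. This is plain Python, just to illustrate the inclusive/exclusive bounds, not anything Solr actually executes:

```python
years = [2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010]

def range_query(values, lower, upper, inclusive=True):
    """Mimic year:[lower TO upper] (inclusive, square brackets)
    or year:{lower TO upper} (exclusive, curly brackets)."""
    if inclusive:
        return [v for v in values if lower <= v <= upper]
    return [v for v in values if lower < v < upper]

inclusive_hits = range_query(years, 2005, 2009, inclusive=True)
exclusive_hits = range_query(years, 2005, 2009, inclusive=False)
```

With the curly-bracket form the bounds themselves drop out, which is exactly the behavior the "lower-1 TO upper-1" workaround was trying to approximate.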
copyField question
All, Can one use the copyField option and copy a TextField field into a longField field? I have some data from which I want to extract (filter) out all but the long and/or integer values. Example data: xxx yyy aaa 504 yyy 444234 eee hh

I have the copyField in place, and the destination field gets the numeric terms after filtering, 504 444234, when the destination field is a TextField. I then stopped solr, deleted all the files in the data directory, changed the destination field type to longField, started solr, and then inserted a few documents. When I use the schema browser to look at the top N values in the destination field I see the whole copied text for the terms and not the numeric values I expected. The top terms for the destination field looked like: xxx yyy aaa 504 yyy 444234 eee hh

Any idea why I might be seeing this? I did find something that might be better suited for what I want to do, and that is the TeeToken and TeeSinkToken filters. Are those filters usable from solr? I guess I'll find out in a few minutes when I try to configure solr to use them. Thanks, Pete
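The behavior the poster describes for the TextField destination can be sketched as a trivial analysis step that keeps only numeric tokens. This is a toy stand-in for whatever filter chain the schema actually uses, purely to make the expected output concrete:

```python
def numeric_tokens(text):
    """Tokenize on whitespace and keep only tokens that are all digits,
    mimicking the filtering the poster sees on the TextField destination."""
    return [tok for tok in text.split() if tok.isdigit()]

tokens = numeric_tokens("xxx yyy aaa 504 yyy 444234 eee hh")
```

The longField case behaves differently because copyField copies the raw stored value before any such analysis runs; the analysis chain belongs to the destination field's type, which is the crux of the question.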
Re: SolrQuerySyntax : Types of Range Queries in Solr 1.4
Solr standard query syntax is an extension of Lucene query syntax, and we reference that on the page: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html -Yonik http://www.lucidimagination.com On Wed, Dec 9, 2009 at 1:08 PM, Israel Ekpo israele...@gmail.com wrote: Hi Guys, In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and exclusive range searches with square and curly brackets respectively. However, when I looked at the SolrQuerySyntax, only the the include range search is illustrated. It seems like the examples only talk about the inclusive range searches. http://wiki.apache.org/solr/SolrQuerySyntax Illustrative example: There is a field in the index name 'year' and it contains the following values : 2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010 year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 [inclusive with square brackets] year:{2005 TO 2009} will only match 2006, 2007, 2008 {exclusive with curly brackets}. The bounds are not included. Is there any other page on the wiki where there are examples of exclusive range searches with curly brackets? If not I would like to know so that I can add some examples to the wiki. Thanks. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Exception encountered during replication on slave....Any clues?
try the url http://localhost:8080/postingsmaster/replication?command=indexversion using your browser

On Tue, Dec 8, 2009 at 9:56 PM, William Pierce evalsi...@hotmail.com wrote: Hi, Noble: When I hit the masterUrl from the slave box at http://localhost:8080/postingsmaster/replication I get the following xml response:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

And then when I look in the logs, I see the exception that I mentioned. What exactly does this error mean - that replication is not available? By the way, when I go to the admin url for the slave and click on replication, I see a screen with the master url listed (as above) and the word unreachable after it. And, of course, the same exception shows up in the tomcat logs. Thanks, - Bill

-- From: Noble Paul നോബിള് नोब्ळ् noble.p...@corp.aol.com Sent: Monday, December 07, 2009 9:20 PM To: solr-user@lucene.apache.org Subject: Re: Exception encountered during replication on slave...Any clues?

are you able to hit http://localhost:8080/postingsmaster/replication using a browser from the slave box? if you are able to hit it, what do you see?

On Tue, Dec 8, 2009 at 3:42 AM, William Pierce evalsi...@hotmail.com wrote: Just to make doubly sure, per tck's suggestion, I went in and explicitly added the port in the masterurl so that it now reads: http://localhost:8080/postingsmaster/replication Still getting the same exception... I am running solr 1.4, on Ubuntu karmic, using tomcat 6 and Java 1.6. Thanks, - Bill

-- From: William Pierce evalsi...@hotmail.com Sent: Monday, December 07, 2009 2:03 PM To: solr-user@lucene.apache.org Subject: Re: Exception encountered during replication on slave...Any clues?

tck, thanks for your quick response. I am running on the default port (8080).
If I copy that exact string given in the masterUrl and execute it in the browser I get a response from solr:

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <str name="status">OK</str>
  <str name="message">No command</str>
</response>

So the masterUrl is reachable/accessible so far as I am able to tell. Thanks, - Bill

-- From: TCK moonwatcher32...@gmail.com Sent: Monday, December 07, 2009 1:50 PM To: solr-user@lucene.apache.org Subject: Re: Exception encountered during replication on slave...Any clues?

are you missing the port number in the master's url? -tck

On Mon, Dec 7, 2009 at 4:44 PM, William Pierce evalsi...@hotmail.com wrote: Folks: I am seeing this exception in my logs that is causing my replication to fail. I start with a clean slate (empty data directory). I index the data on the postingsmaster using the dataimport handler and it succeeds. When the replication slave attempts to replicate it encounters this error.

Dec 7, 2009 9:20:00 PM org.apache.solr.handler.SnapPuller fetchLatestIndex
SEVERE: Master at: http://localhost/postingsmaster/replication is not available. Index fetch failed. Exception: Invalid version or the data in not in 'javabin' format

Any clues as to what I should look for to debug this further? Replication is enabled as follows. The postingsmaster solrconfig.xml looks as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <!-- Replicate on 'optimize'; it can also be 'commit' -->
    <str name="replicateAfter">commit</str>
    <!-- If configuration files need to be replicated give the names here, comma separated -->
    <str name="confFiles"></str>
  </lst>
</requestHandler>

The postings slave solrconfig.xml looks as follows:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- fully qualified url for the replication handler of master -->
    <str name="masterUrl">http://localhost/postingsmaster/replication</str>
    <!-- Interval at which the slave should poll master. Format is HH:mm:ss. If this is absent the slave does not poll automatically. But a snappull can be triggered from the admin or the http API -->
    <str name="pollInterval">00:05:00</str>
  </lst>
</requestHandler>

Thanks, - Bill

-- Noble Paul | Systems Architect | AOL | http://aol.com
Re: SolrQuerySyntax : Types of Range Queries in Solr 1.4
On Wed, Dec 9, 2009 at 1:13 PM, Yonik Seeley yo...@lucidimagination.com wrote: Solr standard query syntax is an extension of Lucene query syntax, and we reference that on the page: http://lucene.apache.org/java/2_4_0/queryparsersyntax.html -Yonik http://www.lucidimagination.com On Wed, Dec 9, 2009 at 1:08 PM, Israel Ekpo israele...@gmail.com wrote: Hi Guys, In Lucene 2.9 and Solr 1.4, it is possible to perform inclusive and exclusive range searches with square and curly brackets respectively. However, when I looked at the SolrQuerySyntax, only the include range search is illustrated. It seems like the examples only talk about the inclusive range searches. http://wiki.apache.org/solr/SolrQuerySyntax Illustrative example: There is a field in the index name 'year' and it contains the following values : 2000, 2004, 2005, 2006, 2007, 2008, 2009, 2010 year:[2005 TO 2009] will match 2005, 2006, 2007, 2008, 2009 [inclusive with square brackets] year:{2005 TO 2009} will only match 2006, 2007, 2008 {exclusive with curly brackets}. The bounds are not included. Is there any other page on the wiki where there are examples of exclusive range searches with curly brackets? If not I would like to know so that I can add some examples to the wiki. Thanks.

Hi Yonik, I saw that. I posted the question because someone asked me how to do the exclusive search where the bounds are excluded. Initially they started with field:[lower-1 TO upper-1] and then I just told them to use curly brackets, so when I came to the Solr wiki to do a search I did not see any examples with the curly brackets. For me this was very obvious, but I think it would be nice to add a few examples with curly brackets to the SolrQuerySyntax examples because most people that are using Solr for the very first time may not have heard of or used Lucene before.
Just a thought. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
Re: Selection of returned fields - dynamic fields?
: Unfortunately this does not seem to work for dynamic fields -

you can definitely ask for a field that exists because of a dynamicField by name, but you can't use wildcard style patterns in the fl param.

: fl=PREFIX* does not return anything, and neither does fl=*POSTFIX. : What seems to be missing from Solr is a removeField(FIELDNAME) method in : SolrJ, or a fl=-FIELDNAME query parameter to remove the fixed field. : : Is such a feature planned, or is there a workaround that I have missed?

There's been a lot of discussion about it over the years; the crux of the problem is that it's hard to come up with a good way of dealing with field names using meta characters that doesn't make it hard for people to actually use those metacharacters in their field names...

http://wiki.apache.org/solr/FieldAliasesAndGlobsInParams

-Hoss
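Since a field created by a dynamicField can be requested by its concrete name, one client-side workaround (not from the thread, and with purely hypothetical field names) is to enumerate the concrete names the client knows about and build an explicit fl value, dropping the bulky field:

```python
# Field names as the client might know them (e.g. gathered once from the
# Luke handler or hard-coded); all names here are hypothetical.
stored_fields = ["id", "title", "attr_color", "attr_size", "big_blob_field"]

def build_fl(fields, exclude=("big_blob_field",), prefix=None):
    """Build an explicit fl value: drop fields we never want returned,
    and optionally keep only one dynamic-field prefix (plus id)."""
    keep = [f for f in fields if f not in exclude]
    if prefix is not None:
        keep = [f for f in keep if f.startswith(prefix) or f == "id"]
    return ",".join(keep)

fl_all = build_fl(stored_fields)                      # everything but the bulky field
fl_attrs = build_fl(stored_fields, prefix="attr_")    # id plus the attr_* fields
```

The resulting string is passed as the fl request parameter (or via repeated addField calls in SolrJ), which sidesteps the missing glob/exclusion support on the server side.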
Re: Can we build complex filter queries in SOLR
Can you please let me know how to describe that condition. For example, let's say I want to give the following condition:

((category:audio OR category:video) AND (brand:sony OR brand:samsung OR brand:sanyo))

How would you represent this condition in the fq parameter of dismax?

<str name="fq">condition goes here</str>

is it represented in lucene syntax? Please let me know. darniz

Alessandro Ferrucci-3 wrote: yeah that is possible, I just tried on one of my solr instances... let's say you have an index of player names: (first-name:Tim AND last-name:Anderson) OR (first-name:Anwar AND last-name:Johnson) OR (conference:Mountain West) will give you the results that logically match this query. HTH. Alessandro Ferrucci :)

On 9/17/07, Dilip.TS dilip...@starmarksv.com wrote: Hi, I would like to know if we can build a complex filter queryString in SOLR using the following condition. (Field1 = abc AND Field2 = def) OR (Field3 = abcd AND Field4 = defgh AND (...)). so on... Thanks in advance Regards, Dilip TS
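To make the fq question concrete: the boolean expression goes into the fq parameter verbatim, in Lucene query syntax. A sketch of building such a request in Python; the host/port is the stock example setup and an assumption here:

```python
from urllib.parse import urlencode

# The filter condition from the question, written in Lucene syntax.
fq = "(category:audio OR category:video) AND (brand:sony OR brand:samsung OR brand:sanyo)"

params = urlencode({"q": "*:*", "fq": fq})
url = "http://localhost:8983/solr/select?" + params
```

The same string could equally be set as a default in the request handler's defaults list; either way it restricts which documents match without contributing to scoring.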
content stream/MLT
I'm trying to understand how content stream works with respect to MLT. I did a regular MLT query using a document ID and specifying two fields to do MLT on and got back a set of results. I then copied the xml for the document with the aforementioned ID and pasted it to a text file. Then I made the query with stream.file=mlt_doc.xml, but my result set was completely different and didn't really make sense. Am I not using content streams correctly here? Or does solr not use the schema when accepting a content stream? Thanks in advance, Mike
Re: content stream/MLT
The MoreLikeThis content stream support is implemented such that the content stream is simply text, analyzed as if it was in the mlt.fl. It doesn't handle Solr XML as you'd expect - simply treats it as a string and analyzes it to get the terms out. Erik On Dec 9, 2009, at 10:21 PM, Mike Anderson wrote: I'm trying to understand how content stream works with respect to MLT. I did a regular MLT query using a document ID and specifying two fields to do MLT on and got back a set of results. I then copied the xml for the document with the aforementioned ID and pasted it to a text file. Then I made the query with stream.file=mlt_doc.xml, but my result set was completely different and didn't really make sense. Am I not using content streams correctly here? Or does solr not use the schema when accepting a content stream? Thanks in advance, Mike
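A toy illustration of Erik's point: if the posted Solr XML is analyzed as plain text, the markup itself becomes terms, which is why the MLT result set looks unrelated. The naive word tokenizer below is purely illustrative, not Solr's actual analyzer:

```python
import re

doc_xml = '<doc><field name="title">solr replication guide</field></doc>'

def naive_terms(text):
    """Lowercase word tokens, roughly the way a plain-text analyzer
    would see the posted stream."""
    return re.findall(r"[a-z0-9]+", text.lower())

terms = naive_terms(doc_xml)
# Markup tokens like 'doc', 'field' and 'name' end up alongside the
# real content words, skewing the similarity query.
```

So to get the expected MLT behavior from a content stream, one would post just the raw field text, not the XML wrapper.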
Re: Stopping Starting
This would be a handy addition to solr-contrib. The further evolution we had is that sometimes java freezes and the 'stop' command does not work. It is better to use the 'stop' command than to kill the process, so we added a sleep command that gave it maybe 30 seconds to shut down and then hit it with 'kill'. 'pkill -f start.jar' is nice, wish we had known about it.

On Mon, Dec 7, 2009 at 2:35 PM, regany re...@newzealand.co.nz wrote: Lee Smith-6 wrote: So how can I stop and restart the service? Hope you can help get me going again. Thank you Lee

I found this shell script which works well for me...

#!/bin/sh -e
# Starts, stops, and restarts solr

SOLR_DIR=/usr/local/solr/example
JAVA_OPTIONS="-Xmx1024m -DSTOP.PORT=8079 -DSTOP.KEY=stopkey -jar start.jar"
LOG_FILE=/var/log/solr.log
JAVA=/usr/bin/java

case $1 in
    start)
        echo "Starting Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS 2> $LOG_FILE &
        ;;
    stop)
        echo "Stopping Solr"
        cd $SOLR_DIR
        $JAVA $JAVA_OPTIONS --stop
        ;;
    restart)
        $0 stop
        sleep 1
        $0 start
        ;;
    *)
        echo "Usage: $0 {start|stop|restart}" >&2
        exit 1
        ;;
esac

-- Lance Norskog goks...@gmail.com
does fq parameter effects boosting
Hello, can someone please answer this. Someone told me that using the fq parameter in the dismax handler might cause some relevancy and weighting issues. I haven't read this anywhere. Please let me know if this is the case. Thanks darniz
Re: does fq parameter effects boosting
fq's are filters and have no effect on the relevancy scores generated for documents. They only affect which documents are matched. -Yonik http://www.lucidimagination.com

On Wed, Dec 9, 2009 at 5:00 PM, darniz rnizamud...@edmunds.com wrote: Hello can somone please answer this. someone told me that using fq parameter in the dismax handler might cuase some relevancy and weighting issues. I haven't read this anywhere. Please let me know if this is the case. Thanks darniz
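A toy model of Yonik's point, with made-up document ids and scores: the fq only restricts the candidate set, and the score each surviving document gets from the main query is unchanged:

```python
# Hypothetical relevancy scores produced by the main query (q).
q_scores = {"doc1": 3.2, "doc2": 1.5, "doc3": 0.9}

# A filter query is just a set of allowed documents; it carries no scores.
fq_matches = {"doc1", "doc3"}

# Final results: intersect with the filter, keep the q-derived scores as-is.
results = {doc: score for doc, score in q_scores.items() if doc in fq_matches}
```

This also bears on the caching worry raised later in the thread: since a filter holds only the matching document set and no score information, reusing a cached filter cannot change how the main query scores documents.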
RE: SolrPlugin Guidance
Thanks for the response. I went ahead and gave it a shot. In my case, the directory name may not be unique, so if I get multiple ids back then I create a BooleanQuery (Occur.SHOULD) to substitute for the directory name query. This seems to work at the moment, so hopefully that's the right approach. Thanks, Laurent Vauthrin

-----Original Message----- From: solr-user-return-30054-laurent.vauthrin=disney@lucene.apache.org On Behalf Of Chris Hostetter Sent: Monday, December 07, 2009 12:17 PM To: solr-user@lucene.apache.org Subject: RE: SolrPlugin Guidance

: e.g. For the following query that looks for a file in a directory: : q=+directory_name:myDirectory +file_name:myFile : : We'd need to decompose the query into the following two queries: : 1. q=+directory_name:myDirectory&fl=directory_id : 2. q=+file_name:myFile +directory_id:(results from query #1) : : I guess I'm looking for the following feedback: : - Does this sound crazy?

it's a little crazy, but not absurd.

: - Is the QParser the right place for this logic? If so, can I get a : little more guidance on how to decompose the queries there (filter : queries maybe)?

a QParser could work. (and in general, if you can solve something with a QParser that's probably for the best, since it allows the most reuse). but exactly how to do it depends on how many results you expect from your first query: if you are going to structure things so they have to uniquely id a directory, and you'll have a single ID, then this is something that could easily make sense in a QParser (you are essentially just rewriting part of the query from string to id -- you just happen to be using solr as a lookup table for those strings). but if you plan to support any arbitrary N directories, then you may need something more complicated ...
straight filter queries won't help much because you'll want the union instead of hte intersection, so for every directoryId you find, use it as a query to get a DocSet and then maintain a running union of all those DocSets to use as your final filter (hmm... that may not actually be possible with the QParser API ... i haven't look at ti in a while, but for an approach like this you may beed to subclass QueryComponent instead) -Hoss
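The running-union-of-DocSets idea above can be sketched in plain Java. Nothing here uses Solr classes: a BitSet stands in for a DocSet, and the per-directory doc-id arrays are invented for illustration.

```java
import java.util.BitSet;
import java.util.List;

public class DocSetUnion {
    // Union the per-directory result sets into one filter set,
    // mirroring the "running union of all those DocSets" approach.
    static BitSet unionFilter(List<int[]> perDirectoryDocs, int maxDoc) {
        BitSet filter = new BitSet(maxDoc);
        for (int[] docs : perDirectoryDocs) {
            for (int doc : docs) {
                filter.set(doc);
            }
        }
        return filter;
    }

    public static void main(String[] args) {
        // Two hypothetical directories matched doc ids {1, 4} and {4, 7}.
        BitSet f = unionFilter(List.of(new int[]{1, 4}, new int[]{4, 7}), 10);
        System.out.println(f); // prints {1, 4, 7}
    }
}
```

The final BitSet is then what you would apply as the filter for the second query, which is why an intersection-style fq alone does not fit here.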
Re: does fq parameter effects boosting
Thanks Yonik. The question I was asking was that since filter queries are cached, if I change the relevancy model the cached queries won't be returned. Correct me if I am wrong. Yonik Seeley-2 wrote: fq's are filters and have no effect on the relevancy scores generated for documents. They only affect which documents are matched. -Yonik http://www.lucidimagination.com On Wed, Dec 9, 2009 at 5:00 PM, darniz rnizamud...@edmunds.com wrote: Hello, can someone please answer this. Someone told me that using the fq parameter in the dismax handler might cause some relevancy and weighting issues. I haven't read this anywhere. Please let me know if this is the case. Thanks darniz -- View this message in context: http://old.nabble.com/does-fq-parameter-effects-boosting-tp26718016p26718016.html Sent from the Solr - User mailing list archive at Nabble.com. -- View this message in context: http://old.nabble.com/does-fq-parameter-effects-boosting-tp26718016p26719680.html Sent from the Solr - User mailing list archive at Nabble.com.
full-text indexing XML files
Hi! I am trying to full-text index an XML file. For various reasons, I cannot use Tika or other technology to parse the XML file. The requirement is to full-text index the XML file, including tags and everything. So, I created an input index spec like this: <add> <doc> <field name="id">1001</field> <field name="name">NASA Advanced Research Labs</field> <field name="address">1010 Main Street, Chattanooga, FL 32212</field> <field name="content"><listing><id>1001</id><name>NASA Advanced Research Labs</name><address>1010 main street, chattanooga, FL 32212</address></listing></field> </doc> </add> When I try to pump this into Solr with java -jar post.jar I get an exception saying: SimplePostTool: version 1.2 SimplePostTool: WARNING: Make sure your XML documents are encoded in UTF-8, other encodings are not currently supported SimplePostTool: POSTing files to http://localhost:8983/solr/update.. SimplePostTool: POSTing file index.doc SimplePostTool: FATAL: Solr returned an error: unexpected_XML_tag_doclisting Any idea what I am doing wrong? Does the Solr index generator support inner XML content in its field tags? I tried enclosing the inner XML in <![CDATA[ ]]> but that didn't work either. Any help appreciated. Thanks Feroze.
Re: does fq parameter effects boosting
On Wed, Dec 9, 2009 at 6:37 PM, darniz rnizamud...@edmunds.com wrote: The question I was asking was that since filter queries are cached, if I change the relevancy model the cached queries won't be returned. Not sure I understand the question... is there something that you think that Solr won't handle properly? Or is there something that you want Solr to handle differently? If not, assume Solr does the right thing and report back to us if it doesn't :-) -Yonik http://www.lucidimagination.com
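To make the point concrete, here is a toy model in plain Java (no Solr code, and not Solr's actual implementation) of why a cached filter cannot go stale when scoring changes: the filter cache holds only doc-id sets, and ranking is applied separately to whatever survives the intersection.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class FilterCacheModel {
    // fq string -> set of matching doc ids. No scores are stored,
    // so changing the relevancy model cannot invalidate an entry.
    private final Map<String, Set<Integer>> cache = new HashMap<>();

    Set<Integer> filterDocs(String fq, Set<Integer> matches) {
        return cache.computeIfAbsent(fq, k -> new HashSet<>(matches));
    }

    // Intersect the main-query matches with the cached filter set;
    // scoring/ranking of the survivors happens elsewhere, from q alone.
    Set<Integer> search(Set<Integer> queryMatches, String fq, Set<Integer> fqMatches) {
        Set<Integer> hits = new HashSet<>(queryMatches);
        hits.retainAll(filterDocs(fq, fqMatches));
        return hits;
    }
}
```

The field/filter strings here are made up; the point is only that a cached filter is a set membership test, not a set of scores.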
Re: Hi. What Configuration we require?
Hi. To run a Solr-1.3.0 with Data/index directory size of 11GB, 80 lakhs documents and 11 lakhs read request and 30 thousand writes. Every month 200mb of index directory size getting increase. Please suggest me. What type of configuration(CPU, Ram, hard disk) server require to make the solr as Stable. Thanks, Kalidoss.m, Get your world in your inbox! Mail, widgets, documents, spreadsheets, organizer and much more with your Sifymail WIYI id! Log on to http://www.sify.com ** DISCLAIMER ** Information contained and transmitted by this E-MAIL is proprietary to Sify Limited and is intended for use only by the individual or entity to which it is addressed, and may contain information that is privileged, confidential or exempt from disclosure under applicable law. If this is a forwarded message, the content of this E-MAIL may not have been sent with the authority of the Company. If you are not the intended recipient, an agent of the intended recipient or a person responsible for delivering the information to the named recipient, you are notified that any use, distribution, transmission, printing, copying or dissemination of this information in any way or in any manner is strictly prohibited. If you have received this communication in error, please delete this mail and notify us immediately at ad...@sifycorp.com
UI for solr core admin?
I assume there isn't one? Anything in the works?
Re: UI for solr core admin?
On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I assume there isn't one? Anything in the works? Nope. -- Regards, Shalin Shekhar Mangar.
Re: UI for solr core admin?
Hi Jason, Patches welcome, though! :) Cheers, Chris On 12/9/09 10:31 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I assume there isn't one? Anything in the works? Nope. -- Regards, Shalin Shekhar Mangar. ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Hi. What Configuration we require?
On Wed, Dec 9, 2009 at 5:36 PM, kalidoss kalidoss.muthuramalin...@sifycorp.com wrote: Hi To run a Solr-1.3.0 with Data/index directory size of 11GB, 80 lakhs documents and 11 lakhs read request and 30 thousand writes. Every month 200mb of index directory size getting increase. 11 lakh read requests and 30 thousand write requests within how much time? Please suggest me. What type of configuration(CPU, Ram, hard disk) server require to make the solr as Stable. In general, having enough RAM for Solr caches as well as the OS for the file caches is good. Fast IO helps too. You'd most likely go for a master/slave deployment in production. We use boxes with quad cores, 16 gig RAM, SCSI disks. YMMV. -- Regards, Shalin Shekhar Mangar.
Re: full-text indexing XML files
On Thu, Dec 10, 2009 at 5:13 AM, Feroze Daud fero...@zillow.com wrote: Hi! I am trying to full-text index an XML file. For various reasons, I cannot use Tika or other technology to parse the XML file. The requirement is to full-text index the XML file, including tags and everything. So, I created an input index spec like this: <add> <doc> <field name="id">1001</field> <field name="name">NASA Advanced Research Labs</field> <field name="address">1010 Main Street, Chattanooga, FL 32212</field> <field name="content"><listing><id>1001</id><name>NASA Advanced Research Labs</name><address>1010 main street, chattanooga, FL 32212</address></listing></field> </doc> </add> You need to XML encode the value of the content field. -- Regards, Shalin Shekhar Mangar.
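A minimal illustration of what "XML encode" means here, in plain Java (production code should use a proper XML library, or SolrJ, which escapes field values for you). Escaping turns the inner markup into plain text that Solr indexes instead of parsing:

```java
public class XmlEscape {
    // Minimal escaping of the characters that break XML field values.
    // Order matters: '&' must be replaced before introducing new entities.
    static String escape(String s) {
        return s.replace("&", "&amp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;");
    }

    public static void main(String[] args) {
        String inner = "<listing><id>1001</id></listing>";
        System.out.println("<field name=\"content\">" + escape(inner) + "</field>");
    }
}
```

After escaping, the content field's value contains `&lt;listing&gt;...` rather than nested tags, which is what the update handler expects.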
Re: UI for solr core admin?
Just a note about the hidden gem that I haven't taken as far as I'd like... With the VelocityResponseWriter, it's as easy as creating a Velocity template (and wiring in VwR in solrconfig, which I'll set up by default in 1.5). It will even default to the template named after the handler name, so all you have to do is wt=velocity. Erik On Dec 10, 2009, at 7:33 AM, Mattmann, Chris A (388J) wrote: Hi Jason, Patches welcome, though! :) Cheers, Chris On 12/9/09 10:31 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I assume there isn't one? Anything in the works? Nope. -- Regards, Shalin Shekhar Mangar.
Re: Hi. What Configuration we require?
Thanks Shalin Shekhar. 11 lakh read requests and 30 thousand write requests within how much time? Per day average of 11 lakh read requests and 30 thousand write requests. The system configuration is 4GB RAM and 4 core x 2 CPUs. Are you suggesting we increase the configuration? -Kalidoss.m, Shalin Shekhar Mangar wrote: On Wed, Dec 9, 2009 at 5:36 PM, kalidoss kalidoss.muthuramalin...@sifycorp.com wrote: Hi To run a Solr-1.3.0 with Data/index directory size of 11GB, 80 lakhs documents and 11 lakhs read request and 30 thousand writes. Every month 200mb of index directory size getting increase. 11 lakh read requests and 30 thousand write requests within how much time? Please suggest me. What type of configuration(CPU, Ram, hard disk) server require to make the solr as Stable. In general, having enough RAM for Solr caches as well as the OS for the file caches is good. Fast IO helps too. You'd most likely go for a master/slave deployment in production. We use boxes with quad cores, 16 gig RAM, SCSI disks. YMMV.
Re: UI for solr core admin?
Nice, Erik! Cheers, Chris On 12/9/09 10:39 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Just a note about the hidden gem that I haven't taken as far as I'd like... With the VelocityResponseWriter, it's as easy as creating a Velocity template (and wiring in VwR in solrconfig, which I'll set up by default in 1.5). It will even default to the template named after the handler name, so all you have to do is wt=velocity. Erik On Dec 10, 2009, at 7:33 AM, Mattmann, Chris A (388J) wrote: Hi Jason, Patches welcome, though! :) Cheers, Chris On 12/9/09 10:31 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I assume there isn't one? Anything in the works? Nope. -- Regards, Shalin Shekhar Mangar. ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
Re: Hi. What Configuration we require?
On Thu, Dec 10, 2009 at 12:18 PM, kalidoss kalidoss.muthuramalin...@sifycorp.com wrote: Thanks Shalin Shekhar. 11 lakh read requests and 30 thousand write requests within how much time? Per day average of 11 lakh read requests and 30 thousand write requests. The system configuration is 4GB RAM and 4 core x 2 CPUs. are you suggesting us to increase the configuration? 4GB RAM for an 11GB index seems to be on the low side. It would be best to benchmark performance on your data with the queries you expect to be made. -- Regards, Shalin Shekhar Mangar.
Re: Can we build complex filter queries in SOLR
On Thu, Dec 10, 2009 at 2:50 AM, darniz rnizamud...@edmunds.com wrote: Can you please let me know how to describe that condition. For example, let's say I want to give the following condition ((category:audio or category:video) AND (brand:sony OR brand:samsung OR brand:sanyo)) How would you represent this condition in the fq parameter of dismax Are you saying that the above syntax does not work in an fq? Note, the "or" should be in capitals: OR. -- Regards, Shalin Shekhar Mangar.
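For completeness, a small Java sketch of sending that condition as an fq parameter. The field names come from the thread; the main query value and request path are made up for illustration, and the condition uses uppercase operators as required by the query parser:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FqParam {
    // The boolean condition from the thread, with OR/AND in capitals.
    static String fq() {
        return "(category:audio OR category:video)"
             + " AND (brand:sony OR brand:samsung OR brand:sanyo)";
    }

    public static void main(String[] args) {
        // Hypothetical request: q and the handler path are invented here.
        String url = "/select?q=tv&fq="
                + URLEncoder.encode(fq(), StandardCharsets.UTF_8);
        System.out.println(url);
    }
}
```

With SolrJ you would pass the same string via addFilterQuery on a SolrQuery instead of URL-encoding it by hand.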
Re: UI for solr core admin?
After I sent that, though, I realized that the core admin is special in that it isn't within the context of a single core. I'll have to research this and see, but I suspect there may be an issue with using VwR for this particular handler, as it wouldn't have a solr-home/conf/velocity directory to pull templates from. I'll look into it. Erik On Dec 10, 2009, at 7:51 AM, Mattmann, Chris A (388J) wrote: Nice, Erik! Cheers, Chris On 12/9/09 10:39 PM, Erik Hatcher erik.hatc...@gmail.com wrote: Just a note about the hidden gem that I haven't taken as far as I'd like... With the VelocityResponseWriter, it's as easy as creating a Velocity template (and wiring in VwR in solrconfig, which I'll set up by default in 1.5). It will even default to the template named after the handler name, so all you have to do is wt=velocity. Erik On Dec 10, 2009, at 7:33 AM, Mattmann, Chris A (388J) wrote: Hi Jason, Patches welcome, though! :) Cheers, Chris On 12/9/09 10:31 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: On Thu, Dec 10, 2009 at 11:52 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: I assume there isn't one? Anything in the works? Nope. -- Regards, Shalin Shekhar Mangar. ++ Chris Mattmann, Ph.D. Senior Computer Scientist NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA Office: 171-266B, Mailstop: 171-246 Email: chris.mattm...@jpl.nasa.gov WWW: http://sunset.usc.edu/~mattmann/ ++ Adjunct Assistant Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++
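For anyone following along, wiring the VelocityResponseWriter into solrconfig.xml looks roughly like the fragment below. The class name here is recalled from the 1.4-era contrib and the package has moved between releases, so treat it as an assumption and check the jar in your distribution:

```xml
<!-- Register the writer so wt=velocity works; the class name
     may differ in your Solr version. -->
<queryResponseWriter name="velocity"
    class="org.apache.solr.request.VelocityResponseWriter"/>
```

The contrib's velocity jar (and its dependencies) also needs to be on Solr's lib path for this to load.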
Re: copyField question
On Wed, Dec 9, 2009 at 11:43 PM, P Franks pfranks...@gmail.com wrote: All, Can one use the copyField option and copy a TextField field into a LongField field? I have some data that I want to extract (filter) out all but the long and/or integer values. No, that won't work. It'd be best to use a TokenFilter which removes characters and just keeps the integer/long values. But you still won't be able to use the LongField because it is not analyzed (so your token filters will not be applied). -- Regards, Shalin Shekhar Mangar.
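A quick Java illustration of the kind of extraction such a filter chain would perform (this is ordinary string processing for clarity, not a real Solr TokenFilter):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongExtractor {
    private static final Pattern NUM = Pattern.compile("-?\\d+");

    // Keep only the integer/long tokens, dropping everything else,
    // which is what the suggested TokenFilter would do per token stream.
    static List<Long> extract(String text) {
        List<Long> out = new ArrayList<>();
        Matcher m = NUM.matcher(text);
        while (m.find()) {
            out.add(Long.parseLong(m.group()));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(extract("order 42 shipped on day 7")); // prints [42, 7]
    }
}
```

In schema terms the equivalent would live in an analyzed text field type, since, as noted above, LongField itself bypasses analysis.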
Re: Concurrent access to EmbeddedSolrServer
On Wed, Dec 9, 2009 at 11:17 PM, Jon Poulton jon.poul...@vyre.com wrote: Hi there, I'm about to start implementing some code which will access a Solr instance via a ThreadPool concurrently. I've been looking at the solrj API docs ( particularly http://lucene.apache.org/solr/api/index.html?org/apache/solr/client/solrj/embedded/EmbeddedSolrServer.html) and I just want to make sure what I have in mind makes sense. The Javadoc is a bit sparse, so I thought I'd ask a couple of questions here. 1) I'm assuming that EmbeddedSolrServer can be accessed concurrently by several threads at once for add, delete and query operations (on the SolrServer parent interface). Is that right? I don't have to enforce single-threaded access? Yes. It is thread-safe. 2) What happens if multiple threads simultaneously call commit? 3) What happens if multiple threads simultaneously call optimize? For both #2 and #3 - The requests will be queued. As a best practice, consider committing only when necessary (preferably, once at the end). 4) Both commit and optimise have optional parameters called waitFlush and waitSearcher. These are undocumented in the Javadoc. What do they signify? See http://wiki.apache.org/solr/UpdateXmlMessages#Optional_attributes_for_.22commit.22_and_.22optimize.22 -- Regards, Shalin Shekhar Mangar.
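The access pattern being discussed can be sketched without any Solr dependencies; here a concurrent queue stands in for the (thread-safe) server, and a hypothetical indexAll helper shows many writer threads with a single commit at the end, per the best practice above:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ConcurrentIndexing {
    // Submit docCount adds from a thread pool, then "commit" once.
    // The queue is a stand-in for EmbeddedSolrServer, which handles
    // its own synchronization for add/delete/query.
    static int indexAll(int docCount) {
        Queue<String> server = new ConcurrentLinkedQueue<>();
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < docCount; i++) {
            final int id = i;
            pool.submit(() -> server.add("doc-" + id)); // concurrent adds are safe
        }
        pool.shutdown();
        try {
            pool.awaitTermination(30, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        // Single commit at the end instead of one per add.
        return server.size();
    }
}
```

With real SolrJ code the shape is the same: worker threads call add, and one coordinator calls commit after the pool drains.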