Re: Solr via ruby
On Sep 18, 2009, at 1:09 AM, rajan chandi wrote:

We are planning to use the external Solr on Tomcat for scalability reasons. We thought that EmbeddedSolrServer uses HTTP too to talk with Ruby and vice versa, as in the acts_as_solr Ruby plugin.

EmbeddedSolrServer is a way to run Solr as an API (like Lucene) rather than with any web container involved at all. In other words, only Java can use EmbeddedSolrServer (which means JRuby works great). The acts_as_solr plugin uses the solr-ruby library to communicate with Solr. Under solr-ruby, it's HTTP with ruby (wt=ruby) formatted responses for searches, and documents being indexed get converted to Solr's XML format and POSTed to the Solr URL used to open the Solr::Connection.

	Erik

If Ruby is not using HTTP to talk to EmbeddedSolrServer, what is it using?

Thanks and Regards
Rajan Chandi

On Thu, Sep 17, 2009 at 9:44 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

On Sep 17, 2009, at 11:40 AM, Ian Connor wrote:

Is there any support for connection pooling or a more optimized data exchange format?

The solr-ruby library (as do other Solr + Ruby libraries) uses the ruby response format and evals it. solr-ruby supports keeping the HTTP connection alive too.

We are looking at any further ways to optimize the Solr queries so we can possibly make more of them in the one request. The JSON-like format seems pretty tight, but I understand that when a distributed search takes place it uses a binary protocol instead of text. I wanted to know if that was available, or could be available, via the Ruby library. Is it possible to host a local shard and skip HTTP between Ruby and Solr?

If you use JRuby you can do some fancy stuff, like use the javabin update and response formats so no XML is involved, and you could also use Solr's EmbeddedSolrServer to avoid HTTP. However, in practice HTTP is rarely the bottleneck, and it actually offers a lot of advantages, such as easy commodity load balancing and caching. But JRuby + Solr is a very beautiful way to go!
If you're using MRI Ruby, though, you don't really have any options other than to go over HTTP. You could use json or ruby formatted responses - I'd be curious to see some performance numbers comparing those two. Erik
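To make the wt=ruby mechanism concrete, here is a minimal sketch of what a Ruby client does with the ruby response format. The response string below is canned (field values are made up for illustration); a real client would fetch it over HTTP from /select?q=...&wt=ruby.

```ruby
# Hypothetical canned wt=ruby response; the format is a literal Ruby hash,
# so the client can simply eval it instead of parsing XML.
raw = "{'responseHeader'=>{'status'=>0,'QTime'=>2}," +
      "'response'=>{'numFound'=>1,'start'=>0," +
      "'docs'=>[{'id'=>'doc1','title'=>'hello'}]}}"

data = eval(raw)  # turn the response text directly into Ruby data

puts data['response']['numFound']            # => 1
puts data['response']['docs'].first['id']    # => doc1
```

(eval of untrusted input is only safe here because the response comes from your own Solr server.)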
Re: multicore shards and relevancy score
On Sep 17, 2009, at 7:11 PM, Lance Norskog wrote:

This looks like a Ruby client bug.

Maybe, but I doubt it in this case. But let's have some details of the Ruby code used to make the request, and what gets logged on the first Solr server for the request.

	Erik

If you do the same query with the HTTP url, it should work.

On Tue, Sep 15, 2009 at 7:41 AM, Paul Rosen p...@performantsoftware.com wrote:

Shalin Shekhar Mangar wrote:

On Tue, Sep 15, 2009 at 2:39 AM, Paul Rosen p...@performantsoftware.com wrote:

I've done a few experiments with searching two cores with the same schema using the shard syntax (using Solr 1.3). My use case is that I want to have multiple cores because a few different people will be managing the indexing, and that will happen at different times. The data, however, is homogeneous.

Multiple cores were not built for distributed search. It is inefficient as compared to a single index. But if you want to use them that way, that's your choice.

Well, I'm experimenting with them because it will simplify index maintenance greatly. I am beginning to think that it won't work in my case, though.

I've noticed in my tests that the results are not interwoven, but it might just be my test data. In other words, all the results from one core appear, then all the results from the other core. In thinking about it, it would make sense if the relevancy scores for each core were completely independent of each other. And that would mean that there is no way to compare the relevancy scores between the cores.

In other words, I'd like the following results:

- really relevant hit from core0
- pretty relevant hit from core1
- kind of relevant hit from core0
- not so relevant hit from core1

but I get:

- really relevant hit from core0
- kind of relevant hit from core0
- pretty relevant hit from core1
- not so relevant hit from core1

So, are the results supposed to be interwoven, and I need to study my data more, or is this just not something that is possible?
The only difference wrt relevancy between a distributed search and a single-node search is that there is no distributed IDF, and therefore a distributed search assumes a random distribution of terms among shards. I'm not sure if that is what you are seeing.

Also, if this is insurmountable, I've discovered two show stoppers that will prevent using multicore in my project (counting the lack of support for faceting in multicore). Are these issues addressed in Solr 1.4?

Can you give more details on what these two issues are?

The first issue is detailed above, where the results from a search over two shards don't appear to be returned in relevancy order. The second issue was detailed in an email last week, "shards and facet count". The facet information is lost when doing a search over two shards, so if I use multicore, I can no longer have facets.

--
Lance Norskog
goks...@gmail.com
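For reference, a distributed query is just an ordinary request with a shards parameter listing the cores; the server you hit queries the others and merges results by score. A sketch of composing such a request by hand (host, port, and core names here are made up):

```ruby
require 'cgi'

# Hypothetical two-core setup; the shards parameter is a comma-separated
# list of host:port/path entries, without the http:// prefix.
cores = ['localhost:8983/solr/core0', 'localhost:8983/solr/core1']

params = {
  'q'      => 'rossetti',
  'shards' => cores.join(','),  # the receiving core fans out and merges
  'wt'     => 'ruby'
}

# Build the query string for GET /select?...
query_string = params.sort.map { |k, v| "#{k}=#{CGI.escape(v)}" }.join('&')
puts query_string
```

Since each shard scores with only its own local IDF, the merged ordering can look "blocky" when term statistics differ a lot between shards.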
Exact word search in Solr
Hi,

I am doing exact word search in Solr 1.3 and I am not getting the expected results. I am giving you the sample XML file along with the mail from which search results are fetched. The following steps were followed to achieve exact word search results in Solr.

1) schema.xml is configured for the title, url and description fields:

   <field name="url" type="string" indexed="true" stored="true" required="true"/>
   <field name="title" type="text" indexed="true" stored="true" required="true"/>
   <field name="description" type="text" indexed="true" stored="true" required="true"/>

   Commented out the lines below:

   <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->
   <!-- <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/> -->

2) Started the Solr server
3) Indexed sample data with title, url and description
4) Assume I am giving (say) "channelone" as my input search string for exact word search in the Solr admin page. I am getting the following output. It should show output pertaining to "channelone" only; it should not display combinations of words with "channelone". I am not looking for a case-sensitive search here.
   <doc>
     <field name="url">http://c2search1/contactus3.html</field>
     <field name="title">c2Search1: Contactus3</field>
     <field name="description">channelOne</field>
   </doc>
   <doc>
     <field name="url">http://c2search1/contactus4.html</field>
     <field name="title">c2Search1: Contactus4</field>
     <field name="description">Channelone</field>
   </doc>
   <doc>
     <field name="url">http://c2search1/contactus5.html</field>
     <field name="title">c2Search1: Contactus5</field>
     <field name="description">channel...@$</field>
   </doc>
   <doc>
     <field name="url">http://c2search1/contactus6.html</field>
     <field name="title">c2Search1: Contactus6</field>
     <field name="description">channelon...@$</field>
   </doc>
   <doc>
     <field name="url">http://c2search1/contactus7.html</field>
     <field name="title">c2Search1: Contactus7</field>
     <field name="description">channelon...@$ab</field>
   </doc>

Expected result:

   <doc>
     <field name="url">http://c2search1/contactus3.html</field>
     <field name="title">c2Search1: Contactus3</field>
     <field name="description">channelOne</field>
   </doc>
   <doc>
     <field name="url">http://c2search1/contactus4.html</field>
     <field name="title">c2Search1: Contactus4</field>
     <field name="description">Channelone</field>
   </doc>

Please help me with the above scenario to achieve the desired output.

Regards
Bhaskar
Multicore Solr + Tomcat
Hello,

I have set up Tomcat 6 and Solr 1.3.0 and it works fine for single cores. Now I am trying to make it multicore and the cores don't seem to be recognized.

This works:

   /solr/home/conf/schema.xml
   /solr/home/conf/solrconfig.xml
   /solr/home/data/

Clicking the admin link on the "Welcome to Solr" page brings up the familiar Solr admin page at http://localhost:8080/apache-solr-1.3.0/admin/

Changing the setup to multicore like this does not work:

   /solr/home/core1/conf/schema.xml
   /solr/home/core1/conf/solrconfig.xml
   /solr/home/core1/data/
   /solr/home/core2/conf/schema.xml
   /solr/home/core2/conf/solrconfig.xml
   /solr/home/core2/data/
   /solr/home/solr.xml

In solr.xml I have:

   <solr persistent="false" sharedLib="lib">
     <cores adminPath="/admin/cores">
       <core name="core1" instanceDir="core1"/>
       <core name="core2" instanceDir="core2"/>
     </cores>
   </solr>

Clicking the admin link brings up an error message at http://localhost:8080/apache-solr-1.3.0/admin/

   HTTP Status 404 - missing core name in path
   The requested resource (missing core name in path) is not available.
Manually editing the URL to http://localhost:8080/apache-solr-1.3.0/admin/cores leads to:

   HTTP Status 500 - Can not find a valid core for the cores admin handler
   java.lang.RuntimeException: Can not find a valid core for the cores admin handler
     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:162)
     at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
     at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
     at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
     at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
     at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
     at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
     at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
     at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
     at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
     at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
     at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
     at java.lang.Thread.run(Unknown Source)

And adding core names to the path at one position or another also brings up 404 errors. Any hints on what to look for are greatly appreciated.

Thanks,
Rene
Question on omitNorms definition
Hello,

A rather trivial question on the omitNorms parameter in schema.xml. The out-of-the-box schema.xml uses this parameter both within the fieldType tag and the field tag. If we define omitNorms in the fieldType definition, will it hold good for all fields that are defined using the same fieldType? For example:

   <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
   <dynamicField name="*" type="string" indexed="true" stored="true"/>

Now, will these dynamic fields have omitNorms=true? I have read about significant RAM usage when omitNorms is not set to true, hence I would like to ensure that it is set to true for most of my fields.

Regards
Rahul
Re: shards and facet_count
On Sep 17, 2009, at 6:14 PM, Lance Norskog wrote:

Yes. facet=false means don't do any faceting. This is why you don't get any facet data back. This is probably a bug in the solr-ruby code. Version number 0.0.x is probably a hint about its production-ready status :)

Actually solr-ruby is plenty suitable for production use - it's a pretty straightforward mapping from Rubyish stuff to Solr requests. Not really much magic in there. Its 0.0.x version number is really my own laziness (or swampedness) in getting it polished into a form I'd like (I'm a perfectionist with no time on his hands, a frustrating existence). The RSolr API has some features I'd like to pull over into solr-ruby, and I'll do some refactoring one of these days in my copious free time, but in general solr-ruby works fine.

It is strange that you get facet=false calls in there, but maybe this is just normal distributed search protocol in one of the phases?

	Erik

On Mon, Sep 14, 2009 at 6:46 AM, Paul Rosen p...@performantsoftware.com wrote:

Shalin Shekhar Mangar wrote:

On Fri, Sep 11, 2009 at 2:35 AM, Paul Rosen p...@performantsoftware.com wrote:

Hi again,

I've mostly gotten the multicore working except for one detail. (I'm using Solr 1.3 and solr-ruby 0.0.6 in a Rails project.)

I've done a few queries and I appear to be able to get hits from either core. (yeah!)

I'm forming my request like this:

   req = Solr::Request::Standard.new(
     :start => start,
     :rows => max,
     :sort => sort_param,
     :query => query,
     :filter_queries => filter_queries,
     :field_list => @field_list,
     :facets => {:fields => @facet_fields, :mincount => 1, :missing => true, :limit => -1},
     :highlighting => {:field_list => ['text'], :fragment_size => 600},
     :shards => @cores)

If I leave :shards => @cores out, then the response includes:

   'facet_counts' => {
     'facet_dates' => {},
     'facet_queries' => {},
     'facet_fields' => { 'myfacet' => [ etc...], etc... }
   }

which is what I expect.
If I add the :shards => @cores back in (so that I'm doing the exact request above), I get:

   'facet_counts' => {
     'facet_dates' => {},
     'facet_queries' => {},
     'facet_fields' => {}
   }

so I've lost my facet information. Why would it correctly find my documents, but not report the facet info?

I'm not a ruby guy but the response format in both cases is exactly the same, so I don't think there is any problem with the ruby client parsing. Can you check the Solr logs to see if there were any exceptions when you sent the shards parameter?

I don't see any exceptions. The Solr activity is pretty different for the two cases. Without the shards, it makes one call that looks something like this (I ellipsed the id and field parameters for clarity):

   Sep 14, 2009 9:32:09 AM org.apache.solr.core.SolrCore execute
   INFO: [resources] webapp=/solr path=/select params={facet.limit=-1&wt=ruby&rows=30&start=0&facet=true&facet.mincount=1&q=(rossetti)&fl=archive,...,license&qt=standard&facet.missing=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true} hits=27 status=0 QTime=6

Note that facet=true.
With the shards, it has five lines for the single call that I make:

   Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
   INFO: [exhibits] webapp=/solr path=/select params={wt=javabin&rows=30&start=0&facet=true&fl=uri,score&q=(rossetti)&version=2.2&isShard=true&facet.missing=true&hl.fl=text&fsv=true&hl.fragsize=600&facet.field=genre&facet.field=archive&facet.field=freeculture&hl=false} hits=6 status=0 QTime=0
   Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
   INFO: [resources] webapp=/solr path=/select params={wt=javabin&rows=30&start=0&facet=true&fl=uri,score&q=(rossetti)&version=2.2&isShard=true&facet.missing=true&hl.fl=text&fsv=true&hl.fragsize=600&facet.field=genre&facet.field=archive&facet.field=freeculture&hl=false} hits=27 status=0 QTime=3
   Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
   INFO: [resources] webapp=/solr path=/select params={facet.limit=-1&wt=javabin&rows=30&start=0&ids=...,...&facet=false&facet.mincount=1&q=(rossetti)&fl=archive,...,uri&version=2.2&facet.missing=true&isShard=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true} status=0 QTime=35
   Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
   INFO: [exhibits] webapp=/solr path=/select params={facet.limit=-1&wt=javabin&rows=30&start=0&ids=...,...&facet=false&facet.mincount=1&q=(rossetti)&fl=archive,...,uri&version=2.2&facet.missing=true&isShard=true&hl.fl=text&facet.field=genre&facet.field=archive&facet.field=freeculture&hl.fragsize=600&hl=true} status=0 QTime=41
   Sep 14, 2009 9:37:18 AM org.apache.solr.core.SolrCore execute
   INFO: [resources] webapp=/solr path=/select
Re: Exact word search in Solr
Hi,

I am doing exact word search in Solr 1.3 and I am not getting the expected results. I am giving you the sample XML file along with the mail from where search results are fetched. The following steps were followed to achieve exact word search results in Solr.

You can simply use the fieldType below to achieve this:

   <fieldType name="text_ws" class="solr.TextField" positionIncrementGap="100">
     <analyzer>
       <tokenizer class="solr.WhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldType>

Note that there is no WordDelimiterFilterFactory in this type, but yours probably has it.

Hope this helps.
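To see why that fieldType gives the expected result, here is a rough plain-Ruby simulation (not Solr itself) of what the whitespace-tokenize-then-lowercase chain does, using the description values from the question:

```ruby
# Simulate WhitespaceTokenizerFactory + LowerCaseFilterFactory:
# split on whitespace, lowercase each token.
def analyze(text)
  text.split(/\s+/).map(&:downcase)
end

# Sample description values borrowed from the question
docs  = ['channelOne', 'Channelone', 'channelone extra words', 'channel one']
query = 'channelone'

# A doc matches only if one of its whole tokens equals the query token
hits = docs.select { |d| analyze(d).include?(query.downcase) }
puts hits.inspect
```

Because nothing splits words on case changes or punctuation (no WordDelimiterFilterFactory), "channelOne" and "Channelone" match but "channel one" does not.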
Re: Searching with or without diacritics
Hi,

Thanks for the suggestions; perhaps I am closer to the goal, but I still don't get the result. I would like to find accented characters (mapped by the MappingCharFilterFactory) by writing unaccented queries. On this page: http://issues.ez.no/IssueView.php?Id=14742&activeItem=2 I've found that the MappingCharFilter should be added to both the index and query types of analyzers. I have heard of these two types now for the first time. Is this the issue? I have not so far marked any of my analyzers with type index or query.

Since it is not marked with type index or query, it is used for both. Can you try this fieldType and give feedback:

   <fieldtype class="solr.TextField" name="text" positionIncrementGap="100">
     <analyzer>
       <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
       <tokenizer class="solr.CharStreamAwareWhitespaceTokenizerFactory"/>
       <filter class="solr.LowerCaseFilterFactory"/>
     </analyzer>
   </fieldtype>

Just to make sure: You are using the latest nightly build of Solr, right? And the mapping-ISOLatin1Accent.txt file - under the conf directory - contains the character mappings that you want to replace?

Just FYI, StandardFilter is meaningless without StandardTokenizer, so I removed it from your field type.

Hope this helps.
Re: Solr 1.3 deletes not working?
I also seem to be having a similar problem deleting. As far as I can tell, the system thinks we are deleting the records (it logs that it's executing the commands and all looks OK) but the records always remain. Regardless of whether we try a delete by ID or by query, nothing happens. It's also not extra characters in our deletion queries.

Can anyone think of anything else that I should be checking? I'm sure it's probably a small bit of config we've missed but I can't track it down.

Regards,
Lee
--
View this message in context: http://www.nabble.com/Solr-1.3-deletes-not-working--tp18124561p25506432.html
Sent from the Solr - User mailing list archive at Nabble.com.
acts_as_solr integeration with solr separately
Hi,

I have set up a Solr search server in Tomcat. I am able to fire queries (of any kind) and get results in XML format. Now I want to integrate it (Solr) with Ruby on Rails. I know Ruby on Rails has the acts_as_solr plugin which helps in integrating (talking) with Solr. acts_as_solr comes bundled with a Solr web application running on a Jetty server. But I don't wanna use this bundled Solr web application, e.g. I don't wanna do rake solr:start. I am running Solr as a separate search server in Tomcat at port 8983 (url http://localhost:8983/solr/ and all other urls are listening).

Now, I want to talk to this separate Solr server using the acts_as_solr plugin. Questions:

1) Can anybody point me to how to do this? Any tutorial?
2) What changes do I have to make in the acts_as_solr plugin?
3) Any good pointers (urls) will be appreciated...

Regards
Abhay
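If I remember correctly, acts_as_solr reads its Solr URL from config/solr.yml per Rails environment, so pointing it at an external Tomcat instance should just be a config change rather than a plugin change (please check your plugin version's README - this is from memory). A sketch of the idea, parsing a hypothetical solr.yml:

```ruby
require 'yaml'

# Hypothetical config/solr.yml contents; the url per environment is the
# only thing that needs to point at the external Tomcat-hosted Solr.
solr_yml = <<YAML
development:
  url: http://localhost:8983/solr
production:
  url: http://localhost:8983/solr
YAML

config = YAML.load(solr_yml)
puts config['development']['url']
```

With that in place there is no need for rake solr:start; the plugin just speaks HTTP to whatever URL the current environment names.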
solr isnt using default field correctly
hi,

if i do a search: text:"law order"~40 i get this:

   <str name="rawquerystring">text:"law order"~40</str>
   <str name="querystring">text:"law order"~40</str>
   <str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
   <str name="parsedquery_toString">text:"law order"~40</str>
   <str name="QParser">OldLuceneQParser</str>

However if i do: "law order"~40 i get this:

   <str name="rawquerystring">"law order"~40</str>
   <str name="querystring">"law order"~40</str>
   <str name="parsedquery">text:"law order"</str>
   <str name="parsedquery_toString">text:"law order"</str>
   <lst name="explain"/>
   <str name="QParser">OldLuceneQParser</str>

my schema.xml:

   <field name="text" type="string" indexed="true" stored="false"/>
   ...
   <defaultSearchField>text</defaultSearchField>

what should i be doing differently to get the second results like the first?
--
View this message in context: http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25507985.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr isnt using default field correctly
well it seems what is happening is solr is not being consistent,

DHast wrote:

hi, if i do a search: text:"law order"~40 i get this:

   <str name="rawquerystring">text:"law order"~40</str>
   <str name="querystring">text:"law order"~40</str>
   <str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
   <str name="parsedquery_toString">text:"law order"~40</str>
   <str name="QParser">OldLuceneQParser</str>

However if i do: "law order"~40 i get this:

   <str name="rawquerystring">"law order"~40</str>
   <str name="querystring">"law order"~40</str>
   <str name="parsedquery">text:"law order"</str>
   <str name="parsedquery_toString">text:"law order"</str>
   <lst name="explain"/>
   <str name="QParser">OldLuceneQParser</str>

my schema.xml:

   <field name="text" type="string" indexed="true" stored="false"/>
   ...
   <defaultSearchField>text</defaultSearchField>

what should i be doing differently to get the second results like the first?
--
View this message in context: http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25508264.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr isnt using default field correctly
I just tried this on trunk, and both with and without a field selector it parses to a PhraseQuery. I have trouble believing even Solr 1.3 behaved like you reported; something seems fishy.

	Erik

On Sep 18, 2009, at 9:02 AM, DHast wrote:

well it seems what is happening is solr is not being consistent,

DHast wrote:

hi, if i do a search: text:"law order"~40 i get this:

   <str name="rawquerystring">text:"law order"~40</str>
   <str name="querystring">text:"law order"~40</str>
   <str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
   <str name="parsedquery_toString">text:"law order"~40</str>
   <str name="QParser">OldLuceneQParser</str>

However if i do: "law order"~40 i get this:

   <str name="rawquerystring">"law order"~40</str>
   <str name="querystring">"law order"~40</str>
   <str name="parsedquery">text:"law order"</str>
   <str name="parsedquery_toString">text:"law order"</str>
   <lst name="explain"/>
   <str name="QParser">OldLuceneQParser</str>

my schema.xml:

   <field name="text" type="string" indexed="true" stored="false"/>
   ...
   <defaultSearchField>text</defaultSearchField>

what should i be doing differently to get the second results like the first?
--
View this message in context: http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25508264.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: solr isnt using default field correctly
yeah something is definitely strange. i think i know what it is though; im going to make a separate post for it, but it cached the results from when i had field:text as a string.

Erik Hatcher-4 wrote:

I just tried this on trunk, and both with and without a field selector it parses to a PhraseQuery. I have trouble believing even Solr 1.3 behaved like you reported; something seems fishy.

	Erik

On Sep 18, 2009, at 9:02 AM, DHast wrote:

well it seems what is happening is solr is not being consistent,

DHast wrote:

hi, if i do a search: text:"law order"~40 i get this:

   <str name="rawquerystring">text:"law order"~40</str>
   <str name="querystring">text:"law order"~40</str>
   <str name="parsedquery">PhraseQuery(text:"law order"~40)</str>
   <str name="parsedquery_toString">text:"law order"~40</str>
   <str name="QParser">OldLuceneQParser</str>

However if i do: "law order"~40 i get this:

   <str name="rawquerystring">"law order"~40</str>
   <str name="querystring">"law order"~40</str>
   <str name="parsedquery">text:"law order"</str>
   <str name="parsedquery_toString">text:"law order"</str>
   <lst name="explain"/>
   <str name="QParser">OldLuceneQParser</str>

my schema.xml:

   <field name="text" type="string" indexed="true" stored="false"/>
   ...
   <defaultSearchField>text</defaultSearchField>

what should i be doing differently to get the second results like the first?
--
View this message in context: http://www.nabble.com/solr-isnt-using-default-field-correctly-tp25507985p25508691.html
Sent from the Solr - User mailing list archive at Nabble.com.
want the features of a 'text' field with the non-stemming of a 'string' field
when i have my field name "text" set as a text field, advanced search queries work very well, but when i have it set as a string it seems to ignore them, like proximity searching and so on. example:

text as string:

   <str name="rawquerystring">text:"law order"~33</str>
   <str name="querystring">text:"law order"~33</str>
   <str name="parsedquery">text:"law order"</str>
   <str name="parsedquery_toString">text:"law order"</str>

text as text:

   <str name="rawquerystring">text:"law order"~32</str>
   <str name="querystring">text:"law order"~32</str>
   <str name="parsedquery">PhraseQuery(text:"law order"~32)</str>
   <str name="parsedquery_toString">text:"law order"~32</str>

however when i search a single term, it stems it if its text. example:

text as text:

   <str name="rawquerystring">goats</str>
   <str name="querystring">goats</str>
   <str name="parsedquery">text:goat</str>
   <str name="parsedquery_toString">text:goat</str>

text as string:

   <str name="rawquerystring">nuts</str>
   <str name="querystring">nuts</str>
   <str name="parsedquery">text:nuts</str>
   <str name="parsedquery_toString">text:nuts</str>

OR

   <str name="rawquerystring">text:goats</str>
   <str name="querystring">text:goats</str>
   <str name="parsedquery">text:goats</str>
   <str name="parsedquery_toString">text:goats</str>

so what i want/need is to STOP the stemming/plural killing that is happening on the text field. ideas? also, is there a way to wipe the cache while testing?
--
View this message in context: http://www.nabble.com/want-to-features-of-a-%27text%27-field-with-the-non-stemming-of-a-%27string%27-field-tp25508780p25508780.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr 1.3 deletes not working?
On Fri, Sep 18, 2009 at 6:26 AM, Lee Theobald l...@openobjects.com wrote:

I also seem to be having a similar problem deleting. As far as I can tell, the system thinks we are deleting the records (it logs that it's executing the commands and all looks OK) but the records always remain. Regardless of whether we try a delete by ID or by query, nothing happens. It's also not extra characters in our deletion queries.

Did you issue a commit after the delete? (you can also specify it with the delete command)

-Yonik
http://www.lucidimagination.com

Can anyone think of anything else that I should be checking? I'm sure it's probably a small bit of config we've missed but I can't track it down.

Regards,
Lee
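For the archives: a delete only becomes visible to searches after a commit. With solr-ruby the calls would be roughly as in the comments below (connection URL assumed); under the hood they are just two XML messages POSTed to /update, sketched here as plain strings:

```ruby
# Roughly, with solr-ruby (URL assumed; check your setup):
#   conn = Solr::Connection.new('http://localhost:8983/solr')
#   conn.delete_by_query('id:123')
#   conn.commit
#
# Which corresponds to POSTing these two update messages in order:
delete_msg = "<delete><query>id:123</query></delete>"
commit_msg = "<commit/>"

puts delete_msg
puts commit_msg
```

If the logs show the delete executing but documents still match, a missing commit (or a commit that never reaches the server) is the usual culprit.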
Re: Disabling tf (term frequency) during indexing and/or scoring
Hi Alexey,

Thank you for your suggestion! My understanding of Similarity, though, is that this would affect the entire index, whereas I need something that is field-configurable. Looking at Similarity.tf(), it seems to be independent of the field (and unaware of it). I don't necessarily want to disable tf entirely, as it'll likely be useful for other fulltext fields. Looking at more of the code, I'm guessing I'll need to get under the hood a fair bit more and possibly write a custom TermScorer and TermQuery.

I suppose I'm curious why the omitTfAndPositions option conflates two apparently independent features. It seems like it would have been entirely reasonable to treat these as separate options, as their use cases don't necessarily overlap. I suppose it was just the path of least resistance or the assumed common-case scenario.

Anyways, thanks again for your time.

Best regards,
Aaron

Alexey Serba wrote:

Hi Aaron,

You can override the default Lucene Similarity and disable the tf and lengthNorm factors in the scoring formula (see http://lucene.apache.org/java/2_4_1/api/org/apache/lucene/search/Similarity.html and http://lucene.apache.org/java/2_4_1/api/index.html ). You need to:

1) Compile the following class and put it into Solr's WEB-INF/classes:

   package my.package;

   import org.apache.lucene.search.DefaultSimilarity;

   public class NoLengthNormAndTfSimilarity extends DefaultSimilarity {
       public float lengthNorm(String fieldName, int numTerms) {
           return numTerms > 0 ? 1.0f : 0.0f;
       }
       public float tf(float freq) {
           return freq > 0 ? 1.0f : 0.0f;
       }
   }

2) Add <similarity class="my.package.NoLengthNormAndTfSimilarity"/> into your schema.xml: http://wiki.apache.org/solr/SchemaXml#head-e343cad75d2caa52ac6ec53d4cee8296946d70ca

HTH,
Alex

On Mon, Sep 14, 2009 at 9:50 PM, Aaron McKee ucbmc...@gmail.com wrote:

Hello,

Let me preface this by admitting that I'm still fairly new to Lucene and Solr, so I apologize if any of this sounds naive, and I'm open to thinking about my problem differently.
I'm currently responsible for a rather large dataset of business records that I'm trying to build a Lucene/Solr infrastructure around, to replace an in-house solution that we've been using for a few years. These records are sourced from multiple providers and there's often a fair bit of overlap in the business coverage. I have a set of fuzzy correlation libraries that I use to identify these documents and I ultimately create a super-record that includes metadata from each of the providers. Given the nature of things, these providers often have slight variations in wording or spelling in the overlapping fields (it's amazing how many ways people find to refer to the same business or address). I'd like to capture these variations, as they facilitate searching, but TF considerations are currently borking field scoring here.

For example, taking business names into consideration, I have a Solr schema similar to:

   <field name="name_provider1" type="string" indexed="false" stored="false" multiValued="true"/>
   ...
   <field name="name_providerN" type="string" indexed="false" stored="false" multiValued="true"/>
   <field name="nameNorm" type="text" indexed="true" stored="false" multiValued="true" omitNorms="true"/>

   <copyField source="name_provider1" dest="nameNorm"/>
   ...
   <copyField source="name_providerN" dest="nameNorm"/>

For any given business record, there may be 1..N business names present in the nameNorm field (some with naming variations, some identical). With TF enabled, however, I'm getting different match scores on this field simply based on how many providers contributed to the record, which is not meaningful to me. For example, a record containing <nameNorm>foo bar [positionIncrementGap] foo bar</nameNorm> necessarily scores higher than a record just containing <nameNorm>foo bar</nameNorm>. Although I wouldn't mind TF data being considered within each discrete field value, I need to find a way to prevent score inflation based simply on the number of contributing providers.
Looking at the mailing list archive and searching around, it sounds like the omitTf boolean in Lucene used to function somewhat in this manner, but has since taken on a broader interpretation (and name) that now also disables positional and payload data. Unfortunately, phrase support for fields like this is absolutely essential. So what's the best way to address a need like this? I guess I don't mind whether this is handled at index time or search time, but I'm not sure what I may need to override or if there's some existing provision I should take advantage of. Thank you for any help you may have. Best regards, Aaron
Re: Disabling tf (term frequency) during indexing and/or scoring
On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee ucbmc...@gmail.com wrote: I suppose I'm curious why the omitTfAndPositions option conflates two apparently independent features. This relates to the index format, and is more for performance/size benefits when they are not needed. In the index, it's impossible to omit the tf info and keep the position info (the frequency is the number of positions). -Yonik http://www.lucidimagination.com
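A toy illustration of Yonik's point (not Lucene's actual data structures): if the index stores, per term and document, the list of positions where the term occurs, then the term frequency is implicit in that list's length, so you cannot drop tf while keeping positions.

```ruby
# Toy postings structure: term => { doc id => list of positions }.
# Position values here are made up for illustration.
postings = {
  'law'   => { 'doc1' => [0, 7, 42] },  # 'law' occurs at three positions
  'order' => { 'doc1' => [1] }
}

# The term frequency falls out of the position list for free
tf_law = postings['law']['doc1'].length
puts tf_law  # => 3
```

Omitting tf therefore forces omitting positions too, which is why phrase queries stop working on omitTfAndPositions fields.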
Re: want the features of a 'text' field with the non-stemming of a 'string' field
On Sep 18, 2009, at 6:37 AM, DHast wrote:

when i have my field name "text" set as a text field, advanced search queries work very well, but when i have it set as a string it seems to ignore them, like proximity searching and so on. example:

text as string:

   <str name="rawquerystring">text:"law order"~33</str>
   <str name="querystring">text:"law order"~33</str>
   <str name="parsedquery">text:"law order"</str>
   <str name="parsedquery_toString">text:"law order"</str>

text as text:

   <str name="rawquerystring">text:"law order"~32</str>
   <str name="querystring">text:"law order"~32</str>
   <str name="parsedquery">PhraseQuery(text:"law order"~32)</str>
   <str name="parsedquery_toString">text:"law order"~32</str>

however when i search a single term, it stems it if its text. example:

text as text:

   <str name="rawquerystring">goats</str>
   <str name="querystring">goats</str>
   <str name="parsedquery">text:goat</str>
   <str name="parsedquery_toString">text:goat</str>

text as string:

   <str name="rawquerystring">nuts</str>
   <str name="querystring">nuts</str>
   <str name="parsedquery">text:nuts</str>
   <str name="parsedquery_toString">text:nuts</str>

OR

   <str name="rawquerystring">text:goats</str>
   <str name="querystring">text:goats</str>
   <str name="parsedquery">text:goats</str>
   <str name="parsedquery_toString">text:goats</str>

so what i want/need is to STOP the stemming/plural killing that is happening on the text field. ideas?

It sounds like you need to dig into your schema.xml a bit more and set up your analysis better. See http://wiki.apache.org/solr/SchemaXml

also, is there a way to wipe the cache while testing?
--
View this message in context: http://www.nabble.com/want-to-features-of-a-%27text%27-field-with-the-non-stemming-of-a-%27string%27-field-tp25508780p25508780.html
Sent from the Solr - User mailing list archive at Nabble.com.

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Question on omitNorms definition
On Sep 18, 2009, at 2:45 AM, Rahul R wrote: Hello, A rather trivial question on the omitNorms parameter in schema.xml. The out-of-the-box schema.xml uses this parameter both within the fieldType tag and the field tag. If we define omitNorms in the fieldType definition, will it hold for all fields that are defined using the same fieldType? For eg:

<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<dynamicField name="*" type="string" indexed="true" stored="true"/>

Now, will these dynamic fields have omitNorms=true? I have read about significant RAM usage when omitNorms is not set to true, hence I would like to ensure that it is set to true for most of my fields.

Yes, it will be set for all fields of that field type.

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
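One nuance worth sketching (an illustrative example, not from the thread): field-level attributes override the type-level default, so a specific field can opt back in to norms while everything else inherits omitNorms from its type:

```xml
<!-- the type-level default applies to every field of the type... -->
<fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
<!-- ...but an individual field may still override it explicitly
     (field name here is hypothetical) -->
<field name="body" type="string" indexed="true" stored="true" omitNorms="false"/>
```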
Re: want to features of a 'text' field with the non stemming of a 'string' field
i have looked, and seem to be running into a dead end every time i try it, but again it may be because of the caching and me not realizing it was doing it till my hair was half pulled. i don't suppose you'd be willing to give a hint then?

Grant Ingersoll-6 wrote: It sounds like you need to dig into your schema.xml a bit more and set up your analysis better. See http://wiki.apache.org/solr/SchemaXml
Re: want to features of a 'text' field with the non stemming of a 'string' field
ok, used the built-in fieldtype text_ws; that seems to go well.

DHast wrote: i have looked, and seem to be running into a dead end every time i try it, but again it may be because of the caching and me not realizing it was doing it till my hair was half pulled.
Re: Disabling tf (term frequency) during indexing and/or scoring
Though it would be possible to calculate a binary tf, where the score is 1 if there are one or more occurrences of the term. --wunder

On Sep 18, 2009, at 7:08 AM, Yonik Seeley wrote: On Fri, Sep 18, 2009 at 9:38 AM, Aaron McKee ucbmc...@gmail.com wrote: I suppose I'm curious why the omitTfAndPositions option conflates two apparently independent features. This relates to the index format, and is more for performance/size benefits when they are not needed. In the index, it's impossible to omit the tf info and keep the position info (the frequency is the number of positions). -Yonik http://www.lucidimagination.com
Re: Disabling tf (term frequency) during indexing and/or scoring
Hi Yonik, Thank you for the explanation. If the primary goal was to save index space for a very specific subclass of fields, the implementation certainly makes more sense. I wonder, though, if it could also make sense to support a query-time-only boolean to optionally disable TF independently, on a per-field basis? Or, perhaps (and this may be demonstrating my naivete), allowing Similarity to be overridden on a per-field basis? I imagine it could make scoring even more confusing than it sometimes already is, though. It's an atrocious hack on my part, but I largely seem to have achieved my tf goals in this manner; I overrode the getSimilarity methods in PhraseQuery and TermQuery to return a fixed-tf Similarity implementation if the field value is in the set of those I care about. From the looks of it, though, generalizing the change into anything other than a hack would touch a rather large number of code points. Best regards, Aaron

Yonik Seeley wrote: In the index, it's impossible to omit the tf info and keep the position info (the frequency is the number of positions). -Yonik
Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?
On Thu, Sep 17, 2009 at 4:30 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: I was wondering if there is a way I can modify calibrateSizeByDeletes just by configuration? Alas, no. The only option that I see for you is to sub-class LogByteSizeMergePolicy and set calibrateSizeByDeletes to true in the constructor. However, please open a Jira issue so we don't forget about it. It's the continuing stuff like this that makes me feel like we should be Spring (or equivalent) based someday... I'm just not sure how we're going to get there. -Yonik http://www.lucidimagination.com
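To sketch Yonik's workaround: one would write a small Java subclass of LogByteSizeMergePolicy whose constructor calls setCalibrateSizeByDeletes(true), put it on Solr's classpath, and point solrconfig.xml at it. The class name below is hypothetical, and the exact element form has varied across Solr versions (in 1.4 the class name is given as element text):

```xml
<!-- solrconfig.xml, inside <mainIndex> (or <indexDefaults>);
     com.example.CalibratedMergePolicy is a hypothetical subclass of
     LogByteSizeMergePolicy that enables calibrateSizeByDeletes
     in its constructor. -->
<mergePolicy>com.example.CalibratedMergePolicy</mergePolicy>
```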
Re: Latest trunk locks execution thread in SolrCore.getSearcher()
Also, do you have any custom components or anything that implements SolrInfoMBean? On Sep 18, 2009, at 8:16 AM, Grant Ingersoll wrote: Can you try the patch I just put up on https://issues.apache.org/jira/browse/SOLR-1427 and let me know if it works when JMX is enabled? Also, do you have warming queries setup? On Sep 17, 2009, at 12:46 PM, Chris Harris wrote: It looks like this works as a fix for me as well. (I'm not currently using JMX for anything anyway.) Curiously, the single-core example solrconfig.xml also has <jmx/>, but it doesn't seem to be a problem there. 2009/9/17 Dadasheva, Olga olga_dadash...@harvard.edu: Hi, FWIW: disabling <jmx/> fixed this problem for me. Thank you! -Olga -Original Message- From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley Sent: Thursday, September 17, 2009 1:09 PM To: solr-user@lucene.apache.org Subject: Re: Latest trunk locks execution thread in SolrCore.getSearcher() Interesting... I still haven't been able to reproduce a hang with either Jetty or Tomcat. I enabled replication and JMX... still nothing. -Yonik http://www.lucidimagination.com On Thu, Sep 17, 2009 at 12:35 PM, Chris Harris rygu...@gmail.com wrote: I found what looks like the same issue when I tried to install r815830 under Tomcat. (It works ok with the normal Jetty example/start.jar.) I haven't checked the stack trace, but Tomcat would hang right after the message INFO: Adding debug component:org.apache.solr.handler.component.debugcompon...@1904e0d showed up in the log. I have a little more evidence about Yonik's theory that SOLR-1427 is part of the cause. In particular, when I reverse-merged r815587 (the commit for SOLR-1427) into (out of?) my r815830-based working copy, then Tomcat was able to load Solr normally.
2009/9/16 Yonik Seeley yo...@lucidimagination.com: On a quick look, it looks like this was caused (or at least triggered by) https://issues.apache.org/jira/browse/SOLR-1427 Registering the bean in the SolrCore constructor causes it to immediately turn around and ask for the stats which asks for a searcher, which blocks. -Yonik http://www.lucidimagination.com On Wed, Sep 16, 2009 at 9:34 PM, Dadasheva, Olga olga_dadash...@harvard.edu wrote: Hi, I am testing EmbeddedSolrServer vs StreamingUpdateSolrServer for my crawlers using more or less recent Solr code and everything was fine till today when I took the latest trunk code. When I start my crawler I see a number of INFO outputs

2009-09-16 21:08:29,399 INFO Adding component:org.apache.solr.handler.component.HighlightComponent@36ae83 (SearchHandler.java:132) - [main]
2009-09-16 21:08:29,400 INFO Adding component:org.apache.solr.handler.component.StatsComponent@1fb24d3 (SearchHandler.java:132) - [main]
2009-09-16 21:08:29,401 INFO Adding component:org.apache.solr.handler.component.TermVectorComponent@14ba9a2 (SearchHandler.java:132) - [main]
2009-09-16 21:08:29,402 INFO Adding debug component:org.apache.solr.handler.component.DebugComponent@12ea1dd (SearchHandler.java:137) - [main]

and then the log/program stops.

-- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: Disabling tf (term frequency) during indexing and/or scoring
Hi Yonik, For my particular needs, IDF considerations are fine and helpful; if a user is requesting a rare term/phrase, increasing the score based on that makes sense as the match has higher confidence. I simply need to compensate for title and category type fields that may contain redundant information, and disregard length considerations (these fields are multi-valued and may be populated from a varying number of sources, and I don't want the number of sources and the level of repetitiveness to affect the score). Basically, a boolean "does it match" score, adjusted solely based on IDF. Of course, I'm sure there are others who probably wouldn't need or care about IDF, either, but still want phrase matching. Cheers, Aaron

Yonik Seeley wrote: On Fri, Sep 18, 2009 at 11:05 AM, Aaron McKee ucbmc...@gmail.com wrote: I wonder, though, if it could also make sense to support a query-time-only boolean to optionally disable TF independently, on a per-field basis? I guess it could make sense. But do you still want idf too? length norm? or do you really want a constant score (match/no-match)? -Yonik http://www.lucidimagination.com
Re: Latest trunk locks execution thread in SolrCore.getSearcher()
No, I'm pretty sure nothing implements SolrInfoMBean. I applied the new 1K version of SOLR-1427.patch from https://issues.apache.org/jira/browse/SOLR-1427 (which appears to be a secondary patch, to be applied once the main SOLR-1427 patch has already been applied) to my problematic Solr instance, which is based on Solr SVN r815830. This patch did not seem to solve the hang problem; once I reenabled JMX, then the process would hang at the same spot, i.e. right after INFO: Adding debug component:org.apache.solr.handler.component.debugcompon...@1d7b222 appeared in the Tomcat log. When Solr/Tomcat are hung, there are two Solr-related threads that show up in a thread dump. I'll paste those stack traces below:

pool-1-thread-1 prio=6 tid=0x0b1ef800 nid=0xdc8 waiting on condition [0x0b68f000]
   java.lang.Thread.State: WAITING (parking)
    at sun.misc.Unsafe.park(Native Method)
    - parking to wait for 0x035f3e60 (a java.util.concurrent.CountDownLatch$Sync)
    at java.util.concurrent.locks.LockSupport.park(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(Unknown Source)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(Unknown Source)
    at java.util.concurrent.CountDownLatch.await(Unknown Source)
    at org.apache.solr.core.SolrCore$1.call(SolrCore.java:559)
    at java.util.concurrent.FutureTask$Sync.innerRun(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Thread-1 prio=6 tid=0x00c92c00 nid=0xf14 in Object.wait() [0x0b19e000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(Native Method)
    - waiting on 0x035d2ab0 (a java.lang.Object)
    at java.lang.Object.wait(Object.java:485)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:994)
    - locked 0x035d2ab0 (a java.lang.Object)
    at org.apache.solr.core.SolrCore.getSearcher(SolrCore.java:907)
    at org.apache.solr.handler.ReplicationHandler.getIndexVersion(ReplicationHandler.java:472)
    at org.apache.solr.handler.ReplicationHandler.getStatistics(ReplicationHandler.java:490)
    at org.apache.solr.core.JmxMonitoredMap$SolrDynamicMBean.getMBeanInfo(JmxMonitoredMap.java:224)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.getNewMBeanClassName(Unknown Source)
    at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(Unknown Source)
    at com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(Unknown Source)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:137)
    at org.apache.solr.core.JmxMonitoredMap.put(JmxMonitoredMap.java:47)
    at org.apache.solr.core.SolrResourceLoader.inform(SolrResourceLoader.java:446)
    at org.apache.solr.core.SolrCore.<init>(SolrCore.java:578)
    at org.apache.solr.core.CoreContainer.create(CoreContainer.java:425)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:278)
    at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:117)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:83)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275)
    snip

2009/9/18 Grant Ingersoll gsing...@apache.org: Also, do you have any custom components or anything that implements SolrInfoMBean?
Re: Latest trunk locks execution thread in SolrCore.getSearcher()
Forgot to answer this one. Yes, I do have a warming query to get the sort caches up to speed. I think it takes a while to run; my guess would be 30 seconds or so. 2009/9/18 Grant Ingersoll gsing...@apache.org: Also, do you have warming queries setup? On Sep 17, 2009, at 12:46 PM, Chris Harris wrote: It looks like this works as a fix for me as well. (I'm not currently using JMX for anything anyway.) Curiously, the single-core example solrconfig.xml also has jmx /, but it doesn't seem to be a problem there. 2009/9/17 Dadasheva, Olga olga_dadash...@harvard.edu: Hi, FWIW: disabling jmx/ fixed this problem for me.
Re: copyfield at search time?
If the reason you're copying from member_of to member_of_facet is because faceting isn't allowed on multi-valued fields, then that's no longer true. See https://issues.apache.org/jira/browse/SOLR-475 which is in the trunk and which will be available in the 1.4 release. If you're running an earlier version of Solr, maybe something like this is necessary. (If multi-valued faceting is possible at all in earlier versions.) In any case, I'm not sure why the title of your message is "copyfield at search time". Copyfield stuff happens at indexing time, not search time. So if your approach is going to work, you're going to need to reindex for it to take effect.

2009/9/17 DHast hastings.recurs...@gmail.com: is it possible to do something like this:

<field name="member_of_facet" type="string" indexed="true" stored="true" multiValued="true"/>
<field name="member_of" type="string" indexed="true" stored="true" multiValued="false"/>
<copyField source="member_of" dest="member_of_facet"/>

if so, i dont seem to be making progress. thanks
[Job] Solr Search Opportunity - Direct Hire, Not a Recruiter
Hello, The company I work for is looking to hire a Sr. Software Engineer with considerable experience using Solr. The project we are embarking on is relatively new, so the person we hire would have a lot of freedom to help define the architecture for our e-commerce product and merchant indexing and search services. Below is a copy of the job description. If you are interested please send an e-mail directly to me. I am the hiring manager, using the e-mail address bsmith at auctiva dot com. Many thanks, Bennett Smith, Director of Software Engineering, Auctiva Corporation

Auctiva is building an open e-Commerce platform that will power new ways to connect buyers and sellers. We have a long history of building e-Commerce tools to help buyers and sellers connect in the eBay marketplace. Our success is due in large part to the success of our customers. We take immense pride in listening to our customers and developing best-in-class applications that address their unique needs. Auctiva is seeking a Senior Software Engineer to join the Platform Search Indexing Services team in our expanding San Jose office. This team is responsible for e-Commerce content search and indexing services running on a combination of Windows and Linux platforms. Applicants must have significant software development experience on both platforms. The candidate must have solid development skills, the ability to properly analyze a problem, and good written and verbal communication skills. The candidate must also be a self-starter who is ready to hit the ground running, must work well both in a small team environment and on their own, and must have the ability to multi-task and handle dynamic requirements.
Education / Experience:
- BS or MS in Computer Science or equivalent work experience
- 7+ years experience in a similar position
- Extensive hands-on development experience
- Experience building large scale server applications and systems

Required Skills / Abilities:
- 4+ years OO programming design using Java and/or C#
- 4+ years experience with Design Patterns
- 2+ years experience developing with Solr/Lucene
- 2+ years server side Linux development in Java
- 2+ years server side Windows development in C# and .NET
- 3+ years experience in multi-threaded application development
- Unit test development using JUnit and/or NUnit
- Proven ability to work through a full development life-cycle from requirements analysis to deployment
- Familiarity with Agile practices such as Scrum or Extreme Programming

Desired:
- Experience developing heterogeneous (Linux/Java and Windows/C#) applications
- Linux System Administration experience
- Experience running Apache/Tomcat in a production environment
- Familiarity with REST Web Service development
- Familiarity with SOAP and RPC protocols
- A background in network socket communications
- Prior C/C++ development experience
Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?
Over the weekend I may write a patch to allow simple reflection based injection from within solrconfig.

On Fri, Sep 18, 2009 at 8:10 AM, Yonik Seeley yo...@lucidimagination.com wrote: The only option that I see for you is to sub-class LogByteSizeMergePolicy and set calibrateSizeByDeletes to true in the constructor. However, please open a Jira issue so we don't forget about it.
RE: Disabling tf (term frequency) during indexing and/or scoring
Constant tf with idf can work well for very short fields, like titles. For example, the movie "New York, New York" is not twice as much about New York as movies that have the string in the title only once. --wunder
Free Webinar - Apache Lucene 2.9: Technical Overview of New Features
Free Webinar: Apache Lucene 2.9: Discover the Powerful New Features

Join us for a free and in-depth technical webinar with Grant Ingersoll, co-founder of Lucid Imagination and chair of the Apache Lucene PMC.

Thursday, September 24th 2009, 11:00 AM - 12 NOON PDT / 2:00 - 3:00 PM EDT

Click on the link below to sign up:
http://www.eventsvc.com/lucidimagination/092409?trk=WR-SEP2009B-AP

Lucene 2.9 offers a rich set of new features and performance improvements alongside plentiful fixes and optimizations. If you are a Java developer building search applications with the Lucene search library, this webinar provides the insights you need to harness this important update to Apache Lucene. Grant will present and discuss key technical features and innovations including:

o Real time / per-segment searching and caching
o Built-in numeric range support with trie structure for speed and simplified programming
o Reduced search latency and improved index efficiency

Join us for a free webinar. Thursday, September 24th 2009, 11:00 AM - NOON PDT / 2:00 - 3:00 PM EDT
http://www.eventsvc.com/lucidimagination/092409?trk=WR-SEP2009B-AP
Re: shards and facet_count
On Fri, Sep 18, 2009 at 5:58 AM, Erik Hatcher erik.hatc...@gmail.com wrote: It is strange that you get facet=false calls in there, but maybe this is just normal distributed search protocol in one of the phases? Right, on the second phase of a distrib request, additional faceting may not be needed. But it looks like the distributed request is being directed at two different handlers rather than two different servers or cores? shards=localhost:8983/solr/resources,localhost:8983/solr/exhibits I've never tried this, but from the log file, it doesn't look like the sub-requests are going to those different handlers since the path is always path=/select -Yonik http://www.lucidimagination.com
Re: How to leverage the LogMergePolicy calibrateSizeByDeletes patch in Solr ?
We can use a simple reflection based implementation to simplify reading too many parameters. What I wish to emphasize is that Solr should be agnostic of XML altogether. It should only be aware of specific objects and interfaces. If users wish to plug in something else in some other way, it should be fine. There is a huge investment involved in learning the current solrconfig.xml; let us not make people throw that away.

On Sat, Sep 19, 2009 at 1:59 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Over the weekend I may write a patch to allow simple reflection based injection from within solrconfig.

-- Noble Paul | Principal Engineer | AOL | http://aol.com