Re: Documentation?
I read a bit of Lucene in Action, which has a lot of nice examples, and was very optimistic about using Solr. But I was really disappointed when I looked for Solr documentation: even the javadoc is very poor, with no explanation of the framework or of why there are so many different classes with similar names. Everything is just confusing! I think that if somebody had invested 1% of the time spent creating Solr in its documentation, Solr could be used very easily by everyone.

I am not an expert in search engines, and I also don't think that the Solr framework is so complicated, but without documentation I just feel lost. What I really want to do is a kind of custom sorting very similar to the DistanceComparator example from Lucene in Action, but there are really no examples out there of how to do that...

I admit that most of the time my own code does not look much better, but for such a big project I really expected much more documentation and comments for the API.

Thanks,
Markus

jrodenburg wrote:
> I was checking around the solr site and pages at apache.org and wasn't finding much. Before jumping into the code, I'd like to get as familiar with solr as I could from existing docs or the like. Can someone point me in the direction?
>
> thanks,
> jeff r.

--
View this message in context: http://www.nabble.com/Documentation--tp4403595p21029243.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Custom Sorting
I have the same problem: I also need to plug in my customComparator, but as there is no explanation of the framework (how a RequestHandler works, what comes in, what comes out), it is just impossible! Can someone explain which code I have to add where, to get the same functionality as the StandardRequestHandler plus a custom sort?

Thanks,
Markus

hossman wrote:
> : Sort sort = new Sort(new SortField[]
> : { SortField.FIELD_SCORE, new SortField(customValue, SortField.FLOAT,
> : true) });
> : indexSearcher.search(q, sort)
>
> That appears to just be a sort on score with a secondary reversed float sort on whatever field name is in the variable customValue ... assuming the field name is FIELD, that's the same thing as...
>
>     sort=score+desc,+FIELD+desc
>
> : Sort sort = new Sort(new SortField(customValue, customComparator))
> : indexSearcher.search(q, sort)
>
> This is using a custom SortComparatorSource: code you (or someone else) have written which is not part of Lucene and which tells Lucene how to order the documents using whatever crazy logic it wants ... for obvious reasons Solr can't do that same logic (since it doesn't know what it is).
>
> Although many things in Solr are easily customizable, just by writing a little factory and configuring it by class name, I'm afraid SortComparatorSources aren't one of them. You could write a custom RequestHandler which used your SortComparatorSource, or you could write a custom FieldType that used it any time someone sorted on that field ... but those are the best options I can think of.
>
> -Hoss

--
View this message in context: http://www.nabble.com/Custom-Sorting-tp1659p21029370.html
Re: Documentation?
Markus,

As we are all volunteers here, nobody is paid to write code and documentation. Everybody scratches their own itch. Complaining won't do a lot of good here.

Most of Solr is not meant to be consumed as an API, and therefore there is a general lack of javadocs. Solr is generally used through configuration and consumed over HTTP, so you will find a lot of documentation on each of the supported features. There is a lot of documentation on the wiki at http://wiki.apache.org/solr . However, the use case you have is quite rare and therefore not very well documented.

That said, you will find that people are helpful if you are willing to give them a chance. Starting a conversation in this tone will be a disadvantage.

On Tue, Dec 16, 2008 at 2:14 PM, psyron m...@psyron.com wrote:
> I read a bit Lucene in action with a lot of nice example and was very optimistic to use solr, but i was really disappointed when I was looking for documentation for solr [...]

--
Regards,
Shalin Shekhar Mangar.
Re: Custom Sorting
Markus,

A couple of code pointers for you:

* QueryComponent - this is where results are generated; it uses a SortSpec from the QParser.
* QParser#getSort - by creating a custom QParser you'll be able to wire in your own custom sort.

You can write your own QParserPlugin and QParser, configure them in solrconfig.xml, and you should be good to go. By subclassing existing classes, this should take only a handful of lines of code.

Erik

On Dec 16, 2008, at 3:54 AM, psyron wrote:
> I have the same problem, also need to plugin my customComparator, but as there is no explanation of the framework, how a RequestHandler is working, what comes in, what comes out ... just impossible! [...]
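The ordering logic Markus wants to plug in (sorting by distance from a reference point, in the spirit of the DistanceComparator example he mentions) can be sketched in plain Java, independent of where it is eventually wired into Solr. This is only an illustration of the comparison itself, not the Lucene SortComparatorSource API and not the book's code; the class `DistanceSortSketch`, the `Place` type and the integer grid coordinates are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class DistanceSortSketch {
    /** A named point on an integer grid (hypothetical stand-in for an indexed document). */
    static final class Place {
        final String name;
        final int x;
        final int y;

        Place(String name, int x, int y) {
            this.name = name;
            this.x = x;
            this.y = y;
        }
    }

    /** Euclidean distance from the place to the reference point (ox, oy). */
    static double distance(Place p, int ox, int oy) {
        double dx = p.x - ox;
        double dy = p.y - oy;
        return Math.sqrt(dx * dx + dy * dy);
    }

    /** Comparator that orders places nearest-first relative to (ox, oy). */
    static Comparator<Place> byDistanceFrom(int ox, int oy) {
        return Comparator.comparingDouble(p -> distance(p, ox, oy));
    }

    /** Returns the place names sorted nearest-first, leaving the input list untouched. */
    static List<String> namesNearestFirst(List<Place> places, int ox, int oy) {
        List<Place> sorted = new ArrayList<>(places);
        sorted.sort(byDistanceFrom(ox, oy));
        List<String> names = new ArrayList<>();
        for (Place p : sorted) {
            names.add(p.name);
        }
        return names;
    }

    public static void main(String[] args) {
        List<Place> places = Arrays.asList(
                new Place("far", 10, 10),
                new Place("near", 1, 1),
                new Place("mid", 3, 4));
        System.out.println(namesNearestFirst(places, 0, 0));
    }
}
```

Whichever hook is used (a custom QParser as Erik suggests, or a RequestHandler or FieldType as Hoss suggests), the per-document comparison it has to supply would be roughly of this shape.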
Flipping data dirs for an (/multiple) SolrCore without affecting search / IndexReaders
We have an architecture where we want to flip the Solr data.dir (massive dataset) while running and serving search requests, with minimal downtime.

Some additional requirements:

* Ideally, we want the Solr search clients to continue to serve from the indices as soon as possible, but the overriding requirement is that the downtime for the search Solr instances should be as short as possible.

So when a new set of (Lucene) indices comes in, the algorithm we are experimenting with is:

- Create a new SolrCore instance with the revised data directory.
- Warm up the SolrCore instance with some test queries.
- Register the new SolrCore instance with the same name as the old one, so that all new queries from the clients go to the new SolrCore instance.
- As part of register(String, SolrCore, boolean), the third parameter, when set to false, closes the old core.

I am trying to understand more about the first and the fourth (last) steps.

1) What is the fastest / best possible way to get step 1 done through a pluggable architecture? Currently I have written a request handler as follows that takes care of creating the core. What is the best way to change dataDir (received as input from the SolrQueryRequest) before creating SolrCores?

    import org.apache.solr.core.CoreContainer;
    import org.apache.solr.core.CoreDescriptor;
    import org.apache.solr.core.SolrCore;
    import org.apache.solr.handler.RequestHandlerBase;
    import org.apache.solr.request.SolrQueryRequest;
    import org.apache.solr.request.SolrQueryResponse;
    import org.apache.solr.util.plugin.SolrCoreAware;

    public class CustomRequestHandler extends RequestHandlerBase implements SolrCoreAware {
        private CoreDescriptor coreDescriptor;
        private String coreName;

        @Override
        public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
            CoreContainer container = this.coreDescriptor.getCoreContainer();
            // TODO: Parse XML to extract data
            // container.reload(this.coreName);
            // or
            // 2.
            // TODO: Set the new configuration for the data directory
            // before creating the new core.
            SolrCore newCore = container.create(this.coreDescriptor);
            // Passing false as the third argument closes the previously registered core.
            container.register(this.coreName, newCore, false);
        }

        @Override
        public void inform(SolrCore core) {
            // Remember the descriptor and name of the core this handler belongs to.
            coreDescriptor = core.getCoreDescriptor();
            coreName = core.getName();
        }
    }

2) When a close() happens on an existing SolrCore, what happens if there is a long-running IndexReader query on that SolrCore? Is it terminated abruptly, or would the close wait until the IndexReader completes the query?

* The same process is potentially repeated for multiple SolrCores as well, with additional closeHooks that might do some heavy I/O tasks (talking over the network etc.). Right now these long-running processes are done in an independent thread so that they do not block SolrCore.close() with the current nightly builds.
Re: Some solrconfig.xml attributes being ignored
Hi Erik,

Thanks a lot for looking into this, it's greatly appreciated.

Mark

On Tue, Dec 16, 2008 at 2:51 AM, Erik Hatcher e...@ehatchersolutions.com wrote:
> Mark,
>
> Looked at the code to discern this...
>
> A fragmenter isn't responsible for the number of snippets - the higher level SolrHighlighter is the component that uses that parameter. So yes, it must be specified at the request handler level, not the fragmenter configuration.
>
> Erik
>
> On Dec 15, 2008, at 7:35 PM, Mark Ferguson wrote:
>
> It seems like maybe the fragmenter parameters just don't get displayed with echoParams=all set. It may only display as far as the request handler's parameters. The reason I think this is because I tried increasing hl.fragsize to 1000 and the results were returned correctly (much larger snippets), so I know it was read correctly.
>
> I moved hl.snippets into the requestHandler config instead of the fragmenter, and this seems to have solved the problem. However, I'm uneasy with this solution because I don't know why it wasn't being read correctly when setting it inside the fragmenter.
>
> Mark
>
> On Mon, Dec 15, 2008 at 5:08 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote:
>
> Thanks for this tip, it's very helpful. Indeed, it looks like none of the highlighting parameters are being included. It's using the correct request handler and hl is set to true, but none of the highlighting parameters from solrconfig.xml are in the parameter list.
>
> Here is my query:
>
>     http://localhost:8080/solr1/select?rows=50&hl=true&fl=url,urlmd5,page_title,score&echoParams=all&q=java
>
> Here are the settings for the request handler and the highlighter:
>
>     <requestHandler name="dismax" class="solr.SearchHandler" default="true">
>       <lst name="defaults">
>         <str name="defType">dismax</str>
>         <float name="tie">0.01</float>
>         <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
>         <str name="q.alt">*:*</str>
>         <str name="hl.fl">body_text page_title meta_desc</str>
>         <str name="f.page_title.hl.fragsize">0</str>
>         <str name="f.meta_desc.hl.fragsize">0</str>
>         <str name="hl.fragmenter">regex</str>
>       </lst>
>     </requestHandler>
>
>     <highlighting>
>       <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
>         <lst name="defaults">
>           <str name="hl.snippets">3</str>
>           <str name="hl.fragsize">100</str>
>           <str name="hl.regex.slop">0.5</str>
>           <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
>         </lst>
>       </fragmenter>
>     </highlighting>
>
> And here is the param list returned to me:
>
>     <lst name="params">
>       <str name="echoParams">all</str>
>       <str name="tie">0.01</str>
>       <str name="hl.fragmenter">regex</str>
>       <str name="f.page_title.hl.fragsize">0</str>
>       <str name="qf">body_text^1.0 page_title^1.6 meta_desc^1.3</str>
>       <str name="f.meta_desc.hl.fragsize">0</str>
>       <str name="q.alt">*:*</str>
>       <str name="hl.fl">page_title,body_text</str>
>       <str name="defType">dismax</str>
>       <str name="echoParams">all</str>
>       <str name="fl">url,urlmd5,page_title,score</str>
>       <str name="q">java</str>
>       <str name="hl">true</str>
>       <str name="rows">50</str>
>     </lst>
>
> So it seems like everything is working except for the highlighter. I should mention that when I enter a bogus fragmenter as a parameter (e.g. hl.fragmenter=bogus), it returns a 400 error that the fragmenter cannot be found, so the config file _is_ finding the regex fragmenter. It just doesn't seem to actually be including its parameters... Any ideas are appreciated, thanks again for the help.
>
> Mark
>
> On Mon, Dec 15, 2008 at 4:23 PM, Yonik Seeley ysee...@gmail.com wrote:
>
> Try adding echoParams=all to your query to verify the params that the solr request handler is getting.
>
> -Yonik
>
> On Mon, Dec 15, 2008 at 6:10 PM, Mark Ferguson mark.a.fergu...@gmail.com wrote:
>
> Hello,
>
> In my solrconfig.xml file I am setting the attribute hl.snippets to 3. When I perform a search, it returns only a single snippet for each highlighted field. However, when I set the hl.snippets field manually as a search parameter, I get up to 3 highlighted snippets. This is the configuration that I am using to set the highlighted parameters:
>
>     <fragmenter name="regex" class="org.apache.solr.highlight.RegexFragmenter" default="true">
>       <lst name="defaults">
>         <str name="hl.snippets">3</str>
>         <str name="hl.fragsize">100</str>
>         <str name="hl.regex.slop">0.5</str>
>         <str name="hl.regex.pattern">\w[-\w ,/\n\']{50,150}</str>
>       </lst>
>     </fragmenter>
>
> I tried setting hl.fragmenter=regex as a parameter as well, to be sure that it was using the correct one, and the result set is the same. Any ideas what could be causing this attribute not to be read? It has me concerned that other attributes are being ignored as well.
>
> Thanks,
> Mark Ferguson
sorting question
Hey there,

I am using sort at search time. I would like to know the advantages of using the sint field type instead of the integer field type; I can't find this anywhere.

I am sorting ascending by price. The problem is that not all docs have the price field, and the ones that don't have it appear as the first results. Is there any way to make them appear as the last results?

Thanks in advance

--
View this message in context: http://www.nabble.com/sorting-question-tp21037621p21037621.html
Re: ExtractingRequestHandler and XmlUpdateHandler
No, I didn't mean storing the binary along with it, just that I could send a binary file (or a text file) which Tika could process and store along with the XML which describes its literal metadata.

Best,
Jacob

On Mon, Dec 15, 2008 at 7:17 PM, Grant Ingersoll gsing...@apache.org wrote:
> On Dec 15, 2008, at 8:20 AM, Jacob Singh wrote:
>
> Hi Erik,
>
> Sorry I wasn't totally clear. Some responses inline:
>
> If the file is visible from the Solr server, there is no need to actually send the bits through HTTP. Solr's content stream capabilities allow a file to be retrieved from Solr itself.
>
> Yeah, I know. But in my case not possible. Perhaps a simple file receiving HTTP POST handler which simply stored the file on disk and returned a path to it is the way to go here.
>
> So I could send the file, and receive back a token which I would then throw into one of my fields as a reference. Then using it to map tika fields as well. like:
>
>     <str name="file_mod_date">${FILETOKEN}.last_modified</str>
>     <str name="file_body">${FILETOKEN}.content</str>
>
> Huh? I don't follow the file token thing. Perhaps you're thinking you'll post the file, then later update other fields on that same document. An important point here is that Solr currently does not have document update capabilities. A document can be fully replaced, but cannot have fields added to it, once indexed. It needs to be handled all in one shot to accomplish the blending of file/field indexing. Note the ExtractingRequestHandler already has the field mapping capability.
>
> Sorta... I was more thinking of a new feature wherein a Solr Request handler doesn't actually put the file in the index, merely runs it through tika and stores a datastore which links a token with the tika extraction. Then the client could make another request w/ the XMLUpdateHandler which referenced parts of the stored tika extraction.
>
> Hmmm, thinking out loud.
>
> Override SolrContentHandler. It is responsible for mapping the Tika output to a Solr Document. Capture all the content into a single buffer. Add said buffer to a field that is stored only. Add a second field that is indexed. This is your token. You could, just as well, have that token be the only thing that gets returned by extract only.
>
> Alternately, you could implement an UpdateProcessor thingamajob that takes the output and stores it to the filesystem and just adds the token to a document.
>
> But, here's a solution that will work for you right now... let Tika extract the content and return back to you, then turn around and post it and whatever other fields you like:
>
>     http://wiki.apache.org/solr/TikaExtractOnlyExampleOutput
>
> In that example, the contents aren't being indexed, just returned back to the client. And you can leverage the content stream capability with this as well, avoiding posting the actual binary file, pointing the extracting request to a file path visible by Solr.
>
> Yeah, I saw that. This is pretty much what I was talking about above, the only disadvantage (which is a deal breaker in our case) is the extra bandwidth to move the file back and forth.
>
> Thanks for your help and quick response. I think we'll integrate the POST fields as Grant has kindly provided multi-value input now, and see what happens in the future. I realize what I'm talking about (XML and binary together) is probably not a high priority feature.
>
> Is the use case this:
>
> 1. You want to assign metadata and also store the original and have it stored in binary format, too? Thus, Solr becomes a backing, searchable store?
>
> I think we could possibly add an option to serialize the ContentStream onto a Field on the Document. In other words, store the original with the Document. Of course, buyer beware on the cost of doing so.

--
+1 510 277-0891 (o)
+91 33 7458 (m)
web: http://pajamadesign.com
Skype: pajamadesign
Yahoo: jacobsingh
AIM: jacobsingh
gTalk: jacobsi...@gmail.com
Re: sorting question
sint sorts in numeric order, int does not.

Check the sortMissingLast params in the example config.

On Dec 16, 2008, at 12:24 PM, Marc Sturlese wrote:
> Hey there, I am using sort at searching time. I would like to know the advantages of using sint field type instead of integer field type. [...]
Re: sorting question
Another possibility is to add price:[1 TO *] to your query.

2008/12/16 Ryan McKinley ryan...@gmail.com
> sint sorts in numeric order, int does not.
>
> check the sortMissingLast params in the example config [...]

--
Alexander Ramos Jardim
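Alexander's range-clause suggestion can be sketched as a request URL built in Java. This is a minimal sketch under assumptions: the base URL, the `/select` path, the class name `PriceSortUrl`, and the choice to put the clause in an `fq` filter rather than in `q` itself are illustrative and not from the thread. Note that, unlike sortMissingLast, this excludes documents without a price (and any priced below 1) instead of moving them to the end.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class PriceSortUrl {
    /** Percent-encodes a single parameter value for use in a query string. */
    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Builds a select URL that keeps only priced documents and sorts them by price ascending. */
    static String build(String baseUrl, String userQuery) {
        return baseUrl + "/select?q=" + encode(userQuery)
                + "&fq=" + encode("price:[1 TO *]")   // drop documents with no price
                + "&sort=" + encode("price asc");     // cheapest first
    }

    public static void main(String[] args) {
        // Hypothetical host and query, for illustration only.
        System.out.println(build("http://localhost:8983/solr", "camera"));
    }
}
```

If the documents without a price must still be returned, the sortMissingLast route Ryan points to is the one to look at instead.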
Solrj client - CommonsHttpSolrServer - getting solr.solr.home
I am reading the wiki at http://wiki.apache.org/solr/Solrj .

Is there a requestHandler (maybe some admin handler) already present that can retrieve solr.solr.home for a given CommonsHttpSolrServer instance (that is, for a given Solr endpoint) through an API?
Re: Some solrconfig.xml attributes being ignored
Mark,

Looked at the code to discern this...

A fragmenter isn't responsible for the number of snippets - the higher level SolrHighlighter is the component that uses that parameter. So yes, it must be specified at the request handler level, not the fragmenter configuration.

Erik

On Dec 15, 2008, at 7:35 PM, Mark Ferguson wrote:
> It seems like maybe the fragmenter parameters just don't get displayed with echoParams=all set. It may only display as far as the request handler's parameters. [...]
RE: php client. json communication
Or have a look at the wiki, probably a better way to start: http://wiki.apache.org/solr/SolPHP

Best,
Patrick

--
Just trying to help
http://www.ipros.nl/
--

-----Original Message-----
From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com]
Sent: Tuesday, 16 December 2008 15:14
To: solr-user@lucene.apache.org
Subject: Re: php client. json communication

> Check out this link http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html If anyone of you used it can you share your experiences. [...]
Re: Solrj client - CommonsHttpSolrServer - getting solr.solr.home
Perhaps CoreAdminRequest... it does not give you the property, but you can see where things are...

http://wiki.apache.org/solr/CoreAdmin

On Dec 16, 2008, at 12:53 PM, Kay Kay wrote:
> I am reading the wiki here at - http://wiki.apache.org/solr/Solrj . Is there a requestHandler ( may be - some admin handler ) already present that can retrieve the solr.solr.home [...]
Re: php client. json communication
Check out this link: http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html

If any of you have used it, can you share your experiences?

Thanks,
Kishore Veleti A.V.K.

Julian Davchev wrote:
> Hi, I am about to integrate solr for index/search of my documents/data. It's php application but I see it should be no problem as solr works with xml by default. [...]

--
View this message in context: http://www.nabble.com/php-client.-json-communication-tp21033573p21033806.html
Re: Unwanted clustering of search results after sorting by score
: Max - field collapsing may be your friend - https://issues.apache.org/jira/browse/SOLR-236

That doesn't really seem related ... I don't believe Max wants to see all results from a store collapsed into one result; I think he wants to see results from different stores treated more fairly, and to eliminate the clustering effect he's seeing, where different products from the same store tend to have similar scores because of the way the store provides the data (and not because of any inherent relevancy of the products).

Max: to really diagnose something like this, you have to consider all the details about what exactly your queries look like, and spend a lot of time looking at score explanations to really get a sense for the trend of why certain stores score higher than others.

Off the cuff, the only thing I can comment on is this specific example you made...

: Shop 'foo' describes its products with 250 words and uses the searched
: word once. Shop 'bar' describes its products with only 25 words and also
: uses the searched word once. The score for shop 'foo' will be much worse
: than for shop 'bar'. In a search in which there are many products of shop
: 'foo' and 'bar', the products of shop 'bar' are shown before the products
: of shop 'foo'.

Depending on how you look at it, 'foo' is spamming you with excess keywords and 'bar' deserves to get higher scores. Eliminating tf probably isn't wise, but you might want to consider omitting norms, so the length of the field doesn't factor in ... or you might want to try customizing your lengthNorm function (requires writing a Similarity class) to make it flatter for 25-250 terms, but have a sharp spike if they go above 250 (if you consider 250 the threshold for a product description before you decide it's spam).

You could also consider adding a numeric shop_fudge_factor field that you populate with a number indicating the average number of terms in product descriptions from that shop (you'd have to compute this yourself and add it to every document) and then use that as part of a FunctionQuery to fudge the scores for stores that are long-winded a little higher. I would never do that personally though (it encourages keyword spamming in product descriptions), but it's something you can try.

A suggestion of *least* resort: if you customize your Similarity class such that all the methods round the score components to very coarse granularity (i.e. 1.2 instead of 1.234567), you should wind up with tighter groupings of products with the *exact* same score ... you could then do a secondary sort on something else (random, perhaps?) to try and make the ordering more fair. (I really have no idea how well that might work.)

-Hoss
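Two of Hoss's ideas, the flattened lengthNorm and the coarse rounding of score components, can be sketched as plain Java functions. This is a minimal sketch under assumptions: the class `FlattenedLengthNorm`, the squared-ratio penalty above 250 terms and the rounding to one decimal place are illustrative choices, and the "sharp spike" is read here as a sharp drop in the norm. In Lucene of that era the logic would, as far as I know, live in a Similarity subclass overriding lengthNorm(fieldName, numTerms); that wiring is not shown.

```java
final class FlattenedLengthNorm {
    static final int FLAT_MIN = 25;   // below this, keep Lucene's usual behavior
    static final int FLAT_MAX = 250;  // above this, treat the description as suspiciously long

    /** Lucene's classic length normalization: 1 / sqrt(numTerms). */
    static float defaultNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Flat between FLAT_MIN and FLAT_MAX terms, with a steep penalty beyond FLAT_MAX. */
    static float flattenedNorm(int numTerms) {
        if (numTerms <= 0) {
            return 0f; // empty field: nothing to normalize
        }
        if (numTerms < FLAT_MIN) {
            return defaultNorm(numTerms);
        }
        float plateau = defaultNorm(FLAT_MIN);
        if (numTerms <= FLAT_MAX) {
            return plateau; // 25-term and 250-term descriptions are treated alike
        }
        // Assumption: penalize by the squared ratio so the norm falls off quickly past 250.
        float ratio = (float) FLAT_MAX / numTerms;
        return plateau * ratio * ratio;
    }

    /** Rounds a score component to one decimal place (e.g. 1.234567 becomes 1.2). */
    static float roundCoarse(float value) {
        return Math.round(value * 10f) / 10f;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 25, 100, 250, 500}) {
            System.out.println(n + " terms -> " + flattenedNorm(n));
        }
        System.out.println(roundCoarse(1.234567f));
    }
}
```

With this shape, the 25-word and 250-word descriptions from Max's example get the same norm, so the single match in each would no longer be separated by field length alone.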
php client. json communication
Hi,

I am about to integrate Solr for indexing and searching my documents/data. It's a PHP application, but I see that should be no problem, as Solr works with XML by default. Is there any ready PHP lib that will ease the whole communication with Solr and, if possible, send and receive JSON data?

I looked through the list archive and there seem to be not many discussions about PHP. Also, from the manual it seems that Solr can only return a JSON response, but the request should always be XML.

Cheers,
Re: Please help me articulate this query
Excellent, thank you! :) -Derek On Mon, Dec 15, 2008 at 8:45 PM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Derek, q=+referring:XXX +question:YYY (of course, you'll have to URL-encode that query string) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Derek Springer de...@mahalo.com To: solr-user@lucene.apache.org Sent: Monday, December 15, 2008 3:40:55 PM Subject: Re: Please help me articulate this query Thanks for the tip, I appreciate it! However, does anyone know how to articulate the syntax of (This AND That) OR (Something AND Else) into a query string? i.e. q=referring:### AND question:### On Mon, Dec 15, 2008 at 12:32 PM, Stephen Weiss wrote: I think in this case you would want to index each question with the possible referrers (by title might be too imprecise, I'd go with filename or ID) and then do a search like this (assuming in this case it's by filename): q=(referring:TomCruise.html) OR (question:Tom AND Cruise) Which seems to be what you're thinking. I would make the referrer a type string though, so that you don't accidentally pull in documents from a different subject (for Tom Cruise this would work OK, but imagine you need to distinguish between George Washington and George Washington Carver). -- Steve On Dec 15, 2008, at 2:59 PM, Derek Springer wrote: Hey all, I'm having trouble articulating a query and I'm hopeful someone out there can help me out :) My situation is this: I am indexing a series of questions that can either be asked from a main question entry page or from a specific subject page. I have a field called referring which indexes the title of the specific subject page, plus the regular question, whenever that document is submitted from a specific subject page. Otherwise, every document is indexed with just the question. Specifically, what I am trying to do is this: when I am on a specific subject page (e.g. Tom Cruise) I want to search for all of the questions asked from that page, plus any question asked about Tom Cruise. Something like: q=(referring:Tom AND Cruise) OR (question:Tom AND Cruise) "Have you ever used a Tom Tom?" - Not returned; "Where is the best place to take a cruise?" - Not returned; "When did he have his first kid?" - Returned iff the question was asked from the Tom Cruise page; "Do you think that Tom Cruise will make more movies?" - Always returned. Any thoughts? -Derek
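For anyone following along, here is a minimal sketch of the URL-encoding step Otis mentions. It assumes the client is Python and uses only the standard library; the helper name build_query_string and the placeholder values XXX/YYY are illustrative, not part of Solr. The point is that a raw "+" in a URL is decoded as a space, so the required-clause markers have to be percent-encoded.

```python
from urllib.parse import parse_qs, urlencode


def build_query_string(referring: str, question: str) -> str:
    """Return a URL-encoded query string requiring both clauses to match."""
    # "+" marks a clause as required in the Lucene query syntax.
    q = f"+referring:{referring} +question:{question}"
    # urlencode() percent-encodes "+" and ":" and turns spaces into "+".
    return urlencode({"q": q})


encoded = build_query_string("XXX", "YYY")
print(encoded)

# Decoding the string gives back the original Lucene query.
print(parse_qs(encoded)["q"][0])
```

The encoded string can then be appended after the "?" of whatever select URL your installation exposes.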
Re: Boosting by date when only some records have one
: if(document.hasDateField == 1){ : boost = somefunction(document.dateField); : } else{ : boost = 1; : } bq = ( ( +hasDateField:true _val_:somefunc(dateField) ) ( -hasDateField:true *:*^1 ) ) That covers the possibility that hasDateField is not set for some docs. The query gets simpler if you can concretely know that hasDateField will always have a value of true or false... bq = ( hasDateField:false^1 ( +hasDateField:true _val_:somefunc(dateField) ) ) -Hoss
Using DIH, getting exception
Hi All, I'm trying to use the DataImportHandler with the data config below (snippet): <dataSource type="JdbcDataSource" name="mySource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://myhost/myDB" user="username" password="password"/> <document name="post"> The variables are all good (username + password, etc.), but I'm getting the following exception, any thoughts? org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :null available for entity :item Processing Document # Best, Patrick
checkout 1.4 snapshot
Hello, Could someone tell me how I can check out the 1.4 snapshot? Thanks, -- Without love, we are birds with broken wings. Morrie
RE: checkout 1.4 snapshot
Hi, You can find the SVN repository here: http://www.apache.org/dev/version-control.html#anon-svn I'm not sure if this represents the 1.4 version, but as it is the trunk, it's the latest version. Best, Patrick -Original Message- From: roberto [mailto:miles.c...@gmail.com] Sent: Tuesday, 16 December 2008 22:13 To: solr-user@lucene.apache.org Subject: checkout 1.4 snapshot Hello, Could someone tell me how I can check out the 1.4 snapshot? Thanks, -- Without love, we are birds with broken wings. Morrie
RE: checkout 1.4 snapshot
Sorry all, wrong URL in my post. The right URL should be: http://svn.apache.org/repos/asf/lucene/solr/ Best, Patrick -Original Message- From: Plaatje, Patrick [mailto:patrick.plaa...@getronics.com] Sent: Tuesday, 16 December 2008 22:19 To: solr-user@lucene.apache.org Subject: RE: checkout 1.4 snapshot Hi, You can find the SVN repository here: http://www.apache.org/dev/version-control.html#anon-svn I'm not sure if this represents the 1.4 version, but as it is the trunk, it's the latest version. Best, Patrick -Original Message- From: roberto [mailto:miles.c...@gmail.com] Sent: Tuesday, 16 December 2008 22:13 To: solr-user@lucene.apache.org Subject: checkout 1.4 snapshot Hello, Could someone tell me how I can check out the 1.4 snapshot? Thanks, -- Without love, we are birds with broken wings. Morrie
Re: checkout 1.4 snapshot
I'll try to get the source from this link, thanks. On Tue, Dec 16, 2008 at 7:23 PM, Plaatje, Patrick patrick.plaa...@getronics.com wrote: Sorry all, wrong URL in my post. The right URL should be: http://svn.apache.org/repos/asf/lucene/solr/ Best, Patrick -Original Message- From: Plaatje, Patrick [mailto:patrick.plaa...@getronics.com] Sent: Tuesday, 16 December 2008 22:19 To: solr-user@lucene.apache.org Subject: RE: checkout 1.4 snapshot Hi, You can find the SVN repository here: http://www.apache.org/dev/version-control.html#anon-svn I'm not sure if this represents the 1.4 version, but as it is the trunk, it's the latest version. Best, Patrick -Original Message- From: roberto [mailto:miles.c...@gmail.com] Sent: Tuesday, 16 December 2008 22:13 To: solr-user@lucene.apache.org Subject: checkout 1.4 snapshot Hello, Could someone tell me how I can check out the 1.4 snapshot? Thanks, -- Without love, we are birds with broken wings. Morrie -- Without love, we are birds with broken wings. Morrie
Re: php client. json communication
Hi, thanks for the links, I looked at both. Still, I think that Solr, or at least those PHP clients, doesn't support JSON as input. It's clear that it's possible to get a JSON response, but search seems to be possible only via XML queries. Plaatje, Patrick wrote: Or have a look at the Wiki, probably a better way to start: http://wiki.apache.org/solr/SolPHP Best, Patrick -- Just trying to help http://www.ipros.nl/ -- -Original Message- From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com] Sent: Tuesday, 16 December 2008 15:14 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Check out this link http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html If any of you have used it, can you share your experiences? Thanks, Kishore Veleti A.V.K. Julian Davchev wrote: Hi, I am about to integrate Solr for indexing and searching my documents/data. It's a PHP application, but I see that should be no problem as Solr works with XML by default. Is there any ready-made PHP library that will ease the communication with Solr and, if possible, send/receive JSON data? I looked through the list archive and there seem to be few discussions about PHP. Also, from the manual it seems that Solr can return a JSON response but the request must always be XML. Cheers, -- View this message in context: http://www.nabble.com/php-client.-json-communication-tp21033573p21033806.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: php client. json communication
Glad that's sorted. On the other issue (directly accessing Solr from any client), I think I saw a discussion on the list earlier, but I don't know what the result was. Browse through the archives and look for something about security (I think). Best, Patrick -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: Tuesday, 16 December 2008 23:02 To: solr-user@lucene.apache.org Subject: Re: php client. json communication I think I got it now. A search request is actually just a simple URL with a few params... no JSON or XML or fancy stuff needed. I was concerned about this because I need to use Solr from JavaScript directly, bypassing the application and searching directly. Plaatje, Patrick wrote: Hi Julian, I'm a bit confused. The indexing is indeed being done through XML, but in searching it is possible to get JSON results by using the wt=json parameter, have a look here: http://wiki.apache.org/solr/SolJSON Best, Patrick -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: Tuesday, 16 December 2008 22:39 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Hi, thanks for the links, I looked at both. Still, I think that Solr, or at least those PHP clients, doesn't support JSON as input. It's clear that it's possible to get a JSON response, but search seems to be possible only via XML queries. Plaatje, Patrick wrote: Or have a look at the Wiki, probably a better way to start: http://wiki.apache.org/solr/SolPHP Best, Patrick -- Just trying to help http://www.ipros.nl/ -- -Original Message- From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com] Sent: Tuesday, 16 December 2008 15:14 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Check out this link http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html If any of you have used it, can you share your experiences? Thanks, Kishore Veleti A.V.K. Julian Davchev wrote: Hi, I am about to integrate Solr for indexing and searching my documents/data. It's a PHP application, but I see that should be no problem as Solr works with XML by default. Is there any ready-made PHP library that will ease the communication with Solr and, if possible, send/receive JSON data? I looked through the list archive and there seem to be few discussions about PHP. Also, from the manual it seems that Solr can return a JSON response but the request must always be XML. Cheers, -- View this message in context: http://www.nabble.com/php-client.-json-communication-tp21033573p21033806.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: php client. json communication
Hi Julian, I'm a bit confused. The indexing is indeed being done through XML, but in searching it is possible to get JSON results by using the wt=json parameter, have a look here: http://wiki.apache.org/solr/SolJSON Best, Patrick -Original Message- From: Julian Davchev [mailto:j...@drun.net] Sent: Tuesday, 16 December 2008 22:39 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Hi, thanks for the links, I looked at both. Still, I think that Solr, or at least those PHP clients, doesn't support JSON as input. It's clear that it's possible to get a JSON response, but search seems to be possible only via XML queries. Plaatje, Patrick wrote: Or have a look at the Wiki, probably a better way to start: http://wiki.apache.org/solr/SolPHP Best, Patrick -- Just trying to help http://www.ipros.nl/ -- -Original Message- From: KishoreVeleti CoreObjects [mailto:kisho...@coreobjects.com] Sent: Tuesday, 16 December 2008 15:14 To: solr-user@lucene.apache.org Subject: Re: php client. json communication Check out this link http://www.ibm.com/developerworks/library/os-php-apachesolr/index.html If any of you have used it, can you share your experiences? Thanks, Kishore Veleti A.V.K. Julian Davchev wrote: Hi, I am about to integrate Solr for indexing and searching my documents/data. It's a PHP application, but I see that should be no problem as Solr works with XML by default. Is there any ready-made PHP library that will ease the communication with Solr and, if possible, send/receive JSON data? I looked through the list archive and there seem to be few discussions about PHP. Also, from the manual it seems that Solr can return a JSON response but the request must always be XML. Cheers, -- View this message in context: http://www.nabble.com/php-client.-json-communication-tp21033573p21033806.html Sent from the Solr - User mailing list archive at Nabble.com.
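To make the wt=json point concrete, here is a small sketch of what a search request and its response handling could look like. It is written in Python purely for illustration (the same idea applies from PHP or JavaScript). The base URL http://localhost:8983/solr/select is an assumption about a default local install, and SAMPLE_BODY is a hand-written stand-in for a response, not output captured from a real server.

```python
import json
from urllib.parse import urlencode

# Assumed location of the select handler on a default local install.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_search_url(query: str, rows: int = 10) -> str:
    """A search is just a GET URL with a few parameters; wt=json picks the response format."""
    params = {"q": query, "wt": "json", "rows": rows}
    return f"{SOLR_SELECT_URL}?{urlencode(params)}"


def extract_docs(body: str) -> list:
    """Pull the matching documents out of a JSON response body."""
    return json.loads(body)["response"]["docs"]


# Hand-written sample in the usual responseHeader/response layout.
SAMPLE_BODY = (
    '{"responseHeader": {"status": 0},'
    ' "response": {"numFound": 1, "start": 0, "docs": [{"id": "42"}]}}'
)

print(build_search_url("title:solr"))
print(extract_docs(SAMPLE_BODY))
```

No XML is involved on the search side; XML only comes into play when posting documents for indexing.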
umlaut index ö == o == oe Possible?
Hi, I am just going through http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the mailing list archive but somehow can't find the solution. Is it possible to treat 'möchten', 'mochten' and 'moechten' the same way? Not by hardcoding this, of course, but in a way that works for any umlaut. Cheers
Re: umlaut index ö == o == oe Possible?
I believe the German Porter stemmer should handle this. I haven't used it with Solr, but I've used it with other projects, and basically, when the word is parsed, the umlauts and also accented vowels are converted to plain vowels. I guess with Solr you use solr.SnowballPorterFilterFactory: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#head-b80fb581f4e078142c694014f1a8f60c0935e080 with the German option (like in their example). You probably want to apply this both at index and query time. -- Steve On Dec 16, 2008, at 6:02 PM, Julian Davchev wrote: Hi, I am just going through http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters and the mailing list archive but somehow can't find the solution. Is it possible to treat 'möchten', 'mochten' and 'moechten' the same way? Not by hardcoding this, of course, but in a way that works for any umlaut. Cheers
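To illustrate the kind of folding being asked about, here is a rough Python sketch. It is not the Snowball algorithm and not what Solr runs; fold_german is a hypothetical helper that only shows the mapping ö == o == oe. As far as I know, the plain German Snowball stemmer drops the umlaut accent but does not by itself turn "oe" into "ö", while the "German2" variant is described as handling the ae/oe/ue spellings, so it is worth testing which one your setup uses.

```python
import re
import unicodedata

# Single-character folds applied after the digraphs have been handled.
_UMLAUT_TABLE = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def fold_german(token: str) -> str:
    """Fold umlauts and their ae/oe/ue spellings to plain vowels (illustrative only)."""
    # Compose characters first so "o" + combining diaeresis becomes "ö".
    token = unicodedata.normalize("NFC", token.lower())
    token = token.replace("ae", "a").replace("oe", "o")
    # Keep "ue" after "q" (as in "quelle"), where it is not an umlaut spelling.
    token = re.sub(r"(?<!q)ue", "u", token)
    return token.translate(_UMLAUT_TABLE)


for word in ("möchten", "mochten", "moechten"):
    print(word, "->", fold_german(word))
```

A blunt rule like this over-folds words in which "ue" or "oe" is not an umlaut spelling (for example "aktuell"), which is one reason to prefer an analyzer designed for German over a hand-rolled mapping. Whatever is used has to be applied identically at index and query time.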
Auto Suggest
Hello, I'm looking to implement an auto-suggest search feature. I found a few posts with some information about the EdgeNGramFilterFactory, but I couldn't quite understand how to implement it. Could someone kindly point me the way? Thanks, -- Without love, we are birds with broken wings. Morrie
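As background for the question, the idea behind edge n-grams can be sketched in a few lines of Python. This is a toy model, not Solr code: edge_ngrams, build_index and suggest are made-up names, and the min_gram/max_gram arguments only loosely mirror the filter's minGramSize/maxGramSize settings. Each indexed term is expanded into all of its prefixes, so whatever the user has typed so far can be looked up as an exact match.

```python
def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 15) -> list:
    """Return the prefixes of term with lengths from min_gram to max_gram."""
    term = term.lower()
    upper = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, upper + 1)]


def build_index(terms, min_gram: int = 1, max_gram: int = 15) -> dict:
    """Map every prefix to the terms it came from (a toy inverted index)."""
    index = {}
    for term in terms:
        for gram in edge_ngrams(term, min_gram, max_gram):
            index.setdefault(gram, []).append(term)
    return index


def suggest(index: dict, typed: str) -> list:
    """Look up what the user has typed so far as an exact prefix key."""
    return index.get(typed.lower(), [])


index = build_index(["solr", "solaris", "lucene"])
print(edge_ngrams("solr", 1, 4))
print(suggest(index, "sol"))
```

In a real schema the expansion would normally happen only in the index-time analyzer of a dedicated suggestion field, with the query side left un-grammed, so that the typed text matches the stored prefixes.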
Re: Using DIH, getting exception
You can either remove the name="mySource" from the dataSource tag or add dataSource="mySource" to the entity 'item'. On Wed, Dec 17, 2008 at 2:26 AM, Plaatje, Patrick patrick.plaa...@getronics.com wrote: Hi All, I'm trying to use the DataImportHandler with the data config below (snippet): <dataSource type="JdbcDataSource" name="mySource" driver="com.mysql.jdbc.Driver" url="jdbc:mysql://myhost/myDB" user="username" password="password"/> <document name="post"> The variables are all good (username + password, etc.), but I'm getting the following exception, any thoughts? org.apache.solr.handler.dataimport.DataImportHandlerException: No dataSource :null available for entity :item Processing Document # Best, Patrick -- --Noble Paul
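The rule in the reply above can be checked mechanically. The Python sketch below is only a simplified model of that rule, not of how DataImportHandler actually resolves data sources: an entity with no dataSource attribute needs an unnamed data source, and an entity with dataSource="x" needs one named x. The surrounding dataConfig element and the entity's query attribute are assumptions, since the original snippet was truncated, and unresolved_entities is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Reconstructed config; the <entity> element and its query are assumed.
BROKEN = """
<dataConfig>
  <dataSource type="JdbcDataSource" name="mySource"
              driver="com.mysql.jdbc.Driver" url="jdbc:mysql://myhost/myDB"
              user="username" password="password"/>
  <document name="post">
    <entity name="item" query="SELECT id FROM item"/>
  </document>
</dataConfig>
""".strip()


def unresolved_entities(config_xml: str) -> list:
    """List entities whose dataSource reference matches no declared data source."""
    root = ET.fromstring(config_xml)
    # An unnamed data source shows up here as None.
    declared = {ds.get("name") for ds in root.iter("dataSource")}
    return [
        entity.get("name")
        for entity in root.iter("entity")
        if entity.get("dataSource") not in declared
    ]


# Fix 1: point the entity at the named data source.
FIX_ENTITY = BROKEN.replace('<entity name="item"', '<entity name="item" dataSource="mySource"')
# Fix 2: make the data source the unnamed default.
FIX_SOURCE = BROKEN.replace(' name="mySource"', "")

print(unresolved_entities(BROKEN))
print(unresolved_entities(FIX_ENTITY), unresolved_entities(FIX_SOURCE))
```

Either fix makes the entity and the data source agree, which is what the "No dataSource :null available for entity :item" message is complaining about.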
RE: minimum match issue with dismax
: <str name="mm">2&lt;-25%</str> : </lst> : </requestHandler> : : correct me if I am wrong, but doesn't the above mm config mean that if it's 1 or 2 : terms then match all, but if it's more than 2 terms then 25% can be : missing? I get the below as the parsed query This is exactly what you asked in another thread, which I answered in the other thread... http://www.nabble.com/multiword-query-using-dismax-to20920925.html#a20920925 ...by referring to this exact example in the dismax docs... http://wiki.apache.org/solr/DisMaxRequestHandler Both of the examples below ultimately mean that for more than 2 clauses 25% can be missing ... the difference is in how fractions are dealt with: the percentage of the total number of clauses is computed and then rounded down to get an integer. If there was a minus sign in front of the percentage, that integer is subtracted from the total number of clauses... 2<-25% ... 25% of 3 is 0.75, 0.75 rounded down is 0, 3-0 is 3. 2<75% ... 75% of 3 is 2.25, 2.25 rounded down is 2. This is also mentioned *explicitly* in the other URL I referred you to last time... http://lucene.apache.org/solr/api/org/apache/solr/util/doc-files/min-should-match.html When dealing with percentages, negative values can be used to get different behavior in edge cases. 75% and -25% mean the same thing when dealing with 4 clauses, but when dealing with 5 clauses 75% means 3 are required, but -25% means 4 are required. It sounds like you want 75% instead of -25% -Hoss
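The arithmetic above is easy to get wrong by hand, so here is a small Python sketch that reproduces it. It follows the rounding rule as described in this message and handles only a plain value or a single "N<value" condition; min_should_match is an illustrative helper, not Solr's implementation, and multi-condition specs are deliberately left out.

```python
def min_should_match(spec: str, clause_count: int) -> int:
    """Number of optional clauses that must match for a simple mm spec."""
    spec = spec.strip()
    if "<" in spec:
        bound, _, spec = spec.partition("<")
        # At or below the bound, every clause is required.
        if clause_count <= int(bound):
            return clause_count
    if spec.endswith("%"):
        value = int(spec[:-1])
        # Percentage of the clause count, rounded down to an integer.
        amount = clause_count * abs(value) // 100
    else:
        value = int(spec)
        amount = abs(value)
    # A negative value says how many clauses may be missing.
    required = clause_count - amount if value < 0 else amount
    return max(0, min(clause_count, required))


for spec in ("2<-25%", "2<75%"):
    print(spec, "with 3 clauses ->", min_should_match(spec, 3))
for spec in ("75%", "-25%"):
    print(spec, "with 5 clauses ->", min_should_match(spec, 5))
```

With 3 clauses the two conditional specs give 3 and 2 respectively, and with 5 clauses 75% gives 3 while -25% gives 4, matching the numbers quoted from the docs.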
Re: ExtractingRequestHandler and XmlUpdateHandler
: : If I can find the bandwidth, I'd like to make something which allows : : file uploads via the XMLUpdateHandler as well... Do you have any ideas : : the XmlUpdateRequestHandler already supports file uploads ... all request : But it doesn't do what Jacob is asking for... he wants (if I'm not mistaken) Hmm ... I thought this was an offshoot question ... the main point of this thread seems to have already been solved by the new ext.literal.${fieldname}=${fieldvalue} param support Grant just added to ExtractingRequestHandler, right? What am I misunderstanding about the use case that isn't solved by that? The Tika doc from the ContentStream is the primary guts of the doc, with additional literal metadata fields being added, correct? (I can imagine a more complicated use case where someone might want a single document built from multiple ContentStreams parsed by Tika, with different pieces of each TikaDoc contributing in different ways ... i.e.: my name is Hoss, my address is X, my phone number is Y, this first ContentStream should be indexed as my bio field (doesn't matter if it's PDF, HTML, MS-Word, etc.), index and store the ID3 title and length from any MP3 ContentStreams in the multivalued lecture_title and lecture_length fields, and any ContentStreams left over should be indexed in the misc other_text field. But that's not what we're talking about here, correct?) -Hoss
Re: Boosting by date when only some records have one
Hi, thanks a lot! Looks like what I need, except that I cannot use dismax because I need to be able to do prefix queries. I'm new to Solr, so I don't know if there's a way to formulate this in a standard query. If not, I could extend DisMaxRequestHandler so it doesn't escape the *s, right? Robert Chris Hostetter wrote: : if(document.hasDateField == 1){ : boost = somefunction(document.dateField); : } else{ : boost = 1; : } bq = ( ( +hasDateField:true _val_:somefunc(dateField) ) ( -hasDateField:true *:*^1 ) ) That covers the possibility that hasDateField is not set for some docs. The query gets simpler if you can concretely know that hasDateField will always have a value of true or false... bq = ( hasDateField:false^1 ( +hasDateField:true _val_:somefunc(dateField) ) ) -Hoss
Re: Boosting by date when only some records have one
: thanks a lot! Looks like what I need except that I cannot use dismax : because I need to be able to do prefix queries. I'm new to Solr, so I There's nothing dismax-related in that syntax, I just suggested using it in a bq param because I assumed that's what you were using. q = +pre:fix* (hasDateField:false^1 (+hasDateField:true _val_:somefunc(dateField))) : bq = ( hasDateField:false^1 ( +hasDateField:true _val_:somefunc(dateField) ) ) -Hoss
Is making removal of wildcards configurable planned for DisMaxRequestHandler
Hi, I'm rather new to Solr and for my current projects came to the conclusion that DisMaxRequestHandler is exactly the tool I need, except that it doesn't allow prefix queries. I found a thread in the archive where someone mentioned the idea of making this behaviour configurable (which characters to strip from the query parameter). Is someone working on this, or is my best option currently to implement this behaviour by copying code from DisMaxRequestHandler and modifying the code that strips illegal operators? Thanks in advance, Robert
Re: RequestHandler lifecycle in a Multiple SolrCore context
On Wed, Dec 17, 2008 at 3:11 AM, Kay Kay kaykay.uni...@gmail.com wrote: If I have a CustomRequestHandler that inherits from RequestHandlerBase and implements the SolrCoreAware interface, what is the lifecycle of the invocation of the method inform(SolrCore) in a multiple-SolrCore context? Would it be that we have separate instances of RequestHandlers created for each SolrCore? Yes. Each core maintains its own instance of every request handler configured for that core. -- Regards, Shalin Shekhar Mangar.
Re: Boosting by date when only some records have one
Chris Hostetter wrote: : thanks a lot! Looks like what I need except that I cannot use dismax : because I need to be able to do prefix queries. I'm new to Solr, so I There's nothing dismax-related in that syntax, I just suggested using it in a bq param because I assumed that's what you were using. q = +pre:fix* (hasDateField:false^1 (+hasDateField:true _val_:somefunc(dateField))) : bq = ( hasDateField:false^1 ( +hasDateField:true _val_:somefunc(dateField) ) ) -Hoss Perfect, thanks!! Robert