Re: If you could have one feature in Solr...
On 24.02.2010 at 14:42, Grant Ingersoll wrote: What would it be?
Remote administration/editing/filling of synonyms.txt, stopwords.txt, ... through a request handler, maybe a JSON interface or similar.
Best,
Ingo
-- Ingo Renner TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code Apache Solr for TYPO3: http://www.typo3-solr.com
Re: If you could have one feature in Solr...
On 25.02.2010 at 02:07, Andy wrote:
1) Built-in hierarchical faceting. Right now there are 2 patches, SOLR-64 and SOLR-792. SOLR-64 seems to be slated for the 1.5 release but, according to the wiki, seems to have poor performance. SOLR-792 has better performance according to the wiki, but it's unclear if it'll ever be part of the Solr distribution. Not sure what the pros and cons of the 2 patches are. Hierarchical faceting is very common and it'd be nice to have this as a standard feature.
2) Near real-time update and search
3) Partial update - something like LUCENE-1879
4) Automatic language detection and stemmer selection. I have document collections that are a mix of different languages. It'd be great to be able to automatically detect what language a specific document is in and select an appropriate stemmer for that language.
+1 on all of these
Ingo
-- Ingo Renner TYPO3 Core Developer, Release Manager TYPO3 4.2, Admin Google Summer of Code Apache Solr for TYPO3: http://www.typo3-solr.com
Re: If you could have one feature in Solr...
1. Real time or near-real time updates.
2. First-class spatial search.
On Wed, Feb 24, 2010 at 9:42 AM, Grant Ingersoll gsing...@apache.org wrote: What would it be?
-- Jacob Elder
Re: If you could have one feature in Solr...
(Sorry for the very late response on this topic.)
On Feb 28, 2010, at 5:47 AM, Adrien Specq wrote: - language attribute for each field
I was thinking about it and it was one of my wishes. Currently, Solr practically requires that we have a field for each natural language that an application supports. If the app needs to support English, French and German, we would have to have title_en, title_fr, and title_de (suffixes are ISO 2-letter lang codes) instead of just a title field. This isn't pretty. What if we want to support 15 languages? It would be much better if we could have just one title field with language information associated with the value.
But after I thought about it a bit deeper, I think the current ugly solution is actually practical. This is because most users want to find documents in the languages they understand. So if a user indicates they understand English and German only, we just need to search title_en and title_de. Maybe I'm missing something...
Teruhiko Kuro Kurosaka, 415-227-9600 x122 RLP + Lucene Solr = powerful search for global contents
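The per-language field idea above can be sketched as a small query builder. This is only an illustration under stated assumptions: the helper name `language_query` is hypothetical, the `title_en` / `title_de` naming follows the convention described in the message, and the term is assumed not to contain double quotes (a real client would need to escape it).

```python
def language_query(term, field, user_langs):
    """Build a Lucene-style query over per-language variants of a field.

    Assumption: fields are named <field>_<iso code>, e.g. title_en, title_de,
    as in the message above. `term` must not contain double quotes.
    """
    # One phrase clause per language the user says they understand.
    clauses = [f'{field}_{lang}:"{term}"' for lang in user_langs]
    # Any matching language variant is good enough, so the clauses are ORed.
    return " OR ".join(clauses)


if __name__ == "__main__":
    print(language_query("search engine", "title", ["en", "de"]))
```

A user who reads English and German would then only hit `title_en` and `title_de`, which is the practical upside the message describes.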
Re: If you could have one feature in Solr...
Most databases only RECENTLY have set up languages per column. Languages per ENTRY in a column? I don't think any support that yet. How would you get that information from a database with the corresponding language attribute?
Dennis Gearon
Signature Warning: EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded'. Laugh at http://www.yert.com/film.php
--- On Wed, 3/24/10, Teruhiko Kurosaka k...@basistech.com wrote:
From: Teruhiko Kurosaka k...@basistech.com
Subject: Re: If you could have one feature in Solr...
To: solr-user@lucene.apache.org
Date: Wednesday, March 24, 2010, 11:36 AM
(Sorry for very late response on this topic.) [...]
Re: If you could have one feature in Solr...
First of all, I am not really concerned with the per-field (or per-column, in DB terms) portion of the original request. Most documents are monolingual.
How languages are identified depends on your application, and database support for language tagging is not necessary. The database schema designer may have created a field that stores the language information, for example. If you are indexing documents that live in a file system, the directory hierarchy or the names of the documents might tell the language, assuming you have set up some standard naming convention. HTML documents may have a META tag for Content-Language. If the document comes from an HTTP feed, there may be a Content-Language header. And if all else fails, or the information is not reliable, the language can be determined by analyzing the document statistically with software such as Nutch's Language Identifier, or commercial language identification software such as the one my employer, Basis Technology, sells.
Dennis Gearon wrote: Most databases only RECENTLY have set up languages per column. Languages per ENTRY in a column? I don't think any support that yet. How would you get that information from a database with the corresponding language attribute? [...]
Teruhiko Kuro Kurosaka, 415-227-9600 x122 RLP + Lucene Solr = powerful search for global contents
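The fallback order described in this reply (explicit metadata first, statistical detection last) can be sketched as below. All dictionary keys and the `detect` callable are hypothetical names chosen for illustration; they are not part of Solr, Nutch, or any product mentioned in the thread.

```python
def normalize(tag):
    """Reduce a tag such as 'en-US, fr' to its first primary subtag ('en')."""
    return tag.split(",")[0].strip().split("-")[0].lower()


def identify_language(doc, detect=None):
    """Return a 2-letter language code for `doc`, or None if unknown.

    `doc` is a plain dict. The keys below are assumptions for this sketch:
      db_language      - a language column supplied by the database schema
      path_language    - a code derived from a directory/file naming convention
      content_language - an HTML META or HTTP Content-Language value
      text             - the document body, used only as a last resort
    `detect` is any callable text -> language code (a statistical identifier).
    """
    for key in ("db_language", "path_language", "content_language"):
        value = doc.get(key)
        if value:
            return normalize(value)
    # If no metadata is present, fall back to statistical detection.
    if detect is not None and doc.get("text"):
        return detect(doc["text"])
    return None
```

The ordering is a design choice, not something the thread prescribes: if the metadata is known to be unreliable, the statistical step could be run first or used to cross-check.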
Re: If you could have one feature in Solr...
On Fri, Mar 5, 2010 at 4:34 AM, Mark Miller markrmil...@gmail.com wrote: [...] I hate getConfigDir - it lets anyone just get around the ResourceLoader basically. It would be awesome to get rid of it somehow - it would make ZooKeeperSolrResourceLoader so much easier to get working correctly across the board.
Why not just get rid of it? Components depending on filesystems are a big headache.
-- - Noble Paul | Systems Architect | AOL | http://aol.com
Re: If you could have one feature in Solr...
: The ability to read solr configuration files from the classpath instead of
: solr.solr.home directory.
Solr has always supported this. When SolrResourceLoader.openResource is asked to open a resource it first checks if it's an absolute path -- if it's not, then it checks relative to the conf dir (under whatever the instanceDir is, i.e. Solr Home in a single core setup), then it checks relative to the current working dir, and if it still can't find it, it checks via the current ClassLoader.
That said: it's not something that a lot of people have ever taken advantage of, so it wouldn't surprise me if some features in Solr are buggy because they try to open files directly w/o utilizing openResource -- in particular a quick test of the trunk example using...
java -Djetty.class.path=./solr/conf -Dsolr.solr.home=/tmp/new-solr-home -jar start.jar
...seems to suggest that QueryElevationComponent isn't using openResource to look for elevate.xml (I set solr.solr.home in that line so solr would *NOT* attempt to look at ./solr ... it does need some sort of Solr Home, but in this case it was a completely empty directory)
-Hoss
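The lookup order described above can be illustrated with a short sketch. This is not Solr's implementation: `resolve_resource` is a hypothetical function, the "classpath" is modelled as a plain list of directories, and the `exists` parameter is an assumption added so the order can be exercised without touching the disk.

```python
import os


def resolve_resource(name, conf_dir, cwd, classpath=(), exists=os.path.exists):
    """Return the first location where `name` is found, or None.

    Order mirrors the description above: absolute path, then the conf dir,
    then the current working dir, then class-loader style locations
    (modelled here as a sequence of directories, which is an assumption).
    """
    if os.path.isabs(name):
        return name if exists(name) else None
    for base in (conf_dir, cwd, *classpath):
        candidate = os.path.join(base, name)
        if exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Pretend elevate.xml only exists in a classpath directory.
    present = {os.path.join("cp", "elevate.xml")}
    print(resolve_resource("elevate.xml", "conf", "work",
                           classpath=["cp"], exists=present.__contains__))
```

A component that opens files directly from the conf dir skips the later steps of this chain, which is the kind of inconsistency the message reports for elevate.xml.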
Re: If you could have one feature in Solr...
On 03/04/2010 05:56 PM, Chris Hostetter wrote: [...] a quick test of the trunk example [...] seems to suggest that QueryElevationComponent isn't using openResource to look for elevate.xml [...]
I've been trying to think of ways to tackle this. I hate getConfigDir - it lets anyone just get around the ResourceLoader basically. It would be awesome to get rid of it somehow - it would make ZooKeeperSolrResourceLoader so much easier to get working correctly across the board.
The main thing I'm hung up on is how to update a file - some code I've seen uses getConfigDir to update files, e.g. you get the content of solrconfig, then you want to update it and reload the core. Most other things, I think, are doable without getConfigDir.
QueryElevationComponent is actually sort of simple to get around - we just need to add an exists method that returns true/false depending on whether the resource exists. QEC just uses getConfigDir to do an exists check on elevate.xml - if it's not there, it looks in the data dir.
-- - Mark http://www.lucidimagination.com
Re: If you could have one feature in Solr...
- Built-in hierarchical faceting and
- language attribute for each field
On Sat, Feb 27, 2010 at 9:59 PM, Stephen Weiss swe...@stylesight.com wrote: I think an examples page would be a good idea. We've already implemented search in Chinese, Japanese, and Spanish back with 1.3, but it was not really very well laid out how it was supposed to work - I had to dig through bits and pieces of people's configs left in the mailing list archives - and to be honest, I've never been 100% positive that we did it the right way. On the other hand, that it was possible was pretty obvious to me from reading the documentation (it was all in the API docs), it was just *how* to implement it that wasn't very clear for a non-java/lucene programmer like myself. -- Steve
On Feb 25, 2010, at 1:06 PM, Robert Muir wrote: Yeah, Thai and Arabic have the stuff in Solr 1.4. For Chinese, if you want to do CJK bigram indexing, this is there too. If you want to do word-based smart indexing, you need to add an additional jar file to your classpath. We can add a wiki page with examples of how to use these, maybe, to make it easier? We could also add notes to new ones in lucene (hindi, czech, bulgarian, etc), as it might be easier to copy some code around and get them working with solr 1.4 than to write your own! Separately, would you be interested in helping with Bengali and Marathi?
Re: If you could have one feature in Solr...
On 2/24/10 8:42 AM, Grant Ingersoll wrote: What would it be?
Most of this will be coming in 1.5, but for me it's sharding; it still seems a bit clunky.
Secondly (this one isn't in 1.5), I'd like to be able to find interesting terms that appear in my result set that don't appear in the global corpus. It's kind of like doing a facet count on *:* and then on the search term, and discounting the terms that appear heavily in the global one. (Sorry, there is a textbook definition of this, XX distance, but I haven't got the books in front of me.)
Re: If you could have one feature in Solr...
On 2010-02-28 17:26, Ian Holsman wrote: [...] I'd like to be able to find interesting terms that appear in my result set that don't appear in the global corpus. [...] (sorry.. there is a textbook definition of this.. XX distance.. but I haven't got the books in front of me).
Kullback-Leibler divergence?
-- Best regards, Andrzej Bialecki
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
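If Kullback-Leibler divergence is indeed the measure being reached for (the reply only offers it as a guess), the "interesting terms" idea can be sketched by ranking each term by its pointwise contribution p * log(p / q), where p is the term's share of the result set and q its share of the whole corpus. The function name and the plain term-count dictionaries are assumptions for illustration; in practice the counts might come from facet counts on the query and on *:*.

```python
import math


def interesting_terms(result_counts, corpus_counts, top_n=5):
    """Rank terms by their contribution to KL(result set || corpus).

    Both arguments map term -> occurrence count. Terms missing from the
    corpus counts are skipped (no smoothing in this sketch).
    """
    result_total = sum(result_counts.values())
    corpus_total = sum(corpus_counts.values())
    scores = {}
    for term, count in result_counts.items():
        p = count / result_total                       # share in the result set
        q = corpus_counts.get(term, 0) / corpus_total  # share in the corpus
        if q == 0:
            continue
        scores[term] = p * math.log(p / q)
    # Highest score first: frequent in the results, rare globally.
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    results = {"solr": 8, "the": 10, "facet": 2}
    corpus = {"solr": 10, "the": 1000, "facet": 50, "other": 940}
    print(interesting_terms(results, corpus, top_n=2))
```

A term like "the", which is equally common in both sets, contributes zero and drops to the bottom, which is the discounting effect described in the original wish.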
Re: If you could have one feature in Solr...
On Feb 28, 2010, at 8:47 AM, Adrien Specq wrote: - Built-in hierarchical faceting
Adrien - I'm curious what you mean by this exactly. Could you describe your hierarchical faceting needs by example? Often hierarchical faceting can be accomplished by simply indexing /level1/level2/... type string fields, but there certainly are other ways to go about it as well.
Thanks, Erik
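One way to read the "/level1/level2/..." suggestion is to index every ancestor path of a document's category as a value of a multi-valued string field, so that a facet on that field yields counts at each level. The sketch below only generates those values; the function name is hypothetical, and restricting the facet to one branch (for instance with Solr's `facet.prefix` parameter) is an assumption about how a client might use it, not something stated in the message.

```python
def path_facets(levels):
    """Return all ancestor paths for a category hierarchy.

    ["Europe", "France", "Paris"] becomes
    ["/Europe", "/Europe/France", "/Europe/France/Paris"].
    """
    return ["/" + "/".join(levels[:depth]) for depth in range(1, len(levels) + 1)]


if __name__ == "__main__":
    for value in path_facets(["Europe", "France", "Paris"]):
        print(value)
```

Indexing the ancestors costs a few extra values per document but keeps drill-down to a simple prefix match on a string field.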
Re: If you could have one feature in Solr...
On Wed, Feb 24, 2010 at 7:18 PM, Patrick Sauts patrick.via...@gmail.com wrote: Synchronisation between the slaves to switch the new index at the same time after replication.
I shall open an issue for this, and let us figure out how best it should be done: https://issues.apache.org/jira/browse/SOLR-1800
Re: If you could have one feature in Solr...
On Feb 26, 2010, at 11:28 PM, Dave Searle wrote: To have a coffee waiting for me every morning when I wake up. Marriage material indeed.
Dave, didn't you know that one already exists? http://localhost:8983/solr/admin/coffeehandler?type=ethiopian&cream=false&sugar=true&togo=true
:-) -Grant
Re: If you could have one feature in Solr...
Realtime search, hands down.
Re: If you could have one feature in Solr...
+1 I have several projects backburnered in the hope realtime search will come to solr soon... [m] On Feb 26, 2010, at 8:37 PM, Don Werve d...@madwombat.com wrote: Realtime search, hands down.
RE: If you could have one feature in Solr...
The indexer looking for an xml:lang attribute on text fields and using the value to pick the tokeniser, dictionaries, etc. automatically (and knowing to look for them in the standard places).
cheers stuart
Re: If you could have one feature in Solr...
To have a coffee waiting for me every morning when I wake up. Marriage material indeed.
Re: If you could have one feature in Solr...
Grant, I'm not a java developer but a sysadmin, and I've been struggling for a couple of months now to build a full web search engine stack based on hadoop + nutch + solr. I don't know much about the documentation for developers, so I trust you if you say it's good. What I do know is that I found good docs for the very first steps (installing and performing simple single-run crawling and indexing with near-default configurations), but I'm now facing a great lack of information about features useful in production scenarios.
Now I need to dig deeper into how data are managed, into the workflow and all the features that are needed in the real world, and I find that the documentation is sparse, rather confused, incomplete, often quite old and spread across too many disconnected pieces. Stumbling around on the Net I discovered I'm not alone, actually.
I'm experiencing many difficulties in understanding and implementing even quite basic features such as consistent incremental recrawling/reindexing, adding custom fields, data parsing, duplicate detection, automatic removal of old indexed documents based on insertion date and so on. I mean, I would like a more organic set of use cases suited for real-world scenarios (as starting points) and some in-depth explanation of exactly how the data flow from the crawler into the complex structure of Solr and how they are handled by the different components of the stack. (Nutch documentation is probably even worse, but this is not the right place to complain about that.)
S
-- Anyone proposing to run Windows on servers should be prepared to explain what they know about servers that Google, Yahoo, and Amazon don't. Paul Graham
A mathematician is a device for turning coffee into theorems. Paul Erdos (who obviously never met a sysadmin)
- Original message - From: Grant Ingersoll gsing...@apache.org To: solr-user@lucene.apache.org Sent: Wed 24 February 2010, 18:54:32 Subject: Re: If you could have one feature in Solr...
On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation.
What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant
Re: If you could have one feature in Solr...
Gora, have you tried the Hindi Analyzer in lucene? if you add it to lucene, the results exceed at least everything from FIRE 2008. So I don't really understand where you are getting this information! Actually, the state of the art for NLP in Indian languages is quite poor, at least in the open-source world. -- Robert Muir rcm...@gmail.com
Re: If you could have one feature in Solr...
On Thu, 25 Feb 2010 07:37:33 -0500 Robert Muir rcm...@gmail.com wrote: Gora, have you tried the Hindi Analyzer in lucene? if you add it to lucene, the results exceed at least everything from FIRE 2008. [...] Oh! No, sorry, I haven't. So far, I have only looked at search through Solr, and I guess I definitely need to look at something like this if it is available in Lucene. So I don't really understand where you are getting this information! Ah, could be my ignorance in this case. Regards, Gora
Re: If you could have one feature in Solr...
On Thu, 25 Feb 2010 07:54:06 -0500 Robert Muir rcm...@gmail.com wrote: Gora, I wonder perhaps if there is a documentation issue. e.g. Thai, Arabic, Chinese were mentioned here previously, these are all supported, too. Let me know if you have any ideas! Sorry, are you saying that these are available in Solr 1.4? If so, I have definitely fallen down on my reading. If they are available in Lucene, but not in Solr, that is still helpful, but it will take me a little while before I can devote enough time to set up access to Lucene directly. In any case, getting Indian languages (and, also down the road other languages) working is an area that I am definitely interested in. Regards, Gora
Re: If you could have one feature in Solr...
Erik Hatcher wrote: Ron - I think SOLR-792 meets the need you describe. What do you think? It's tree faceting, allowing you to facet down 2 levels deep arbitrarily on any two fields. Ideally we'd enhance it to be of arbitrary depth too.
Nice! It certainly handles my main use case. There are still a couple of cases that would benefit from a more flexible function returning data along with the facets. In this app, each document represents a crime report describing, for example, an auto theft. Those documents have fields such as the make and model of a stolen car. In some cases the users would like to see the number of incidents involving cars of those types (which I think is what Solr returns easily). Sometimes, instead of the number of documents, they'd rather see the number of cars involved - for example, if a single theft from a dealership involved multiple cars. And other times, they'd rather see the value of the cars returned. In SQL I can do a select sum(value) from incidents join vehicles..., and I haven't (yet) found anything similar for facets in solr. Then again, maybe I should be using the database for that part.
On Feb 24, 2010, at 6:40 PM, Ron Mayer wrote: [...]
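The three numbers asked for here (incidents, cars, and total value per make) can be illustrated with a small client-side aggregation. This is only a sketch of the arithmetic, not a Solr feature: the document shape (an incident holding a list of vehicles with `make` and `value` keys) and the function name are assumptions.

```python
def aggregate_by_make(incidents):
    """Compute per-make incident count, car count, and summed value.

    Each incident is assumed to look like:
      {"vehicles": [{"make": "Ford", "value": 10000}, ...]}
    """
    totals = {}
    for incident in incidents:
        seen_makes = set()
        for vehicle in incident["vehicles"]:
            make = vehicle["make"]
            row = totals.setdefault(make, {"incidents": 0, "cars": 0, "value": 0})
            row["cars"] += 1
            row["value"] += vehicle["value"]
            # An incident counts once per make, however many cars it involved.
            if make not in seen_makes:
                row["incidents"] += 1
                seen_makes.add(make)
    return totals
```

The `incidents` figure corresponds to an ordinary facet count over documents; the other two are the sums the message says are easy in SQL.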
Re: If you could have one feature in Solr...
On Thu, 25 Feb 2010 13:06:03 -0500 Robert Muir rcm...@gmail.com wrote: Yeah, Thai and Arabic have the stuff in Solr 1.4 For Chinese, if you want to do CJK bigram indexing, this is there too. If you want to do word-based smart indexing, you need to add an additional jar file to your classpath. OK, but unfortunately, I have little knowledge of these languages, so that I would not be able to evaluate to what extent that they are working. we can add a wiki page with examples of how to use these maybe to make it easier? we could also add notes to new ones in lucene (hindi, czech, bulgarian, etc), as it might be easier to copy some code around and get them working with solr 1.4 than to write your own! That sounds great. I am all in favour of not trying to reinvent the wheel, and probably badly at that. separately, would you be interesting in helping with Bengali and Marathi? The Indian languages that I am personally conversant with are Hindi, and my native tongue, Oriya. Bengali is quite close to Oriya linguistically, though with a different script. Marathi shares a script with Hindi, but words in the language are quite different. I can try to enlist other open-source folk in India: We have been part of a moderately successful localisation effort in India (http://indlinux.org). So, yes, I would be interested, but probably I have a fair amount of learning to do about what is needed in the context of a search engine. Regards, Gora
Re: If you could have one feature in Solr...
I would like to be able to do a delta import on arbitrary data, not a last modified date. Specifically, our database has an auto_increment field called DID, or document identifier. For changes to existing data, this field is updated anytime a row is changed in any way, effectively turning it into a new document. On the indexing side, we delete the old document and insert the new one. We are currently using a pricey commercial indexing product (which we know is based on Lucene) and are in the process of developing a replacement with distributed SOLR. The dividing line between indexed and new data is the highest DID in the existing data set, which we track and only update when new data is successfully indexed.
If there's a better way to do this already (multiple cores and index merging?), I'm all ears. We are not very far along, so we have a couple of weeks left to define our approach.
Thanks, Shawn
On 2/24/2010 6:42 AM, Grant Ingersoll wrote: What would it be?
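The high-water-mark scheme described above can be sketched as follows. The row shape (`{"did": ...}`) and the function name are hypothetical; the sketch only selects the rows to index and computes the next mark, leaving the actual indexing and the deletion of superseded documents to the caller.

```python
def plan_delta(rows, high_water):
    """Select rows newer than the tracked DID and compute the next mark.

    `rows` is an iterable of dicts with an integer "did" key (an assumption
    for this sketch). Returns (rows_to_index, new_high_water).
    """
    new_rows = sorted((r for r in rows if r["did"] > high_water),
                      key=lambda r: r["did"])
    new_high = new_rows[-1]["did"] if new_rows else high_water
    return new_rows, new_high


if __name__ == "__main__":
    table = [{"did": 5}, {"did": 9}, {"did": 12}]
    batch, mark = plan_delta(table, high_water=8)
    # Persist `mark` only after `batch` has been indexed successfully,
    # as the message describes; otherwise keep the old value and retry.
    print([r["did"] for r in batch], mark)
```

Because a changed row receives a fresh DID, it naturally lands above the mark and is picked up by the next run.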
Re: If you could have one feature in Solr...
Yeah, Thai and Arabic have the stuff in Solr 1.4. For Chinese, if you want to do CJK bigram indexing, this is there too. If you want to do word-based smart indexing, you need to add an additional jar file to your classpath.
We can add a wiki page with examples of how to use these, maybe, to make it easier? We could also add notes to new ones in lucene (hindi, czech, bulgarian, etc), as it might be easier to copy some code around and get them working with solr 1.4 than to write your own!
Separately, would you be interested in helping with Bengali and Marathi?
On Thu, Feb 25, 2010 at 10:48 AM, Gora Mohanty g...@srijan.in wrote: Sorry, are you saying that these are available in Solr 1.4? [...]
-- Robert Muir rcm...@gmail.com
Re: If you could have one feature in Solr...
Ron - I think SOLR-792 meets the need you describe. What do you think? It's tree faceting, allowing you to facet down 2 levels deep arbitrarily on any two fields. Ideally we'd enhance it to be of arbitrary depth too.
Erik
On Feb 24, 2010, at 6:40 PM, Ron Mayer wrote: Another use for this is that I'd like to make a quicker way of drilling down on my documents than going one facet at a time by showing the user a 2-dimensional table that combines 2 facets. For example, showing a table like this on the page:
Make FORD GM Honda TOYOTA []
MONDAY 17 23 4 2
TUESDAY 11 9174 5
WEDNESDAY 3 69 1
... and when the user clicks the 174 it automatically adds the Vehicle Make = Toyota and Day of week = Tuesday facets to the query. (I'm a solr newbie, so my apologies if this already exists, or if it's just a bad idea, or if I should just be using another tool for that (possibly in conjunction with solr), but)
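The two-facet table and the "click a cell to add both filters" behaviour can be sketched on the client side as below. This does not use or describe the SOLR-792 patch; the function names and field names are hypothetical, and the filter strings are simply Lucene-style `field:"value"` clauses of the kind a client might send as filter queries.

```python
from collections import Counter


def pivot_counts(docs, row_field, col_field):
    """Count documents for every (row value, column value) combination."""
    return Counter((d[row_field], d[col_field])
                   for d in docs
                   if row_field in d and col_field in d)


def cell_filters(row_field, row_value, col_field, col_value):
    """Filters to add when the user clicks one cell of the table."""
    return [f'{row_field}:"{row_value}"', f'{col_field}:"{col_value}"']


if __name__ == "__main__":
    docs = [
        {"day": "TUESDAY", "make": "TOYOTA"},
        {"day": "TUESDAY", "make": "TOYOTA"},
        {"day": "MONDAY", "make": "FORD"},
    ]
    table = pivot_counts(docs, "day", "make")
    print(table[("TUESDAY", "TOYOTA")])
    print(cell_filters("day", "TUESDAY", "make", "TOYOTA"))
```

Counting on the client only works for small result sets; the appeal of server-side tree faceting is that the counts arrive with the response.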
Re: If you could have one feature in Solr...
1. Spatial search 2. Ease of managing a sharded index, multi-server Solr instance. I am aware these are in-progress, slated for Solr 1.5. I may find myself getting involved on these shortly because I'm working on a very large scale search project requiring both. ~ David On Feb 24, 2010, at 8:42 AM, Grant Ingersoll wrote: What would it be?
Re: If you could have one feature in Solr...
Error messages that make sense. I have to read the source far too often when a simple change to error handling would make some feature easy to use. If I want to read Java I'll use Lucene!
Passive-aggressive error handling is a related problem: when I do something nonsensical I too often get "0 results found" instead of "what does that mean?".
On Thu, Feb 25, 2010 at 12:52 PM, Smiley, David W. dsmi...@mitre.org wrote: 1. Spatial search 2. Ease of managing a sharded index, multi-server Solr instance. [...]
-- Lance Norskog goks...@gmail.com
If you could have one feature in Solr...
What would it be?
Re: If you could have one feature in Solr...
Synchronisation between the slaves to switch the new index at the same time after replication.
Grant Ingersoll wrote: What would it be?
Re: If you could have one feature in Solr...
On Wed, Feb 24, 2010 at 8:42 AM, Grant Ingersoll gsing...@apache.org wrote: What would it be?
Near real-time search faceting.
-- Stephen Duncan Jr www.stephenduncanjr.com
Re: If you could have one feature in Solr...
- performing multiple queries at once, perhaps abusing HTTP POST. In one application there is a page that executes five different queries. The HTTP overhead is not that much of a problem, but it would be nice to have.
- retrieving documents per facet, not unlike the results from the MoreLikeThis component, see: http://mail-archives.apache.org/mod_mbox//lucene-solr-user/200905.mbox/%3c200905151342.56927.jeffrey.gel...@buyways.nl%3e for a use case we once had before, suggested by a colleague.
- stemmers for many more different languages
And of course (hopefully to be included in Solr 1.5):
- field collapsing
- Solr Spatial
On Wednesday 24 February 2010 14:42:18 Grant Ingersoll wrote: What would it be?
Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: If you could have one feature in Solr...
On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma mar...@buyways.nl wrote: - stemmers for many more different languages I don't want to hijack this thread, but i would like to know which languages you are interested in! -- Robert Muir rcm...@gmail.com
Re: If you could have one feature in Solr...
A mature document processing pipeline, perhaps integration of www.openpipeline.org which is Apache2.0 licensed
Re: If you could have one feature in Solr...
Well, I don't have a specific request in mind. However, I can imagine a growing internet market for Thai, Chinese and Arabic speaking people and the native languages on the African continent. Providing them with stemmers to handle plurals etc. will allow for a better search experience. Also, other components might need an overhaul; see SOLR-1078 for an example of a language-specific issue with a filter.
Perhaps I should rephrase "stemmers for many more different languages" to "support (i.e. stemmers, tokenizers etc.) for many more different languages".
On Wednesday 24 February 2010 15:25:46 Robert Muir wrote: On Wed, Feb 24, 2010 at 9:22 AM, Markus Jelsma mar...@buyways.nl wrote: - stemmers for many more different languages
I don't want to hijack this thread, but i would like to know which languages you are interested in!
Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
Re: If you could have one feature in Solr...
Limit the number of results when the results are sorted. In other words, if the results are sorted by name and there are 10,000 results, then there will be items of low relevancy mixed in with the results and it is hard for the user to find the relevant ones. If I could say, give me no more than 200 results, sorted by name, then I want the most relevant 200 results. (It would be ok to be approximately 200. If there are documents that are the same relevancy, then a few more than 200 would be acceptable.) On Wed, Feb 24, 2010 at 8:42 AM, Grant Ingersoll gsing...@apache.org wrote: What would it be?
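The behaviour requested above (keep roughly the N most relevant hits, then present them sorted by name) can be sketched as a post-processing step. This is an illustration of the request, not an existing Solr option: the function name and the `score` / `name` keys are assumptions, and ties at the cutoff are kept, which is why a few more than N results may come back.

```python
def top_relevant_sorted(results, limit=200, sort_key="name"):
    """Keep about `limit` most relevant results, then sort them by `sort_key`.

    Each result is assumed to be a dict with a numeric "score" and the
    field named by `sort_key`.
    """
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    if len(ranked) > limit:
        cutoff = ranked[limit - 1]["score"]
        # Keep everything at least as relevant as the last item inside the limit.
        ranked = [r for r in ranked if r["score"] >= cutoff]
    return sorted(ranked, key=lambda r: r[sort_key])
```

Doing this in the client means fetching the top hits by relevance first and re-sorting locally, which is workable for a cap of a few hundred results.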
Re: If you could have one feature in Solr...
One additional feature within MoreLikeThis might be... MoreLikeTHESE. This would not be the same as querying multiple documents and fetching MoreLikeThis documents for each individual result. Instead, it would return only MoreLikeThis documents based on the multiple documents combined. A colleague could then more easily build a system that allows the end user to select multiple documents and get suggestions based on the combined input. Of course, we could use the existing component's results and cross-reference them, but that's not as simple as getting it from Solr in one go. On Wednesday 24 February 2010 14:42:18 Grant Ingersoll wrote: What would it be? Markus Jelsma - Technisch Architect - Buyways BV http://www.linkedin.com/in/markus17 050-8536620 / 06-50258350
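To make the MoreLikeTHESE idea above a bit more concrete, here is a toy sketch of "suggestions based on the combined input". It is not how Solr's MoreLikeThis component works internally; it simply merges raw term counts of the selected documents and ranks the rest by cosine similarity. All names (`term_vector`, `more_like_these`) and the in-memory `docs` dictionary are assumptions for illustration.

```python
import math
from collections import Counter


def term_vector(text):
    """Very naive analysis: lowercase and split on whitespace."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_these(docs, selected_ids, top_n=3):
    """Rank unselected documents against the selected documents combined."""
    combined = Counter()
    for doc_id in selected_ids:
        combined.update(term_vector(docs[doc_id]))
    scored = [
        (doc_id, cosine(combined, term_vector(text)))
        for doc_id, text in docs.items()
        if doc_id not in selected_ids
    ]
    scored = [(doc_id, score) for doc_id, score in scored if score > 0]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [doc_id for doc_id, _ in scored[:top_n]]


docs = {
    1: "solr search faceting",
    2: "lucene index search",
    3: "solr lucene search index faceting",
    4: "cooking pasta recipes",
}
suggestions = more_like_these(docs, selected_ids=[1, 2])
```

The key difference from calling MoreLikeThis once per selected document is that there is a single combined query vector, so a suggestion has to resemble the selection as a whole.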
Re: If you could have one feature in Solr...
Decent documentation. S -- Anyone proposing to run Windows on servers should be prepared to explain what they know about servers that Google, Yahoo, and Amazon don't. Paul Graham A mathematician is a device for turning coffee into theorems. Paul Erdos (who obviously never met a sysadmin) - Original message - From: Grant Ingersoll gsing...@apache.org To: solr-user@lucene.apache.org Sent: Wed 24 February 2010, 14:42:18 Subject: If you could have one feature in Solr... What would it be?
Re: If you could have one feature in Solr...
On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation. What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant
Re: If you could have one feature in Solr...
I actually found the documentation pretty great especially since (my experience, anyway) most Java projects seem to default to generic JavaDoc derived documentation (and that makes me cry). That said, more cookbook-style recipes or stories would be helpful for some of the more esoteric parts of Solr. Also: real-time indexing and geo. Cheers, On 2/24/10 9:54 AM, Grant Ingersoll wrote: On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation. What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant
Re: If you could have one feature in Solr...
Grant, One feature that I would like to see is the ability to do a bitwise search. I have had to work around this with a Query Parser plugin that uses an org.apache.lucene.search.Filter. I think having this feature would be very nice, and I prefer it to searching with multiple OR-type queries, especially when the bits are known ahead of time. I can submit the code as a patch once I get the approval to do so. On Wed, Feb 24, 2010 at 2:20 PM, straup str...@gmail.com wrote: I actually found the documentation pretty great especially since (my experience, anyway) most Java projects seem to default to generic JavaDoc derived documentation (and that makes me cry). That said, more cookbook-style recipes or stories would be helpful for some of the more esoteric parts of Solr. Also: real-time indexing and geo. Cheers, On 2/24/10 9:54 AM, Grant Ingersoll wrote: On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation. What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/
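For readers unfamiliar with the bitwise search mentioned above, the matching logic itself is tiny. The sketch below is not the Query Parser plugin or the Lucene Filter from the message (that code was not posted); it only illustrates, with a hypothetical `bitwise_filter` function and an integer `flags` field, what "match on bits" means compared to a chain of OR clauses.

```python
def bitwise_filter(docs, field, mask, mode="all"):
    """Return docs whose integer `field` matches `mask` bitwise.

    mode="all": every bit in the mask must be set in the field.
    mode="any": at least one bit in the mask must be set in the field.
    """
    if mode not in ("all", "any"):
        raise ValueError("mode must be 'all' or 'any'")
    matches = []
    for doc in docs:
        overlap = doc[field] & mask
        if (mode == "all" and overlap == mask) or (mode == "any" and overlap != 0):
            matches.append(doc)
    return matches


docs = [
    {"id": 1, "flags": 0b0101},
    {"id": 2, "flags": 0b0011},
    {"id": 3, "flags": 0b1000},
]
with_bit_0 = bitwise_filter(docs, "flags", 0b0001, mode="all")
with_bit_1_or_3 = bitwise_filter(docs, "flags", 0b1010, mode="any")
```

A single mask replaces what would otherwise be one OR clause per acceptable field value, which is the convenience the message is asking for when the bits are known ahead of time.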
Re: If you could have one feature in Solr...
Chipping in: the wiki-based nature of Solr's documentation is rather different compared to most payware and some open source products. However, once I got used to its style, I found it quite adequate. It also dawned on me that portions of Solr are advancing very quickly and that the wiki style of documentation is the simplest method of allowing the documentation to keep pace with the code. In my experience payware products often have proper documentation that trails the state of the code, by years in some cases. New functionality is only described in the release notes. I also ran into cases where vendors refused to fix simple features as it would require changing the documentation. I quite like the wiki! However, some of the pages are a mix of cookbook, examples and documentation, and are too big. On 24/02/2010 19:20, straup wrote: I actually found the documentation pretty great especially since (my experience, anyway) most Java projects seem to default to generic JavaDoc derived documentation (and that makes me cry). That said, more cookbook-style recipes or stories would be helpful for some of the more esoteric parts of Solr. Also: real-time indexing and geo. Cheers, On 2/24/10 9:54 AM, Grant Ingersoll wrote: On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation. What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant -- Fergus McMenemie Email: fer...@twig.me.uk Techmore Limited, Phone: (UK) 07721 376021 Old Stables, Far End, Home: (UK) 01522 810839 Boothby Graffoe, Lincoln, LN5 0LG, England Unix/Mac/Intranets/WWW Analyst Programmer
Re: If you could have one feature in Solr...
Grant Ingersoll wrote: What would it be? * Run a MapReduce-like job on all docs matching the results of a search? I'm currently working on an app where I hope to be able to do a query (hopefully using Solr) and generate a map where every state (or county or zip code or school district or police beat) is colored based on some attribute derived from some fields in the documents. Interestingly, it seems pleasantly easy to do if I'm just basing it on the count of documents, since I can set up states, etc. as facets. But it'd be neat if, instead of just getting the count from the facets, I could run more arbitrary math on the documents without having to suck them into the application. Another use for this is that I'd like to make a quicker way of drilling down on my documents than going one facet at a time, by showing the user a 2-dimensional table that combines 2 facets. For example, showing a table like this on the page: Make FORD GM Honda TOYOTA [] MONDAY 17 23 4 2 TUESDAY 11 9 174 5 WEDNESDAY 3 69 1 ... and when the user clicks the 174 it automatically adds the Vehicle Make = Toyota and Day of week = Tuesday facets to the query. (I'm a Solr newbie, so my apologies if this already exists, or if it's just a bad idea, or if I should just be using another tool for that, possibly in conjunction with Solr.)
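The two-facet table described above, and the "more arbitrary math" variant of it, can be sketched on the application side like this. It is exactly the "suck the documents into the application" approach the message hopes to avoid, so treat it only as a statement of the desired result. The `pivot` function and the `day`, `make` and `price` fields are hypothetical.

```python
from collections import defaultdict


def pivot(docs, row_field, col_field, value=lambda doc: 1):
    """Build a {row: {col: total}} table over two fields.

    With the default `value` each cell is a document count, like
    combining two facets; pass another function to sum something else.
    """
    table = defaultdict(lambda: defaultdict(int))
    for doc in docs:
        table[doc[row_field]][doc[col_field]] += value(doc)
    # Convert to plain dicts so missing cells raise instead of appearing.
    return {row: dict(cols) for row, cols in table.items()}


docs = [
    {"day": "MONDAY", "make": "FORD", "price": 100},
    {"day": "MONDAY", "make": "FORD", "price": 250},
    {"day": "MONDAY", "make": "GM", "price": 300},
    {"day": "TUESDAY", "make": "TOYOTA", "price": 400},
]
counts = pivot(docs, "day", "make")
totals = pivot(docs, "day", "make", value=lambda doc: doc["price"])
```

Clicking a cell would then translate into adding two filters to the query, one for the row field and one for the column field.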
Re: If you could have one feature in Solr...
The Solr documentation feels more like a reference guide detailing all the APIs. It's great for more advanced users, but as a beginner I often feel lost reading the docs. It would be really helpful to have a more step-by-step, tutorial approach in the docs showing how to do things, with tips and tricks. For me the gold standard of documentation is Django; the docs there are ridiculously good. --- On Wed, 2/24/10, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: Re: If you could have one feature in Solr... To: solr-user@lucene.apache.org Date: Wednesday, February 24, 2010, 12:54 PM On Feb 24, 2010, at 11:08 AM, Stefano Cherchi wrote: Decent documentation. What parts do you feel are lacking? Or is it just across the board? Wikis are both good and bad for documentation, IMO. -Grant
Re: If you could have one feature in Solr...
1) Built-in hierarchical faceting. Right now there're 2 patches, SOLR-64 and SOLR-792. SOLR-64 seems to be slated for the 1.5 release but according to the wiki seems to have poor performance. SOLR-792 has better performance according to the wiki, but it's unclear if it'll ever be part of the Solr distribution. Not sure what the pros and cons of the 2 patches are. Hierarchical faceting is very common and it'd be nice to have this as a standard feature. 2) Near real-time update and search. 3) Partial update - something like LUCENE-1879. 4) Automatic language detection and stemmer selection. I have document collections that are a mix of different languages. It'd be great to be able to automatically detect what language a specific document is in and select an appropriate stemmer for that language. 5) Cloudy Solr - automatic sharding, replication, fail-over. Sorta like a Cassandra version of Solr... --- On Wed, 2/24/10, Grant Ingersoll gsing...@apache.org wrote: From: Grant Ingersoll gsing...@apache.org Subject: If you could have one feature in Solr... To: solr-user@lucene.apache.org Date: Wednesday, February 24, 2010, 8:42 AM What would it be?
Re: If you could have one feature in Solr...
On Wed, 24 Feb 2010 15:49:15 +0100 Markus Jelsma mar...@buyways.nl wrote: Well, I don't have a specific request in mind. However, I can imagine a growing internet market for Thai, Chinese and Arabic speaking people and the native languages on the African continent. Providing them with stemmers to handle plurals etc. will allow for a better search experience. The same is true for Indian languages. Actually, the state of the art for NLP in Indian languages is quite poor, at least in the open-source world. So, even before stemmers, one should address things like phonetic analysers, spellcheckers, stop words, etc. This latter part is possible now, and we have an alpha-level implementation for Hindi that we would be glad to contribute once it is stable. Which reminds me: one thing that I would like to see immediately in Solr is to have the Metaphone/DoubleMetaphone phonetic analysers use a configuration file for phonetic rules, rather than having these hard-coded. The aspell library does this, for example. Please see http://aspell.net/man-html/Phonetic-Code.html#Phonetic-Code for an explanation of its rules. Regards, Gora
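As a rough picture of what "phonetic rules in a configuration file" could look like, here is a deliberately simplified sketch. The rule format (one `PATTERN REPLACEMENT` pair per line, `_` standing for an empty replacement, `#` starting a comment) is invented for this example and is much poorer than the aspell table format linked above; `parse_rules` and `encode` are hypothetical names, and this is neither Metaphone nor anything Solr ships.

```python
def parse_rules(text):
    """Parse 'PATTERN REPLACEMENT' lines into a dict of uppercase rules."""
    rules = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        parts = line.split()
        replacement = parts[1] if len(parts) > 1 else "_"
        rules[parts[0].upper()] = "" if replacement == "_" else replacement.upper()
    return rules


def encode(word, rules):
    """Apply the longest matching rule at each position, left to right."""
    word = word.upper()
    max_len = max((len(pattern) for pattern in rules), default=1)
    out = []
    i = 0
    while i < len(word):
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            out.append(word[i])  # no rule matched: keep the letter as is
            i += 1
    return "".join(out)


RULES_TEXT = """
# toy rules, not a real language table
PH F
GH _
CK K
A _
E _
I _
O _
U _
"""
rules = parse_rules(RULES_TEXT)
code = encode("phone", rules)
```

The appeal of the file-driven approach is that supporting another language, such as the Hindi work mentioned above, becomes a matter of shipping a different rule table rather than changing Java code.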
Re: If you could have one feature in Solr...
Real time search would be awesome. -Matt