Re: No Analyzer, tokenizer or stemmer works at Solr
: Imagine there is a query like "harry potter dvd-collection cheap" or
: "cheap Harry Potter dvd-collection".
: How can I customize that, if there is something said about the category
: cheap, Solr uses a faceting query on cat:cheap? To do so, I have to
: alter the original query - how can I do that?
TMTOWTDI.
One solution would be a QParserPlugin ... it's utilized by the QueryComponent to decide how to parse the query string.
Or you could write your own SearchComponent to use in place of the QueryComponent; then you could not only modify the way the string is parsed, but you could also modify the DocSet/DocList any way you want.
-Hoss
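The query-altering idea in this reply can be illustrated outside Solr. The following is only a minimal client-side sketch in Python, not a QParserPlugin or SearchComponent (those are Java classes inside Solr). The `cat` field comes from the question; the `CATEGORY_TERMS` vocabulary and the `rewrite_query` helper are assumptions made up for illustration.

```python
from urllib.parse import urlencode

# Hypothetical category vocabulary; "cat" is the field named in the question.
CATEGORY_TERMS = {"cheap", "expensive"}


def rewrite_query(user_query):
    """Move category words out of the free-text query into filter queries."""
    keep, filters = [], []
    for word in user_query.split():
        if word.lower() in CATEGORY_TERMS:
            # The category word becomes a filter instead of a search term.
            filters.append("cat:" + word.lower())
        else:
            keep.append(word)
    # q carries the remaining words, one fq parameter per detected category.
    return [("q", " ".join(keep))] + [("fq", f) for f in filters]


params = rewrite_query("cheap Harry Potter dvd-collection")
print(params)             # list of (name, value) request parameters
print(urlencode(params))  # the same parameters as a URL query string
```

Doing the same thing inside Solr would mean putting equivalent logic into a custom query parser or search component, as suggested above.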
Re: No Analyzer, tokenizer or stemmer works at Solr
Hello Hossman,
sorry for my late response. For this specific case, you are right. It makes more sense to do such work on the fly. However, at the moment I am only testing what one can and cannot do with Solr.
Is the UpdateProcessor something that comes from Lucene itself or from Solr?
Thanks!
hossman wrote:
: : Is there a way to prepare a document the described way with Lucene/Solr,
: : before I analyze it?
: : My use case is to categorize several documents in an automatic way, which
: : includes that I have to create data from the given input doing some
: : information retrieval.
: As Ryan mentioned earlier: this is what the UpdateRequestProcessor API is for -- it allows you to modify Documents (regardless of how they were added: csv, xml, dih) prior to Solr processing them...
: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html
: Personally, I think you may be looking at your problem from the wrong direction...
: : Imagine you would analyze, index and store them like you normally do and
: : afterwards you want to set, whether the document belongs to the expensive
: : item-group or not.
: : If the price for the item is higher than 500$, it belongs to the expensive
: : ones, otherwise not.
: ...for a situation like that, I wouldn't attempt to classify the docs as "expensive" or "cheap" when adding them. Instead I would use numeric ranges for faceting and filtering to show me how many docs were "expensive" or "cheap" at query time -- that way when the economy tanks I can redefine my definition of "expensive" on the fly w/o needing to reindex a million documents.
: -Hoss
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 11, 2010, at 7:33 AM, MitchK wrote:
: Is the UpdateProcessor something that comes from Lucene itself or from Solr?
It's at the Solr level - http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html
Erik
Re: No Analyzer, tokenizer or stemmer works at Solr
Is there any overview that explains which class is responsible for which stage of processing my data on its way into the index?
My example was: I have categorized whether something is cheap or expensive. Let's say I didn't do that on the fly, but with the help of the UpdateRequestProcessor.
Imagine there is a query like "harry potter dvd-collection cheap" or "cheap Harry Potter dvd-collection". How can I customize that, if there is something said about the category cheap, Solr uses a faceting query on cat:cheap? To do so, I have to alter the original query - how can I do that?
Erik Hatcher-4 wrote:
: On Jan 11, 2010, at 7:33 AM, MitchK wrote:
: : Is the UpdateProcessor something that comes from Lucene itself or from Solr?
: It's at the Solr level - http://lucene.apache.org/solr/api/org/apache/solr/update/processor/UpdateRequestProcessor.html
: Erik
Re: No Analyzer, tokenizer or stemmer works at Solr
Okay, you're right. It really would be cleaner if I did such stuff in the code which populates the document for Solr.
Is there a way to prepare a document in the described way with Lucene/Solr before I analyze it? My use case is to categorize several documents automatically, which means I have to create data from the given input by doing some information retrieval.
The problem is that I am really new to Solr and Lucene - as you can see - and I do not know whether there are classes that fit my needs. Any idea?
Erick Erickson wrote:
: Well, I'd approach either of these use cases by simply performing my computations on the input and storing the result in another (non-indexed unless I wanted to search it) field. This wouldn't happen in the Analyzer, but in the code that populated the document fields. Which is a much cleaner solution IMO than creating some sort of "index this but store that" capability.
: The purpose of analysis is to produce *searchable* tokens after all. But we're getting into angels dancing on pins here.
: Do you actually have a use case you're trying to implement, or is this mostly theoretical?
: Erick
Re: No Analyzer, tokenizer or stemmer works at Solr
Somewhere, you have to create the document XML you send to SOLR. Just add the calculated data to your new field there...
HTH
Erick
On Fri, Jan 8, 2010 at 9:30 AM, MitchK mitc...@web.de wrote:
: Okay, you're right. It really would be cleaner if I did such stuff in the code which populates the document for Solr.
: Is there a way to prepare a document in the described way with Lucene/Solr before I analyze it? My use case is to categorize several documents automatically, which means I have to create data from the given input by doing some information retrieval.
: The problem is that I am really new to Solr and Lucene - as you can see - and I do not know whether there are classes that fit my needs. Any idea?
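What "add the calculated data to your new field" could look like on the client side is sketched below in Python. It assumes the standard Solr XML update message (`<add><doc><field name="...">`); the field names `title`, `price` and `price_group` and the `build_add_xml` helper are hypothetical, and the 500 threshold is taken from the example discussed in the thread.

```python
import xml.etree.ElementTree as ET

PRICE_LIMIT = 500.0  # threshold from the thread's cheap/expensive example


def build_add_xml(doc):
    """Return a Solr <add> message with one extra, precomputed field."""
    fields = dict(doc)
    # Computed by the indexing client, before Solr's analysis ever runs.
    is_expensive = float(doc["price"]) > PRICE_LIMIT
    fields["price_group"] = "expensive" if is_expensive else "cheap"

    add = ET.Element("add")
    doc_el = ET.SubElement(add, "doc")
    for name, value in fields.items():
        field = ET.SubElement(doc_el, "field", name=name)
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")


print(build_add_xml({"title": "Harry Potter dvd-collection", "price": "19.99"}))
```

The resulting string would then be posted to the update handler like any other add message; the schema would need a matching `price_group` field.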
Re: No Analyzer, tokenizer or stemmer works at Solr
: Is there a way to prepare a document the described way with Lucene/Solr,
: before I analyze it?
: My use case is to categorize several documents in an automatic way, which
: includes that I have to create data from the given input doing some
: information retrieval.
As Ryan mentioned earlier: this is what the UpdateRequestProcessor API is for -- it allows you to modify Documents (regardless of how they were added: csv, xml, dih) prior to Solr processing them...
http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-to27026739.html
Personally, I think you may be looking at your problem from the wrong direction...
: Imagine you would analyze, index and store them like you normally do and
: afterwards you want to set, whether the document belongs to the expensive
: item-group or not.
: If the price for the item is higher than 500$, it belongs to the expensive
: ones, otherwise not.
...for a situation like that, I wouldn't attempt to classify the docs as "expensive" or "cheap" when adding them. Instead I would use numeric ranges for faceting and filtering to show me how many docs were "expensive" or "cheap" at query time -- that way when the economy tanks I can redefine my definition of "expensive" on the fly w/o needing to reindex a million documents.
-Hoss
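The query-time approach described in this reply might be expressed with Solr's `facet.query` parameter. The Python sketch below only builds the request parameters; the `price` field name and the `price_facet_params` helper are assumptions for illustration, and the range syntax should be checked against the Solr version in use.

```python
from urllib.parse import urlencode


def price_facet_params(query, limit):
    """Build request parameters that count cheap and expensive hits at query time."""
    return [
        ("q", query),
        ("facet", "true"),
        # Both ranges are inclusive, so a document priced exactly at the
        # limit is counted in both buckets; tighten one bound if that matters.
        ("facet.query", "price:[* TO %s]" % limit),
        ("facet.query", "price:[%s TO *]" % limit),
    ]


# Changing the definition of "expensive" only means changing this argument.
params = price_facet_params("harry potter", 500)
print(urlencode(params))
```

Because the threshold lives in the request rather than in the index, nothing has to be reindexed when it changes, which is the point made above.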
Re: No Analyzer, tokenizer or stemmer works at Solr
Erik, you mean everything is okay, but I do not see it?
: Internally for searching the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone.
If I use an analyzer, Solr stores its output two ways? One public output, which is similar to the original input, and one hidden or internal output, which is based on the analyzer's work? Did I understand that right?
If yes, I have got another problem: I don't want to waste any disk space. Does the copyField directive store the same data twice? I mean: I have got originalField and copiedField. originalField gets indexed with text_analyzer and copiedField with a stemmer. Does this mean I am storing the original data twice publicly and once analyzed per analyzer? Or does Solr store the original input only once and make a reference to the public data of originalField?
Thank you
Mitch
Erik Hatcher-4 wrote:
: Mitch,
: Again, I think you're misunderstanding what analysis does. You must be expecting, we think (though you've not provided exact duplication steps to be sure), that the value you get back from Solr is the analyzer-processed output. It's not; it's exactly what you provide. Internally for searching the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone.
: There's some thinking going on about implementing it such that analyzed output is stored.
: You can, however, use the analysis request handler componentry to get analyzed stuff back as you see it in analysis.jsp on a per-document or per-field text basis - if you're looking to leverage the analyzer output in that fashion from a client.
: Erik
: On Jan 7, 2010, at 1:21 AM, MitchK wrote:
: : Hello Erick,
: : thank you for answering. I can do whatever I want - Solr does nothing. For example: if I use the predefined textgen fieldtype, nothing happens to the text. Even the stopFilter is not working - no stopword from stopword.txt was removed. I think that this only affects the index, because if I query for "for", it returns nothing, which is quite correct, due to the work of the stopFilter.
: : Everything works fine on analysis.jsp, but not in reality. If you have got any testcase-data you want me to add, please tell me and I will show you the saved data afterwards.
: : Thank you.
: : Mitch
: : Erick Erickson wrote:
: : : : Well, I have noticed that Solr isn't using ANY analyzer
: : : How do you know this? Because it's highly unlikely that SOLR is completely broken on that level.
: : : Erick
: : : On Wed, Jan 6, 2010 at 3:48 PM, MitchK mitc...@web.de wrote:
: : : : I have tested a lot, and all the time I thought I had set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input. I am using the example configuration of the current Solr 1.4 release. What's wrong?
: : : : Thank you!
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 7, 2010, at 10:50 AM, MitchK wrote:
: Erik, you mean everything is okay, but I do not see it?
: If I use an analyzer, Solr stores its output two ways? One public output, which is similar to the original input, and one hidden or internal output, which is based on the analyzer's work? Did I understand that right?
Yes. Indexed fields and stored fields are different. Solr results show stored fields in the results (however, facets are based on indexed fields).
Take a look at Lucene in Action for a better description of what is happening. The best tool to get your head around what is happening is probably Luke (http://www.getopt.org/luke/).
: If yes, I have got another problem: I don't want to waste any disk space.
You have control over what is stored and what is indexed -- how that is configured is up to you.
ryan
Re: No Analyzer, tokenizer or stemmer works at Solr
Thank you, Ryan. I will have a look at Lucene's material and Luke. I think I got it. :)
Sometimes there will be the need to return both the original value and the indexed version of the value. How can I fulfill such needs? By using copyField on indexed-only fields?
ryantxu wrote:
: Yes. Indexed fields and stored fields are different. Solr results show stored fields in the results (however, facets are based on indexed fields).
: Take a look at Lucene in Action for a better description of what is happening. The best tool to get your head around what is happening is probably Luke (http://www.getopt.org/luke/).
: : If yes, I have got another problem: I don't want to waste any disk space.
: You have control over what is stored and what is indexed -- how that is configured is up to you.
: ryan
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 7, 2010, at 12:11 PM, MitchK wrote:
: Thank you, Ryan. I will have a look at Lucene's material and Luke. I think I got it. :)
: Sometimes there will be the need to return both the original value and the indexed version of the value. How can I fulfill such needs? By using copyField on indexed-only fields?
See Erik's response on the 'analysis request handler'.
Re: No Analyzer, tokenizer or stemmer works at Solr
What is your use case for responding sometimes with the indexed value? Other than reconstructing a field that hasn't been stored, I can't think of one.
I still think you're missing the point. Indexing and storing are orthogonal operations that have (almost) nothing to do with each other, for all that they happen at the same time on the same field.
You never search against the stored data in a field. You *always* search against the indexed data.
Contrariwise, you never display the indexed form to the user; you *always* show the stored data (unless you come up with a really interesting use case).
Step back and consider what happens when you index data: it gets broken up all kinds of ways. Stop words are removed, case may change, etc, etc, etc. It makes no sense to then display this data for a user. Would you really like to have, say, a movie title "The Good, The Bad, and The Ugly"? Remove stopwords and punctuation, lowercase it, and you index three tokens: good, bad, ugly. Even if you reconstruct this field, the user would see "good bad ugly". Bad, very bad. Yet I want to display the original title to the user in response to searching on "ugly", so I need the original, unanalyzed data.
Perhaps it would help to think of it this way.
1 take some data and index it in f1 but do NOT store it in f1. Store it in f2 but do NOT index it in f2.
2 take that same data, index AND store it in f3.
1 is almost entirely equivalent to 2 in terms of index resources. Practically though, 1 is harder to use, because you have to remember to use f1 for searching and f2 for getting the raw data.
HTH
Erick
On Thu, Jan 7, 2010 at 12:11 PM, MitchK mitc...@web.de wrote:
: Thank you, Ryan. I will have a look at Lucene's material and Luke. I think I got it. :)
: Sometimes there will be the need to return both the original value and the indexed version of the value. How can I fulfill such needs? By using copyField on indexed-only fields?
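The indexed/stored distinction explained in this message can be made concrete with a toy model. The Python sketch below is not how Lucene is implemented; `ToyIndex`, `analyze` and the stop list are made up for illustration. It keeps the original text in one structure (stored) and the analyzed tokens in another (indexed), searches only the tokens, and returns only the stored text.

```python
import re

STOPWORDS = {"the", "and", "a", "of"}  # assumed stop list for this sketch


def analyze(text):
    """Lowercase, strip punctuation, drop stop words: a stand-in for an analyzer."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class ToyIndex:
    def __init__(self):
        self.stored = {}    # doc id -> original, unanalyzed text
        self.inverted = {}  # token -> set of doc ids containing it

    def add(self, doc_id, text):
        self.stored[doc_id] = text  # "stored": kept exactly as given
        for token in analyze(text):  # "indexed": analyzed tokens only
            self.inverted.setdefault(token, set()).add(doc_id)

    def search(self, term):
        hits = set()
        for token in analyze(term):
            hits |= self.inverted.get(token, set())
        # Results come from the stored side, never from the tokens.
        return [self.stored[doc_id] for doc_id in sorted(hits)]


index = ToyIndex()
index.add(1, "The Good, The Bad, and The Ugly")
print(analyze("The Good, The Bad, and The Ugly"))  # ['good', 'bad', 'ugly']
print(index.search("ugly"))  # ['The Good, The Bad, and The Ugly']
```

Searching for "ugly" matches through the token list, yet the user sees the original title, which is the behaviour described above.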
Re: No Analyzer, tokenizer or stemmer works at Solr
The difference between stored and indexed is clear now. You are right, if you are responding only to normal users.
Use case: You have a stored field "The good, the bad and the ugly", and you have a really fantastic analyzer which does some magic to this movie title. Let's say the analyzer translates the title into md5 or into another abstract expression. Instead of doing the same magical function on the client's side again and again, the client only needs to take the prepared data from your response.
Another use case could be: Imagine you have two categories, cheap and expensive, and your document has a title, a label, an owner and a price field. Imagine you would analyze, index and store them like you normally do, and afterwards you want to set whether the document belongs to the expensive item group or not. If the price of the item is higher than 500$, it belongs to the expensive ones, otherwise not.
I think this would be a job for a special analyzer - and this only makes sense if I also store the analyzed data. I think information retrieval is a really interesting use case.
Erick Erickson wrote:
: What is your use case for responding sometimes with the indexed value? Other than reconstructing a field that hasn't been stored, I can't think of one.
: I still think you're missing the point. Indexing and storing are orthogonal operations that have (almost) nothing to do with each other, for all that they happen at the same time on the same field.
: You never search against the stored data in a field. You *always* search against the indexed data.
: Contrariwise, you never display the indexed form to the user; you *always* show the stored data (unless you come up with a really interesting use case).
: Step back and consider what happens when you index data: it gets broken up all kinds of ways. Stop words are removed, case may change, etc, etc, etc. It makes no sense to then display this data for a user. Would you really like to have, say, a movie title "The Good, The Bad, and The Ugly"? Remove stopwords and punctuation, lowercase it, and you index three tokens: good, bad, ugly. Even if you reconstruct this field, the user would see "good bad ugly". Bad, very bad. Yet I want to display the original title to the user in response to searching on "ugly", so I need the original, unanalyzed data.
: Perhaps it would help to think of it this way.
: 1 take some data and index it in f1 but do NOT store it in f1. Store it in f2 but do NOT index it in f2.
: 2 take that same data, index AND store it in f3.
: 1 is almost entirely equivalent to 2 in terms of index resources. Practically though, 1 is harder to use, because you have to remember to use f1 for searching and f2 for getting the raw data.
: HTH
: Erick
Re: No Analyzer, tokenizer or stemmer works at Solr
Well, I'd approach either of these use cases by simply performing my computations on the input and storing the result in another (non-indexed unless I wanted to search it) field. This wouldn't happen in the Analyzer, but in the code that populated the document fields. Which is a much cleaner solution IMO than creating some sort of index this but store that capability. The purpose of analysis is to produce *searchable* tokens after all. But we're getting into angels dancing on pins here. Do you actually have a use case you're trying to implement or is this mostly theoretical? Erick On Thu, Jan 7, 2010 at 2:08 PM, MitchK mitc...@web.de wrote: The difference between stored and indexed is clear now. You are right, if you are responsing only to normal users. Use case: You got a stored field The good, the bad and the ugly. And you got a really fantastic analyzer, which is doing some magic to this movie title. Let's say, the analyzer translates the title into md5 or into another abstract expression. Instead of doing the same magical function on the client's side again and again, he only needs to take the prepared data from your response. Another use case could be: Imagine you have got two categories: cheap and expensive and your document gots a title-, a label-, an owner- and a price-field. Imagine you would analyze, index and store them like you normally do and afterwards you want to set, whether the document belongs to the expensive item-group or not. If the price for the item is higher than 500$, it belongs to the expensive ones, otherwise not. I think, this would be a job for a special analyzer - and this only makes sense, if I also store the analyzed data. I think information retrieval is a really interesting use case. Erick Erickson wrote: What is your use case for responding sometimes with the indexed value? Other than reconstructing a field that hasn't been stored, I can't think of one. I still think you're missing the point. 
Indexing and storing are orthogonal operations that have (almost) nothing to do with each other, even though they happen at the same time on the same field. You never search against the stored data in a field. You *always* search against the indexed data. Contrariwise, you never display the indexed form to the user, you *always* show the stored data (unless you come up with a really interesting use case).

Step back and consider what happens when you index data: it gets broken up in all kinds of ways. Stop words are removed, case may change, etc. It makes no sense to then display this data to a user. Take, say, a movie title "The Good, The Bad, and The Ugly". Remove stopwords and punctuation, lowercase it, and you index three tokens: good, bad, ugly. Even if you reconstruct this field, the user would see "good bad ugly". Bad, very bad. Yet I want to display the original title to the user in response to a search on "ugly", so I need the original, unanalyzed data.

Perhaps it would help to think of it this way.
1) Take some data and index it in f1 but do NOT store it in f1. Store it in f2 but do NOT index it in f2.
2) Take that same data, index AND store it in f3.

1) is almost entirely equivalent to 2) in terms of index resources. Practically, though, 1) is harder to use, because you have to remember to use f1 for searching and f2 for getting the raw data.

HTH
Erick

On Thu, Jan 7, 2010 at 12:11 PM, MitchK mitc...@web.de wrote:

Thank you, Ryan. I will have a look at Lucene's material and Luke. I think I got it. :) Sometimes there will be a need to return both the value and the indexed version of the value. How can I fulfill such needs? By doing a copyField on indexed-only fields?

ryantxu wrote:

On Jan 7, 2010, at 10:50 AM, MitchK wrote:

Eric, you mean everything is okay, but I do not see it?

Internally for searching the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone.
If I use an analyzer, Solr stores its output two ways? One public output, which is similar to the original input, and one hidden or internal output, which is based on the analyzer's work? Did I understand that right?

yes. indexed fields and stored fields are different. Solr results show stored fields in the results (however, facets are based on indexed fields).

Take a look at Lucene in Action for a better description of what is happening. The best tool to get your head around what is happening is probably luke (http://www.getopt.org/luke/).

If yes, I have got another problem: I don't want to waste any disk space.

You have control over what is stored and what is indexed -- how that is configured is up to you.

ryan
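Erick's suggestion at the top of this message, computing the value in the code that populates the document fields and putting the result in its own field, could look roughly like the sketch below. It is only an illustration under stated assumptions: the field names `price` and `price_group`, the helper `with_price_group`, and the sample documents are hypothetical, and the 500 threshold is taken from Mitch's example. No Solr client is involved; the dictionaries stand in for whatever document objects the indexing code builds before sending them to Solr.

```python
# Hypothetical names throughout: "price", "price_group" and with_price_group
# are illustrative only; the 500 threshold comes from the example in the thread.
EXPENSIVE_THRESHOLD = 500.0


def with_price_group(doc, threshold=EXPENSIVE_THRESHOLD):
    """Return a copy of doc with a derived 'price_group' field added.

    The derived value is computed before the document is handed to the
    search engine, so no analyzer is involved.
    """
    enriched = dict(doc)
    enriched["price_group"] = "expensive" if doc["price"] > threshold else "cheap"
    return enriched


# Made-up sample documents, standing in for whatever the indexing code builds.
docs = [
    {"id": "1", "title": "Harry Potter DVD collection", "price": 649.0},
    {"id": "2", "title": "The Good, the Bad and the Ugly", "price": 9.99},
]

enriched_docs = [with_price_group(d) for d in docs]
for d in enriched_docs:
    print(d["id"], d["price_group"])
```

The derived field is then an ordinary field, and whether it is indexed, stored, or both is decided in the schema like for any other field.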
No Analyzer, tokenizer or stemmer works at Solr
I have tested a lot and all the time I thought I set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input. I am using the example-configuration of the current Solr 1.4 release. What's wrong? Thank you! -- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27026959.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: No Analyzer, tokenizer or stemmer works at Solr
"Well, I have noticed that Solr isn't using ANY analyzer"

How do you know this? Because it's highly unlikely that SOLR is completely broken on that level.

Erick
Re: No Analyzer, tokenizer or stemmer works at Solr
On Jan 6, 2010, at 3:48 PM, MitchK wrote:

I have tested a lot and all the time I thought I set wrong options for my custom analyzer. Well, I have noticed that Solr isn't using ANY analyzer, filter or stemmer. It seems like it only stores the original input.

The stored value is always the original input. The *indexed* values are transformed by analysis.

If you really need to store the analyzed fields, that may be possible with an UpdateRequestProcessor. Also see: https://issues.apache.org/jira/browse/SOLR-314

ryan
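The distinction Ryan describes can be illustrated with a toy model. The sketch below is not how Solr or Lucene is implemented; `analyze`, `ToyIndex` and the stopword list are made up for illustration. It only shows the idea that the analyzed tokens go into an inverted index used for matching, while the original input is kept separately and is what comes back in results.

```python
import re
from collections import defaultdict

# Toy stopword list, an assumption for this sketch only.
STOPWORDS = {"the", "and", "a", "an", "of"}


def analyze(text):
    """Toy analyzer: lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


class ToyIndex:
    """Keeps the 'indexed' and the 'stored' side of a field apart."""

    def __init__(self):
        self.inverted = defaultdict(set)  # token -> doc ids (searched, never shown)
        self.stored = {}                  # doc id -> original text (shown, never searched)

    def add(self, doc_id, text):
        self.stored[doc_id] = text        # stored value: the untouched input
        for token in analyze(text):       # indexed value: the analyzer output
            self.inverted[token].add(doc_id)

    def search(self, query):
        hits = set()
        for token in analyze(query):      # queries are analyzed the same way
            hits |= self.inverted.get(token, set())
        return [self.stored[d] for d in sorted(hits)]


index = ToyIndex()
index.add("1", "The Good, The Bad, and The Ugly")

print(analyze("The Good, The Bad, and The Ugly"))  # the indexed tokens
print(index.search("ugly"))                        # returns the stored original
print(index.search("the"))                         # stopword only: no hits
```

Seen this way, a result that shows the original text is consistent with the analyzer working: the analyzer's effect is visible in what matches, not in what is returned.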
Re: No Analyzer, tokenizer or stemmer works at Solr
Hello Erick, thank you for answering.

I can do whatever I want - Solr does nothing. For example: if I use the predefined textgen field type, nothing happens to the text. Even the StopFilter is not working - no stopword from stopword.txt was removed. I think that this only affects the index, because if I query for "for", it returns nothing, which is quite correct, due to the work of the StopFilter. Everything works fine on analysis.jsp, but not in reality.

If you have any test data you want me to add, please tell me and I will show you the saved data afterwards.

Thank you.
Mitch

-- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055510.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: No Analyzer, tokenizer or stemmer works at Solr
Hello Ryan, thank you for answering.

In my schema.xml I am defining the field as indexed = true. The problem is that nothing works; not even the predefined analyzers have any effect. Please have a look at my response to Erick.

Mitch

P.S. Oh, I see what you mean. The field is indexed = true. My language was a little bit tricky ;).

-- View this message in context: http://old.nabble.com/Custom-Analyzer-Tokenizer-works-but-results-were-not-saved-tp27026739p27055512.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: No Analyzer, tokenizer or stemmer works at Solr
Mitch,

Again, I think you're misunderstanding what analysis does. We think you are expecting (though you've not provided exact reproduction steps, so we can't be sure) that the value you get back from Solr is the analyzer-processed output. It's not; it's exactly what you provide. Internally, for searching, the analysis takes place and writes to the index in an inverted fashion, but the stored stuff is left alone.

There's some thinking going on about implementing it such that analyzed output is stored. You can, however, use the analysis request handler componentry to get analyzed stuff back as you see it in analysis.jsp, on a per-document or per-field-text basis, if you're looking to leverage the analyzer output in that fashion from a client.

Erik
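For the last point, asking Solr for analyzer output from a client, a request URL might be built along the lines below. This is a hedged sketch, not a confirmed recipe: the thread only mentions "the analysis request handler componentry", so the `/analysis/field` path, the `analysis.fieldtype` and `analysis.fieldvalue` parameter names, and the `http://localhost:8983/solr` base URL are assumptions that should be checked against the solrconfig.xml of the installation in question. The helper `field_analysis_url` is hypothetical, and the code only constructs the URL without sending a request.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def field_analysis_url(base_url, field_type, value):
    """Build a URL for a field-analysis request.

    ASSUMPTIONS: the handler path "/analysis/field" and the parameter names
    "analysis.fieldtype" / "analysis.fieldvalue" are not taken from the
    thread; verify them against your own solrconfig.xml.
    """
    params = {
        "analysis.fieldtype": field_type,
        "analysis.fieldvalue": value,
    }
    return base_url.rstrip("/") + "/analysis/field?" + urlencode(params)


# Hypothetical base URL; "textgen" is the field type mentioned in the thread.
url = field_analysis_url(
    "http://localhost:8983/solr",
    "textgen",
    "The Good, The Bad, and The Ugly",
)
print(url)

# Round-trip check that the value survives URL encoding unchanged.
print(parse_qs(urlsplit(url).query)["analysis.fieldvalue"])
```

The response of such a handler would then have to be parsed on the client side; its exact format is not covered by the thread and is left out here.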