Re: Lucene Newbie Question
Is numFound different for those two queries?

On Sunday, 8 December 2013, Ted Goldstein tedgoldst...@gmail.com wrote:

> I am new to Lucene and have begun experimenting. I've loaded both the example books.csv and the various example electronic components documents. I then do a variety of queries.
>
> Querying http://su2c-dev.ucsc.edu:8983/solr/select?q=name:A* returns both book entries and electronic component entries. But http://su2c-dev.ucsc.edu:8983/solr/select?q=name:* only returns book entries.
>
> It is non-intuitive to me that a broader query should return only one document type. Why is that?
>
> Thanks,
> Ted

To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: Lucene Newbie Question
On 12/8/2013 12:03 PM, Ted Goldstein wrote:

> I am new to Lucene and have begun experimenting. I've loaded both the example books.csv and the various example electronic components documents. I then do a variety of queries.
>
> Querying http://su2c-dev.ucsc.edu:8983/solr/select?q=name:A* returns both book entries and electronic component entries. But http://su2c-dev.ucsc.edu:8983/solr/select?q=name:* only returns book entries.
>
> It is non-intuitive to me that a broader query should return only one document type. Why is that?
>
> Thanks,
> Ted

Wild guess, since you really didn't tell us much about your setup: are there more entries on another page in the Solr admin query tool? I think this may be what Furkan was hinting at with his question about numFound.

Also, this is probably more of a question for the solr-user mailing list, since you seem to be using Solr to do the querying.

-Mike
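To make the paging guess concrete: Solr's select handler returns only the first `rows` matches (10 by default) starting at offset `start`, and reports the total separately as `numFound`. A broad query such as name:* could therefore match both document types while the first page happens to show only books. Whether that is what happened here is still a guess. The sketch below is a minimal illustration in plain Java; the class SolrPaging and its methods are hypothetical helpers, not part of Solr, and the numFound value in main is made up.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds paged Solr select URLs using the standard
// q, start and rows request parameters.
class SolrPaging {
    // Base URL taken from the original message.
    static final String BASE = "http://su2c-dev.ucsc.edu:8983/solr/select";

    // URL for one page of results; start is the zero-based offset of the first hit.
    static String pageUrl(String query, int start, int rows) {
        String q = URLEncoder.encode(query, StandardCharsets.UTF_8);
        return BASE + "?q=" + q + "&start=" + start + "&rows=" + rows;
    }

    // Number of pages needed to see every match, given numFound from the response.
    static int pageCount(long numFound, int rows) {
        return (int) ((numFound + rows - 1) / rows);
    }

    public static void main(String[] args) {
        int rows = 10;       // Solr's default page size
        long numFound = 42;  // made-up total, for illustration only
        for (int page = 0; page < pageCount(numFound, rows); page++) {
            System.out.println(pageUrl("name:*", page * rows, rows));
        }
    }
}
```

Comparing numFound for the two queries, as Furkan suggested, is the quickest check: if name:* reports a larger numFound than name:A*, the missing entries are simply on later pages.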
Re: lucene newbie question
On Oct 2, 2006, at 2:08 PM, Los Morales wrote:

> I'm new to Lucene and IR in general. I'm a bit confused about the concept of fields. From what I've read, a field does not have to be indexed but its value can be stored in an index. Likewise, a field can be indexed but its value is not stored in an index. Now how can a field be searchable when its value is not stored in the index, and vice versa? Again, I'm new to the index/search paradigm.
>
> Thanks in advance.

Consider the index in the back of a book. You could tear that out and still use it to tell what page something is on, but you have no actual content in hand.

When a field is tokenized (and therefore implicitly indexed), it is run through the specified Analyzer and the terms emitted are indexed, but the original text may or may not also be stored in the index.

Make sense?

Erik
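The back-of-the-book analogy can be made concrete with a toy inverted index. This is a minimal sketch in plain Java, not Lucene's API or implementation: the class TinyIndex and its methods are hypothetical, and the "analyzer" is just lowercasing plus splitting on non-alphanumeric characters. Indexed text goes into a term-to-document map, stored text goes into a separate map, and either can exist without the other.

```java
import java.util.*;

// Toy model of "indexed" versus "stored"; not Lucene's implementation.
class TinyIndex {
    // term -> ids of the documents containing it (the "back of the book")
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // doc id -> original text, kept only when the caller asks for it
    private final Map<Integer, String> stored = new HashMap<>();

    void add(int docId, String text, boolean index, boolean store) {
        if (index) {
            for (String term : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
                if (!term.isEmpty()) {
                    postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
                }
            }
        }
        if (store) {
            stored.put(docId, text);
        }
    }

    // Searching consults only the postings.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }

    // Retrieval consults only the stored values; null means "not stored".
    String storedValue(int docId) {
        return stored.get(docId);
    }

    public static void main(String[] args) {
        TinyIndex idx = new TinyIndex();
        idx.add(1, "Lucene in Action", true, false);  // searchable, not retrievable
        idx.add(2, "internal-id-0042", false, true);  // retrievable, not searchable
        System.out.println(idx.search("lucene"));
        System.out.println(idx.storedValue(1));
        System.out.println(idx.storedValue(2));
    }
}
```

Document 1 can be found by searching but its text cannot be read back; document 2 can be read back by id but never matches a search.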
Re: lucene newbie question
The SSN case is actually a common situation. Assume you have a (relational) database with a table of products with three columns:

- SSN, which is also the primary key for that table,
- DESCRIPTION, which has free text (i.e. unformatted text) describing the product,
- OTHER - additional info.

Also assume you want to allow users of your application to search for a product by its description. For each product found, you intend to fetch the data on that product from the database and display it to the users.

This can be done with the following setup. Create a Lucene index with two fields:

- ssn - stored, but not indexed
- description - tokenized (hence indexed) but not stored

Now the application would send the user query to Lucene, using the description field. For each document found, the application would fetch its ssn (which is available from the Lucene index since it was stored). Using this ssn, the application would fetch all sorts of data on that product from the database and display it to the user.

There are other possible designs, of course (you may want to have additional data in the Lucene index), but this hopefully gives a feel for how different fields with different settings are used in an application.

I think you would find LIA (the Lucene in Action book) very useful.

Los Morales [EMAIL PROTECTED] wrote on 02/10/2006 11:46:45:

> Hi Erik,
>
> Thanks for the response.
>
>> Consider the index in the back of a book. You could tear that out and still use it to tell what page something is on, but you have no actual content in hand.
>
> So, I guess what I'm having a hard time trying to figure out is, what's the point of having an index when you can't search/retrieve the contents of a field in the index since it is not stored? Isn't the whole point of having an index to be able to search and retrieve the contents efficiently?
>
> Basically, I'm not sure of the point of the UnIndexed and UnStored field types. Say I use a field type UnIndexed for my SSN. I know it's stored in the index, but how am I supposed to retrieve it?
>
> As for the UnStored, it's like the scenario I described above... I see the fields in the index but I won't be able to search/retrieve it since I don't have the contents. The Text field type makes sense to me (with the data being a String), as well as the Keyword type. Is there a scenario or scenarios you can describe where UnIndexed/UnStored will be useful?
>
> Thanks in advance!
>
> -los
>
>> From: Erik Hatcher [EMAIL PROTECTED]
>> Reply-To: java-user@lucene.apache.org
>> To: java-user@lucene.apache.org
>> Subject: Re: lucene newbie question
>> Date: Mon, 2 Oct 2006 14:12:25 -0400
>>
>> On Oct 2, 2006, at 2:08 PM, Los Morales wrote:
>>
>>> I'm new to Lucene and IR in general. I'm a bit confused about the concept of fields. From what I've read, a field does not have to be indexed but its value can be stored in an index. Likewise, a field can be indexed but its value is not stored in an index. Now how can a field be searchable when its value is not stored in the index, and vice versa? Again, I'm new to the index/search paradigm.
>>>
>>> Thanks in advance.
>>
>> Consider the index in the back of a book. You could tear that out and still use it to tell what page something is on, but you have no actual content in hand.
>>
>> When a field is tokenized (and therefore implicitly indexed), it is run through the specified Analyzer and the terms emitted are indexed, but the original text may or may not also be stored in the index.
>>
>> Make sense?
>>
>> Erik
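The two-field design described at the top of this message (ssn stored but not indexed, description indexed but not stored) boils down to a search-then-lookup flow. The sketch below models that flow in plain Java with in-memory maps. It is an illustration under stated assumptions, not Lucene or JDBC code: the class ProductSearch, its methods, and the sample rows are all hypothetical.

```java
import java.util.*;

// Toy model of the search-then-lookup flow; plain Java, not Lucene's API.
class ProductSearch {
    // Stand-in for the relational table: SSN (primary key) -> full row.
    private final Map<String, String> database = new HashMap<>();
    // "description" field: terms are indexed, the text itself is not kept.
    private final Map<String, List<Integer>> descriptionTerms = new HashMap<>();
    // "ssn" field: stored per index document, never searched.
    private final List<String> storedSsn = new ArrayList<>();

    void addProduct(String ssn, String description, String other) {
        database.put(ssn, ssn + " | " + description + " | " + other);
        int docId = storedSsn.size();
        storedSsn.add(ssn);
        // De-duplicate terms so a document appears once per term.
        String[] tokens = description.toLowerCase(Locale.ROOT).split("\\W+");
        for (String term : new HashSet<>(Arrays.asList(tokens))) {
            if (!term.isEmpty()) {
                descriptionTerms.computeIfAbsent(term, t -> new ArrayList<>()).add(docId);
            }
        }
    }

    // Search the description, read the stored ssn of each hit, fetch the row by key.
    List<String> find(String term) {
        List<String> rows = new ArrayList<>();
        List<Integer> hits = descriptionTerms.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.emptyList());
        for (int docId : hits) {
            rows.add(database.get(storedSsn.get(docId)));
        }
        return rows;
    }
}
```

A call such as find("steel") returns the full database rows of every product whose description contains that word, while searching for an ssn value finds nothing because that field was never indexed.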
Re: lucene newbie question
Another Erick (note the correct spelling <G>). See below.

On 10/2/06, Los Morales [EMAIL PROTECTED] wrote:

> Hi Erik,
>
> Thanks for the response.
>
>> Consider the index in the back of a book. You could tear that out and still use it to tell what page something is on, but you have no actual content in hand.
>
> So, I guess what I'm having a hard time trying to figure out is, what's the point of having an index when you can't search/retrieve the contents of a field in the index since it is not stored? Isn't the whole point of having an index to be able to search and retrieve the contents efficiently?

Your confusion here, I think, is that you CAN search on an unstored field. Consider a book. I want to show the user the titles of the most relevant books. If I store the text of the entire book, it bloats the size of the index markedly. So I index the text but do NOT store it. Now I can show my titles in relevancy order (when searched over the entire text), but I don't have to pay the penalty size-wise. What I can't do in this case is reconstruct the book from the index, because I didn't store the text. But I can search it, which is what my app requires.

> Basically, I'm not sure of the point of the UnIndexed and UnStored field types. Say I use a field type UnIndexed for my SSN. I know it's stored in the index, but how am I supposed to retrieve it?

You'd search on what you *have* indexed, get the doc (from the index), and then read the field. Something like

    String s = hits.doc(52).get("SSN");

where hits is the Hits object returned by the search.

I'm doing this now, since we have images stored with internal IDs on a separate file system. I *never* care to allow the user to search by our internal ID number. So I index the caption, and STORE but do not INDEX the internal ID. We provide a page full of links (in relevancy order), and when the user clicks on one, we use the stored internal ID to fetch the right image.

> As for the UnStored, it's like the scenario I described above... I see the fields in the index but I won't be able to search/retrieve it since I don't have the contents. The Text field type makes sense to me (with the data being a String), as well as the Keyword type. Is there a scenario or scenarios you can describe where UnIndexed/UnStored will be useful?
>
> Thanks in advance!

Again, you can search unstored fields. You just can't reconstruct the input with 100% fidelity (things like stop words will be missing, and any funky games you played during indexing will mess up an attempt to reconstruct the data).

Hope this helps.

Erick
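Erick's point about fidelity can be seen with a toy analyzer. This is a minimal sketch in plain Java, assuming a made-up four-word stop list; ToyAnalyzer is a hypothetical stand-in, not one of Lucene's Analyzer classes. Once the text has been lowercased, split, and filtered, the terms that reach the index no longer contain enough information to rebuild the original string.

```java
import java.util.*;

// Toy analyzer: lowercases, splits on non-alphanumerics, drops a tiny stop-word list.
// Hypothetical stand-in for illustration; not one of Lucene's Analyzer classes.
class ToyAnalyzer {
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "of", "a", "in"));

    // Returns the terms that would be indexed for the given text.
    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                terms.add(token);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        String original = "The Return of the King";
        // Joining the indexed terms back together does not give the original text.
        System.out.println(String.join(" ", analyze(original)));
    }
}
```

This is why a field whose exact text must be shown to the user has to be stored as well as indexed.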