Re: GETVALUES +SEARCH
On Dec 1, 2004, at 12:41 AM, Karthik N S wrote:
> Is there any API in Lucene which can retrieve all the searched values in a single fetch into some sort of an 'Array' WITHOUT using this [below] looping process? [This would make the search and display faster.]
>
>     for (int i = 0; i < hits.length(); i++) {
>         Document doc = hits.doc(i);
>         String path = doc.get("path");
>         ...
>     }

Are you really showing *all* results at one time? Or just the first several? Iterating over all hits and retrieving each Document is often unwise and generally unnecessary if only the first 20 or so are shown at first.

I don't know of a simpler way to get all the path values in your example. Perhaps a HitCollector is more to your liking? Though it probably would not speed anything up for you.

Erik

- To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
RE: GETVALUES +SEARCH
Hi Erik,

Apologies.

We create an ArrayList object, load all the hit values into it, and return it for display purposes on a servlet. On the servlet we track the server-side ArrayList for the required number of displays. [At any time we have to have all the hit values loaded into the ArrayList; we cannot compromise on this.]

We observed that doc.get() was not continuous for a result of 4000 hits and was coming in batches, so any new API features would definitely help us.

With regards,
Karthik
Re: GETVALUES +SEARCH
On Dec 1, 2004, at 7:37 AM, Karthik N S wrote:
> We create an ArrayList object, load all the hit values into it, and return it for display purposes on a servlet. On the servlet we track the server-side ArrayList for the required number of displays. [At any time we have to have all the hit values loaded into the ArrayList; we cannot compromise on this.]

Be forewarned - you are asking for trouble doing this if you have an enormous number of hits. I highly recommend you reconsider your approach. Sure, separation of concerns/tiers is a nice ideal, but pragmatically, don't let blind adherence to principles get in the way of performance/scalability.

> We observed that doc.get() was not continuous for a result of 4000 hits and was coming in batches,

I'm not following what you mean. Not continuous? Batches? Now is the time for you to show some code of what you're doing. Succinct, clear examples are best.

Erik
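A minimal sketch of the alternative Erik is hinting at: fetch only the values for the page being displayed instead of copying every hit into an ArrayList. Everything here is an assumption made for illustration. `HitPager`, `page`, and the `fetchPath` callback are hypothetical names; the callback stands in for something like `hits.doc(i).get("path")` so that the example runs without Lucene on the classpath.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

/** Hypothetical helper: loads the stored values for one page of results only. */
final class HitPager {
    private HitPager() {}

    /**
     * @param totalHits  total number of hits (what hits.length() would report)
     * @param pageNumber zero-based page index
     * @param pageSize   number of results shown per page
     * @param fetchPath  stand-in for hits.doc(i).get("path"); called once per displayed hit
     */
    static List<String> page(int totalHits, int pageNumber, int pageSize,
                             IntFunction<String> fetchPath) {
        if (totalHits < 0 || pageNumber < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("invalid paging arguments");
        }
        // Clamp the page window to the available hits; long math avoids int overflow.
        long first = (long) pageNumber * pageSize;
        int start = (int) Math.min(first, totalHits);
        int end = (int) Math.min(first + pageSize, totalHits);

        List<String> paths = new ArrayList<String>(end - start);
        for (int i = start; i < end; i++) {
            paths.add(fetchPath.apply(i)); // only the documents on this page are touched
        }
        return paths;
    }

    public static void main(String[] args) {
        // Simulated result set of 4000 hits; page 2 with 20 per page covers hits 40..59.
        List<String> page = page(4000, 2, 20, idx -> "/docs/file" + idx + ".txt");
        System.out.println(page.size() + " paths, first = " + page.get(0));
    }
}
```

With this shape the servlet asks for one page per request, and the cost of a request depends on the page size rather than on the total number of hits.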
Re: GETVALUES +SEARCH
On Dec 01, 2004, at 13:37, Karthik N S wrote:
> We create an ArrayList object, load all the hit values into it, and return it for display purposes on a servlet.

Talking of which... It would be very handy if org.apache.lucene.search.Hits would implement the java.util.List interface... in addition, org.apache.lucene.document.Document could implement java.util.Map...

That way, the rest of the application could pretend to simply have to deal with a List of Maps, without having to get exposed to any Lucene internals...

Thoughts?

Cheers,
PA.
Re: GETVALUES +SEARCH
On Dec 1, 2004, at 1:31 PM, Luke Francl wrote:
> On Wed, 2004-12-01 at 11:12, petite_abeille wrote:
>> Not really, except perhaps that a Lucene Document could theoretically have multiple identical keys... not something that anyone would want to do though :o)
>
> And why not? I use this to store closed captioned text. Each entry must be stored separately, and they all have the same field name.

I also extensively use multiple fields of the same name.

So does this rule out implementing the Map interface on Document?

Erik
Re: GETVALUES +SEARCH
On Dec 01, 2004, at 20:06, Erik Hatcher wrote:
> I also extensively use multiple fields of the same name.

Odd... on the other hand... perhaps this is une affaire de goût...

> So does this rule out implementing the Map interface on Document?

Why? Nobody mentioned what value such a Map would hold... in the worst case scenario it could hold a Collection... or perhaps it's not worth bothering with such esoterism and simply state that the DocumentMap only supports one value per key... after all... the purpose of providing standard interfaces such as List and Map is to simplify things... not to make them more cumbersome...

PA.
Re: GETVALUES +SEARCH
On Dec 1, 2004, at 2:21 PM, petite_abeille wrote:
> On Dec 01, 2004, at 20:06, Erik Hatcher wrote:
>> I also extensively use multiple fields of the same name.
>
> Odd... on the other hand... perhaps this is une affaire de goût...

There are some places I use this for convenience, and another where it seems the best way to do it. Here's an example that I'm actively working on. I'm parsing XML files. There are dates embedded in the data and the requirement is for year range queries. The original data looks like this, believe it or not: 1837-56, or 1846-9, or 1824-1911, or simply 1856. I wrote a routine to extract a String[] of years. In the first example it would be 1837, 1838, 1839... and so on. I index as:

    for (int i = 0; i < years.length; i++) {
        doc.add(Field.Keyword("year", years[i]));
    }

Sure, I could put it all together as a space-separated String and use the WhitespaceAnalyzer, but why not do it this way? What other suggestions do you have for doing this?

Erik
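Erik does not show his extraction routine, so the following is only a sketch of one way such a routine could work, not his actual code. It assumes the four formats he lists (a single year, or a start year followed by a full or abbreviated end year) and that an abbreviated end year borrows its leading digits from the start year. The names `YearRanges` and `expand` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: expands "1837-56" style year ranges into individual years. */
final class YearRanges {
    private YearRanges() {}

    /** Accepts "1856", "1846-9", "1837-56" or "1824-1911" and returns every year covered. */
    static String[] expand(String raw) {
        String[] parts = raw.trim().split("-");
        if (parts.length > 2) {
            throw new IllegalArgumentException("not a year or year range: " + raw);
        }
        String startText = parts[0].trim();
        int start = Integer.parseInt(startText);
        int end = start; // a single year is a range of length one

        if (parts.length == 2) {
            String endText = parts[1].trim();
            // Assumption: "1837-56" means 1837 to 1856, so a short end year
            // takes its missing leading digits from the start year.
            if (endText.length() < startText.length()) {
                endText = startText.substring(0, startText.length() - endText.length()) + endText;
            }
            end = Integer.parseInt(endText);
        }
        if (end < start) {
            throw new IllegalArgumentException("range ends before it starts: " + raw);
        }

        List<String> years = new ArrayList<String>();
        for (int year = start; year <= end; year++) {
            years.add(String.valueOf(year));
        }
        return years.toArray(new String[0]);
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", expand("1846-9"))); // prints "1846 1847 1848 1849"
    }
}
```

Each element of the returned array would then be passed to its own `doc.add(...)` call, as in the loop above.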
Re: GETVALUES +SEARCH
On Dec 01, 2004, at 20:43, Erik Hatcher wrote:
> Sure, I could put it all together as a space-separated String and use the WhitespaceAnalyzer, but why not do it this way? What other suggestions do you have for doing this?

If this works for you, I don't see any problem with it. In general, I avoid storing any raw data in a Lucene Document. And only use Lucene for, er, indexing... but this is just me :)

But let's go back to that fabled Map interface for Document... if the purpose of such an interface is to keep things simple, it could behave just like Document.get() [1]:

    Returns the string value of the field with the given name if any exist in this document, or null. If multiple fields exist with this name, this method returns the first value added.

If for some reason(s) you need multiple values per field, stick with getFields()...

What's wrong with that?

PA.

[1] http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Document.html#get(java.lang.String)
Re: GETVALUES +SEARCH
: Having Document implement Map sounds reasonable to me though. Any
: reasons not to do this?
:
: Not really, except perhaps that a Lucene Document could theoretically
: have multiple identical keys... not something that anyone would want to

Assuming you want all changes to be backwards compatible, you pretty much have to implement Map.get(Object):Object using Document.get(String):String ... otherwise you'll wind up really confusing the hell out of people.

But if you really wanted to be mean to people, I guess you could use Document.getField(String):Field, or even Document.getValues(String):String[] or Document.getFields(String):Field[] if you were feeling particularly mean.

The real question in my mind is not "how should we implement 'get' given that we allow multiple values?"; a better question is "how should we implement 'put'?" Do you write...

    Object put(Object k, Object v) {
        this.add((Field) v);
        return null;
    }

or...

    Object put(String k, String v) {
        this.add(Field.Text(k.toString(), v.toString()));
        return null;
    }

or...

    Object put(String k, String v) {
        throw new UnsupportedOperationException("we're not that nice");
    }

...I think it may be wiser to just let clients wrap the Doc in their own Map, using the rules that make sense to them, because no one's ever going to agree 100%.

If you think you know how to satisfy 90% of the users, I would still suggest that instead of making Document implement Map, you add a toMap() function that returns a wrapper with the rules that you think make sense (and leave the Document API uncluttered of the Map functions that people who don't care about Map don't need to see).
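A rough sketch of what such a wrapper could look like under one particular set of rules: the first value added under a name wins (the behaviour quoted for Document.get() elsewhere in this thread), and `put` is rejected, which corresponds to the third `put` option above. To keep the example runnable without Lucene, the fields are passed in as plain name/value pairs. `DocumentMaps`, `toMap`, and the sample field values are hypothetical, and this is a read-only snapshot rather than a live view.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side helper: a read-only, first-value-wins Map over document fields. */
final class DocumentMaps {
    private DocumentMaps() {}

    /**
     * @param fields name/value pairs in the order they were added to the document;
     *               a stand-in for iterating over a Document's fields
     */
    static Map<String, String> toMap(String[][] fields) {
        Map<String, String> firstValues = new LinkedHashMap<String, String>();
        for (String[] field : fields) {
            String name = field[0];
            String value = field[1];
            // Keep only the first value per name; later duplicates are ignored.
            if (!firstValues.containsKey(name)) {
                firstValues.put(name, value);
            }
        }
        // put() and remove() on the result throw UnsupportedOperationException.
        return Collections.unmodifiableMap(firstValues);
    }

    public static void main(String[] args) {
        String[][] fields = {
            {"year", "1837"}, {"year", "1838"}, {"title", "example"} // illustrative data only
        };
        Map<String, String> map = toMap(fields);
        System.out.println(map.get("year")); // prints "1837"
    }
}
```

Clients that need every value for a repeated field would bypass the map and use the multi-valued accessors on the document itself.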
Re: GETVALUES +SEARCH
On Dec 01, 2004, at 21:14, Chris Hostetter wrote:
> The real question in my mind is not "how should we implement 'get' given that we allow multiple values?"; a better question is "how should we implement 'put'?"

Yes, retrofitting Document.add() in the Map interface would be a pain. But this is not really what I was getting at. This is more about Hits and accessing its values. One problem at a time :)

> If you think you know how to satisfy 90% of the users, I would still suggest that instead of making Document implement Map, you add a toMap() function that returns a wrapper with the rules that you think make sense (and leave the Document API uncluttered of the Map functions that people who don't care about Map don't need to see).

Agreed. Document is fine as it is. It would be nice though to have a more or less standard interface to access the result set (e.g. Collection)... as consumers of Hits are more likely to be built in terms of the Collection API than anything specific to Lucene...

PA.
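One way a client could get that today without any change to Lucene is a thin adapter built on java.util.AbstractList. The sketch below is an assumption-laden illustration: `HitSource` is a hypothetical interface that mirrors only the two calls used in this thread (`length()` and `doc(int)`), so the example compiles without Lucene, and `HitLists.asList` is a made-up name.

```java
import java.util.AbstractList;
import java.util.List;

/** Hypothetical adapter exposing a Hits-like result set as a read-only java.util.List. */
final class HitLists {
    private HitLists() {}

    /** Stand-in for the two Hits methods used in this thread. */
    interface HitSource<D> {
        int length();
        D doc(int index);
    }

    /** Returns a lazy view: a document is fetched only when its index is requested. */
    static <D> List<D> asList(final HitSource<D> hits) {
        return new AbstractList<D>() {
            @Override
            public D get(int index) {
                if (index < 0 || index >= hits.length()) {
                    throw new IndexOutOfBoundsException("index " + index);
                }
                return hits.doc(index);
            }

            @Override
            public int size() {
                return hits.length();
            }
        };
    }

    public static void main(String[] args) {
        HitSource<String> fake = new HitSource<String>() {
            public int length() { return 4000; }
            public String doc(int index) { return "doc" + index; }
        };
        // subList keeps the view lazy, so only the first 20 documents are fetched here.
        for (String doc : asList(fake).subList(0, 20)) {
            System.out.println(doc);
        }
    }
}
```

Because AbstractList leaves the mutating methods unsupported by default, the view is read-only, which sidesteps the `put`/`add` question raised for Document.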
Re: GETVALUES +SEARCH
On Dec 1, 2004, at 2:59 PM, petite_abeille wrote:
> On Dec 01, 2004, at 20:43, Erik Hatcher wrote:
>> Sure, I could put it all together as a space-separated String and use the WhitespaceAnalyzer, but why not do it this way? What other suggestions do you have for doing this?
>
> If this works for you, I don't see any problem with it. In general, I avoid storing any raw data in a Lucene Document. And only use Lucene for, er, indexing... but this is just me :)

Getting further off-topic, but to clarify: it sounds like you're suggesting I'm storing raw data. I'm not. I have to be able to do queries like:

    someWord AND year:[1837 TO 1856]

So the year is being indexed; I just happen to do it with a doc.add() for each year.

Erik
GETVALUES +SEARCH
Hi Guys,

Apologies.

In the search API [class org.apache.lucene.document.Document], will 'public final String[] getValues(String name)' return me all the docs without looping through them? Please explain with an example.

Thanks in advance.

WITH WARM REGARDS
HAVE A NICE DAY
[ N.S.KARTHIK ]
Re: GETVALUES +SEARCH
On Nov 30, 2004, at 7:10 AM, Karthik N S wrote:
> In the search API [class org.apache.lucene.document.Document], will 'public final String[] getValues(String name)' return me all the docs without looping through them?

getValues(fieldName) returns a String[] of the values of the field. It's similar to get(fieldName). If you index a field multiple times:

    doc.add(Field.Keyword("keyword", "one"));
    doc.add(Field.Keyword("keyword", "two"));

get("keyword") will return "one", but getValues("keyword") will return a String[] {"one", "two"}.

If you want to retrieve all documents, use IndexReader's various API methods.

Erik
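To make the distinction concrete without needing an index, here is a small stand-in that imitates the behaviour described above. It is not Lucene's Document class: `MultiValueDoc` and its methods are hypothetical, and one deliberate simplification is that `getValues` returns an empty array when a name is unknown.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a document that allows several fields with the same name. */
final class MultiValueDoc {
    // Name/value pairs kept in the order they were added.
    private final List<String[]> fields = new ArrayList<String[]>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    /** Returns the first value added under the name, or null if there is none. */
    String get(String name) {
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                return field[1];
            }
        }
        return null;
    }

    /** Returns every value added under the name, in insertion order (empty if none). */
    String[] getValues(String name) {
        List<String> values = new ArrayList<String>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values.toArray(new String[0]);
    }

    public static void main(String[] args) {
        MultiValueDoc doc = new MultiValueDoc();
        doc.add("keyword", "one");
        doc.add("keyword", "two");
        System.out.println(doc.get("keyword"));              // prints "one"
        System.out.println(doc.getValues("keyword").length); // prints "2"
    }
}
```

Both methods work on a single document; neither one enumerates the documents matched by a search, which is why the loop over the hits is still needed.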
RE: GETVALUES +SEARCH
Hi Guys,

Apologies...

Is there any API in Lucene which can retrieve all the searched values in a single fetch into some sort of an 'Array' WITHOUT using this [below] looping process? [This would make the search and display faster.]

    for (int i = 0; i < hits.length(); i++) {
        Document doc = hits.doc(i);
        String path = doc.get("path");
        ...
    }

Thanks in advance,
Karthik