Re: GETVALUES +SEARCH

2004-12-01 Thread Erik Hatcher
On Dec 1, 2004, at 12:41 AM, Karthik N S wrote:
> Is there any API in Lucene which can retrieve all the searched values
> in a single fetch into some sort of an 'Array' WITHOUT using this
> [ below ] looping process [ this would make the search and display
> faster ]?
>
>   for (int i = 0; i < hits.length(); i++) {
>     Document doc = hits.doc(i);
>     String path = doc.get("path");
>     // ...
>   }

Are you really showing *all* results at one time?  Or just the first
several?  Iterating over all hits and retrieving each Document is often
unwise and generally unnecessary if only the first 20 or so are shown
at first.

I don't know of a simpler way to get all the path values in your 
example.  Perhaps a HitCollector is more to your liking?  Though it 
probably would not speed anything up for you.
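
A minimal sketch of the "first 20 or so" idea: fetch stored values only for the page being displayed instead of for every hit. The `HitList` interface is a hypothetical stand-in for Lucene's Hits (only a hit count and a per-index lookup are assumed), and `HitPager.pageOfPaths` is an illustrative name, not a Lucene API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Lucene's Hits: a hit count plus per-index access
// to one stored field. This is NOT the Lucene API, only an illustration.
interface HitList {
    int length();
    String path(int i); // stands in for hits.doc(i).get("path")
}

class HitPager {
    // Returns the "path" values for one page only, so at most pageSize
    // documents are fetched no matter how many hits matched.
    static List<String> pageOfPaths(HitList hits, int page, int pageSize) {
        List<String> paths = new ArrayList<>();
        int start = page * pageSize;
        int end = Math.min(start + pageSize, hits.length());
        for (int i = start; i < end; i++) {
            paths.add(hits.path(i));
        }
        return paths;
    }
}
```

With 4000 hits and a page size of 20, each request touches 20 documents rather than 4000; a page past the end simply comes back empty.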

Erik
-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


RE: GETVALUES +SEARCH

2004-12-01 Thread Karthik N S
Hi
  Erik

Apologies..


  We create an ArrayList object, load all the hit values into it, and
  return it for display purposes on a servlet. On the servlet we track the
  server-side ArrayList for the required number of displays.

  [ At any time we have to have all the hit values loaded into the
  ArrayList; we cannot compromise on this. ]

  We observed that doc.get() was not continuous for a hit count of 4000
  and was coming in batches.

  So any new API features will definitely help us.


With regards
Karthik





Re: GETVALUES +SEARCH

2004-12-01 Thread Erik Hatcher
On Dec 1, 2004, at 7:37 AM, Karthik N S wrote:
> We create an ArrayList object, load all the hit values into it, and
> return it for display purposes on a servlet. On the servlet we track
> the server-side ArrayList for the required number of displays.
>
> [ At any time we have to have all the hit values loaded into the
> ArrayList; we cannot compromise on this. ]

Be forewarned - you are asking for trouble doing this if you have an
enormous number of hits.  I highly recommend you reconsider your
approach.

Sure, separation of concerns/tiers is a nice ideal, but pragmatically 
don't let blind adherence to principles get in the way of 
performance/scalability.

> We observed that doc.get() was not continuous for a hit count of 4000
> and was coming in batches.

I'm not following what you mean.  Not continuous?  Batches?  Now is the
time for you to show some code of what you're doing.  Succinct, clear
examples are best.

Erik


Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 13:37, Karthik N S wrote:
> We create an ArrayList object, load all the hit values into it, and
> return it for display purposes on a servlet.

Talking of which...
It would be very handy if org.apache.lucene.search.Hits would implement 
the java.util.List interface... in addition, 
org.apache.lucene.document.Document could implement java.util.Map...

That way, the rest of the application could pretend to simply have to 
deal with a List of Maps, without having to get exposed to any Lucene 
internals...
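
A minimal sketch of what such a "List of Maps" view could look like if written outside Lucene. `ResultSource` is a hypothetical stand-in for a Hits-like object and `HitsListView` is an illustrative name; neither exists in Lucene.

```java
import java.util.AbstractList;
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for a Lucene result set; not the real Hits class.
interface ResultSource {
    int length();
    Map<String, String> fields(int i); // stored fields of the i-th hit
}

// Read-only List view: documents are looked up lazily on get(i), so
// wrapping a large result set costs nothing until elements are accessed.
class HitsListView extends AbstractList<Map<String, String>> {
    private final ResultSource source;

    HitsListView(ResultSource source) {
        this.source = source;
    }

    @Override
    public Map<String, String> get(int index) {
        if (index < 0 || index >= source.length()) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        return Collections.unmodifiableMap(source.fields(index));
    }

    @Override
    public int size() {
        return source.length();
    }
}
```

Because AbstractList supplies iteration and subList() on top of get() and size(), callers can work with a plain java.util.List and never see the underlying result type.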

Thoughts?

Cheers,
PA.


Re: GETVALUES +SEARCH

2004-12-01 Thread Erik Hatcher
On Dec 1, 2004, at 1:31 PM, Luke Francl wrote:
> On Wed, 2004-12-01 at 11:12, petite_abeille wrote:
>> Not really, except perhaps that a Lucene Document could theoretically
>> have multiple identical keys... not something that anyone would want
>> to do though :o)
>
> And why not? I use this to store closed captioned text. Each entry must
> be stored separately, and they all have the same field name.

I also extensively use multiple fields of the same name.  So does this
rule out implementing the Map interface on Document?

Erik


Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 20:06, Erik Hatcher wrote:
> I also extensively use multiple fields of the same name.

Odd... on the other hand... perhaps this is une affaire de goût (a
matter of taste)...

> So does this rule out implementing the Map interface on Document?

Why? Nobody mentioned what value such a Map would hold... in the worst
case scenario it could hold a Collection... or perhaps it's not worth
bothering with such esoterism and simply state that the DocumentMap
only supports one value per key... after all... the purpose of
providing standard interfaces such as List and Map is to simplify
things... not to make them more cumbersome...
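
A minimal sketch of the "worst case" option, where each key maps to a Collection of values. Fields are modelled as plain (name, value) String pairs, which is an assumption made for illustration; `FieldGrouping` is a hypothetical helper, not part of Lucene.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FieldGrouping {
    // Groups (name, value) pairs by name, keeping every value in the
    // order it was added.
    static Map<String, List<String>> group(List<String[]> fields) {
        Map<String, List<String>> grouped = new LinkedHashMap<>();
        for (String[] field : fields) {
            grouped.computeIfAbsent(field[0], k -> new ArrayList<>()).add(field[1]);
        }
        return grouped;
    }
}
```

The cost of this choice is that every caller has to deal with a List even when a field has a single value, which is the "more cumbersome" side of the trade-off.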

PA.


Re: GETVALUES +SEARCH

2004-12-01 Thread Erik Hatcher
On Dec 1, 2004, at 2:21 PM, petite_abeille wrote:
> On Dec 01, 2004, at 20:06, Erik Hatcher wrote:
>> I also extensively use multiple fields of the same name.
>
> Odd... on the other hand... perhaps this is une affaire de goût...

There are some places I use this for convenience, and another where it
seems the best way to do it.  Here's an example that I'm actively
working on.  I'm parsing XML files.  There are dates embedded in the
data and the requirement is for year range queries.  The original data
looks like this, believe it or not:  "1837-56" or "1846-9", or
"1824-1911", or simply "1856".  I wrote a routine to extract a String[]
of years.  In the first example it would be "1837", "1838", "1839"...
and so on.

I index as:

  for (int i = 0; i < years.length; i++) {
    doc.add(Field.Keyword("year", years[i]));
  }

Sure, I could put it all together as a space separated String and use
the WhitespaceAnalyzer, but why not do it this way?  What other
suggestions do you have for doing this?
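
A minimal sketch of a year-expansion routine of the kind described above. This is not the routine from the message, which was not posted; `YearRanges.expand` is a hypothetical name, and the rule that an abbreviated end year borrows its leading digits from the start year is an assumption inferred from the sample data.

```java
import java.util.ArrayList;
import java.util.List;

class YearRanges {
    // Expands "1837-56", "1846-9", "1824-1911" or "1856" into every year
    // in the range, inclusive. An abbreviated end year borrows its missing
    // leading digits from the start year ("1837-56" ends at 1856).
    static String[] expand(String range) {
        String[] parts = range.trim().split("-");
        String startText = parts[0];
        String endText = parts.length > 1 ? parts[1] : startText;
        if (endText.length() < startText.length()) {
            int missing = startText.length() - endText.length();
            endText = startText.substring(0, missing) + endText;
        }
        int start = Integer.parseInt(startText);
        int end = Integer.parseInt(endText);
        if (end < start) {
            throw new IllegalArgumentException("end before start: " + range);
        }
        List<String> years = new ArrayList<>();
        for (int year = start; year <= end; year++) {
            years.add(String.valueOf(year));
        }
        return years.toArray(new String[0]);
    }
}
```

Each element of the returned array would then be added to the document as its own "year" field, as in the loop above.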

Erik


Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 20:43, Erik Hatcher wrote:
> Sure, I could put it all together as a space separated String and use
> the WhitespaceAnalyzer, but why not do it this way?  What other
> suggestions do you have for doing this?

If this works for you, I don't see any problem with it.

In general, I avoid storing any raw data in a Lucene Document, and only
use Lucene for, er, indexing... but this is just me :)

But let's go back to that fabled Map interface for Document... if the
purpose of such an interface is to keep things simple, it could behave
just like Document.get() [1]:

  "Returns the string value of the field with the given name if any exist
  in this document, or null. If multiple fields exist with this name,
  this method returns the first value added."

If for some reason(s) you need multiple values per field, stick with
getFields()...

What's wrong with that?

PA.

[1] http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/document/Document.html#get(java.lang.String)



Re: GETVALUES +SEARCH

2004-12-01 Thread Chris Hostetter

:  Having Document implement Map sounds reasonable to me though.  Any
:  reasons not to do this?
:
: Not really, except perhaps that a Lucene Document could theoretically
: have multiple identical keys... not something that anyone would want to

Assuming you want all changes to be backwards compatible, you pretty much
have to implement Map.get(Object):Object using Document.get(String):String
... otherwise you'll wind up really confusing the hell out of people.  But
if you really wanted to be mean to people, I guess you could use
Document.getField(String):Field or even
Document.getValues(String):String[] or Document.getFields(String):Field[]
if you were feeling particularly mean.

The real question in my mind is not "how should we implement 'get' given
that we allow multiple values?", a better question is "how should we
implement 'put'?"

do you write...
   Object put(Object k, Object v) {
       this.add((Field) v);
       return null;
   }
or...
   Object put(String k, String v) {
       this.add(Field.Text(k.toString(), v.toString()));
       return null;
   }
or...
   Object put(String k, String v) {
       throw new UnsupportedOperationException("we're not that nice");
   }


...I think it may be wiser to just let clients wrap the Doc in their own
Map, using the rules that make sense to them -- because no one's ever going
to agree 100%.

If you think you know how to satisfy 90% of the users, I would still
suggest that instead of making Document implement Map, you add
a toMap() function that returns a wrapper with the rules that you think
make sense.  (And leave the Document API uncluttered by the Map functions
that people who don't care about Map don't need to see.)
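
A minimal sketch of one possible set of toMap() rules: the first value added for a name wins, and the result is read-only so that put() is rejected. `SimpleDoc` is a hypothetical stand-in for a document with repeatable field names; it is not Lucene's Document, and Lucene itself has no toMap() method.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a document with repeatable field names;
// it only mimics the add/get behaviour discussed in the thread.
class SimpleDoc {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // Returns a read-only snapshot in which the first value added for a
    // name wins. put() on the result throws UnsupportedOperationException.
    Map<String, String> toMap() {
        Map<String, String> firstValues = new LinkedHashMap<>();
        for (String[] field : fields) {
            firstValues.putIfAbsent(field[0], field[1]);
        }
        return Collections.unmodifiableMap(firstValues);
    }
}
```

Returning a snapshot keeps the map rules in one small method; a client that wants different rules (all values, last value wins) can write its own wrapper without touching the document class.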




Re: GETVALUES +SEARCH

2004-12-01 Thread petite_abeille
On Dec 01, 2004, at 21:14, Chris Hostetter wrote:
> The real question in my mind is not "how should we implement 'get'
> given that we allow multiple values?", a better question is "how should
> we implement 'put'?"

Yes, retrofitting Document.add() in the Map interface would be a pain.
But this is not really what I was getting at. This is more about Hits
and accessing its values. One problem at a time :)

> If you think you know how to satisfy 90% of the users, I would still
> suggest that instead of making Document implement Map, you add
> a toMap() function that returns a wrapper with the rules that you
> think make sense.  (And leave the Document API uncluttered by the Map
> functions that people who don't care about Map don't need to see.)

Agree. Document is fine as it is. It would be nice though to have a
more or less standard interface to access the result set (e.g.
Collection)... as consumers of Hits are more likely to be built in
terms of the Collection API than anything specific to Lucene...

PA.


Re: GETVALUES +SEARCH

2004-12-01 Thread Erik Hatcher
On Dec 1, 2004, at 2:59 PM, petite_abeille wrote:
> On Dec 01, 2004, at 20:43, Erik Hatcher wrote:
>> Sure, I could put it all together as a space separated String and use
>> the WhitespaceAnalyzer, but why not do it this way?  What other
>> suggestions do you have for doing this?
>
> If this works for you, I don't see any problem with it.
> In general, I avoid storing any raw data in a Lucene Document, and
> only use Lucene for, er, indexing... but this is just me :)

Getting further off-topic, but to clarify:
Sounds like you're suggesting I'm storing raw data.  I'm not.  I have 
to be able to do queries like: someWord AND year:[1837 TO 1856].  So 
the year is being indexed, I just happen to do it with a doc.add() for 
each year.
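
A minimal sketch of why one indexed term per year supports a query like year:[1837 TO 1856]: a document matches when any one of its year terms falls inside the inclusive range. `YearRangeMatch` is a hypothetical helper that only illustrates the matching rule per document; it is not how Lucene evaluates range queries internally.

```java
import java.util.List;

class YearRangeMatch {
    // True if any indexed year term lies in [lower, upper], compared as
    // strings. Four-digit years sort the same lexicographically and
    // numerically, which is what makes plain-term range matching work here.
    static boolean matches(List<String> yearTerms, String lower, String upper) {
        for (String term : yearTerms) {
            if (term.compareTo(lower) >= 0 && term.compareTo(upper) <= 0) {
                return true;
            }
        }
        return false;
    }
}
```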

Erik


GETVALUES +SEARCH

2004-11-30 Thread Karthik N S

Hi Guys


Apologies.




In the search API, on the class org.apache.lucene.document.Document:

Will 'public final String[] getValues(String name)' return all the
docs without looping through them?

Please explain with an example.



Thx in advance



  WITH WARM REGARDS
  HAVE A NICE DAY
  [ N.S.KARTHIK]







Re: GETVALUES +SEARCH

2004-11-30 Thread Erik Hatcher
On Nov 30, 2004, at 7:10 AM, Karthik N S wrote:
> In the search API, on the class org.apache.lucene.document.Document:
>
> Will 'public final String[] getValues(String name)' return all the
> docs without looping through them?

getValues(fieldName) returns a String[] of the values of the field.
It's similar to get(fieldName).  If you index a field multiple
times:

  doc.add(Field.Keyword("keyword", "one"));
  doc.add(Field.Keyword("keyword", "two"));

get("keyword") will return "one", but getValues("keyword") will
return a String[] {"one", "two"}

If you want to retrieve all documents, use IndexReader's various API 
methods.
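
A minimal sketch of the difference between the single-value lookup and getValues for a field added more than once. `MultiValueDoc` is a hypothetical stand-in that mimics the behaviour described above; it is not Lucene's Document class, and returning an empty array for a missing field is a choice made for this sketch only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics the first-value / all-values behaviour
// described above; it is not Lucene's Document class.
class MultiValueDoc {
    private final List<String[]> fields = new ArrayList<>();

    void add(String name, String value) {
        fields.add(new String[] {name, value});
    }

    // First value added under this name, or null if there is none.
    String get(String name) {
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                return field[1];
            }
        }
        return null;
    }

    // Every value added under this name, in insertion order
    // (empty array if none; a choice made for this sketch).
    String[] getValues(String name) {
        List<String> values = new ArrayList<>();
        for (String[] field : fields) {
            if (field[0].equals(name)) {
                values.add(field[1]);
            }
        }
        return values.toArray(new String[0]);
    }
}
```

Either way, both calls operate on a single document's fields; neither one returns documents, so the hits still have to be walked one at a time.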

Erik


RE: GETVALUES +SEARCH

2004-11-30 Thread Karthik N S
Hi Guys


Apologies...



   Is there any API in Lucene which can retrieve all the searched values
   in a single fetch into some sort of an 'Array' WITHOUT using this
   [ below ] looping process [ this would make the search and display
   faster ]?

   for (int i = 0; i < hits.length(); i++) {
     Document doc = hits.doc(i);
     String path = doc.get("path");
     // ...
   }



Thx in Advance
Karthik

