Re: How do I delete?

2005-02-01 Thread Joseph Ottinger
I've had success with deletion by running IndexReader.delete(int), then
getting an IndexWriter and optimizing the directory. I don't know if
that's the right way to do it or not.
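Concretely, that sequence would look something like this against the 1.4-era API (a minimal sketch; the path and document number are hypothetical):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    IndexReader reader = IndexReader.open("/path/to/index");
    int docNum = 0;            // hypothetical internal document number
    reader.delete(docNum);     // marks the document as deleted
    reader.close();            // closing the reader persists the deletion

    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), false);
    writer.optimize();         // merging segments physically removes deleted docs
    writer.close();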

On Tue, 1 Feb 2005, Jim Lynch wrote:

 I've been merrily cooking along, thinking I was replacing documents when
 I haven't.  My logic is to go through a batch of documents, get a field
 called "reference" (which is unique), build a term from it, and delete it
 via the reader.delete() method.  Then I close the reader, open a
 writer, and reprocess the batch, indexing everything.

 Here is the delete and associated code:

   reader = IndexReader.open(database);

   Term t = new Term("reference", reference);
   try {
       reader.delete(t);
   } catch (Exception e) {
       System.out.println("Delete exception: " + e);
   }

 except it isn't working.  I tried to call commit and doCommit, but
 those are both protected.  I do a reader.close() after processing the
 batch the first time.

 What am I missing?  I don't get an exception.  "reference" is definitely a
 valid field, 'cause I print out the value at search time and compare to
 the doc and they are identical.

 Thanks,
 Jim.



---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: How do I delete?

2005-02-01 Thread Joseph Ottinger
Well, in LuceneRAR, the delete-by-id code does exactly what I said: it gets
the IndexReader, deletes the doc id, then opens a writer and optimizes.
Nothing else.
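A point worth pulling out of this thread: the 1.4 API has no public commit on
IndexReader - deletions are buffered and written out when the reader is
closed, and delete(Term) returns how many documents it marked. A sketch of the
delete-by-term step under those assumptions:

    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.Term;

    IndexReader reader = IndexReader.open("/path/to/index");   // path hypothetical
    String referenceValue = "some-unique-id";                  // hypothetical field value
    int count = reader.delete(new Term("reference", referenceValue));
    System.out.println("deleted " + count + " docs");          // 0 means the term never matched
    reader.close();                                            // this is what saves the deletions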

On Tue, 1 Feb 2005, Jim Lynch wrote:

 Thanks, I'd try that, but I don't think it will make any difference.  If
 I modify the code to not reindex the documents, no files in the index
 directory are touched, hence there is no record of the deletions
 anywhere.  I checked the count coming back from the delete operation and
 it is zero.  I even tried to delete another unique term with similar
 results.

 How does one call the commit method anyway? Isn't it automatically called?

 Jim.

 Joseph Ottinger wrote:

 I've had success with deletion by running IndexReader.delete(int), then
 getting an IndexWriter and optimizing the directory. I don't know if
 that's the right way to do it or not.
 
 On Tue, 1 Feb 2005, Jim Lynch wrote:
 
 
 
 I've been merrily cooking along, thinking I was replacing documents when
 I haven't.  My logic is to go through a batch of documents, get a field
 called "reference" (which is unique), build a term from it, and delete it
 via the reader.delete() method.  Then I close the reader, open a
 writer, and reprocess the batch, indexing everything.
 
 Here is the delete and associated code:
 
    reader = IndexReader.open(database);

    Term t = new Term("reference", reference);
    try {
        reader.delete(t);
    } catch (Exception e) {
        System.out.println("Delete exception: " + e);
    }
 
 except it isn't working.  I tried to call commit and doCommit, but
 those are both protected.  I do a reader.close() after processing the
 batch the first time.
 
 What am I missing?  I don't get an exception.  "reference" is definitely a
 valid field, 'cause I print out the value at search time and compare to
 the doc and they are identical.
 
 Thanks,
 Jim.
 
 
 
 
 
 ---
 Joseph B. Ottinger                 http://enigmastation.com
 IT Consultant                      [EMAIL PROTECTED]
 
 
 
 



---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





LuceneRAR nearing first release

2005-01-27 Thread Joseph Ottinger
https://lucenerar.dev.java.net

LuceneRAR is now verified as working on two containers: the J2EE 1.4 RI and
Orion. WebSphere testing is underway, with JBoss to follow.

LuceneRAR is a resource adapter for Lucene, allowing J2EE components to
look up an entry in a JNDI tree, using that reference to add and search
for documents. It's much like RemoteSearcher would be, except using JNDI
semantics for communication instead of RMI, which is a little more elegant
in a J2EE environment (where JNDI communication is very common).
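The client side of that presumably reduces to an ordinary JNDI lookup; a
sketch, with a made-up binding name, since the real name depends on how the
RAR is deployed:

    import javax.naming.Context;
    import javax.naming.InitialContext;

    Context ctx = new InitialContext();
    // "java:comp/env/ra/LuceneRAR" is hypothetical; use whatever name the adapter is bound under
    Object lucene = ctx.lookup("java:comp/env/ra/LuceneRAR");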

LuceneRAR was created to allow J2EE components to legitimately use the
filesystem indexes (for speed) while not violating J2EE's suggestion to
not rely on filesystem access. It also allows distributed access to the
index (as remote servers would simply establish a JNDI connection to the
LuceneRAR home.)

Please take a look at it if you're interested; the feature set isn't
complete, but it's workable. A sample application demonstrating document
creation, searches, and statistics about the search results is included in
the distribution.

Any comments are welcome.

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: LuceneRAR project announcement

2005-01-19 Thread Joseph Ottinger
On Wed, 19 Jan 2005, Erik Hatcher wrote:

 On Jan 19, 2005, at 2:27 PM, Joseph Ottinger wrote:
  After babbling endlessly about an RDBMS directory and my lack of success
  with it, I've created a project on java.net to create a Lucene JCA
  component, to allow J2EE components to interact with a Lucene service.
  It's at https://lucenerar.dev.java.net/ currently.

 Could you elaborate on some use cases?

Sure, and I'll pick the one that's been driving me along:

I have a set of J2EE servers, all of which can generate new content for
search, and all of which will be performing searches. They're on separate
machines. Sharing directories isn't my idea of doing J2EE correctly.

Therefore, I chose to represent Lucene as an enterprise service, one
communicated with via a remote interface, so that every module can
talk to Lucene without being aware of the communication layer... for
the most part. Plus, I no longer violate my purist's sensibilities.

 What drove you to consider JCA rather than some other technique?  I'm
 curious why it is important to get all J2EE with it rather than working
 with Lucene much more naturally at a lower level of abstraction.

JCA allows me to provide it as a system service instead of as a dependency
represented at each component layer. An EJB would have served almost as
well, except an EJB has filesystem restrictions that a Connector does not.

 I briefly browsed the source tree from java.net and saw this comment in
 your Hits.java:

 "This method loads a LuceneRAR hits object with its equivalent from the
 Apache Lucene Hits object. It basically walks the Lucene Hits object,
 copying values as it goes, so it may not be as light or fast as its
 Apache equivalent."

 I'll say!

Haha, it's good to see my propensity for understatement is still alive. :)

The Hits object could CERTAINLY use optimization - callbacks into the
connector would probably be acceptable, for example. The code you were
looking at has a lot of other areas that are, um, surprisingly crippled as
well.

For example, the add() method... well, first, THAT's the signature. Yes,
that's right. It adds constant text. Every time.

Likewise, the super-flexible search() -- again, that's the signature. It
searches for "time". That's it. Nothing more. Nothing less.

This is very much a first-cut "can I get it working?" version. I think,
for very limited definitions of "working", the answer is yes. I
certainly don't think it's got that show-room floor gleam going for it
yet.

 For large result sets, which are more often the norm than the exception
 for a search, you are going to take a huge performance hit doing
 something like this, not to mention possibly even killing the process
 as you run out of RAM.

*nod* As stated, a callback would be far preferable. Given that
Lucene's internal Hits object is final and nonserializable, at least my
client's Hit object gives me an opportunity to do that.
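The callback he's gesturing at might take a shape like the interface below -
purely hypothetical, not part of LuceneRAR or Lucene:

    import org.apache.lucene.document.Document;

    // Hypothetical: the connector would invoke this once per match, streaming results
    // instead of copying the whole result set across the wire up front.
    public interface HitCallback {
        /** Return false to stop iterating over the remaining hits. */
        boolean onHit(int docId, float score, Document doc);
    }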

 JCA sounds like an unnecessary abstraction around Lucene - though I'm
 open to be convinced otherwise.

I'm more than happy to talk about it. If I can fulfill my needs with no
code, hey, that's great! I just haven't been able to successfully do so
yet, and everyone to whom I've spoken who says that they HAVE managed...
well, they've almost invariably done so by lowering the bar a great deal
in order to accept what Lucene requires.

I'm certainly not castigating those who've done this - in fact, in many
ways, I'm very impressed. It's just something I'd prefer not to do, given
any alternative.

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: LuceneRAR project announcement

2005-01-19 Thread Joseph Ottinger
First off, Erik, thank you for taking an interest in any way. As I've said
before, I'm not trying to represent myself as a Lucene expert, so having
someone point out flaws is good.

On Wed, 19 Jan 2005, Erik Hatcher wrote:
  Could you elaborate on some use cases?
 
  Sure, and I'll pick the one that's been driving me along:
 
  I have a set of J2EE servers, all of which can generate new content for
  search, and all of which will be performing searches. They're on
  separate
  machines. Sharing directories isn't my idea of doing J2EE correctly.

 "Doing J2EE correctly" is a funny phrase.   If sharing directories
 works and gets the job done right, on time, under budget, can be
 adjusted later if needed, and has been reasonably well tested, then
 you've done it right.  And since it's in Java and not on a cell phone,
 it's basically J2EE.

Absolutely. I'm not trying to make fun of pragmatic, working solutions.
Nor am I sneering at those who've done it by sharing filesystems or
whatever.

 Also, what about using Lucene over RMI using the RemoteSearchable
 facility built-in?

Well, I'd prefer to avoid RMI. App servers typically have far better
transport layers than raw RMI, internally, and JCA can leverage that.
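For reference, the facility Erik mentions is
org.apache.lucene.search.RemoteSearchable, which exports a Searchable over
plain RMI; a minimal sketch (host, path, and binding name hypothetical, and an
rmiregistry must be running):

    import java.rmi.Naming;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.RemoteSearchable;
    import org.apache.lucene.search.Searchable;

    // Server side: export a local searcher over RMI
    Searchable local = new IndexSearcher("/path/to/index");
    Naming.rebind("//localhost/Searchable", new RemoteSearchable(local));

    // Client side: the remote index then behaves like any other Searchable
    Searchable remote = (Searchable) Naming.lookup("//localhost/Searchable");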

  Therefore, I chose to represent Lucene as an enterprise service, one
  communicated with via a remote interface, so that every module can
  talk to Lucene without being aware of the communication layer... for
  the most part.

 And this is where I think the abstraction leaks.

 The Nutch project has a very scalable enterprise approach to this
 type of remote service also.

*nod* I'll look it up.

   Plus, I no longer violate my purist's sensibilities.

 Ah, now we get to the real rationale!  :)

 I'm not giving you, personally, a hard time, really ... but rather this
 "purist" approach, where "purist" means fitting into the acronyms under
 the J2EE umbrella.  I've been there myself, read the specs, and cringed
 when I saw file system access from a session bean, and so on.

Well, in all honesty, there IS a small factor of "Gee, I can use an
acronym here!" involved. It's not ALL that's involved, of course - I think
the connector's transparency might be a real benefit for others as well as
satisfying my own "I need a deployed component, and not a service I have
to tune" need.

  The Hits object could CERTAINLY use optimization - callbacks into the
  connector would probably be acceptable, for example.

 Gotcha.  Yes, callbacks would be the right approach with this type of
 abstraction.

Just as a general question... is it efficient to retrieve a Document by,
uh, a sort of Lucene key? (Is there such a thing?) If there is, I can code
up a callback procedure in almost no time. (There are some other issues to
address first, but THAT would be easy to do.)
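The closest thing Lucene has to a key is the internal document number:
IndexReader.document(int) fetches one document's stored fields cheaply, but
the numbers are not stable - they can shift as segments merge. A sketch:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.index.IndexReader;

    IndexReader reader = IndexReader.open("/path/to/index");  // path hypothetical
    Document doc = reader.document(42);  // 42 = internal doc number, not a durable key
    reader.close();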

  JCA sounds like an unnecessary abstraction around Lucene - though I'm
  open to be convinced otherwise.
 
  I'm more than happy to talk about it. If I can fulfill my needs with no
  code, hey, that's great!

 Would RemoteSearchable get you closer to no code?

Dunno, I'll investigate.

   I just haven't been able to successfully do so
  yet, and everyone to whom I've spoken who says that they HAVE
  managed...
  well, they've almost invariably done so by lowering the bar a great
  deal
  in order to accept what Lucene requires.

 I'm definitely a skeptic when it comes to generic layers on top of
 Lucene, though there is definitely a yearning for easier management of
 the lower-level details.

On my part, too. If I could get a Directory reliably and quickly using an
RDBMS, I'd have gone that route.

 I'll definitely follow your work with LuceneRAR closely and will do
 what I can to help out in this forum.  So take my feedback as
 constructive criticism, but keep up the good work!

Again, no problem - and thank you. The things you bring up are issues I
might not be aware of, so it's good to see them and evaluate them.

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Suggestions for remoting Lucene?

2005-01-17 Thread Joseph Ottinger
I just realised that the Hits object isn't Serializable, although Document
and Field are. I can easily build a Hits equivalent that *is*
Serializable, but should that be on my end, or at the Lucene API level?
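One client-side answer is to copy the Hits eagerly into a serializable
snapshot - with the caveat (raised elsewhere in these threads) that this
materializes every hit up front. A sketch; the class and its shape are
hypothetical:

    import java.io.IOException;
    import java.io.Serializable;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.search.Hits;

    public class SerializableHits implements Serializable {
        public static class Hit implements Serializable {
            public final float score;
            public final Document doc;   // Document and Field are Serializable, as noted
            Hit(float score, Document doc) { this.score = score; this.doc = doc; }
        }

        private final List hits = new ArrayList();

        public SerializableHits(Hits h) throws IOException {
            for (int i = 0; i < h.length(); i++) {
                hits.add(new Hit(h.score(i), h.doc(i)));   // eager copy of every hit
            }
        }

        public int length() { return hits.size(); }
        public Hit hit(int i) { return (Hit) hits.get(i); }
    }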

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





HELP! Directory is NOT getting closed!

2005-01-12 Thread Joseph Ottinger
*sigh* Yet again, I apologize. I'm generating altogether too much traffic
here lately!

I'm stuck. I have a custom Directory, and I *need* a callback point so I
can clean up. There's a method for this: Directory.close(), which I've
overridden.

It never gets called!

According to IndexWriter.java, line 246 (in 1.4.3's codebase), if closeDir
is set, it's supposed to close the directory. That's fine - but that leads
me to believe that for some reason, closeDir is *not* set.

Why? Under what circumstances would this not be true, and under what
circumstances would you NOT want to close the Directory?

This is absolutely slaughtering my attempt at a Directory, because I need
a single unit of work, and I need a place to commit it when it's done. If
I commit it inside the directory's innards, then the UOW gets corrupted
(and looks like it's more than one atomic action, which is EXACTLY what I
don't need.)

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: HELP! Directory is NOT getting closed!

2005-01-12 Thread Joseph Ottinger
On Wed, 12 Jan 2005, Morus Walter wrote:

 Joseph Ottinger writes:
 
  According to IndexWriter.java, line 246 (in 1.4.3's codebase), if closeDir
  is set, it's supposed to close the directory. That's fine - but that leads
  me to believe that for some reason, closeDir is *not* set.
 
  Why? Under what circumstances would this not be true, and under what
  circumstances would you NOT want to close the Directory?
 
 From the sources you can see that it is true only if the directory
 is created by the IndexWriter itself. If you provide a directory to
 the IndexWriter, you have to close it yourself.


ARGH! (I've been saying that a lot lately!)

Okay, I was looking at the sources but missed that. Thank you very much.
*sigh*
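So the rule is: the writer only closes a Directory it opened itself; a
caller-supplied Directory is the caller's to close. A sketch (FSDirectory
shown, but a custom Directory behaves the same way):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    Directory dir = FSDirectory.getDirectory("/path/to/index", false);  // or a custom Directory
    IndexWriter writer = new IndexWriter(dir, new StandardAnalyzer(), false);
    try {
        // ... add documents ...
    } finally {
        writer.close();
        dir.close();   // closeDir stays false for caller-supplied directories, so this is our job
    }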

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





IndexWriter failure leaves lock in place

2005-01-10 Thread Joseph Ottinger
I'm still working through making my own directory, based on JDBC (and yes,
I know, there are some out there already, unsuitable for this reason or
that reason.)

One thing I've noticed is that the Lock procedure in IndexWriter is a
little off, I think.

My normal process on application startup is to get an IndexWriter, just to
make sure an index is there. If I get an exception (FileNotFoundException
for the FSDirectory, for example), I assume the index isn't created
properly, so then I create a new IndexWriter set to create the index.

With a file-based directory, that works well enough - and I realise there
might be a better way to do it (but I don't know it yet.)

However, the SQL-based directory leaves the lock. I think what's happening
is that the IndexWriter constructor (IndexWriter.java:216 from 1.4.3's
source distribution) is obtaining the lock, but then the synchronized block
(starting at line 227) gets an IOException from
segmentInfos.read(directory) - and the writeLock is never explicitly
released once it's obtained.

I would think that a try/finally (or something even more predictable,
like a try/catch that rethrows the IOException after cleanup) would be
appropriate to clear the lock *provided it's obtained* in the IndexWriter
construction, and it'd make the code that I typically use work regardless
of the specific directory I rely on.

Now, to be sure, I'm VERY FAR from a Lucene expert; am I missing
something? (I can contribute a patch if you'd like.)
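The shape of the fix would be roughly the following - a sketch that mirrors
the 1.4.3 constructor he cites, so segmentInfos and the lock name come from
that source and the details should be treated as approximate:

    Lock writeLock = directory.makeLock("write.lock");
    if (!writeLock.obtain()) {
        throw new IOException("Index locked for write");
    }
    try {
        segmentInfos.read(directory);   // the call that fails when the index doesn't exist
    } catch (IOException e) {
        writeLock.release();            // release rather than leave the stale lock behind
        throw e;
    }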

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: IndexWriter failure leaves lock in place

2005-01-10 Thread Joseph Ottinger
On Mon, 10 Jan 2005, Erik Hatcher wrote:

 On Jan 10, 2005, at 8:26 AM, Joseph Ottinger wrote:
  With a file-based directory, that works well enough - and I realise
  there
  might be a better way to do it (but I don't know it yet.)

 How about using IndexReader.indexExists() instead?


*blank stare* .. uh... because I didn't know it was there to look for it?
:) :) :) Thanks.

Would the change still be valid, though, just to catch morons who do what
I did?
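For anyone landing here from the archives, the exception-free startup check
looks like this (1.4-era API; the path is hypothetical):

    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.index.IndexReader;
    import org.apache.lucene.index.IndexWriter;

    boolean create = !IndexReader.indexExists("/path/to/index");
    IndexWriter writer = new IndexWriter("/path/to/index", new StandardAnalyzer(), create);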

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Lock obtain timed out from an MDB

2005-01-06 Thread Joseph Ottinger
If this is a stupid question, I deeply apologize. I'm stumped.

I have a message-driven EJB using Lucene. In *every* case where the MDB is
trying to create an index, I'm getting "Lock obtain timed out".

It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user
list has referred to before - but I don't see how the suggestions there
apply to what I'm trying to do. (It's creating a lock file in /var/tmp/
properly, from what I can see, so it's not write permissions, I imagine.)

I set the infoStream in my index writer to System.out, but I don't see any
extra information.

I'm using a SQL-based Directory object, but I get the same problem if I
refer to a file directly.

Is there a way to override the Lock portably so that I can have the lock
itself managed in an RDBMS? (It's a J2EE project, so relying on file access
is problematic; if the beans using Lucene to write to the index are on
multiple servers, multiple locks could exist anyway.)
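Since Directory.makeLock() is overridable, a custom Directory can centralize
locking itself; a sketch of a database-backed lock, where every helper method
below is hypothetical:

    import java.io.IOException;
    import org.apache.lucene.store.Lock;

    // Inside a custom Directory subclass (sketch):
    public Lock makeLock(final String name) {
        return new Lock() {
            public boolean obtain() throws IOException {
                return acquireLockRow(name);   // hypothetical: INSERT a row, false if already there
            }
            public void release() {
                releaseLockRow(name);          // hypothetical: DELETE the row
            }
            public boolean isLocked() {
                return lockRowExists(name);    // hypothetical: SELECT for the row
            }
        };
    }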

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: Lock obtain timed out from an MDB

2005-01-06 Thread Joseph Ottinger
Sorry to reply to my own post, but I now have a greater understanding of
PART of my problem - my SQLDirectory is not *quite* right, I think. So I'm
rolling back to FSDirectory.

Now, I have a servlet that writes to the filesystem to simplify things (as
I'm not confident enough to debug the RDBMS-based directory yet. That's a
task for later, I think). The servlet says it successfully creates the
index like so:

try {
   open the index with create=false
} catch (file not found) {
   open the index with create=true
}
index.optimize();
index.close();

Now, when I fire off any messages to the MDB, it yields the following:

java.io.IOException: Lock obtain timed out:
Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:58)

Now, this is with only two messages to the MDB, not a flood. There are two
handlers, so I'd expect a lock conflict in one case, but not on the
first MDB call - that one should be the one causing the lock the second
one sees, if a lock exists at all.

I've verified that when the servlet that initializes the index runs, a
lock file is NOT present, but again, it looks like every message fired
through looks for a lock and finds one, when I would think it wouldn't be
there.

What am I not understanding?

On Thu, 6 Jan 2005, Joseph Ottinger wrote:

 If this is a stupid question, I deeply apologize. I'm stumped.

 I have a message-driven EJB using Lucene. In *every* case where the MDB is
 trying to create an index, I'm getting "Lock obtain timed out".

 It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the user
 list has referred to before - but I don't see how the suggestions there
 apply to what I'm trying to do. (It's creating a lock file in /var/tmp/
 properly, from what I can see, so it's not write permissions, I imagine.)

 I set the infoStream in my index writer to System.out, but I don't see any
 extra information.

 I'm using a SQL-based Directory object, but I get the same problem if I
 refer to a file directly.

 Is there a way to override the Lock portably so that I can have the lock
 itself managed in an RDBMS? (It's a J2EE project, so relying on file access
 is problematic; if the beans using Lucene to write to the index are on
 multiple servers, multiple locks could exist anyway.)

 ---
 Joseph B. Ottinger                 http://enigmastation.com
 IT Consultant                      [EMAIL PROTECTED]




---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: Lock obtain timed out from an MDB

2005-01-06 Thread Joseph Ottinger
Well, I think I isolated the problem: a stupid error on my part. I
was adding an indexed field that had, um, a value of null. Correcting that
made the process behave much better - although note that I haven't yet
scaled up to multiple elements to index. Good milestone, though.

Shouldn't Lucene warn the user if they do something like this?

On Thu, 6 Jan 2005, Erik Hatcher wrote:

 Do you have two threads simultaneously either writing or deleting from
 the index?

   Erik

 On Jan 6, 2005, at 9:27 AM, Joseph Ottinger wrote:

  Sorry to reply to my own post, but I now have a greater understanding of
  PART of my problem - my SQLDirectory is not *quite* right, I think. So I'm
  rolling back to FSDirectory.

  Now, I have a servlet that writes to the filesystem to simplify things (as
  I'm not confident enough to debug the RDBMS-based directory yet. That's a
  task for later, I think). The servlet says it successfully creates the
  index like so:

  try {
     open the index with create=false
  } catch (file not found) {
     open the index with create=true
  }
  index.optimize();
  index.close();

  Now, when I fire off any messages to the MDB, it yields the following:

  java.io.IOException: Lock obtain timed out:
  Lock@/var/tmp/lucene-d6b0a3281487d1bc4d169d00426f475d-write.lock
  at org.apache.lucene.store.Lock.obtain(Lock.java:58)

  Now, this is with only two messages to the MDB, not a flood. There are two
  handlers, so I'd expect a lock conflict in one case, but not on the first
  MDB call - that one should be the one causing the lock the second one
  sees, if a lock exists at all.

  I've verified that when the servlet that initializes the index runs, a
  lock file is NOT present, but again, it looks like every message fired
  through looks for a lock and finds one, when I would think it wouldn't be
  there.

  What am I not understanding?

  On Thu, 6 Jan 2005, Joseph Ottinger wrote:

   If this is a stupid question, I deeply apologize. I'm stumped.

   I have a message-driven EJB using Lucene. In *every* case where the MDB
   is trying to create an index, I'm getting "Lock obtain timed out".

   It's in org.apache.lucene.store.Lock.obtain(Lock.java:58), which the
   user list has referred to before - but I don't see how the suggestions
   there apply to what I'm trying to do. (It's creating a lock file in
   /var/tmp/ properly, from what I can see, so it's not write permissions,
   I imagine.)

   I set the infoStream in my index writer to System.out, but I don't see
   any extra information.

   I'm using a SQL-based Directory object, but I get the same problem if I
   refer to a file directly.

   Is there a way to override the Lock portably so that I can have the
   lock itself managed in an RDBMS? (It's a J2EE project, so relying on
   file access is problematic; if the beans using Lucene to write to the
   index are on multiple servers, multiple locks could exist anyway.)

   ---
   Joseph B. Ottinger                 http://enigmastation.com
   IT Consultant                      [EMAIL PROTECTED]

  ---
  Joseph B. Ottinger                 http://enigmastation.com
  IT Consultant                      [EMAIL PROTECTED]




---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]






Re: Lock obtain timed out from an MDB

2005-01-06 Thread Joseph Ottinger
On Thu, 6 Jan 2005, Erik Hatcher wrote:


 On Jan 6, 2005, at 10:41 AM, Joseph Ottinger wrote:
  SHouldn't Lucene warn the user if they do something like this?

 When a user indexes a null?  Or attempts to write to the index from two
 different IndexWriter instances?

 I believe you should get an NPE if you try to index a null field value?
 No?

Well, I'd agree - the lack of an exception was rather disturbing,
considering how badly it destroyed Lucene for the application (requiring
not only restart but cleanup as well.)

I don't know Lucene well enough to say what happens according to the
code... but NOT adding the null corrected the problem entirely.
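The guard itself is one line; a minimal sketch using the 1.4-style Field.Text
factory, with hypothetical names throughout:

    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;

    void addIfPresent(Document doc, String name, String value) {
        if (value != null) {                  // skip nulls rather than hand them to Lucene
            doc.add(Field.Text(name, value));
        }
    }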

---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]





Re: How do you pronounce 'Lucene'?

2003-08-12 Thread Joseph Ottinger
I pronounce it lieu'-seen or loo'-seen, usually the latter because I'm
lazy.

On Mon, 11 Aug 2003, Danny Sofer wrote:

 ...and where does the name come from?

 we've already developed three ways to say 'lucene' and we can't agree on
 which one we like best.

 somebody please help!

 many thanks,

 danny.
 ===
 danny sofer   t. 020 7378 6655   m. 0795 722 1632
 www.kitsite.com - content management for websites




---
Joseph B. Ottinger                 http://enigmastation.com
IT Consultant                      [EMAIL PROTECTED]
J2EE Editor - Java Developer's Journal   [EMAIL PROTECTED]





IndexReader.delete(int) not working for me

2003-03-05 Thread Joseph Ottinger
I've got a versioning content system where I want to replace documents in
a lucene repository. To do so, according to the FAQ and the mailing list
archives, I need to open an IndexReader, look for the document in
question, delete it via the IndexReader, and then add it.

This shouldn't replace the document per se - it should, however, free the
index entry (for reuse by documents added later) as I understand it. It
should also mark the document as deleted. A query still may return the
document (again, as I understand it), requiring a filter to make sure
deleted documents aren't returned.

If I'm offbase in my understanding, I apologize - this is the best I can
tell.

In my removeDocument() method (names and parameters are obscured to remove
cruft not germane to the problem at hand), I iterate through the
IndexReader's documents (because there are non-indexed identifiers used).
When I hit a document that contains the correct identifiers, I use
ir.delete(idx), and output a log message that I'm deleting the document.

This part works as expected. (A log message for one entry is spit out.)

Now, however, when I search for documents, things go awry. I'm using the
standard analyzer (StandardAnalyzer, I should say), and
IndexSearcher(String). I then use code like the following:

Hits hits = searcher.search(query, new Filter() {
  public BitSet bits(IndexReader ir) throws IOException {
    BitSet bs = new BitSet();
    for (int idx = 0; idx < ir.maxDoc(); idx++) {
      boolean deleted = ir.isDeleted(idx);
      bs.set(idx, !deleted);
    }
    return bs;
  }
});

(I also have a log message to output the salient information about the
document and whether it's been deleted.)

Here's where the problem evinces itself: *every* document here says that
it's not deleted, even though the removeDocument() method mentioned above
doesn't show all of the documents returned here. It's almost like there
are two IndexReaders in action, one noting the deleted documents, and the
other not. It's very confusing to me. Can anyone give me any pointers?

-
Joseph B. Ottinger                 [EMAIL PROTECTED]
http://enigmastation.com           IT Consultant





Re: IndexReader.delete(int) not working for me

2003-03-05 Thread Joseph Ottinger
Then this means that my IndexReader.delete(i) isn't working properly. What
would be the common causes for this? My log shows the documents being
deleted, so something's going wrong at that point.

On Wed, 5 Mar 2003, Doug Cutting wrote:

 Joseph Ottinger wrote:
  This shouldn't replace the document per se - it should, however, free the
  index entry (for reuse by documents added later) as I understand it. It
  should also mark the document as deleted. A query still may return the
  document (again, as I understand it), requiring a filter to make sure
  deleted documents aren't returned.

 Search results do not include deleted documents, so you do not need to
 explicitly filter for them.  After a document is deleted, the space
 consumed by it may not be reclaimed for a while, and some term
 statistics may not be updated immediately, but Lucene never returns
 references to deleted documents.

 Doug





-
Joseph B. Ottinger                 [EMAIL PROTECTED]
http://enigmastation.com           IT Consultant





Re: IndexReader.delete(int) not working for me

2003-03-05 Thread Joseph Ottinger
Okay, I think I've done something stupid here: on closer examination, it
looks like my comparison to find the specific documents to delete is
failing. Let me look further at that.

On Wed, 5 Mar 2003, Doug Cutting wrote:
 Joseph Ottinger wrote:
  Then this means that my IndexReader.delete(i) isn't working properly. What
  would be the common causes for this? My log shows the documents being
  deleted, so something's going wrong at that point.

 Are you closing the IndexReader after doing the deletes?  This is
 required for the deletions to be saved.

 What makes you think that that delete is not working properly?

 Doug




-
Joseph B. Ottinger                 [EMAIL PROTECTED]
http://enigmastation.com           IT Consultant





Re: IndexReader.delete(int) not working for me

2003-03-05 Thread Joseph Ottinger
Okay, I found the problem: it was a stupid coder. To wit, here's the
salient code:
Document d = indexReader.document(i);
if (d.getField("key").equals(node.getKey())) {
   ...
}

The error, of course, is that getField().equals() is comparing FIELDS and
not string values. When I changed this to pull the stringValue() out of
getField(), everything worked as expected. Turns out my logging actually
was spitting out the *wrong* message somewhere else, which deceived
me^Wthe stupid coder into thinking the removal was occurring when it was
not.

Now everything's working fine. Thank you for your time.
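For the archives, the corrected comparison pulls the string value out of the
Field before comparing (the delete call is inferred from the surrounding
removeDocument() logic described above):

    Document d = indexReader.document(i);
    if (d.getField("key").stringValue().equals(node.getKey())) {
        indexReader.delete(i);   // compares stored string values, not Field objects
    }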

On Wed, 5 Mar 2003, Doug Cutting wrote:

 Joseph Ottinger wrote:
  Then this means that my IndexReader.delete(i) isn't working properly. What
  would be the common causes for this? My log shows the documents being
  deleted, so something's going wrong at that point.

 Are you closing the IndexReader after doing the deletes?  This is
 required for the deletions to be saved.

 What makes you think that that delete is not working properly?

 Doug




-
Joseph B. Ottinger                 [EMAIL PROTECTED]
http://enigmastation.com           IT Consultant

