Parsing and Indexing XML Docs

2003-03-18 Thread David Kendig
I am having problems with the
lucene-sandbox/contributions/XML-Indexing-Demo.  I get the following
error when I index my XML documents with the SAX parser in Java 1.4.1:

java.lang.StringIndexOutOfBoundsException: String index out of range: 200
    at org.apache.crimson.parser.Parser2.parseInternal(Parser2.java:524)
    at org.apache.crimson.parser.Parser2.parse(Parser2.java:305)
    at org.apache.crimson.parser.XMLReaderImpl.parse(XMLReaderImpl.java:442)
    at org.xml.sax.helpers.XMLReaderAdapter.parse(XMLReaderAdapter.java:223)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:314)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:253)
    at org.apache.lucenesandbox.xmlindexingdemo.XMLDocumentHandlerSAX.init(XMLDocumentHandlerSAX.java:34)


I thought it might be related to the deprecation messages I get when I
build the XML demo, so I replaced the deprecated calls.  This was mostly
by extending DefaultHandler instead of HandlerBase.  Now my XML doc
is parsed, but no events are generated that call startElement() and
stopElement(). I need stopElement() to be called to add the field to my
Lucene document.  Has anyone else had problems like this?

Thanks,

Dave Kendig
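
One possible cause of the missing callbacks, offered as a guess rather than a diagnosis: DefaultHandler uses the SAX2 signatures, so methods that keep the older SAX1 parameter lists (a single name plus an AttributeList) become unrelated overloads that the parser never calls. The SAX2 end-of-element callback is also named endElement(), not stopElement(). Below is a minimal sketch using only the JDK. FieldCollectingHandler and its parse() helper are hypothetical names, not the sandbox demo's XMLDocumentHandlerSAX, and a map stands in for the Lucene document.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler for illustration; not the sandbox demo's class.
class FieldCollectingHandler extends DefaultHandler {
    // Element name -> trimmed text content, in document order.
    final Map<String, String> fields = new LinkedHashMap<>();
    private final StringBuilder text = new StringBuilder();

    // SAX2 signature: four parameters, with Attributes (not AttributeList).
    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        text.setLength(0); // begin collecting text for this element
    }

    // May be called several times per element, so append rather than assign.
    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    // SAX2 signature: three parameters. SAX defines no stopElement() callback.
    @Override
    public void endElement(String uri, String localName, String qName) {
        String value = text.toString().trim();
        if (!value.isEmpty()) {
            fields.put(qName, value); // the demo would add a Lucene Field here
        }
        text.setLength(0);
    }

    // Parses an XML string and returns the collected leaf-element text.
    static Map<String, String> parse(String xml) {
        FieldCollectingHandler handler = new FieldCollectingHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            // Wrapped so callers of this sketch need no checked exceptions.
            throw new IllegalStateException("XML parse failed", e);
        }
        return handler.fields;
    }

    public static void main(String[] args) {
        String xml = "<Temporal_Coverage><Start_Date>1968-01-01</Start_Date>"
                + "<Stop_Date>1997-12-31</Stop_Date></Temporal_Coverage>";
        System.out.println(parse(xml));
    }
}
```

The @Override annotations are the useful part: with them, the compiler rejects a method whose signature does not match a DefaultHandler callback, instead of silently treating it as a new method.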


Re: Parsing and Indexing XML Docs

2003-03-18 Thread David Kendig
Bummer, I get the same thing with Xerces.  I do not suspect the XML file
itself since it is from a separate app that has been operational for
over a year.  Does anyone maintain the sandbox contributions?

Dave


Traceback (innermost last):
  File "./indexTest.py", line 22, in ?
java.lang.StringIndexOutOfBoundsException: String index out of range: 200
    at org.apache.xerces.framework.XMLParser.parse(XMLParser.java:)
    at org.xml.sax.helpers.XMLReaderAdapter.parse(XMLReaderAdapter.java:223)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:314)
    at javax.xml.parsers.SAXParser.parse(SAXParser.java:253)
    at org.apache.lucenesandbox.xmlindexingdemo.XMLDocumentHandlerSAX.init(XMLDocumentHandlerSAX.java:34)
    at org.apache.lucenesandbox.xmlindexingdemo.IndexFiles.indexDocs(IndexFiles.java:104)


Doesn't that look like an error in Crimson?
If I were you I'd use Xerces instead. I have always had a better
feeling about Xerces, and I don't think the demo code has anything
Crimson-specific hard-coded in it.

Otis




-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Re: Book

2002-11-21 Thread David Kendig
Craig 

I do not subscribe to Java Developer's Journal.  Are the articles online?  Or 
could it be posted here after the article is published?

Thanks,

Dave Kendig

 There is a book by Wrox called Professional JSP Site Design (I think)
 that has a chapter on searching and it mentions Lucene, but its coverage on
 Lucene is *VERY* thin. I wouldn't recommend this book for learning Lucene.

 I have an article on Lucene to appear in December's Java Developer's
 Journal. It's not as complete a coverage of Lucene as I would have liked it
 to be, but with limited space in a magazine I couldn't go into much more
 than an introduction. I'd have probably written it differently if I had it
 to do over again. Oh well. Let me know what you think of the article when
 it comes out.

 William W wrote:
  I would like to buy a book about Lucene.
  Who could write it ? : )
 






Re: Lucene and XML

2002-10-30 Thread David Kendig
Rob 

I found it under the Lucene 'contributions' page on the main web site.
Apparently ISOGEN is a commercial company that open-sourced their XML
extension to Lucene.  It seems to be very nice and well thought out, but I do
wonder who maintains the contributed code.

Dave



 Hello all,

   I did not know there were packages like ISOGEN that used Lucene to build a
 searchable index based on XML files.  From visiting ISOGEN's website it
 looks like it is a commercial software, are there any open source
 extensions to Lucene that allow XML indexing and searching?

   Please let me know.

 Thanks again,

 Rob






Date Search Problem

2002-10-29 Thread David Kendig
I have XML documents that I indexed using Lucene with ISOGEN's XML package. I 
am unable to get the date search working properly. First let me describe how 
I set things up.  The document has these fields.

<Temporal_Coverage>
   <Start_Date>1968-01-01</Start_Date>
   <Stop_Date>1997-12-31</Stop_Date>
</Temporal_Coverage>

They are indexed and added to an org.apache.lucene.document.Document:

contentDoc.add(new Field("Start_Date", startDate, false, true, false));

I build a query (in a Jython servlet that imports the Lucene packages):

from java.text import SimpleDateFormat
from org.apache.lucene.search import DateFilter

# if a date range is supplied, use a date filter
dateFormat = SimpleDateFormat("yyyy-MM-dd")
dateFilter = DateFilter.After("Start_Date", dateFormat.parse("2001-02-03"))
hits = self.searcher.search(lucQuery, dateFilter)

Now when DateFilter.After() is called above, I print the value of its start
attribute, which is declared as a String, and this is what I get:
DateFilter.After().start=0ciqv3fk0
But in DateFilter.bits() it is comparing against this:
Enum(0)=TermStart_Date:1000-01-01

So could someone please point me in the right direction?  I must be missing
something here, because it looks like it is comparing 0ciqv3fk0 to
TermStart_Date:1000-01-01 and that is obviously wrong.

I scoured the FAQ and the mailing list archives, and the information on how to
search using DateField is minimal.  The API docs help, but it is not clear to
me how to put the APIs together.  Unfortunately, the demo isn't much better at
showing how to search using arbitrary date formats.

Thanks,

Dave Kendig
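
A likely explanation for the mismatch, hedged because it relies on my recollection of Lucene 1.x: DateFilter compares against terms in the format produced by DateField.dateToString(), which is milliseconds since 1970 written in base 36 and zero-padded. The value 0ciqv3fk0 fits that shape. The indexed field, however, holds the plain text "1968-01-01", so the two are never comparable. The sketch below imitates that encoding with the JDK only. DateTermSketch, encodeMillis and parseUtc are hypothetical names, not Lucene APIs, and UTC is assumed for parsing.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.TimeZone;

// Hypothetical illustration of a base-36 date term; not Lucene code.
class DateTermSketch {
    // Width of 1000 years of milliseconds in base 36 (9 characters).
    static final int WIDTH =
            Long.toString(1000L * 365 * 24 * 60 * 60 * 1000, Character.MAX_RADIX).length();

    // Encodes epoch milliseconds as zero-padded base 36 so that string
    // order matches chronological order. Negative values are rejected.
    static String encodeMillis(long millis) {
        if (millis < 0) {
            throw new IllegalArgumentException("dates before 1970 are not representable");
        }
        StringBuilder sb = new StringBuilder(Long.toString(millis, Character.MAX_RADIX));
        while (sb.length() < WIDTH) {
            sb.insert(0, '0');
        }
        return sb.toString();
    }

    // Parses a yyyy-MM-dd date as midnight UTC and returns epoch milliseconds.
    static long parseUtc(String isoDate) {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");
        fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
        try {
            return fmt.parse(isoDate).getTime();
        } catch (ParseException e) {
            throw new IllegalArgumentException("not a yyyy-MM-dd date: " + isoDate, e);
        }
    }

    public static void main(String[] args) {
        String filterStart = encodeMillis(parseUtc("2001-02-03"));
        String indexedTerm = "1000-01-01"; // plain text, as seen in the index

        System.out.println("filter start term: " + filterStart);
        // Comparing the two encodings as strings tells us nothing about dates.
        System.out.println("mixed comparison:  " + indexedTerm.compareTo(filterStart));
        // Plain yyyy-MM-dd strings do sort chronologically among themselves.
        System.out.println("ISO comparison:    " + ("1997-12-31".compareTo("2001-02-03") < 0));
    }
}
```

If this reading is right, there are two ways out: index the field with the same encoding the filter uses, or keep the yyyy-MM-dd text and compare it as text. Note that an epoch-based encoding like the one sketched here cannot represent the 1968 start date at all, which argues for the second option with this data.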










Lucene and Geographic Searching

2002-10-09 Thread David Kendig

Hi,

I'm very interested in migrating our current search engine to use Lucene.  
After evaluating Lucene, I have become very impressed and have been telling 
lots of people about it.  One requirement that we have is to be able to 
search our documents by specifying a geographical boundary.  I searched 
everything I could find on Lucene but I barely found any mention of anyone 
using it for such a purpose.  My XML documents contain both temporal and 
spatial information that I would like my users to be able to search on.  Does 
such a thing exist for Lucene?  Is there an easy way to do this with Lucene?  
Is there interest in adding this type of functionality to Lucene if it 
doesn't exist?  Could something like GeoTools or some other Java toolkit be
integrated into Lucene?  I would even offer my help to make it so, if there
is a need.

David Kendig
Global Change Master Directory
GSFC/NASA
http://globalchange.nasa.gov
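
One simple way to approach the boundary requirement, independent of Lucene and of GeoTools, is to keep each document's bounding box alongside the index and test candidate hits against the query box. The sketch below shows only that intersection test. BoundingBox is a hypothetical type, coordinates are assumed to be in degrees, and boxes that cross the 180-degree meridian are not handled.

```java
// Hypothetical value type for illustration; not part of Lucene or GeoTools.
final class BoundingBox {
    final double west, south, east, north; // degrees

    BoundingBox(double west, double south, double east, double north) {
        if (west > east || south > north) {
            throw new IllegalArgumentException("expected west <= east and south <= north");
        }
        this.west = west;
        this.south = south;
        this.east = east;
        this.north = north;
    }

    // True when the boxes share at least one point (touching edges count).
    boolean intersects(BoundingBox other) {
        return west <= other.east && other.west <= east
                && south <= other.north && other.south <= north;
    }

    public static void main(String[] args) {
        BoundingBox query = new BoundingBox(-80, 35, -70, 45);
        BoundingBox document = new BoundingBox(-75, 40, -60, 50);
        System.out.println(query.intersects(document));
    }
}
```

Checking every hit this way costs time proportional to the number of hits, so it suits modest result sets; pushing the test into the index itself is a separate design question.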
