can't rename segments.new to segments

2003-09-18 Thread Rociel Buico
All,
When I run my index writer program in IDEA (the IDE), I get this error:
"can't rename segments.new to segments" (sometimes the error is on the
deletable file instead). When I run the same program from the command prompt,
it works fine and no error is returned.

I am building just one index and using no threads.
Is this a bug?


-
Do you Yahoo!?
Yahoo! SiteBuilder - Free, easy-to-use web site design software

Re: HTML Parsing problems...

2003-09-18 Thread Peter Becker
Tatu Saloranta wrote:

On Thursday 18 September 2003 14:50, Michael Giles wrote:
 

I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but
I also know that it is updated from time to time and performs much better
than the other ones that I have tested.  Frustratingly, the very first page
I tried to parse failed
(http://www.theregister.co.uk/content/54/32593.html). It seems to be choking
on tags that are being written inside of JavaScript code (i.e.
document.write('');).
Obviously, the simple solution (that I am using with another parser) is to
just ignore everything inside of 

Re: HTML Parsing problems...

2003-09-18 Thread Tatu Saloranta
On Thursday 18 September 2003 14:50, Michael Giles wrote:
> I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but
> I also know that it is updated from time to time and performs much better
> than the other ones that I have tested.  Frustratingly, the very first page
> I tried to parse failed
> (http://www.theregister.co.uk/content/54/32593.html). It seems to be choking
> on tags that are being written inside of JavaScript code (i.e.
> document.write('');).
> Obviously, the simple solution (that I am using with another parser) is to
> just ignore everything inside of 

HTML Parsing problems...

2003-09-18 Thread Michael Giles
I know, I know, the HTML Parser in the demo is just that (i.e. a demo), but 
I also know that it is updated from time to time and performs much better 
than the other ones that I have tested.  Frustratingly, the very first page 
I tried to parse failed 
(http://www.theregister.co.uk/content/54/32593.html). 
It seems to be choking on tags that are being written inside of JavaScript 
code (i.e. document.write('');).  Obviously, the simple 
solution (that I am using with another parser) is to just ignore everything 
inside of 
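
A minimal sketch of the "ignore the JavaScript" workaround mentioned above,
assuming the offending code sits inside script elements. It is a plain regex
pre-pass that runs before any parser sees the page and is independent of the
demo HTMLParser; the class and method names are made up for illustration.

```java
import java.util.regex.Pattern;

class ScriptStripper {
    // Matches an opening script tag, its body, and the closing tag.
    // DOTALL lets '.' span line breaks; the lazy '.*?' stops at the first close tag.
    private static final Pattern SCRIPT_BLOCK = Pattern.compile(
            "<script\\b[^>]*>.*?</script\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Hypothetical helper: returns the page with every script block removed.
    static String stripScripts(String html) {
        return SCRIPT_BLOCK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String page = "<p>before</p>"
                + "<script type=\"text/javascript\">document.write('<b>hi</b>');</script>"
                + "<p>after</p>";
        System.out.println(stripScripts(page));
    }
}
```

A script block that is never closed does not match the pattern and is left in
place, so a real pre-pass would need to decide how to treat such pages.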

Re: Adding String in a Field

2003-09-18 Thread Otis Gospodnetic
Amit, this is a question for the -user list, not the -dev list.

I suggest that you look at one of the Lucene articles that deal with
indexing.  You will find links to them on the Resources page, which is
listed on Lucene's home page.

Otis

--- Amit Bhavsar <[EMAIL PROTECTED]> wrote:
> Hi,
> 
> I am trying to add fields to a document as shown below:
> 
> doc.add(Field.Text(fieldName, newString));
> 
> 
> Here fieldName and newString are both strings. I wanted to have the words
> in newString searchable, i.e. indexed.
> 
> Currently, I am using a StringTokenizer to get tokens from the string and
> adding '\n' between tokens.
> 
> Is there a quicker and cleaner way of doing this reliably?
> 
> Here are the sample strings I am trying to add:
> "Interactive teaching, Computer simulations, Universe, Computational
> science, GalaxSee software, Solar system, Earth, Sun, Gravity, Gravitational
> pull, Orbits, Velocity"
> 
> "http://www.shodor.org/master/galaxsee/ = GalaxSee;
> http://www.shodor.org/ = Shodor Education Foundation, Inc.;
> http://www.shodor.org/master/ = Master Tools;
> http://www.nap.edu/readingroom/books/nses/html/6e.html#csa912 = Click here
> for further information on NSES standards.;
> http://www.shodor.org/master/dodea/index.html = Click here for additional
> information on DoDDs standards"
> 
> 
> Any help/input will be appreciated.
> 
> thank you.
> 
> amit
> 
> 
> 
> -
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
> 
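
For context on the question above: in the Lucene 1.x API,
Field.Text(String, String) creates a tokenized, indexed field, so the analyzer
given to the IndexWriter splits the value into terms at indexing time and
there should be no need to insert '\n' between tokens by hand. The sketch
below is not Lucene code. It is a stand-alone approximation of what a
letter-based, lower-casing analyzer does to one of the sample strings, with a
made-up class and method name, to show that commas and spaces already act as
term boundaries.

```java
import java.util.ArrayList;
import java.util.List;

class TokenizeSketch {
    // Hypothetical helper: split on anything that is not a letter and lower-case
    // each run of letters, roughly like a simple letter-based analyzer would.
    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else if (current.length() > 0) {
                // A non-letter ends the current term.
                out.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            out.add(current.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(tokens("Solar system, Earth, Sun, Gravity"));
    }
}
```

The exact terms produced in a real index depend on which analyzer is passed to
the IndexWriter; a different analyzer may keep digits, drop stop words, or
treat URLs differently.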





Re: Lucene Scoring Behavior

2003-09-18 Thread Terry Steichen
Doug,

I just extracted a portion of the database, reindexed it, and found that the
scores come out much more like we'd expect.  It appears this may be an
indexing issue: I index new material each day and merge the new index with
the master index.  I only rebuild the master when I can't avoid it (because
it takes so long), so I probably merge 100 times or more before reindexing.
This evening I'll reindex and let you know if the apparent problem clears
up.  If so, I'll keep track of it as I continue to merge and see if there's
any issue there.

Thanks for the input (and from Erik, pointing me to the Explanation - it's
pretty neat).

Question: The new scores for the test database portion mentioned above all
seem to come out in the range of .06 to .07.  I assume this is because they
never get normalized.  If this is the case, (a) would it hurt anything to
"normalize up" (so the scores range up to 1), and if not, (b) is there an
easy, non-disruptive (to the source code) way to do this?

Regards,

Terry
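
One way to "normalize up" without touching the Lucene source is to rescale
the scores after the search returns. The sketch below assumes the raw scores
have already been copied into a float array; the class and method names are
hypothetical and nothing here is a Lucene API. Dividing by the largest score
only changes the displayed numbers, not the order of the results.

```java
class ScoreNormalizer {
    // Hypothetical helper: returns a new array scaled so the largest score is 1.0.
    // If no score is positive, the values are returned unchanged.
    static float[] normalizeUp(float[] scores) {
        float max = 0f;
        for (float s : scores) {
            max = Math.max(max, s);
        }
        float[] out = new float[scores.length];
        for (int i = 0; i < scores.length; i++) {
            out[i] = (max > 0f) ? scores[i] / max : scores[i];
        }
        return out;
    }

    public static void main(String[] args) {
        // Raw scores in the .06 to .07 range, as described in the message above.
        float[] raw = {0.07f, 0.065f, 0.06f};
        for (float s : normalizeUp(raw)) {
            System.out.println(s);
        }
    }
}
```

Scores rescaled this way are only comparable within one result list, since
each query gets its own scaling factor.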


- Original Message -
From: "Doug Cutting" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, September 17, 2003 11:15 PM
Subject: Re: Lucene Scoring Behavior


> Hmm.  This makes no sense to me.  Can you supply a reproducible
> standalone test case?
>
> Doug
>
> Terry Steichen wrote:
> > Doug,
> >
> > (1) No, I did *not* boost the pub_date field, either in the indexing
> > process or in the query itself.
> >
> > (2) And, each pub_date field of each document (which is in XML format)
> > contains only one instance of the date string.
> >
> > (3) And only the pub_date field itself is indexed.  There are other
> > attributes of this field that may contain the date string, but they
> > aren't indexed - that is, they are not included in the instantiated
> > Document class.
> >
> > Regards,
> >
> > Terry
> >
> > - Original Message -
> > From: "Doug Cutting" <[EMAIL PROTECTED]>
> > To: "Lucene Users List" <[EMAIL PROTECTED]>
> > Sent: Wednesday, September 17, 2003 5:51 PM
> > Subject: Re: Lucene Scoring Behavior
> >
> >
> >
> >>Terry Steichen wrote:
> >>
> >>>  0.03125 = fieldNorm(field=pub_date, doc=90992)
> >>>  1.0 = fieldNorm(field=pub_date, doc=90970)
> >>
> >>It looks like the fieldNorms are what differ, not the IDFs.  These are
> >>the product of the document and/or field boost, and 1/sqrt(numTerms)
> >>where numTerms is the number of terms in the "pub_date" field of the
> >>document.  Thus if each document is only assigned one date, and you
> >>didn't boost the field or the document when you indexed it, this should
> >>be 1.0.  But if the document has two dates, then this would be
> >>1/sqrt(2).  Or if you boosted this document pub_date field, then this
> >>will have whatever boost you provided.
> >>
> >>So, did you boost anything when indexing?  Or could a single document
> >>have two or more different values for pub_date?  Either would explain
> >>this.
> >>
> >>Doug
>
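
The fieldNorm formula quoted above (boost times 1/sqrt(numTerms)) can be
checked numerically. The sketch below is a stand-alone calculation, not Lucene
code: the class and method names are made up for illustration, and it ignores
any rounding Lucene applies when it stores norms in the index.

```java
class FieldNormSketch {
    // Hypothetical helper: boost * 1/sqrt(numTerms), as described in the thread.
    static double fieldNorm(double boost, int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return boost / Math.sqrt(numTerms);
    }

    // Inverse of the formula: the term count that would yield a given norm
    // at a given boost, rounded to the nearest whole number.
    static long impliedTermCount(double boost, double norm) {
        double ratio = boost / norm;
        return Math.round(ratio * ratio);
    }

    public static void main(String[] args) {
        System.out.println(fieldNorm(1.0, 1));                  // one date, no boost
        System.out.println(fieldNorm(1.0, 2));                  // two dates, no boost
        System.out.println(impliedTermCount(1.0, 0.03125));     // the low norm from the Explanation
    }
}
```

With no boost, a norm of 0.03125 (that is, 1/32) corresponds to 1024 terms in
the field under this formula, which is far from what a single unboosted date
value would give.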





Re: Lucene demo ideas?

2003-09-18 Thread Peter Becker
Erik Hatcher wrote:

[...]

- Index text and HTML files.  Any others?  I don't want to get into 
putting too many dependencies in though - let's keep it relatively 
simple, although still demonstrative.  Allow search filtering by last 
modified date range and document type (extension). 

If I may plug our code again ;-) Docco (http://tockit.sf.net) contains a 
framework for document handlers, with implementations for plain text, 
HTML, XML and OpenOffice based on JDK 1.4, and plugins for PDFBox, POI 
and Multivalent. There is also a notion of file mappings (i.e. mapping 
from a match on a FileFilter to a handler), and we plan to add code to 
mix in external information such as meta-data stores or EAs from advanced 
file systems. It is available on SF (within 
http://sf.net/projects/toscanaj) and is at the moment BSD-style 
licensed. We would be happy to contribute bits of that, and thanks to the 
plugin architecture the dependencies should be controllable. Admittedly the 
plugin loader is still a hack, but it works.

 Peter
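
The "file mapping" idea described above (a match on a FileFilter selects a
handler) can be sketched as follows. This is an illustration only and not
Docco's actual API: the DocumentHandler interface, the registry class, and the
first-match-wins rule are all assumptions.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

class FileMappingSketch {
    // Hypothetical handler interface; Docco's real one may look different.
    interface DocumentHandler {
        String describe(File file);
    }

    // One mapping: files accepted by the filter go to the handler.
    private static final class Mapping {
        final FileFilter filter;
        final DocumentHandler handler;

        Mapping(FileFilter filter, DocumentHandler handler) {
            this.filter = filter;
            this.handler = handler;
        }
    }

    private final List<Mapping> mappings = new ArrayList<>();

    void add(FileFilter filter, DocumentHandler handler) {
        mappings.add(new Mapping(filter, handler));
    }

    // Returns the handler of the first matching mapping, or null if none match.
    DocumentHandler handlerFor(File file) {
        for (Mapping m : mappings) {
            if (m.filter.accept(file)) {
                return m.handler;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        FileMappingSketch registry = new FileMappingSketch();
        registry.add(f -> f.getName().endsWith(".html"), f -> "HTML handler for " + f.getName());
        registry.add(f -> f.getName().endsWith(".txt"), f -> "plain text handler for " + f.getName());

        File file = new File("index.html");
        DocumentHandler handler = registry.handlerFor(file);
        System.out.println(handler == null ? "no handler" : handler.describe(file));
    }
}
```

Keeping the filter and the handler as separate objects means a new document
type only needs one extra add(...) call, which fits the plugin approach
described in the message.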
