RE: WebLucene 0.4 released: added full featured demo(dump data php scripts and demo data in Chinese)

2003-12-16 Thread Tun Lin
Hi,

I am using the downloaded WebLucene. I have started my Tomcat server and tried
to search by clicking the search button, but it says the search page cannot be
found. I also cannot find that page in the package.

Can anyone help?

Am I missing anything? 

-Original Message-
From: Che Dong [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 17, 2003 1:53 AM
To: Lucene Users List
Subject: Re: WebLucene 0.4 released: added full featured demo(dump data php
scripts and demo data in Chinese)

sorry, demo address is:
http://www.blogchina.com/weblucene/


Che, Dong
- Original Message -
From: Che Dong [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, December 17, 2003 1:33 AM
Subject: WebLucene 0.4 released: added full featured demo(dump data php scripts
and demo data in Chinese)


 http://sourceforge.net/projects/weblucene/
 
 WebLucene:
 An XML interface for the Lucene search engine, providing SAX-based indexing,
 result sorting by indexing sequence, and XML output with highlight support.
 
 The key features:
 1 Bi-gram based CJK support: org/apache/lucene/analysis/cjk/CJKTokenizer.
 The CJKTokenizer supports Chinese, Japanese and Korean together with Western
 languages.
 
 2 DocID based result sorting: org/apache/lucene/search/IndexOrderSearcher
 
 3 xml output: com/chedong/weblucene/search/DOMSearcher
 
 4 sax based indexing: com/chedong/weblucene/index/SAXIndexer
 
 5 token based highlighter: 
 reverse StopTokenzier:
 org/apache/lucene/anlysis/HighlightAnalyzer.java
   HighlightFilter.java
 with abstract:
 com/chedong/weblucene/search/WebluceneHighlighter
 
 6 A simplified query parser:
 google like syntax with term limit
 org/apache/lucene/queryParser/SimpleQueryParser
 modified from early version of Lucene :)
 
 7 Added a full-featured demo (including dump script and sample data), running at:
 http://www.blogchina.com/weblucene/
 
 Regards
 
 
 Che Dong
 http://www.chedong.com/tech/weblucene.html
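
As a rough illustration of what "bi-gram based" CJK support (feature 1) means, the sketch below splits each run of CJK characters into overlapping two-character tokens and keeps Western words whole. It is not the CJKTokenizer source: the class name `CjkBigramSketch`, the choice of Unicode scripts, and the lower-casing of Western words are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not taken from WebLucene.
final class CjkBigramSketch {

    /** True for characters this sketch treats as CJK (Han, kana, Hangul). */
    static boolean isCjk(int codePoint) {
        Character.UnicodeScript s = Character.UnicodeScript.of(codePoint);
        return s == Character.UnicodeScript.HAN
            || s == Character.UnicodeScript.HIRAGANA
            || s == Character.UnicodeScript.KATAKANA
            || s == Character.UnicodeScript.HANGUL;
    }

    /** Western words become single tokens; CJK runs become overlapping bigrams. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder word = new StringBuilder();   // current Western word
        List<Integer> run = new ArrayList<>();      // current run of CJK code points
        int i = 0;
        while (i < text.length()) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (isCjk(cp)) {
                flushWord(word, tokens);
                run.add(cp);
            } else if (Character.isLetterOrDigit(cp)) {
                flushRun(run, tokens);
                word.appendCodePoint(Character.toLowerCase(cp));
            } else {                                // whitespace or punctuation
                flushWord(word, tokens);
                flushRun(run, tokens);
            }
        }
        flushWord(word, tokens);
        flushRun(run, tokens);
        return tokens;
    }

    private static void flushWord(StringBuilder word, List<String> out) {
        if (word.length() > 0) {
            out.add(word.toString());
            word.setLength(0);
        }
    }

    private static void flushRun(List<Integer> run, List<String> out) {
        if (run.size() == 1) {                      // a lone character is kept as-is
            out.add(new String(Character.toChars(run.get(0))));
        }
        for (int k = 0; k + 1 < run.size(); k++) {  // overlapping pairs
            out.add(new StringBuilder()
                .appendCodePoint(run.get(k))
                .appendCodePoint(run.get(k + 1))
                .toString());
        }
        run.clear();
    }

    public static void main(String[] args) {
        // The escapes spell a four-character Chinese phrase after the word "Lucene".
        System.out.println(tokenize("Lucene \u4e2d\u6587\u641c\u7d22"));
    }
}
```

Overlapping pairs avoid the need for a dictionary-based word segmenter, at the cost of a larger index, which is presumably why a bigram approach suits mixed-language text.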
 



-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Web Lucene Question.

2003-12-13 Thread Tun Lin
Hi,

I typed the following at the Windows command line in the weblucene
directory:

ant build

Everything seems to work fine except the following error:

java.lang.InstantiationException: org.apache.tools.ant.Main
at java.lang.Class.newInstance0(Class.java:293)
at java.lang.Class.newInstance(Class.java:261)
at org.apache.tools.ant.launch.Launcher.run(Launcher.java:214)
at org.apache.tools.ant.launch.Launcher.main(Launcher.java:90)

I have set the necessary classpath but still get the error above.

Can anyone help?


RE: Web Lucene Question.

2003-12-13 Thread Tun Lin
I am using the beta version.

When I typed the command:
ant -version

I have the following:
Apache Ant version 1.6beta3 compiled on December 5 2003

I am downloading the previous version to see if there is any improvement.

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Sunday, December 14, 2003 12:42 AM
To: Lucene Users List
Subject: Re: Web Lucene Question.

On Saturday, December 13, 2003, at 11:20  AM, Tun Lin wrote:
 Hi,

 I have tried to type the following at Windows command line at 
 weblucene
 directory:

 ant build

 Everything seems to work fine except the following error:

Everything works fine but it fails miserably?!  :)


 java.lang.InstantiationException: org.apache.tools.ant.Main

This says Ant did not even launch.  I'm not sure which weblucene you mean here
- the built-in demo?

My guess is your Ant installation has issues.  What does ant -version
tell you?  How about ant -diagnostics?  And finally
ant -projecthelp -verbose

Erik - aka Mr. Ant





Web Lucene classes.

2003-12-13 Thread Tun Lin
Hi,

When I downloaded the WebLucene source, I did not see the classes directory
that install.txt refers to.

Does anyone know how to get the whole set of WebLucene classes, for example as
a jar file?

When I type the command:
java IndexRunner

I get the following message:
Exception in thread "main" java.lang.NoClassDefFoundError: IndexRunner

I have set the classpath to the directory containing the WebLucene class files,
but it still does not work.

Please advise.



XMLIndexingDemo.

2003-12-04 Thread Tun Lin
Hi,

I have tried the XMLIndexingDemo. It only supports indexing one XML file at a
time and deletes the old one. Also, a customerInfo tag can have only one name.
Is there an open source tool that supports one customerInfo tag with many names?


RE: XMLIndexingDemo.

2003-12-04 Thread Tun Lin
Or one that supports all XML files in a particular directory?




RE: New Lucene-powered Website

2003-12-02 Thread Tun Lin
Hi,

I am very keen on using the new luceneweb. Has anyone managed to run luceneweb
successfully on Windows?

The instructions in luceneweb seem to cover Unix more than Windows.

Does anyone have install instructions for running luceneweb on Windows? I cannot
even see the first page when I start Tomcat, though I have the weblucene in the
webapps directory.

Can anyone help? Please.

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2003 8:35 PM
To: Lucene Users List
Subject: Re: New Lucene-powered Website

Could you add a Lucene logo somewhere on your search results, as noted
here:
http://jakarta.apache.org/lucene/docs/powered.html ?

Thanks!
Otis


--- Ulrich Mayring [EMAIL PROTECTED] wrote:
 Hello,
 
 we (DENIC) are the world's second largest domain registry (.de-zone 
 has almost 6.9 million domains) and are using Lucene to index and 
 search our website in a high-traffic scenario. Most of our web pages 
 are available in English in addition to our native language German. If 
 you want to try our Lucene-based search engine, please start here:
 
 http://www.denic.de/en/special/index.jsp
 
 Use the input field on the page to search our website. Don't use the 
 input field at the top right, that is only for searching domains in 
 our domain database, it has nothing to do with Lucene.
 
 The indexes for German and English are separate, so you should find
 only English pages from that page.
 
 A somewhat interesting feature is the summarizer: on the results page
 you'll get a short summary of the page. These are not hand-written
 blurbs, rather they are generated automatically from the HTML pages at
 indexing time. I'd be especially interested in improvement suggestions
 in this area.
 
 Naturally, the automatically generated texts don't have the same
 quality as hand-written ones. But they're better than nothing and in
 my eyes more useful than Google-style excerpts. How many times has it
 happened to you that the Google excerpt doesn't really tell you
 anything, because it's totally out of context? Summaries tell you what
 the whole page is about, regardless of the context within which your
 search terms may appear. After reading the summary you should
 (hopefully) be able to decide whether the page contains the info
 you're looking for. Comments welcome!
 
 We're using the snowball stemmers/analyzers for German and English, 
 custom stopword lists and the HTML parser from the Sourceforge 
 htmlparser project. Apart from that it's vanilla Lucene.
 
 cheers,
 
 Ulrich
 
 
 



RE: SearchBlox J2EE Search Component Version 1.1 released

2003-12-02 Thread Tun Lin
Hi,

Does it support xml?  

-Original Message-
From: Tate Avery [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2003 11:45 PM
To: Lucene Users List
Subject: RE: SearchBlox J2EE Search Component Version 1.1 released


If you buy it, apparently:
http://www.searchblox.com/buy.html







RE: New Lucene-powered Website (TO Tun Lin)

2003-12-02 Thread Tun Lin
Hi,

It's ok. Take your time. :-) 

-Original Message-
From: lhelper [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, December 03, 2003 9:29 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: New Lucene-powered Website (TO Tun Lin)

 Anyone has the install instructions for windows to run luceneweb? I 
 cannot even see the first page when I start tomcat though I have the 
 weblucene in the webapps directory.
 
 Can anyone help? Please.
 
It's a bug in the weblucene tarball. We'll fix it ASAP, and a small sample
index will be added to the tarball.
Please be patient!

Good Luck!






RE: SearchBlox J2EE Search Component Version 1.1 released

2003-12-02 Thread Tun Lin
Hi,

Just some feedback.

SearchBlox can only search HTML files. Will SearchBlox support PDF, XML and
Word documents in the future? It would be perfect if it could support all the
document types mentioned above.

-Original Message-
From: Robert Selvaraj [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, December 02, 2003 10:42 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: SearchBlox J2EE Search Component Version 1.1 released

SearchBlox is a J2EE search component that enables you to add search
functionality to your applications, intranets or portals in a matter of minutes.
SearchBlox uses Lucene Search API and features integrated HTTP and File System
crawlers, support for different document formats, support for indexing and
searching content in 15 languages and customizable search results, all
controlled from a browser-based Admin Console.


Main features in this update:
=============================
- Asian language support. SearchBlox now supports Japanese, Chinese Simplified,
Chinese Traditional and Korean language content.
- Performance enhancements to search
- Improved Hit Highlighting

SearchBlox is available as a Web Archive (WAR) and is deployable on any Servlet
2.3/JSP 1.2 compliant server. SearchBlox Getting-Started Guides are available
for the following servers:

JBoss - http://www.searchblox.com/gettingstarted_jboss.html
Jetty - http://www.searchblox.com/gettingstarted_jetty.html
JRun - http://www.searchblox.com/gettingstarted_jrun.html
Pramati - http://www.searchblox.com/gettingstarted_pramati.html
Resin - http://www.searchblox.com/gettingstarted_resin.html
Tomcat - http://www.searchblox.com/gettingstarted_tomcat.html
Weblogic - http://www.searchblox.com/gettingstarted_weblogic.html
Websphere - http://www.searchblox.com/gettingstarted_websphere.html


The SearchBlox FREE Edition is available free of charge and can index up to 1000
HTML documents.

The software can be downloaded from http://www.searchblox.com






Translation.

2003-12-02 Thread Tun Lin
Hi,

Can anyone translate this text for me? I cannot understand the instructions.
Please help!

Thanks.

===

LUCY 1.1 | readme.txt | Last updated: 18/03/2003


STRUCTURE


Lucy 1.1
  - Lucene 1.2
  - HTMLParser 1.2
  - PdfBox 0.5.6
  - wvWare 0.7.2-3
  - xlhtml 0.4.9
  - antiword 0.33
  - Xpdf 2.01
  - Snowball 0.1
  - NGramJ 01.12.11
  - it.corila.lucy
      - IndexAll.java
      - SearchIndex.java
      - HTMLDocument.java
      - PDFDocument.java
      - ExternalParser.java
      - ItalianStemFilter.java
      - EnglishStemFilter.java
      - ApostropheFilter.java
      - IndexAnalyzer.java
      - SearchAnalyzer.java
      - LanguageCategorizer
      - NgramjCategorizer.java





DESCRIPTION

Lucy can index all files with the extensions txt, html, pdf, doc, ppt and xls
contained in a base folder and its subfolders. It allows searches from the DOS
command line or through a web interface. It handles texts in Italian and
English with language-specific lexical processing.





SUPPORTED OPERATING SYSTEMS

Windows 98 / Windows 2000 / Windows XP





SYSTEM REQUIREMENTS

None, apart from the permissions needed to write files to a folder on the
system.
To use the search module with the web interface, Apache Tomcat version 3 or 4
is required.




INSTALLATION

Run the automatic installer Lucy1.1.exe, or unpack the file Lucy1.1.zip into a
folder (NB: the path must not contain spaces).
By default the application uses its own Java virtual machine. Another one
already installed on the system can be used instead by changing the value of
the MYJAVAPATH variable in the file jvm.bat.
In that case the jre folder can be deleted, reducing disk usage by about
40 MBytes.




CONFIGURATION

Edit the values of the variables in the file properties.txt, in the
application's base folder:


lucy.path: folder where the application is installed

log.files.dir: folder where the log files will be created

del.temp.files: delete temporary files at the end of indexing (yes/no)

doc.parser: parser to use for .doc files (antiword/wvware)

pdf.parser: parser to use for .pdf files (xpdf/pdfbox)

index.dir: folder where the indexes will be stored

index.name: name of the index to be created

indexing.folder: folder to be indexed


IMPORTANT: all paths must be written using two backslashes (\\) as the
directory separator instead of a single one




HOW TO USE

The three batch files in the application's base folder can be run directly
from Windows by double-clicking.

indicizza.bat  creates an index

aggiorna.bat   modifies an index

cerca.bat      searches an index

All required parameters (name and location of the index, path of the folder to
index) must be specified beforehand in the file properties.txt

Alternatively, the procedures can be run from the DOS command line, again
after first editing the file properties.txt
In this case, the syntax:

cerca index-path

also lets you search other, previously created indexes without editing the
file properties.txt




NOTES ON USING THE PARSERS

The default parser settings are the ones recommended for the first indexing
run. Later, they can be changed to the alternative values and the index
updated. That way, documents that were not indexed because of parsing errors
are also processed by the two alternative parsers.

If parsing leads to system errors that force the indexing process to be
interrupted, the user can resume indexing from where it stopped by using the
update procedure, taking care to remove the write.lock file from the folder
containing the index files before launching it.
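
The readme's last step above (removing a leftover write.lock from the index folder before resuming) could be scripted roughly as follows. This is a minimal sketch, assuming the lock file really sits in the index folder as the readme says and that no indexer is still running; the class and method names are made up for the example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; not part of Lucy or Lucene.
final class StaleLockSketch {

    /**
     * Deletes indexDir/write.lock if it exists.
     * Returns true if a lock file was removed, false if there was none.
     * Only safe when no other process is writing to the index.
     */
    static boolean removeStaleWriteLock(Path indexDir) {
        try {
            return Files.deleteIfExists(indexDir.resolve("write.lock"));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The index folder is passed on the command line.
        Path indexDir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(removeStaleWriteLock(indexDir)
            ? "Removed stale write.lock"
            : "No write.lock found");
    }
}
```

Deleting the lock while an indexer is still active could corrupt the index, so this is only appropriate after a crash or forced interruption like the one the readme describes.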





USE AS A WEB APPLICATION FOR TEXT SEARCHES

If the Apache Tomcat servlet and JSP engine is installed on the system, the
indexes can be searched through a web interface. If it is

RE: WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.

2003-12-01 Thread Tun Lin
Hi Che Dong,

In the install.txt in the package, the part on preparing the environment seems
to be written for a UNIX setup. Could you include the setup for Windows as
well? I still cannot get my system working.
Please help.

Thanks. 

-Original Message-
From: Che Dong [mailto:[EMAIL PROTECTED] 
Sent: Monday, December 01, 2003 4:21 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: WebLucene 0.3 release:support CJK, use sax based indexing, docID
based result sorting and xml format output with highlighting support.

build.properties.default

# -
# WebLucene  BUILD  PROPERTIES
# -
jsdk_jar=/usr/local/resin/lib/jsdk23.jar

# Home directory of JavaCC
javacc.home = /usr/java/javacc/bin

# modify following on Windows
# jsdk_jar=c:\\resin\\lib\\jsdk23.jar
# javacc.home = c:\\java\\javacc\\bin


javacc.zip.dir = ${javacc.home}/lib
javacc.zip = ${javacc.zip.dir}/JavaCC.zip
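
The doubled backslashes in the Windows lines above matter if the file is parsed with `java.util.Properties` escaping rules, which is an assumption here about how the build reads it. The sketch below shows what those rules do to a doubled and to a single backslash; the class name and the key `p` are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical illustration of java.util.Properties escaping.
final class PropertiesPathSketch {

    /** Parses property-file text and returns the value stored under key. */
    static String valueOf(String fileText, String key) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(fileText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return props.getProperty(key);
    }

    public static void main(String[] args) {
        // Java string escaping doubles every backslash once more, so this
        // file text is literally: p=c:\\java\\javacc\\bin
        String doubled = "p=c:\\\\java\\\\javacc\\\\bin\n";
        // File text with single backslashes: p=c:\java\javacc\bin
        String single = "p=c:\\java\\javacc\\bin\n";

        // Doubled backslashes survive as one backslash each.
        System.out.println(valueOf(doubled, "p"));
        // A single backslash before an ordinary letter is silently dropped.
        System.out.println(valueOf(single, "p"));
    }
}
```

With single backslashes the separators vanish from the loaded value, which would leave the build pointing at a path that does not exist.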

Che, Dong



RE: WebLucene 0.3 release:support CJK, use sax based indexing, docID based result sorting and xml format output with highlighting support.

2003-11-30 Thread Tun Lin
Hi,

Do you have the install.txt for windows XP setup of the WebLucene? It seems that
the install.txt is only for UNIX setup.

Thanks.  

-Original Message-
From: Che Dong [mailto:[EMAIL PROTECTED] 
Sent: Sunday, November 30, 2003 9:57 PM
To: Lucene Developers List; Lucene Users List
Subject: WebLucene 0.3 release:support CJK, use sax based indexing, docID based
result sorting and xml format output with highlighting support.

http://sourceforge.net/projects/weblucene/

WebLucene:
An XML interface for the Lucene search engine, providing SAX-based indexing,
result sorting by indexing sequence, and XML output with highlight support. The
CJKTokenizer supports Chinese, Japanese and Korean together with Western
languages.

The key features:
1 The bi-gram based CJK support: org/apache/lucene/analysis/cjk/CJKTokenizer

2 docID based result sorting: org/apache/lucene/search/IndexOrderSearcher

3 xml output: com/chedong/weblucene/search/DOMSearcher

4 sax based indexing: com/chedong/weblucene/index/SAXIndexer

5 token based highlighter: 
reverse StopTokenzier:
org/apache/lucene/anlysis/HighlightAnalyzer.java
  HighlightFilter.java
with abstract:
com/chedong/weblucene/search/WebluceneHighlighter

6 A simplified query parser:
google like syntax with term limit
org/apache/lucene/queryParser/SimpleQueryParser
modified from early version of Lucene :)

Regards

Che, Dong






FileDocument.java

2003-11-28 Thread Tun Lin
Hi Lucene experts,
Can you help on this?
I added the following code to FileDocument to store a summary, but after
searching, the summary is displayed as garbage like this:
ÐÏࡱáþÿ
UWþÿÿÿTÿ




ÿ
FileInputStream is = new FileInputStream(f);
try
{
    Reader reader = new BufferedReader(new InputStreamReader(is));
    char[] buf = new char[512];
    reader.read(buf);

    String a = new String(buf, 0, 510);
    doc.add(Field.Text("contents", reader));
    doc.add(Field.UnIndexed("summary", a));
    // return the document
} catch (IOException e)
{
    e.printStackTrace();
}
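
The visible characters at the start of the reported summary (Ð, Ï, à, ¡, ±, á) correspond to the bytes D0 CF 11 E0 A1 B1 1A E1 when decoded as Latin-1, which is the signature of an OLE2 compound file such as a legacy Word .doc. If that is what was indexed (an assumption; the post does not say which file it was), the summary is raw binary rather than text. A minimal sketch of checking for that signature before building a text summary, with hypothetical class and method names:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper; not part of the Lucene demo.
final class BinaryFileCheckSketch {

    /** First 8 bytes of an OLE2 compound file (legacy .doc/.xls/.ppt). */
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    /** True if the given leading bytes begin with the OLE2 signature. */
    static boolean startsWithOle2Magic(byte[] head) {
        return head.length >= OLE2_MAGIC.length
            && Arrays.equals(Arrays.copyOf(head, OLE2_MAGIC.length), OLE2_MAGIC);
    }

    /** Reads up to 8 bytes from the file and checks them against the signature. */
    static boolean looksLikeOle2(Path file) throws IOException {
        byte[] head = new byte[OLE2_MAGIC.length];
        int total = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (total < head.length) {
                int n = in.read(head, total, head.length - total);
                if (n < 0) {
                    break;              // file is shorter than the signature
                }
                total += n;
            }
        }
        return startsWithOle2Magic(Arrays.copyOf(head, total));
    }
}
```

A file that matches would need a format-specific text extractor before any summary is taken; reading it through a plain Reader yields the kind of output shown above.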





RE: log4j.properties

2003-11-26 Thread Tun Lin
I have integrated Lucene and PDFBox and tried the following command to index
files:

java -Dlog4j.configuration=log4j.xml org.pdfbox.searchengine.lucene.IndexFiles -create -index c:\\index ..

But I get the following message:
log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone help?

-Original Message-
From: Erik Hatcher [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 26, 2003 5:19 PM
To: Lucene Users List
Subject: Re: log4j.properties

What does this have to do with Lucene?


On Wednesday, November 26, 2003, at 01:04  AM, Tun Lin wrote:

 I have created the following log4j.properties and put it in my
 classpath but I still get that error. Can anyone help?

 log4j.rootCategory=stdout

 log4j.appender.stdout=org.apache.log4j.ConsoleAppender
 log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
 log4j.appender.stdout.layout.ConversionPattern=%d %c - %m%n
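
One thing worth checking in the quoted file: log4j 1.x reads the value of `log4j.rootCategory` as an optional level followed by a comma-separated list of appender names, so a value of just `stdout` names a level and attaches no appender. The sketch below mimics that split to show the difference; it is a simplified illustration, not log4j's own parser, and the class name is made up. Separately, the command in the reply passes `-Dlog4j.configuration=log4j.xml` while the file shown is log4j.properties, which may also be why it is not picked up.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hypothetical model of how a rootCategory value is split.
final class RootCategorySketch {

    /** The first token, which is read as the level (may be empty). */
    static String level(String value) {
        return value.split(",")[0].trim();
    }

    /** Appender names: every non-empty comma-separated token after the first. */
    static List<String> appenders(String value) {
        String[] tokens = value.split(",");
        List<String> names = new ArrayList<>();
        for (int i = 1; i < tokens.length; i++) {
            String name = tokens[i].trim();
            if (!name.isEmpty()) {
                names.add(name);
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // As posted: "stdout" is taken as the level, and no appender is attached.
        System.out.println(level("stdout") + " -> " + appenders("stdout"));
        // With a level in front, "stdout" becomes the appender name.
        System.out.println(level("INFO, stdout") + " -> " + appenders("INFO, stdout"));
    }
}
```

Under that reading, a line such as `log4j.rootCategory=INFO, stdout` would connect the root logger to the ConsoleAppender defined on the following lines.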






RE: Lucene refresh index function (incremental indexing).

2003-11-25 Thread Tun Lin
Since I integrated PDFBox, I can no longer update, delete or rename files. If I
do any of these, I get the message: Lock obtain timed out.

Can anyone help?

-Original Message-
From: Pleasant, Tracy [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 11:42 PM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: RE: Lucene refresh index function (incremental indexing).

I was able to get PDFBox to work with my JSP webpages. 

I think you will have to write your own code to handle the PDF files (while
still calling the Lucene functions):

 doc = LucenePDFDocument.getDocument(file);


-Original Message-
From: Tun Lin [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 11:07 PM
To: 'Lucene Users List'
Subject: RE: Lucene refresh index function (incremental indexing).


Does it support indexing the contents of PDF files? I have found a project
called PDFBox that can be integrated with Lucene to search inside PDF files.
Currently, Lucene can only search for the PDF filename. I tried PDFBox and got
the following message when I typed the command:

java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED]
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
 These are the steps I took:
 
 1) I compile all the files in a particular directory using the command:
 java org.apache.lucene.demo.IndexHTML -create -index c:\\index ..
 , putting all the indexed files in c:\\index.
 2) Everytime, I added an additional file in that directory. I need to
 reindex/recompile that directory to generate the indexes again. As the
 directory gets larger, the indexing takes a longer time.
 
 My question is how do I generate the indexes automatically everytime a
 new document is added in that directory without me recompiling everytime
 manually?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.

Doug
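
The re-scan described above (working out which files are new, changed or gone) can be pictured with the following sketch. It is not the IndexHTML source: it assumes the indexer remembers each file's last-modified time, and the class, field and method names are invented for the example.

```java
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical bookkeeping for an incremental update; not Lucene code.
final class IncrementalScanSketch {

    /** Result of comparing what is on disk with what was indexed last time. */
    static final class Changes {
        final SortedSet<String> added = new TreeSet<>();
        final SortedSet<String> modified = new TreeSet<>();
        final SortedSet<String> removed = new TreeSet<>();
    }

    /**
     * Both maps go from file path to last-modified time in milliseconds:
     * "indexed" is the state recorded at the previous run, "onDisk" is
     * the state found by scanning the directory now.
     */
    static Changes diff(Map<String, Long> indexed, Map<String, Long> onDisk) {
        Changes changes = new Changes();
        for (Map.Entry<String, Long> entry : onDisk.entrySet()) {
            Long previous = indexed.get(entry.getKey());
            if (previous == null) {
                changes.added.add(entry.getKey());       // never indexed before
            } else if (!previous.equals(entry.getValue())) {
                changes.modified.add(entry.getKey());    // timestamp changed
            }
        }
        for (String path : indexed.keySet()) {
            if (!onDisk.containsKey(path)) {
                changes.removed.add(path);               // file no longer exists
            }
        }
        return changes;
    }
}
```

An indexer built on this would delete the index entries for the removed and modified paths and then add documents for the added and modified ones, so unchanged files are never re-read.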





RE: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Tun Lin
Can you elaborate on "you don't compile directory using Lucene"?

These are the steps I took:

1) I compile all the files in a particular directory using the command: 
java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
, putting all the indexed files in c:\\index.
2) Everytime, I added an additional file in that directory. I need to
reindex/recompile that directory to generate the indexes again. As the directory
gets larger, the indexing takes a longer time.

My question is how do I generate the indexes automatically everytime a new
document is added in that directory without me recompiling everytime manually? 

How does Lucene detect new documents to be added to the indexes?

I saw the codes but the indexes are only generated for that directory only after
I use the command mentioned above.

Is there a code or built in function that allows Lucene to detect and build the
indexes on its own?

-Original Message-
From: Victor Hadianto [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2003 1:07 PM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Ah .. ic,

But you don't need to do that, even if you can. Lucene does incremental
indexing. So you would write a new program that adds your document using
IndexWriter, rather than blatting the index and building it again.

It seems like you are just trying out Lucene. I suggest having a look at the
source code of IndexHTML; you will see that there is no magic there. It just
traverses the directory and indexes the HTML files one by one using IndexWriter.

BTW you don't compile directory using Lucene .. :)

/victor

- Original Message -
From: Tun Lin [EMAIL PROTECTED]
To: 'Lucene Users List' [EMAIL PROTECTED]
Sent: Monday, November 24, 2003 3:45 PM
Subject: RE: Lucene refresh index function (incremental indexing).


 Hi,

 Thanks for your reply.

 What if I add a new document into the directory that I have compiled using the
 following command: java org.apache.lucene.demo.IndexHTML -create -index
 {index-dir} ..

 Will it automatically reindex like I did manually to reflect the new document
 being added in that particular directory?

 Please advise.

 -Original Message-
 From: Victor Hadianto [mailto:[EMAIL PROTECTED]
 Sent: Monday, November 24, 2003 12:36 PM
 To: Lucene Users List
 Subject: Re: Lucene refresh index function (incremental indexing).

  I delete the old ones and add them again manually. But how do I reindex the
  documents automatically without doing it manually?

 You don't need to reindex the documents again. Lucene does incremental indexing.
 Just add your document to the index and that's it. You need to create a new
 IndexSearcher to reflect the new changes in your search results.

 /victor


 
  -Original Message-
  From: Dror Matalon [mailto:[EMAIL PROTECTED]
  Sent: Sunday, November 23, 2003 4:44 AM
  To: Lucene Users List
  Subject: Re: Lucene refresh index function (incremental indexing).
 
  Hi,
 
  It's not clear what you mean when you say refresh indexes or re-compiling.
  If you're adding new documents just use the add() method. If you are replacing
  documents, you need to first delete the old ones and then add them again. Look
  at the mailing list archive for this, since it's been discussed several times.
 
 
  On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
   Hi,
  
   I am new here.
  
   May I know how to refresh indexes in Lucene immediately after new
   documents have been added without re-compiling again to reindex the
   documents in that particular directory?
  
   Thanks.
 
  --
  Dror Matalon
  Zapatec Inc
  1700 MLK Way
  Berkeley, CA 94709
  http://www.fastbuzz.com
  http://www.zapatec.com
 



RE: Lucene version 1.3.

2003-11-24 Thread Tun Lin
I am now using 1.3RC2.

-Original Message-
From: Scott Smith [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 4:04 AM
To: 'Lucene Users List'; '[EMAIL PROTECTED]'
Subject: RE: Lucene version 1.3.

If you had to be in production in January, would you be using 1.3RC2 or 1.2?

-Original Message-
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]
Sent: Monday, November 24, 2003 4:03 AM
To: Lucene Users List; [EMAIL PROTECTED]
Subject: Re: Lucene version 1.3.


Sorry, no firm date.  However, 1.3 RC2 is pretty solid, so I suggest you
just use that until 1.3 final is out.

Otis

--- Tun Lin [EMAIL PROTECTED] wrote:
 Hi,
 
 Does anyone know when the final version of Lucene 1.3 will be
 released?
 
 Please advise.
 
 Thanks.
 





RE: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Tun Lin
 
Will the final version 1.3 include an application that does the incremental
updates automatically?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
 These are the steps I took:
 
 1) I compile all the files in a particular directory using the command: 
 java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
 , putting all the indexed files in c:\\index.
 2) Every time I add a file to that directory, I need to
 reindex/recompile that directory to generate the indexes again. As the
 directory gets larger, the indexing takes longer.
 
 My question is: how do I generate the indexes automatically every time a
 new document is added to that directory, without recompiling manually each time?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.

Doug
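
The re-scan described above can be pictured with a small sketch. This is not the code of the IndexHTML demo; it is only an illustration, in plain Java with no Lucene calls, of how a directory scan can be compared with the state recorded on the previous run to decide which files are new, changed, or removed. The class name DirectoryDiff and its methods are hypothetical, and the sketch assumes a file counts as changed when its last-modified time differs.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical helper for illustration; not part of Lucene or its demo code. */
class DirectoryDiff {
    final Set<String> added = new TreeSet<String>();
    final Set<String> changed = new TreeSet<String>();
    final Set<String> removed = new TreeSet<String>();

    /** Compares two snapshots that map file path -> last-modified time. */
    static DirectoryDiff diff(Map<String, Long> previous, Map<String, Long> current) {
        DirectoryDiff d = new DirectoryDiff();
        for (Map.Entry<String, Long> e : current.entrySet()) {
            Long old = previous.get(e.getKey());
            if (old == null) {
                d.added.add(e.getKey());          // not seen on the last run
            } else if (!old.equals(e.getValue())) {
                d.changed.add(e.getKey());        // timestamp differs: delete, then re-add
            }
        }
        for (String path : previous.keySet()) {
            if (!current.containsKey(path)) {
                d.removed.add(path);              // file no longer exists on disk
            }
        }
        return d;
    }

    /** Recursively records path and modification time of every non-directory entry. */
    static void snapshot(File dir, Map<String, Long> out) {
        File[] children = dir.listFiles();
        if (children == null) {
            return; // not a directory, or not readable
        }
        for (File f : children) {
            if (f.isDirectory()) {
                snapshot(f, out);
            } else {
                out.put(f.getPath(), f.lastModified());
            }
        }
    }

    public static void main(String[] args) {
        // Assumption: a real tool would load "previous" from the last run; it starts empty here.
        Map<String, Long> previous = new HashMap<String, Long>();
        Map<String, Long> current = new HashMap<String, Long>();
        snapshot(new File(args.length > 0 ? args[0] : "."), current);
        DirectoryDiff d = diff(previous, current);
        System.out.println("add: " + d.added.size()
                + ", update: " + d.changed.size()
                + ", delete: " + d.removed.size());
    }
}
```

Only the files in the three sets would need to touch the index, which is why an update run can be much cheaper than rebuilding with -create as the directory grows.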





RE: Lucene refresh index function (incremental indexing).

2003-11-24 Thread Tun Lin
Does it support indexing the contents of pdf files? I have found one project
called PDFBox that can be integrated with Lucene to search inside of the pdf
files. Currently, Lucene can only search for the pdf filename. I tried with
PDFBox and I got the following message when I typed the command: java
org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 

log4j:WARN No appenders could be found for logger (org.pdfbox.pdfparser.PDFParser).
log4j:WARN Please initialize the log4j system properly.

Can anyone advise?

-Original Message-
From: Doug Cutting [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 25, 2003 5:01 AM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

Tun Lin wrote:
 These are the steps I took:
 
 1) I compile all the files in a particular directory using the command: 
 java org.apache.lucene.demo.IndexHTML -create -index c:\\index .. 
 , putting all the indexed files in c:\\index.
 2) Every time I add a file to that directory, I need to
 reindex/recompile that directory to generate the indexes again. As the
 directory gets larger, the indexing takes longer.
 
 My question is: how do I generate the indexes automatically every time a
 new document is added to that directory, without recompiling manually each time?

To update, try removing the '-create' from the command line.  The demo code
supports incremental updates.  It will re-scan the directory and figure out
which files have changed, what new files have appeared and which previously
existing files have been removed.

Doug





Lucene version 1.3.

2003-11-23 Thread Tun Lin
Hi,

Does anyone know when the final version of Lucene 1.3 will be released?

Please advise.

Thanks.


RE: Lucene refresh index function (incremental indexing).

2003-11-23 Thread Tun Lin
Hi,

Thanks for your reply.

What if I add a new document into the directory that I have compiled using the
following command: java org.apache.lucene.demo.IndexHTML -create -index
{index-dir} ..

Will it automatically reindex like I did manually to reflect the new document
being added in that particular directory?

Please advise.

-Original Message-
From: Victor Hadianto [mailto:[EMAIL PROTECTED] 
Sent: Monday, November 24, 2003 12:36 PM
To: Lucene Users List
Subject: Re: Lucene refresh index function (incremental indexing).

 I delete the old ones and add them again manually. But how do I reindex the
 documents automatically without doing it manually?

You don't need to reindex the documents again. Lucene does incremental indexing.
Just add your document to the index and that's it. You need to create a new
IndexSearcher to reflect the new changes in your search results.

/victor



 -Original Message-
 From: Dror Matalon [mailto:[EMAIL PROTECTED]
 Sent: Sunday, November 23, 2003 4:44 AM
 To: Lucene Users List
 Subject: Re: Lucene refresh index function (incremental indexing).

 Hi,

 It's not clear what you mean when you say "refresh indexes" or "re-compiling".
 If you're adding new documents just use the add() method. If you are replacing
 documents, you need to first delete the old ones and then add them again. Look
 at the mailing list archive for this, since it's been discussed several times.


 On Sun, Nov 23, 2003 at 12:22:40AM +0800, Tun Lin wrote:
  Hi,
 
  I am new here.
 
  May I know how to refresh indexes in Lucene immediately after new 
  documents have been added without re-compiling again to reindex the 
  documents in that particular directory?
 
  Thanks.

 --
 Dror Matalon
 Zapatec Inc
 1700 MLK Way
 Berkeley, CA 94709
 http://www.fastbuzz.com
 http://www.zapatec.com
