Zilverline Search Engine version 1.0-final released

2004-11-27 Thread Zilverline info
All,
I've just released Zilverline version 1.0.
New features include incremental indexing and scheduling of the indexing
process, as well as a few minor updates.
The source will be made available very soon as well.
Zilverline is protected by a Collaborative Source License. You can read
more on this type of licensing at http://www.zilverline.org.
Zilverline is a search engine based on Lucene that's ready to
roll, and can simply be dropped into a servlet engine. It runs out of the
box on both Windows and Linux, supports PDF, WORD, HTM, TXT, RTF and CHM,
and can index zip, rar, and many other formats.
Zilverline supports plugins. You can create your own extractors
for various file formats. I've provided Extractors for RTF, Text, PDF,
Word, and HTML.
Zilverline supports collections. A collection is a set of files and
directories in a directory. A collection can be indexed, and searched.
The results of the search can be retrieved from local disk or remotely,
if you run a webserver on your machine. Files inside zip, rar and chm
files are extracted, indexed and can be cached. The cache can be mapped
to sit behind your webserver as well.
It's also possible to specify your own handlers for archives. Say you
have a RAR archive, and you have a program on your system that can
extract the content from it, then you can specify that Zilverline should
use this program.
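
As a rough illustration of the archive-handler idea above, the following
Java sketch maps a file extension to an external program and runs it on
the archive. The class name ArchiveHandlers, its methods, and the
"unrar x" command in the usage note are assumptions made for this
example; they are not taken from Zilverline's configuration or source.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical registry of external extraction programs, keyed by file extension. */
final class ArchiveHandlers {

    // Lower-case extension -> program name plus its fixed arguments.
    private final Map<String, List<String>> handlers = new HashMap<>();

    void register(String extension, List<String> command) {
        handlers.put(extension.toLowerCase(Locale.ROOT), new ArrayList<>(command));
    }

    /** Returns the full command line for the archive, or null if no handler is registered. */
    List<String> commandFor(File archive) {
        String name = archive.getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        List<String> base = handlers.get(name.substring(dot + 1).toLowerCase(Locale.ROOT));
        if (base == null) {
            return null;
        }
        List<String> full = new ArrayList<>(base);
        full.add(archive.getAbsolutePath());
        return full;
    }

    /** Runs the handler with targetDir as working directory; returns its exit code, or -1 if unhandled. */
    int extract(File archive, File targetDir) throws IOException, InterruptedException {
        List<String> command = commandFor(archive);
        if (command == null) {
            return -1;
        }
        Process process = new ProcessBuilder(command)
                .directory(targetDir)
                .inheritIO()
                .start();
        return process.waitFor();
    }
}
```

For example, after handlers.register("rar", Arrays.asList("unrar", "x")),
calling handlers.extract(new File("docs.rar"), cacheDir) would run the
registered program on docs.rar inside cacheDir, provided such a program
is installed on the system.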
Please take a look at http://www.zilverline.org, and have a swing at it.
cheers,
  Michael Franken


-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]


Re: Search PDF ???

2004-10-25 Thread Zilverline info
Hi Eric,
Try Zilverline: http://www.zilverline.org
Michael
Eric Chow wrote:
Hello,
1. Is it possible to use Lucene to search PDF contents?
2. Can it search PDF files with Chinese content?
Eric


Zilverline release candidate 1.0-rc7 available

2004-10-18 Thread Zilverline info
All,
I've just released a new release candidate (*1.0-rc7*). New features
include highlighting and 'on-the-fly' extraction of archives.

Zilverline is a search engine based on Lucene that's ready to
roll, and can simply be dropped into a servlet engine. It runs out of the
box on both Windows and Linux, supports PDF, WORD, HTM, TXT, RTF and CHM,
and can index zip, rar, and many other formats.
Zilverline supports plugins. You can create your own extractors
for various file formats. I've provided Extractors for RTF, Text, PDF,
Word, and HTML.
Zilverline supports collections. A collection is a set of files and
directories in a directory. A collection can be indexed, and searched.
The results of the search can be retrieved from local disk or remotely,
if you run a webserver on your machine. Files inside zip, rar and chm
files are extracted, indexed and can be cached. The cache can be mapped
to sit behind your webserver as well.
It's also possible to specify your own handlers for archives. Say you
have a RAR archive, and you have a program on your system that can
extract the content from it, then you can specify that Zilverline should
use this program.
Please take a look at http://www.zilverline.org, and have a swing at it.
cheers,
  Michael Franken




Zilverline release candidate 1.0-rc6 available

2004-10-02 Thread Zilverline info
All,
I've just released a new release candidate (*1.0-rc6*). New features
include a command line indexer and support for Chinese and Cyrillic.

Zilverline is a free search engine based on Lucene that's ready to
roll, and can simply be dropped into a servlet engine. It runs out of the
box on both Windows and Linux, supports PDF, WORD, HTM, TXT, RTF and CHM,
and can index zip, rar, and many other formats.
Zilverline supports plugins. You can create your own extractors
for various file formats. I've provided Extractors for RTF, Text, PDF,
Word, and HTML.
Zilverline supports collections. A collection is a set of files and
directories in a directory. A collection can be indexed, and searched.
The results of the search can be retrieved from local disk or remotely,
if you run a webserver on your machine. Files inside zip, rar and chm
files are extracted, indexed and can be cached. The cache can be mapped
to sit behind your webserver as well.
It's also possible to specify your own handlers for archives. Say you
have a RAR archive, and you have a program on your system that can
extract the content from it, then you can specify that Zilverline should
use this program.
Please take a look at http://www.zilverline.org, and have a swing at it.
cheers,
  Michael Franken



lucene 1.4 in maven repository

2004-08-25 Thread Zilverline info
Hi,
Can anyone tell me why there is no Lucene 1.4 jar in the Maven
repository at http://www.ibiblio.org/maven/lucene/jars/? Who makes them
available? It would be very convenient to be able to get the latest
version from there (or anywhere else).

regards,
 Michael Franken


Re: searchhelp

2004-08-19 Thread Zilverline info
The PDF and WORD stuff has been done too: have a look at 
http://www.zilverline.org.

Michael Franken
Chandan Tamrakar wrote:
For PDF you need to extract the text from the PDF files using the PDFBox
library, and for Word documents you can use the Apache POI APIs. There are
messages posted on the Lucene list related to your queries. About the
database, I guess someone must have done it. :)
- Original Message - 
From: Santosh [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Thursday, August 19, 2004 3:58 PM
Subject: searchhelp

Hi,
I am using the Lucene search engine for my application.
I am able to search through text files and HTML files as described by Lucene.
Can you please clarify my doubts?
1. Can Lucene search through PDFs and Word documents? If yes, then how?
2. Can Lucene search through a database? If yes, then how?
Thank you,
Santosh


Re: Weighted queries

2004-08-06 Thread Zilverline info
Hi Eric,
I have implemented this in Zilverline. What I do is the following: 
subclass QueryParser and override getFieldQuery:

   protected Query getFieldQuery(String field, Analyzer analyzer,
           String queryText) throws ParseException {

       // for the field that contains 'contents', add boost factors for
       // the other terms specified in BoostFactor
       if (defaultField.equals(field)) {
           TokenStream source = analyzer.tokenStream(field,
                   new StringReader(queryText));
           Vector v = new Vector();
           org.apache.lucene.analysis.Token t;
           while (true) {
               try {
                   t = source.next();
               } catch (IOException e) {
                   t = null;
               }
               if (t == null)
                   break;
               v.addElement(t.termText());
               log.debug(field + " , " + t.termText());
           }
           try {
               source.close();
           } catch (IOException e) { // ignore
           }

           if (v.size() == 0) {
               return null;
           }
           else {
               // create a new composed query
               BooleanQuery bq = new BooleanQuery();
               // get the static BoostFactors through non static getter
               BoostFactor bf = new BoostFactor();
               // For all boostfactors create a new PhraseQuery
               Iterator iter = bf.getFactors().entrySet().iterator();
               while (iter.hasNext()) {
                   Map.Entry element = (Map.Entry) iter.next();
                   String thisField = ((String) element.getKey()).toLowerCase();
                   Float boost = (Float) element.getValue();
                   PhraseQuery q = new PhraseQuery();
                   // and add all the terms of the query
                   for (int i = 0; i < v.size(); i++) {
                       q.add(new Term(thisField, (String) v.elementAt(i)));
                   }
                   // boost the query
                   q.setBoost(boost.floatValue());
                   // and add it to the composed query
                   bq.add(q, false, false);
               }
               log.debug("Query: " + bq);
               return bq;
           }
       } else {
           return super.getFieldQuery(field, analyzer, queryText);
       }
   }

Read the boost factors from an external source. I'm using an object with a
HashMap. See Boostfactors at www.zilverline.org.

Cheers,
  Michael Franken
Eric Jain wrote:
Is it possible to expand a query such as
  foo bar
into
  (title:foo^4 OR abstract:foo^2 OR content:foo) AND
  (title:bar^4 OR abstract:bar^2 OR content:bar)
?
I can assign weights to individual fields when indexing, and could use 
the MultiFieldQueryParser - but it seems this parser can't be 
configured to use AND as default!
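
The expansion Eric asks about can also be produced as plain query text
before it is handed to a parser. The following is a minimal Java sketch
of that idea, not the code Zilverline uses; the class name
WeightedQueryExpander and its expand method are hypothetical, and the
sketch does no escaping of query syntax characters.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper that rewrites "foo bar" into boosted per-field groups joined by AND. */
final class WeightedQueryExpander {

    /** Each whitespace-separated term becomes one OR group over all fields. */
    static String expand(String queryText, Map<String, Float> boosts) {
        List<String> groups = new ArrayList<>();
        for (String term : queryText.trim().split("\\s+")) {
            if (term.isEmpty()) {
                continue; // empty input yields no groups
            }
            List<String> clauses = new ArrayList<>();
            for (Map.Entry<String, Float> entry : boosts.entrySet()) {
                String clause = entry.getKey() + ":" + term;
                float boost = entry.getValue();
                if (boost != 1.0f) {
                    clause += "^" + formatBoost(boost);
                }
                clauses.add(clause);
            }
            groups.add("(" + String.join(" OR ", clauses) + ")");
        }
        return String.join(" AND ", groups);
    }

    // Prints whole-number boosts without a trailing ".0".
    private static String formatBoost(float boost) {
        return boost == Math.rint(boost)
                ? String.valueOf((int) boost)
                : String.valueOf(boost);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the fields in insertion order.
        Map<String, Float> boosts = new LinkedHashMap<>();
        boosts.put("title", 4.0f);
        boosts.put("abstract", 2.0f);
        boosts.put("content", 1.0f);
        // prints: (title:foo^4 OR abstract:foo^2 OR content:foo) AND (title:bar^4 OR abstract:bar^2 OR content:bar)
        System.out.println(expand("foo bar", boosts));
    }
}
```

Building the text first keeps the AND between terms explicit, which is
the part Eric could not configure; the resulting string would then be
passed to an ordinary query parser.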



Zilverline release candidate 1.0-rc4 available

2004-07-26 Thread Zilverline info
All,
I've just released a new release candidate (*1.0-rc4*). New features
include a Spanish GUI, RTF support, searching on date range,
customizable boosting factors, and configurable analyzers per
collection. Zilverline now generates an MD5 hash per file,
and prevents duplicate files from being added more than once.
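
To make the MD5-based duplicate check concrete, here is a minimal Java
sketch using only the standard library. It is an assumption about how
such a check could work, not Zilverline's implementation; the class name
DuplicateFilter and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical filter that lets each distinct file content through only once. */
final class DuplicateFilter {

    // Hex MD5 digests of every file accepted so far.
    private final Set<String> seen = new HashSet<>();

    /** Streams the file through MD5 and returns the digest as 32 hex characters. */
    static String md5Hex(Path file) throws IOException {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                md.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    /** Returns true the first time a content hash is seen, false for duplicates. */
    boolean shouldIndex(Path file) throws IOException {
        return seen.add(md5Hex(file));
    }
}
```

An indexer could call shouldIndex for every file it visits and skip the
file when the method returns false. The hash depends only on content, so
two copies with different names count as duplicates.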

Zilverline supports plugins. You can create your own extractors
for various file formats. I've provided Extractors for RTF, Text, PDF, 
Word, and HTML.

Zilverline supports collections. A collection is a set of files and 
directories in a directory. A collection can be indexed, and searched. 
The results of the search can be retrieved from local disk or remotely, 
if you run a webserver on your machine. Files inside zip, rar and chm 
files are extracted, indexed and can be cached. The cache can be mapped 
to sit behind your webserver as well.

It's also possible to specify your own handlers for archives. Say you
have a RAR archive, and you have a program on your system that can
extract the content from it, then you can specify that Zilverline should
use this program.
Zilverline is a free search engine based on Lucene that's ready to
roll, and can simply be dropped into a servlet engine. It runs out of the
box on both Windows and Linux, supports PDF, WORD, HTM, TXT, and CHM,
and can index zip, rar, and many other formats.

Please take a look at http://www.zilverline.org, and have a swing at it.
cheers,
  Michael Franken



Re: PDFBox problem.

2004-07-23 Thread Zilverline info
Natarajan.T wrote:
FYI,
I am using PDFBox.jar to convert PDF to text.
The problem is that at runtime it prints a lot of object messages.
How can I avoid this?

import java.io.InputStream;
import java.io.BufferedWriter;
import java.io.IOException;
import org.pdfbox.util.PDFTextStripper;
import org.pdfbox.pdfparser.PDFParser;
import org.pdfbox.pdmodel.PDDocument;
import org.pdfbox.pdmodel.PDDocumentInformation;

/**
 * @author natarajant
 *
 * TODO To change the template for this generated type comment go to
 * Window - Preferences - Java - Code Generation - Code and Comments
 */
public class PDFConverter extends DocumentConverter {

    public PDFConverter() {
    }

    /**
     * This method will construct the Lucene document object from the
     * given information by extracting the text from PDF file.
     *
     * @param  reader and writer - InputStream and BufferedWriter
     * @return true or false i.e. extract the text or not
     */
    public boolean extractText(InputStream reader, BufferedWriter writer)
            throws IOException {
        PDFParser parser = null;
        PDDocument pdDoc = null;
        PDFTextStripper stripper = null;
        String pdftext = "";
        String pdftitle = "";
        try {
            parser = new PDFParser(reader);
            parser.parse();
            pdDoc = parser.getPDDocument();
            stripper = new PDFTextStripper();
            pdftext = stripper.getText(pdDoc);
            writer.write(pdftext + "");
            PDDocumentInformation info = pdDoc.getDocumentInformation();
            pdftitle = info.getTitle();
        } catch (Exception err) {
            System.out.println(err.getMessage());

change this to

            return false;
        }
        writer.close();
        return true;
    }

finally { // close all open resources }

}
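
The inline suggestion in this reply is to release resources in a finally
block. A minimal Java sketch of that pattern follows. It uses only the
standard library: the TextSource interface is a hypothetical stand-in
for the PDFBox parsing and stripping calls, and SafeConverter is an
illustrative name, not part of PDFBox or Zilverline.

```java
import java.io.BufferedWriter;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for the library call that turns a stream into text. */
interface TextSource {
    String readText(InputStream in) throws IOException;
}

final class SafeConverter {

    private final TextSource source;

    SafeConverter(TextSource source) {
        this.source = source;
    }

    /** Writes the extracted text; returns false on failure. Both streams are closed either way. */
    boolean extractText(InputStream in, BufferedWriter writer) {
        try {
            writer.write(source.readText(in));
            writer.flush(); // surface write errors before reporting success
            return true;
        } catch (IOException | RuntimeException e) {
            return false;
        } finally {
            // Runs on success and on failure, so nothing is left open.
            closeQuietly(in);
            closeQuietly(writer);
        }
    }

    private static void closeQuietly(Closeable resource) {
        try {
            resource.close();
        } catch (IOException ignored) {
            // nothing useful can be done if close itself fails
        }
    }
}
```

With this shape the early "return false" in the catch block no longer
skips the cleanup, which is the problem with closing the writer only
after the try/catch.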
 




Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Zilverline info
Hi Ian,
Depending on what you want to do, you could also follow the installation 
instructions on http://www.zilverline.org. They describe how to install 
Zilverline, but the same goes for the Lucene WAR.

Hope this helps,
  Michael Franken
Ian McDonnell wrote:
Also another silly question: do I need to set up a WAR on the server?
--- Ian McDonnell [EMAIL PROTECTED] wrote:
Well, when I extracted it, it created the org/apache/lucene directories in the 
public_html directory. When I try to compile any of the source, it just throws numerous 
errors. I've got the classpath set to web-inf/classes.
Have I extracted it to the wrong directory?
--- Erik Hatcher [EMAIL PROTECTED] wrote:
On Jul 21, 2004, at 8:10 AM, Ian McDonnell wrote:
 

Are the package information and import paths ready to deploy on a Tomcat 
server? I tried extracting Lucene on the server, but when I compile 
files, it just throws numerous no class definition errors and errors 
relating to the package.
   

Huh?  Lucene certainly deploys just fine in Tomcat web applications (in 
a WAR under WEB-INF/lib).  Could you elaborate on what you mean here?

Erik


Re: Extracting Lucene onto Tomcat

2004-07-21 Thread Zilverline info
Hi Ian,
You don't extract WAR files or JAR files. To deploy a web application 
that comes as a WAR file, you just have to drop it into the
web server/servlet engine. So just copy lucene.war to
tomcatserver/webapps. That's it. I advise you to read some of the 
documentation on the Tomcat website on deploying web applications, or, if 
you're really serious, buy this book: 
http://www.wiley.com/WileyCDA/WileyTitle/productCd-0471446629.html
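
The copy step described above can be sketched in Java as well. This is
only an illustration of "drop the WAR into webapps"; the class name
WarDeployer is hypothetical, and the paths you pass in depend on where
your servlet engine is installed.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper: deploys a WAR by copying it, unextracted, into webapps. */
final class WarDeployer {

    /** Returns the path of the copied file inside the webapps directory. */
    static Path deploy(Path war, Path webappsDir) throws IOException {
        if (!Files.isRegularFile(war)) {
            throw new IOException("Not a file: " + war);
        }
        if (!Files.isDirectory(webappsDir)) {
            throw new IOException("Not a directory: " + webappsDir);
        }
        // Keep the original file name; an older copy of the same WAR is replaced.
        Path target = webappsDir.resolve(war.getFileName());
        return Files.copy(war, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
```

Note that the WAR is copied as a single file. Unpacking it by hand into
public_html or WEB-INF, as tried earlier in this thread, is not needed.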

regards,
  Michael
Ian McDonnell wrote:
I was looking at your instructions there, but couldn't really figure out what you mean. 
Can I manually add the extracted directories onto the Tomcat server? If so, what should 
my root directory be?
Say, for example, the extracted directories are org/apache/lucene/.
Should I have that as public_html/WEB-INF/org/apache/lucene?
Ian


Re: Anyone use MultiSearcher class

2004-07-12 Thread Zilverline info
Hi Don,
Yes, I'm using the MultiSearcher (in Zilverline), and have seen no 
serious performance issues with it. The app performs well with multiple 
indexes; it responds so quickly (with 100k+ documents) that I haven't 
even taken the time to measure the difference from a single-index search.
Michael Franken

Don Vaillancourt wrote:
Hello,
Has anyone used the MultiSearcher class?
I have noticed that searching two indexes using this MultiSearcher 
class takes 8 times longer than searching only one index.  I could 
understand if it took 3 to 4 times longer to search due to sorting the 
two search results and so on, but why 8 times longer?

Is there some optimization that can be done to hasten the search?  Or 
should I just write my own MultiSearcher.  The problem though is that 
there is no way for me to create my own Hits object (no methods are 
available and the class is final).

Anyone have any clue?
Thanks
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
416-815-2000 ext. 245
email: [EMAIL PROTECTED]
web: http://www.web-impact.com



Re: upgrade from Lucene 1.3 final to 1.4rc3 problem

2004-07-07 Thread Zilverline info
This is a bug (see the posting 'Lockfile Problem Solved'). Upgrade to 
1.4-final and you'll be fine.

Alex Aw Seat Kiong wrote:
Hi!
I'm currently using Lucene 1.3 final, and everything was working fine.
But after I upgraded from Lucene 1.3 final to 1.4rc3 (simply overwrote the 
lucene-1.4-final.jar with lucene-1.4-rc3.jar and recompiled),
it recompiles successfully, but when we try to index a document it gives the 
error below:
java.lang.NullPointerException
   at org.apache.lucene.store.FSDirectory.create(FSDirectory.java:146)
   at org.apache.lucene.store.FSDirectory.<init>(FSDirectory.java:126)
   at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:102)
   at org.apache.lucene.store.FSDirectory.getDirectory(FSDirectory.java:83)
   at org.apache.lucene.index.IndexWriter.<init>(IndexWriter.java:173)
What is wrong? Please help.
Thanks.
Regards,
Alex

 




Zilverline release candidate 1.0-rc3 available

2004-06-07 Thread Zilverline info
All,
I've just released a new release candidate (*1.0-rc3*) that now supports 
plugins. You can create your own extractors
for various file formats. I've provided Extractors for Text, PDF, Word, 
and HTML.
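
To show what an extractor plugin could look like, here is a small Java
sketch. The Extractor interface, TextExtractor, and ExtractorRegistry
are hypothetical names invented for this example; Zilverline's real
plugin interface may differ, so treat this as an illustration of the
idea only.

```java
import java.io.IOException;
import java.io.Reader;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical plugin contract: turn a document into plain text for indexing. */
interface Extractor {
    String extract(Reader source) throws IOException;
}

/** Trivial extractor for plain text: copies the characters unchanged. */
final class TextExtractor implements Extractor {
    @Override
    public String extract(Reader source) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] buffer = new char[4096];
        int read;
        while ((read = source.read(buffer)) != -1) {
            text.append(buffer, 0, read);
        }
        return text.toString();
    }
}

/** Looks up the extractor responsible for a file, based on its extension. */
final class ExtractorRegistry {

    private final Map<String, Extractor> byExtension = new HashMap<>();

    void register(String extension, Extractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the registered extractor, or null if the format is unknown. */
    Extractor lookup(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0) {
            return null;
        }
        return byExtension.get(fileName.substring(dot + 1).toLowerCase(Locale.ROOT));
    }
}
```

Supporting a new format then means writing one more Extractor and
registering it under the matching extension; the indexing code itself
does not need to change.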

It's also possible to specify your own handlers for archives. Say you 
have a RAR archive, and you have a program on your system that can 
extract the content from it, then you can specify that zilverline should 
use this program.

Zilverline is a free search engine based on Lucene that's ready to 
roll, and can simply be dropped into a servlet engine. It runs out of the
box on both Windows and Linux, supports PDF, WORD, HTM, TXT, and CHM,
and can index zip, rar, and many other formats.

Please take a look at http://www.zilverline.org, and have a swing at it.
cheers,
  Michael Franken


Re: Tool for analyzing analyzers

2004-05-28 Thread Zilverline info
Hi Erik,

Erik Hatcher wrote:
[snip]
But I'd love to build a Lucene demo application that is powerful 
enough to be used as a foundation for folks to use out-of-the-box.
Erik

That's just what I thought. Here's one: http://www.zilverline.org
Cheers,
   Michael Franken