Re: speeding up queries (MySQL faster)
--- Otis Gospodnetic <[EMAIL PROTECTED]> wrote:
> The bottleneck seems to be disk IO.

But it's not. Linux is caching the whole file, and there really isn't any disk activity at all. Most of the threads are blocked on InputStream.refill, not waiting for the disk, but waiting for their turn to enter the synchronized block that reads from disk (which is why I asked about caching above that level). CPU is a constant 50% on a dual-CPU system (meaning 100% of one CPU).

-Yonik

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
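The original post in this thread mentions opening more than one index reader as a possible workaround for the contention on a single synchronized file stream. A minimal sketch of that idea follows, assuming each pooled element owns its own open file handle (with Lucene 1.4 that might be one IndexSearcher per element, each opened separately on the same read-only index). The RoundRobinPool class is hypothetical, and the thread does not establish whether this actually helps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hands out one of several equivalent resources in round-robin order.
 * Assumption: every element has its own underlying file handle, so that
 * concurrent requests block on different monitors instead of one.
 */
final class RoundRobinPool<T> {
    private final List<T> items;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPool(List<T> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("pool must not be empty");
        }
        this.items = new ArrayList<>(items); // defensive copy
    }

    /** Thread-safe: returns the next element, wrapping around at the end. */
    T pick() {
        // floorMod keeps the index non-negative even after the counter overflows.
        int i = Math.floorMod(next.getAndIncrement(), items.size());
        return items.get(i);
    }
}
```

Each request thread would call pick() once and run its query against the element it receives. The cost is that every extra reader keeps its own buffers and caches, which is the "waste" the original post worries about.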
Re: Lucene Search Applet
I have Lucene working in an applet and I've seen this problem only when the jar file really was not available (typo in the jar name), which is what you'd expect. It's possible that the classpath for your application is not the same as the classpath for the applet; perhaps they're using different VMs or JREs from different locations. Try referencing the Lucene jar file in the archive attribute of the applet tag.

Also, to get Lucene to work from an unsigned applet, I had to modify a few classes that call System.getProperty(), because the properties that were being requested were disallowed for applets. I think the classes were IndexWriter, FSDirectory, and BooleanQuery.

--Jon

On Aug 20, 2004, at 6:57 AM, Simon mcIlwaine wrote:

> I'm a new Lucene user and I'm not too familiar with applets either, but I've
> been doing a bit of testing on Java applet security, and if I'm correct in
> saying that applets can read anything below their codebase then my problem
> is not a security restriction one. The error is reading
> java.lang.NoClassDefFoundError and the classpath is set as I have it working
> in a Swing app. Does someone actually have Lucene working in an applet? Can
> it be done?? Please help.
>
> Thanks
>
> Simon
>
> ----- Original Message -----
> From: "Terry Steichen" <[EMAIL PROTECTED]>
> To: "Lucene Users List" <[EMAIL PROTECTED]>
> Sent: Wednesday, August 18, 2004 4:17 PM
> Subject: Re: Lucene Search Applet
>
> I suspect it has to do with the security restrictions of the applet, 'cause
> it doesn't appear to be finding your Lucene jar file. Also, regarding the
> lock files, I believe you can disable the locking stuff just for purposes
> like yours (read-only index).
>
> Regards,
>
> Terry
>
> ----- Original Message -----
> From: Simon mcIlwaine
> To: Lucene Users List
> Sent: Wednesday, August 18, 2004 11:03 AM
> Subject: Lucene Search Applet
>
> I'm developing a Lucene CD-ROM based search which will search HTML pages on
> CD-ROM, using an applet as the UI. I know that there's a problem with lock
> files and also security restrictions on applets, so I am using the
> RAMDirectory. I have it working in a Swing application; however, when I put
> it into an applet it's giving me problems. It compiles, but when I go to run
> the applet I get the error below. Can anyone help? Thanks in advance.
>
> Simon
>
> Error:
>
> java.lang.NoClassDefFoundError: org/apache/lucene/store/Directory
>     at java.lang.Class.getDeclaredConstructors0(Native Method)
>     at java.lang.Class.privateGetDeclaredConstructors(Class.java:1610)
>     at java.lang.Class.getConstructor0(Class.java:1922)
>     at java.lang.Class.newInstance0(Class.java:278)
>     at java.lang.Class.newInstance(Class.java:261)
>     at sun.applet.AppletPanel.createApplet(AppletPanel.java:617)
>     at sun.applet.AppletPanel.runLoader(AppletPanel.java:546)
>     at sun.applet.AppletPanel.run(AppletPanel.java:298)
>     at java.lang.Thread.run(Thread.java:534)
>
> Code:
>
> import org.apache.lucene.search.IndexSearcher;
> import org.apache.lucene.search.Query;
> import org.apache.lucene.search.TermQuery;
> import org.apache.lucene.store.RAMDirectory;
> import org.apache.lucene.store.Directory;
> import org.apache.lucene.index.Term;
> import org.apache.lucene.search.Hits;
> import java.awt.*;
> import java.awt.event.*;
> import javax.swing.*;
> import java.io.*;
>
> public class MemorialApp2 extends JApplet implements ActionListener {
>
>     JLabel prompt;
>     JTextField input;
>     JButton search;
>     JPanel panel;
>     String indexDir = "C:/Java/lucene/index-list";
>     private static RAMDirectory idx;
>
>     public void init() {
>         Container cp = getContentPane();
>         panel = new JPanel();
>         panel.setLayout(new FlowLayout(FlowLayout.CENTER, 4, 4));
>         prompt = new JLabel("Keyword search:");
>         input = new JTextField("", 20);
>         search = new JButton("Search");
>         search.addActionListener(this);
>         panel.add(prompt);
>         panel.add(input);
>         panel.add(search);
>         cp.add(panel);
>     }
>
>     public void actionPerformed(ActionEvent e) {
>         if (e.getSource() == search) {
>             String surname = (input.getText());
>             try {
>                 findSurname(indexDir, surname);
>             } catch (Exception ex) {
>                 System.err.println(ex);
>             }
>         }
>     }
>
>     public static void findSurname(String indexDir, String surname)
>             throws Exception {
>         idx = new RAMDirectory(indexDir);
>         IndexSearcher searcher = new IndexSearcher(idx);
>         Query query = new TermQuery(new Term("surname", surname));
>         Hits hits = searcher.search(query);
>         for (int i = 0; i < hits.length(); i++) {
>             //Document doc = hits.doc(i);
>             System.out.println("Surname: " + hits.doc(i).get("surname"));
>         }
>     }
> }
>
> -
> To unsubscribe, e-mail: [EM
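The reply above says that some Lucene classes had to be modified because they call System.getProperty() for properties an unsigned applet may not read. One way such a change could look is sketched below: a small wrapper that falls back to a default when the security manager refuses access. The SafeProps class and its method name are hypothetical; this is not the patch Jon actually applied, which the thread does not show.

```java
/**
 * Reads a system property without failing inside a sandbox.
 * Hypothetical helper, shown only to illustrate the kind of change
 * described in the reply; it is not part of Lucene.
 */
final class SafeProps {
    private SafeProps() {}

    /** Returns the property value, or fallback if it is unset or access is denied. */
    static String get(String key, String fallback) {
        try {
            return System.getProperty(key, fallback);
        } catch (SecurityException e) {
            // An unsigned applet is only allowed to read a short list of properties.
            return fallback;
        }
    }
}
```

A call site that previously read System.getProperty("some.key", "default") would call SafeProps.get("some.key", "default") instead, so the code keeps its default behaviour when the lookup is refused.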
Re: speeding up queries (MySQL faster)
The bottleneck seems to be disk IO. Since this is a read-only index, why not spread some of the frequently scanned index files over multiple disks, or put the index on SCSI disks hooked up in a RAID? Maybe this is already the case, but you didn't mention it.

Oh, I already answered a similar question once before:
http://www.mail-archive.com/[EMAIL PROTECTED]/msg05103.html

Otis
http://www.simpy.com/ -- Index, Search and Share your bookmarks

--- Yonik Seeley <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm trying to figure out how to speed up queries to a large index.
> I'm currently getting 133 req/sec, which isn't bad, but isn't too close
> to MySQL, which is getting 500 req/sec on the same hardware with the
> same set of documents.
> [...]
Re: Custom filter
On Aug 20, 2004, at 6:48 PM, [EMAIL PROTECTED] wrote:
> We're currently in lucene 1.2... haven't moved to 1.3 yet.

Skip 1.3 and go straight to 1.4.1 :) Upgrade - why not?

Erik
RE: memory leek in lucene?
Are you calling ParallelMultiSearcher.search(Query query, Sort sort) to do your search? If so, I am currently having a similar problem.

Terence

> Doing queries against Lucene I run into a memory problem, i.e. it looks like
> it's not giving memory back after the query has been executed.
>
> I use ParallelMultiSearcher and call the close method after the results are
> displayed.
>
>     hits=null; // Hits class
>     if (ms!=null) ms.close(); //ParallelMultiSearcher
>
> Doesn't help. The memory is not freed. On queries like "No*" I get
> incremental memory consumption of c. 20-70 MB per query.
> Imagine what happens with my web server...
>
> I also tried from the command line and got a similar result.
>
> Am I doing something wrong, or am I missing something?
>
> Please help, I use 1.4.1 on a Linux box.
> Joel
Re: Custom filter
We're currently in lucene 1.2... haven't moved to 1.3 yet.

Roy.

On Fri, 20 Aug 2004 18:46:29 -0400, Erik Hatcher wrote
> Have you considered using the built-in QueryFilter for this? Why
> isn't it sufficient for your needs?
Re: Debian build problem with 1.4.1
> It's easy enough for folks to compile Lucene this way

I'm having trouble; warnings and error messages are appended. This is for Lucene 1.4.1. One of the few Debian-specific changes was to call the jarball 1.4 instead of the default 1.5-rc1-dev designation in build.xml.

rode:~> gcj --version
gcj (GCC) 3.3.4 (Debian 1:3.3.4-9)
rode:~> gcj build/lucene-1.4.jar build/lucene-demos-1.4.jar -o indexer \
        --main=org.apache.lucene.demo.IndexHTML >& /tmp/log.txt

> and applications built this way are pretty small. The big thing to
> install is libgcj.

I'm potentially interested in C applications calling a Lucene gcj compiled native library. But that would be in the distant future if at all. Right now just compiling a working Lucene app with gcj would be pretty cool.

Cheers,
Jeff

=
org/apache/lucene/analysis/de/WordlistLoader.java: In class `org.apache.lucene.analysis.de.WordlistLoader':
org/apache/lucene/analysis/de/WordlistLoader.java: In method `org.apache.lucene.analysis.de.WordlistLoader.getWordSet(java.io.File)':
org/apache/lucene/analysis/de/WordlistLoader.java:47: warning: exception handler inside code that is being protected
CompoundFileReader.java: In class `org.apache.lucene.index.CompoundFileReader$CSInputStream':
CompoundFileReader.java: In method `org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(byte[],int,int)':
CompoundFileReader.java:215: warning: exception handler inside code that is being protected
org/apache/lucene/index/CompoundFileReader.java: In class `org.apache.lucene.index.CompoundFileReader':
org/apache/lucene/index/CompoundFileReader.java: In constructor `(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/CompoundFileReader.java:51: warning: exception handler inside code that is being protected
org/apache/lucene/index/CompoundFileWriter.java: In class `org.apache.lucene.index.CompoundFileWriter':
org/apache/lucene/index/CompoundFileWriter.java: In method `org.apache.lucene.index.CompoundFileWriter.close()':
org/apache/lucene/index/CompoundFileWriter.java:127: warning: exception handler inside code that is being protected
org/apache/lucene/index/CompoundFileWriter.java: In method `org.apache.lucene.index.CompoundFileWriter.copyFile(org.apache.lucene.index.CompoundFileWriter$FileEntry,org.apache.lucene.store.OutputStream,byte[])':
org/apache/lucene/index/CompoundFileWriter.java:194: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In class `org.apache.lucene.index.DocumentWriter':
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.addDocument(java.lang.String,org.apache.lucene.document.Document)':
org/apache/lucene/index/DocumentWriter.java:60: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.invertDocument(org.apache.lucene.document.Document)':
org/apache/lucene/index/DocumentWriter.java:117: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.writePostings(org.apache.lucene.index.Posting[],java.lang.String)':
org/apache/lucene/index/DocumentWriter.java:250: warning: exception handler inside code that is being protected
org/apache/lucene/index/DocumentWriter.java: In method `org.apache.lucene.index.DocumentWriter.writeNorms(org.apache.lucene.document.Document,java.lang.String)':
org/apache/lucene/index/DocumentWriter.java:320: warning: exception handler inside code that is being protected
org/apache/lucene/index/FieldInfos.java: In class `org.apache.lucene.index.FieldInfos':
org/apache/lucene/index/FieldInfos.java: In constructor `(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/FieldInfos.java:36: warning: exception handler inside code that is being protected
org/apache/lucene/index/FieldInfos.java: In method `org.apache.lucene.index.FieldInfos.write(org.apache.lucene.store.Directory,java.lang.String)':
org/apache/lucene/index/FieldInfos.java:172: warning: exception handler inside code that is being protected
org/apache/lucene/index/IndexReader.java: In class `org.apache.lucene.index.IndexReader':
org/apache/lucene/index/IndexReader.java: In method `org.apache.lucene.index.IndexReader.open(org.apache.lucene.store.Directory,boolean)':
org/apache/lucene/index/IndexReader.java:110: warning: exception handler inside code that is being protected
org/apache/lucene/index/IndexReader.java: In method `org.apache.lucene.index.IndexReader.delete(org.apache.lucene.index.Term)':
org/apache/lucene/index/IndexReader.java:449: warning: exception handler inside code that is being protected
org/apache/lucene/index/IndexReader.java: In method `org.apache.lucene.index.IndexReader.commit()':
org/apache/lucene/index/IndexReader.java:480: warning: exception h
Re: Custom filter
Have you considered using the built-in QueryFilter for this? Why isn't it sufficient for your needs?

Erik

On Aug 20, 2004, at 6:32 PM, [EMAIL PROTECTED] wrote:
> Hi guys! I was hoping someone here could help me out with a custom filter.
> [...]
Custom filter
Hi guys!

I was hoping someone here could help me out with a custom filter. We have an index of emails and do some searches on the text of an email message and also searches based on the email addresses in a To, From or CC. Since we also do searches on a bunch of emails, we created a custom filter for searches on an array of fields for an array of values. [code included below]

The problem we're having is that creating a query string like so:

"Message:viagra AND (From:(email1 OR email2) OR To:(email1 OR email2) OR CC:(email1 OR email2))"

would return results, but our filter combined with a query string of "Message:viagra" sometimes wouldn't.

One thing I noticed is that when the results do return with the filter, the email has the format of [EMAIL PROTECTED], but the one that doesn't has something like [EMAIL PROTECTED].

Also it might have something to do with the storage of the From or To or CC. We don't parse out the email addresses before storing them. So sometimes the value of a From/To/CC field might be "[EMAIL PROTECTED]" or "local <[EMAIL PROTECTED]>" or even "<[EMAIL PROTECTED]>". Could the angle brackets be throwing off my filter?

I also wouldn't mind any suggestions for doing this filter better.

Here is the bits method from our custom filter:

-
final public BitSet bits( IndexReader reader ) throws IOException {
    // One bit per document; a set bit means the document passes the filter.
    BitSet bits = new BitSet( reader.maxDoc() );
    for ( int x = 0; x < fields.length; x++ ) {
        for ( int y = 0; y < values.length; y++ ) {
            // Exact term lookup: matches only documents indexed with this precise field/value term.
            TermDocs termDocs = reader.termDocs( new Term( fields[x], values[y] ) );
            try {
                while ( termDocs.next() ) {
                    bits.set( termDocs.doc() );
                }
            } finally {
                termDocs.close();
            }
        }
    }
    return bits;
}
-

Thanks in advance,
Roy.
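Because the bits method above looks up exact terms, a value stored as "local <address>" would presumably not match a lookup on the bare address. A minimal sketch of one way to rule that out is to reduce every From/To/CC value to a bare address before indexing and before building the filter values. The AddressNormalizer class is hypothetical, it handles only a single address per value, and lowercasing is an assumption about how the addresses should be compared.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: reduces a header value to a bare, lowercased address. */
final class AddressNormalizer {
    // Captures the text between the first '<' ... '>' pair, if there is one.
    private static final Pattern ANGLE = Pattern.compile("<([^<>]+)>");

    private AddressNormalizer() {}

    static String normalize(String raw) {
        String s = raw.trim();
        Matcher m = ANGLE.matcher(s);
        if (m.find()) {
            s = m.group(1).trim(); // drop the display name and the brackets
        }
        // Assumption: addresses are compared case-insensitively.
        return s.toLowerCase(Locale.ROOT);
    }
}
```

The same normalize call would need to be applied to the values array passed to the filter, otherwise the indexed terms and the lookup terms can still differ.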
speeding up queries (MySQL faster)
Hi,

I'm trying to figure out how to speed up queries to a large index. I'm currently getting 133 req/sec, which isn't bad, but isn't too close to MySQL, which is getting 500 req/sec on the same hardware with the same set of documents.

Setup info & Stats:
- 4.3M documents, 12 keyword fields per document, 11 unindexed fields per document.
- lucene index size on disk=1.3G
- Hardware: dual opteron w/ 16GB memory, running 64 bit JVM (Sun 1.5 beta)
- Lucene version 1.4.1
- Hitting multithreaded server w/ 10 clients at once
- This is a read-only index... no updating is done
- Single IndexSearcher that is reused for all requests

Q1) while hitting it with multiple queries at once, lucene is pegged at 50% CPU usage (meaning it is only using 1 out of 2 CPUs on average). I took a thread dump and all of the lucene threads except one are blocked on reading a file (see trace below). I could create two index readers, but that seems like it might be a waste, and fixing a symptom instead of the root problem. Would multiple IndexSearchers or IndexReaders share internal caches? Is there a way to cache more info at a higher level such that it would get rid of this bottleneck? The JVM isn't taking up much space (125M or so), and I have 16GB to work with! The OS (linux) is obviously caching the index file, but that doesn't get rid of the synchronization issues, and the overhead of re-reading.

How is caching in lucene configured? Does it internally use FieldCache, or do I have to use that somehow myself?

"tcpConnection-8080-72" daemon prio=1 tid=0x002b24412490 nid=0x34a4 waiting for monitor entry [0x45aba000..0x45abb2d0]
    at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:215)
    - waiting to lock <0x002ae153fa00> (a org.apache.lucene.store.FSInputStream)
    at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
    at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
    at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
    at org.apache.lucene.index.SegmentTermDocs.skipTo(SegmentTermDocs.java:176)
    at org.apache.lucene.search.TermScorer.skipTo(TermScorer.java:88)
    at org.apache.lucene.search.ConjunctionScorer.doNext(ConjunctionScorer.java:53)
    at org.apache.lucene.search.ConjunctionScorer.next(ConjunctionScorer.java:48)
    at org.apache.lucene.search.Scorer.score(Scorer.java:37)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:92)
    at org.apache.lucene.search.Hits.getMoreDocs(Hits.java:64)
    at org.apache.lucene.search.Hits.<init>(Hits.java:43)
    at org.apache.lucene.search.Searcher.search(Searcher.java:33)
    at org.apache.lucene.search.Searcher.search(Searcher.java:27)

Even using only 1 cpu though, MySQL is faster. Here is what the queries look like:

"field1:4 AND field2:188453 AND field3:1"

field1:4 done alone selects around 4.2M records
field2:188453 done alone selects around 1.6M records
field3:1 done alone selects around 1K records
The whole query normally selects less than 50 records
Only the first 10 are returned (or whatever range the client selects).

The fields are all keywords checked for exact matches (no fulltext search is done). Is there anything I can do to speed these queries up, or is the structure just more suited to MySQL (and not an inverted index)?

How is a query like this carried out?

Any help would be greatly appreciated. There's not a lot of info on searching (much more on updating). I'm looking forward to "Lucene in Action"! too bad it's not out till October.

-Yonik
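The stack trace in this post passes through ConjunctionScorer and skipTo, which suggests the AND query is evaluated by intersecting the sorted document lists of the three terms. The following is a simplified sketch of that general technique, not Lucene's actual implementation: it drives the intersection from the shortest list (field3:1 with about 1K entries in the example above) and skips forward in the longer ones. The PostingIntersection class and its in-memory int arrays are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative only: intersects sorted doc-id lists, driving from the shortest. */
final class PostingIntersection {
    private PostingIntersection() {}

    /** Returns the doc ids present in every list; each list must be sorted ascending. */
    static List<Integer> intersect(int[]... postings) {
        List<Integer> out = new ArrayList<>();
        if (postings.length == 0) {
            return out;
        }
        int[][] lists = postings.clone();
        // Shortest list first, so the number of candidates is as small as possible.
        Arrays.sort(lists, Comparator.comparingInt((int[] a) -> a.length));
        int[] pos = new int[lists.length]; // current position in each list
        for (int candidate : lists[0]) {
            boolean inAll = true;
            for (int i = 1; i < lists.length && inAll; i++) {
                pos[i] = skipTo(lists[i], pos[i], candidate);
                inAll = pos[i] < lists[i].length && lists[i][pos[i]] == candidate;
            }
            if (inAll) {
                out.add(candidate);
            }
        }
        return out;
    }

    /** Binary search for the first index >= from whose value is >= target. */
    private static int skipTo(int[] list, int from, int target) {
        int lo = from;
        int hi = list.length;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (list[mid] < target) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }
}
```

With this shape, the work is bounded by the size of the rarest term's list times the cost of a skip in each larger list, which is why the rare term matters far more than the 4.2M-entry one. In the real index each skip may also trigger a buffer refill, which is where the trace above shows threads waiting.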
Re: Debian build problem with 1.4.1
I can successfully use gcc 3.4.0 with Lucene as follows:

  ant jar jar-demo
  gcj -O3 build/lucene-1.5-rc1-dev.jar build/lucene-demos-1.5-rc1-dev.jar -o indexer --main=org.apache.lucene.demo.IndexHTML
  ./indexer -create docs

It runs pretty snappy too!

However I don't know if there's much mileage in packaging Lucene as a native library. It's easy enough for folks to compile Lucene this way, and applications built this way are pretty small. The big thing to install is libgcj.

Doug

Jeff Breidenbach wrote:
> Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have
> enough time to percolate before the sarge release.
>
> Now that that is taken care of, I'm curious about the status of gcj
> compilation. Packaging Lucene as a native library might be useful for
> projects such as PyLucene, and it is also advantageous for license
> reasons, i.e. avoiding the non-free JVM dependency. What's the current
> gcj compilation recipe? The best I could find on Google (below) seems
> a little bit stale.
>
> http://www.mail-archive.com/[EMAIL PROTECTED]/msg04131.html
>
> Cheers,
> Jeff
Lucene with English and Spanish Best Practice?
Hello,

I'm interested in any feedback from anyone who has worked through implementing Internationalization (I18N) search with Lucene or has ideas for this requirement. Currently, we're using Lucene with straight English and are looking to add Spanish to the mix (with maybe more languages to follow).

This is our current IndexWriter setup utilizing the PerFieldAnalyzerWrapper:

    PerFieldAnalyzerWrapper analyzer = new PerFieldAnalyzerWrapper(new StandardAnalyzer());
    analyzer.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzer.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writer = new IndexWriter(indexDir, analyzer, create);

Would people suggest we switch this over to Snowball so there are English and Spanish Analyzers and IndexWriters? Something like this:

    PerFieldAnalyzerWrapper analyzerEnglish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("English"));
    analyzerEnglish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzerEnglish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writerEnglish = new IndexWriter(indexDir, analyzerEnglish, create);

    PerFieldAnalyzerWrapper analyzerSpanish = new PerFieldAnalyzerWrapper(new SnowballAnalyzer("Spanish"));
    analyzerSpanish.addAnalyzer(FIELD_TITLE_STARTS_WITH, new WhitespaceAnalyzer());
    analyzerSpanish.addAnalyzer(FIELD_CATEGORY, new WhitespaceAnalyzer());
    IndexWriter writerSpanish = new IndexWriter(indexDir, analyzerSpanish, create);

Are multiple indexes or mirrors of each index then usually created for every language? We currently have 4 indexes that are all English. Would we then create 4 more that are Spanish? Then at search time we would determine the language and which set of indexes to search against, English or Spanish.

Or another approach could be to add a Spanish field to the existing 4 indexes since most of the indexes have only one field that will be translated from English to Spanish.

thanks a bunch,
chad.
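For the first option in the message above (a separate set of indexes per language, chosen at search time), the routing step could be sketched as below. This is only an illustration under assumptions: the IndexRouter class, the directory names, and the fallback-to-default behaviour are all hypothetical, and the sketch says nothing about which of the two approaches is the better practice.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: maps a request locale to the index directories to search. */
final class IndexRouter {
    private final Map<String, List<String>> indexDirsByLanguage = new HashMap<>();
    private final String defaultLanguage;

    IndexRouter(String defaultLanguage) {
        this.defaultLanguage = defaultLanguage;
    }

    /** Registers the index directories for a two-letter language code such as "es". */
    void register(String language, String... indexDirs) {
        indexDirsByLanguage.put(language, Arrays.asList(indexDirs));
    }

    /** Returns the directories for the locale's language, or the default language's set. */
    List<String> indexDirsFor(Locale locale) {
        List<String> dirs = indexDirsByLanguage.get(locale.getLanguage());
        return dirs != null ? dirs : indexDirsByLanguage.get(defaultLanguage);
    }
}
```

The caller would open a searcher on each returned directory and parse the query with the analyzer that was used to build that language's indexes, since index-time and query-time analysis need to agree.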
Re: NegativeArraySizeException when creating a new IndexSearcher
Looks to me like you're using an older version of Lucene on your Linux box. The code is back-compatible, it will read old indexes, but Lucene 1.3 cannot read indexes created by Lucene 1.4, and will fail in the way you describe.
Doug
Sven wrote:
Hi!
I have a problem to port a Lucene based knowledgebase from Windows to Linux. On Windows it works fine whereas I get a NegativeArraySizeException on Linux when I try to initialise a new IndexSearcher to search the index. Deleting and rebuilding the index didn't help. I checked permissions, file path and lock_dir but as far as I can say they seem to be all right. As I couldn't find another one with the same problem I guess I've overlooked sth, but I've run out of ideas. I use lucene-1.4-rc2 and tomcat 5.0.18.
Can someone help me please with this or has an idea?
Kind regards,
Sven
java.lang.NegativeArraySizeException
at org.apache.lucene.index.TermInfosReader.readIndex(TermInfosReader.java:106)
at org.apache.lucene.index.TermInfosReader.<init>(TermInfosReader.java:82)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:141)
at org.apache.lucene.index.SegmentReader.<init>(SegmentReader.java:120)
at org.apache.lucene.index.IndexReader$1.doBody(IndexReader.java:118)
at org.apache.lucene.store.Lock$With.run(Lock.java:148)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:111)
at org.apache.lucene.index.IndexReader.open(IndexReader.java:99)
at org.apache.lucene.search.IndexSearcher.<init>(IndexSearcher.java:75)
at com.sykon.knowledgebase.action.ListQueryResultAction.act(ListQueryResultAction.java:134)
at org.apache.cocoon.components.treeprocessor.sitemap.ActTypeNode.invoke(ActTypeNode.java:159)
at org.apache.cocoon.components.treeprocessor.sitemap.ActionSetNode.call(ActionSetNode.java:121)
at org.apache.cocoon.components.treeprocessor.sitemap.ActSetNode.invoke(ActSetNode.java:98)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
at org.apache.cocoon.components.treeprocessor.sitemap.MountNode.invoke(MountNode.java:133)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:84)
at org.apache.cocoon.components.treeprocessor.sitemap.PreparableMatchNode.invoke(PreparableMatchNode.java:165)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelineNode.invoke(PipelineNode.java:162)
at org.apache.cocoon.components.treeprocessor.AbstractParentProcessingNode.invokeNodes(AbstractParentProcessingNode.java:107)
at org.apache.cocoon.components.treeprocessor.sitemap.PipelinesNode.invoke(PipelinesNode.java:136)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:371)
at org.apache.cocoon.components.treeprocessor.TreeProcessor.process(TreeProcessor.java:312)
at org.apache.cocoon.Cocoon.process(Cocoon.java:656)
at org.apache.cocoon.servlet.CocoonServlet.service(CocoonServlet.java:1112)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:284)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:204)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:742)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:506)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:443)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:359)
at org.apache.jasper.runtime.PageContextImpl.doForward(PageContextImpl.java:712)
at org.apache.jasper.runtime.PageContextImpl.forward(PageContextImpl.java:682)
at org.apache.jsp.knowlegebase.controller_jsp._jspService(controller_jsp.java:844)
at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:133)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:856)
at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:311)
at org.apache.jasper.serv
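One way to check the version-mismatch theory is to print where the Lucene classes are actually loaded from inside the Linux Tomcat. The sketch below uses only standard JDK calls (`Class.getProtectionDomain()`, `CodeSource.getLocation()`, `Package.getImplementationVersion()`); the class name `ClassOrigin` and its two methods are made up for illustration, and a version string is only reported if the jar's manifest carries one.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic helper, not part of Lucene or Tomcat. */
final class ClassOrigin {

    /** Returns the jar or directory a class was loaded from, or a marker if the JVM does not say. */
    static String locate(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null || source.getLocation() == null) {
            return "(bootstrap or unknown)";
        }
        return source.getLocation().toString();
    }

    /** Returns the Implementation-Version from the jar manifest, if there is one. */
    static String version(Class<?> type) {
        Package pkg = type.getPackage();
        String v = (pkg == null) ? null : pkg.getImplementationVersion();
        return (v == null) ? "(no version in manifest)" : v;
    }

    public static void main(String[] args) {
        // Default class name is an assumption; pass another name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.search.IndexSearcher";
        try {
            Class<?> type = Class.forName(name);
            System.out.println(name + " loaded from " + locate(type) + ", version " + version(type));
        } catch (ClassNotFoundException e) {
            System.out.println(name + " is not on the classpath");
        }
    }
}
```

Inside a web application the classpath differs from the command line, so the same two calls would have to be made from a servlet or JSP running in the affected webapp to be meaningful.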
RE: continuous index updates
So the finalizer on the underlying reader closes file handles?
-----Original Message-----
From: Otis Gospodnetic [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 20, 2004 2:41 PM
To: Lucene Users List
Subject: Re: continuous index updates
I just create a new IndexSearcher, leave the old IndexSearcher alone, and JVM's garbage collection cleans it up.
Otis
--- "Crump, Michael" <[EMAIL PROTECTED]> wrote:
> Hello,
> I am currently working on a server app that will require the ability to make index additions/deletions at any time. I want to cache/reuse index searchers and readers. I know that once an index has changed only newly opened readers will see the changes. Creating a new reader to see the changes and caching it will be no problem. My problem is that since this is a multithreaded app other threads may be using the old readers making it difficult to know when to close them. I assume that a reader must be closed to free the associated resources. I was thinking about using some kind of reference counted reader that would keep track of its references and only truly close when there were no references.
> Am I making this too difficult?
> Is there a better way?
> I assume others have had to do this using Lucene, do you have any recommendations?
> Regards,
> Michael
Re: continuous index updates
I just create a new IndexSearcher, leave the old IndexSearcher alone, and JVM's garbage collection cleans it up.
Otis
--- "Crump, Michael" <[EMAIL PROTECTED]> wrote:
> Hello,
> I am currently working on a server app that will require the ability to make index additions/deletions at any time. I want to cache/reuse index searchers and readers. I know that once an index has changed only newly opened readers will see the changes. Creating a new reader to see the changes and caching it will be no problem. My problem is that since this is a multithreaded app other threads may be using the old readers making it difficult to know when to close them. I assume that a reader must be closed to free the associated resources. I was thinking about using some kind of reference counted reader that would keep track of its references and only truly close when there were no references.
> Am I making this too difficult?
> Is there a better way?
> I assume others have had to do this using Lucene, do you have any recommendations?
> Regards,
> Michael
Re: Debian build problem with 1.4.1
On Aug 20, 2004, at 12:36 PM, Jeff Breidenbach wrote:
>> I don't understand this. StandardTokenizer.java hasn't changed since last year.
> I have packaged Lucene such that 'ant javacc' is called at package build time. I now see the problem - 'import java.io.*;' has been removed from StandardTokenizer.jj in Lucene 1.4.1. When I put that line back in, things build fine. Now that I know what the problem is, I'll go ahead and patch the Debian package. Please make sure the Lucene codebase gets fixed as well.
The codebase has been fixed, as of a couple of weeks ago :)
Erik
Re: Debian build problem with 1.4.1
Ok, Lucene 1.4.1 has been uploaded to Debian. Hopefully it will have enough time to percolate before the sarge release. Now that that is taken care of, I'm curious about the status of gcj compilation. Packaging Lucene as a native library might be useful for projects such as PyLucene, and it is also advantageous for license reasons i.e. avoiding the non-free JVM dependency. What's the current gcj compilation recipe? The best I could find on Google (below) seems a little bit stale. http://www.mail-archive.com/[EMAIL PROTECTED]/msg04131.html Cheers, Jeff
memory leak in lucene?
When running queries against Lucene I run into a memory problem: it looks like memory is not given back after the query has been executed. I use ParallelMultiSearcher and call its close method after the results are displayed:
hits = null; // Hits class
if (ms != null) ms.close(); // ParallelMultiSearcher
That doesn't help; the memory is not freed. On queries like "No*" memory consumption grows by roughly 20-70 MB per query. Imagine what happens to my web server... I also tried from the command line and got a similar result.
Am I doing something wrong, or am I missing something? Please help. I use 1.4.1 on a Linux box.
Joel
Re: Debian build problem with 1.4.1
> I don't understand this. StandardTokenizer.java hasn't changed since last year.
I have packaged Lucene such that 'ant javacc' is called at package build time. I now see the problem - 'import java.io.*;' has been removed from StandardTokenizer.jj in Lucene 1.4.1. When I put that line back in, things build fine. Now that I know what the problem is, I'll go ahead and patch the Debian package. Please make sure the Lucene codebase gets fixed as well.
Cheers,
Jeff
Re: lucene and ejb applications
In fact, we do exactly the same thing. A session bean method called search() delegates to a POJO SearchService. We lazy-load the IndexSearcher, cache it in memory, and invalidate that object when someone else modifies the index. This trick works wonderfully for us. Searching has become faster since we started caching the searcher.
Praveen
----- Original Message -----
From: "Erik Hatcher" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, August 20, 2004 12:02 PM
Subject: Re: lucene and ejb applications
> On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote:
> > hi erik
> > thanks for the warning and the code.
> > Let me re-phrase the question,
> > i have a index generated by lucene, i need to have the search capability to have a high availability. What solutions would be the most optimal
> I'm guessing from your descriptions that you want a search server that multiple applications can access. Correct? Is that what you mean by "high availability"?
> Take a look at Nutch for examples of doing this kind of thing. And also...
> > Currently i have two scenarios in mind
> > a) setup a RMI based app. that on start-up initializes a IndexSearcher object and waits for invocation of a method like Vector executeQuery(Query )
> Lucene has built-in RMI capability, so you don't need to recreate this yourself. Look at RemoteSearchable (and the test cases that use it).
> > b) create a web based app (jsp/servlet or struts) that initialises the IndexSearcher object, and stores in the servletContext on initialization, and all request invoke the Hits search(Query q)
> This is ok, but you have the same issues with servlet context (application scope or even session scope) with distributed applications. IndexSearcher, at the very least, should be transient and lazy initialized, perhaps nested under a controller object of your making.
> > with scenario a) i can have more control over updates, insert, and deletes where as with scenario b) has higher availability
> I disagree with your analysis of those scenarios. Neither has more or less control or availability than the other.
> > I want to create and store the IndexSearcher object, during initialization to save on multiple open and reads. once updates are ready signal can be sent to block further searches while the updates are integrated into the existing index.
> It is a good thing to keep an IndexSearcher instance around for big indexes to save on that I/O, I completely agree. A simple IndexSearcher-encapsulating Java object which lazy initializes and keeps IndexSearcher as a transient would be quite sufficient, I think. Store that object wherever you like - application scope seems to be appropriate for your web application scenario.
> Erik
Re: Debian build problem with 1.4.1
On Aug 20, 2004, at 11:12 AM, Jeff Breidenbach wrote:
> Hi Otis,
>> I'm asking, because it looks like your compiler is not finding Reader and IOException classes, both of which are in java.io.* package, which I see imported in StandardTokenizer.java as 'import java.io.*;'.
> In my copy of StandardTokenizer.java, there is no 'import java.io.*;' (and in fact this is a change from lucene-1.4-final).
I don't understand this. StandardTokenizer.java hasn't changed since last year.
% cvs log StandardTokenizer.java
...
revision 1.3
date: 2003/12/22 22:12:24; author: cutting; state: Exp; lines: +6 -6
Fix StandardTokenizer's handling of CJK characters.
revision 1.2
date: 2003/10/01 16:39:26; author: ehatcher; state: Exp; lines: +7 -4
oops, forgot to check in JavaCC generated files
revision 1.1
date: 2003/09/11 01:51:33; author: ehatcher; state: Exp;
PR 19468, but not exactly as it was done in the provided patches. JavaCC is no longer required to build Lucene, but can be run optionally
=====
And I have import java.io.* at the top.
> Since this file is apparently generated from JavaCC, I'm not sure what to do.
You can regenerate StandardTokenizer by running:
ant javacc
(you'll need JavaCC installed, of course, and this is the reason we check in the generated files in order to save the hassle for others)
It seems something is fishy with the copy of the code you have.
Erik
Re: lucene and ejb applications
On Aug 20, 2004, at 7:54 AM, Rupinder Singh Mazara wrote:
> hi erik
> thanks for the warning and the code.
> Let me re-phrase the question,
> i have a index generated by lucene, i need to have the search capability to have a high availability. What solutions would be the most optimal
I'm guessing from your descriptions that you want a search server that multiple applications can access. Correct? Is that what you mean by "high availability"?
Take a look at Nutch for examples of doing this kind of thing. And also...
> Currently i have two scenarios in mind
> a) setup a RMI based app. that on start-up initializes a IndexSearcher object and waits for invocation of a method like Vector executeQuery(Query )
Lucene has built-in RMI capability, so you don't need to recreate this yourself. Look at RemoteSearchable (and the test cases that use it).
> b) create a web based app (jsp/servlet or struts) that initialises the IndexSearcher object, and stores in the servletContext on initialization, and all request invoke the Hits search(Query q)
This is ok, but you have the same issues with servlet context (application scope or even session scope) with distributed applications. IndexSearcher, at the very least, should be transient and lazy initialized, perhaps nested under a controller object of your making.
> with scenario a) i can have more control over updates, insert, and deletes where as with scenario b) has higher availability
I disagree with your analysis of those scenarios. Neither has more or less control or availability than the other.
> I want to create and store the IndexSearcher object, during initialization to save on multiple open and reads. once updates are ready signal can be sent to block further searches while the updates are integrated into the existing index.
It is a good thing to keep an IndexSearcher instance around for big indexes to save on that I/O, I completely agree. A simple IndexSearcher-encapsulating Java object which lazy initializes and keeps IndexSearcher as a transient would be quite sufficient, I think. Store that object wherever you like - application scope seems to be appropriate for your web application scenario.
Erik
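A holder of the kind described in this thread (open the searcher lazily, keep one instance, drop it when the index changes) could be sketched roughly as follows. `LazySearcherHolder` and its `Supplier`-based opener are hypothetical names, not Lucene classes, and the type parameter `S` merely stands in for IndexSearcher so that the sketch runs without Lucene on the classpath. Serialization concerns (the "transient" part) are left out.

```java
import java.util.function.Supplier;

/**
 * Hypothetical holder, not a Lucene class: opens the searcher on first use,
 * hands the same instance to every caller, and reopens after invalidate().
 */
final class LazySearcherHolder<S> {
    // Assumed to wrap something like "new IndexSearcher(indexPath)",
    // with checked exceptions converted to unchecked ones by the caller.
    private final Supplier<S> opener;
    private S current; // null until first use and again after invalidate()

    LazySearcherHolder(Supplier<S> opener) {
        this.opener = opener;
    }

    /** Returns the cached searcher, opening it on first use. */
    synchronized S get() {
        if (current == null) {
            current = opener.get();
        }
        return current;
    }

    /** Call after the index has been modified; the next get() opens a fresh searcher. */
    synchronized void invalidate() {
        current = null;
    }
}
```

The sketch deliberately does not close the old searcher in invalidate(), because other threads may still be searching with it; when and how to close it is a separate decision.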
Re: Indexing and Searching Database in Lucene
Funny thing is, I was thinking of doing something like this just today. This is especially good when you perform a lot of queries using SQL's LIKE operator. Lucene would increase search performance a great deal.
Aviran wrote:
You need to create a lucene index from the database. Just index the columns and the records from the database. It will be useful to have also a field in lucene that contains the database's primary key, so you can retrieve the actual record from the database
Aviran
-----Original Message-----
From: sivalingam T [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 20, 2004 10:55 AM
To: [EMAIL PROTECTED]
Subject: Indexing and Searching Database in Lucene
Hi Can we index and search database in Lucene Search Engine? if anybody have please send reply.
With Warm Regards,
Sivalingam.T
Sai Eswar Innovations (P) Ltd, Chennai-92
--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com
Re: Debian build problem with 1.4.1
Hi Otis,
> I'm asking, because it looks like your compiler is not finding Reader and IOException classes, both of which are in java.io.* package, which I see imported in StandardTokenizer.java as 'import java.io.*;'.
In my copy of StandardTokenizer.java, there is no 'import java.io.*;' (and in fact this is a change from lucene-1.4-final). Since this file is apparently generated from JavaCC, I'm not sure what to do. I'm happy to supply a login to a Debian computer if someone is interested in helping debug.
> Are any of those commands actually using Lucene's build.xml?
Yes, they are just a wrapper around calling ant. The build.xml file has very minimal debian specific modifications.
Cheers,
Jeff
RE: Indexing and Searching Database in Lucene
You need to create a Lucene index from the database: index the columns and the records from the database. It is also useful to have a field in Lucene that contains the database's primary key, so you can retrieve the actual record from the database.
Aviran
-----Original Message-----
From: sivalingam T [mailto:[EMAIL PROTECTED]]
Sent: Friday, August 20, 2004 10:55 AM
To: [EMAIL PROTECTED]
Subject: Indexing and Searching Database in Lucene
Hi Can we index and search database in Lucene Search Engine? if anybody have please send reply.
With Warm Regards,
Sivalingam.T
Sai Eswar Innovations (P) Ltd, Chennai-92
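The primary-key round trip suggested here can be illustrated without any Lucene dependency. The sketch below is a toy in-memory inverted index standing in for Lucene, not Lucene's API: `PrimaryKeyIndex`, `add` and `search` are invented names, and the tokenization (lower-casing and splitting on non-word characters) is a simplifying assumption. In Lucene itself the key would typically go into a stored, untokenized field of each document.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy stand-in for a full-text index: maps each word to the primary keys of the rows containing it. */
final class PrimaryKeyIndex {
    private final Map<String, Set<Long>> postings = new HashMap<>();

    /** Indexes the text of one column value under the row's primary key. */
    void add(long primaryKey, String columnText) {
        for (String token : columnText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(primaryKey);
            }
        }
    }

    /** Returns the primary keys of the rows containing the word, in ascending order. */
    Set<Long> search(String word) {
        Set<Long> keys = postings.getOrDefault(word.toLowerCase(Locale.ROOT), Collections.emptySet());
        return Collections.unmodifiableSet(keys);
    }

    public static void main(String[] args) {
        // A map plays the role of the database table (primary key -> column value).
        Map<Long, String> table = new LinkedHashMap<>();
        table.put(1L, "Lucene in Action");
        table.put(2L, "Database design");
        table.put(3L, "Searching with Lucene");

        PrimaryKeyIndex index = new PrimaryKeyIndex();
        table.forEach(index::add);

        // Search the index, then fetch the full record from the "database" by key.
        for (long key : index.search("lucene")) {
            System.out.println(key + " -> " + table.get(key));
        }
    }
}
```

The point of the design is that the index only needs to hold the searchable text plus the key; everything else stays in the database and is fetched by key after the search.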
Indexing and Searching Database in Lucene
Hi, Can we index and search a database with the Lucene search engine? If anybody has done this, please reply. With Warm Regards, Sivalingam.T Sai Eswar Innovations (P) Ltd, Chennai-92
Indexing and Searching Database Values in Lucene Search Engine
How to index and search database values using Lucene Search Engine? By T.Sivalingam. Sivalingam T
Re: pdfboxhelp
I am sorry, the previous mail was sent accidentally.
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 8:02 PM
Subject: Re: pdfboxhelp
Did I leave you speechless!? :-)
Santosh wrote:
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp
Here is the super simple code required.
import java.io.File;
import org.apache.lucene.document.Document;
import org.pdfbox.searchengine.lucene.*;
File pdfFile = new File("/path/to/the/file.pdf");
// Below returns a parsed PDF file in a Lucene Document object.
Document doc = LucenePDFDocument.getDocument(pdfFile);
Santosh wrote:
Exactly, that is what I need.
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp
What are your intentions with PDFBox? You want to use it to index PDF files?
Santosh wrote:
Hi, I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me.
regards
Santosh kumar
SoftPro Systems
Hyderabad
"The harder you train in peace, the lesser you bleed in war"
--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com
Re: pdfboxhelp
Did I leave you speechless!? :-)
Santosh wrote:
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp
Here is the super simple code required.
import java.io.File;
import org.apache.lucene.document.Document;
import org.pdfbox.searchengine.lucene.*;
File pdfFile = new File("/path/to/the/file.pdf");
// Below returns a parsed PDF file in a Lucene Document object.
Document doc = LucenePDFDocument.getDocument(pdfFile);
Santosh wrote:
Exactly, that is what I need.
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp
What are your intentions with PDFBox? You want to use it to index PDF files?
Santosh wrote:
Hi, I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me.
regards
Santosh kumar
SoftPro Systems
Hyderabad
"The harder you train in peace, the lesser you bleed in war"
--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com
continuous index updates
Hello, I am currently working on a server app that will require the ability to make index additions/deletions at any time. I want to cache/reuse index searchers and readers. I know that once an index has changed only newly opened readers will see the changes. Creating a new reader to see the changes and caching it will be no problem. My problem is that since this is a multithreaded app other threads may be using the old readers making it difficult to know when to close them. I assume that a reader must be closed to free the associated resources. I was thinking about using some kind of reference counted reader that would keep track of its references and only truly close when there were no references. Am I making this too difficult? Is there a better way? I assume others have had to do this using Lucene, do you have any recommendations? Regards, Michael
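The reference-counted reader idea raised in this message could be sketched roughly as below. `RefCounted` is a hypothetical helper, not part of Lucene; it assumes the wrapped object exposes some close operation that can be passed in as a `java.io.Closeable` (for a searcher, a method reference to its close() method). The cache holds one reference of its own, each searching thread takes and releases one, and the resource is closed when the last reference goes away.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper: closes the wrapped resource only when the last user releases it. */
final class RefCounted<T> {
    private final T resource;
    private final Closeable closer;
    // Starts at 1: the cache that created this wrapper holds the first reference.
    private final AtomicInteger refs = new AtomicInteger(1);

    RefCounted(T resource, Closeable closer) {
        this.resource = resource;
        this.closer = closer;
    }

    /** Takes a reference; returns null if the resource has already been closed. */
    T acquire() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                return null; // too late, the caller should ask the cache for the current instance
            }
            if (refs.compareAndSet(n, n + 1)) {
                return resource;
            }
        }
    }

    /** Drops a reference; the release that brings the count to zero closes the resource. */
    void release() {
        while (true) {
            int n = refs.get();
            if (n == 0) {
                throw new IllegalStateException("release() without matching acquire()");
            }
            if (refs.compareAndSet(n, n - 1)) {
                if (n == 1) {
                    try {
                        closer.close();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }
                return;
            }
        }
    }
}
```

A searching thread would call acquire(), run its query, and call release() in a finally block. When the index changes, the cache would swap in a wrapper around a newly opened searcher and call release() once on the old wrapper to drop its own reference, so the old searcher is closed as soon as the last in-flight search finishes rather than whenever garbage collection gets to it.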
Re: pdfboxhelp
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 7:37 PM
Subject: Re: pdfboxhelp
Here is the super simple code required.
import java.io.File;
import org.apache.lucene.document.Document;
import org.pdfbox.searchengine.lucene.*;
File pdfFile = new File("/path/to/the/file.pdf");
// Below returns a parsed PDF file in a Lucene Document object.
Document doc = LucenePDFDocument.getDocument(pdfFile);
Santosh wrote:
Exactly, that is what I need.
----- Original Message -----
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp
What are your intentions with PDFBox? You want to use it to index PDF files?
Santosh wrote:
Hi, I have downloaded the PDFBox zip, but I am unsure where to start. How can I try the demo? I don't see any help document in this download. Please help me.
regards
Santosh kumar
SoftPro Systems
Hyderabad
"The harder you train in peace, the lesser you bleed in war"
--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com
Re: pdfboxhelp
Here is the super simple code required.

    import java.io.File;
    import org.apache.lucene.document.Document;
    import org.pdfbox.searchengine.lucene.*;

    File pdfFile = new File("/path/to/the/file.pdf");
    // Returns the parsed PDF file as a Lucene Document object.
    Document doc = LucenePDFDocument.getDocument(pdfFile);

Santosh wrote:

Exactly, that is what I need.

- Original Message -
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intentions with PDFBox? Do you want to use it to index PDF files?

Santosh wrote:

Hi,

I have downloaded the pdfbox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PRO
Re: Lucene Search Applet
I'm a new Lucene user and I'm not too familiar with applets either, but I've been doing a bit of testing on Java applet security. If I'm correct in saying that applets can read anything below their codebase, then my problem is not a security restriction. The error is java.lang.NoClassDefFoundError, and the classpath is set correctly, as I have it working in a Swing app. Does someone actually have Lucene working in an applet? Can it be done? Please help.

Thanks
Simon

- Original Message -
From: "Terry Steichen" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Wednesday, August 18, 2004 4:17 PM
Subject: Re: Lucene Search Applet

I suspect it has to do with the security restrictions of the applet, because it doesn't appear to be finding your Lucene jar file. Also, regarding the lock files, I believe you can disable locking for purposes like yours (read-only index).

Regards,
Terry

- Original Message -
From: Simon mcIlwaine
To: Lucene Users List
Sent: Wednesday, August 18, 2004 11:03 AM
Subject: Lucene Search Applet

I'm developing a Lucene CD-ROM based search which will search HTML pages on CD-ROM, using an applet as the UI. I know that there's a problem with lock files and also security restrictions on applets, so I am using the RAMDirectory. I have it working in a Swing application; however, when I put it into an applet it gives me problems. It compiles, but when I run the applet I get the error below. Can anyone help?

Thanks in advance.
Simon

Error:

    java.lang.NoClassDefFoundError: org/apache/lucene/store/Directory
        at java.lang.Class.getDeclaredConstructors0(Native Method)
        at java.lang.Class.privateGetDeclaredConstructors(Class.java:1610)
        at java.lang.Class.getConstructor0(Class.java:1922)
        at java.lang.Class.newInstance0(Class.java:278)
        at java.lang.Class.newInstance(Class.java:261)
        at sun.applet.AppletPanel.createApplet(AppletPanel.java:617)
        at sun.applet.AppletPanel.runLoader(AppletPanel.java:546)
        at sun.applet.AppletPanel.run(AppletPanel.java:298)
        at java.lang.Thread.run(Thread.java:534)

Code:

    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.Query;
    import org.apache.lucene.search.TermQuery;
    import org.apache.lucene.store.RAMDirectory;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.index.Term;
    import org.apache.lucene.search.Hits;
    import java.awt.*;
    import java.awt.event.*;
    import javax.swing.*;
    import java.io.*;

    public class MemorialApp2 extends JApplet implements ActionListener {

        JLabel prompt;
        JTextField input;
        JButton search;
        JPanel panel;
        String indexDir = "C:/Java/lucene/index-list";
        private static RAMDirectory idx;

        // Builds the search form: a label, a text field and a button.
        public void init() {
            Container cp = getContentPane();
            panel = new JPanel();
            panel.setLayout(new FlowLayout(FlowLayout.CENTER, 4, 4));
            prompt = new JLabel("Keyword search:");
            input = new JTextField("", 20);
            search = new JButton("Search");
            search.addActionListener(this);
            panel.add(prompt);
            panel.add(input);
            panel.add(search);
            cp.add(panel);
        }

        // Runs the search when the button is pressed.
        public void actionPerformed(ActionEvent e) {
            if (e.getSource() == search) {
                String surname = input.getText();
                try {
                    findSurname(indexDir, surname);
                } catch (Exception ex) {
                    System.err.println(ex);
                }
            }
        }

        // Loads the on-disk index into a RAMDirectory and prints every matching surname.
        public static void findSurname(String indexDir, String surname) throws Exception {
            idx = new RAMDirectory(indexDir);
            IndexSearcher searcher = new IndexSearcher(idx);
            Query query = new TermQuery(new Term("surname", surname));
            Hits hits = searcher.search(query);
            for (int i = 0; i < hits.length(); i++) {
                //Document doc = hits.doc(i);
                System.out.println("Surname: " + hits.doc(i).get("surname"));
            }
        }
    }
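One way to tell a missing jar apart from a security restriction is to probe for the class by name before the applet touches any Lucene type. The following is only a minimal sketch: `ClassProbe` and `isAvailable` are hypothetical names, and the only detail taken from the thread is the class name `org.apache.lucene.store.Directory` from the stack trace.

```java
final class ClassProbe {

    /** Returns true if the named class can be located by this class's loader. */
    static boolean isAvailable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClassProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // The class itself, or something it depends on, is not visible to this loader
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "org.apache.lucene.store.Directory";
        System.out.println(name + (isAvailable(name) ? " is" : " is NOT") + " visible to the class loader");
    }
}
```

If the probe reports the class as not visible when run from the applet but visible from the Swing application, the two are most likely running with different classpaths.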
Re: pdfboxhelp
Exactly, that is what I need.

- Original Message -
From: Don Vaillancourt
To: Lucene Users List
Sent: Friday, August 20, 2004 6:39 PM
Subject: Re: pdfboxhelp

What are your intentions with PDFBox? Do you want to use it to index PDF files?

Santosh wrote:

Hi,

I have downloaded the pdfbox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"
Re: pdf search
hi karthik,

I have a website with some items, each of which contains HTML and PDF documents. I have to store keywords against each item. Whenever a user enters a search word, if it matches any keyword in the existing keyword list, the site should show a link to the corresponding item.

- Original Message -
From: "Karthik N S" <[EMAIL PROTECTED]>
To: "Lucene Users List" <[EMAIL PROTECTED]>
Sent: Friday, August 20, 2004 6:56 PM
Subject: RE: pdf search

> hi
>
> What is it that you intend to search, and what are these "own search
> words"? First explain your requirement properly to the forum to get
> the intended results.
>
> with regards
> Karthik
>
> -Original Message-
> From: Santosh [mailto:[EMAIL PROTECTED]
> Sent: Friday, August 20, 2004 5:59 PM
> To: Lucene Users List
> Subject: pdf search
>
> Hi,
>
> I am a newbie to Lucene.
>
> I have downloaded the zip file. Now how can I give my own list of
> words to Lucene? In the demo I saw that Lucene automatically creates
> an index when we run the Java program, but I want to give my own
> search words. How is that possible?
>
> regards
> Santosh kumar
> SoftPro Systems
> Hyderabad
>
> "The harder you train in peace, the lesser you bleed in war"
Re: Debian build problem with 1.4.1
Hello Jeff,

I don't have Debian to try this out, and this is going to be a stupid question and suggestion, but where/how is the CLASSPATH set? Are any of those commands actually using Lucene's build.xml?

I'm asking because it looks like your compiler is not finding the Reader and IOException classes, both of which are in the java.io.* package, which I see imported in StandardTokenizer.java as 'import java.io.*;'.

Otis

--- Jeff Breidenbach <[EMAIL PROTECTED]> wrote:
>
> Hi all,
>
> I am the Debian package maintainer for Lucene, and I'm having build
> problems with 1.4.1. We are very close to a major Debian release
> (code named 'sarge'), and the window for changes is very small. Can
> someone please help me in the next day or two, otherwise Debian
> stable will ship Lucene 1.4-final for the next couple of years. It
> looks to me like the problem is in javacc generated code, and it's
> not obvious to me what to do.
>
> For debian sarge or sid users out there who want to reproduce the
> build problem, download the lucene 1.4.1 source tarball, then:
>
> apt-get install devscripts
> apt-get source liblucene-java
> cd lucene-1.4
> uupdate -v 1.4.1 ../lucene-1.4.1-src.tar.gz
> cd ../lucene-1.4.1
> debuild -us -uc
>
> Cheers,
> Jeff
>
> =
>
> compile-core:
> [mkdir] Created dir: /tmp/lucene/lucene-1.4.1/build/classes/java
> [javac] Compiling 160 source files to /tmp/lucene/lucene-1.4.1/build/classes/java
> [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:15: cannot resolve symbol
> [javac] symbol : class Reader
> [javac] location: class org.apache.lucene.analysis.standard.StandardTokenizer
> [javac] public StandardTokenizer(Reader reader) {
> [javac] ^
> [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:24: cannot resolve symbol
> [javac] symbol : class IOException
> [javac] location: class org.apache.lucene.analysis.standard.StandardTokenizer
> [javac] final public org.apache.lucene.analysis.Token next() throws ParseException, IOException {
> [javac] ^
> [javac] /tmp/lucene/lucene-1.4.1/src/java/org/apache/lucene/analysis/standard/StandardTokenizer.java:15: recursive constructor invocation
> [javac] public StandardTokenizer(Reader reader) {
> [javac] ^
> [javac] 3 errors
RE: lucene and ejb applications
Option b) sounds simpler and sufficient to me. I don't see why you would need to involve RMI for something as simple as this. I use something similar to your b) option for some indices behind http://www.simpy.com/ . I don't store IndexSearcher in the servlet context, though - I just have some logic like this:

    /**
     * Returns an instance of {@link IndexDescriptor} for the given
     * indexID, which must represent an absolute file
     * path to the index directory.
     *
     * This method caches {@link IndexDescriptor}s in a LRU Map and
     * first tries to retrieve them from there.
     *
     * If the specified index has been changed since the last time
     * it was used, its {@link Searcher} is reloaded.
     *
     * @param indexID the full path to the index directory
     * @return an instance of {@link IndexDescriptor}
     * @throws SearcherException if the given index cannot be accessed
     */
    IndexDescriptor getUserSearcherIndexDescriptor(String indexID)
        throws SearcherException
    {
        File indexDir = validateIndex(indexID);
        IndexDescriptor indexDescriptor = getIndexDescriptorFromCache(indexDir);

        try {
            // if this is a known index
            if (indexDescriptor != null) {
                // if the index has changed since this Searcher was created, make a new Searcher
                long currentVersion = IndexReader.getCurrentVersion(indexDir);
                if (currentVersion > indexDescriptor.lastKnownVersion) {
                    indexDescriptor.lastKnownVersion = currentVersion;
                    indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
                }
            }
            // if this is a new index
            else {
                indexDescriptor = new IndexDescriptor();
                indexDescriptor.indexDir = indexDir;
                indexDescriptor.lastKnownVersion = IndexReader.getCurrentVersion(indexDir);
                indexDescriptor.searcher = new LuceneUserSearcher(indexDir);
            }
            return cacheIndexDescriptor(indexDescriptor);
        } catch (IOException e) {
            throw new SearcherException("Cannot open index: " + indexDir, e);
        }
    }

IndexDescriptor is a simple struct-like class.

Otis

--- Rupinder Singh Mazara <[EMAIL PROTECTED]> wrote:
> hi erik
>
> thanks for the warning and the code. Let me rephrase the question.
>
> I have an index generated by Lucene, and I need the search capability
> to have high availability. What solution would be optimal?
>
> Currently I have two scenarios in mind:
>
> a) set up an RMI-based app that on start-up initializes an
>    IndexSearcher object and waits for invocation of a method like
>    Vector executeQuery(Query)
>
> b) create a web-based app (JSP/servlet or Struts) that initializes
>    the IndexSearcher object and stores it in the servletContext on
>    initialization, so that all requests invoke Hits search(Query q)
>
> With scenario a) I have more control over updates, inserts, and
> deletes, whereas scenario b) has higher availability.
>
> I want to create and store the IndexSearcher object during
> initialization to save on multiple opens and reads. Once updates are
> ready, a signal can be sent to block further searches while the
> updates are integrated into the existing index.
>
> >-Original Message-
> >From: Erik Hatcher [mailto:[EMAIL PROTECTED]
> >Sent: 20 August 2004 11:13
> >To: Lucene Users List
> >Subject: Re: lucene and ejb applications
> >
> >What would be the best way? Use Lucene outside of EJB. It's quite
> >silly to make such a decision "purely due to a policy decision" when
> >the technicalities of it show that it is an unwise decision.
> >
> >You're going to navigate Hits through a session bean? And as you
> >said, the EJB spec says not to use file I/O from EJB's. That is a
> >good recommendation if you are distributing your system across
> >servers and replication is occurring - if another call to a session
> >bean occurs and ends up on a different server, then the file handle
> >is lost.
> >
> >I violate the spec in my JavaDevWithAnt project and have one mode
> >where I have a stateless session bean returning search results:
> >http://www.ehatchersolutions.com/JavaDevWithAnt - but I definitely
> >do not recommend it. It works when you are in a single-server
> >environment.
> >
> >In summary - EJB and Lucene are not a good mix - don't force it just
> >to be buzzword compliant.
> >
> > Erik
> >
> >On Aug 20, 2004, at 4:32 AM, Rupinder Singh Mazara wrote:
> >
> >> hi all
> >>
> >> purely due to a policy decision, we would like to host our lucene
> >> search application in a j2ee container, preferably by means of an
> >> ejb. Since access to java.io is restricted by the ej
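The caching logic in Otis's reply above (keep searchers in an LRU map and reopen one when its index version has moved on) can be sketched without any Lucene types. This is only an illustration under stated assumptions: `VersionedLruCache`, `Slot`, `currentVersion` and `opener` are hypothetical names, the version function stands in for `IndexReader.getCurrentVersion(indexDir)`, and the opener stands in for `new LuceneUserSearcher(indexDir)`. It is not the code behind simpy.com.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.ToLongFunction;

/** LRU cache whose values are rebuilt when the version of their source increases. */
final class VersionedLruCache<K, S> {

    /** Struct-like holder, playing the role of IndexDescriptor in the message above. */
    private static final class Slot<T> {
        long lastKnownVersion;
        T searcher;
    }

    private final Map<K, Slot<S>> cache;
    private final ToLongFunction<K> currentVersion; // assumption: reports the index's current version
    private final Function<K, S> opener;            // assumption: opens a fresh searcher for the index

    VersionedLruCache(int maxEntries, ToLongFunction<K> currentVersion, Function<K, S> opener) {
        this.currentVersion = currentVersion;
        this.opener = opener;
        // accessOrder=true keeps the least recently used entry first, so it is the one evicted
        this.cache = new LinkedHashMap<K, Slot<S>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, Slot<S>> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /** Returns the cached searcher for the key, reopening it if the index has changed. */
    synchronized S get(K key) {
        long version = currentVersion.applyAsLong(key);
        Slot<S> slot = cache.get(key);
        if (slot == null) {
            // new index: open a searcher and remember the version it was opened at
            slot = new Slot<>();
            slot.lastKnownVersion = version;
            slot.searcher = opener.apply(key);
            cache.put(key, slot);
        } else if (version > slot.lastKnownVersion) {
            // known index that has changed since the searcher was created
            slot.lastKnownVersion = version;
            slot.searcher = opener.apply(key);
        }
        return slot.searcher;
    }
}
```

The sketch does not close a searcher when it is replaced or evicted. A real implementation would need to decide when that is safe, since another thread may still be using the old instance.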
RE: pdf search
hi

What is it that you intend to search, and what are these "own search words"? First explain your requirement properly to the forum to get the intended results.

with regards
Karthik

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: Friday, August 20, 2004 5:59 PM
To: Lucene Users List
Subject: pdf search

Hi,

I am a newbie to Lucene.

I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates an index when we run the Java program, but I want to give my own search words. How is that possible?

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"
Re: pdfboxhelp
What are your intentions with PDFBox? Do you want to use it to index PDF files?

Santosh wrote:

Hi,

I have downloaded the pdfbox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"

--
Don Vaillancourt
Director of Software Development
WEB IMPACT INC.
phone: 416-815-2000 ext. 245
fax: 416-815-2001
email: [EMAIL PROTECTED]
web: http://www.web-impact.com
pdfboxhelp
Hi,

I have downloaded the pdfbox zip, but I am unsure where to start. How can I try the demo? I don't see any help document with this download. Please help me.

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"
RE: pdf search
Hi Santosh,

Lucene doesn't search PDFs per se. To make anything searchable you first have to extract the content and then put it into Lucene in a form it understands (i.e. Document objects). So in order to search your PDFs you first need to extract the text from them using something like PDFBox.

So your battle plan should be: forget Lucene for a while and get the raw data out of all the items you want to search. Then look at the Lucene articles about creating simple searchable indices.

DT

"If we didn't train to fight, who'd fight the wars?" :)

-Original Message-
From: Santosh [mailto:[EMAIL PROTECTED]
Sent: 20 August 2004 13:30
To: Lucene Users List
Subject: Fw: pdf search

How can I search through PDF?

- Original Message -
From: Santosh
To: Lucene Users List
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search

Hi,

I am a newbie to Lucene.

I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates an index when we run the Java program, but I want to give my own search words. How is that possible?

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"
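The two-step plan in the reply above (extract the text first, then index it) can be illustrated without Lucene or PDFBox. The sketch below is a toy in-memory inverted index, not Lucene's API: `TinyIndex` and its methods are hypothetical, the document ids are made up, and the strings passed to `add` merely stand in for whatever a PDF or HTML extractor would return.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class TinyIndex {

    // term -> ids of the documents that contain it
    private final Map<String, Set<String>> postings = new HashMap<>();

    /** Indexes already-extracted plain text under the given document id. */
    void add(String docId, String extractedText) {
        // Lower-case the text and split on anything that is not a letter or digit
        for (String token : extractedText.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    /** Returns the ids of the documents containing the term, in sorted order. */
    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    public static void main(String[] args) {
        TinyIndex index = new TinyIndex();
        index.add("manual.pdf", "Installing the pump"); // text a PDF extractor might return
        index.add("faq.html", "Pump maintenance FAQ");  // text an HTML parser might return
        System.out.println(index.search("pump"));       // prints [faq.html, manual.pdf]
    }
}
```

The point of the sketch is that the index never sees a PDF: once the text has been extracted, a PDF and an HTML page are indexed in exactly the same way, and a matching search word leads back to the id (or link) of the item.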
Re: Fw: pdf search
In order to search through a PDF document, the text must first be extracted from it. There are several libraries that do that, including http://www.pdfbox.org

After you have the text from the PDF document, you just add it to the Lucene index like any other text document. You should go through the intro tutorial to understand how to index and search text using Lucene.

Ben

On Fri, 20 Aug 2004, Santosh wrote:

> How can I search through PDF?
>
> - Original Message -
> From: Santosh
> To: Lucene Users List
> Sent: Friday, August 20, 2004 5:59 PM
> Subject: pdf search
>
> Hi,
>
> I am a newbie to Lucene.
>
> I have downloaded the zip file. Now how can I give my own list of
> words to Lucene? In the demo I saw that Lucene automatically creates
> an index when we run the Java program, but I want to give my own
> search words. How is that possible?
>
> regards
> Santosh kumar
> SoftPro Systems
> Hyderabad
>
> "The harder you train in peace, the lesser you bleed in war"
Fw: pdf search
How can I search through PDF?

- Original Message -
From: Santosh
To: Lucene Users List
Sent: Friday, August 20, 2004 5:59 PM
Subject: pdf search

Hi,

I am a newbie to Lucene.

I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates an index when we run the Java program, but I want to give my own search words. How is that possible?

regards
Santosh kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"
pdf search
Hi,

I am a newbie to Lucene. I have downloaded the zip file. Now how can I give my own list of words to Lucene? In the demo I saw that Lucene automatically creates an index when we run the Java program, but I want to give my own search words. How is that possible?

Regards,
Santosh Kumar
SoftPro Systems
Hyderabad

"The harder you train in peace, the lesser you bleed in war"
RE: lucene and ejb applications
Hi Erik,

Thanks for the warning and the code.

Let me rephrase the question. I have an index generated by Lucene, and the search capability needs to be highly available. Which solution would be best? Currently I have two scenarios in mind:

a) Set up an RMI-based app that initializes an IndexSearcher object on start-up and waits for invocations of a method like Vector executeQuery(Query).

b) Create a web-based app (JSP/servlet or Struts) that initializes the IndexSearcher object and stores it in the ServletContext on initialization, so that all requests invoke Hits search(Query q).

With scenario a) I have more control over updates, inserts, and deletes, whereas scenario b) has higher availability.

I want to create and store the IndexSearcher object during initialization to save on multiple opens and reads. Once updates are ready, a signal can be sent to block further searches while the updates are integrated into the existing index.

>-----Original Message-----
>From: Erik Hatcher [mailto:[EMAIL PROTECTED]
>Sent: 20 August 2004 11:13
>To: Lucene Users List
>Subject: Re: lucene and ejb applications
>
>
>What would be the best way? Use Lucene outside of EJB. It's quite
>silly to make such a decision "purely due to a policy decision" when
>the technicalities of it show that it is an unwise decision.
>
>You're going to navigate Hits through a session bean? And as you said,
>the EJB spec says not to use file I/O from EJB's. That is a good
>recommendation if you are distributing your system across servers and
>replication is occurring - if another call to a session bean occurs and
>ends up on a different server, then the file handle is lost.
>
>I violate the spec in my JavaDevWithAnt project and have one mode where
>I have a stateless session bean returning search results:
>http://www.ehatchersolutions.com/JavaDevWithAnt - but I definitely do
>not recommend it. It works when you are in a single-server
>environment.
>
>In summary - EJB and Lucene are not a good mix - don't force it just to
>be buzzword compliant.
>
> Erik
>
>
>On Aug 20, 2004, at 4:32 AM, Rupinder Singh Mazara wrote:
>
>> Hi all,
>>
>> Purely due to a policy decision, we would like to host our Lucene
>> search application in a J2EE container, preferably by means of an EJB.
>> Since access to java.io is restricted by the EJB specification, what
>> would be the best way to design the application?
>> I have taken a look at [EMAIL PROTECTED], but it relies on MBeans and
>> not a session bean.
>> Does anyone have pointers or samples that can be looked at?
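The arrangement described at the top of the message above (open one searcher at initialization, share it across all requests, and block searches while updates are integrated) could be sketched roughly as follows. This is not Lucene code: `SearcherHolder` is a hypothetical helper, the type parameter `S` stands in for IndexSearcher, and using a read/write lock is just one assumed way to implement the "signal to block further searches".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Function;
import java.util.function.Supplier;

/** Hypothetical holder for one shared searcher; S stands in for IndexSearcher. */
final class SearcherHolder<S> {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Supplier<S> opener; // assumed: knows how to open a searcher over the index
    private S current;                // guarded by lock

    SearcherHolder(Supplier<S> opener) {
        this.opener = opener;
        this.current = opener.get();  // opened once, at initialization
    }

    /** Searches run concurrently with each other and wait only while an update is in progress. */
    <R> R search(Function<S, R> query) {
        lock.readLock().lock();
        try {
            return query.apply(current);
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Blocks new searches, applies the index changes, then reopens the shared searcher. */
    void update(Runnable applyChanges) {
        lock.writeLock().lock();
        try {
            applyChanges.run();
            current = opener.get();
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        List<String> index = new ArrayList<>();
        index.add("doc1");
        // The "searcher" in this demo is just a snapshot copy of the toy index.
        SearcherHolder<List<String>> holder = new SearcherHolder<>(() -> new ArrayList<>(index));

        Integer before = holder.search(List::size);
        holder.update(() -> index.add("doc2"));
        Integer after = holder.search(List::size);
        System.out.println(before + " document(s) before the update, " + after + " after");
    }
}
```

In scenario b) an instance like this would be what gets stored in the ServletContext; in scenario a) it would live inside the RMI server object. Either way the searches never see a half-applied update, at the cost of pausing while the write lock is held.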
Re: lucene and ejb applications
What would be the best way? Use Lucene outside of EJB. It's quite silly to make such a decision "purely due to a policy decision" when the technicalities of it show that it is an unwise decision.

You're going to navigate Hits through a session bean? And as you said, the EJB spec says not to use file I/O from EJBs. That is a good recommendation if you are distributing your system across servers and replication is occurring - if another call to a session bean occurs and ends up on a different server, then the file handle is lost.

I violate the spec in my JavaDevWithAnt project and have one mode where I have a stateless session bean returning search results: http://www.ehatchersolutions.com/JavaDevWithAnt - but I definitely do not recommend it. It works when you are in a single-server environment.

In summary - EJB and Lucene are not a good mix - don't force it just to be buzzword compliant.

Erik

On Aug 20, 2004, at 4:32 AM, Rupinder Singh Mazara wrote:

> Hi all,
>
> Purely due to a policy decision, we would like to host our Lucene search application in a J2EE container, preferably by means of an EJB. Since access to java.io is restricted by the EJB specification, what would be the best way to design the application?
>
> I have taken a look at [EMAIL PROTECTED], but it relies on MBeans and not a session bean. Does anyone have pointers or samples that can be looked at?
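One way to read the question about navigating Hits through a session bean: a lazily loaded result object that depends on an open searcher does not travel well across a remote call, so a remote facade would usually copy one page of results into plain serializable values first. The sketch below assumes that reading. All names are hypothetical; `LazyHits` merely stands in for a Hits-like object and is not the Lucene class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

final class ResultPageSketch {

    /** Hypothetical stand-in for a lazily loaded result set tied to an open searcher. */
    interface LazyHits {
        int length();
        String title(int i); // in a real system this would read from the index
        float score(int i);
    }

    /** Plain value object that can safely cross an RMI or remote-bean boundary. */
    static final class SearchResult implements Serializable {
        private static final long serialVersionUID = 1L;
        final String title;
        final float score;

        SearchResult(String title, float score) {
            this.title = title;
            this.score = score;
        }
    }

    /** Copies results [offset, offset + size) out of the lazy hits into serializable values. */
    static ArrayList<SearchResult> page(LazyHits hits, int offset, int size) {
        ArrayList<SearchResult> out = new ArrayList<>();
        int end = Math.min(hits.length(), offset + size);
        for (int i = Math.max(0, offset); i < end; i++) {
            out.add(new SearchResult(hits.title(i), hits.score(i)));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        final String[] titles = {"first", "second", "third"};
        LazyHits fake = new LazyHits() {
            public int length() { return titles.length; }
            public String title(int i) { return titles[i]; }
            public float score(int i) { return 1.0f / (i + 1); }
        };

        ArrayList<SearchResult> firstPage = page(fake, 0, 2);

        // Serializing the page shows it no longer depends on the searcher or the index files.
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(firstPage);
        }
        System.out.println("page size: " + firstPage.size());
    }
}
```

This does nothing about the file I/O restriction itself; it only illustrates why the search is better kept behind a plain service that hands out copied results.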
lucene and ejb applications
Hi all,

Purely due to a policy decision, we would like to host our Lucene search application in a J2EE container, preferably by means of an EJB. Since access to java.io is restricted by the EJB specification, what would be the best way to design the application?

I have taken a look at [EMAIL PROTECTED], but it relies on MBeans and not a session bean. Does anyone have pointers or samples that can be looked at?