which lucene
Hello Luceners, I have installed Lucene on my Debian testing system, so the jar file lucene-1.4.3.jar is under /usr/share/java. So far so good. It contains a German stemmer and a German analyzer under org.apache.lucene.analysis.de, which work pretty well. But the official release, e.g. from http://mirror.switch.ch/mirror/apache/dist/lucene/java/, is 2.4.1 and is quite different: there is no German analyzer in that package, but other features are available, such as setAllowLeadingWildcard(true), which are not included in the Debian package 1.4.3. So my questions: which of the two releases is recommended, 1.4.3 or 2.4.1? And how do I get a 2.4.1 release with a German stemmer/analyzer? My goal is to search with Lucene over a large number of text files with German, French and Italian text. Thank you for your attention, and greetings from Switzerland (the land with the many äöü's..:-), timon - To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org For additional commands, e-mail: java-user-h...@lucene.apache.org
newbie question again
Hello list,

Sorry, this may be a stupid question, but I can't resolve it, so maybe you can help me. I am trying to compile the Indexer example from the book Lucene in Action, 2nd edition (http://www.manning.com/hatcher3/hatcher_meapch1.pdf), but I get the following error:

--
javac -Xlint -cp ":.:./lucene/lucene-core-2.4.1.jar" Indexer.java
Indexer.java:37: cannot find symbol
symbol  : constructor FSDirectory(java.io.File,)
location: class org.apache.lucene.store.FSDirectory
    Directory dir = new FSDirectory(new File(indexDir), null);
                    ^
1 error
--

It refers to the following code segment:

    public Indexer(String indexDir) throws IOException {
        Directory dir = new FSDirectory(new File(indexDir));
        writer = new IndexWriter(dir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
    }

I am using Debian testing with:

    java -version
    java version "1.6.0_0"
    OpenJDK Runtime Environment (build 1.6.0_0-b11)
    OpenJDK Server VM (build 1.6.0_0-b11, mixed mode)

Source code and makefile are attached.

Thanks for your help,
timon

Indexer.java:

    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.store.FSDirectory;
    import org.apache.lucene.store.Directory;
    import java.io.File;
    import java.io.IOException;
    import java.io.FileReader;

    public class Indexer {

        private IndexWriter writer;

        public static void main(String[] args) throws Exception {
            if (args.length != 2) {
                throw new Exception("Usage: java " + Indexer.class.getName() + " ");
            }
            String indexDir = args[0];   // directory that will hold the index
            String dataDir = args[1];    // directory with the .txt files to index
            long start = System.currentTimeMillis();
            Indexer indexer = new Indexer(indexDir);
            int numIndexed = indexer.index(dataDir);
            indexer.close();
            long end = System.currentTimeMillis();
            System.out.println("Indexing " + numIndexed + " files took " + (end - start) + " milliseconds");
        }

        public Indexer(String indexDir) throws IOException {
            // This is the line javac rejects in the error output above.
            Directory dir = new FSDirectory(new File(indexDir), null);
            writer = new IndexWriter(dir, new StandardAnalyzer(), true, IndexWriter.MaxFieldLength.UNLIMITED);
        }

        public void close() throws IOException {
            writer.close();
        }

        // Indexes every accepted file directly inside dataDir (no recursion).
        public int index(String dataDir) throws Exception {
            File[] files = new File(dataDir).listFiles();
            for (int i = 0; i < files.length; i++) {
                File f = files[i];
                if (!f.isDirectory() && !f.isHidden() && f.exists() && f.canRead() && acceptFile(f)) {
                    indexFile(f);
                }
            }
            return writer.numDocs();
        }

        protected boolean acceptFile(File f) {
            return f.getName().endsWith(".txt");
        }

        // One document per file: analyzed contents plus the stored, unanalyzed path.
        protected Document getDocument(File f) throws Exception {
            Document doc = new Document();
            doc.add(new Field("contents", new FileReader(f)));
            doc.add(new Field("filename", f.getCanonicalPath(), Field.Store.YES, Field.Index.NOT_ANALYZED));
            return doc;
        }

        private void indexFile(File f) throws Exception {
            System.out.println("Indexing " + f.getCanonicalPath());
            Document doc = getDocument(f);
            if (doc != null) {
                writer.addDocument(doc);
            }
        }
    }

Makefile:

    LANG='de_CH';
    CP="$(CLASSPATH):.:./lucene/lucene-core-2.4.1.jar"
    all:
    	javac -Xlint -cp $(CP) Indexer.java
problem with index format and luke
Hello list, I am using Lucene 2.9. When I try to open the index with Luke I get an error: unknown format version: -8. Any hints?
Re: problem with index format and luke
Sounds bad... I use Luke 0.9.2 (2009-03-20), which supports Lucene up to version 2.4. So why am I using Lucene 2.9? Are there any other monitoring tools?

On Friday, 8 May 2009, Grant Ingersoll wrote:
> This usually means that your index was created using a newer version
> of Lucene than is bundled with Luke. You will need to get the Luke
> minimal jars (no Lucene) and use that along with the Lucene versions
> you have.
>
> --
> Grant Ingersoll
> http://www.lucidimagination.com/

-- Timon Roth Triemlistrasse 92 8047 Zürich -- 043 817 40 31 079 636 57 28 -- digitalforce.ch timon.r...@digitalforce.ch http://tel.search.ch/zuerich/triemlistrasse-92/timon-roth
german analyzer vexes me
Hello list,

A little confusion with a phrase query. I am using Lucene 2.9 and have indexed all the data with the GermanAnalyzer. I have one field (full_text) for the searchable data and a few fields for sorting. The full_text field is analyzed and not stored. The sort fields are stored and not analyzed:

    doc.add(new Field("full_text", value, Field.Store.NO, Field.Index.ANALYZED));
    doc.add(new Field("needs_sort", value, Field.Store.YES, Field.Index.NOT_ANALYZED));

So I run the following phrase search: "öffentliche finanzen und abgaberecht". The QueryParser is fed the GermanAnalyzer and translates the phrase to "offentlich finanx abgaberech":

    QueryParser parser = new QueryParser("full_text", new GermanAnalyzer());

But the result is not as expected. It returns all hits that have the phrase in a sort field, which I do not use for searching. Other queries work pretty well, for example "gemeindeautonomie; art. 8, 9 und 26 bv". Any hints?

-- Timon Roth
confusion with question mark
Dear list, I am searching a Lucene (2.9) index built with the GermanAnalyzer (from the analyzers 2.9 package). When I search for the word deutschland (the query, parsed with the German analyzer, is transformed to deutschla), I get a few hits. When I search for deu?schland I get no results, because the word is left as it is (deu?schland). When I try deu?schla (which matches the stemmed form deutschla), I get the same number of hits as when I search for deutschland. So where did I go wrong?..:-) Regards, timon
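A possible explanation, offered as an assumption rather than a diagnosis of this particular index: Lucene's QueryParser does not pass wildcard terms through the analyzer, so a pattern such as deu?schland is compared literally against the stemmed terms stored in the index. The toy sketch below uses only the JDK and no Lucene classes; the class ToyWildcardMatch and the one-term index are hypothetical, and the stemmed form deutschla is taken from the message above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Toy model (not Lucene API): wildcard patterns are matched against the
// terms as they are stored in the index, i.e. after stemming.
final class ToyWildcardMatch {

    // Translate a wildcard pattern ('?' = one char, '*' = any run) into a regex.
    static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the pattern matches at least one indexed term in full.
    static boolean matchesAny(String wildcard, List<String> indexedTerms) {
        Pattern p = toRegex(wildcard);
        for (String term : indexedTerms) {
            if (p.matcher(term).matches()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Assumption: the index holds only the stemmed form reported above.
        List<String> indexedTerms = Arrays.asList("deutschla");
        System.out.println(matchesAny("deu?schland", indexedTerms)); // false
        System.out.println(matchesAny("deu?schla", indexedTerms));   // true
    }
}
```

Under that assumption, the unstemmed pattern can never match, while a pattern written against the stemmed form does, which is the behaviour described in the message.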
read between the lines of an index
Dear list, I want to add an entry to an index together with a custom synonym list. For example, take the following text: [i worry about nothing because this world is crazy]. I want to add the two custom synonyms [anything]=>[nothing] and [lazy]=>[crazy], so that a search for lazy, crazy, nothing or anything gives me a hit on this entry. The point is that phrase search must still work. For example, searching for "this world is crazy" or "i worry about nothing" must still result in a hit, so I cannot just paste the synonyms after the existing words like this: [i worry about nothing anything because this world is crazy lazy]. How can I do this? Is there a possibility to insert more than one word at the same position? Regards, timon
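The idea of several words sharing one position can be illustrated with a toy positional index. This is a minimal sketch using only the JDK, not Lucene code; the class ToyPositionalIndex and its methods are hypothetical. In Lucene itself the usual counterpart is a custom TokenFilter that emits the synonym as an extra token with a position increment of 0, but the sketch below only models the effect on phrase matching.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Toy model (not Lucene API): a single-document positional index in which
// a synonym is stored at the same position as the word it belongs to.
final class ToyPositionalIndex {

    // term -> positions at which the term (or its synonym) occurs
    private final Map<String, List<Integer>> postings = new HashMap<String, List<Integer>>();

    ToyPositionalIndex(String text, Map<String, String> synonyms) {
        String[] tokens = text.trim().toLowerCase(Locale.ROOT).split("\\s+");
        for (int pos = 0; pos < tokens.length; pos++) {
            add(tokens[pos], pos);
            String synonym = synonyms.get(tokens[pos]);
            if (synonym != null) {
                add(synonym, pos); // same position, so phrases stay intact
            }
        }
    }

    private void add(String term, int pos) {
        List<Integer> positions = postings.get(term);
        if (positions == null) {
            positions = new ArrayList<Integer>();
            postings.put(term, positions);
        }
        positions.add(pos);
    }

    boolean containsTerm(String term) {
        return postings.containsKey(term.toLowerCase(Locale.ROOT));
    }

    // A phrase matches if its terms occur at consecutive positions.
    boolean containsPhrase(String phrase) {
        String[] terms = phrase.trim().toLowerCase(Locale.ROOT).split("\\s+");
        List<Integer> starts = postings.get(terms[0]);
        if (starts == null) {
            return false;
        }
        for (int start : starts) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                List<Integer> positions = postings.get(terms[i]);
                ok = positions != null && positions.contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Map<String, String> synonyms = new HashMap<String, String>();
        synonyms.put("nothing", "anything");
        synonyms.put("crazy", "lazy");
        ToyPositionalIndex index = new ToyPositionalIndex(
                "i worry about nothing because this world is crazy", synonyms);
        System.out.println(index.containsTerm("lazy"));                  // true
        System.out.println(index.containsPhrase("this world is crazy")); // true
        System.out.println(index.containsPhrase("this world is lazy"));  // true
        System.out.println(index.containsPhrase("nothing anything"));    // false
    }
}
```

Because the synonym occupies the same position as the original word, the positions of all following words are unchanged, which is why the original phrases still match while the pasted-in variant "nothing anything" does not.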
where's the word
Hello list, I am puzzling over the following problem. In my index I can't find the word BE, although it exists in two documents. I am using Lucene 2.4 with the StandardAnalyzer. Other queries with words like de, et or de la work fine. Any ideas? Regards, timon
Re: where's the word
Hi Paul, I have now tried the hint from Mark Miller, disabling all the stop words of the StandardAnalyzer: String stop_words[] = new String[0]; ...StandardAnalyzer(stop_words); This works perfectly..;-) Regards, timon

On Thursday, 25 June 2009, Paul Libbrecht wrote:
> On 25 June 2009 at 01:28, Mark Miller wrote:
> >> I am puzzling over the following problem. In my index I can't find
> >> the word BE, although it exists in two documents. I am using Lucene 2.4
> >> with the StandardAnalyzer.
> >> Other queries with words like de, et or de la work fine. Any ideas?
> > be is a stopword. Do yourself a favor and turn off stopwords. Best
> > to remove them at query time if you really need to.
>
> Timon, you spotted it: the analyzer. You need to take care to use the
> right analyzer and, if that language token (?) is a different field,
> you need to use a different analyzer, e.g. WhitespaceAnalyzer...
>
> paul
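Why an empty stop-word array helps can be shown with a toy analyzer. This is a minimal sketch using only the JDK, not Lucene code; the class ToyStopFilter, the sample text and the three-word stop list are hypothetical stand-ins (the thread only confirms that "be" is a stop word for the StandardAnalyzer).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Toy model (not Lucene API): lowercase, split into words, drop stop words.
final class ToyStopFilter {

    static List<String> analyze(String text, Set<String> stopWords) {
        List<String> tokens = new ArrayList<String>();
        // Split on anything that is neither a letter nor a digit.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (token.length() > 0 && !stopWords.contains(token)) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Assumption: a tiny excerpt standing in for an English stop list.
        Set<String> englishStops = new HashSet<String>(Arrays.asList("be", "the", "is"));
        Set<String> noStops = Collections.<String>emptySet();

        String text = "Kanton BE und Kanton ZH";
        System.out.println(analyze(text, englishStops)); // [kanton, und, kanton, zh]
        System.out.println(analyze(text, noStops));      // [kanton, be, und, kanton, zh]
    }
}
```

With the stop list in place the token "be" never reaches the index, so no query can find it; with an empty list it is indexed like any other word. The same analyzer settings have to be used at query time, otherwise the query term is dropped before the search runs.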