Re: optimize fails with Negative seek offset

2004-05-12 Thread Sascha Ottolski
Hi,

sorry for following up on my own mail, but since no one has responded so
far, I thought the stack trace might be of interest. The following
exception always occurs when trying to optimize one of our indexes,
which had always gone fine for about a year. I just tried with 1.4-rc3,
but with the same result:

java.io.IOException: Negative seek offset
at java.io.RandomAccessFile.seek(Native Method)
at org.apache.lucene.store.FSInputStream.readInternal(FSDirectory.java:405)
at org.apache.lucene.store.InputStream.readBytes(InputStream.java:61)
at org.apache.lucene.index.CompoundFileReader$CSInputStream.readInternal(CompoundFileReader.java:222)
at org.apache.lucene.store.InputStream.refill(InputStream.java:158)
at org.apache.lucene.store.InputStream.readByte(InputStream.java:43)
at org.apache.lucene.store.InputStream.readVInt(InputStream.java:83)
at org.apache.lucene.index.FieldsReader.doc(FieldsReader.java:63)
at org.apache.lucene.index.SegmentReader.document(SegmentReader.java:238)
at org.apache.lucene.index.SegmentMerger.mergeFields(SegmentMerger.java:185)
at org.apache.lucene.index.SegmentMerger.merge(SegmentMerger.java:92)
at org.apache.lucene.index.IndexWriter.mergeSegments(IndexWriter.java:483)
at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:362)
at LuceneRPCHandler.optimize(LuceneRPCHandler.java:398)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:324)
at org.apache.xmlrpc.Invoker.execute(Invoker.java:168)
at org.apache.xmlrpc.XmlRpcWorker.invokeHandler(XmlRpcWorker.java:123)
at org.apache.xmlrpc.XmlRpcWorker.execute(XmlRpcWorker.java:185)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:151)
at org.apache.xmlrpc.XmlRpcServer.execute(XmlRpcServer.java:139)
at org.apache.xmlrpc.WebServer$Connection.run(WebServer.java:773)
at org.apache.xmlrpc.WebServer$Runner.run(WebServer.java:656)
at java.lang.Thread.run(Thread.java:534)


Any hint would be greatly appreciated.


Thanks,

Sascha

-- 
Gallileus - the power of knowledge

Gallileus GmbH            http://www.gallileus.info/

Pintschstraße 16  fon +49-(0)30-41 93 43 43
10249 Berlin  fax +49-(0)30-41 93 43 45
Germany



++
CURRENT NOTICE (May 2004)

Literatur Alerts - literature search (as if) in your sleep!

More on this now at:
http://www.gallileus.info/gallileus/about/products/alerts/
++

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]



Re: new Lucene release: 1.4 RC3

2004-05-12 Thread Terry Steichen
I presume this still requires Java 1.4 to build, but will run with Java 1.3?

Regards,

Terry

- Original Message - 
From: Doug Cutting [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Tuesday, May 11, 2004 4:51 PM
Subject: new Lucene release: 1.4 RC3


 Version 1.4 RC3 of Lucene is available for download from:

 http://cvs.apache.org/dist/jakarta/lucene/v1.4-rc3/

 Changes are described at:


http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.85

 Doug








Re: new Lucene release: 1.4 RC3

2004-05-12 Thread Otis Gospodnetic
I don't recall any JDK 1.4 methods/classes being used, and I just saw
Doug replacing one AssertionError (1.4) with RuntimeException.

Are there some 1.4 dependencies I'm not aware of?

Otis






Re: new Lucene release: 1.4 RC3

2004-05-12 Thread Terry Steichen
Last time I checked, JDK 1.4 was needed to compile the classes implementing
the new sorting features.  Part of the issue was the inclusion of the regex
classes, but the other dependency had to do (as I recall) with some kind of
inner class constructs (that JDK 1.3 won't compile).  I believe that the
contributor, Tim Jones, fixed some of them to work with JDK 1.3, but to the
best of my knowledge, not the inner class stuff.

Regards,

Terry








Re: optimize fails with Negative seek offset

2004-05-12 Thread Anthony Vito
Looks like the same error I got when I tried to use Lucene version 1.3
to search an index I had created with Lucene version 1.4. The
versions are not forward compatible. Did you by chance create the index
with version 1.4 and are now searching with version 1.3? It's easy to
get the dependencies out of sync for different apps, which is what
happened to me.

-vito
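
One way to check for the kind of mismatch described above is to print which jar a class was actually loaded from, together with the version its manifest declares. The following is only a sketch: the class name LoadedVersion and the method describe are hypothetical, and it assumes the jar's manifest carries an Implementation-Version entry, which may not be the case for every build.

```java
import java.security.CodeSource;

/** Hypothetical diagnostic, not part of Lucene: reports where a class was loaded from. */
final class LoadedVersion {

    /** Returns a one-line description of the jar and manifest version behind className. */
    static String describe(String className) {
        Class<?> c;
        try {
            c = Class.forName(className);
        } catch (ClassNotFoundException e) {
            return className + " is not on the classpath";
        }
        Package p = c.getPackage();
        // Implementation-Version is read from the jar manifest and may be absent.
        String version = (p == null) ? null : p.getImplementationVersion();
        // Classes that belong to the JDK itself have no code source.
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String location = (src == null || src.getLocation() == null)
                ? "unknown" : src.getLocation().toString();
        return className + " version=" + (version == null ? "unknown" : version)
                + " from=" + location;
    }

    public static void main(String[] args) {
        // Default to a Lucene class; pass another class name as the first argument.
        String name = args.length > 0 ? args[0] : "org.apache.lucene.index.IndexWriter";
        System.out.println(describe(name));
    }
}
```

Running this once in the indexing application and once in the searching application should show whether both pick up the same jar.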







Re: optimize fails with Negative seek offset

2004-05-12 Thread Sascha Ottolski
On Wednesday, 12 May 2004 18:54, Anthony Vito wrote:
 Looks like the same error I got when I tried to use Lucene version
 1.3 to search on an index I had created with Lucene version 1.4. The
 versions are not forward compatible. Did you by chance create the
 index with version 1.4 and are now searching with version 1.3. It's
 easy to get the dependencies out of sync for different apps, which is
 what happened to me.

 -vito

Hi vito,

thanks for the reply, but no, we have only upgraded so far and did not
downgrade. Moreover, the failing index was rebuilt completely
with 1.4-rc2 only two weeks ago. The problem started a short time
afterwards (but not immediately).


Greets,

Sascha





RE: multivalue fields

2004-05-12 Thread Gerard Sychay
I don't know if it will help, but take a look at the following email and
enclosing thread from a few weeks ago.

http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgNo=7737

 Ryan Sonnek [EMAIL PROTECTED] 05/11/04 12:40PM 
using lucene 1.3-final, it appears to only search the first field with
that name.  here's the code i'm using to construct the index, and I'm
using Luke to check that the index is created correctly.  Everything
looks fine, but my search returns empty.  do i have to use a special
query to work with multivalue fields?  is there a testcase in the source
that performs this kind of work that I could look at?

//indexing
Document doc = new Document();

Iterator values = myValues.iterator();
while (values.hasNext()) {
    Object value = values.next();
    doc.add(Field.Keyword("test", value.toString()));
}

//searching
BooleanQuery query = new BooleanQuery();
Query fieldQuery = QueryParser.parse(searchValue, "test", ANALYZER);
query.add(fieldQuery, true, false);
 
Ryan


 -Original Message-
 From: Otis Gospodnetic [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, May 11, 2004 11:31 AM
 To: Lucene Users List
 Subject: Re: multivalue fields
 
 
 Just add multiple Fields with the exact same name.
 
 Otis
 
 --- Ryan Sonnek [EMAIL PROTECTED] wrote:
  How can I construct a document that has multiple values for one field
  (ex: locale en_US, de_DE, etc.)?  I've been concatenating the values
  into one string and storing them in one field, but I think this
  affects the search rankings (more text to search produces lower
  score).  Is it possible to append the separate values to the same
  field without concatenating them together?
  
  Ryan
  





Re: Mixing database and lucene searches

2004-05-12 Thread Glen Stampoultzis
I think I follow what you're saying.  Thanks Phil.

Regards,

Glen

Phil brunet [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
   -- Snip --
   If you can't guarantee a fixed number of Lucene results (and that is
   often the case!), a good way is to duplicate the last PK and so round
   up to a fixed number.

  Hi... I'm not sure what you mean by that last bit.

 Hi ... I'm going to try to express myself correctly ... in English :-)

 We were talking about the need to cross Lucene results with DB results, and
 that it could be a good idea to execute a query like:

 SELECT *
 FROM my_table
 WHERE
     1st criteria  // this criterion was not expressed in the Lucene query
 AND 2nd criteria  // this criterion was not expressed in the Lucene query
 AND ...
 AND my_pk IN (pk_value_1, pk_value_2, ..., pk_value_n);

 where the pk_values have been previously retrieved by the Lucene query.

 In the JDBC statement, using bind variables is a good way to avoid useless
 query parsing time.

 But if the number of pk_values retrieved by the Lucene query is different
 for each query, using bind variables will not avoid the query parsing time:
 the SQL query signature will be different, so the RDBMS will need to
 parse the query again.

 To bypass this problem, you can round up the number of bind variables.

 For example, you know that your Lucene queries will retrieve ... let's say
 ... a maximum of 1000 results.
 Sometimes only one result is retrieved => you have one pk_value
 Sometimes 5 results are retrieved      => you have five pk_values
 Sometimes etc.

 I suggest that in each case you duplicate the last pk_value in order to
 always have the same number of bind variables in the SQL statement.

 In my example, you will always have 1000 bind variables in the SQL
 statement, whether you had one, five or n results.

 Especially for short SQL queries, avoiding parsing time is really precious
 (I work with Oracle DB - sic!)
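
The padding idea described above can be sketched in Java roughly as follows. This is a minimal illustration, not code from the thread: the class PaddedInClause and its methods pad and sql are hypothetical names, the table and column names are taken from the example query, and primary keys are assumed to be numeric.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical helper: keeps the number of bind variables in IN (...) constant. */
final class PaddedInClause {

    /** Returns a copy of pks padded to exactly size entries by repeating the last key. */
    static List<Long> pad(List<Long> pks, int size) {
        if (pks.isEmpty()) {
            throw new IllegalArgumentException("need at least one primary key to pad");
        }
        if (pks.size() > size) {
            throw new IllegalArgumentException("more keys than bind slots: " + pks.size());
        }
        List<Long> padded = new ArrayList<>(pks);
        Long last = pks.get(pks.size() - 1);
        while (padded.size() < size) {
            padded.add(last); // a repeated key does not change the result of IN (...)
        }
        return padded;
    }

    /** Builds the statement text; it depends only on size, so it is the same on every call. */
    static String sql(int size) {
        String marks = String.join(", ", Collections.nCopies(size, "?"));
        return "SELECT * FROM my_table WHERE my_pk IN (" + marks + ")";
    }

    public static void main(String[] args) {
        List<Long> fromLucene = Arrays.asList(17L, 42L, 99L); // keys returned by the Lucene query
        System.out.println(sql(5));             // SELECT * FROM my_table WHERE my_pk IN (?, ?, ?, ?, ?)
        System.out.println(pad(fromLucene, 5)); // [17, 42, 99, 99, 99]
    }
}
```

The padded values would then be bound one by one with PreparedStatement.setLong(i + 1, padded.get(i)). Because the statement text never changes, the database can reuse the parsed statement regardless of how many hits Lucene returned.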








clean up html before indexing or add tags to ignore list

2004-05-12 Thread Sebastian Ho
Hi

This is a typical web crawler, indexing and search application
development. I have written my crawler and am planning to add Lucene next.
One question comes to mind: in terms of performance, should I clean up the
HTML by removing all tags before indexing, or add all tags to the
ignore list during the indexing/search stage?

Which is better?

Thanks

Sebastian Ho
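
For reference, the first option (cleaning up the HTML before indexing) could be tried with something like the sketch below. It is a deliberately naive, regex-based illustration with hypothetical names (HtmlText, strip); it does not decode entities such as &amp; and will not cope with malformed markup, where a real HTML parser would be the safer choice. It says nothing about which option performs better.

```java
import java.util.regex.Pattern;

/** Naive, hypothetical tag stripper for simple, well-formed pages. */
final class HtmlText {
    // Whole <script>...</script> and <style>...</style> blocks, contents included.
    private static final Pattern SCRIPT_OR_STYLE =
            Pattern.compile("(?is)<(script|style)\\b.*?</\\1\\s*>");
    // Any remaining tag, including comments without a '>' inside them.
    private static final Pattern TAG = Pattern.compile("(?s)<[^>]*>");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Returns the visible text of html with tags removed and whitespace collapsed. */
    static String strip(String html) {
        String text = SCRIPT_OR_STYLE.matcher(html).replaceAll(" ");
        text = TAG.matcher(text).replaceAll(" ");
        return WHITESPACE.matcher(text).replaceAll(" ").trim();
    }

    public static void main(String[] args) {
        System.out.println(strip("<p>Hello <b>Lucene</b></p>")); // Hello Lucene
    }
}
```

The resulting plain text is what would be handed to the analyzer, so no markup ends up in the index.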





lucene-1.4-rc2 and JVM version

2004-05-12 Thread Zhang, Lisheng
Hi,

We started to learn and use Lucene about 3 weeks ago;
it is really a great product! We have some problems with
certain JVM versions (Sun JDK). We are using lucene-1.4-rc2
on the Solaris 2.8 platform:

(1) We have a program to index about 230 documents. With
jdk1.4.1_02, our program often hung at

IndexWriter.addDocument(doc);

The document at which it hangs is essentially random.

My question is: are there any known issues with jdk1.4.1_02 and
lucene-1.4-rc2 (BUILD.txt says any JDK later than 1.2 is OK)?

(2) We also found that for a trivial search program, jdk1.3.0 would
crash, but jdk1.3.1_03 is OK (my search code is attached below).

When running on jdk1.3.0, I got the following message (at the line
calling IndexSearcher.search(...)):

#
# HotSpot Virtual Machine Error, Unexpected Signal 11
# Please report this error at
# http://java.sun.com/cgi-bin/bugreport.cgi
#
# Error ID: 4F533F534F4C415249530E435050079A 01
#
# Problematic Thread: prio=5 tid=0x29800 nid=0x1 runnable 
#

Is this a known problem with jdk1.3.0? The same program runs
fine with jdk1.3.1_03.

I would really appreciate any help and guidance on these two
issues.

Best regards, Lisheng

##
import java.io.*;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.*;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Searcher;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.Sort;
import org.apache.lucene.queryParser.QueryParser;

class UrSearch {
  private static void log(String msg)
  {
    System.out.println(msg);
  }

  public static void main(String[] argv)
  {
    try {
      Searcher searcher = new IndexSearcher("./myindex");
      Searcher[] searches = new Searcher[1];
      searches[0] = searcher;

      Analyzer analyzer = new StandardAnalyzer();

      Query query0 = simpleQuery(analyzer);

      log("Q=" + query0.toString());
      log("QueryClass=" + query0.getClass().toString());

      Sort sort = new Sort();

      // Crash on this line if jdk1.3.0 !!!
      Hits hits = searcher.search(query0, sort);

      log(hits.length() + " total matching documents");

      for(int i=0; i<hits.length(); i++) {
        Document doc = hits.doc(i);

        log("docid=" + doc.get("docid"));
        log("score=" + hits.score(i));
      }
      searcher.close();

    } catch (Exception ex) {
      log("EXTYPE: " + ex.getClass().getName());
      log("EXMSG: " + ex.getMessage());

      try {
        PrintWriter mout = new PrintWriter(new FileOutputStream("err.dat"),
                                           true);
        ex.printStackTrace(mout);
      } catch(FileNotFoundException newex) {
        log("TERRIBLE: " + newex.getMessage());
        System.exit(0);
      }
    }
  }

  static Query simpleQuery(Analyzer analyzer)
    throws Exception
  {
    Query q1 = QueryParser.parse("iepeditorial", "all", analyzer);

    return q1;
  }
}
##


