Re: How can I get started for investigating the source code of Lucene ?

2010-11-01 Thread lizhi

Hi Jeff

You can buy a book about Lucene, like "Lucene in Action".

lizhi


On 2010-11-1 13:43, Jeff Zhang wrote:

Hi all,

I'd like to study the source code of Lucene, but I found there are not
many documents about the internal structure of Lucene, and the classes
are so big that they are not very readable. Could anyone give me
suggestions on how to get started investigating the Lucene source code?
Any document or blog post would be good.

Thanks





-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926858#action_12926858 ]

Nico Krijnen commented on LUCENE-2729:
--

{code}
jteb:assetIndex jteb$ ls -la
total 41550832
drwxr-xr-x  2 jteb  jteb4862  1 nov 08:52 .
drwxr-xr-x  4 jteb  jteb 238 29 okt 14:10 ..
-rw-r--r--@ 1 jteb  jteb   21508  1 nov 08:52 .DS_Store
-rw-r--r--  1 jteb  jteb   969134416 18 okt 16:41 _2q.fdt
-rw-r--r--  1 jteb  jteb   36652 18 okt 16:41 _2q.fdx
-rw-r--r--  1 jteb  jteb 276 18 okt 16:41 _2q.fnm
-rw-r--r--  1 jteb  jteb 4685726 18 okt 16:41 _2q.frq
-rw-r--r--  1 jteb  jteb9166 18 okt 16:41 _2q.nrm
-rw-r--r--  1 jteb  jteb   393230403 18 okt 16:42 _2q.prx
-rw-r--r--  1 jteb  jteb7447 18 okt 16:42 _2q.tii
-rw-r--r--  1 jteb  jteb  746299 18 okt 16:42 _2q.tis
-rw-r--r--  1 jteb  jteb8394 18 okt 16:42 _2q.tvd
-rw-r--r--  1 jteb  jteb   599185081 18 okt 16:42 _2q.tvf
-rw-r--r--  1 jteb  jteb   73300 18 okt 16:42 _2q.tvx
-rw-r--r--  1 jteb  jteb  1595882722 18 okt 16:45 _3u.fdt
-rw-r--r--  1 jteb  jteb   63692 18 okt 16:45 _3u.fdx
-rw-r--r--  1 jteb  jteb 330 18 okt 16:45 _3u.fnm
-rw-r--r--  1 jteb  jteb 8001869 18 okt 16:45 _3u.frq
-rw-r--r--  1 jteb  jteb   15926 18 okt 16:45 _3u.nrm
-rw-r--r--  1 jteb  jteb   647374863 18 okt 16:45 _3u.prx
-rw-r--r--  1 jteb  jteb   11319 18 okt 16:45 _3u.tii
-rw-r--r--  1 jteb  jteb 1168399 18 okt 16:45 _3u.tis
-rw-r--r--  1 jteb  jteb   14209 18 okt 16:45 _3u.tvd
-rw-r--r--  1 jteb  jteb   986370136 18 okt 16:46 _3u.tvf
-rw-r--r--  1 jteb  jteb  127380 18 okt 16:46 _3u.tvx
-rw-r--r--  1 jteb  jteb  2691565961 18 okt 16:49 _4c.fdt
-rw-r--r--  1 jteb  jteb   39572 18 okt 16:49 _4c.fdx
-rw-r--r--  1 jteb  jteb 276 18 okt 16:49 _4c.fnm
-rw-r--r--  1 jteb  jteb18724620 18 okt 16:49 _4c.frq
-rw-r--r--  1 jteb  jteb9896 18 okt 16:49 _4c.nrm
-rw-r--r--  1 jteb  jteb   590255960 18 okt 16:50 _4c.prx
-rw-r--r--  1 jteb  jteb  141243 18 okt 16:50 _4c.tii
-rw-r--r--  1 jteb  jteb12185869 18 okt 16:50 _4c.tis
-rw-r--r--  1 jteb  jteb9894 18 okt 16:50 _4c.tvd
-rw-r--r--  1 jteb  jteb   932649779 18 okt 16:51 _4c.tvf
-rw-r--r--  1 jteb  jteb   79140 18 okt 16:51 _4c.tvx
-rw-r--r--  1 jteb  jteb  2398908136 18 okt 16:52 _4d.fdt
-rw-r--r--  1 jteb  jteb 548 18 okt 16:52 _4d.fdx
-rw-r--r--  1 jteb  jteb 354 18 okt 16:52 _4d.fnm
-rw-r--r--  1 jteb  jteb24581614 18 okt 16:52 _4d.frq
-rw-r--r--  1 jteb  jteb 140 18 okt 16:52 _4d.nrm
-rw-r--r--  1 jteb  jteb   158243133 18 okt 16:52 _4d.prx
-rw-r--r--  1 jteb  jteb  141948 18 okt 16:52 _4d.tii
-rw-r--r--  1 jteb  jteb12259425 18 okt 16:52 _4d.tis
-rw-r--r--  1 jteb  jteb 140 18 okt 16:52 _4d.tvd
-rw-r--r--  1 jteb  jteb   303769970 18 okt 16:53 _4d.tvf
-rw-r--r--  1 jteb  jteb1092 18 okt 16:53 _4d.tvx
-rw-r--r--  1 jteb  jteb  4118409126 29 okt 16:26 _6g.fdt
-rw-r--r--  1 jteb  jteb1484 29 okt 16:26 _6g.fdx
-rw-r--r--  1 jteb  jteb 384 29 okt 16:17 _6g.fnm
-rw-r--r--  1 jteb  jteb35294399 29 okt 16:27 _6g.frq
-rw-r--r--  1 jteb  jteb 374 29 okt 16:27 _6g.nrm
-rw-r--r--  1 jteb  jteb   230791431 29 okt 16:27 _6g.prx
-rw-r--r--  1 jteb  jteb  143860 29 okt 16:27 _6g.tii
-rw-r--r--  1 jteb  jteb12491845 29 okt 16:27 _6g.tis
-rw-r--r--  1 jteb  jteb 295 29 okt 16:28 _6g.tvd
-rw-r--r--  1 jteb  jteb   444939185 29 okt 16:28 _6g.tvf
-rw-r--r--  1 jteb  jteb2964 29 okt 16:28 _6g.tvx
-rw-r--r--  1 jteb  jteb  2758122671 29 okt 16:31 _6h.fdt
-rw-r--r--  1 jteb  jteb   96388 29 okt 16:31 _6h.fdx
-rw-r--r--  1 jteb  jteb 723 29 okt 16:29 _6h.fnm
-rw-r--r--  1 jteb  jteb51142700 29 okt 16:31 _6h.frq
-rw-r--r--  1 jteb  jteb   24100 29 okt 16:31 _6h.nrm
-rw-r--r--  1 jteb  jteb   189178767 29 okt 16:31 _6h.prx
-rw-r--r--  1 jteb  jteb  270472 29 okt 16:31 _6h.tii
-rw-r--r--  1 jteb  jteb21710405 29 okt 16:31 _6h.tis
-rw-r--r--  1 jteb  jteb   23873 29 okt 16:31 _6h.tvd
-rw-r--r--  1 jteb  jteb   394088075 29 okt 16:31 _6h.tvf
-rw-r--r--  1 jteb  jteb  192772 29 okt 16:31 _6h.tvx
-rw-r--r--  1 jteb  jteb   0 29 okt 20:22 _8b.fnm
-rw-r--r--  1 jteb  jteb   0 29 okt 20:26 _8b.tvd
-rw-r--r--  1 jteb  jteb   0 29 okt 20:26 _8b.tvf
-rw-r--r--  1 jteb  jteb   0 29 okt 20:22 _8c.fdt
-rw-r--r--  1 jteb  jteb   0 29 okt 20:22 _8c.fdx
-rw-r--r--  1 jteb  jteb   0 29 okt 20:26 _8c.frq
-rw-r--r--  1 jteb  jteb   0 29 okt 20:24 _8c.tii
-rw-r--r--  1 jteb  jteb   0 29 okt 20:24 _8c.tis
-rw-r--r--  1 jteb  jteb   0 29 okt 20:28 _8c.tvf
-rw-r--r--  1 jteb  jteb   0 29 okt 20:30 _8c.tvx
-rw-r--r--  1 jteb  jteb   0 29 okt 20:24 _8d.fdt
-rw-r--r--  1 jteb  jteb   0 29 okt 20:25 _8d.fdx
{code}

Solr-trunk - Build # 1299 - Still Failing

2010-11-01 Thread Apache Hudson Server
Build: http://hudson.zones.apache.org/hudson/job/Solr-trunk/1299/

All tests passed

Build Log (for compile errors):
[...truncated 16288 lines...]






Lucene-Solr-tests-only-3.x - Build # 832 - Failure

2010-11-01 Thread Apache Hudson Server
Build: http://hudson.zones.apache.org/hudson/job/Lucene-Solr-tests-only-3.x/832/

1 tests failed.
REGRESSION:  
org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads

Error Message:
IndexFileDeleter doesn't know about file _3i.tvx

Stack Trace:
junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file 
_3i.tvx
at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:4336)
at 
org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:4383)
at 
org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:3159)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3232)
at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:3203)
at 
org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:200)
at 
org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:253)




Build Log (for compile errors):
[...truncated 8588 lines...]






[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926859#action_12926859 ]

Nico Krijnen commented on LUCENE-2729:
--

A second file listing from another test run, with the same result: read past EOF

{code}
jteb:assetIndex jteb$ ls -la
total 38739848
drwxr-xr-x  2 jteb  jteb4964 26 okt 11:51 .
drwxr-xr-x  3 jteb  jteb 204 22 okt 11:42 ..
-rw-r--r--  1 jteb  jteb   969134416 18 okt 16:41 _2q.fdt
-rw-r--r--  1 jteb  jteb   36652 18 okt 16:41 _2q.fdx
-rw-r--r--  1 jteb  jteb 276 18 okt 16:41 _2q.fnm
-rw-r--r--  1 jteb  jteb 4685726 18 okt 16:41 _2q.frq
-rw-r--r--  1 jteb  jteb9166 18 okt 16:41 _2q.nrm
-rw-r--r--  1 jteb  jteb   393230403 18 okt 16:42 _2q.prx
-rw-r--r--  1 jteb  jteb7447 18 okt 16:42 _2q.tii
-rw-r--r--  1 jteb  jteb  746299 18 okt 16:42 _2q.tis
-rw-r--r--  1 jteb  jteb8394 18 okt 16:42 _2q.tvd
-rw-r--r--  1 jteb  jteb   599185081 18 okt 16:42 _2q.tvf
-rw-r--r--  1 jteb  jteb   73300 18 okt 16:42 _2q.tvx
-rw-r--r--  1 jteb  jteb  2061261675 18 okt 16:44 _39.fdt
-rw-r--r--  1 jteb  jteb1012 18 okt 16:44 _39.fdx
-rw-r--r--  1 jteb  jteb 276 18 okt 16:44 _39.fnm
-rw-r--r--  1 jteb  jteb17754579 18 okt 16:44 _39.frq
-rw-r--r--  1 jteb  jteb 256 18 okt 16:44 _39.nrm
-rw-r--r--  1 jteb  jteb   121067407 18 okt 16:44 _39.prx
-rw-r--r--  1 jteb  jteb  137511 18 okt 16:44 _39.tii
-rw-r--r--  1 jteb  jteb11726653 18 okt 16:44 _39.tis
-rw-r--r--  1 jteb  jteb 185 18 okt 16:44 _39.tvd
-rw-r--r--  1 jteb  jteb   233037042 18 okt 16:44 _39.tvf
-rw-r--r--  1 jteb  jteb2020 18 okt 16:44 _39.tvx
-rw-r--r--  1 jteb  jteb  1595882722 18 okt 16:45 _3u.fdt
-rw-r--r--  1 jteb  jteb   63692 18 okt 16:45 _3u.fdx
-rw-r--r--  1 jteb  jteb 330 18 okt 16:45 _3u.fnm
-rw-r--r--  1 jteb  jteb 8001869 18 okt 16:45 _3u.frq
-rw-r--r--  1 jteb  jteb   15926 18 okt 16:45 _3u.nrm
-rw-r--r--  1 jteb  jteb   647374863 18 okt 16:45 _3u.prx
-rw-r--r--  1 jteb  jteb   11319 18 okt 16:45 _3u.tii
-rw-r--r--  1 jteb  jteb 1168399 18 okt 16:45 _3u.tis
-rw-r--r--  1 jteb  jteb   14209 18 okt 16:45 _3u.tvd
-rw-r--r--  1 jteb  jteb   986370136 18 okt 16:46 _3u.tvf
-rw-r--r--  1 jteb  jteb  127380 18 okt 16:46 _3u.tvx
-rw-r--r--  1 jteb  jteb  2057147455 18 okt 16:47 _3v.fdt
-rw-r--r--  1 jteb  jteb 476 18 okt 16:47 _3v.fdx
-rw-r--r--  1 jteb  jteb 384 18 okt 16:47 _3v.fnm
-rw-r--r--  1 jteb  jteb1520 18 okt 16:47 _3v.frq
-rw-r--r--  1 jteb  jteb 122 18 okt 16:47 _3v.nrm
-rw-r--r--  1 jteb  jteb   109724024 18 okt 16:47 _3v.prx
-rw-r--r--  1 jteb  jteb  132491 18 okt 16:47 _3v.tii
-rw-r--r--  1 jteb  jteb11457688 18 okt 16:47 _3v.tis
-rw-r--r--  1 jteb  jteb 114 18 okt 16:47 _3v.tvd
-rw-r--r--  1 jteb  jteb   211902147 18 okt 16:48 _3v.tvf
-rw-r--r--  1 jteb  jteb 948 18 okt 16:48 _3v.tvx
-rw-r--r--  1 jteb  jteb  2691565961 18 okt 16:49 _4c.fdt
-rw-r--r--  1 jteb  jteb   39572 18 okt 16:49 _4c.fdx
-rw-r--r--  1 jteb  jteb 276 18 okt 16:49 _4c.fnm
-rw-r--r--  1 jteb  jteb18724620 18 okt 16:49 _4c.frq
-rw-r--r--  1 jteb  jteb9896 18 okt 16:49 _4c.nrm
-rw-r--r--  1 jteb  jteb   590255960 18 okt 16:50 _4c.prx
-rw-r--r--  1 jteb  jteb  141243 18 okt 16:50 _4c.tii
-rw-r--r--  1 jteb  jteb12185869 18 okt 16:50 _4c.tis
-rw-r--r--  1 jteb  jteb9894 18 okt 16:50 _4c.tvd
-rw-r--r--  1 jteb  jteb   932649779 18 okt 16:51 _4c.tvf
-rw-r--r--  1 jteb  jteb   79140 18 okt 16:51 _4c.tvx
-rw-r--r--  1 jteb  jteb  2398908136 18 okt 16:52 _4d.fdt
-rw-r--r--  1 jteb  jteb 548 18 okt 16:52 _4d.fdx
-rw-r--r--  1 jteb  jteb 354 18 okt 16:52 _4d.fnm
-rw-r--r--  1 jteb  jteb24581614 18 okt 16:52 _4d.frq
-rw-r--r--  1 jteb  jteb 140 18 okt 16:52 _4d.nrm
-rw-r--r--  1 jteb  jteb   158243133 18 okt 16:52 _4d.prx
-rw-r--r--  1 jteb  jteb  141948 18 okt 16:52 _4d.tii
-rw-r--r--  1 jteb  jteb12259425 18 okt 16:52 _4d.tis
-rw-r--r--  1 jteb  jteb 140 18 okt 16:52 _4d.tvd
-rw-r--r--  1 jteb  jteb   303769970 18 okt 16:53 _4d.tvf
-rw-r--r--  1 jteb  jteb1092 18 okt 16:53 _4d.tvx
-rw-r--r--  1 jteb  jteb  1081212027 18 okt 16:53 _4p.fdt
-rw-r--r--  1 jteb  jteb 212 18 okt 16:53 _4p.fdx
-rw-r--r--  1 jteb  jteb 354 18 okt 16:53 _4p.fnm
-rw-r--r--  1 jteb  jteb 8294102 18 okt 16:53 _4p.frq
-rw-r--r--  1 jteb  jteb  56 18 okt 16:53 _4p.nrm
-rw-r--r--  1 jteb  jteb60513257 18 okt 16:53 _4p.prx
-rw-r--r--  1 jteb  jteb  134898 18 okt 16:53 _4p.tii
-rw-r--r--  1 jteb  jteb11376730 18 okt 16:53 _4p.tis
-rw-r--r--  1 jteb  jteb  56 18 okt 16:53 _4p.tvd
-rw-r--r--  1 jteb  jteb   116715012 18 okt 16:53 _4p.tvf
-rw-r--r--  1 jteb  jteb 420 18 okt 16:53 _4p.tvx
-rw-r--r--  1 jteb  jteb   787581180 18 okt 16:54 
{code}

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Michael McCandless (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926907#action_12926907 ]

Michael McCandless commented on LUCENE-2729:


That long string of length 0 files is very bizarre.

Was there no original root cause here?  Eg disk full?

Or is the read past EOF on closing an IndexReader w/ pending deletes really 
the first exception you see?

Does zoie somehow touch the index files?  Taking a backup is fundamentally a 
read-only op on the index, so that process shouldn't by itself truncate index 
files.

Something is somehow reaching in and zeroing out these files.  I don't think 
Lucene itself would do this.  For example, the series of _6i.XXX zero'd 
files... Lucene writes these files roughly in sequence, so if something bad 
happened in writing the postings, then the .nrm file should not even exist.

So we need to figure out who is truncating these files...
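
A quick way to see which files have been truncated, as in the directory listings above, is to scan the index directory for zero-length files and group them by segment. This is a hypothetical diagnostic helper written for this discussion, not part of Lucene or zoie:

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class ZeroLengthFiles {
    // Returns the names of all zero-length files in an index directory,
    // e.g. the _8b.* / _8c.* files in the listings above.
    public static List<String> find(File indexDir) {
        List<String> zeroed = new ArrayList<String>();
        File[] files = indexDir.listFiles();
        if (files != null) {
            for (File f : files) {
                if (f.isFile() && f.length() == 0) {
                    zeroed.add(f.getName());
                }
            }
        }
        return zeroed;
    }

    public static void main(String[] args) {
        for (String name : find(new File(args[0]))) {
            System.out.println("zero-length: " + name);
        }
    }
}
```

Running this periodically during a reproduction run would narrow down when the truncation happens relative to the backup.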


 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 --

 Key: LUCENE-2729
 URL: https://issues.apache.org/jira/browse/LUCENE-2729
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0.2
 Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
 Integrated with zoie (using a zoie snapshot from 2010-08-06: 
 zoie-2.0.0-snapshot-20100806.jar).
Reporter: Nico Krijnen

 We have a system running lucene and zoie. We use lucene as a content store 
 for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
 backups of the index. This works fine for small indexes and when there are 
 not a lot of changes to the index when the backup is made.
 On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
 is being changed a lot (lots of document additions and/or deletions), we 
 almost always get a 'read past EOF' at some point, followed by lots of 'Lock 
 obtain timed out'.
 At that point we get lots of 0 kb files in the index, data gets lost, and the 
 index is unusable.
 When we stop our server, remove the 0 kb files and restart our server, the 
 index is operational again, but data has been lost.
 I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. 
 Hopefully someone has some ideas where to look to fix this.
 Some more details...
 Stack trace of the read past EOF and following Lock obtain timed out:
 {code}
 78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
 ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
 java.io.IOException: read past EOF
 at 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
 at 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
 at 
 org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
 at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
 at 
 org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:166)
 at 
 org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
 at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
 at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
 at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
 at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
 at 
 proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
 at 
 proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
 at 
 proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
 at 
 proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
 at 
 proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
 at 
 proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
 579336 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
 ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - Problem 
 copying segments: Lock obtain timed out: 
 org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
 org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
 org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
 at org.apache.lucene.store.Lock.obtain(Lock.java:84)
 at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
 at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:957)
 at 
 

[jira] Updated: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

 [ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nico Krijnen updated LUCENE-2729:
-

Description: 
We have a system running lucene and zoie. We use lucene as a content store for 
a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
backups of the index. This works fine for small indexes and when there are not 
a lot of changes to the index when the backup is made.

On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
is being changed a lot (lots of document additions and/or deletions), we almost 
always get a 'read past EOF' at some point, followed by lots of 'Lock obtain 
timed out'.
At that point we get lots of 0 kb files in the index, data gets lost, and the 
index is unusable.

When we stop our server, remove the 0 kb files and restart our server, the 
index is operational again, but data has been lost.

I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. 
Hopefully someone has some ideas where to look to fix this.


Some more details...

Stack trace of the read past EOF and following Lock obtain timed out:

{code}
78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] ERROR 
proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
at 
org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:166)
at 
org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
at 
proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
at 
proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
at 
proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
at 
proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
579336 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
Problem copying segments: Lock obtain timed out: 
org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: 
org.apache.lucene.store.singleinstancel...@5ad0b895: write.lock
at org.apache.lucene.store.Lock.obtain(Lock.java:84)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1060)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:957)
at 
proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
at 
proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
at 
proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
at 
proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
at 
proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
{code}

We get exactly the same behaviour on both OS X and on Windows. On both, zoie is 
using a SimpleFSDirectory.
We also use a SingleInstanceLockFactory (since our process is the only one 
working with the index), but we get the same behaviour with a NativeFSLock.
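
Lucene's native lock factory is built on java.nio file locking, and within a single JVM a second attempt to lock the same file fails immediately, which is the mechanism behind the "Lock obtain timed out" errors once something is still holding write.lock. A minimal stdlib illustration of that idea (not Lucene's actual implementation):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;

public class WriteLockDemo {
    // Try to acquire an exclusive lock on a lock file; returns null if
    // another holder (this JVM or another process) already owns it.
    public static FileLock tryObtain(RandomAccessFile raf) throws IOException {
        try {
            return raf.getChannel().tryLock();
        } catch (OverlappingFileLockException e) {
            // The same JVM already holds a lock on this file.
            return null;
        }
    }
}
```

If the writer that holds write.lock never closes (for example because it died on the earlier 'read past EOF'), every subsequent open attempt times out exactly as in the log above.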

The snapshot backup is being made by calling:

*proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel)*

Same issue in zoie JIRA:

http://snaprojects.jira.com/browse/ZOIE-51

  was:
We have a system running lucene and zoie. We use lucene as a content store for 
a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
backups of the index. This works fine for small indexes and when there are not 
a lot of changes to the index when the backup is made.

On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
is being 

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926908#action_12926908 ]

Nico Krijnen commented on LUCENE-2729:
--

bq. Was there no original root cause here? Eg disk full?

This was one of the first things I thought of, but the disk has more than 
enough free space: 200 GB. Also, for this test we write the backup to a 
different disk, both for better performance and to prevent the disk with the 
index on it from running out of free space.

bq. Or is the read past EOF on closing an IndexReader w/ pending deletes 
really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional 
logging but we have not been able to find anything weird happening before 
this error. I do expect something weird must have happened to cause the 'read 
past EOF'.

Do you have any clues as to what we could look for? That might narrow the 
search.
We are able to consistently reproduce this in our test environment, so if you 
have clues about specific debug logging that should be turned on, we can do 
another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. As far as I can see, the basic backup procedure is to 
grab the last 'commit snapshot', prevent it from being deleted 
(ZoieIndexDeletionPolicy), and write all the files mentioned in the commit 
snapshot to a NIO WritableByteChannel.
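
Conceptually, that export step is just a sequential copy of each pinned snapshot file into the channel. A minimal stdlib sketch of the idea (the class and method names here are ours, not zoie's):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class SnapshotExport {
    // Copy each file of a pinned commit snapshot into the target channel.
    // A real exporter would also write a header with file names and lengths
    // so the stream can be unpacked again later.
    public static long writeTo(File[] snapshotFiles, WritableByteChannel out)
            throws IOException {
        long total = 0;
        for (File f : snapshotFiles) {
            FileInputStream in = new FileInputStream(f);
            try {
                FileChannel ch = in.getChannel();
                long pos = 0, size = ch.size();
                while (pos < size) {
                    pos += ch.transferTo(pos, size - pos, out);
                }
                total += size;
            } finally {
                in.close();
            }
        }
        return total;
    }
}
```

Because the deletion policy pins the commit, the snapshot's files should not change during the copy; the zero-length files reported in this issue suggest something is violating that read-only assumption.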

 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 --

 Key: LUCENE-2729
 URL: https://issues.apache.org/jira/browse/LUCENE-2729
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0.2
 Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
 Integrated with zoie (using a zoie snapshot from 2010-08-06: 
 zoie-2.0.0-snapshot-20100806.jar).
Reporter: Nico Krijnen

 We have a system running lucene and zoie. We use lucene as a content store 
 for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
 backups of the index. This works fine for small indexes and when there are 
 not a lot of changes to the index when the backup is made.
 On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
 is being changed a lot (lots of document additions and/or deletions), we 
 almost always get a 'read past EOF' at some point, followed by lots of 'Lock 
 obtain timed out'.
 At that point we get lots of 0 kb files in the index, data gets lost, and the 
 index is unusable.
 When we stop our server, remove the 0 kb files and restart our server, the 
 index is operational again, but data has been lost.
 I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. 
 Hopefully someone has some ideas where to look to fix this.
 Some more details...
 Stack trace of the read past EOF and following Lock obtain timed out:
 {code}
 78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
 ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
 java.io.IOException: read past EOF
 at 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
 at 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
 at 
 org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
 at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
 at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
 at 
 org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:166)
 at 
 org.apache.lucene.index.DirectoryReader.doCommit(DirectoryReader.java:725)
 at org.apache.lucene.index.IndexReader.commit(IndexReader.java:987)
 at org.apache.lucene.index.IndexReader.commit(IndexReader.java:973)
 at org.apache.lucene.index.IndexReader.decRef(IndexReader.java:162)
 at org.apache.lucene.index.IndexReader.close(IndexReader.java:1003)
 at 
 proj.zoie.impl.indexing.internal.BaseSearchIndex.deleteDocs(BaseSearchIndex.java:203)
 at 
 proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:223)
 at 
 proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
 at 
 proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
 at 
 proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:171)
 at 
 proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:373)
 579336 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
 ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
 Problem copying 

[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12926908#action_12926908 ]

Nico Krijnen edited comment on LUCENE-2729 at 11/1/10 6:52 AM:
---

bq. Was there no original root cause here? Eg disk full?

This was one of the first things I thought of, but the disk has more than 
enough free space: 200 GB. Also, for this test we write the backup to a 
different disk, both for better performance and to prevent the disk with the 
index on it from running out of free space.

bq. Or is the read past EOF on closing an IndexReader w/ pending deletes 
really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional 
logging but we have not been able to find anything weird happening before this 
error. I do expect something weird must have happened to cause the 'read past 
EOF'.

Do you have any clues as to what we could look for? That might narrow the 
search.
We are able to consistently reproduce this in our test environment, so if you 
have clues about specific debug logging that should be turned on, we can do 
another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. As far as I can see, the basic backup procedure is to 
grab the last 'commit snapshot', prevent it from being deleted 
(ZoieIndexDeletionPolicy), and write all the files mentioned in the commit 
snapshot to a NIO WritableByteChannel 
(proj.zoie.impl.indexing.internal.DiskIndexSnapshot#writeTo) - we call 
proj.zoie.impl.indexing.ZoieSystem.exportSnapshot(WritableByteChannel) 
ourselves.

  was (Author: nkrijnen):
bq. Was there no original root cause here? Eg disk full?

This was one of the first things i thought, but the disk has more than enough 
free space: 200GB. Also, for this test we write the backup to a different disk 
- both for better performance and to prevent the disk with the index on it from 
running out of free space.

bq. Or is the read past EOF on closing an IndexReader w/ pending deletes 
really the first exception you see?

It is the first exception we see. We turned on quite a bit of additional 
logging but we have not been able to find out anything weird happening before 
this error. I do expect something weird must have happened to cause the 'read 
past EOF'.

Do you have any clues as to what we could look for? - that might narrow the 
search.
We are able to consistently reproduce this on our test environment. So if you 
have clues to specific debug logging that should be turned on - we can do 
another test run.

bq. Does zoie somehow touch the index files?

We'll try to find out. For as far as I see the basic backup procedure is to 
grab the last 'commit snapshot', prevent it from being deleted 
(ZoieIndexDeletionPolicy), and write all the files mentioned in the commit 
snapshot to a NIO WritableByteChannel.
  
 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 --

 Key: LUCENE-2729
 URL: https://issues.apache.org/jira/browse/LUCENE-2729
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0.2
 Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
 Integrated with zoie (using a zoie snapshot from 2010-08-06: 
 zoie-2.0.0-snapshot-20100806.jar).
Reporter: Nico Krijnen

 We have a system running lucene and zoie. We use lucene as a content store 
 for a CMS/DAM system. We use the hot-backup feature of zoie to make scheduled 
 backups of the index. This works fine for small indexes and when there are 
 not a lot of changes to the index when the backup is made.
 On large indexes (about 5 GB to 19 GB), when a backup is made while the index 
 is being changed a lot (lots of document additions and/or deletions), we 
 almost always get a 'read past EOF' at some point, followed by lots of 'Lock 
 obtain timed out'.
 At that point we get lots of 0 kb files in the index, data gets lost, and the 
 index is unusable.
 When we stop our server, remove the 0 kb files and restart our server, the 
 index is operational again, but data has been lost.
 I'm not sure if this is a zoie or a lucene issue, so I'm posting it to both. 
 Hopefully someone has some ideas where to look to fix this.
 Some more details...
 Stack trace of the read past EOF and following Lock obtain timed out:
 {code}
 78307 [proj.zoie.impl.indexing.internal.realtimeindexdataloa...@31ca5085] 
 ERROR proj.zoie.impl.indexing.internal.BaseSearchIndex - read past EOF
 java.io.IOException: read past EOF
 at 
 org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
 at 
 org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
 at 
 

Re: How can I get started for investigating the source code of Lucene ?

2010-11-01 Thread mark harwood
Here's a rough overview I mapped out as a sequence diagram for the search side 
of things some time ago:  http://goo.gl/lE6a


- Original Message 
From: Jeff Zhang zjf...@gmail.com
To: dev@lucene.apache.org
Sent: Mon, 1 November, 2010 5:43:08
Subject: How can I get started for investigating the source code of Lucene ?

Hi all,

I'd like to study the source code of Lucene, but I found there are not
many documents about the internal structure of Lucene, and the classes
are so big that they are not very readable. Could anyone give me
suggestions on how to get started investigating the Lucene source code?
Any document or blog post would be good.

Thanks


-- 
Best Regards

Jeff Zhang

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2202) Money FieldType

2010-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926974#action_12926974
 ] 

Robert Muir commented on SOLR-2202:
---

Greg, one more nitpick:

I think the reloadCurrencyConfig could be improved:
# It seems to use the resource loader to read the xml file into a String 
line-by-line, but then concatenates all the lines and converts them back into a 
byte array, just to get an input stream.
# It uses a charset of UTF8 (should be UTF-8).

I think easier/safer would be to just get an InputStream directly from the 
resource loader (ResourceLoader.openResource) without this encoding conversion.
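Robert's point can be sketched with plain java.io (everything below is an illustrative stand-in, not the patch's actual code; only `ResourceLoader.openResource` is a real Solr call, and it is not used here). The line-by-line detour decodes and re-encodes the bytes for no benefit, and `readLine` silently drops the line terminators along the way:

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

public class ResourceRoundTrip {
    // The pattern being criticized: decode bytes to lines, concatenate,
    // then re-encode just to obtain an InputStream again.
    static InputStream viaStringDetour(InputStream raw) throws IOException {
        BufferedReader reader = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8));
        StringBuilder sb = new StringBuilder();
        String line;
        while ((line = reader.readLine()) != null) {
            sb.append(line);  // readLine strips the line terminators: data changes
        }
        byte[] bytes = sb.toString().getBytes(StandardCharsets.UTF_8);
        return new ByteArrayInputStream(bytes);
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<rates>\n  <rate from=\"USD\" to=\"EUR\"/>\n</rates>\n"
                .getBytes(StandardCharsets.UTF_8);
        InputStream detoured = viaStringDetour(new ByteArrayInputStream(xml));
        int n = 0;
        while (detoured.read() != -1) n++;
        // The detour dropped the newline bytes, so fewer bytes survive the trip.
        System.out.println(n < xml.length);
    }
}
```

The suggested fix avoids all of this: hand the raw stream (from ResourceLoader.openResource) straight to the XML parser and let it handle encoding detection itself.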


 Money FieldType
 ---

 Key: SOLR-2202
 URL: https://issues.apache.org/jira/browse/SOLR-2202
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.5
Reporter: Greg Fodor
 Attachments: SOLR-2022-solr-3.patch, SOLR-2202-lucene-1.patch, 
 SOLR-2202-solr-1.patch, SOLR-2202-solr-2.patch, SOLR-2202-solr-4.patch, 
 SOLR-2202-solr-5.patch


 Attached please find patches to add support for monetary values to 
 Solr/Lucene with query-time currency conversion. The following features are 
 supported:
 - Point queries (ex: price:4.00USD)
 - Range queries (ex: price:[$5.00 TO $10.00])
 - Sorting.
 - Currency parsing by either currency code or symbol.
 - Symmetric & asymmetric exchange rates. (Asymmetric exchange rates are 
 useful if there are fees associated with exchanging the currency.)
 At indexing time, money fields can be indexed in a native currency. For 
 example, if a product on an e-commerce site is listed in Euros, indexing the 
 price field as 10.00EUR will index it appropriately. By altering the 
 currency.xml file, the sorting and querying against Solr can take into 
 account fluctuations in currency exchange rates without having to re-index 
 the documents.
 The new money field type is a polyfield which indexes two fields, one which 
 contains the amount of the value and another which contains the currency code 
 or symbol. The currency metadata (names, symbols, codes, and exchange rates) 
 are expected to be in an xml file which is pointed to by the field type 
 declaration in the schema.xml.
 The current patch is factored such that Money utility functions and 
 configuration metadata lie in Lucene (see MoneyUtil and CurrencyConfig), 
 while the MoneyType and MoneyValueSource lie in Solr. This was meant to 
 mirror the work being done on the spatial field types.
 This patch has not yet been deployed to production but will be getting used 
 to power the international search capabilities of the search engine at Etsy.
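The asymmetric-rate idea described above can be sketched independently of the patch (class and method names here are illustrative, not the ones in SOLR-2202; amounts are kept in long cents to avoid floating-point money arithmetic):

```java
import java.util.HashMap;
import java.util.Map;

public class AsymmetricRates {
    // Key the table on the (from, to) pair so each direction carries its
    // own rate, which is how a fee on currency exchange can be modeled.
    private final Map<String, Double> rates = new HashMap<>();

    void setRate(String from, String to, double rate) {
        rates.put(from + "/" + to, rate);
    }

    long convertCents(long amountCents, String from, String to) {
        Double r = rates.get(from + "/" + to);
        if (r == null) {
            throw new IllegalArgumentException("no rate for " + from + "->" + to);
        }
        return Math.round(amountCents * r);
    }

    public static void main(String[] args) {
        AsymmetricRates er = new AsymmetricRates();
        er.setRate("USD", "EUR", 0.72);  // hypothetical rates,
        er.setRate("EUR", "USD", 1.35);  // deliberately not inverses of each other
        System.out.println(er.convertCents(1000, "USD", "EUR")); // 10.00 USD in EUR cents
        System.out.println(er.convertCents(1000, "EUR", "USD")); // 10.00 EUR in USD cents
    }
}
```

With symmetric rates the second rate would simply be 1/0.72; storing both directions separately is what lets exchange fees be folded in.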

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2725) Bengali Analyzer for Lucene has been Developed

2010-11-01 Thread Ahmed Chisty (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926981#action_12926981
 ] 

Ahmed Chisty commented on LUCENE-2725:
--

I have extended the analyzer class and used some rules of the Bengali language's 
grammar. I first tokenized Bengali strings, then removed the stop words, and 
finally filtered them.

I have worked with this analyzer in some projects, and it works fine. How can I 
contribute it to the Lucene API? Can anyone give me a solution/way?
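The three stages described here (tokenize, remove stop words, filter) are the standard analyzer pipeline. A self-contained sketch of that flow, deliberately using plain Java and an English stop list rather than the actual Lucene `Analyzer` API or a real Bengali word list:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class PipelineSketch {
    // Hypothetical stop list; a real Bengali analyzer would use Bengali words.
    static final Set<String> STOP = new HashSet<>(Arrays.asList("the", "a", "is"));

    static List<String> analyze(String text) {
        List<String> out = new ArrayList<>();
        for (String tok : text.split("\\s+")) {       // 1. tokenize on whitespace
            String t = tok.toLowerCase();             // 2. filter/normalize the token
            if (!t.isEmpty() && !STOP.contains(t)) {  // 3. drop stop words
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The index IS a store"));
    }
}
```

In Lucene itself each stage would be a Tokenizer or TokenFilter chained inside an Analyzer, which is what the HowToContribute link below covers.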


 Bengali Analyzer for Lucene has been Developed
 --

 Key: LUCENE-2725
 URL: https://issues.apache.org/jira/browse/LUCENE-2725
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.0.1
 Environment: Environment Independent
Reporter: Ahmed Chisty
 Fix For: 3.1


 Hi everyone,
 I am a CSE student of SUST, Sylhet (http://www.sust.edu/).
 I have noticed that there is no Bengali Analyzer in Lucene for Bengali
 text search and highlighting. I have used the StandardAnalyzer and others, but
 they do not give good results.
 So, I have developed a Bengali Analyzer. I have tested it on 50 thousand
 documents, and it is being used in the Ekushe Finance Search Engine
 (http://efinance.com.bd/).
 Please give me some instructions so that I can contribute the analyzer to
 Lucene.
 Thanks.



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2725) Bengali Analyzer for Lucene has been Developed

2010-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12926985#action_12926985
 ] 

Robert Muir commented on LUCENE-2725:
-

Ahmed: please see http://wiki.apache.org/lucene-java/HowToContribute




-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Lucene-Solr-tests-only-trunk - Build # 858 - Failure

2010-11-01 Thread Apache Hudson Server
Build: 
http://hudson.zones.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/858/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:878)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:844)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:437)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




Build Log (for compile errors):
[...truncated 8709 lines...]



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



Add MERGEINDEXES action to CoreAdmin wiki page?

2010-11-01 Thread Eric Pugh
Shouldn't the MERGEINDEXES action be listed on the 
http://wiki.apache.org/solr/CoreAdmin wiki page?   With maybe a link back to 
http://wiki.apache.org/solr/MergingSolrIndexes#Merging_Through_CoreAdmin ?

Be happy to make the edit...

Eric

-
Eric Pugh | Principal | OpenSource Connections, LLC | 434.466.1467 | 
http://www.opensourceconnections.com
Co-Author: Solr 1.4 Enterprise Search Server available from 
http://www.packtpub.com/solr-1-4-enterprise-search-server
Free/Busy: http://tinyurl.com/eric-cal









-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927031#action_12927031
 ] 

Jason Rutherglen commented on LUCENE-2729:
--

Using Solr 1.4.2, on disk-full, .del files were being written with a file length 
of zero; however, that is supposed to be fixed by 
https://issues.apache.org/jira/browse/LUCENE-2593. This doesn't appear to be 
the same issue, because more than just the .del files are of zero length.

 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 --

 Key: LUCENE-2729
 URL: https://issues.apache.org/jira/browse/LUCENE-2729
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0.2
 Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
 Integrated with zoie (using a zoie snapshot from 2010-08-06: 
 zoie-2.0.0-snapshot-20100806.jar).
Reporter: Nico Krijnen


jQuery and tabs in example

2010-11-01 Thread Erick Erickson
All:

I recently had occasion to work with the Solr example code and VrW and
figured out how to put in a tabbed display by letting jQuery do all the
work, but that needed a more recent jQuery (I used 1.4.x). Since I'm fresh
off that experience and can maybe remember what I just finished doing, do
folks think it's worth a Jira or two (that I'd immediately take) for
1. Upgrading the example code to jQuery 1.4.3
2. Using the tabbing capabilities of 1.4 to display the simple, spatial and
group-by links in a tabbed page to demonstrate?

Let me know
Erick


[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927040#action_12927040
 ] 

Nico Krijnen commented on LUCENE-2729:
--

In the meantime, we also did a test with a checkout of the latest lucene_3_0 
branch (@2010-11-01), which should include the fix that Jason mentions.

Does not seem to make a difference though. We still get a 'read past EOF'.

On the last run we did get a slightly different stacktrace. This time the 'read 
past EOF' happens when the zoie RAM index is written to the zoie Disk index. 
Last time it occurred a little earlier in BaseSearchIndex#loadFromIndex, while 
committing deletes to the disk IndexReader. This could be just a coincidence 
though. My feeling is still that the 'read past EOF' is just a result/symptom 
of something else that happened just before it - still trying to figure out 
what that could be...

{code}
15:25:03,453 
[proj.zoie.impl.indexing.internal.realtimeindexdataloa...@3d9e7719] 
ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
Problem copying segments: read past EOF
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
at 
org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:170)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:960)
at 
proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
at 
proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
at 
proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
at 
proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
at 
proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}

 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 --

 Key: LUCENE-2729
 URL: https://issues.apache.org/jira/browse/LUCENE-2729
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0.2
 Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
 Integrated with zoie (using a zoie snapshot from 2010-08-06: 
 zoie-2.0.0-snapshot-20100806.jar).
Reporter: Nico Krijnen


[jira] Issue Comment Edited: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Nico Krijnen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927040#action_12927040
 ] 

Nico Krijnen edited comment on LUCENE-2729 at 11/1/10 12:53 PM:


In the meantime, we also did a test with a checkout of the latest lucene_3_0 
branch (@2010-11-01), which should include the fix that Jason mentions.

Does not seem to make a difference though. We still get a 'read past EOF'.

On the last run we did get a slightly different stacktrace. This time the 'read 
past EOF' happens when the zoie RAM index is written to the zoie Disk index. 
Last time it occurred a little earlier in BaseSearchIndex#loadFromIndex, while 
committing deletes to the disk IndexReader. This could be just a coincidence 
though. My feeling is still that the 'read past EOF' is just a result/symptom 
of something else that happened just before it - still trying to figure out 
what that could be... any suggestions are welcome.

{code}
15:25:03,453 
[proj.zoie.impl.indexing.internal.realtimeindexdataloa...@3d9e7719] 
ERROR proj.zoie.impl.indexing.internal.LuceneIndexDataLoader - 
Problem copying segments: read past EOF
java.io.IOException: read past EOF
at 
org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:154)
at 
org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:39)
at 
org.apache.lucene.store.ChecksumIndexInput.readByte(ChecksumIndexInput.java:37)
at org.apache.lucene.store.IndexInput.readInt(IndexInput.java:69)
at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:245)
at 
org.apache.lucene.index.IndexFileDeleter.init(IndexFileDeleter.java:170)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:1127)
at org.apache.lucene.index.IndexWriter.init(IndexWriter.java:960)
at 
proj.zoie.impl.indexing.internal.DiskSearchIndex.openIndexWriter(DiskSearchIndex.java:176)
at 
proj.zoie.impl.indexing.internal.BaseSearchIndex.loadFromIndex(BaseSearchIndex.java:228)
at 
proj.zoie.impl.indexing.internal.LuceneIndexDataLoader.loadFromIndex(LuceneIndexDataLoader.java:153)
at 
proj.zoie.impl.indexing.internal.DiskLuceneIndexDataLoader.loadFromIndex(DiskLuceneIndexDataLoader.java:134)
at 
proj.zoie.impl.indexing.internal.RealtimeIndexDataLoader.processBatch(RealtimeIndexDataLoader.java:172)
at 
proj.zoie.impl.indexing.internal.BatchedIndexDataLoader$LoaderThread.run(BatchedIndexDataLoader.java:377)
{code}

 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 

[jira] Commented: (LUCENE-2729) Index corruption after 'read past EOF' under heavy update load and snapshot export

2010-11-01 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927043#action_12927043
 ] 

Michael McCandless commented on LUCENE-2729:


Somehow we have to locate the event that causes the truncation of the files.

Can you enable IndexWriter's infoStream and then get the corruption to happen, 
and post the results?

 Index corruption after 'read past EOF' under heavy update load and snapshot 
 export
 --

 Key: LUCENE-2729
 URL: https://issues.apache.org/jira/browse/LUCENE-2729
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 3.0.1, 3.0.2
 Environment: Happens on both OS X 10.6 and Windows 2008 Server. 
 Integrated with zoie (using a zoie snapshot from 2010-08-06: 
 zoie-2.0.0-snapshot-20100806.jar).
Reporter: Nico Krijnen


Annoying message when building Solr

2010-11-01 Thread Erick Erickson
It's been several weeks since I built Solr, so I removed all the trunk code,
did a checkout and tried an ant build.  The build starts out by giving a
bunch of annoying warnings about not being able to find
c:\ant\lib\xbean.jar, xercesImpl.jar, serializer.jar and others (yes, some
of us are forever destined to work on windows boxes). I'm also getting some
test failures.

I know there have been emails flying back and forth about Maven etc. but I
haven't paid much attention.

I can start tracking these down but wanted to know what's expected. The "how
to contribute" page might need to be updated, which I'll do if that's what
should be done.

And my mac starts out the build by not being able to find some hsqldb jars,
but at least the tests succeed there.

All of these are very possibly issues with half-baked machine setups, which
is the first thing I'll check this afternoon. Mostly I wanted to know if:
1. These are experienced by someone else.
2. I should have paid more attention to the Maven emails and that's the
preferred way of doing things now.
3. Whether the wiki is just really out of date and I should update it as I
work through the issues.

In any case, the "how to contribute" page on the wiki doesn't list any
prerequisites, which may be way more important on windows boxes
than Macs...

Erick


jQuery and tabs in example (sorry for double posting)

2010-11-01 Thread Erick Erickson
Got the old list in the to field first time, sorry..

All:

I recently had occasion to work with the Solr example code and VrW and
figured out how to put in a tabbed display by letting jQuery do all the
work, but that needed a more recent jQuery (I used 1.4.x). Since I'm fresh
off that experience and can maybe remember what I just finished doing, do
folks think it's worth a Jira or two (that I'd immediately take) for
1. Upgrading the example code to jQuery 1.4.3
2. Using the tabbing capabilities of 1.4 to display the simple, spatial and
group-by links in a tabbed page to demonstrate?

Let me know
Erick


[jira] Created: (SOLR-2210) Provide solr FilterFactory for Lucene ICUTokenizer

2010-11-01 Thread Tom Burton-West (JIRA)
Provide solr FilterFactory for Lucene ICUTokenizer
--

 Key: SOLR-2210
 URL: https://issues.apache.org/jira/browse/SOLR-2210
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor


The Lucene ICUTokenizer provides many benefits for multilingual tokenizing. 
There should be an ICUFilterFactory so that it can be used from Solr. There 
are probably some issues in terms of passing configuration parameters.



-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (SOLR-2210) Provide solr FilterFactory for Lucene ICUTokenizer

2010-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12927048#action_12927048
 ] 

Robert Muir commented on SOLR-2210:
---

Thanks for opening this, Tom.

I've got some barebones filters for some of this stuff on my computer.
Because the ICU jar file is large, I was trying to see if I could solve 
LUCENE-2510 first, but this would only fix the problem for 4.0 anyway.
I think we should just make an icu contrib for now, and put the factories 
(Tokenizer, Normalizer, Folding, Transliterator, Collation) and the jar file in 
there.








[jira] Created: (SOLR-2211) Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support

2010-11-01 Thread Tom Burton-West (JIRA)
Create Solr FilterFactory for Lucene StandardTokenizer with  UAX#29 support
---

 Key: SOLR-2211
 URL: https://issues.apache.org/jira/browse/SOLR-2211
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor


The Lucene 3.x StandardTokenizer with UAX#29 support provides benefits for 
non-English tokenizing.  Presently it can be invoked by using the 
StandardTokenizerFactory and setting the Version to 3.1.  However, it would be 
useful to be able to use the improved unicode processing without necessarily 
including the ip address and email address processing of StandardAnalyzer.   A 
FilterFactory that allowed the use of the StandardTokenizer with UAX#29 support 
on its own would be useful.
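For a sense of what UAX#29 word segmentation does on its own, the JDK's built-in BreakIterator implements the same Unicode word-break rules; this self-contained illustration is not the Lucene tokenizer itself, just the underlying segmentation behavior:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustrates UAX#29-style word segmentation via the JDK's BreakIterator
// (the same family of rules the Lucene tokenizer implements; not Lucene code).
public class Uax29Demo {
    static List<String> words(String text) {
        BreakIterator bi = BreakIterator.getWordInstance(Locale.ROOT);
        bi.setText(text);
        List<String> out = new ArrayList<>();
        int start = bi.first();
        for (int end = bi.next(); end != BreakIterator.DONE; start = end, end = bi.next()) {
            String token = text.substring(start, end);
            // Keep only tokens containing a letter or digit (skip spaces/punctuation).
            if (token.codePoints().anyMatch(Character::isLetterOrDigit)) {
                out.add(token);
            }
        }
        return out;
    }
}
```

Plain UAX#29 segmentation has no special cases for email addresses or IP addresses, which is exactly the difference from the classic StandardTokenizer behavior described above.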








[jira] Commented: (SOLR-2211) Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support

2010-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927052#action_12927052
 ] 

Robert Muir commented on SOLR-2211:
---

Tom, for this one we just want to wrap 
org.apache.lucene.analysis.standard.UAX29Tokenizer; care to make a patch?


 Create Solr FilterFactory for Lucene StandardTokenizer with  UAX#29 support
 ---

 Key: SOLR-2211
 URL: https://issues.apache.org/jira/browse/SOLR-2211
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor

 The Lucene 3.x StandardTokenizer with UAX#29 support provides benefits for 
 non-English tokenizing.  Presently it can be invoked by using the 
 StandardTokenizerFactory and setting the Version to 3.1.  However, it would 
 be useful to be able to use the improved unicode processing without 
 necessarily including the ip address and email address processing of 
 StandardAnalyzer.   A FilterFactory that allowed the use of the 
 StandardTokenizer with UAX#29 support on its own would be useful.






[jira] Commented: (SOLR-2210) Provide solr FilterFactory for Lucene ICUTokenizer

2010-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927053#action_12927053
 ] 

Robert Muir commented on SOLR-2210:
---

Actually, another idea would be to just make an 'extraAnalyzers' contrib; 
then we could also add factories for smart Chinese, Polish, etc., without 
creating a ton of contribs.

I think this would be a good solution to expose all the Lucene analyzers to 
Solr, since to me LUCENE-2510 seems tricky.


 Provide solr FilterFactory for Lucene ICUTokenizer
 --

 Key: SOLR-2210
 URL: https://issues.apache.org/jira/browse/SOLR-2210
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor

 The Lucene ICUTokenizer provides many benefits for multilingual tokenizing.   
 There should be a ICUFilterFactory so that it can be used from Solr.   There 
 are probably some issues in terms of passing configuration parameters.






[jira] Updated: (LUCENE-2680) Improve how IndexWriter flushes deletes against existing segments

2010-11-01 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated LUCENE-2680:
-

Attachment: LUCENE-2680.patch

The general approach is to reuse BufferedDeletes, though placing them into a 
segment-info-keyed map for those segments generated after lastSegmentIndex, as 
per what has been discussed at 
https://issues.apache.org/jira/browse/LUCENE-2655?focusedCommentId=12922894&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12922894
 and below.

* lastSegmentIndex is added to IW

* DW segmentDeletes is a map of segment info -> buffered deletes.  In the apply 
deletes method, buffered deletes are pulled for a given segment info if they 
exist; otherwise they're taken from deletesFlushedLastSeg.  

* I'm not entirely sure what pushDeletes should do now; probably the same thing 
as currently, only the name should change slightly in that it's pushing deletes 
only for the RAM buffer docs.

* There need to be tests to ensure the docid-upto logic is working correctly.

* I'm not sure what to do with DW hasDeletes (its usage is commented out).

* Does there need to be separate deletes for the RAM buffer vis-à-vis the (0 - 
lastSegmentIndex) deletes?

* The memory accounting will now get interesting, as we'll need to track the RAM 
usage of terms/queries across multiple maps.  

* In commitMerge, DW verifySegmentDeletes removes the unused info -> deletes 
mappings.

* testDeletes deletes a doc in segment 1, then merges segments 1 and 2.  We 
then test to ensure the deletes were in fact applied only to segments 1 and 2.  

* testInitLastSegmentIndex ensures that on IW init, lastSegmentIndex is in 
fact set.
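The per-segment bookkeeping described above can be sketched with a plain map. This is a toy model of the idea (segment name -> pending delete terms), not Lucene's actual DocumentsWriter code; the class and method names are invented for illustration:

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy model: buffer deletes per segment so that a merge applies (and then
// discards) only the deletes for the segments it actually covers.
public class SegmentDeletesSketch {
    // segment name -> delete terms buffered against that segment
    final Map<String, List<String>> segmentDeletes = new LinkedHashMap<>();

    void bufferDelete(String segment, String term) {
        segmentDeletes.computeIfAbsent(segment, k -> new ArrayList<>()).add(term);
    }

    // Called when a merge kicks off: pinch off the deletes for the merged
    // segments. Buffers for all other segments stay untouched, and the newly
    // merged segment starts with no pending deletes.
    List<String> pinchOff(Collection<String> mergedSegments) {
        List<String> toApply = new ArrayList<>();
        for (String seg : mergedSegments) {
            List<String> pending = segmentDeletes.remove(seg);
            if (pending != null) toApply.addAll(pending);
        }
        return toApply;
    }
}
```

The point of the structure is visible in the map operations: a merge of segments 1 and 2 drains only their entries, so the remaining segments' buffered deletes survive for a later merge or commit.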


 Improve how IndexWriter flushes deletes against existing segments
 -

 Key: LUCENE-2680
 URL: https://issues.apache.org/jira/browse/LUCENE-2680
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2680.patch


 IndexWriter buffers up all deletes (by Term and Query) and only
 applies them if 1) commit or NRT getReader() is called, or 2) a merge
 is about to kickoff.
 We do this because, for a large index, it's very costly to open a
 SegmentReader for every segment in the index.  So we defer as long as
 we can.  We do it just before merge so that the merge can eliminate
 the deleted docs.
 But, most merges are small, yet in a big index we apply deletes to all
 of the segments, which is really very wasteful.
 Instead, we should only apply the buffered deletes to the segments
 that are about to be merged, and keep the buffer around for the
 remaining segments.
 I think it's not so hard to do; we'd have to have generations of
 pending deletions, because the newly merged segment doesn't need the
 same buffered deletions applied again.  So every time a merge kicks
 off, we pinch off the current set of buffered deletions, open a new
 set (the next generation), and record which segment was created as of
 which generation.
 This should be a very sizable gain for large indices that mix
 deletes, though, less so in flex since opening the terms index is much
 faster.






[jira] Commented: (SOLR-2211) Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support

2010-11-01 Thread Tom Burton-West (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927067#action_12927067
 ] 

Tom Burton-West commented on SOLR-2211:
---

Sure, I'll give it a try.  I've got a large Monday-morning backlog in my todo 
list today, so it will probably be toward the middle of the week.

 Create Solr FilterFactory for Lucene StandardTokenizer with  UAX#29 support
 ---

 Key: SOLR-2211
 URL: https://issues.apache.org/jira/browse/SOLR-2211
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor

 The Lucene 3.x StandardTokenizer with UAX#29 support provides benefits for 
 non-English tokenizing.  Presently it can be invoked by using the 
 StandardTokenizerFactory and setting the Version to 3.1.  However, it would 
 be useful to be able to use the improved unicode processing without 
 necessarily including the ip address and email address processing of 
 StandardAnalyzer.   A FilterFactory that allowed the use of the 
 StandardTokenizer with UAX#29 support on its own would be useful.






[jira] Commented: (SOLR-2211) Create Solr FilterFactory for Lucene StandardTokenizer with UAX#29 support

2010-11-01 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2211?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927069#action_12927069
 ] 

Robert Muir commented on SOLR-2211:
---

Sounds great. This one has no external dependencies, so it can just go in with 
the rest of the factories.

I'll look at starting on the ant/build-system-stuff for SOLR-2210.


 Create Solr FilterFactory for Lucene StandardTokenizer with  UAX#29 support
 ---

 Key: SOLR-2211
 URL: https://issues.apache.org/jira/browse/SOLR-2211
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor

 The Lucene 3.x StandardTokenizer with UAX#29 support provides benefits for 
 non-English tokenizing.  Presently it can be invoked by using the 
 StandardTokenizerFactory and setting the Version to 3.1.  However, it would 
 be useful to be able to use the improved unicode processing without 
 necessarily including the ip address and email address processing of 
 StandardAnalyzer.   A FilterFactory that allowed the use of the 
 StandardTokenizer with UAX#29 support on its own would be useful.






Re: Build problems (sorry for the second double-post)

2010-11-01 Thread Robert Muir
On Mon, Nov 1, 2010 at 1:13 PM, Erick Erickson erickerick...@gmail.com wrote:
 Sorry, sent the original to the old dev list. Shows you how long it's been
 since I originated a mail
 It's been several weeks since I built Solr, so I removed all the trunk code,
 did a checkout and tried an ant build.  The build starts out by giving a
 bunch of annoying warnings about not being able to find
 c:\ant\lib\xbean.jar, xercesImpl.jar, serializer.jar and others (Yes, some
 of us are forever destined to work on windows boxes).

These might be warnings somehow related to ant's classpath; I think I get
these... What version of ant, by the way?

I'm also getting some
 test failures

which ones? can you provide the information it gives you back,
specifically the 'reproduce-with' command line?

 I know there have been eMails flying back and forth about Maven etc. but
 haven't paid much attention.

wait, are you using ant, or maven?!




RE: Build problems (sorry for the second double-post)

2010-11-01 Thread Uwe Schindler
  Sorry, sent the original to the old dev list. Shows you how long it's
  been since I originated a mail
  It's been several weeks since I built Solr, so I removed all the trunk
  code, did a checkout and tried an ant build.  The build starts out by
  giving a bunch of annoying warnings about not being able to find
  c:\ant\lib\xbean.jar, xercesImpl.jar, serializer.jar and others (Yes,
  some of us are forever destined to work on windows boxes).
 
 these might be warnings somehow related to ant's classpath. i think i get
 these... what version of ant by the way?

These warnings don't indicate that anything is broken. It happens on ant 1.7.0 
and ant 1.7.1 if you have multiple lib folders in ~/.ant/lib and/or you 
specified a folder with -lib on the command line. This is an ant bug; I'm not 
sure where it comes from, but it tries to build a path from the jar file names 
of one folder together with the name of another folder and adds it to the 
classpath, producing incorrect path/file combinations. Javac simply complains 
about those incorrect entries. This breaks nothing.

On Lucene's Hudson builds (it's FreeBSD) we have exactly the same problem, 
since we have a ~/.ant/lib on the machine. But it works fine, so no need to 
react to it :-)







Lucene PMC update

2010-11-01 Thread sebb
Daniel Naber left the PMC in May 2010, but is still listed on the
website (and in committee-info.txt) as being a member of the PMC.

Also, Robert Muir was added to the PMC recently, but is not in the
LDAP PMC group; the PMC chair needs to run the following command on
people, please:

modify_committee.pl lucene -add=rmuir

Thanks.




Re: Lucene PMC update

2010-11-01 Thread Grant Ingersoll
I'll take care of it.







Lucene-Solr-tests-only-trunk - Build # 867 - Failure

2010-11-01 Thread Apache Hudson Server
Build: 
http://hudson.zones.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/867/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:878)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:844)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:437)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




Build Log (for compile errors):
[...truncated 8709 lines...]






Lucene-Solr-tests-only-trunk - Build # 869 - Failure

2010-11-01 Thread Apache Hudson Server
Build: 
http://hudson.zones.apache.org/hudson/job/Lucene-Solr-tests-only-trunk/869/

1 tests failed.
REGRESSION:  org.apache.solr.TestDistributedSearch.testDistribSearch

Error Message:
Some threads threw uncaught exceptions!

Stack Trace:
junit.framework.AssertionFailedError: Some threads threw uncaught exceptions!
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:878)
at 
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:844)
at 
org.apache.lucene.util.LuceneTestCase.tearDown(LuceneTestCase.java:437)
at org.apache.solr.SolrTestCaseJ4.tearDown(SolrTestCaseJ4.java:78)
at 
org.apache.solr.BaseDistributedSearchTestCase.tearDown(BaseDistributedSearchTestCase.java:144)




Build Log (for compile errors):
[...truncated 8709 lines...]






Re: Build problems (sorry for the second double-post)

2010-11-01 Thread Robert Muir
On Mon, Nov 1, 2010 at 7:29 PM, Erick Erickson erickerick...@gmail.com wrote:
 Uwe:
 Thanks, I'll update the how to contribute page with your comments.
 Robert:
 I'm using ant. I could have been clearer about that. There is no mention of
 maven at all on the how to contribute page, and I'm playing the
 naive user role here because it's a natural role for me

Well, this is helpful. Here is some explanation for each unique issue
you have (I have Windows too, so I see some of these myself):

     [junit] Testsuite:
 org.apache.solr.client.solrj.response.TestSpellCheckResponse
     [junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 1.346 sec
     [junit]
     [junit] - Standard Error -
     [junit]  WARNING: best effort to remove
 C:\apache_trunk\trunk\solr\build\test-results\temp\6\solrtest-TestSpellCheckResponse-1288644520922\spellchecker\_4.cfs
 FAILED !
     [junit]  WARNING: best effort to remove
 C:\apache_trunk\trunk\solr\build\test-results\temp\6\solrtest-TestSpellCheckResponse-1288644520922\spellchecker
 FAILED !
     [junit]  WARNING: best effort to remove
 C:\apache_trunk\trunk\solr\build\test-results\temp\6\solrtest-TestSpellCheckResponse-1288644520922
 FAILED !
     [junit] -  ---

This is just a warning, no test failed, only the best effort to
remove the spellchecker index failed.

So, the problem here is
https://issues.apache.org/jira/browse/SOLR-1877 (unclosed spellcheck
reader).

The solr base test classes (AbstractSolrTestCase, SolrTestCaseJ4)
check that they can remove their temporary directories completely...
on windows because the reader isn't closed, they can't do this, so
they emit these warnings.

In lucene, we have a similar check in LuceneTestCase, except the test
will actually fail, and it uses MockDirectoryWrapper so that the tests
always act like they are on windows regardless of the OS.

It might be a good idea to make a MockDirectoryWrapperFactory and use
it for all Solr tests for these reasons (we can disable this pickiness
for the two Solr tests, but at least it would be consistent on Windows
and Linux). It's also handy if you want to emulate things like
disk-full in tests...


 *
 This looks more promising:
     [junit] Testsuite: org.apache.solr.cloud.CloudStateUpdateTest
     [junit] Testcase:
 testCoreRegistration(org.apache.solr.cloud.CloudStateUpdateTest): FAILED
     [junit]
     [junit] junit.framework.AssertionFailedError:
     [junit] at
 org.apache.solr.cloud.CloudStateUpdateTest.testCoreRegistration(CloudStateUpdateTest.java:170)

This is a real test failure, fails often in hudson too. This looks
like https://issues.apache.org/jira/browse/SOLR-2159

 **

     [junit] Testsuite: org.apache.solr.velocity.VelocityResponseWriterTest
     [junit] Testcase:
 testTemplateName(org.apache.solr.velocity.VelocityResponseWriterTest):
 Caused an ERROR
     [junit] org.apache.log4j.Logger.setAdditivity(Z)V
     [junit] java.lang.NoSuchMethodError:
 org.apache.log4j.Logger.setAdditivity(Z)V


Hmm, are you sure you got a clean checkout? A NoSuchMethodError is
weird to see here; I don't see it.

Other people have seen this and somehow fixed it... we should get to
the bottom of this / document whatever the fix is, at least!

 *
     [junit]
     [junit] Testsuite: org.apache.solr.handler.TestReplicationHandler
     [junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 46.863 sec
     [junit]
     [junit] - Standard Error -
     [junit] 01/11/2010 06:48:51 ? org.apache.solr.handler.SnapPuller
 fetchLatestIndex
     [junit] SEVERE: Master at: http://localhost:51343/solr/replication is
 not available. Index fetch failed. Exception: Connection refused: connect


This is just a noisy/crazy test and it often logs scary/severe errors
for me. But as you see, it didn't fail.




Re: Build problems (sorry for the second double-post)

2010-11-01 Thread Erick Erickson
Robert:

Thanks for the time you put into this. I'll make a clean checkout in the
morning and see if that one error goes away. I'll see if I can get to the
bottom of some of these. After how everything just worked on my
Mac, it's disconcerting to see these failures-that-aren't-failures on my
Windows box... and training oneself to ignore warnings is just asking
for trouble.

Thanks again
Erick





Re: jQuery and tabs in example (sorry for double posting)

2010-11-01 Thread Lance Norskog
Absotively!





-- 
Lance Norskog
goks...@gmail.com




Re: jQuery and tabs in example

2010-11-01 Thread Erik Hatcher

On Nov 1, 2010, at 12:33 , Erick Erickson wrote:

 All:
 
 I recently had occasion to work with the Solr example code and VrW and 
 figured out how to put in a tabbed display by letting jQuery do all the work, 
 but that needed a more recent jQuery (I used 1.4.x). Since I'm fresh off that 
 experience and can maybe remember what I just finished doing, do folks think 
 it's worth a Jira or two (that I'd immediately take) for
 1 Upgrading the example code to jQuery 1.4.3

+1

 2 Using the tabbing capabilities of 1.4 to display the simple, spatial and 
 group by links in a tabbed page to demonstrate?

I'm not too fond of the tabbed way to demonstrate these features.  Rather, 
we could create separate layouts and/or browse.vm-ish templates for each 
example.  Otherwise we end up with a UI that demonstrates everything all at 
once and is too cluttered to be fun to show off.  Check out how it looks on 
trunk with the work Grant has done (good work, but getting a bit cluttered and 
needs some streamlining, IMO).

Erik






[jira] Updated: (SOLR-2210) Provide solr FilterFactory for Lucene ICUTokenizer

2010-11-01 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-2210:
--

Attachment: SOLR-2210.patch

Here's a start: it makes an analysis-extras contrib with all the build logic, 
and factories for the ICU filters.

Still to do: add support for custom normalization and custom tokenizer config, 
plus filters for smart Chinese and Stempel.

But I think it's OK to commit this as-is and improve it in svn.


 Provide solr FilterFactory for Lucene ICUTokenizer
 --

 Key: SOLR-2210
 URL: https://issues.apache.org/jira/browse/SOLR-2210
 Project: Solr
  Issue Type: New Feature
Affects Versions: 3.1
Reporter: Tom Burton-West
Priority: Minor
 Attachments: SOLR-2210.patch


 The Lucene ICUTokenizer provides many benefits for multilingual tokenizing.   
 There should be a ICUFilterFactory so that it can be used from Solr.   There 
 are probably some issues in terms of passing configuration parameters.






[jira] Commented: (SOLR-2212) NoMergePolicy class does not load

2010-11-01 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-2212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12927309#action_12927309
 ] 

Lance Norskog commented on SOLR-2212:
-

This was tested in trunk/solr/example and branch_3x/solr/example.
I set the MergePolicy in solrconfig.xml to the NoMergePolicy class with this 
line:
{code}
  <mergePolicy class="org.apache.lucene.index.NoMergePolicy"/>
{code}

When I start solr I get the following stack trace.
{code}
Nov 1, 2010 10:43:40 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Error instantiating class: 
'org.apache.lucene.index.NoMergePolicy'
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:432)
at org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:83)
at 
org.apache.solr.update.SolrIndexWriter.init(SolrIndexWriter.java:197)
at org.apache.solr.core.SolrCore.initIndex(SolrCore.java:399)
at org.apache.solr.core.SolrCore.init(SolrCore.java:550)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:660)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:412)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:294)
at 
org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:243)
at 
org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:86)
at org.mortbay.jetty.servlet.FilterHolder.doStart(FilterHolder.java:97)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.servlet.ServletHandler.initialize(ServletHandler.java:662)
at org.mortbay.jetty.servlet.Context.startContext(Context.java:140)
at 
org.mortbay.jetty.webapp.WebAppContext.startContext(WebAppContext.java:1250)
at 
org.mortbay.jetty.handler.ContextHandler.doStart(ContextHandler.java:517)
at 
org.mortbay.jetty.webapp.WebAppContext.doStart(WebAppContext.java:467)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.doStart(ContextHandlerCollection.java:156)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerCollection.doStart(HandlerCollection.java:152)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at 
org.mortbay.jetty.handler.HandlerWrapper.doStart(HandlerWrapper.java:130)
at org.mortbay.jetty.Server.doStart(Server.java:224)
at 
org.mortbay.component.AbstractLifeCycle.start(AbstractLifeCycle.java:50)
at org.mortbay.xml.XmlConfiguration.main(XmlConfiguration.java:985)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.mortbay.start.Main.invokeMain(Main.java:194)
at org.mortbay.start.Main.start(Main.java:534)
at org.mortbay.start.Main.start(Main.java:441)
at org.mortbay.start.Main.main(Main.java:119)
Caused by: java.lang.InstantiationException: 
org.apache.lucene.index.NoMergePolicy
at java.lang.Class.newInstance0(Class.java:340)
at java.lang.Class.newInstance(Class.java:308)
at 
org.apache.solr.core.SolrResourceLoader.newInstance(SolrResourceLoader.java:429)
... 34 more

{code}
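A plausible cause, inferred from the trace rather than verified against the Lucene source here: SolrResourceLoader constructs classes reflectively through a no-arg constructor, and if NoMergePolicy exposes only singleton instances (with no accessible nullary constructor), Class.newInstance() throws exactly this InstantiationException. A self-contained illustration of that failure mode, using a made-up stand-in class:

```java
// Stand-in shaped like a singletons-only policy: only a private one-argument
// constructor, so there is no nullary constructor at all.
class SingletonOnlyPolicy {
    static final SingletonOnlyPolicy INSTANCE = new SingletonOnlyPolicy(true);
    private final boolean useCompoundFile;
    private SingletonOnlyPolicy(boolean useCompoundFile) {
        this.useCompoundFile = useCompoundFile;
    }
}

public class NewInstanceDemo {
    // Mimics a resource loader's reflective no-arg construction.
    static boolean canReflectivelyCreate(Class<?> clazz) {
        try {
            clazz.newInstance();  // requires an accessible no-arg constructor
            return true;
        } catch (InstantiationException | IllegalAccessException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(canReflectivelyCreate(SingletonOnlyPolicy.class));  // false
        System.out.println(canReflectivelyCreate(java.util.ArrayList.class));  // true
    }
}
```

If this is the cause, Solr would need special-case handling (or a wrapper class with a public no-arg constructor) rather than plain reflective instantiation.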





 NoMergePolicy class does not load
 -

 Key: SOLR-2212
 URL: https://issues.apache.org/jira/browse/SOLR-2212
 Project: Solr
  Issue Type: Bug
  Components: multicore
Affects Versions: 3.1, 4.0
Reporter: Lance Norskog

 Solr cannot use the Lucene NoMergePolicy class. It will not instantiate 
 correctly when loading the core.
 Other MergePolicy classes work, including the BalancedSegmentMergePolicy.
 This is in trunk and 3.x.


