[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857388#action_12857388
 ] 

Shai Erera commented on LUCENE-2396:


Robert, I think this is great! Can we move more analyzers from core here? I 
think, however, that a backwards section in CHANGES is important because it 
alerts users about those analyzers whose runtime behavior changed. Otherwise, 
how would the poor users know that? It doesn't mean you need to maintain back 
compat support, but at least alert them when things change.

Even if we eventually decide to remove API bw completely, a section in CHANGES 
will still be required to help users upgrade easily.

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch


 Contrib/analyzers has no backwards-compatibility policy, so let's remove 
 Version so the API is consumable.
 If you think we shouldn't do this, then explicitly state and vote on 
 what the backwards compatibility policy for contrib/analyzers should be 
 instead, or move it all to core.

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira



-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2396) remove version from contrib/analyzers.

2010-04-15 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12857396#action_12857396
 ] 

Shai Erera commented on LUCENE-2396:


Static? Weren't you against that!? 

But if we remove back compat from analyzers, why do we need Version? Or is it 
API bw that we remove?

 remove version from contrib/analyzers.
 --

 Key: LUCENE-2396
 URL: https://issues.apache.org/jira/browse/LUCENE-2396
 Project: Lucene - Java
  Issue Type: Task
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2396.patch






[jira] Created: (LUCENE-2397) SnapshotDeletionPolicy.snapshot() throws NPE if no commits happened

2010-04-15 Thread Shai Erera (JIRA)
SnapshotDeletionPolicy.snapshot() throws NPE if no commits happened
---

 Key: LUCENE-2397
 URL: https://issues.apache.org/jira/browse/LUCENE-2397
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1


SDP throws NPE if no commits occurred and snapshot() was called. I will replace 
it w/ throwing IllegalStateException. I'll also move TestSDP from o.a.l to 
o.a.l.index. I'll post a patch soon.
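
The proposed behavior can be sketched roughly as follows. This is only an illustration, not the actual patch: SnapshotGuardSketch, its lastCommit field and the onCommit(String) hook are hypothetical stand-ins, and commits are represented by plain strings.

```java
// Hypothetical stand-in for a snapshotting deletion policy; not Lucene's class.
class SnapshotGuardSketch {
  // Most recent commit seen so far; stays null until the first commit happens.
  private String lastCommit;

  // Assumed notification hook, called whenever a commit is made.
  void onCommit(String commitName) {
    lastCommit = commitName;
  }

  // Returns the commit to snapshot. With no commits yet, fails with a
  // descriptive IllegalStateException instead of a NullPointerException.
  String snapshot() {
    if (lastCommit == null) {
      throw new IllegalStateException("no index commits to snapshot");
    }
    return lastCommit;
  }

  public static void main(String[] args) {
    SnapshotGuardSketch policy = new SnapshotGuardSketch();
    try {
      policy.snapshot();
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    policy.onCommit("segments_1");
    System.out.println("snapshot of " + policy.snapshot());
  }
}
```

The point of the change is that the caller gets an exception that names the actual problem (nothing has been committed yet) rather than an NPE from inside the policy.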




[jira] Resolved: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-14 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2316.


Lucene Fields: [New, Patch Available]  (was: [New])
 Assignee: Shai Erera
   Resolution: Fixed

Committed revision 933879.

 Define clear semantics for Directory.fileLength
 ---

 Key: LUCENE-2316
 URL: https://issues.apache.org/jira/browse/LUCENE-2316
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2316.patch


 On this thread: 
 http://mail-archives.apache.org/mod_mbox/lucene-java-dev/201003.mbox/%3c126142c1003121525v24499625u1589bbef4c079...@mail.gmail.com%3e
  it was mentioned that Directory's fileLength behavior is not consistent 
 between Directory implementations if the given file name does not exist. 
 FSDirectory returns a 0 length while RAMDirectory throws FNFE.
 The problem is that the semantics of fileLength() are not defined. As 
 proposed in the thread, we'll define the following semantics:
 * Returns the length of the file denoted by <code>name</code> if the file 
 exists. The return value may be anything between 0 and Long.MAX_VALUE.
 * Throws FileNotFoundException if the file does not exist. Note that you can 
 call dir.fileExists(name) if you are not sure whether the file exists or not.
 For backwards compatibility we'll create a new method w/ clear semantics. Something like:
 {code}
 /**
  * @deprecated the method will become abstract when #fileLength(name) has
  * been removed.
  */
 public long getFileLength(String name) throws IOException {
   // Delegate to the current impl, which may already throw for a missing file.
   long len = fileLength(name);
   // A 0 length is ambiguous: it may be an empty file or a missing one.
   if (len == 0 && !fileExists(name)) {
     throw new FileNotFoundException(name);
   }
   return len;
 }
 {code}
 The first line just calls the current impl. If it throws exception for a 
 non-existing file, we're ok. The second line verifies whether a 0 length is 
 for an existing file or not and throws an exception appropriately.
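
 As a rough illustration of the two rules above, here is a minimal in-memory sketch. FileLengthContractSketch and its put() helper are hypothetical and exist only for this example; they are not Lucene's Directory API.

```java
import java.io.FileNotFoundException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory directory, used only to illustrate the contract.
class FileLengthContractSketch {
  private final Map<String, byte[]> files = new HashMap<>();

  // Helper for the example: stores a file's content under the given name.
  void put(String name, byte[] content) {
    files.put(name, content);
  }

  boolean fileExists(String name) {
    return files.containsKey(name);
  }

  // Returns the length of an existing file (0 is a valid length);
  // throws FileNotFoundException if the file does not exist.
  long fileLength(String name) throws FileNotFoundException {
    byte[] content = files.get(name);
    if (content == null) {
      throw new FileNotFoundException(name);
    }
    return content.length;
  }

  public static void main(String[] args) throws Exception {
    FileLengthContractSketch dir = new FileLengthContractSketch();
    dir.put("empty", new byte[0]);
    dir.put("data", new byte[42]);
    System.out.println(dir.fileLength("empty"));
    System.out.println(dir.fileLength("data"));
    try {
      dir.fileLength("missing");
    } catch (FileNotFoundException e) {
      System.out.println("missing file: " + e.getMessage());
    }
  }
}
```

 With this contract an existing empty file and a missing file can no longer be confused, which is the ambiguity the 0-length return value caused.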




[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856845#action_12856845
 ] 

Shai Erera commented on LUCENE-2159:


This looks like a nice tool. But all it does is create multiple copies of the 
same segment(s), right? So what exactly do you want to test with it? What 
worries me is that we'll be multiplying the lexicon, posting lists, statistics 
etc., so I'm not sure how reliable the tests will be (whatever they 
are), except for measuring things related to a large number of segments (like 
merge performance). Am I right?

I also think this class fits better in benchmark than in misc, as it's 
really for perf testing/measurements and not a generic utility. You can 
create a Task out of it, like ExpandIndexTask, which one can include in an 
algorithm.

 Tool to expand the index for perf/stress testing.
 -

 Key: LUCENE-2159
 URL: https://issues.apache.org/jira/browse/LUCENE-2159
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.0
Reporter: John Wang
 Attachments: ExpandIndex.java


 Sometimes it is useful to take a small-ish index and expand it into a large 
 index with K segments for perf/stress testing. 
 This tool does that. See attached class.




[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856877#action_12856877
 ] 

Shai Erera commented on LUCENE-2159:


bq. I understand having a general performance suite to test regression is a 
good thing. But we found having a more focused test for segmentation and merge 
is important.

Are you saying that because of the benchmark proposal? I still think that an 
ExpandIndexTask will be useful for benchmark and fits better there, than in 
contrib/misc. We can have that task together w/ a predefined .alg for using it 
...

 Tool to expand the index for perf/stress testing.
 -

 Key: LUCENE-2159
 URL: https://issues.apache.org/jira/browse/LUCENE-2159
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.0
Reporter: John Wang
 Attachments: ExpandIndex.java






[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856911#action_12856911
 ] 

Shai Erera commented on LUCENE-2159:


Which is fine - I think this would be a neat task to add to benchmark, w/ 
specific documentation on how to use it and for what purposes. If you can also 
write a sample .alg file which e.g. creates a small index and then expands it, 
that'd be great.

I've looked at the different PerfTask implementations in benchmark, and I'm 
wondering whether we should do the following:
* Create an AddIndexesTask which receives one or more Directories as input and 
calls writer.addIndexesNoOptimize.
* If one wants, one can add an OptimizeTask call afterwards.
* Write an expandIndex.alg which initially creates an index of size N from one 
content source and then calls the AddIndexesTask several times. The .alg file 
is meant as an example, and people can change it to create bigger or 
smaller indexes, use other content sources and switch between RAM/FS 
directories.

How's that sound?

 Tool to expand the index for perf/stress testing.
 -

 Key: LUCENE-2159
 URL: https://issues.apache.org/jira/browse/LUCENE-2159
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.0
Reporter: John Wang
 Attachments: ExpandIndex.java






[jira] Commented: (LUCENE-2159) Tool to expand the index for perf/stress testing.

2010-04-14 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2159?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856917#action_12856917
 ] 

Shai Erera commented on LUCENE-2159:


bq. There is an excellent section on it in LIA2

Indeed !

Ok so to create a task, you just extend PerfTask. You can look under 
contrib/benchmark/src/java/o.a.l/benchmark/byTask/tasks for many examples. 
OptimizeTask seems relevant here (i.e. it calls an IW API and receives a 
parameter).

Writing .alg files is SUPER simple: just look under 
contrib/benchmark/conf for many existing examples. You can post a patch once 
you feel comfortable enough with it, and I can help you with any struggles you 
run into. Another great source (besides LIA2) on writing .alg files 
is the package.html under 
contrib/benchmark/src/java/org/apache/lucene/benchmark/byTask.

 Tool to expand the index for perf/stress testing.
 -

 Key: LUCENE-2159
 URL: https://issues.apache.org/jira/browse/LUCENE-2159
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/*
Affects Versions: 3.0
Reporter: John Wang
 Attachments: ExpandIndex.java






[jira] Resolved: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-13 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2386.


Resolution: Fixed

Committed revision 933613. (take #2)

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).




[jira] Updated: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-13 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2316:
---

Attachment: LUCENE-2316.patch

Patch clarifies the contract, fixes the directories to adhere to it and adds a 
CHANGES under backwards section. All tests pass.

 Define clear semantics for Directory.fileLength
 ---

 Key: LUCENE-2316
 URL: https://issues.apache.org/jira/browse/LUCENE-2316
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2316.patch






[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855870#action_12855870
 ] 

Shai Erera commented on LUCENE-2386:


I'm not sure if we're arguing about the same thing here ... why, when I open an 
IW on an empty Directory, do I need an empty segment that's created and from 
then on never changed, populated or even read? That just seems wrong to me ... when I 
fixed the tests to not rely on the buggy behavior, I noticed several which 
count the list of commits (especially the IDP ones) w/ a documentation like 1 
for opening + N for committing ...

It just looks weird that when you open IW a commit happens, a set of empty 
files are created, but from now on they are never modified, until IDP kicks in, 
after the second commit ... it's nothing like initing the Directory to be able 
to receive input ..

And I don't know what the benefit is of doing new IW() followed by 
IR.open() ... that IR will always see 0 documents, until you call reopen (if a 
commit happened in between). So what's the convenience here? That your code can 
call IR.open once, and from that point forward just 'reopen()'? That seems a small 
advantage to me, really. Maybe what we should do is fix IR.open to return a 
null IR in case the directory hasn't been populated w/ anything yet. Then you 
can check easily if you should call open() (==null) or reopen (otherwise). Or 
create a blank stub of IR which emulates an empty Dir, and when reopen is 
called works well (if the Directory is not empty now) ...

BTW, FWIW, Solr's code did not break from this change at all ... it was the 
combination of FSDir and NoLF/SingleInstanceLF that broke some tests that used 
it ... I don't know how many apps out there are using that combination, but I'd 
bet it's small? I use that combination, however in my case an IR is opened only 
after a commit signal/event is raised (so I don't check isCurrent often or 
attempt to reopen()). What I'm trying to say is that this combination is 
dangerous, and the application needs to ensure that only one IW is open at any 
given time, and I'm sure such apps are more sophisticated than opening IW and 
then IR just for the convenience of it.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch






[jira] Commented: (LUCENE-2316) Define clear semantics for Directory.fileLength

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855873#action_12855873
 ] 

Shai Erera commented on LUCENE-2316:


Well ... dir.fileLength is also used by SegmentInfos.sizeInBytes to compute the 
size of all the files in the Directory. If we remove fileLength, then SI will 
need to call dir.openInput.length() and then close it? Seems like a lot of work 
to me, for just obtaining the length of the file. So I agree that if you have 
an IndexInput at hand, you should call its length() method rather than 
Dir.fileLength. But otherwise, if you just have a name at hand, a 
dir.fileLength is convenient?

I'm also ok w/ the bw break rather than going through the new/deprecate cycle.

 Define clear semantics for Directory.fileLength
 ---

 Key: LUCENE-2316
 URL: https://issues.apache.org/jira/browse/LUCENE-2316
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Priority: Minor
 Fix For: 3.1






[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855875#action_12855875
 ] 

Shai Erera commented on LUCENE-2392:


Mike - it'll also be great if we can store the length of the document in a 
custom way. I think what I'm saying is that if we can open up the norms 
computation to custom code - that will do what I want, right? Maybe we can have 
a class like DocLengthProvider which apps can plug in if they want to customize 
how that length is computed. Wherever we write the norms, we'll call that impl, 
which by default will do what Lucene does today?
I think though that it's not a field-level setting, but an IW one?

 Enable flexible scoring
 ---

 Key: LUCENE-2392
 URL: https://issues.apache.org/jira/browse/LUCENE-2392
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2392.patch


 This is a first step (nowhere near committable!), implementing the
 design iterated to in the recent "Baby steps towards making Lucene's
 scoring more flexible" java-dev thread.
 The idea is (if you turn it on for your Field; it's off by default) to
 store full stats in the index, into a new _X.sts file, per doc (X
 field) in the index.
 And then have FieldSimilarityProvider impls that compute doc's boost
 bytes (norms) from these stats.
 The patch is able to index the stats, merge them when segments are
 merged, and provides an iterator-only API.  It also has starting point
 for per-field Sims that use the stats iterator API to compute boost
 bytes.  But it's not at all tied into actual searching!  There's still
 tons left to do, eg, how does one configure via Field/FieldType which
 stats one wants indexed.
 All tests pass, and I added one new TestStats unit test.
 The stats I record now are:
   - field's boost
   - field's unique term count (a b c a a b -- 3)
   - field's total term count (a b c a a b -- 6)
   - total term count per-term (sum of total term count for all docs
 that have this term)
 Still need at least the total term count for each field.
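
 The two per-field term counts listed above can be illustrated with a small sketch. TermStatsSketch and its methods are hypothetical names for this example only and have nothing to do with the patch's actual stats API; the field is assumed to be already tokenized into an array of terms.

```java
import java.util.Arrays;
import java.util.HashSet;

// Hypothetical helper, for illustration only; not part of the patch.
class TermStatsSketch {
  // Unique term count: number of distinct terms in the field.
  static int uniqueTermCount(String[] terms) {
    return new HashSet<>(Arrays.asList(terms)).size();
  }

  // Total term count: number of term occurrences in the field.
  static int totalTermCount(String[] terms) {
    return terms.length;
  }

  public static void main(String[] args) {
    String[] terms = "a b c a a b".split(" ");
    System.out.println("unique=" + uniqueTermCount(terms)); // prints unique=3
    System.out.println("total=" + totalTermCount(terms));   // prints total=6
  }
}
```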




[jira] Commented: (LUCENE-2373) Change StandardTermsDictWriter to work with streaming and append-only filesystems

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855877#action_12855877
 ] 

Shai Erera commented on LUCENE-2373:


I'd rather not count on file length as well ... so a put/getTermDictSize method 
on Codec will allow one to implement it however one wants, if running on HDFS 
for example?

 Change StandardTermsDictWriter to work with streaming and append-only 
 filesystems
 -

 Key: LUCENE-2373
 URL: https://issues.apache.org/jira/browse/LUCENE-2373
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Andrzej Bialecki 
 Fix For: 3.1


 Since early 2.x times Lucene used a skip/seek/write trick to patch the length 
 of the terms dict into a place near the start of the output data file. This 
 however made it impossible to use Lucene with append-only filesystems such as 
 HDFS.
 In the post-flex trunk the following code in StandardTermsDictWriter 
 initiates this:
 {code}
 // Count indexed fields up front
 CodecUtil.writeHeader(out, CODEC_NAME, VERSION_CURRENT);
 out.writeLong(0); // leave space for end index pointer
 {code}
 and completes this in close():
 {code}
   out.seek(CodecUtil.headerLength(CODEC_NAME));
   out.writeLong(dirStart);
 {code}
 I propose to change this layout so that this pointer is stored simply at the 
 end of the file. It's always 8 bytes long, and we know the final length of 
 the file from Directory, so it's a single additional seek(length - 8) to read 
 it, which is not much considering the benefits.
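
 The proposed layout could look roughly like the sketch below. It uses plain java.io rather than Lucene's IndexOutput/IndexInput, and TrailerPointerSketch, write() and readDirStart() are hypothetical names for this illustration. The writer never seeks: it appends the body, then the 8-byte pointer as the last bytes of the file.

```java
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical sketch of a "pointer at end of file" layout; not Lucene code.
class TrailerPointerSketch {
  // Append-only write: terms data, then the index, then the 8-byte pointer
  // to where the index starts. No seek is needed on the output.
  static void write(Path file, byte[] termsData, byte[] index) throws IOException {
    try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(file))) {
      out.write(termsData);
      long dirStart = out.size(); // bytes written so far = start of the index
      out.write(index);
      out.writeLong(dirStart);
    }
  }

  // Reading costs one extra seek to (length - 8) to fetch the pointer.
  static long readDirStart(Path file) throws IOException {
    try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
      in.seek(in.length() - 8);
      return in.readLong();
    }
  }

  public static void main(String[] args) throws IOException {
    Path file = Files.createTempFile("terms", ".bin");
    try {
      write(file, new byte[10], new byte[5]);
      System.out.println("dirStart=" + readDirStart(file)); // prints dirStart=10
    } finally {
      Files.delete(file);
    }
  }
}
```

 The trade-off is the one named above: the reader pays a single additional seek, and in exchange the writer works on filesystems that only support appending.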




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12855892#action_12855892
 ] 

Shai Erera commented on LUCENE-2386:


bq. what is the proper way (after this fix) to open an IR over possibly-empty 
directory? 

You can simply call commit() immediately after you open IW. If that's what you 
need then it will work for you.

You're right that if I add docs, delete them and then commit, I'll get an empty 
segment. The same happens if you do new IW() and then iw.close() w/ no addDocument in 
between. The point here was that we should not create a commit unless the user 
has specifically asked for it. Calling close() means asking for a commit, per 
close semantics and contract. But if the app called new IW, add docs and 
crashed in the middle, the Directory will still remain empty ... which is sort 
of what, IMO, should happen.

I agree it's a matter of perspective. I think that when autoCommit was removed, 
so should have been this code. I don't know if it was left behind for a good 
reason, or simply because when someone tried to do it, he found out it's not 
that simple (like I have :)).

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).




[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855913#action_12855913
 ] 

Shai Erera commented on LUCENE-2392:


I'd like to withdraw my request from above. I had missed that the stats I 
need are stored per-field per-doc, so that will allow me to compute the 
docLength as I want.

 Enable flexible scoring
 ---

 Key: LUCENE-2392
 URL: https://issues.apache.org/jira/browse/LUCENE-2392
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2392.patch


 This is a first step (nowhere near committable!), implementing the
 design iterated to in the recent "Baby steps towards making Lucene's
 scoring more flexible" java-dev thread.
 The idea is (if you turn it on for your Field; it's off by default) to
 store full stats in the index, into a new _X.sts file, per doc (X
 field) in the index.
 And then have FieldSimilarityProvider impls that compute doc's boost
 bytes (norms) from these stats.
 The patch is able to index the stats, merge them when segments are
 merged, and provides an iterator-only API.  It also has a starting point
 for per-field Sims that use the stats iterator API to compute boost
 bytes.  But it's not at all tied into actual searching!  There's still
 tons left to do, eg, how does one configure via Field/FieldType which
 stats one wants indexed.
 All tests pass, and I added one new TestStats unit test.
 The stats I record now are:
   - field's boost
   - field's unique term count (a b c a a b -- 3)
   - field's total term count (a b c a a b -- 6)
   - total term count per-term (sum of total term count for all docs
 that have this term)
 Still need at least the total term count for each field.
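
 As a rough illustration of the two per-field counts listed above, the
 following sketch computes them for the example token stream "a b c a a b".
 The class and method names are hypothetical and are not part of the patch;
 whitespace tokenization is assumed only for this example.

```java
import java.util.HashSet;
import java.util.Set;

public class FieldTermStats {
    // Hypothetical helper: number of distinct terms in the field text.
    static int uniqueTermCount(String fieldText) {
        Set<String> unique = new HashSet<>();
        for (String term : fieldText.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                unique.add(term);
            }
        }
        return unique.size();
    }

    // Hypothetical helper: number of term occurrences in the field text.
    static int totalTermCount(String fieldText) {
        int total = 0;
        for (String term : fieldText.trim().split("\\s+")) {
            if (!term.isEmpty()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String field = "a b c a a b";
        System.out.println(uniqueTermCount(field)); // prints 3
        System.out.println(totalTermCount(field));  // prints 6
    }
}
```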




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855924#action_12855924
 ] 

Shai Erera commented on LUCENE-2386:


I don't think that people need to write that emptiness-detection-then-commit 
code ... if they care, they can simply immediately call commit() after they 
open IW.

bq. Isn't opening IW with CREATE* mode called specifically asking for?

It depends on how you interpret the mode ... for example, you cannot pass 
OpenMode.APPEND for an empty Directory, because IW throws an exception. The 
modes are just meant to tell IW how to behave:
* APPEND - I know there is an index in the Directory, and I'd like to append to 
it.
* CREATE - I don't care if there is an index in the Directory -- create a new 
one, zeroing out all segments.
* CREATE_OR_APPEND - If there is an index, open it, otherwise create a new one.

So if you pass CREATE on an already populated index, IW doesn't do the implicit 
commit, until you call commit() yourself. But if you pass CREATE on an empty 
index, IW suddenly calls commit()? That's just an inconsistency that's meant to 
allow you to open an IR immediately after the new IW() call, regardless of what 
was there? And if you open that IR, then if the index was populated you see the 
previous set of documents, but if it wasn't you see nothing, even though you 
meant to say "override what's there"?

I've checked what FileOutputStream does, using the following code:
{code}
File file = new File("d:/temp/tmpfile");
FileOutputStream fos = new FileOutputStream(file);
fos.write(3);
fos.close();
  
fos = new FileOutputStream(file);
FileInputStream fis = new FileInputStream(file);
System.out.println(fis.read());
{code}

* Second line creates an empty file immediately, not waiting for close() or 
flush() -- which resembles the behavior that you're suggesting we should take 
w/ IW (which is today's behavior).
* Fourth line closes the file, flushing and writing the content.
* Fifth line *recreates* the file, empty, again, w/o calling close. So it zeros 
out the file content immediately, even before you wrote a single byte to it.
* Sixth and seventh lines prove it by attempting to read from the file, and the 
output printed is -1.
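
For anyone who wants to reproduce this without a d:/temp path, here is a self-contained variant of the same experiment. It is only a sketch: the class and method names are made up for illustration, and it uses a temp file instead of the hard-coded path.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

public class TruncateOnOpenDemo {
    // Writes one byte, closes, reopens the file for writing, then reads it
    // back while the second stream is still open and unwritten.
    static int readAfterReopen() throws IOException {
        File file = File.createTempFile("tmpfile", ".bin");
        file.deleteOnExit();

        try (FileOutputStream fos = new FileOutputStream(file)) {
            fos.write(3);
        }

        // Opening a new FileOutputStream (non-append) truncates the file at once.
        try (FileOutputStream reopened = new FileOutputStream(file);
             FileInputStream fis = new FileInputStream(file)) {
            return fis.read();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readAfterReopen()); // prints -1
    }
}
```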

I've wrapped the FOS w/ a BufferedOS and the behavior is still the same. What 
I'm trying to show is that we don't fully adhere to the CREATE mode, and 
rightfully so if you ask me - we shouldn't zero out the segments until the 
application has called commit(). But we choose to adhere differently to the 
CREATE* mode if the index is already populated. That's an inconsistent 
behavior, at least in my perspective. It's also harder to explain and document, 
e.g. "you should call commit() if you used CREATE, in case you want to zero out 
everything immediately, and the Directory is not empty, but you don't need to 
call commit() if the directory was empty, Lucene will do it for you" -- so now 
how will the app know if it should call commit()? It will need to write a sort 
of emptiness-detection-then-commit?

I am willing to consider the following semantics:
* APPEND - assumes an index exists and opens it.
* CREATE - zeros out everything that's in the directory *immediately*, and also 
prepares an empty directory.
* CREATE_OR_APPEND - either loads an existing index, or is able to work on the 
empty directory. IW makes no implicit commit if the index does not exist.

But I think that CREATE behavior is too dangerous, and so I prefer to stick w/ 
the change proposed in the patch so far -- if you open an index in CREATE*, you 
should call commit before you can read it. That will adhere to the semantics of 
what the application wanted, whether it meant to zero out an existing 
Directory, or create a new one from scratch.
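
To make the "no implicit commit" semantics concrete, here is a toy model. It does not use Lucene at all: ToyDirectory and ToyWriter are hypothetical stand-ins that track only a list of commit points, and the sketch assumes a reader sees just the latest commit.

```java
import java.util.ArrayList;
import java.util.List;

public class OpenModeSketch {
    enum OpenMode { CREATE, APPEND, CREATE_OR_APPEND }

    /** Toy stand-in for a Directory: a list of commit points, each a doc count. */
    static class ToyDirectory {
        final List<Integer> commits = new ArrayList<>();

        boolean hasIndex() { return !commits.isEmpty(); }

        /** What a reader would see: the doc count of the latest commit. */
        int visibleDocs() {
            if (!hasIndex()) throw new IllegalStateException("no commit to read");
            return commits.get(commits.size() - 1);
        }
    }

    /** Toy stand-in for a writer that never commits on its own. */
    static class ToyWriter {
        private final ToyDirectory dir;
        private int docs;

        ToyWriter(ToyDirectory dir, OpenMode mode) {
            if (mode == OpenMode.APPEND && !dir.hasIndex()) {
                throw new IllegalStateException("APPEND requires an existing index");
            }
            this.dir = dir;
            // CREATE starts from zero docs; the other modes continue from the
            // latest commit if there is one. Nothing is written to dir here.
            boolean startFresh = mode == OpenMode.CREATE || !dir.hasIndex();
            this.docs = startFresh ? 0 : dir.visibleDocs();
        }

        void addDocument() { docs++; }

        void commit() { dir.commits.add(docs); }
    }

    public static void main(String[] args) {
        ToyDirectory dir = new ToyDirectory();
        ToyWriter first = new ToyWriter(dir, OpenMode.CREATE);
        System.out.println(dir.hasIndex());    // false: opening did not commit
        first.addDocument();
        first.addDocument();
        first.commit();
        System.out.println(dir.visibleDocs()); // 2

        ToyWriter second = new ToyWriter(dir, OpenMode.CREATE);
        System.out.println(dir.visibleDocs()); // still 2 until commit() is called
        second.commit();
        System.out.println(dir.visibleDocs()); // 0
    }
}
```

In this model the application never needs emptiness detection: in both the empty and the populated case, readers see the effect of CREATE only after an explicit commit().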

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12856063#action_12856063
 ] 

Shai Erera commented on LUCENE-2386:


So just call new IW(), then rollback and ensure dir.listAll() returns an 
empty list? Or also index stuff, making sure a flush occurs and then rollback? 
I'm not sure that the latter is related to that issue ...

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch






[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-12 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch includes the proposed test in TestIndexWriter. I think this is ready for 
commit, if there are no more objections.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch






[jira] Resolved: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2386.


Lucene Fields: [New, Patch Available]  (was: [New])
   Resolution: Fixed

Committed revision 932868.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch






[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-04-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855713#action_12855713
 ] 

Shai Erera commented on LUCENE-1709:


Committed revision 932878 with the following:
# benchmark tests force sequential run
# threadsPerProcessor defaults to 1 and can be overridden by 
-DthreadsPerProcessor=value
# A CHANGES entry
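
As a side note, a -D property with a default of 1 can be read from Java as sketched below. This only illustrates the "defaults to 1, overridable" behavior; the class and method names are hypothetical, and how the committed build actually consumes threadsPerProcessor is not shown here.

```java
public class TestThreadCount {
    // Hypothetical helper: total threads = threadsPerProcessor * processors,
    // where threadsPerProcessor falls back to 1 when the property is unset.
    static int threadCount(int processors) {
        int perProcessor = Integer.getInteger("threadsPerProcessor", 1);
        return Math.max(1, perProcessor * processors);
    }

    public static void main(String[] args) {
        int processors = Runtime.getRuntime().availableProcessors();
        System.out.println(threadCount(processors));
    }
}
```

Running it with java -DthreadsPerProcessor=2 TestThreadCount would double the computed thread count.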

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855727#action_12855727
 ] 

Shai Erera commented on LUCENE-2386:


Committed revision 932917 for the revert.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch






[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Fixes IndexFileDeleter, adds a proper test to TestIndexWriter. I haven't run 
all the tests yet, but the added test passes now with the fix.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch






[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855767#action_12855767
 ] 

Shai Erera commented on LUCENE-2386:


About IndexReader.listCommits ... the javadocs state this: "There must be at 
least one commit in the Directory, else this method throws 
java.io.IOException." So I'll change the javadocs to reflect the exception type 
that is actually thrown (IndexNotFoundException), and revert the change to 
DirReader.listCommits that made it return an empty list.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch






[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-11 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch w/ proposed fixes. All tests pass, including Solr's :).

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch, 
 LUCENE-2386.patch, LUCENE-2386.patch






[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-10 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

Patch updated to latest rev. + the proposed name change -- 
IndexNotFoundException. All tests pass. I plan to commit this later today.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch, LUCENE-2386.patch






[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855344#action_12855344
 ] 

Shai Erera commented on LUCENE-2386:


Ok I've added the following to DirReader:

{code}
try {
  latest.read(dir, codecs);
} catch (FileNotFoundException e) {
  if (e.getMessage().startsWith("no segments* file found in")) {
    // Might be that the Directory is empty, in which case just return an
    // empty collection.
    return Collections.emptyList();
  } else {
    throw e;
  }
}
{code}

And now that test passes.

I'll continue discovering tests that fail ... probably backwards will have its 
share too :).

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessarily, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855369#action_12855369
 ] 

Shai Erera commented on LUCENE-2386:


I already did that ... just didn't post back. Created 
SegmentsFileNotFoundException.
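
The point of a dedicated exception is that callers can catch by type instead of matching on the message text. A minimal sketch, assuming the new class extends FileNotFoundException (that superclass, the toy readSegments method and the file names are assumptions for illustration, not the patch itself):

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.util.Collections;
import java.util.List;

public class SegmentsExceptionSketch {
    /** Hypothetical stand-in for the dedicated exception named above. */
    static class SegmentsFileNotFoundException extends FileNotFoundException {
        SegmentsFileNotFoundException(String msg) { super(msg); }
    }

    /** Toy reader: fails with the dedicated type when no segments file exists. */
    static List<String> readSegments(List<String> files) throws IOException {
        for (String f : files) {
            if (f.startsWith("segments")) {
                return Collections.singletonList(f);
            }
        }
        throw new SegmentsFileNotFoundException("no segments* file found in " + files);
    }

    /** Returns an empty list for an empty directory, as the thread did at this point. */
    static List<String> listCommits(List<String> files) throws IOException {
        try {
            return readSegments(files);
        } catch (SegmentsFileNotFoundException e) {
            // Caught by type; any other FileNotFoundException still propagates.
            return Collections.emptyList();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(listCommits(Collections.<String>emptyList()));
        System.out.println(listCommits(java.util.Arrays.asList("_0.cfs", "segments_1")));
    }
}
```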

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch






[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855379#action_12855379
 ] 

Shai Erera commented on LUCENE-1879:


I have found such a version ... and it fails too :). At least the one I received.

But never mind that ... as long as we both agree the implementation should 
change. I didn't mean to say anything bad about what you did .. I know the 
limitations you had to work with.

 Parallel incremental indexing
 -

 Key: LUCENE-1879
 URL: https://issues.apache.org/jira/browse/LUCENE-1879
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
 Fix For: 3.1

 Attachments: parallel_incremental_indexing.tar


 A new feature that allows building parallel indexes and keeping them in sync 
 on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
 Find details on the wiki page for this feature:
 http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
 Discussion on java-dev:
 http://markmail.org/thread/ql3oxzkob7aqf3jd




[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

The patch fixes all tests and includes the changes to IndexWriter, 
IndexFileDeleter, DirectoryReader and SegmentInfos.

I'd like to commit this shortly, before all the files get changed by a 
malicious other commit :). (kidding of course)

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch





[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-09 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855457#action_12855457
 ] 

Shai Erera commented on LUCENE-2386:


Ok sounds good. Is there a preferred package for exceptions? Or is o.a.l.index 
ok?

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch, LUCENE-2386.patch





[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854885#action_12854885
 ] 

Shai Erera commented on LUCENE-2074:


Uwe, must this be coupled with that issue? This one has been waiting for a long 
time (why? for the JFlex 1.5 release?), whereas protecting against a huge buffer 
allocation can be a really quick and tiny fix. This one also focuses on getting 
Unicode 5 to work, which is unrelated to the buffer size. But the buffer size 
is not so critical that we need to move fast on it either ... so it's your 
call. I just thought they are two unrelated problems.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch


 The current trunk version of StandardTokenizerImpl was generated with Java 1.4 
 (according to the warning). In Lucene 3.0 we switched to Java 1.5, so we should 
 regenerate the file.
 After regeneration the Tokenizer behaves differently for some characters. 
 Because of that we should only use the new TokenizerImpl when 
 Version.LUCENE_30 or LUCENE_31 is used as matchVersion.




[jira] Commented: (LUCENE-2074) Use a separate JFlex generated Unicode 4 by Java 5 compatible StandardTokenizer

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854887#action_12854887
 ] 

Shai Erera commented on LUCENE-2074:


bq. I plan to commit this soon! 

That's great news !

BTW - what are you going to do w/ the JFlex 1.5 binary? Are you going to check 
it in somewhere? because it hasn't been released last I checked. I'm asking for 
general knowledge, because I know the scripts are downloading it, or rely on it 
to exist somewhere.

In that case, then yes, let's fix it here.

 Use a separate JFlex generated Unicode 4 by Java 5 compatible 
 StandardTokenizer
 ---

 Key: LUCENE-2074
 URL: https://issues.apache.org/jira/browse/LUCENE-2074
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 3.0
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: jflex-1.4.1-vs-1.5-snapshot.diff, jflexwarning.patch, 
 LUCENE-2074-lucene30.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, LUCENE-2074.patch, 
 LUCENE-2074.patch





[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854920#action_12854920
 ] 

Shai Erera commented on LUCENE-1482:


I still think that calling isDebugEnabled is better, because the message 
formatting stuff may do unnecessary things like casting, autoboxing etc. IMO, 
if logging is enabled, evaluating it twice is not a big deal ... it's a simple 
check.
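
To make the point concrete, here is a minimal sketch of the guard. It uses java.util.logging as a stand-in for SLF4J, so isLoggable(Level.FINE) plays the role of isDebugEnabled(); the class name, the describe() helper and the counter are hypothetical and exist only to show that the message is never built when debug output is off.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; java.util.logging stands in for SLF4J here.
class GuardedLoggingSketch {
  private static final Logger logger =
      Logger.getLogger(GuardedLoggingSketch.class.getName());

  // Counts how often the (potentially expensive) message is actually built.
  static int messagesBuilt = 0;

  // Building the message involves formatting and autoboxing of the arguments.
  static String describe(int docCount, long bytesUsed) {
    messagesBuilt++;
    return String.format("flushed %d docs, %d bytes", docCount, bytesUsed);
  }

  static void flush(int docCount, long bytesUsed) {
    // The guard is a cheap level check; if FINE is disabled,
    // describe() is never called.
    if (logger.isLoggable(Level.FINE)) {
      logger.fine(describe(docCount, bytesUsed));
    }
  }

  public static void main(String[] args) {
    logger.setLevel(Level.INFO); // debug-level output disabled
    flush(10, 2048L);
    System.out.println("built while disabled: " + messagesBuilt);

    logger.setLevel(Level.FINE); // debug-level output enabled
    flush(10, 2048L);
    System.out.println("built while enabled: " + messagesBuilt);
  }
}
```

When the level is enabled the check is evaluated twice (once in the guard, once inside the logger), which is the small cost I am willing to pay.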

I'm glad someone here thinks logging will be useful though :). I hope there 
will be a quorum here to proceed w/ that.

Note that I also offered to not create any dependency on SLF4J, but rather 
extract infoStream to a static InfoStream class, which will avoid passing it 
around everywhere, and give the flexibility to output stuff from other classes 
which don't have an infoStream at hand.
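
A rough sketch of what such a static holder could look like, with no logging dependency at all. The method names (set, isEnabled, message) are my own assumptions for illustration and not an existing Lucene API.

```java
import java.io.PrintStream;

// Hypothetical static holder; method names are illustrative only.
final class InfoStream {
  // null means "no output requested".
  private static volatile PrintStream out;

  private InfoStream() {}

  static void set(PrintStream stream) {
    out = stream;
  }

  static boolean isEnabled() {
    return out != null;
  }

  // Any class can call this without having a stream passed to it.
  static void message(String component, String msg) {
    PrintStream s = out; // read the volatile field once
    if (s != null) {
      s.println(component + ": " + msg);
    }
  }
}

class InfoStreamDemo {
  public static void main(String[] args) {
    InfoStream.message("IW", "dropped, no stream set yet");
    InfoStream.set(System.out);
    InfoStream.message("IW", "now visible");
  }
}
```

The obvious limitation compared to a logging framework is that a single static stream is still all-or-nothing for the whole JVM.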

 Replace infoSteram by a logging framework (SLF4J)
 -

 Key: LUCENE-1482
 URL: https://issues.apache.org/jira/browse/LUCENE-1482
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, 
 slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar


 Lucene makes use of infoStream to output messages in its indexing code only. 
 For debugging purposes, when the search application is run on the customer 
 side, getting messages from other code flows, like search, query parsing, 
 analysis etc can be extremely useful.
 There are two main problems with infoStream today:
 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
 other classes I need to either expose an API or propagate infoStream to all 
 classes (see for example DocumentsWriter, which receives its infoStream 
 instance from IndexWriter).
 2. I can either turn debugging on or off, for the entire code.
 Introducing a logging framework can allow each class to control its logging 
 independently, and more importantly, allows the application to turn on 
 logging for only specific areas in the code (i.e., org.apache.lucene.index.*).
 I've investigated SLF4J (stands for Simple Logging Facade for Java) which is, 
 as its name states, a facade over different logging frameworks. As such, you 
 can include the slf4j.jar in your application, and it recognizes at deploy 
 time what is the actual logging framework you'd like to use. SLF4J comes with 
 several adapters for Java logging, Log4j and others. If you know your 
 application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in 
 your classpath, and your logging statements will use Java logging underneath 
 the covers.
 This makes the logging code very simple. For a class A the logger will be 
 instantiated like this:
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
 }
 And will later be used like this:
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
   public void foo() {
     if (logger.isDebugEnabled()) {
       logger.debug(message);
     }
   }
 }
 That's all !
 Checking for isDebugEnabled is very quick, at least using the JDK14 adapter 
 (but I assume it's fast also over other logging frameworks).
 The important thing is, every class controls its own logger. Not all classes 
 have to output logging messages, and we can improve Lucene's logging 
 gradually, w/o changing the API, by adding more logging messages to 
 interesting classes.
 I will submit a patch shortly




[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855020#action_12855020
 ] 

Shai Erera commented on LUCENE-1709:


Robert, I will commit the patch, seems good to do anyway. We can handle the ant 
jars separately later.

And the hang behavior is exactly what I experience, including the 
FileInputStream thing. Only on my machine, when I took a thread dump, it showed 
that Ant waits on FIS.read() ...

Robert - to remind you that even with the patch which forces junit to use a 
separate temp folder per thread, it still hung ... 

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}




[jira] Created: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)
Move NoDeletionPolicy from benchmark to core


 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1


As the subject says, but I'll also make it a singleton + add some unit tests, 
as well as some documentation. I'll post a patch hopefully today.
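
A minimal sketch of the singleton idea. CommitPoint and DeletionPolicy below are simplified, hypothetical stand-ins for the real Lucene types, declared here only so that the example is self-contained; the actual class in the patch may look different.

```java
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the real Lucene types (assumptions for this sketch).
interface CommitPoint {
  void delete();
}

interface DeletionPolicy {
  void onInit(List<? extends CommitPoint> commits);
  void onCommit(List<? extends CommitPoint> commits);
}

// A policy that never deletes a commit point. It has no state, so one
// shared instance is enough and the constructor can be private.
final class KeepAllCommitsPolicy implements DeletionPolicy {
  static final DeletionPolicy INSTANCE = new KeepAllCommitsPolicy();

  private KeepAllCommitsPolicy() {}

  @Override
  public void onInit(List<? extends CommitPoint> commits) {
    // Intentionally empty: keep every commit.
  }

  @Override
  public void onCommit(List<? extends CommitPoint> commits) {
    // Intentionally empty: keep every commit.
  }
}

class KeepAllCommitsDemo {
  public static void main(String[] args) {
    boolean[] deleted = {false};
    CommitPoint commit = () -> deleted[0] = true;
    KeepAllCommitsPolicy.INSTANCE.onCommit(Arrays.asList(commit));
    System.out.println("commit deleted: " + deleted[0]);
  }
}
```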




[jira] Created: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)
IndexWriter commits unnecessarily on fresh Directory


 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1


I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
unnecessary, and kind of brings back an autoCommit mode, in a strange way ... 
why do we need that commit? Do we really expect people to open an IndexReader 
on an empty Directory which they just passed to an IW w/ create=true? If they 
want, they can simply call commit() right away on the IW they created.

I ran into this when writing a test which committed N times, then compared the 
number of commits (via IndexReader.listCommits) and was surprised to see N+1 
commits.

Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
jumping on me .. so the change might not be that simple. But I think it's 
manageable, so I'll try to attack it (and IFD specifically !) back :).
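
The N vs. N+1 observation can be reproduced with a toy model. ToyWriter below is NOT Lucene's IndexWriter; it is a hypothetical stand-in whose only purpose is to show how a constructor-time commit shifts the count seen by a test that commits N times.

```java
import java.util.ArrayList;
import java.util.List;

// Toy stand-in for a writer; not Lucene code.
class ToyWriter {
  // Stands in for the commit points stored in a Directory.
  private final List<String> commitPoints;

  ToyWriter(List<String> commitPoints, boolean commitOnCreate) {
    this.commitPoints = commitPoints;
    // The behavior questioned above: an empty commit made by the constructor
    // when the "directory" is fresh.
    if (commitOnCreate && commitPoints.isEmpty()) {
      commit();
    }
  }

  void commit() {
    commitPoints.add("segments_" + (commitPoints.size() + 1));
  }
}

class CommitCountDemo {
  // Opens a writer on a fresh "directory", commits n times,
  // and returns the number of commit points found afterwards.
  static int commitsAfter(int n, boolean commitOnCreate) {
    List<String> directory = new ArrayList<>();
    ToyWriter writer = new ToyWriter(directory, commitOnCreate);
    for (int i = 0; i < n; i++) {
      writer.commit();
    }
    return directory.size();
  }

  public static void main(String[] args) {
    System.out.println(commitsAfter(3, true));  // constructor commit included
    System.out.println(commitsAfter(3, false)); // explicit commits only
  }
}
```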




[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2385:
---

Attachment: LUCENE-2385.patch

Moves NoDeletionPolicy to core and adds javadocs + TestNoDeletionPolicy. Also 
includes the relevant changes to benchmark (algorithms + CreateIndexTask).
I've also fixed a typo I had in NoMergeScheduler - not related to this issue, 
but since it was just a typo, I thought there's no harm in doing it here.

Tests pass. Planning to commit shortly.

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855131#action_12855131
 ] 

Shai Erera commented on LUCENE-2386:


Took a look at IndexFileDeleter, and located the offending code segment which 
is responsible for the CorruptIndexException:
{code}
if (currentCommitPoint == null) {
  // We did not in fact see the segments_N file
  // corresponding to the segmentInfos that was passed
  // in.  Yet, it must exist, because our caller holds
  // the write lock.  This can happen when the directory
  // listing was stale (eg when index accessed via NFS
  // client with stale directory listing cache).  So we
  // try now to explicitly open this commit point:
  SegmentInfos sis = new SegmentInfos();
  try {
    sis.read(directory, segmentInfos.getCurrentSegmentFileName(), codecs);
  } catch (IOException e) {
    throw new CorruptIndexException("failed to locate current segments_N file");
  }
{code}

Looks like this code protects against a real problem, which was raised on the 
list a couple of times already - a stale NFS cache. So I'm reluctant to remove 
that check ... though I still think we should differentiate between a newly 
created index on a fresh Directory and a stale NFS problem. Maybe we can pass a 
boolean isNew or something like that to the ctor, and if it's a new index and 
the last commit point is missing, IFD will not throw the exception, but 
silently ignore that? So the code would become something like this:
{code}
if (currentCommitPoint == null && !isNew) {
   
}
{code}

Does this make sense, or am I missing something?

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1





[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855140#action_12855140
 ] 

Shai Erera commented on LUCENE-2385:


I did that first, but then remembered that when I did that in the past, people 
were unable to apply my patches, w/o doing the svn move themselves. Anyway, for 
this file it's not really important I think - a very simple and tiny file, w/ 
no history to preserve? Is that ok for this file (b/c I have no idea how to do 
the svn move now ... after I've made all the changes already) :)

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855148#action_12855148
 ] 

Shai Erera commented on LUCENE-2386:


Looking at IFD again, I think a boolean ctor arg is not required. What I can do 
is check if any Lucene file has been seen (in the for-loop iteration on the 
Directory files), and if not, then deduce it's a new Directory, and skip that 
'if' check. I'll give it a shot.
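
Roughly what I have in mind, as a self-contained sketch rather than the actual IFD change. The looksLikeIndexFile() predicate is a deliberately naive assumption (the real code would rely on Lucene's own file name filtering), and the method names are hypothetical.

```java
// Hypothetical sketch; not the actual IndexFileDeleter code.
class FreshDirectorySketch {
  // Naive stand-in for "is this a Lucene index file?" (an assumption).
  static boolean looksLikeIndexFile(String name) {
    return name.startsWith("segments") || name.endsWith(".cfs");
  }

  // Returns true if the missing commit point should be treated as suspicious
  // (e.g. a stale directory listing) and therefore verified explicitly.
  static boolean shouldVerifyCommit(String[] files, boolean foundCurrentCommit) {
    if (foundCurrentCommit) {
      return false; // nothing is missing
    }
    for (String file : files) {
      if (looksLikeIndexFile(file)) {
        return true; // index files exist, yet the commit point was not seen
      }
    }
    return false; // no index files at all: deduce a fresh Directory
  }

  public static void main(String[] args) {
    System.out.println(shouldVerifyCommit(new String[0], false));
    System.out.println(shouldVerifyCommit(new String[] {"_0.cfs"}, false));
  }
}
```

This way no extra ctor argument is needed: the decision falls out of the directory listing that IFD iterates over anyway.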

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1





[jira] Updated: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2385:
---

Attachment: LUCENE-2385.patch

Is it better now?

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.




[jira] Commented: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855155#action_12855155
 ] 

Shai Erera commented on LUCENE-2385:


Forgot to mention that the only move I made was of NoDeletionPolicy:

svn move \
  contrib/benchmark/src/java/org/apache/lucene/benchmark/utils/NoDeletionPolicy.java \
  src/java/org/apache/lucene/index/NoDeletionPolicy.java

I'll remember that in the future Uwe - thanks for the heads up !

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.




[jira] Resolved: (LUCENE-2385) Move NoDeletionPolicy from benchmark to core

2010-04-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2385.


Resolution: Fixed

Committed revision 932129.

 Move NoDeletionPolicy from benchmark to core
 

 Key: LUCENE-2385
 URL: https://issues.apache.org/jira/browse/LUCENE-2385
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark, Index
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2385.patch, LUCENE-2385.patch


 As the subject says, but I'll also make it a singleton + add some unit tests, 
 as well as some documentation. I'll post a patch hopefully today.




[jira] Updated: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2386:
---

Attachment: LUCENE-2386.patch

First stab at this. Patch still missing CHANGES entry, and I haven't run all 
the tests, just TestIndexWriter. With those changes it passes. One thing that I 
think should be fixed is testImmediateDiskFull - if I don't add 
writer.commit(), the test fails, because dir.getRecomputeActualSizeInBytes 
returns 0 (no RAMFiles yet), and then the test succeeds at adding one document. 
So maybe just change the test to set maxSizeInBytes to '1', always?

TestNoDeletionPolicy is not covered by this patch (should be fixed as well, 
because now the number of commits is exactly N and not N+1). Will fix it 
tomorrow.

Anyway, it's really late now, so hopefully some fresh eyes will look at it 
while I'm away, and comment on the proposed changes. I hope I got all the 
changes to the tests right.

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch





[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855265#action_12855265
 ] 

Shai Erera commented on LUCENE-2386:


bq. Maybe change testImmediateDiskFull to set max allowed size to max(1, 
current-usage)?

Good idea ! Did it and it works.

Now ... one thing I haven't mentioned is the bw break. This is a behavioral bw 
break, which I'm not so sure we should care about, because I wonder how many 
apps out there rely on being able to open a reader before they have ever 
committed on a fresh new index. So what do you think - do this change anyway, 
OR ... use Version to our aid? I.e., if the Version that was passed to IWC is 
before LUCENE_31, we keep the initial commit, otherwise we don't do it? The pro 
is that I won't need to change many of the tests, because they still use the 
LUCENE_30 version (but that is not a strong argument, so it's a weak pro). The 
con is that IW will keep having that doCommit handling in its ctor, only now w/ 
added comments on why this is being kept around etc.
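
For illustration, the Version option would boil down to a check like the one below. ToyVersion is a hypothetical enum declared only for this sketch (it is not Lucene's Version class), and commitOnFreshDirectory() is an assumed helper name.

```java
// Hypothetical stand-in for a version constant; not Lucene's Version class.
enum ToyVersion {
  LUCENE_29, LUCENE_30, LUCENE_31;

  boolean onOrAfter(ToyVersion other) {
    return compareTo(other) >= 0;
  }
}

class InitialCommitGate {
  // Keep the old initial commit only for versions before LUCENE_31.
  static boolean commitOnFreshDirectory(ToyVersion matchVersion) {
    return !matchVersion.onOrAfter(ToyVersion.LUCENE_31);
  }

  public static void main(String[] args) {
    for (ToyVersion v : ToyVersion.values()) {
      System.out.println(v + " -> initial commit: " + commitOnFreshDirectory(v));
    }
  }
}
```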

What do you think?

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessarily, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).




[jira] Commented: (LUCENE-2386) IndexWriter commits unnecessarily on fresh Directory

2010-04-08 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2386?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12855277#action_12855277
 ] 

Shai Erera commented on LUCENE-2386:


Apparently, there are more tests that fail ... lost count but easy fixing. I 
tried writing the following test:

{code}
  public void testNoCommits() throws Exception {
    // Tests that if we don't call commit(), the directory has 0 commits. This has
    // changed since LUCENE-2386, where before IW would always commit on a fresh
    // new index.
    Directory dir = new RAMDirectory();
    IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(
        TEST_VERSION_CURRENT, new WhitespaceAnalyzer(TEST_VERSION_CURRENT)));
    assertEquals("expected 0 commits!", 0, IndexReader.listCommits(dir).size());
    // No changes still should generate a commit, because it's a new index.
    writer.close();
    assertEquals("expected 1 commits!", 1, IndexReader.listCommits(dir).size());
  }
{code}

Simple test - validates that no commits are present right after a fresh index 
is created, w/o closing or committing. However, IndexReader.listCommits 
fails w/ the following exception:

{code}
java.io.FileNotFoundException: no segments* file found in org.apache.lucene.store.ramdirect...@2d262d26: files: []
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:652)
    at org.apache.lucene.index.SegmentInfos$FindSegmentsFile.run(SegmentInfos.java:535)
    at org.apache.lucene.index.SegmentInfos.read(SegmentInfos.java:323)
    at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1033)
    at org.apache.lucene.index.DirectoryReader.listCommits(DirectoryReader.java:1023)
    at org.apache.lucene.index.IndexReader.listCommits(IndexReader.java:1341)
    at org.apache.lucene.index.TestIndexWriter.testNoCommits(TestIndexWriter.java:4966)
   
{code}

The failure occurs when SegmentInfos attempts to find segments.gen and fails. 
So I wonder whether I should fix DirectoryReader to catch that exception and 
simply return an empty Collection, or fix SegmentInfos instead. Notice the 
"files: []" at the end: I think that if the following code (SegmentInfos, line 
652) checked that there were any files before throwing the exception, it would 
still work properly and safely (i.e. still detect a problematic Directory). I 
would probably need to break out of the while loop and fix some other things 
in upper layers, so I'm not sure I shouldn't simply catch that exception in 
DirectoryReader.listCommits, w/ proper documentation, and be done w/ it. After 
all, it's not supposed to be called ... ever? or hardly ever?

{code}
  if (gen == -1) {
    // Neither approach found a generation
    throw new FileNotFoundException("no segments* file found in " +
        directory + ": files: " + Arrays.toString(files));
  }
{code}
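
The simpler of the two options, catching the exception in listCommits and 
returning an empty Collection, could look roughly like the sketch below. It is 
only an illustration under stated assumptions: CommitSource and CommitLister 
are hypothetical stand-ins for the directory-reading code, not Lucene classes, 
and commits are modeled as plain strings.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the code that reads segments* files.
interface CommitSource {
  // Throws FileNotFoundException when no segments* file exists.
  List<String> readCommits() throws IOException;
}

final class CommitLister {
  private CommitLister() {}

  // Treats "no segments* file" as "no commits yet" instead of as an error.
  static Collection<String> listCommits(CommitSource source) {
    try {
      return source.readCommits();
    } catch (FileNotFoundException e) {
      // A fresh directory that was never committed to has no segments* file.
      return Collections.emptyList();
    } catch (IOException e) {
      // Any other I/O failure is still surfaced to the caller.
      throw new UncheckedIOException(e);
    }
  }
}
```

The trade-off is the one raised above: swallowing the exception also hides a 
genuinely problematic Directory, which is why a check on the file list inside 
SegmentInfos might be the safer place for the change.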

 IndexWriter commits unnecessarily on fresh Directory
 

 Key: LUCENE-2386
 URL: https://issues.apache.org/jira/browse/LUCENE-2386
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2386.patch


 I've noticed IndexWriter's ctor commits a first commit (empty one) if a fresh 
 Directory is passed, w/ OpenMode.CREATE or CREATE_OR_APPEND. This seems 
 unnecessary, and kind of brings back an autoCommit mode, in a strange way 
 ... why do we need that commit? Do we really expect people to open an 
 IndexReader on an empty Directory which they just passed to an IW w/ 
 create=true? If they want, they can simply call commit() right away on the IW 
 they created.
 I ran into this when writing a test which committed N times, then compared 
 the number of commits (via IndexReader.listCommits) and was surprised to see 
 N+1 commits.
 Tried to change doCommit to false in IW ctor, but it got IndexFileDeleter 
 jumping on me .. so the change might not be that simple. But I think it's 
 manageable, so I'll try to attack it (and IFD specifically !) back :).




[jira] Updated: (LUCENE-1709) Parallelize Tests

2010-04-07 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-1709:
---

Attachment: LUCENE-1709-2.patch

Since I had the changes on my local env. I thought it's best to generate a 
patch out of them, so they don't get lost. The patch doesn't cover the ant 
.jars, only the changes to common-build.xml as well as benchmark/build.xml

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709-2.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}




[jira] Resolved: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-07 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2377.


Resolution: Fixed

Committed revision 931502.

 Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
 -

 Key: LUCENE-2377
 URL: https://issues.apache.org/jira/browse/LUCENE-2377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2377.patch


 Benchmark allows one to set the MP and MS to use, by defining the class name 
 and then use reflection to instantiate them. However NoMP and NoMS are 
 singletons and therefore reflection does not work for them. Easy fix in 
 CreateIndexTask. I'll post a patch soon.




[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-04-07 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854588#action_12854588
 ] 

Shai Erera commented on LUCENE-2353:


Actually, we've reopened LUCENE-1709 to track that. This is not related to this 
issue's changes, but seems to be related to the benchmark tests specifically. 
Please have a look there at a patch I've posted which forces benchmark tests to 
run in sequential mode. Additionally, you can run 'ant test -Drunsequential=1' 
from the command line, in benchmark's root folder, to achieve the same.
And it'd be great if you posted the above on LUCENE-1709 as well -- because now 
I know I'm not the only one running into this :).

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Commented: (LUCENE-1709) Parallelize Tests

2010-04-06 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12854348#action_12854348
 ] 

Shai Erera commented on LUCENE-1709:


One more thing - change benchmark tests to run sequentially (by adding the 
property).
Robert, are you going to tackle that soon?

 Parallelize Tests
 -

 Key: LUCENE-1709
 URL: https://issues.apache.org/jira/browse/LUCENE-1709
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 2.4.1
Reporter: Jason Rutherglen
Assignee: Robert Muir
 Fix For: 3.1

 Attachments: LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, 
 LUCENE-1709.patch, LUCENE-1709.patch, LUCENE-1709.patch, runLuceneTests.py

   Original Estimate: 48h
  Remaining Estimate: 48h

 The Lucene tests can be parallelized to make for a faster testing system.  
 This task from ANT can be used: 
 http://ant.apache.org/manual/CoreTasks/parallel.html
 Previous discussion: 
 http://www.gossamer-threads.com/lists/lucene/java-dev/69669
 Notes from Mike M.:
 {quote}
 I'd love to see a clean solution here (the tests are embarrassingly
 parallelizable, and we all have machines with good concurrency these
 days)... I have a rather hacked up solution now, that uses
 -Dtestpackage=XXX to split the tests up.
 Ideally I would be able to say use N threads and it'd do the right
 thing... like the -j flag to make.
 {quote}




[jira] Created: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-06 Thread Shai Erera (JIRA)
Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
-

 Key: LUCENE-2377
 URL: https://issues.apache.org/jira/browse/LUCENE-2377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1


Benchmark allows one to set the MP and MS to use, by defining the class name 
and then use reflection to instantiate them. However NoMP and NoMS are 
singletons and therefore reflection does not work for them. Easy fix in 
CreateIndexTask. I'll post a patch soon.
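
For readers unfamiliar with why reflection fails here, the sketch below shows 
one possible way to support singletons when instantiating by class name. It is 
an assumption for illustration only, not the code in the patch: NoOpScheduler, 
SimpleScheduler and Instantiator are hypothetical classes, and looking up a 
static field named INSTANCE is just one convention a fix could rely on.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical singleton: the private ctor makes plain reflective
// instantiation fail, which is the problem described above.
final class NoOpScheduler {
  static final NoOpScheduler INSTANCE = new NoOpScheduler();
  private NoOpScheduler() {}
}

// Ordinary class with a public no-arg ctor.
class SimpleScheduler {
  public SimpleScheduler() {}
}

final class Instantiator {
  private Instantiator() {}

  // Returns the class's static INSTANCE field when it has one,
  // otherwise calls its no-arg ctor.
  static Object create(String className) {
    try {
      Class<?> clazz = Class.forName(className);
      try {
        Field f = clazz.getDeclaredField("INSTANCE");
        if (Modifier.isStatic(f.getModifiers())) {
          f.setAccessible(true);
          return f.get(null);
        }
      } catch (NoSuchFieldException e) {
        // Not a singleton of this shape; fall through to the ctor.
      }
      return clazz.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException e) {
      throw new IllegalArgumentException("Cannot instantiate " + className, e);
    }
  }
}
```

A simpler alternative is to compare the configured class name against the 
known singleton classes and return their instances directly.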




[jira] Updated: (LUCENE-2377) Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark

2010-04-06 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2377:
---

Attachment: LUCENE-2377.patch

Patch includes both fix to CreateIndexTask as well as relevant tests to 
CreateIndexTaskTest. I plan to commit later today if there are no objections.

 Enable the use of NoMergePolicy and NoMergeScheduler by Benchmark
 -

 Key: LUCENE-2377
 URL: https://issues.apache.org/jira/browse/LUCENE-2377
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2377.patch


 Benchmark allows one to set the MP and MS to use, by defining the class name 
 and then use reflection to instantiate them. However NoMP and NoMS are 
 singletons and therefore reflection does not work for them. Easy fix in 
 CreateIndexTask. I'll post a patch soon.




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851829#action_12851829
 ] 

Shai Erera commented on LUCENE-2310:


+1 for this simplification. Can we just name it Indexable, and omit Document 
from it? That way, it's both shorter and there is less chance users will 
directly link it w/ Document.

One thing I didn't understand though, is what will happen to the ir/is.doc() 
methods? Will those be deprecated in favor of some other class which receives 
an IR as a parameter and knows how to re-construct an Indexable (Document)?

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality than storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possibly Fieldable), moving much of the functionality into Field and 
 FieldType.




[jira] Assigned: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-31 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera reassigned LUCENE-2353:
--

Assignee: Shai Erera

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-31 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851836#action_12851836
 ] 

Shai Erera commented on LUCENE-2353:


Unless there are objections, I plan to commit this shortly

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Commented: (LUCENE-2310) Reduce Fieldable, AbstractField and Field complexity

2010-03-31 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2310?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12851842#action_12851842
 ] 

Shai Erera commented on LUCENE-2310:


Right Earwin - agreed.

I'd like to summarize a brief discussion we had on IRC around that:
The idea is not to provide another interface/class for search purposes, but 
rather to expose the right API from IndexReader, even if it might be a bit 
low-level. APIs like getIndexedFields(docId) and getStoredFields(docId), both 
optionally taking a FieldSelector, should allow the application to re-construct 
its Indexable however it wants. And IR/IS don't need to know anything about 
that.
To complete the picture for current users, we can have a static reconstruct() 
on Document which takes IR, docId and FieldSelector ...

BTW, I'm not even sure getIndexedFields can be efficiently supported today. 
Just listing it here for completeness.
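
A rough sketch of the shape discussed above, purely for illustration. Every 
name here is hypothetical: StoredFieldSource stands in for the low-level 
reader API, SimpleDoc for an application's own document type, and the 
FieldSelector argument is left out to keep the example short. None of this is 
an existing Lucene API.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical low-level reader API: it only hands out raw stored fields.
interface StoredFieldSource {
  // Returns field name -> stored value for the given document id.
  Map<String, String> getStoredFields(int docId);
}

// Hypothetical application-side document type.
final class SimpleDoc {
  final Map<String, String> fields;

  private SimpleDoc(Map<String, String> fields) {
    this.fields = fields;
  }

  // Static factory: the reader knows nothing about SimpleDoc, and the
  // document class decides how to rebuild itself from the raw fields.
  static SimpleDoc reconstruct(StoredFieldSource reader, int docId) {
    return new SimpleDoc(new LinkedHashMap<>(reader.getStoredFields(docId)));
  }
}
```

The point of the design is the direction of the dependency: the document type 
depends on the reader's low-level API, never the other way around.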

 Reduce Fieldable, AbstractField and Field complexity
 

 Key: LUCENE-2310
 URL: https://issues.apache.org/jira/browse/LUCENE-2310
 Project: Lucene - Java
  Issue Type: Sub-task
  Components: Index
Reporter: Chris Male
 Attachments: LUCENE-2310-Deprecate-AbstractField-CleanField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-AbstractField.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields-core.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch, 
 LUCENE-2310-Deprecate-DocumentGetFields.patch


 In order to move field type like functionality into its own class, we really 
 need to try to tackle the hierarchy of Fieldable, AbstractField and Field.  
 Currently AbstractField depends on Field, and does not provide much more 
 functionality than storing fields, most of which are being moved over to 
 FieldType.  Therefore it seems ideal to try to deprecate AbstractField (and 
 possibly Fieldable), moving much of the functionality into Field and 
 FieldType.




[jira] Resolved: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-31 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera resolved LUCENE-2353.


Resolution: Fixed

Committed revision 929520.

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
Assignee: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-29 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2353:
---

Attachment: LUCENE-2353.patch

Updated to also match 'c:/temp' like paths, which are also accepted on Windows
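
As an illustration of matching both separator styles, here is a small sketch 
that recognizes a drive-letter prefix followed by either kind of slash. It is 
an assumption, not the code in the attached patch: WindowsPathSketch and its 
method are hypothetical, and a regex is used here only because it is compact.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the benchmark's Config class.
final class WindowsPathSketch {
  // A drive letter, a colon, then '\' or '/', e.g. "c:\temp" or "c:/temp".
  private static final Pattern DRIVE_PREFIX = Pattern.compile("[A-Za-z]:[\\\\/]");

  private WindowsPathSketch() {}

  // True if the value starts like a Windows absolute path.
  static boolean looksLikeWindowsAbsolutePath(String value) {
    return DRIVE_PREFIX.matcher(value).lookingAt();
  }
}
```

Values for which this returns true would be returned as-is instead of being 
parsed as per-round properties.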

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch, LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Commented: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-28 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850644#action_12850644
 ] 

Shai Erera commented on LUCENE-2353:


I don't have an account yet, so I cannot commit this on my own. Any volunteers?

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Created: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-27 Thread Shai Erera (JIRA)
Config incorrectly handles Windows absolute pathnames
-

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1


I have no idea how no one ran into this so far, but I tried to execute an .alg 
file which used ReutersContentSource and referenced both docs.dir and work.dir 
as Windows absolute pathnames (e.g. d:\something). Surprisingly, the run 
reported an error of missing content under benchmark\work\something.

I've traced the problem back to Config, where get(String, String) includes the 
following code:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
...
{code}

It detects ":" in the value and so it thinks it's a per-round property, thus 
stripping "d:" from the value ... fix is very simple:
{code}
if (sval.indexOf(":") < 0) {
  return sval;
} else if (sval.indexOf(":\\") >= 0) {
  // this previously messed up absolute path names on Windows. Assuming
  // there is no real value that starts with \\
  return sval;
}
// first time this prop is extracted by round
int k = sval.indexOf(":");
String colName = sval.substring(0, k);
sval = sval.substring(k + 1);
{code}

I'll post a patch w/ the above fix + test shortly.
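
To see the bug and the fix side by side, here is a small runnable sketch. It 
is a simplified stand-in, not the real Config code: RoundPropertySketch and 
its methods are hypothetical, and the per-round parsing that follows the 
substring call in Config is reduced to returning the remainder.

```java
// Simplified stand-in for the check in Config.get(String, String).
final class RoundPropertySketch {
  private RoundPropertySketch() {}

  // Original logic: any ':' marks a per-round property, so everything up to
  // and including the first ':' is stripped from the value.
  static String originalValue(String sval) {
    int k = sval.indexOf(":");
    if (k < 0) {
      return sval;
    }
    return sval.substring(k + 1);
  }

  // Fixed logic: a value containing ":\" is treated as a Windows absolute
  // path and returned unchanged.
  static String fixedValue(String sval) {
    if (sval.indexOf(":") < 0 || sval.indexOf(":\\") >= 0) {
      return sval;
    }
    return sval.substring(sval.indexOf(":") + 1);
  }
}
```

With the original logic, d:\something loses its drive prefix and becomes 
\something, which is consistent with the run looking for content under 
benchmark\work\something.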




[jira] Updated: (LUCENE-2353) Config incorrectly handles Windows absolute pathnames

2010-03-27 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2353:
---

Attachment: LUCENE-2353.patch

The fix is only relevant to get(String, String) and not to all other 
get(String, type) variants.

Benchmark test passed but after I svn up (to include the latest parallel test 
thing) the test just sits idle (after finishing), waiting for something. If I 
run the tests in eclipse they pass. So I'm guessing it's a problem w/ my env. 
or build.xml?

I also tried 'ant clean test' from within benchmark, but it didn't help. I then 
tried 'ant clean' from root, and 'ant test' from benchmark, but the test just 
keeps waiting on WriteLineDocTaskTest, on this line:
[junit]  config properties:
[junit] directory = RAMDirectory
[junit] doc.maker = 
org.apache.lucene.benchmark.byTask.tasks.WriteLineDocTaskTest$JustDateDocMaker
[junit] line.file.out = 
D:\dev\lucene\lucene-trunk\build\contrib\benchmark\test\W\one-line
[junit] ---

I think this can go in (if it passes on someone else's machine), while I figure 
out what's wrong in my env. separately.

 Config incorrectly handles Windows absolute pathnames
 -

 Key: LUCENE-2353
 URL: https://issues.apache.org/jira/browse/LUCENE-2353
 Project: Lucene - Java
  Issue Type: Bug
  Components: contrib/benchmark
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2353.patch


 I have no idea how no one ran into this so far, but I tried to execute an 
 .alg file which used ReutersContentSource and referenced both docs.dir and 
 work.dir as Windows absolute pathnames (e.g. d:\something). Surprisingly, the 
 run reported an error of missing content under benchmark\work\something.
 I've traced the problem back to Config, where get(String, String) includes 
 the following code:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 ...
 {code}
 It detects ":" in the value and so it thinks it's a per-round property, thus 
 stripping "d:" from the value ... fix is very simple:
 {code}
 if (sval.indexOf(":") < 0) {
   return sval;
 } else if (sval.indexOf(":\\") >= 0) {
   // this previously messed up absolute path names on Windows. Assuming
   // there is no real value that starts with \\
   return sval;
 }
 // first time this prop is extracted by round
 int k = sval.indexOf(":");
 String colName = sval.substring(0, k);
 sval = sval.substring(k + 1);
 {code}
 I'll post a patch w/ the above fix + test shortly.




[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850075#action_12850075
 ] 

Shai Erera commented on LUCENE-2345:


Earwin, w/o knowing too much about the details of your work, I wanted to 
comment on "get rid of init/reinit/moreinit methods, moving the code to 
constructors". I work now on Parallel Index and one of the things I do is 
extend IW. Currently, IW's ctor code performs the initialization, however I'm 
thinking to move that code to an init method. The reason is to allow easy 
extensions of IW, such as LUCENE-2330. There I'm going to add a default ctor to 
IW, accompanied by an init method the extending class can call if needed. So 
what I'm trying to say is that init methods are not always bad, and sometimes 
ctors limit you. Perhaps it would make sense though in what you're trying to do 
...

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return newSegmentReader(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future




[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850083#action_12850083
 ] 

Shai Erera commented on LUCENE-2345:


Thanks Uwe, I know that ctor is the preferred way, and in the process of 
introducing IWC I deleted IW.init, which all ctors called, and pulled all the 
code into the IW ctor. I will make that init() on IW final. But sometimes 
putting code in init() is not bad, and it's used elsewhere in Lucene too (e.g. 
PQ and, up until recently, IW).

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return newSegmentReader(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future




[jira] Commented: (LUCENE-2215) paging collector

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850086#action_12850086
 ] 

Shai Erera commented on LUCENE-2215:


Sure let's wait for the patch and some perf. results.

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 PagingCollector.java, TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850094#action_12850094
 ] 

Shai Erera commented on LUCENE-2345:


Earwin, I wholeheartedly agree with what you wrote. If we could refactor IW and 
extract it to a set of interfaces, then I agree (and Michael B. has an issue 
open for that). I think though that IW's API is already that interface (give or 
take few methods). So perhaps this can be an easy refactoring - introduce an 
Indexer (a la Searcher) class (or interface) w/ all of IW public methods, and 
then let PW extend/impl that class/interface as well as IW. We can also 
consider making IW itself final this way (though bw police will prevent it :)).

Then when PW sets up the slices, it can create them as IW or any other IW-like 
implementation it needs them to impl. If it sounds good enough to become its 
own issue, I can open one and we can continue discussing it there (and leave 
that issue focused on extending SR). Then I'll hold off w/ LUCENE-2330, or 
simply rename it to reflect that Indexer API.

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return newSegmentReader(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future




[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850313#action_12850313
 ] 

Shai Erera commented on LUCENE-1879:


The way I planned to support multi-threaded indexing is to do a two-phase 
addDocument. First, allocate a doc ID from DocumentsWriter (synchronized) and 
then add the Document to each Slice with that doc ID. DocumentsWriter was not 
supposed to know it is a parallel index ... something like the following.
{code}
int docId = obtainDocId();
for (IndexWriter slice : slices) {
  slice.addDocument(docId, document);
}
{code}

That allows ParallelWriter to be really an orchestrator/manager of all slices, 
while each slice can be an IW on its own.
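
The two-phase idea above can be sketched in a self-contained way as follows. Slice and ParallelWriterSketch are hypothetical stand-ins, not Lucene classes, and the AtomicInteger only stands in for the synchronized doc ID allocation that the comment assigns to DocumentsWriter. Documents are plain strings to keep the sketch small.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for one slice; in the proposal each slice would be an IW. */
interface Slice {
  void addDocument(int docId, String document);
}

/** Sketch only: an orchestrator that hands the same doc ID to every slice. */
final class ParallelWriterSketch {
  private final List<Slice> slices;
  private final AtomicInteger nextDocId = new AtomicInteger();

  ParallelWriterSketch(List<Slice> slices) {
    this.slices = new ArrayList<>(slices);
  }

  /** Phase 1: allocate a doc ID; safe to call from several threads. */
  int obtainDocId() {
    return nextDocId.getAndIncrement();
  }

  /** Phase 2: add the document to every slice under the allocated ID. */
  int addDocument(String document) {
    int docId = obtainDocId();
    for (Slice slice : slices) {
      slice.addDocument(docId, document);
    }
    return docId;
  }
}
```

Only the ID allocation is shared state here; each slice sees the same ID for the same document, which is what keeps the slices aligned on a docID level.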

Now, when you say ParallelDocumentsWriter, I assume you mean that this 
DocWriter will be aware of the slices? That I think is an interesting idea, 
which is unrelated to LUCENE-2324. I.e., ParallelWriter will invoke its 
addDocument code which will get down to ParallelDocumentWriter, which will 
allocate the doc ID itself and call each slice's DocWriter.addDocument? And 
then LUCENE-2324 will just improve the performance of that process?

This might require a bigger change to IW than I had anticipated, but perhaps 
it's worth it.

What do you think?

 Parallel incremental indexing
 -

 Key: LUCENE-1879
 URL: https://issues.apache.org/jira/browse/LUCENE-1879
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
 Fix For: 3.1

 Attachments: parallel_incremental_indexing.tar


 A new feature that allows building parallel indexes and keeping them in sync 
 on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
 Find details on the wiki page for this feature:
 http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
 Discussion on java-dev:
 http://markmail.org/thread/ql3oxzkob7aqf3jd




[jira] Commented: (LUCENE-1879) Parallel incremental indexing

2010-03-26 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1879?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850336#action_12850336
 ] 

Shai Erera commented on LUCENE-1879:


Hi Grant - I believe what you describe is related to solving the incremental 
field updates problem, where someone might want to change the value of a 
specific document's field. But PI is not about that. Rather, PI is about 
updating a whole slice at once, i.e., changing a field's value across all docs, 
or adding a field to all docs (I believe such a question was asked on the user 
list a few days ago). I've listed above several scenarios in which PI is useful, 
but unfortunately it is unrelated to incremental field updates.

If I misunderstood you, then please clarify.

Re incremental field updates, I think your direction is interesting, and 
deserves discussion, but in a separate issue/thread?

 Parallel incremental indexing
 -

 Key: LUCENE-1879
 URL: https://issues.apache.org/jira/browse/LUCENE-1879
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
 Fix For: 3.1

 Attachments: parallel_incremental_indexing.tar


 A new feature that allows building parallel indexes and keeping them in sync 
 on a docID level, independent of the choice of the MergePolicy/MergeScheduler.
 Find details on the wiki page for this feature:
 http://wiki.apache.org/lucene-java/ParallelIncrementalIndexing 
 Discussion on java-dev:
 http://markmail.org/thread/ql3oxzkob7aqf3jd




[jira] Commented: (LUCENE-2345) Make it possible to subclass SegmentReader

2010-03-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849728#action_12849728
 ] 

Shai Erera commented on LUCENE-2345:


bq. The IndexWriter now has a getter and setter for setting this

If this is not expected to change during the lifetime of IW, I think it should 
be added to IWC when you upgrade the patch to 3.1.

 Make it possible to subclass SegmentReader
 --

 Key: LUCENE-2345
 URL: https://issues.apache.org/jira/browse/LUCENE-2345
 Project: Lucene - Java
  Issue Type: Wish
  Components: Index
Reporter: Tim Smith
 Fix For: 3.1

 Attachments: LUCENE-2345_3.0.patch


 I would like the ability to subclass SegmentReader for numerous reasons:
 * to capture initialization/close events
 * attach custom objects to an instance of a segment reader (caches, 
 statistics, so on and so forth)
 * override methods on segment reader as needed
 currently this isn't really possible
 I propose adding a SegmentReaderFactory that would allow creating custom 
 subclasses of SegmentReader
 default implementation would be something like:
 {code}
 public class SegmentReaderFactory {
   public SegmentReader get(boolean readOnly) {
     return readOnly ? new ReadOnlySegmentReader() : new SegmentReader();
   }
   public SegmentReader reopen(SegmentReader reader, boolean readOnly) {
     return newSegmentReader(readOnly);
   }
 }
 {code}
 It would then be made possible to pass a SegmentReaderFactory to IndexWriter 
 (for pooled readers) as well as to SegmentReader.get() (DirectoryReader.open, 
 etc)
 I could prepare a patch if others think this has merit
 Obviously, this API would be experimental/advanced/will change in future




[jira] Commented: (LUCENE-2215) paging collector

2010-03-25 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12850002#action_12850002
 ] 

Shai Erera commented on LUCENE-2215:


bq. since I think it's safe to say most applications implement paging

Let's be careful about the semantics here Grant. Most if not all applications 
implement paging indeed, but I believe only FEW actually store user contexts 
between searches. PagingCollector relies on the application to store the lowest 
ranking doc that was returned previously, which means storing context between 
user's searches.

I agree w/ Mike's statement that 99.9% of the searches would never run that 
code, which is why I've proposed a delegation/wrapper approach from the 
beginning. I also think that we should make some allowances here and there, for 
the non-common case, and introduce better software design than specialized 
code. A Collector filter approach for some rare (or even less common) cases 
seems very reasonable to me.
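
A wrapper of the kind argued for above could look roughly like the sketch below. ScoredHitCollector and AfterHitFilter are hypothetical names and deliberately do not use Lucene's Collector API; the sketch assumes hits are ranked by descending score with ties broken by ascending doc ID, and that the application has stored the doc ID and score of the lowest-ranking hit it returned previously.

```java
/** Hypothetical, simplified collector interface; not Lucene's Collector. */
interface ScoredHitCollector {
  void collect(int doc, float score);
}

/** Sketch only: forwards just the hits that rank after the last hit of the previous page. */
final class AfterHitFilter implements ScoredHitCollector {
  private final ScoredHitCollector delegate;
  private final int lastDoc;
  private final float lastScore;

  AfterHitFilter(ScoredHitCollector delegate, int lastDoc, float lastScore) {
    this.delegate = delegate;
    this.lastDoc = lastDoc;
    this.lastScore = lastScore;
  }

  @Override
  public void collect(int doc, float score) {
    // A hit ranks after the previous page if it scores lower, or ties on
    // score and has a larger doc ID (assumed tie-break order).
    boolean ranksAfter = score < lastScore || (score == lastScore && doc > lastDoc);
    if (ranksAfter) {
      delegate.collect(doc, score);
    }
  }
}
```

The wrapped collector stays unaware of paging, so the common non-paging search never runs the extra comparison.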

Also, I think that if we add to TSDC a create method which takes into account 
the previously scored lowest doc, it will confuse people. Now they will need to 
think "where do I get this low score from?" - but perhaps after I see the code, 
it won't be such a bad thing. I just have a feeling TSDC and TFC should be 
left on their own, and extreme paging stuff should either be its own 
specialized collector, or a wrapper.

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 PagingCollector.java, TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] Commented: (LUCENE-2215) paging collector

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849200#action_12849200
 ] 

Shai Erera commented on LUCENE-2215:


So what's the motivation of declaring PagingCollector a TopDocsCollector? Would 
you envision someone requesting a TopDocsCollector without caring whether it's 
TSDC, TFC or PagingCollector? I would rather have it extend TDC directly, and 
then you won't need to throw UOE for the rest of the methods ...

What about renaming it to TopScorePagingCollector?

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 PagingCollector.java, TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849384#action_12849384
 ] 

Shai Erera commented on LUCENE-2343:


In the patch you write: "topDocOrdered - Creates a TopDocCollector that 
requires in order docs" - did you mean TopScoreDocCollector? Because 
TopDocCollector is abstract ...

I think the following:
{code}
+  Class<? extends Collector> clazz = (Class<? extends Collector>) 
Class.forName(clnName);
+  collector = clazz.newInstance();
{code}
can be written as 
Class.forName(clnName).asSubclass(Collector.class).newInstance();
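
A self-contained sketch of the asSubclass form is shown below. HitSink, CountingSink and SinkLoader are hypothetical names used so the example runs without Lucene on the classpath. It calls getDeclaredConstructor().newInstance() because Class.newInstance() has been deprecated since Java 9; that substitution is an assumption of this sketch, not something the patch does.

```java
/** Hypothetical stand-in for the Collector type named in the review. */
interface HitSink {
  void collect(int doc);
}

/** Hypothetical implementation that the loader instantiates by name. */
class CountingSink implements HitSink {
  int count;

  @Override
  public void collect(int doc) {
    count++;
  }
}

/** Sketch only: loads a HitSink by class name without an unchecked cast. */
final class SinkLoader {
  static HitSink load(String className) throws ReflectiveOperationException {
    return Class.forName(className)
        .asSubclass(HitSink.class)   // throws ClassCastException if the type does not fit
        .getDeclaredConstructor()
        .newInstance();
  }
}
```

Compared with casting the Class object, asSubclass checks the type at the point of loading, so a wrong class name in an .alg file would fail there with a ClassCastException rather than later.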

Also, and it's a style issue, can you remove the '== true/false' from ifs?

I'd change *if (clnName.equals("") == false)* to *if (clnName.length() > 0)*.

Why does benchmark/build.xml now rely on the compiled classes/test (of core)?

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2343.patch


 As the title says.




[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849393#action_12849393
 ] 

Shai Erera commented on LUCENE-2343:


ok I won't argue about == true/false. It's a style thing and I'm not too 
fanatic about it :).

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2343.patch


 As the title says.




[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849403#action_12849403
 ] 

Shai Erera commented on LUCENE-2343:


I wasn't talking about the name of the parameter but about the comment in the 
javadoc. TopDocsCollector is a typo - should have been TopScoreDocCollector. If 
you also want to change the name of the parameter in the .alg file that's ok as 
well, though I'm fine w/ topDocOrdered/Unordered.

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2343.patch


 As the title says.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849404#action_12849404
 ] 

Shai Erera commented on LUCENE-2339:


Do we want to suppress only IOExceptions? What about any RuntimeExceptions - 
upon hitting any of them the code will fly away? Not saying it's a bad thing, 
but pointing it out.

Other than that, the patch looks good. closeSafely is not exactly what I had in 
mind about closeNoException because it forces you to catch the IOE if you don't 
declare you throw it, or you need to move on, discarding it. But I guess this 
is a matter for another issue. 
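
To make the trade-off concrete, here is a minimal sketch of one alternative to swallowing the exception. CloseSketch.closeAll is a hypothetical helper and is not the closeSafely from the patch: it closes every resource, remembers the first IOException and rethrows it at the end. As noted above for the patch, a RuntimeException thrown from close() is not caught and propagates immediately.

```java
import java.io.Closeable;
import java.io.IOException;

/** Sketch only: a hypothetical helper, not the closeSafely method from the patch. */
final class CloseSketch {

  /** Closes all resources, then rethrows the first IOException seen, if any. */
  static void closeAll(Closeable... resources) throws IOException {
    IOException first = null;
    for (Closeable resource : resources) {
      if (resource == null) {
        continue; // tolerate resources that were never opened
      }
      try {
        resource.close();
      } catch (IOException e) {
        if (first == null) {
          first = e; // keep the first failure, keep closing the rest
        }
      }
    }
    if (first != null) {
      throw first;
    }
  }
}
```

The caller still has to catch or declare the IOException, which is the cost mentioned above; the benefit is that no failure from close() disappears silently.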

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch, 
 LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849416#action_12849416
 ] 

Shai Erera commented on LUCENE-2343:


Looks good !

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: LUCENE-2343.patch, LUCENE-2343.patch


 As the title says.




[jira] Commented: (LUCENE-2343) Add support for benchmarking Collectors

2010-03-24 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12849435#action_12849435
 ] 

Shai Erera commented on LUCENE-2343:


I've just realized you haven't added a CHANGES entry (and I missed that in my 
previous review, sorry).

 Add support for benchmarking Collectors
 ---

 Key: LUCENE-2343
 URL: https://issues.apache.org/jira/browse/LUCENE-2343
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2343.patch, LUCENE-2343.patch


 As the title says.




[jira] Commented: (LUCENE-2342) DisjunctionSumScorer explain

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2342?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848560#action_12848560
 ] 

Shai Erera commented on LUCENE-2342:


Took me a while to spot the typo :). Can you reproduce a problem w/ a nice test 
case? So that we won't run into this issue in the future again.

 DisjunctionSumScorer explain
 

 Key: LUCENE-2342
 URL: https://issues.apache.org/jira/browse/LUCENE-2342
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Gary Yngve
Priority: Minor
   Original Estimate: 0.17h
  Remaining Estimate: 0.17h

 The bottom of the explain method in DisjunctionSumScorer says
 if (nrMatchers >= minimumNrMatchers) {
 This is incorrect.. it should say
 if (nrMatches >= minimumNrMatchers) {
 nrMatchers is the instance variable used for advancing, whereas nrMatches is 
 explain's local variable.
 Minor, because I don't think DSS's explain is ever called by anything 
 (BooleanWeight has its own explain)?




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12848565#action_12848565
 ] 

Shai Erera commented on LUCENE-2339:


I personally haven't seen problems using NIO on Windows, but that's perhaps just 
because I haven't run into them yet :). I think your proposal makes sense - 
let's start w/ NIO bulk-copy and then we can disable if people complain or 
report errors.

Consistency is important, I agree. So let's keep Collection there. I just 
wanted to avoid converting arrays to a Collection, just so that they can be 
iterated on. Seems a waste to me, but not so much to argue about :).

Re (7), I hate such libraries too. But I hate more the ones that just hide 
problems away from me :). The ideal would be for Lucene to use a logging 
mechanism (I once started it on LUCENE-1482) so that you could include the 
stacktrace print if logging is enabled. But currently the code just hides the 
problem away ... and I'd hate to debug such a thing, not realizing an IO 
exception is thrown from close().

So unless LUCENE-1482 springs back to life again, what do you suggest we do? 
Suppressing the exceptions seems wrong to me.

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-1482) Replace infoSteram by a logging framework (SLF4J)

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848571#action_12848571
 ] 

Shai Erera commented on LUCENE-1482:


Well ... since Mark hasn't closed it yet (thanks Mark :)), I thought to try 
once more. Perhaps w/ the merge of Lucene/Solr this will look more reasonable 
now? I personally feel that just setting InfoStream on IW is not enough. I 
don't think we need to control logging per level either. I think it's important 
to introduce this in at least one of the following modes:
# We add SLF4J and allow the application to control logging per package(s), but 
the logging level won't matter - as long as it's not OFF, we log.
# We add a static factory LuceneLogger or something, which turns logging 
on/off, in which case all components/packages either log or not.

I think (1) gives us greater flexibility (us as in the apps developers), but 
(2) is also acceptable. As long as we can introduce logging messages from more 
components w/o passing infoStream around ... On LUCENE-2339 for example, a 
closeSafely method was added which suppresses IOExceptions that may be caused 
by io.close(). You cannot print the stacktrace because that would be 
unacceptable w/ products that are not allowed to print anything unless logging 
has been enabled, but on the other hand suppressing the exception is not good 
either ... in this case, a LuceneLogger could have helped because you could 
print the stacktrace if logging was enabled.
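
A minimal sketch of option (2) applied to the closeSafely case, assuming a single static on/off switch. LuceneLogger, LoggedClose and all method names below are hypothetical and only illustrate the idea; nothing here is existing Lucene API.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.PrintStream;

// Hypothetical static switch: either all components log, or none do.
final class LuceneLogger {
  private static volatile PrintStream out = null; // null means logging is off

  private LuceneLogger() {}

  static void enable(PrintStream stream) { out = stream; }
  static void disable() { out = null; }
  static boolean isEnabled() { return out != null; }

  static void log(String component, String message, Throwable t) {
    PrintStream s = out; // read once so a concurrent disable() cannot cause an NPE
    if (s == null) return;
    s.println(component + ": " + message);
    if (t != null) t.printStackTrace(s);
  }
}

// Hypothetical helper showing closeSafely with optional reporting.
final class LoggedClose {
  private LoggedClose() {}

  // Never throws; the suppressed exception is printed only if logging is enabled.
  static void closeSafely(Closeable c) {
    if (c == null) return;
    try {
      c.close();
    } catch (IOException e) {
      LuceneLogger.log("LoggedClose", "exception suppressed in close()", e);
    }
  }
}
```

With logging off, nothing is printed, which suits products that must stay silent; with logging on, the stacktrace of the suppressed exception becomes visible.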

 Replace infoSteram by a logging framework (SLF4J)
 -

 Key: LUCENE-1482
 URL: https://issues.apache.org/jira/browse/LUCENE-1482
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-1482-2.patch, LUCENE-1482.patch, 
 slf4j-api-1.5.6.jar, slf4j-nop-1.5.6.jar


 Lucene makes use of infoStream to output messages in its indexing code only. 
 For debugging purposes, when the search application is run on the customer 
 side, getting messages from other code flows, like search, query parsing, 
 analysis etc can be extremely useful.
 There are two main problems with infoStream today:
 1. It is owned by IndexWriter, so if I want to add logging capabilities to 
 other classes I need to either expose an API or propagate infoStream to all 
 classes (see for example DocumentsWriter, which receives its infoStream 
 instance from IndexWriter).
 2. I can either turn debugging on or off, for the entire code.
 Introducing a logging framework can allow each class to control its logging 
 independently, and more importantly, allows the application to turn on 
 logging for only specific areas in the code (i.e., org.apache.lucene.index.*).
 I've investigated SLF4J (stands for Simple Logging Facade for Java) which is,
 as its name states, a facade over different logging frameworks. As such, you
 can include the slf4j.jar in your application, and it recognizes at deploy 
 time what is the actual logging framework you'd like to use. SLF4J comes with 
 several adapters for Java logging, Log4j and others. If you know your 
 application uses Java logging, simply drop slf4j.jar and slf4j-jdk14.jar in 
 your classpath, and your logging statements will use Java logging underneath 
 the covers.
 This makes the logging code very simple. For a class A the logger will be 
 instantiated like this:
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;

 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
 }
 And will later be used like this:
 public class A {
   private static final Logger logger = LoggerFactory.getLogger(A.class);
   public void foo() {
     // 'message' stands for whatever String should be logged
     if (logger.isDebugEnabled()) {
       logger.debug(message);
     }
   }
 }
 That's all !
 Checking for isDebugEnabled is very quick, at least using the JDK14 adapter 
 (but I assume it's fast also over other logging frameworks).
 The important thing is, every class controls its own logger. Not all classes 
 have to output logging messages, and we can improve Lucene's logging 
 gradually, w/o changing the API, by adding more logging messages to 
 interesting classes.
 I will submit a patch shortly
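
 To illustrate what per-package control could look like from the application side, here is a small sketch that uses plain java.util.logging (the backend SLF4J would bind to via slf4j-jdk14.jar). The class name PerPackageLogging is hypothetical, and the logger names assume that Lucene classes would name their loggers after their fully qualified class names.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical application-side helper, not part of Lucene or SLF4J.
final class PerPackageLogging {
  // Strong references: java.util.logging may garbage-collect loggers nobody holds,
  // which would silently drop the levels configured below.
  private static final Logger LUCENE = Logger.getLogger("org.apache.lucene");
  private static final Logger INDEX = Logger.getLogger("org.apache.lucene.index");

  private PerPackageLogging() {}

  // Turns logging off for all of Lucene, then back on for the index package only.
  static void enableIndexLoggingOnly() {
    LUCENE.setLevel(Level.OFF);
    INDEX.setLevel(Level.FINE);
  }
}
```

 Loggers below org.apache.lucene.index inherit FINE, while every other Lucene logger inherits OFF. To actually see FINE output, the handler levels have to be lowered as well, since the default console handler only prints INFO and above.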




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848606#action_12848606
 ] 

Shai Erera commented on LUCENE-2339:


Sorry ... I was confused w/ the for loop of Java 5 :). Let's keep it Collection 
then. Sorry for the hassle.

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848636#action_12848636
 ] 

Shai Erera commented on LUCENE-2339:


I don't want to block the issue. If LUCENE-1482 advances somewhere, we'll
log a message in closeSafely. Otherwise, between suppressing and always printing,
I agree we should suppress. If someone does not want to suppress he should call
close(). Which makes me think we should call this method closeNoException
because closeSafely is not exactly what it does :).

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848729#action_12848729
 ] 

Shai Erera commented on LUCENE-2339:


Mike, that's what I wrote above: if someone does not want to suppress, he
should call close. I think that closeSafely (or as I prefer it -
closeNoException) should be called only when you know you've hit an exception
and you want to close the stream suppressing any exceptions. Otherwise call
close().

bq. can we add a boolean arg (suppressExceptions) to control that?

That would defeat the purpose of the method, no? I mean, currently it does not
throw any exception, not even declaring one, and if we add that boolean it will 
need to declare throws IOException, which will force the caller to try-catch 
that exception and ... suppress it or document // cannot happen because I've 
passed false?

So how about we call it closeNoException, document that it does not throw any 
exception and intentionally suppresses them, and if you don't want them to be 
suppressed, you can call io.close() yourself?
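
As a sketch of that proposal (the class name CloseUtil is hypothetical, and this is not the code from the patch), the method would declare no checked exception and document the suppression:

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical holder class for the helper discussed above.
final class CloseUtil {
  private CloseUtil() {}

  /**
   * Closes the given stream and intentionally suppresses any IOException.
   * Declares no checked exception. Meant for cleanup after an earlier failure;
   * call close() directly if you want to see the exception.
   */
  static void closeNoException(Closeable c) {
    if (c == null) return;
    try {
      c.close();
    } catch (IOException ignored) {
      // intentionally suppressed, see javadoc
    }
  }
}
```

Because there is no throws clause, callers in a catch or finally block do not need another try-catch around the call.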

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848753#action_12848753
 ] 

Shai Erera commented on LUCENE-2339:


bq. But there is still a need to close everything, but do throw the 1st 
exception you hit.

Ohh I see what you mean. My assumption is that when you call closeNoException 
you already know that you've hit an exception and just want to close the stream 
w/o getting more exceptions. If you don't know that, don't call 
closeNoException?

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848777#action_12848777
 ] 

Shai Erera commented on LUCENE-2339:


Ok that's indeed different :). I guess we can introduce it now, in this issue 
(it's tiny and simple). A closeAll which documents it throws the first 
exception it hits.
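
A possible shape for such a closeAll, as a sketch only (the class name CloseAllUtil and the varargs signature are assumptions, not taken from the patch):

```java
import java.io.Closeable;
import java.io.IOException;

// Hypothetical holder class for the helper discussed above.
final class CloseAllUtil {
  private CloseAllUtil() {}

  // Attempts to close every stream, then rethrows the first IOException hit (if any).
  static void closeAll(Closeable... streams) throws IOException {
    IOException first = null;
    for (Closeable c : streams) {
      if (c == null) continue;
      try {
        c.close();
      } catch (IOException e) {
        if (first == null) first = e; // remember only the first failure
      }
    }
    if (first != null) throw first;
  }
}
```

Later exceptions are dropped in this sketch; the point is only that one failing close() does not prevent the remaining streams from being closed.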

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848896#action_12848896
 ] 

Shai Erera commented on LUCENE-2215:


I've reviewed PagingCollector.java and the first thing I have to say about it 
is that I really like it ! :) Saves lots of unnecessary heapify code, if the 
application can allow itself to store the lowest last SD.

I have a few comments/questions.

I don't understand what getLastScoreDoc is for? Is it just a utility method? Is 
it something the app can compute by itself? Anyway, it lacks javadocs, so 
perhaps if they existed I wouldn't need to ask ;).

In collect(), there's the following code:
{code}
} else if (score == previousPassLowest.score && doc <= previousPassLowest.doc) {
  // if the scores are the same and the doc is less than or equal to the
  // previous pass lowest hit doc then skip because this collector favors
  // lower number documents.
  return;
{code}

I think there's a typo in the comment "favors lower number documents" ... while
it seems to prefer higher doc IDs? The way I understand it, regardless of
whether docs are collected in/out of order, HitQueue ensures that when scores
are equal, the lowest IDs are favored. Thus the first round always keeps the
lowest IDs among the docs whose scores match. The next round will favor the
docs whose IDs come next, and so forth ... am I right? (just clarifying my
understanding).
If that's the case, I think it'll be good if it's spelled out in the comment,
and also mention that it means that document has already been returned
previously (like it's documented in the previous 'if').
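
To spell out that understanding, here is a standalone sketch of the skip test with no Lucene dependency. PagingSkip and alreadyReturned are hypothetical names, and the first branch (a strictly higher score means the hit was on an earlier page) is my assumption about what the previous 'if' in collect() checks.

```java
// Hypothetical helper, written only to illustrate the ordering argument.
final class PagingSkip {
  private PagingSkip() {}

  // Results are ordered by score descending, then doc ID ascending.
  // A hit was already returned on an earlier page if it sorts at or before
  // the lowest hit of the previous pass.
  static boolean alreadyReturned(float score, int doc, float lastScore, int lastDoc) {
    if (score > lastScore) return true;          // ranked above the previous pass's lowest hit
    return score == lastScore && doc <= lastDoc; // tie: lower or same doc ID came first
  }
}
```

So with equal scores, doc IDs up to and including the previous pass's lowest hit are skipped, and the next round picks up the doc IDs that come after it.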

The last 'else' really looks like TSDC's out-of-order version, which makes me 
think whether PagingCollector can be viewed as a filter on top of TSDC (and 
possibly even TopFieldCollector)? So if a hit should be collected, it just 
calls super.collect? I realize though that a Collector is a hotspot and we want 
to minimize 'if' let alone method call statements as much as possible. But it 
just feels so strong that it should be a filter ... :). And you wouldn't need 
to specifically handle in/out orderness ... and w/ the right design, it can 
also wrap a TFC or any other TDC implementation ...

BTW, I've noticed that you don't track maxScore - is it assumed that the 
application stores it from the first round? If so I'd document it, because the 
application needs to know it should use TSDC the first round, and 
PagingCollector the second round.

Also, PagingCollector offers a ctor which does not force the application to 
pass in a ScoreDoc. See my comment from above - it might be misleading, because 
if you use this collector right from the very first search, you lose the 
maxScore tracking. I also don't see why it should be allowed - if a dummy 
previousPassLowest ScoreDoc is used, collect() does a lot of unnecessary 'if's. 
I think this collector should be used only from the second round, and a single 
ctor which forces a ScoreDoc to be passed would make more sense. If the 
application wishes to shoot itself in the leg (performance-wise), it can pass a 
dummy SD itself.

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, PagingCollector.java, 
 TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] Commented: (LUCENE-2215) paging collector

2010-03-23 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848908#action_12848908
 ] 

Shai Erera commented on LUCENE-2215:


I must admit I don't like throwing UOE. I imagine the naive user calling one of
these and getting hit w/ a UOE out of nowhere :). Perhaps it's a sign
PagingCollector should not be a sub-class of TopDocsCollector? It does not 
benefit from it in any way because it overrides all the main methods, impls 
them or throws UOE for those it doesn't like. So perhaps it should just be a 
TopScorePagingCollector which copies some of the functionality of TSDC, but is 
not a TDC itself. It will have a topDocs() method, and only it (b/c I agree the 
rest don't make any sense).

Notice the different name I propose - to make it clear it's a collector that 
can be used for paging through a scored list of results.

I BTW liked that the if/else clauses were separated, b/c you could include 
meaningful documentation for each. Right now those are just very long lines.

About in-order, I think the only thing you will save is the last 'else'. Read 
my comment above about wrapping TSDC ... not sure about it, but it will make it 
more elegant.

I'll review the rest of the patch. Didn't yet understand what's PagingIterable 
for ...

 paging collector
 

 Key: LUCENE-2215
 URL: https://issues.apache.org/jira/browse/LUCENE-2215
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Search
Affects Versions: 2.4, 3.0
Reporter: Adam Heinz
Assignee: Grant Ingersoll
Priority: Minor
 Attachments: IterablePaging.java, LUCENE-2215.patch, 
 PagingCollector.java, TestingPagingCollector.java


 http://issues.apache.org/jira/browse/LUCENE-2127?focusedCommentId=12796898&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12796898
 Somebody assign this to Aaron McCurry and we'll see if we can get enough 
 votes on this issue to convince him to upload his patch.  :)




[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2331:
---

Attachment: LUCENE-2331.patch

Sorry - new eclipse and project settings :). Should be ok now.

 Add NoOpMergePolicy
 ---

 Key: LUCENE-2331
 URL: https://issues.apache.org/jira/browse/LUCENE-2331
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2331.patch, LUCENE-2331.patch


 I'd like to add a simple and useful MP implementation which does nothing! :).
 I've come across many places where either the following is documented
 or implemented: if you want to prevent merges, set mergeFactor to a high
 enough value. I think a NoOpMergePolicy is just as good, and can REALLY
 allow you to disable merges (except for maybe setting mergeFactor to Integer.MAX_VALUE).
 As such, NoOpMergePolicy will be introduced as a singleton, and can be used 
 for convenience purposes only. Also, for Parallel Index it's important, 
 because I'd like the slices to never do any merges, unless ParallelWriter 
 decides so. So they should be set w/ that MP.
 I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need 
 to change it afterwards.
 About the name - I like the name, but suggestions are welcome. I thought of a 
 NullMergePolicy, but I don't like 'Null' used for a NoOp.




[jira] Commented: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848113#action_12848113
 ] 

Shai Erera commented on LUCENE-2331:


bq. do you think we should allow instantiation of NoMergePolicy, allowing you 
to control if it uses CFS or not?

You ask because of the useCompound* methods? I wanted NMP to be a singleton 
really, and I don't think those two really matter? Meaning, if you are using 
it, I guess you don't really care if it uses a cmpnd file or not?

But if you think it's important, I can create 3 singletons: 
NO_COMPOUND_FILES_AND_STORE, COMPOUND_FILES, COMPOUND_FILES_AND_STORE (I really 
hate the long names though). We can settle w/ just two - (NO)COMPOUND_FILES ...
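
A rough sketch of the two-singleton variant. SimpleMergePolicy is a stand-in interface reduced to what the example needs, and NoMergeSketch is a hypothetical name; the real MergePolicy API is larger and is not reproduced here.

```java
import java.util.Collections;
import java.util.List;

// Stand-in for the real MergePolicy API (assumption, for illustration only).
interface SimpleMergePolicy {
  List<String> findMerges(List<String> segments);
  boolean useCompoundFile();
}

// Hypothetical no-op policy exposed only through two shared instances.
final class NoMergeSketch implements SimpleMergePolicy {
  static final SimpleMergePolicy NO_COMPOUND_FILES = new NoMergeSketch(false);
  static final SimpleMergePolicy COMPOUND_FILES = new NoMergeSketch(true);

  private final boolean useCompoundFile;

  // Private constructor: callers cannot instantiate, they pick one of the constants.
  private NoMergeSketch(boolean useCompoundFile) {
    this.useCompoundFile = useCompoundFile;
  }

  @Override
  public List<String> findMerges(List<String> segments) {
    return Collections.emptyList(); // never selects anything to merge
  }

  @Override
  public boolean useCompoundFile() {
    return useCompoundFile;
  }
}
```

This keeps the "singleton" feel while still letting the application choose whether compound files are used.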

 Add NoOpMergePolicy
 ---

 Key: LUCENE-2331
 URL: https://issues.apache.org/jira/browse/LUCENE-2331
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2331.patch, LUCENE-2331.patch


 I'd like to add a simple and useful MP implementation which does nothing! :).
 I've come across many places where either the following is documented
 or implemented: if you want to prevent merges, set mergeFactor to a high
 enough value. I think a NoOpMergePolicy is just as good, and can REALLY
 allow you to disable merges (except for maybe setting mergeFactor to Integer.MAX_VALUE).
 As such, NoOpMergePolicy will be introduced as a singleton, and can be used 
 for convenience purposes only. Also, for Parallel Index it's important, 
 because I'd like the slices to never do any merges, unless ParallelWriter 
 decides so. So they should be set w/ that MP.
 I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need 
 to change it afterwards.
 About the name - I like the name, but suggestions are welcome. I thought of a 
 NullMergePolicy, but I don't like 'Null' used for a NoOp.




[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2331:
---

Attachment: LUCENE-2331.patch

Patch includes NoMergePolicy.NO_COMPOUND_FILES and COMPOUND_FILES singletons.

 Add NoOpMergePolicy
 ---

 Key: LUCENE-2331
 URL: https://issues.apache.org/jira/browse/LUCENE-2331
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2331.patch, LUCENE-2331.patch, LUCENE-2331.patch


 I'd like to add a simple and useful MP implementation which does nothing! :).
 I've come across many places where either the following is documented
 or implemented: if you want to prevent merges, set mergeFactor to a high
 enough value. I think a NoOpMergePolicy is just as good, and can REALLY
 allow you to disable merges (except for maybe setting mergeFactor to Integer.MAX_VALUE).
 As such, NoOpMergePolicy will be introduced as a singleton, and can be used 
 for convenience purposes only. Also, for Parallel Index it's important, 
 because I'd like the slices to never do any merges, unless ParallelWriter 
 decides so. So they should be set w/ that MP.
 I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need 
 to change it afterwards.
 About the name - I like the name, but suggestions are welcome. I thought of a 
 NullMergePolicy, but I don't like 'Null' used for a NoOp.




[jira] Commented: (LUCENE-2331) Add NoOpMergePolicy

2010-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848192#action_12848192
 ] 

Shai Erera commented on LUCENE-2331:


I think it's correct. The idea is to say that even w/ NMP, if you use NMS you 
ensure that no MS code is ever run (e.g. if you use NMP only, then CMS code 
[default] will always run but won't do anything).

 Add NoOpMergePolicy
 ---

 Key: LUCENE-2331
 URL: https://issues.apache.org/jira/browse/LUCENE-2331
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2331.patch, LUCENE-2331.patch, LUCENE-2331.patch


 I'd like to add a simple and useful MP implementation which does nothing! :).
 I've come across many places where either the following is documented
 or implemented: if you want to prevent merges, set mergeFactor to a high
 enough value. I think a NoOpMergePolicy is just as good, and can REALLY
 allow you to disable merges (except for maybe setting mergeFactor to Integer.MAX_VALUE).
 As such, NoOpMergePolicy will be introduced as a singleton, and can be used 
 for convenience purposes only. Also, for Parallel Index it's important, 
 because I'd like the slices to never do any merges, unless ParallelWriter 
 decides so. So they should be set w/ that MP.
 I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need 
 to change it afterwards.
 About the name - I like the name, but suggestions are welcome. I thought of a 
 NullMergePolicy, but I don't like 'Null' used for a NoOp.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848341#action_12848341
 ] 

Shai Erera commented on LUCENE-2328:


Earwin, can you add a deprecation message to sync(String)? When I upgraded from
2.9 to 3.0 some methods were deprecated w/o any explanation as to what I should
use instead. I'm thinking of a message like: @deprecated use #sync(Collection) instead.
For easy migration you can change your code to call
sync(Collections.singleton(name)) ... or something along those lines.
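
Something like the following sketch, where SyncSketch is a hypothetical class that only demonstrates the javadoc wording and the delegation; it is not the Directory code from the patch.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

// Hypothetical class, used only to show the deprecation message and delegation.
class SyncSketch {
  final List<String> synced = new ArrayList<String>();

  /**
   * @deprecated use {@link #sync(Collection)} instead. For easy migration,
   *             call {@code sync(Collections.singleton(name))}.
   */
  @Deprecated
  void sync(String name) {
    sync(Collections.singleton(name));
  }

  void sync(Collection<String> names) {
    synced.addAll(names); // placeholder for the real sync work
  }
}
```

The old method keeps working by delegating, and the javadoc tells users exactly what to call instead.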

Other than that, patch looks great! I really like the code cleanup from IW.

 IndexWriter.synced  field accumulates data leading to a Memory Leak
 ---

 Key: LUCENE-2328
 URL: https://issues.apache.org/jira/browse/LUCENE-2328
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
 Environment: all
Reporter: Gregor Kaczor
Assignee: Michael McCandless
Priority: Minor
 Fix For: 3.1

 Attachments: LUCENE-2328.patch, LUCENE-2328.patch, LUCENE-2328.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I am running into a strange OutOfMemoryError. My small test application indexes
 and deletes a few files. This is repeated 60k times. Optimization
 is run every 2k times a file is indexed. Index size is 50KB. I analyzed
 the HeapDumpFile and realized that the IndexWriter.synced field occupied more than
 half of the heap. That field is a private HashSet without a getter. Its task is
 to hold files which have already been synced.
 There are two calls to addAll and one call to add on synced, but no remove or
 clear throughout the lifecycle of the IndexWriter instance.
 According to the Eclipse Memory Analyzer, synced contains 32618 entries which
 look like file names: _e065_1.del or _e067.cfs
 The index directory contains only 10 files.
 I guess synced is holding obsolete data




[jira] Commented: (LUCENE-2339) Allow Directory.copy() to accept a collection of file names to be copied

2010-03-22 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12848376#action_12848376
 ] 

Shai Erera commented on LUCENE-2339:


Patch looks good! A few comments:

# Is it safe to use NIO for all FSDirs? I thought that on Windows NIO has some
bugs/limitations. In that case, would it be safer if just NIOFSDir used NIO?
# Can copyTo(Directory, Collection<String>) be changed to copyTo(Directory,
Iterable<String>)? Unless we think that someone would want to use size() or
something.
# I know it's a matter of style, but you import static Arrays.asList, and
then use asList directly in copyTo(Dir). It confuses me because I expect asList
to be a method declared on Dir, and so I prefer to see Arrays.asList. But it's
just style, don't know how others feel about that.
# On copyTo(Dir), perhaps instead of converting the listAll() result to a List and
then removing elements from it, you can just iterate on whatever listAll() returns
and add the files that pass the filter to a list? You can even optimize: if all
the files Dir returned pass the filter, you can just pass the array to
copyTo(Dir, Iterable), assuming we change the method to accept Iterable. But
that's a minor optimization.
# copy(src, dest, boolean) - can you add a message to @deprecated so users will
know more easily what to replace it with?
# I see that copy(src, dest) also accepts a boolean of whether to close the src
directory. But copyTo(Dir) doesn't. I personally think it's ok, as someone can
call close on src himself, but am wondering if it wouldn't be more convenient.
I.e. instead of calling Directory.copy(src, dest, true), I now need
to do src.copyTo(dest) followed by a src.close().
# closeSafely - perhaps print the stacktrace, even if you don't throw it?
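
For the listAll() comment, this is roughly the single-pass filtering I have in mind. CopyFilterSketch, NameFilter and filesToCopy are hypothetical names; the filter stands in for whatever check copyTo(Dir) applies in the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written only to illustrate the suggestion.
final class CopyFilterSketch {
  // Stand-in for the file-name check used by copyTo(Dir).
  interface NameFilter {
    boolean accept(String name);
  }

  private CopyFilterSketch() {}

  // Builds the list of files to copy in one pass over the array returned by
  // listAll(), instead of converting the array to a List and removing entries.
  static List<String> filesToCopy(String[] allFiles, NameFilter filter) {
    List<String> result = new ArrayList<String>(allFiles.length);
    for (String name : allFiles) {
      if (filter.accept(name)) result.add(name);
    }
    return result;
  }
}
```

The resulting list can then be handed to the collection-based copyTo overload.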

 Allow Directory.copy() to accept a collection of file names to be copied
 

 Key: LUCENE-2339
 URL: https://issues.apache.org/jira/browse/LUCENE-2339
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Earwin Burrfoot
Assignee: Michael McCandless
 Attachments: LUCENE-2339.patch, LUCENE-2339.patch


 Par example, I want to copy files pertaining to a certain commit, and not 
 everything there is in a Directory.




[jira] Commented: (LUCENE-2337) DisjunctionSumScorer and ScorerDocQueue javadocs and one method name out of date after move from skipTo() to advance()

2010-03-21 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847916#action_12847916
 ] 

Shai Erera commented on LUCENE-2337:


Note that -1 is a valid return value in case doc() is called before nextDoc(). 
However it is not valid for nextDoc() and advance().

 DisjunctionSumScorer and ScorerDocQueue javadocs and one method name out of 
 date after move from skipTo() to advance()
 --

 Key: LUCENE-2337
 URL: https://issues.apache.org/jira/browse/LUCENE-2337
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Javadocs, Search
Reporter: Paul Elschot
Priority: Trivial
 Fix For: 3.1

 Attachments: LUCENE-2337.patch







[jira] Commented: (LUCENE-2333) Failures during contrib builds, when classes in core were changed without ant clean

2010-03-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2333?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847415#action_12847415
 ] 

Shai Erera commented on LUCENE-2333:


This up-to-date thingy looks really cool and useful. So I guess you'd compare 
the .jar date and the build/classes/java date? This is sort of what javac does 
when it decides which classes to compile ... I guess.
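
The timestamp comparison guessed at above can be sketched in plain Java. This is only an illustration of the rule (the jar is stale as soon as any input is newer than it), not the Ant configuration from the patch; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.LongStream;
import java.util.stream.Stream;

// Hypothetical helper, not part of the Lucene build.
class JarFreshness {

    // Pure timestamp rule: the jar is fresh when no input is newer than it.
    static boolean isUpToDate(long jarMillis, LongStream inputMillis) {
        return inputMillis.allMatch(t -> t <= jarMillis);
    }

    // File-based wrapper: compares the jar against every regular file under classesDir.
    static boolean isUpToDate(Path jar, Path classesDir) {
        if (!Files.isRegularFile(jar)) {
            return false; // a missing jar always needs rebuilding
        }
        try (Stream<Path> files = Files.walk(classesDir)) {
            LongStream times = files.filter(Files::isRegularFile)
                                    .mapToLong(JarFreshness::lastModified);
            return isUpToDate(lastModified(jar), times);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private static long lastModified(Path p) {
        try {
            return Files.getLastModifiedTime(p).toMillis();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

A build would then recreate the jar only when isUpToDate(jar, classesDir) returns false.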

 Failures during contrib builds, when classes in core were changed without ant 
 clean
 ---

 Key: LUCENE-2333
 URL: https://issues.apache.org/jira/browse/LUCENE-2333
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1

 Attachments: LUCENE-2333.patch, shai-compile-fix.patch, 
 shai-compile-fix2.patch


 From java-dev by Shai Erera:
 {quote}
 I've noticed that sometimes, after I run test-core and test-contrib, and then 
 change core code, test-contrib fails with NoSuchMethodError and the like. 
 I've noticed that core.jar exists under build, and I assumed it's used by 
 test-contrib, and probably is not recreated after core code has changed.
 I verified it when looking in contrib-build.xml, which defines a property 
 lucene.jar.present which is set to true if the jar is ... well, present. 
 Which I believe is the reason for these failures. I've been thinking how to 
 resolve that, and I can think of two ways:
 (1) have test-core always delete that file, but that has two issues:
 (1.1) It's redundant if the code hasn't changed.
 (1.2) It forces you to either jar-core or test-core before you test-contrib, 
 if you want to make sure you run w/ the latest jar.
 or
 (2) have test-contrib always call jar-core, which will first delete the file 
 and then re-create it by compiling first. Compiling should not do anything if 
 the code hasn't changed. So the only waste would be to create the .jar, but I 
 think that's quite fast?
 Does anyone, with more Ant skills than me, know of a better way to detect 
 from test-contrib that core code has changed and only then rebuild the jar?
 {quote}




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847448#action_12847448
 ] 

Shai Erera commented on LUCENE-2328:


Earwin, I agree that sub-classing FSDir is not that easy. So I guess you'll add 
another piece of jdoc to createOutput, to notify Dir when it's closed? This 
seems reasonable.

 IndexWriter.synced field accumulates data leading to a Memory Leak
 ---

 Key: LUCENE-2328
 URL: https://issues.apache.org/jira/browse/LUCENE-2328
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.1, 2.9.2, 3.0, 3.0.1
 Environment: all
Reporter: Gregor Kaczor
Priority: Minor
 Fix For: 3.1

   Original Estimate: 1h
  Remaining Estimate: 1h

 I am running into a strange OutOfMemoryError. My small test application 
 indexes and deletes a few files. This is repeated 60k times. Optimization is 
 run every 2k times a file is indexed. Index size is 50KB. I analyzed the 
 HeapDumpFile and realized that the IndexWriter.synced field occupied more than 
 half of the heap. That field is a private HashSet without a getter. Its task 
 is to hold files which have been synced already.
 There are two calls to addAll and one call to add on synced, but no remove or 
 clear throughout the lifecycle of the IndexWriter instance.
 According to the Eclipse Memory Analyzer, synced contains 32618 entries which 
 look like file names such as _e065_1.del or _e067.cfs.
 The index directory contains only 10 files.
 I guess synced is holding obsolete data.




[jira] Commented: (LUCENE-2328) IndexWriter.synced field accumulates data leading to a Memory Leak

2010-03-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847585#action_12847585
 ] 

Shai Erera commented on LUCENE-2328:


bq. Trying to sync a file that hasn't yet been closed will be undefined

Can we avoid 'undefined'? We have an issue open about SegmentInfos.fileLength() 
not being clearly defined, and it causes confusion. If it's undefined, then 
someone might attempt to call sync before he closes the file, and only then 
close ... can we throw an exception in that case?

We can have close(), sync() and closeAndSync(). Would the latter make sense?

I'd prefer the API to be explicit, and I think that throwing an exception 
(StillOpenException?) if sync() is called before close() is very explicit, and 
reasonable if accompanied by a proper jdoc.
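
A rough sketch of the explicit API proposed above, assuming an unchecked StillOpenException. Both class names and the isSynced() accessor are hypothetical; nothing here is taken from the Lucene Directory or IndexOutput code.

```java
// Hypothetical exception type; the name comes from the proposal, the rest is assumed.
class StillOpenException extends IllegalStateException {
    StillOpenException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for an output file whose close/sync order is enforced.
class TrackedOutput {
    private final String name;
    private boolean closed;
    private boolean synced;

    TrackedOutput(String name) {
        this.name = name;
    }

    void close() {
        closed = true;
    }

    // Syncing an open file is an error rather than undefined behavior.
    void sync() {
        if (!closed) {
            throw new StillOpenException("sync() called before close(): " + name);
        }
        synced = true;
    }

    // Convenience for the common case: close, then sync.
    void closeAndSync() {
        close();
        sync();
    }

    boolean isSynced() {
        return synced;
    }
}
```

The caller who gets the order wrong finds out immediately, which is the point of preferring an exception over 'undefined'.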





[jira] Updated: (LUCENE-2331) Add NoOpMergePolicy

2010-03-19 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2331:
---

Attachment: LUCENE-2331.patch

Patch includes:
* NoMergePolicy + TestNoMergePolicy
* NoMergeScheduler + TestNoMergeScheduler
* MergeScheduler - methods changed to public
* CHANGES entry (New Features)

 Add NoOpMergePolicy
 ---

 Key: LUCENE-2331
 URL: https://issues.apache.org/jira/browse/LUCENE-2331
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index
Reporter: Shai Erera
 Fix For: 3.1

 Attachments: LUCENE-2331.patch


 I'd like to add a simple and useful MP implementation which does nothing! :) 
 I've come across many places where the following is either documented or 
 implemented: if you want to prevent merges, set mergeFactor to a high enough 
 value. I think a NoOpMergePolicy is just as good, and can REALLY allow you to 
 disable merges (other than maybe setting mergeFactor to Integer.MAX_VALUE).
 As such, NoOpMergePolicy will be introduced as a singleton, and can be used 
 for convenience purposes only. Also, for Parallel Index it's important, 
 because I'd like the slices to never do any merges, unless ParallelWriter 
 decides so. So they should be set w/ that MP.
 I have a patch ready. Waiting for LUCENE-2320 to go in, so that I don't need 
 to change it afterwards.
 About the name - I like the name, but suggestions are welcome. I thought of a 
 NullMergePolicy, but I don't like 'Null' used for a NoOp.
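
 The singleton idea can be sketched as follows. This is a simplified illustration only: SimpleMergePolicy is a hypothetical stand-in interface, not Lucene's MergePolicy, and the segment names are plain strings.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified stand-in for a merge policy: given the current
// segment names, propose groups of segments to merge.
interface SimpleMergePolicy {
    List<List<String>> findMerges(List<String> segments);
}

// A policy that never proposes a merge, exposed as a singleton because it is stateless.
final class NoMergeSketch implements SimpleMergePolicy {
    static final NoMergeSketch INSTANCE = new NoMergeSketch();

    private NoMergeSketch() {
        // no instances other than INSTANCE
    }

    @Override
    public List<List<String>> findMerges(List<String> segments) {
        return Collections.emptyList(); // regardless of how many segments exist
    }
}
```

 Unlike a large mergeFactor, the answer does not depend on how many segments pile up.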




[jira] Commented: (LUCENE-2336) off by one: DisjunctionSumScorer::advance

2010-03-19 Thread Shai Erera (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12847716#action_12847716
 ] 

Shai Erera commented on LUCENE-2336:


Hi Gary

This has been discussed before (I'm not sure if about DisjunctionSumScorer 
specifically), and therefore there is also a NOTE in advance() of DISI:
{code}
   * <b>NOTE:</b> certain implementations may return a different value (each
   * time) if called several times in a row with the same target.
{code}
Note the *may return a different value...* part. I remember that this was 
discussed while working on LUCENE-1614, and that is how we ended up documenting 
the *may return* part. See here: 
https://issues.apache.org/jira/browse/LUCENE-1614?focusedCommentId=12710860&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12710860
 and read a bit above and below it to see the relevant discussion.

I'll need to refresh my memory though as to why DisjunctionSumScorer works like 
that ... perhaps an oversight on my side from 1614, but perhaps there was a reason.

Anyway, about the code example you gave above, why would you want to call 
advance w/ the same value many times? What's the use case? If you're only 
dealing w/ one DISI, then unless you really want to skip to a certain document, 
I don't see any reason for calling advance. The usage is typically if you have 
2 or more DISIs, and one's nextDoc or advance returned a value that is greater 
than the other's doc() ...

Also, it's risky to write the code you wrote, because some scorers, upon init 
are already on a certain doc (I think the Disj. ones, but maybe also the Conj. 
one), and so by calling advance(1), you will actually *skip* over the first 
document and miss a hit.

Can you clarify the usage then?
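
To make the "may return a different value" point concrete, here is a small sketch of an iterator that follows the "first beyond the current" reading of advance(). SortedDocIterator is a hypothetical class over a sorted int array, and NO_MORE_DOCS as Integer.MAX_VALUE is an assumption of the sketch; it is not DisjunctionSumScorer.

```java
// Hypothetical iterator over a sorted array of doc ids.
class SortedDocIterator {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE; // assumed sentinel

    private final int[] docs; // sorted ascending, no duplicates
    private int pos = -1;     // index into docs; -1 means not yet positioned
    private int current = -1; // -1 until nextDoc() or advance() is first called

    SortedDocIterator(int... docs) {
        this.docs = docs.clone();
    }

    int docID() {
        return current;
    }

    int nextDoc() {
        pos++;
        current = (pos < docs.length) ? docs[pos] : NO_MORE_DOCS;
        return current;
    }

    // Always moves at least one step, then keeps going until doc >= target.
    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }
}
```

With docs 1, 5 and 9, docID() is -1 before iteration starts, the first advance(1) lands on 1 and the second advance(1) moves on to 5. An implementation that instead returns the current doc when target <= current would answer 1 both times, which is the difference this issue is about.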

 off by one: DisjunctionSumScorer::advance
 -

 Key: LUCENE-2336
 URL: https://issues.apache.org/jira/browse/LUCENE-2336
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Gary Yngve
Priority: Minor
   Original Estimate: 4h
  Remaining Estimate: 4h

 The bug is:
   if (target <= currentDoc) {
 should be
   if (target < currentDoc) {
 based on the comments for the method as well as the contract for 
 DocIdSetIterator: "Advances to the first beyond the current"
 It can be demonstrated by:
   assertEquals("advance(1) first match failed", 1, scorer.advance(1));
   assertEquals("advance(1) second match failed", n, scorer.advance(1));
 if docId 1 is a hit and n is the next hit.  (Tests all pass if this code 
 change is made.)
 I'm not labeling it as major because the class is package-protected and 
 currently passes spec.
 Relevant excerpt:
   /**
    * Advances to the first match beyond the current whose document number is
    * greater than or equal to a given target. <br>
    * When this method is used the {@link #explain(int)} method should not be
    * used. <br>
    * The implementation uses the skipTo() method on the subscorers.
    * 
    * @param target
    *          The target document number.
    * @return the document whose number is greater than or equal to the given
    *         target, or -1 if none exist.
    */
   public int advance(int target) throws IOException {
     if (scorerDocQueue.size() < minimumNrMatchers) {
       return currentDoc = NO_MORE_DOCS;
     }
     if (target <= currentDoc) {
       return currentDoc;
     }
     do {
       if (scorerDocQueue.topDoc() >= target) {
         boolean b = advanceAfterCurrent();
         return b ? currentDoc : (currentDoc = NO_MORE_DOCS);
       } else if (!scorerDocQueue.topSkipToAndAdjustElsePop(target)) {
         if (scorerDocQueue.size() < minimumNrMatchers) {
           return currentDoc = NO_MORE_DOCS;
         }
       }
     } while (true);
   }




[jira] Updated: (LUCENE-2320) Add MergePolicy to IndexWriterConfig

2010-03-18 Thread Shai Erera (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2320?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shai Erera updated LUCENE-2320:
---

Attachment: LUCENE-2320.patch

Fixed a copy-paste comment error in IndexWriter (introduced in LUCENE-2294).

 Add MergePolicy to IndexWriterConfig
 

 Key: LUCENE-2320
 URL: https://issues.apache.org/jira/browse/LUCENE-2320
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Shai Erera
Assignee: Michael McCandless
 Fix For: 3.1

 Attachments: LUCENE-2320.patch, LUCENE-2320.patch, LUCENE-2320.patch, 
 LUCENE-2320.patch, LUCENE-2320.patch


 Now that IndexWriterConfig is in place, I'd like to move MergePolicy to it as 
 well. The change is not straightforward and so I've kept it for a separate 
 issue. MergePolicy requires in its ctor an IndexWriter, however none can be 
 passed to it before an IndexWriter actually exists. And today IW may create 
 an MP just for it to be overridden by the application one line afterwards. I 
 don't want to make iw member of MP non-final, or settable by extending 
 classes, however it needs to remain protected so they can access it directly. 
 So the proposed changes are:
 * Add a SetOnce object (to o.a.l.util), or Immutable, which can only be set 
 once (hence its name). It'll have the signature SetOnce<T> w/ *synchronized 
 set(T)* and *T get()*. The T member will be declared volatile, so that get() 
 won't need to be synchronized.
 * MP will define a *protected final SetOnce<IndexWriter> writer* instead of 
 the current writer. *NOTE: this is a bw break*. Any suggestions are welcome.
 * MP will offer a public default ctor, together with a set(IndexWriter).
 * IndexWriter will set itself on MP using set(this). Note that if set is 
 called more than once, it will throw an exception (AlreadySetException - or 
 does someone have a better suggestion, preferably an already existing Java 
 exception?).
 That's the core idea. I'd like to post a patch soon, so I'd appreciate your 
 review and proposals.
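
 A minimal sketch of the SetOnce holder as described in the list above: a synchronized setter, an unsynchronized getter over a volatile field, and an exception on the second set. The AlreadySetException type and the private flag are assumptions made for the sketch, not the contents of the patch.

```java
// Hypothetical exception type, assumed unchecked for the sketch.
class AlreadySetException extends IllegalStateException {
    AlreadySetException() {
        super("The object cannot be set twice!");
    }
}

// Holder whose value can be assigned exactly once.
final class SetOnce<T> {
    private volatile T value; // volatile so get() needs no lock
    private boolean set;      // guarded by 'this'

    // Synchronized so that two racing callers cannot both succeed.
    synchronized void set(T newValue) {
        if (set) {
            throw new AlreadySetException();
        }
        value = newValue;
        set = true;
    }

    // Returns null until set(T) has been called.
    T get() {
        return value;
    }
}
```

 A separate flag, rather than a null check on the value, keeps set(null) from leaving the holder open for a second set. An AtomicBoolean with compareAndSet would be another way to get the same guarantee without a lock.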



