Re: Do you need help on Lucy?

2010-06-19 Thread Marvin Humphrey
On Sat, Jun 19, 2010 at 07:00:56AM +0200, VOITPTRPTR wrote:
 How can one contribute to Lucy (I'm a C developer)?

We have a wiki page up that covers the mechanics of contributing:

http://wiki.apache.org/lucy/HowToContribute

Most people choose what they want to work on based on what they need or what
interests them.  Since you don't mention wanting to work on a particular
problem, there are some general C tasks we could use help on that don't
require much prior knowledge of the code base; I'll describe one of those.

A lot of Lucy code was originally written for C89.  We have since changed our
C dialect to the overlap of C99 and C++, allowing us to use a number of
idioms which result in cleaner, more readable code.  One of these is the
declaration of loop variables within a for construct:

Index: core/Lucy/Object/VArray.c
===
--- core/Lucy/Object/VArray.c   (revision 956160)
+++ core/Lucy/Object/VArray.c   (working copy)
@@ -55,8 +55,7 @@
 VA_dump(VArray *self)
 {
     VArray *dump = VA_new(self->size);
-    uint32_t i, max;
-    for (i = 0, max = self->size; i < max; i++) {
+    for (uint32_t i = 0, max = self->size; i < max; i++) {
         Obj *elem = VA_Fetch(self, i);
         if (elem) { VA_Store(dump, i, Obj_Dump(elem)); }
     }
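
As a standalone illustration of the idiom (a minimal sketch, not taken from the
Lucy sources; the names and the summing task are invented for the example):

#include <inttypes.h>
#include <stdio.h>

/* C89 style: all declarations must precede statements in the block. */
static int32_t
sum_c89(const int32_t *nums, uint32_t size)
{
    uint32_t i;
    int32_t total = 0;
    for (i = 0; i < size; i++) {
        total += nums[i];
    }
    return total;
}

/* C99/C++ style: the loop variable is scoped to the for construct. */
static int32_t
sum_c99(const int32_t *nums, uint32_t size)
{
    int32_t total = 0;
    for (uint32_t i = 0; i < size; i++) {
        total += nums[i];
    }
    return total;
}

int main(void)
{
    const int32_t nums[] = { 1, 2, 3 };
    printf("%" PRId32 " %" PRId32 "\n", sum_c89(nums, 3), sum_c99(nums, 3));
    return 0;
}

Besides being shorter, the C99 form keeps i and max from leaking into the rest
of the enclosing function.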

A good place to start would be that file, VArray.c.

Thanks for inquiring,

Marvin Humphrey



[Lucy Wiki] Update of HowToContribute by MarvinHumphrey

2010-06-19 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Lucy Wiki for change 
notification.

The HowToContribute page has been changed by MarvinHumphrey.
The comment on this change is: Remove mention of an obsolete C89 requirement.
http://wiki.apache.org/lucy/HowToContribute?action=diff&rev1=2&rev2=3

--

  
  Modify the source code using your favorite text editor or IDE.  Please take 
the following points into account:
  
-  * All code will eventually need to be portable to multiple operating systems and compilers.  This is a complex requirement and it should not block your contribution, but the most helpful thing you can do up front is declare C variables at the top of each block, C89-style.
+  * All code will eventually need to be portable to multiple operating systems and compilers.  (This is a complex requirement and it should not block your contribution.)
   * All public APIs should be accompanied by informative documentation.
   * Code should be formatted according to the style guidelines at 
LucyStyleGuide.
   * Contributions should pass existing unit tests.


Re: Do you need help on Lucy?

2010-06-19 Thread VOITPTRPTR
Hi Marvin,

 We have a wiki page up that covers the mechanics of contributing:
 http://wiki.apache.org/lucy/HowToContribute

Excellent. I'll check out the code and start reading on Monday!

 Most people choose what they want to work on based on what they need or what
 interests them.  Since you don't mention wanting to work on a particular
 problem, there are a some general C tasks we could use help on and that don't
 require a lot of prior knowledge about the code base; I'll describe one of
 those.

To be honest, I have no knowledge of how the Lucene core is implemented.
But I'm pretty confident coding in C (portability, clarity, refactoring ...).
Moreover, I have a strong background in algorithms and algorithm optimization.

 A lot of Lucy code was originally written for C89.  We have since changed our
 C dialect to the overlap of C99 and C++, allowing us to use a number of
 idioms which result in cleaner, more readable code.  One of these is the
 declaration of loop variables within a for construct:
 
Index: core/Lucy/Object/VArray.c
===
 --- core/Lucy/Object/VArray.c   (revision 956160)
 +++ core/Lucy/Object/VArray.c   (working copy)
 @@ -55,8 +55,7 @@
  VA_dump(VArray *self)
  {
      VArray *dump = VA_new(self->size);
 -    uint32_t i, max;
 -    for (i = 0, max = self->size; i < max; i++) {
 +    for (uint32_t i = 0, max = self->size; i < max; i++) {
          Obj *elem = VA_Fetch(self, i);
          if (elem) { VA_Store(dump, i, Obj_Dump(elem)); }
      }
 
 A good place to start would be that file, VArray.c.
 Thanks for inquiring,

OK, I see.

Is it possible to ask questions about Lucy's design choices (how indexes are
built, the algorithms behind the scenes ...) on this mailing list, since I
lack this Information Retrieval background?

Regards
--
voidptr...@gmail.com

[jira] Updated: (LUCY-114) compile failure on OS X 10.6

2010-06-19 Thread Peter Karman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCY-114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Karman updated LUCY-114:
--

Attachment: align_signature.patch

The inline patch came out garbled. Same patch attached.

 compile failure on OS X 10.6
 

 Key: LUCY-114
 URL: https://issues.apache.org/jira/browse/LUCY-114
 Project: Lucy
  Issue Type: Bug
  Components: Core - Store
 Environment: Mac OS X 10.6
Reporter: Peter Karman
 Attachments: align_signature.patch


 I get this error when trying to compile under OS X:
 ../core/Lucy/Store/OutStream.c:125: error: conflicting types for 
 'lucy_OutStream_align'
 autogen/Lucy/Store/OutStream.h:55: error: previous declaration of 
 'lucy_OutStream_align' was here
 patch below:
 Index: core/Lucy/Store/OutStream.bp
 ===
 --- core/Lucy/Store/OutStream.bp  (revision 925442)
 +++ core/Lucy/Store/OutStream.bp  (revision 925443)
 @@ -42,7 +42,7 @@
   *
   * @return the new file position.
   */
 -final i64_t
 +final int64_t
  Align(OutStream *self, int64_t modulus);
  
  /** Flush output buffer to target FileHandle.
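
As a stripped-down illustration of this class of error (invented names, not
the actual Lucy types): when a prototype and its definition disagree on the
return type, the compiler rejects the pair with exactly this kind of
"conflicting types" message.

/* conflict.c -- minimal reproduction; gcc rejects this with
 * "error: conflicting types for 'align'". */
#include <stdint.h>

int32_t align(int64_t modulus);   /* stale prototype, standing in for the
                                     autogen header's declaration */

int64_t                           /* actual definition, standing in for the
                                     hand-written .c file */
align(int64_t modulus)
{
    return modulus;
}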




[jira] Created: (LUCY-115) nullable attribute not propagated

2010-06-19 Thread Peter Karman (JIRA)
nullable attribute not propagated
-

 Key: LUCY-115
 URL: https://issues.apache.org/jira/browse/LUCY-115
 Project: Lucy
  Issue Type: Bug
  Components: Clownfish
 Environment: OS X 10.6
Reporter: Peter Karman


In Clownfish files (.bp) such as KinoSearch/Search/Compiler.bp, certain methods 
are defined
as nullable but that nullable attribute is not being propagated to the 
_OVERRIDE generated
code.
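
A hypothetical sketch of why the lost attribute matters (invented names and
structure; the real Clownfish-generated code differs): a nullable method may
legitimately return NULL, so a generated wrapper that treats NULL as an error
breaks every legal NULL return.

#include <stddef.h>
#include <stdio.h>

typedef struct Matcher Matcher;

/* Host-language override: for a nullable method, NULL is a legal
 * result meaning "nothing to match in this segment". */
static Matcher*
S_make_matcher_override(void *self)
{
    (void)self;
    return NULL;
}

/* If the generated _OVERRIDE thunk loses the nullable attribute, it may
 * guard the return value as though NULL were always an error: */
static Matcher*
Compiler_Make_Matcher_OVERRIDE(void *self)   /* hypothetical thunk name */
{
    Matcher *result = S_make_matcher_override(self);
    if (result == NULL) {
        /* Wrong for a nullable method: this fires on a legal result. */
        fprintf(stderr, "Make_Matcher unexpectedly returned NULL\n");
    }
    return result;
}

int main(void)
{
    Compiler_Make_Matcher_OVERRIDE(NULL);
    return 0;
}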




[jira] Created: (LUCY-116) Build.PL opts not supported

2010-06-19 Thread Peter Karman (JIRA)
Build.PL opts not supported
---

 Key: LUCY-116
 URL: https://issues.apache.org/jira/browse/LUCY-116
 Project: Lucy
  Issue Type: Bug
  Components: Perl bindings
 Environment: OS X 10.6
Reporter: Peter Karman


The Build.PL docs claim that --config cc= should work but it does not.
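
For reference, the documented usage at issue (assuming Module::Build's
standard --config handling; the compiler path is illustrative):

    perl Build.PL --config cc=/usr/bin/gcc-4.0

and a minimal sketch of how a Build.PL can read the override back:

use strict;
use warnings;
use Module::Build;

my $build = Module::Build->new(
    module_name => 'Lucy',
    license     => 'apache',
);

# With --config cc=..., this should report the override rather than
# the compiler recorded in Config.pm.
printf "building with cc = %s\n", $build->config('cc');

$build->create_build_script;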




[jira] Updated: (LUCY-116) Build.PL opts not supported

2010-06-19 Thread Peter Karman (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCY-116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Karman updated LUCY-116:
--

Attachment: get_cc.patch
pass_cc.patch

The attached patches implement the documented --config cc feature.

 Build.PL opts not supported
 ---

 Key: LUCY-116
 URL: https://issues.apache.org/jira/browse/LUCY-116
 Project: Lucy
  Issue Type: Bug
  Components: Perl bindings
 Environment: OS X 10.6
Reporter: Peter Karman
 Attachments: get_cc.patch, pass_cc.patch


 The Build.PL docs claim that --config cc= should work but it does not.




RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Uwe Schindler
Hi Koji,

 - FieldCacheImpl.getStringIndex() no longer throws an exception when
   term count exceeds doc count.

 I think it is LUCENE-2142, but after it was fixed, getStringIndex() still
 throws AIOOBE? Am I missing something?

I saw that you wrote a comment on 2142 on June 7; we overlooked this.
You should have reopened it and stopped the release vote :(

Uwe





[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2142:
--

Attachment: LUCENE-2142-fix.patch

After a coffee I saw the problem, too - stupid :(

Here is the fix for 3.x (also 3.0 and 2.9) - in trunk the fix is not needed, as 
there are growable arrays. Maybe we should add a simple test to all branches!
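
For readers following along, a rough sketch of what "growable arrays" buys
here (grow-on-demand in the style of ArrayUtil.oversize; illustrative, not
Lucene's exact code):

{code}
// GrowableOrds.java -- a term ord beyond the current capacity grows the
// array instead of throwing ArrayIndexOutOfBoundsException.
public class GrowableOrds {
    private String[] lookup = new String[8];

    public void set(int ord, String term) {
        if (ord >= lookup.length) {
            int newSize = ord + (ord >> 3) + 3;   // oversize a little
            String[] bigger = new String[newSize];
            System.arraycopy(lookup, 0, bigger, 0, lookup.length);
            lookup = bigger;
        }
        lookup[ord] = term;
    }

    public String get(int ord) {
        return ord < lookup.length ? lookup[ord] : null;
    }
}
{code}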



 FieldCache.getStringIndex should not throw exception if term count exceeds 
 doc count
 

 Key: LUCENE-2142
 URL: https://issues.apache.org/jira/browse/LUCENE-2142
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.3, 3.0.2, 3.1, 4.0

 Attachments: LUCENE-2142-fix.patch


 Spinoff of LUCENE-2133/LUCENE-831.
 Currently FieldCache cannot handle more than one value per field.
 We may someday want to fix that... but until that day:
 FieldCache.getStringIndex currently does a simplistic check to try to
 catch when you've accidentally allowed more than one term per field,
 by testing if the number of unique terms exceeds the number of
 documents.
 The problem is, this is not a perfect check, in that it allows false
 negatives (you could have more than one term per field for some docs
 and the check won't catch you).
 Further, the exception thrown is the unchecked RuntimeException.
 So this means... you could happily think all is good, until some day,
 well into production, once you've updated enough docs, suddenly the
 check will catch you and throw an unhandled exception, stopping all
 searches [that need to sort by this string field] in their tracks.
 It's not gracefully degrading.
 I think we should simply remove the test, ie, if you have more terms
 than docs then the terms simply overwrite one another.
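
To make the failure mode concrete, a minimal sketch (not Lucene's actual
code): the ord-to-term lookup array is sized by document count, so filling it
once per unique term blows up as soon as terms outnumber docs.

{code}
// StringIndexSketch.java -- illustration only, not Lucene's implementation.
public class StringIndexSketch {
    public static void main(String[] args) {
        int numDocs = 3;
        String[] uniqueTerms = { "a", "b", "c", "d" };   // 4 terms > 3 docs

        String[] lookup = new String[numDocs + 1];       // sized by doc count
        int ord = 0;
        for (String term : uniqueTerms) {
            lookup[++ord] = term;   // AIOOBE once ord exceeds numDocs
        }
    }
}
{code}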






Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Koji Sekiguchi

(10/06/19 15:36), Uwe Schindler wrote:

 Hi Koji,

  - FieldCacheImpl.getStringIndex() no longer throws an exception when
    term count exceeds doc count.

  I think it is LUCENE-2142, but after it was fixed, getStringIndex() still
  throws AIOOBE? Am I missing something?

 I saw that you wrote a comment on 2142 on June 7; we overlooked this.
 You should have reopened it and stopped the release vote :(

 Uwe

Yeah, I should have done that, but while the vote was going on I simply
forgot the issue. Then when I read your release announcement, it reminded me
of the issue. I'm sorry about that...

Koji

--
http://www.rondhuit.com/en/





[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2142:
--

Attachment: (was: LUCENE-2142-fix.patch)

 FieldCache.getStringIndex should not throw exception if term count exceeds 
 doc count
 

 Key: LUCENE-2142
 URL: https://issues.apache.org/jira/browse/LUCENE-2142
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.3, 3.0.2, 3.1, 4.0

 Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch


 Spinoff of LUCENE-2133/LUCENE-831.
 Currently FieldCache cannot handle more than one value per field.
 We may someday want to fix that... but until that day:
 FieldCache.getStringIndex currently does a simplistic check to try to
 catch when you've accidentally allowed more than one term per field,
 by testing if the number of unique terms exceeds the number of
 documents.
 The problem is, this is not a perfect check, in that it allows false
 negatives (you could have more than one term per field for some docs
 and the check won't catch you).
 Further, the exception thrown is the unchecked RuntimeException.
 So this means... you could happily think all is good, until some day,
 well into production, once you've updated enough docs, suddenly the
 check will catch you and throw an unhandled exception, stopping all
 searches [that need to sort by this string field] in their tracks.
 It's not gracefully degrading.
 I think we should simply remove the test, ie, if you have more terms
 than docs then the terms simply overwrite one another.






[jira] Updated: (LUCENE-2142) FieldCache.getStringIndex should not throw exception if term count exceeds doc count

2010-06-19 Thread Uwe Schindler (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uwe Schindler updated LUCENE-2142:
--

Attachment: LUCENE-2142-fix-3x.patch
LUCENE-2142-fix-trunk.patch

Here is a patch with a test for 3.x and before. The trunk patch contains only 
the test, which passes.

 FieldCache.getStringIndex should not throw exception if term count exceeds 
 doc count
 

 Key: LUCENE-2142
 URL: https://issues.apache.org/jira/browse/LUCENE-2142
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
Priority: Minor
 Fix For: 2.9.3, 3.0.2, 3.1, 4.0

 Attachments: LUCENE-2142-fix-3x.patch, LUCENE-2142-fix-trunk.patch


 Spinoff of LUCENE-2133/LUCENE-831.
 Currently FieldCache cannot handle more than one value per field.
 We may someday want to fix that... but until that day:
 FieldCache.getStringIndex currently does a simplistic check to try to
 catch when you've accidentally allowed more than one term per field,
 by testing if the number of unique terms exceeds the number of
 documents.
 The problem is, this is not a perfect check, in that it allows false
 negatives (you could have more than one term per field for some docs
 and the check won't catch you).
 Further, the exception thrown is the unchecked RuntimeException.
 So this means... you could happily think all is good, until some day,
 well into production, once you've updated enough docs, suddenly the
 check will catch you and throw an unhandled exception, stopping all
 searches [that need to sort by this string field] in their tracks.
 It's not gracefully degrading.
 I think we should simply remove the test, ie, if you have more terms
 than docs then the terms simply overwrite one another.






RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Uwe Schindler
Mike, Koji: The release is out, but should I maybe simply remove the
announcement line (or just strike it out) on the lucene.apache.org pages, so
nobody expects this to actually be fixed?

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Saturday, June 19, 2010 10:19 AM
 To: dev@lucene.apache.org
 Subject: RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
 
 No problem, I fixed it now, see patches. For trunk, this was not an issue, but
 for 3x, 3.0 and 2.9.
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 






Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Michael McCandless
OK, I think just removing the text claiming this is fixed is good?

Mike

On Sat, Jun 19, 2010 at 5:29 AM, Uwe Schindler u...@thetaphi.de wrote:
 Mike, Koji: The release is out, but should I maybe simply remove the
 announcement line (or just strike it out) on the lucene.apache.org pages, so
 nobody expects this to actually be fixed?

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de








RE: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3

2010-06-19 Thread Uwe Schindler
It will not disappear in changes.txt, but at least it should not be so
prominent.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Michael McCandless [mailto:luc...@mikemccandless.com]
 Sent: Saturday, June 19, 2010 11:38 AM
 To: dev@lucene.apache.org
 Subject: Re: [ANNOUNCE] Release of Lucene Java 3.0.2 and 2.9.3
 
 OK, I think just removing the text claiming this is fixed is good?
 
 Mike
 






[jira] Updated: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-06-19 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated LUCENE-2380:
-

Attachment: LUCENE-2380_direct_arr_access.patch

This patch adds the ability to get at the raw arrays from the Direct* classes, 
and using those fixes the performance regressions in the fc faceting I was 
seeing.

To do this, it adds the following to DocTermsIndex.  Anyone have a better 
solution?
{code}
/** @lucene.internal */
public abstract PackedInts.Reader getDocToOrd();
{code}
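
A self-contained sketch of the pattern this enables (illustrative names, not
Lucene's classes): expose the backing array so hot loops can bypass
per-element interface calls.

{code}
// DirectAccessSketch.java -- illustration of the raw-array escape hatch.
public class DirectAccessSketch {
    interface OrdReader {
        long get(int index);
    }

    static final class Direct32 implements OrdReader {
        private final int[] values;
        Direct32(int[] values) { this.values = values; }
        public long get(int index) { return values[index]; }
        int[] getArray() { return values; }   // the raw-access escape hatch
    }

    public static void main(String[] args) {
        OrdReader docToOrd = new Direct32(new int[] { 0, 2, 1, 2 });
        int[] counts = new int[3];

        // Generic path: one interface call per document.
        for (int doc = 0; doc < 4; doc++) {
            counts[(int) docToOrd.get(doc)]++;
        }

        // Direct path: hoist the backing array once so the JIT sees a
        // plain array loop.
        if (docToOrd instanceof Direct32) {
            int[] ords = ((Direct32) docToOrd).getArray();
            for (int doc = 0; doc < 4; doc++) {
                counts[ords[doc]]++;
            }
        }
        System.out.println(java.util.Arrays.toString(counts));
    }
}
{code}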


 Add FieldCache.getTermBytes, to load term data as byte[]
 

 Key: LUCENE-2380
 URL: https://issues.apache.org/jira/browse/LUCENE-2380
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, 
 LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, 
 LUCENE-2380_enum.patch, LUCENE-2380_enum.patch


 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
 string, but not necessarily), so we need to push this up the search stack.
 FieldCache now has getStrings and getStringIndex; we need corresponding 
 methods to load terms as native byte[], since in general they may not be 
 representable as String.  This should be quite a bit more RAM efficient too, 
 for US ascii content since each character would then use 1 byte not 2.






[jira] Commented: (LUCENE-2380) Add FieldCache.getTermBytes, to load term data as byte[]

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880483#action_12880483
 ] 

Yonik Seeley commented on LUCENE-2380:
--

It was really tricky performance testing this.

If I started solr and tested one type of faceting exclusively, the performance 
impact of going through the new FieldCache interfaces (PackedInts for ord 
lookup) was relatively minimal.

However, I had a simple script that tested the different variants (the 4 in the 
table above)... and using that resulted in the bigger slowdowns.

The script would do the following:
{code}
1) test 100 iterations of facet.method=fc on the 100,000 term field
2) test 10 iterations of facet.method=fcs on the 100,000 term field
3) test 100 iterations of facet.method=fc on the 100 term field
4) test 10 iterations of facet.method=fcs on the 100 term field
{code}

I would run the script a few times, making sure the numbers stabilized and were 
repeatable.

Testing #1 alone resulted in trunk slowing down ~ 4%
Testing #1 along with any single other test: same small slowdown of ~4%
Running the complete script: slowdown of 33-38% for #1 (as well as others)
When running the complete script, the first run of Test #1 was always the 
best... as if the JVM correctly specialized it, but then discarded it later, 
never to return.

So: you can't always depend on the JVM being able to inline stuff for you, and 
it seems very hard to determine when it can.
This obviously has implications for the lucene benchmarker too.
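
A minimal sketch of that effect (JVM- and machine-dependent; the timings will
vary): a hot call site that has only ever seen one receiver type can be
inlined, but after it has seen three or more, HotSpot may fall back to
megamorphic dispatch and never regain the original speed.

{code}
// MegamorphicSketch.java -- rough demonstration, not a rigorous benchmark.
public class MegamorphicSketch {
    interface Source { int get(int i); }
    static final class A implements Source { public int get(int i) { return i; } }
    static final class B implements Source { public int get(int i) { return i + 1; } }
    static final class C implements Source { public int get(int i) { return i + 2; } }

    // The hot call site: s.get(i) can be inlined only while the JIT
    // believes few receiver types reach it.
    static long sum(Source s, int n) {
        long total = 0;
        for (int i = 0; i < n; i++) total += s.get(i);
        return total;
    }

    static void time(String label, Source s) {
        long t0 = System.nanoTime();
        long sink = sum(s, 50000000);
        System.out.println(label + ": " + (System.nanoTime() - t0) / 1000000
                           + " ms (sink=" + sink + ")");
    }

    public static void main(String[] args) {
        time("A, monomorphic", new A());   // typically fastest
        time("A, again      ", new A());
        time("B, bimorphic  ", new B());
        time("C, megamorphic", new C());   // third type poisons the site
        time("A, afterwards ", new A());   // often never recovers
    }
}
{code}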


 Add FieldCache.getTermBytes, to load term data as byte[]
 

 Key: LUCENE-2380
 URL: https://issues.apache.org/jira/browse/LUCENE-2380
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2380.patch, LUCENE-2380.patch, LUCENE-2380.patch, 
 LUCENE-2380.patch, LUCENE-2380_direct_arr_access.patch, 
 LUCENE-2380_enum.patch, LUCENE-2380_enum.patch


 With flex, a term is now an opaque byte[] (typically, utf8 encoded unicode 
 string, but not necessarily), so we need to push this up the search stack.
 FieldCache now has getStrings and getStringIndex; we need corresponding 
 methods to load terms as native byte[], since in general they may not be 
 representable as String.  This should be quite a bit more RAM efficient too, 
 for US ascii content since each character would then use 1 byte not 2.






[jira] Commented: (SOLR-1911) File descriptor leak while indexing, may cause index corruption

2010-06-19 Thread Simon Rosenthal (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1911?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880490#action_12880490
 ] 

Simon Rosenthal commented on SOLR-1911:
---

No - it seems to have cleared up with trunk also.

I'm OK with closing it, but I'm really curious to know what changed between 
mid-May and today to clear up the problem.

 File descriptor leak while indexing, may cause index corruption
 ---

 Key: SOLR-1911
 URL: https://issues.apache.org/jira/browse/SOLR-1911
 Project: Solr
  Issue Type: Bug
  Components: update
Affects Versions: 1.5
 Environment: Ubuntu Linux, Java build 1.6.0_16-b01
 Solr Specification Version: 3.0.0.2010.05.12.16.17.46
   Solr Implementation Version: 4.0-dev exported - simon - 2010-05-12 
16:17:46  -- built from updated trunk
   Lucene Specification Version: 4.0-dev
   Lucene Implementation Version: 4.0-dev exported - 2010-05-12 16:18:26
   Current Time: Thu May 13 12:21:12 EDT 2010
   Server Start Time:Thu May 13 11:45:41 EDT 2010
Reporter: Simon Rosenthal
Priority: Critical
 Attachments: indexlsof.tar.gz, openafteropt.txt


 While adding documents to an already existing index using this build, the 
 number of open file descriptors increases dramatically until the open-file 
 per-process limit is reached (1024), at which point there are error messages 
 in the log to that effect. If the server is restarted, the index may be corrupt.
 Commits are handled by autocommit every 60 seconds or 500 documents (usually 
 the time limit is reached first). 
 mergeFactor is 10.
 It looks as though each time a commit takes place, the number of open files 
 (obtained from lsof -p `cat solr.pid` | egrep ' [0-9]+r ') increases by 
 40. There are several open file descriptors associated with each file in the 
 index.
 Rerunning the same index updates with an older Solr (built from trunk in Feb 
 2010) doesn't show this problem - the number of open files fluctuates up and 
 down as segments are created and merged, but stays basically constant.






[jira] Updated: (SOLR-1965) Solr 4.0 performance improvements

2010-06-19 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1965:
---

Attachment: SOLR-1965.patch

Here's the patch for facet.method=fc (single valued) that uses the latest patch 
in LUCENE-2378 to fix the performance regression.

 Solr 4.0 performance improvements
 -

 Key: SOLR-1965
 URL: https://issues.apache.org/jira/browse/SOLR-1965
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-1965.patch


 Catch-all performance improvement issue.






[jira] Created: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)
sorting performance regression
--

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0


sorting can be much slower on trunk than branch_3x






[jira] Commented: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880541#action_12880541
 ] 

Yonik Seeley commented on LUCENE-2504:
--

More numbers:  Ubuntu, Java 1.7.0-ea-b98 (64 bit):
f10_s sort only: 126 ms
sort against random field: 175 ms

 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0


 sorting can be much slower on trunk than branch_3x






[jira] Commented: (LUCENE-2504) sorting performance regression

2010-06-19 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12880542#action_12880542
 ] 

Yonik Seeley commented on LUCENE-2504:
--

More numbers: Windows 7:
java version 1.6.0_17
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) 64-Bit Server VM (build 14.3-b01, mixed mode)

f10_s sort only: 115 ms
sort against random field: 162 ms 

 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0


 sorting can be much slower on trunk than branch_3x






Re: Doppleganger threads after ingestion completed

2010-06-19 Thread Lance Norskog
Chewing up CPU, or blocked? The stack trace says it's blocked.

The sockets are abandoned by the program, yes, but TCP/IP itself has a
complex sequence for shutting down sockets that takes a few minutes.
If these sockets stay around for hours, then there's a real problem.
(In fact, there is a bug in the TCP/IP specification, 40 years old,
that causes zombie sockets that never shut down.)

The HTTP solr server really needs a socket close() method.
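
One client-side workaround, sketched from memory against SolrJ 1.4-era APIs
(treat the constructors and method names as assumptions to verify): own the
HttpClient connection manager, so the application can shut its keep-alive
sockets down deterministically when ingestion finishes.

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

public class ClosableSolrClient {
    public static void main(String[] args) throws Exception {
        // Own the connection manager instead of letting SolrJ create one.
        MultiThreadedHttpConnectionManager mgr =
                new MultiThreadedHttpConnectionManager();
        HttpClient http = new HttpClient(mgr);
        CommonsHttpSolrServer solr =
                new CommonsHttpSolrServer("http://localhost:8983/solr", http);

        // ... run the ingestion against solr ...

        // When done, release the keep-alive sockets explicitly rather
        // than leaving them to TCP's own shutdown sequence.
        mgr.closeIdleConnections(0);
        mgr.shutdown();
    }
}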

On Thu, Jun 17, 2010 at 6:08 AM,  karl.wri...@nokia.com wrote:
 Folks,

 I ran 20,000,000 records into Solr via the extractingUpdateRequestHandler
 under jetty.  The previous problems with resources have apparently been
 resolved by using Http1.1 with keep-alive, rather than creating and
 destroying 20,000,000 sockets. ;-)  However, after the client terminates, I
 still find the Solr process chewing away CPU – indeed, there were 5 threads
 doing this.

 A thread dump yields the following partial trace for all 5 threads:

 btpool0-13 prio=10 tid=0x41391000 nid=0xe7c runnable
 [0x7f4a8c789000]
    java.lang.Thread.State: RUNNABLE
     at
 org.mortbay.jetty.HttpParser$Input.blockForContent(HttpParser.java:925)
     at org.mortbay.jetty.HttpParser$Input.read(HttpParser.java:897)
     at
 org.apache.commons.fileupload.MultipartStream$ItemInputStream.makeAvailable(MultipartStream.java:977)
     at
 org.apache.commons.fileupload.MultipartStream$ItemInputStream.close(MultipartStream.java:924)
     at
 org.apache.commons.fileupload.MultipartStream$ItemInputStream.close(MultipartStream.java:904)
     at org.apache.commons.fileupload.util.Streams.copy(Streams.java:119)
     at org.apache.commons.fileupload.util.Streams.copy(Streams.java:64)
     at
 org.apache.commons.fileupload.FileUploadBase.parseRequest(FileUploadBase.java:362)
     at
 org.apache.commons.fileupload.servlet.ServletFileUpload.parseRequest(ServletFileUpload.java:126)
     at
 org.apache.solr.servlet.MultipartRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:343)
     at
 org.apache.solr.servlet.StandardRequestParser.parseParamsAndFillStreams(SolrRequestParsers.java:396)
     at
 org.apache.solr.servlet.SolrRequestParsers.parse(SolrRequestParsers.java:114)
     at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:229)
     at
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
 …

 I could be wrong, but it looks to me like either jetty or fileupload may
 have a problem here.  I have not looked at the jetty source code, but
 infinitely spinning processes even after the socket has been abandoned do
 not seem reasonable to me.  Thoughts?

 Karl





-- 
Lance Norskog
goks...@gmail.com
