Re: update doc by query

2010-01-11 Thread Michael McCandless
On Sun, Jan 10, 2010 at 6:13 PM, Sanne Grinovero
s.grinov...@sourcesense.com wrote:
> Even if it's not strictly needed anymore, could it improve performance?

I think there should be no real performance gains/losses one way or another.

The current updateDocument call basically boils down to delete then add.

> Right now I need to use commit() right after this dual operation to
> make sure no reader is ever going to miss it

You don't need to use commit() right after -- you can use commit any
time later and both the del & add will be present.

> but if it was atomic I
> could have avoided the commit and just trust that at some time later
> it will be auto-committed: exact moment would be out of my control,
> but even so the view on index wouldn't have a chance to miss some
> documents.

Lucene no longer auto-commits -- your app completely controls when to
commit, so, I think the atomic-ness is unnecessary?

Mike
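
As a rough illustration of the point made in this message (changes are
buffered and only become visible to newly opened readers at commit, so a
delete and its matching add always show up together), here is a toy model in
plain Java. It is not Lucene code: ToyWriter, its maps, and openReader() are
hypothetical names invented for this sketch, and the real IndexWriter is far
more involved.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model only, NOT Lucene: buffers changes and publishes them at commit(). */
class ToyWriter {
    // State visible to any reader opened from the last commit point.
    private final Map<String, String> committed = new HashMap<>();
    // Buffered (uncommitted) deletes and adds, keyed by a document id.
    private final Set<String> pendingDeletes = new HashSet<>();
    private final Map<String, String> pendingAdds = new HashMap<>();

    /** Update is modelled as delete-then-add; both are only buffered here. */
    void updateDocument(String id, String body) {
        pendingDeletes.add(id);
        pendingAdds.put(id, body);
    }

    /** Publishes all buffered deletes and adds in one step. */
    void commit() {
        committed.keySet().removeAll(pendingDeletes);
        committed.putAll(pendingAdds);
        pendingDeletes.clear();
        pendingAdds.clear();
    }

    /** A "reader" is a point-in-time snapshot of the last commit. */
    Map<String, String> openReader() {
        return new HashMap<>(committed);
    }

    public static void main(String[] args) {
        ToyWriter w = new ToyWriter();
        w.updateDocument("doc1", "old text");
        w.commit();

        w.updateDocument("doc1", "new text");
        // No commit yet: a reader still sees the old version, never a gap.
        System.out.println(w.openReader().get("doc1"));
        w.commit(); // any time later
        System.out.println(w.openReader().get("doc1"));
    }
}
```

In this model a reader opened between the update and the commit still sees
the old document, and a reader opened after the commit sees the new one; no
snapshot contains the delete without the add.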

-
To unsubscribe, e-mail: java-dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-dev-h...@lucene.apache.org



Re: update doc by query

2010-01-11 Thread Sanne Grinovero
Then I wouldn't need it and can still improve performance by using
periodic commits, nice!
thanks for explaining this,

Sanne

-- 
Sanne Grinovero
http://in.relation.to/Bloggers/Sanne
Sourcesense - making sense of Open Source: http://www.sourcesense.com




Re: update doc by query

2010-01-11 Thread Michael McCandless
Also, if the only reason why you're committing is so a reader can see
the changes (ie, you don't need so much safety), you should use
IndexWriter.getReader instead.

commit is really only needed for safety (ie known recovery points on
crash), or, for cases where the reader must be opened in a different
JVM than the writer.

Mike




[jira] Commented: (LUCENE-2181) benchmark for collation

2010-01-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798655#action_12798655
 ] 

Robert Muir commented on LUCENE-2181:
-

bq. I just ran the contrib/benchmark tests, and I got one test failure: 

OK, I think this is from adjusting the ReadTokensTask...

I looked at the test and I think it should be improved :)

 benchmark for collation
 ---

 Key: LUCENE-2181
 URL: https://issues.apache.org/jira/browse/LUCENE-2181
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/benchmark
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2181.patch, LUCENE-2181.patch, LUCENE-2181.patch, 
 LUCENE-2181.patch, LUCENE-2181.patch, 
 top.100k.words.de.en.fr.uk.wikipedia.2009-11.tar.bz2


 Steven Rowe attached a contrib/benchmark-based benchmark for collation (both 
 jdk and icu) under LUCENE-2084, along with some instructions to run it... 
 I think it would be nice if we could turn this into a committable patch and
 add it to benchmark.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.





[jira] Updated: (LUCENE-2181) benchmark for collation

2010-01-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2181:


Attachment: LUCENE-2181.patch

Corrected testReadTokens(). It works by adding up token frequencies across the
index and comparing the total to the number of tokens read, but there is an
indexed, non-tokenized field (DocMaker.ID_FIELD) whose keywords should not
count toward the expected total.

Also added your additional parameter checking for NewCollationAnalyzerTask.
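
The counting logic described in this comment can be sketched in plain Java.
This is a minimal illustration, not the benchmark test itself: the nested map
stands in for per-field term frequencies read from an index, and the field
names "body" and "docid" are hypothetical placeholders (the real test refers
to DocMaker.ID_FIELD).

```java
import java.util.HashMap;
import java.util.Map;

class TokenCountSketch {
    /**
     * Sums term frequencies over all fields except the one named by idField,
     * whose untokenized keywords are not tokens read from the analyzer.
     */
    static long expectedTokenCount(Map<String, Map<String, Integer>> freqsByField,
                                   String idField) {
        long total = 0;
        for (Map.Entry<String, Map<String, Integer>> field : freqsByField.entrySet()) {
            if (field.getKey().equals(idField)) {
                continue; // skip the indexed-but-not-tokenized id field
            }
            for (int freq : field.getValue().values()) {
                total += freq;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Integer> body = new HashMap<>();
        body.put("lucene", 3);
        body.put("collation", 2);
        Map<String, Integer> ids = new HashMap<>();
        ids.put("doc0", 1);
        ids.put("doc1", 1);

        Map<String, Map<String, Integer>> freqs = new HashMap<>();
        freqs.put("body", body);
        freqs.put("docid", ids);

        System.out.println(expectedTokenCount(freqs, "docid")); // prints 5
    }
}
```

Without the skip, the two id keywords would inflate the total to 7 and the
comparison against the number of tokens read would fail.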





[jira] Commented: (LUCENE-2181) benchmark for collation

2010-01-11 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798679#action_12798679
 ] 

Steven Rowe commented on LUCENE-2181:
-

+1, tests all pass, and ant collation produced expected output.

One minor detail, though - shouldn't the output files be renamed to identify 
their purpose, similarly to how you renamed bm2jira.pl?  Here's the relevant 
section in {{contrib/benchmark/build.txt}}:

{code:xml}
<property name="collation.output.file"
          value="${working.dir}/benchmark.output.txt"/>
<property name="collation.jira.output.file"
          value="${working.dir}/bm2jira.output.txt"/>
{code}





[jira] Created: (LUCENE-2203) improved snowball testing

2010-01-11 Thread Robert Muir (JIRA)
improved snowball testing
-

 Key: LUCENE-2203
 URL: https://issues.apache.org/jira/browse/LUCENE-2203
 Project: Lucene - Java
  Issue Type: Test
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor


Snowball project has test vocabulary files for each language in their svn 
repository, along with expected output.

We should use these tests to ensure all languages are working correctly, and it 
might be helpful in the future for identifying back breaks/changes if we ever 
want to upgrade snowball, etc.
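
A minimal sketch of how such vocabulary files could drive a test, assuming the
input words and the expected stems are available as parallel, line-by-line
lists (that layout is an assumption here). The stemmer argument and the toy
suffix-stripping function in main are hypothetical stand-ins, not Snowball or
Lucene APIs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class VocabularyCheckSketch {
    /** Returns one message per word whose stem differs from the expected output. */
    static List<String> mismatches(List<String> words, List<String> expected,
                                   Function<String, String> stemmer) {
        if (words.size() != expected.size()) {
            throw new IllegalArgumentException(
                "vocabulary and expected output differ in length");
        }
        List<String> failures = new ArrayList<>();
        for (int i = 0; i < words.size(); i++) {
            String actual = stemmer.apply(words.get(i));
            if (!actual.equals(expected.get(i))) {
                failures.add(words.get(i) + ": expected " + expected.get(i)
                    + " but got " + actual);
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        // Toy stemmer for illustration only: strips one trailing "s".
        Function<String, String> toy =
            w -> w.endsWith("s") ? w.substring(0, w.length() - 1) : w;
        List<String> words = Arrays.asList("cats", "dog", "running");
        List<String> expected = Arrays.asList("cat", "dog", "run");
        System.out.println(mismatches(words, expected, toy));
    }
}
```

Collecting every mismatch, rather than stopping at the first, makes it easier
to see whether a language is broken outright or only on a few words.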





[jira] Updated: (LUCENE-2203) improved snowball testing

2010-01-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2203:


Attachment: LUCENE-2203.patch

Attached is a patch that does an svn checkout of rev 500 (which is what we are
using) of these test files, and has a test class to check every language.

The test for all languages takes a total of 2 seconds on my computer (fast).

It takes about 10 seconds to do the svn checkout. Since this stuff is
BSD-licensed, we could comment out the hook that does the checkout and instead
actually commit the files if we want, I think, as long as we do the proper
NOTICE stuff, etc.

Please note two languages are commented out: Finnish and Lovins. These
currently fail. I only investigated this enough to determine that it wasn't my
LUCENE-2194 performance improvement commit that broke them; they were broken
with the previous revision too. We should probably get to the bottom of these.





[jira] Commented: (LUCENE-2203) improved snowball testing

2010-01-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798737#action_12798737
 ] 

Robert Muir commented on LUCENE-2203:
-

It's worth mentioning that the two broken languages, Finnish and Lovins, use
some snowball operations none of the others do.
So I think it's not gonna be too bad to get to the bottom of this.




[jira] Commented: (LUCENE-2201) more performance improvements for snowball

2010-01-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2201?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798748#action_12798748
 ] 

Robert Muir commented on LUCENE-2201:
-

All tests from LUCENE-2203 pass with this patch (it does not change any
snowball behavior).

I will also update the patch to additionally make member variables in Among
final, consistent with what has already happened in Snowball:
http://svn.tartarus.org/snowball/trunk/snowball/java/org/tartarus/snowball/Among.java?view=diff&r1=267&r2=502&diff_format=h


 more performance improvements for snowball
 --

 Key: LUCENE-2201
 URL: https://issues.apache.org/jira/browse/LUCENE-2201
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Priority: Minor
 Attachments: LUCENE-2201.patch


 I took a more serious look at snowball after LUCENE-2194.
 This gives greatly improved performance, but note it has some minor breaks to
 snowball internals:
 * Among.s becomes a char[] instead of a String
 * SnowballProgram.current becomes a char[] instead of a StringBuilder
 * SnowballProgram.eq_s(int, String) becomes eq_s(int, CharSequence), so that
 eq_v(StringBuilder) doesn't need to create an extra string.
 * same as the above with eq_s_b and eq_v_b
 * replace_s(int, int, String) becomes replace_s(int, int, CharSequence), so
 that StringBuilder-based slice and insertion methods don't need to create an
 extra string.
 All of these breaks are, IMHO, only theoretical; the problem is just that
 pretty much everything is public or protected in the snowball internals.
 The performance improvement here depends heavily upon the snowball language
 in use, but it's way more significant than LUCENE-2194.
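
To illustrate why accepting a CharSequence avoids an allocation, here is a
sketch that compares a region of a char[] buffer against either a String or a
StringBuilder without calling toString(). The class and method names, the
parameters, and the bounds handling are assumptions made for this example;
they are not copied from the Snowball sources or from the patch.

```java
class EqSketch {
    /**
     * Returns true if buffer[cursor .. cursor + s.length()) equals s.
     * Works for String and StringBuilder alike, with no intermediate String.
     */
    static boolean regionEquals(char[] buffer, int cursor, int limit, CharSequence s) {
        int len = s.length();
        if (cursor < 0 || limit > buffer.length || limit - cursor < len) {
            return false; // not enough characters left before the limit
        }
        for (int i = 0; i < len; i++) {
            if (buffer[cursor + i] != s.charAt(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] current = "stemming".toCharArray();
        StringBuilder slice = new StringBuilder("mm");
        System.out.println(regionEquals(current, 3, current.length, slice)); // true
        System.out.println(regionEquals(current, 3, current.length, "mx"));  // false
    }
}
```

Because both String and StringBuilder implement CharSequence, one method
covers both callers, which is the effect the signature change in the list
above is after.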




[jira] Updated: (LUCENE-2181) benchmark for collation

2010-01-11 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2181:


Attachment: LUCENE-2181.patch

Steven thanks, in addition to your comments I also changed the config to 
download the top100k files to the temp directory, and expand to 
work/top100k-out, consistent with the other benchmark datasets.

I think this one is ready. If there is no objection I will commit in a day or 
two.




[jira] Commented: (LUCENE-2203) improved snowball testing

2010-01-11 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798795#action_12798795
 ] 

Simon Willnauer commented on LUCENE-2203:
-

Robert, those tests seem to be very extensive - that's good!
But honestly, I think we should make those tests optional in some way. The
files you are downloading are very large and might be an issue for some folks.
The file size is over 70MB, which is a lot for a test. I need to think about
this a little and come up with some suggestions.




[jira] Commented: (LUCENE-2203) improved snowball testing

2010-01-11 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798800#action_12798800
 ] 

Robert Muir commented on LUCENE-2203:
-

Simon, these files are large (70MB), but so is the 65MB Reuters corpus that the
benchmark test downloads.




Compound File Default

2010-01-11 Thread Grant Ingersoll
Should we really still be defaulting to true for setUseCompoundFile?  Do people
still run out of file handles?  If so, why not have them turn it on, instead of
everyone else having to turn it off?

-Grant



Re: Compound File Default

2010-01-11 Thread Michael McCandless
+1

I think we should make it Version dependent...

Mike




Re: Compound File Default

2010-01-11 Thread Otis Gospodnetic
+1.  I never liked having the compound format be the default, since increasing 
the max # of open file handles is a well documented thing, at least in the UNIX 
world.

 Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch






Re: Compound File Default

2010-01-11 Thread Marvin Humphrey
On Mon, Jan 11, 2010 at 03:20:17PM -0500, Grant Ingersoll wrote:
> Should we really still be defaulting to true for setUseCompoundFile?  Do
> people still run out of file handles?

Yep.  You're going to smack up against that limit pretty quick on Mac OS X:

    mar...@smokey:~ $ ulimit -n
    256

> If so, why not have them turn it on, instead of everyone else having to turn
> it off.

Can you up the file descriptor limit from within a running JVM?

If not, you're setting yourself up with a non-portable default.

Marvin Humphrey





Re: Compound File Default

2010-01-11 Thread Jason Rutherglen
Maybe the default can be conditional on the platform like NIOFSDirectory.
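
A platform-conditional default might look something like the sketch below. It
is purely illustrative: the class, the method, and the policy (compound files
on by default only where the OS name suggests a low per-process descriptor
limit, as in the Mac OS X example earlier in this thread) are assumptions for
discussion, not anything Lucene does.

```java
import java.util.Locale;

class CompoundDefaultSketch {
    /**
     * Hypothetical policy: default to the compound format only on platforms
     * assumed to ship with a low open-file limit.
     */
    static boolean defaultUseCompoundFile(String osName) {
        if (osName == null) {
            return true; // unknown platform: keep the conservative default
        }
        String os = osName.toLowerCase(Locale.ROOT);
        return os.startsWith("mac");
    }

    public static void main(String[] args) {
        String os = System.getProperty("os.name");
        System.out.println(os + " -> useCompoundFile=" + defaultUseCompoundFile(os));
    }
}
```

Keying on the OS name is only a proxy for the real descriptor limit, which an
administrator can raise or lower, so an explicit setting would still need to
override whatever default is chosen.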




[jira] Commented: (LUCENE-2203) improved snowball testing

2010-01-11 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12798957#action_12798957
 ] 

Uwe Schindler commented on LUCENE-2203:
---

The revision no. in the svn co works exactly like I proposed in LUCENE-2193 
for the BW tests in lucene core.




[jira] Updated: (LUCENE-2204) FastVectorHighlighter: some classes and members should be publicly accessible

2010-01-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-2204:
---

Summary: FastVectorHighlighter: some classes and members should be publicly 
accessible  (was: FastVectorHighlighter: )

 FastVectorHighlighter: some classes and members should be publicly accessible
 -

 Key: LUCENE-2204
 URL: https://issues.apache.org/jira/browse/LUCENE-2204
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 2.9, 2.9.1, 3.0
Reporter: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.1


 I intended the design to let custom FragmentsBuilder implementations be
 written and plugged in. However, when I tried to write one outside the FVH
 package, it turned out that some classes and members should be publicly
 accessible.




[jira] Created: (LUCENE-2204) FastVectorHighlighter:

2010-01-11 Thread Koji Sekiguchi (JIRA)
FastVectorHighlighter: 
---

 Key: LUCENE-2204
 URL: https://issues.apache.org/jira/browse/LUCENE-2204
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/highlighter
Affects Versions: 3.0, 2.9.1, 2.9
Reporter: Koji Sekiguchi
Priority: Trivial
 Fix For: 3.1


I intended the design to let custom FragmentsBuilder implementations be
written and plugged in. However, when I tried to write one outside the FVH
package, it turned out that some classes and members should be publicly
accessible.




[jira] Updated: (LUCENE-2204) FastVectorHighlighter: some classes and members should be publicly accessible to implement FragmentsBuilder

2010-01-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-2204:
---

Summary: FastVectorHighlighter: some classes and members should be publicly 
accessible to implement FragmentsBuilder  (was: FastVectorHighlighter: some 
classes and members should be publicly accessible)




[jira] Updated: (LUCENE-2204) FastVectorHighlighter: some classes and members should be publicly accessible to implement FragmentsBuilder

2010-01-11 Thread Koji Sekiguchi (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Koji Sekiguchi updated LUCENE-2204:
---

Attachment: LUCENE-2204.patch

Patch attached. It includes reset methods for the Tokenizer that is used by the
test code. I'll commit these trivial changes later today.




[jira] Commented: (LUCENE-2181) benchmark for collation

2010-01-11 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12799056#action_12799056
 ] 

Steven Rowe commented on LUCENE-2181:
-

+1, once again, tests all pass, and ant collation produced expected output. 
