[jira] Updated: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-09-15 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe updated LUCENE-2167:


Attachment: LUCENE-2167.patch

Updated to trunk.  All tests pass.  Documentation improved at package and class 
level.  modules/analysis/CHANGES.txt entry included.

I think this is ready to commit.

 Implement StandardTokenizer with the UAX#29 Standard
 

 Key: LUCENE-2167
 URL: https://issues.apache.org/jira/browse/LUCENE-2167
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 standard.zip, StandardTokenizerImpl.jflex

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It would be really nice for StandardTokenizer to adhere straight to the 
 standard as much as we can with jflex. Then its name would actually make 
 sense.
 Such a transition would involve renaming the old StandardTokenizer to 
 EuropeanTokenizer, as its javadoc claims:
 bq. This should be a good tokenizer for most European-language documents
 The new StandardTokenizer could then say
 bq. This should be a good tokenizer for most languages.
 All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
 can stay with that EuropeanTokenizer, and it could be used by the european 
 analyzers.




[jira] Commented: (SOLR-1395) Integrate Katta

2010-09-15 Thread jianfeng zheng (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909654#action_12909654
 ] 

jianfeng zheng commented on SOLR-1395:
--

You are so nice, Mathias

  I am using the multi-shard distributed search of Solr, and I also let Katta 
choose a node for each shard. I found there is only one proxy object in KattaClient 
for each Katta node; locking it will solve the problem you posted on 18/Aug, but it 
will lead to each node working single-threaded. 
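
A rough sketch of the trade-off described above (all names here are hypothetical 
stand-ins, not the actual Katta/Solr API): locking the single per-node proxy fixes 
the concurrent-use problem but serializes every request to that node.

{code}
public class SingleProxyPerNode {
    // stand-in for the one shared proxy object per Katta node
    private final Object kattaNodeProxy = new Object();

    String search(String query) {
        // Only one thread can hold the per-node lock, so concurrent shard
        // requests to the same node queue up: effectively single-threaded.
        synchronized (kattaNodeProxy) {
            return "results for " + query; // stand-in for the real RPC call
        }
    }
}
{code}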

 

 Integrate Katta
 ---

 Key: SOLR-1395
 URL: https://issues.apache.org/jira/browse/SOLR-1395
 Project: Solr
  Issue Type: New Feature
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: Next

 Attachments: back-end.log, front-end.log, hadoop-core-0.19.0.jar, 
 katta-core-0.6-dev.jar, katta.node.properties, katta.zk.properties, 
 log4j-1.2.13.jar, solr-1395-1431-3.patch, solr-1395-1431-4.patch, 
 solr-1395-1431-katta0.6.patch, solr-1395-1431-katta0.6.patch, 
 solr-1395-1431.patch, SOLR-1395.patch, SOLR-1395.patch, SOLR-1395.patch, 
 test-katta-core-0.6-dev.jar, zkclient-0.1-dev.jar, zookeeper-3.2.1.jar

   Original Estimate: 336h
  Remaining Estimate: 336h

 We'll integrate Katta into Solr so that:
 * distributed search uses Hadoop RPC
 * shards/SolrCores are distributed and managed automatically
 * failover is ZooKeeper-based
 * indexes may be built using Hadoop




[jira] Created: (SOLR-2122) How to escape the special character in the Apache solr example between strings

2010-09-15 Thread JAYABAALAN V (JIRA)
How to escape the special character in the Apache solr example  between 
strings
--

 Key: SOLR-2122
 URL: https://issues.apache.org/jira/browse/SOLR-2122
 Project: Solr
  Issue Type: Bug
  Components: web gui
Affects Versions: 1.4
 Environment: Linux Environment
Reporter: JAYABAALAN V
Priority: Critical


We have a special character "&" between strings, like "Arts & Culture". If the 
user selects this value in the web GUI, we need to display the corresponding 
records from Solr.
http://localhost:8983/solr/select/?q=rsprimarysub:Arts\&Culture&fl=rsprimarysub&debugQuery=true

Error Message
HTTP ERROR: 400
org.apache.lucene.queryParser.ParseException: Cannot parse 
'rsprimarysub:Arts\': Lexical error at line 1, column 19.  Encountered: <EOF> 
after : ""
RequestURI=/solr/select/

Powered by Jetty://

Do provide inputs for the same.
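
The 400 most likely happens because the raw "&" in the value is treated as a 
request-parameter separator before Solr ever sees it, which is why the parser only 
receives 'rsprimarysub:Arts\'. A minimal sketch of one fix (plain JDK; field and 
value taken from the report above): quote the phrase for the query parser and 
URL-encode the whole parameter value so "&" becomes %26.

{code}
import java.net.URLEncoder;

public class EscapeExample {
    public static void main(String[] args) throws Exception {
        String value = "Arts & Culture";
        // quote the value (handles the space), then URL-encode the parameter
        String q = URLEncoder.encode("rsprimarysub:\"" + value + "\"", "UTF-8");
        // the '&' inside q is now %26, so the container no longer splits the query
        System.out.println("http://localhost:8983/solr/select/?q=" + q
                + "&fl=rsprimarysub&debugQuery=true");
    }
}
{code}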




[jira] Commented: (LUCENE-2622) Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec (from TestExternalCodecs)

2010-09-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909686#action_12909686
 ] 

Simon Willnauer commented on LUCENE-2622:
-

It seems that we figured out what's going on here. The problem seems to be the 
optimization done in LUCENE-2588, where we strip off the non-distinguishing 
suffix to save RAM in the loaded terms index. The problem with this 
optimization is that it is not safe for all comparators. The testcase runs with 
a reverse unicode comparator, which causes terms to appear in reverse order 
during indexing.

Yet, this is not a problem until we run into a situation where the stripped 
suffix is required due to the nature of the comparator. In this case we index 
numbers from 0 to 173, and with the randomly chosen termIndexInterval of 54 we 
hit a situation where the indexing code was wrong about the prefix: it sees the 
term "49" with prior term "5", thinks it can strip off the "9", and uses "4" as 
the indexed term.

Once we seek on the terms dictionary, the binary search in 
CoreFieldIndex#getIndexOffset tries to find the indexed term prior to term 
"44"; comparing to "4" returns -1, while comparing to "49" would have yielded 
1. That leaves us with the wrong offset, and the assert blows up.

To fix this we somehow need access to the actually used comparator when 
building the indexed terms - I will reopen LUCENE-2588.
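
Not from the issue itself - a tiny self-contained illustration of the compare 
results described above, using the JDK's reversed natural String ordering as a 
stand-in for the reverse unicode comparator:

{code}
import java.util.Collections;
import java.util.Comparator;

public class ReverseCompareExample {
    public static void main(String[] args) {
        Comparator<String> reverse = Collections.reverseOrder();
        // seek target "44" vs. the truncated index term "4": negative (wrong side)
        System.out.println(reverse.compare("44", "4"));
        // seek target "44" vs. the full term "49": positive (the correct side)
        System.out.println(reverse.compare("44", "49"));
    }
}
{code}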

 Random Test Failure org.apache.lucene.TestExternalCodecs.testPerFieldCodec 
 (from TestExternalCodecs)
 

 Key: LUCENE-2622
 URL: https://issues.apache.org/jira/browse/LUCENE-2622
 Project: Lucene - Java
  Issue Type: Bug
  Components: Tests
Reporter: Mark Miller
Priority: Minor

 Error Message
 state.ord=54 startOrd=0 ir.isIndexTerm=true state.docFreq=1
 Stacktrace
 junit.framework.AssertionFailedError: state.ord=54 startOrd=0 
 ir.isIndexTerm=true state.docFreq=1
   at 
 org.apache.lucene.index.codecs.standard.StandardTermsDictReader$FieldReader$SegmentTermsEnum.seek(StandardTermsDictReader.java:395)
   at 
 org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1099)
   at 
 org.apache.lucene.index.DocumentsWriter.applyDeletes(DocumentsWriter.java:1028)
   at 
 org.apache.lucene.index.IndexWriter.applyDeletes(IndexWriter.java:4213)
   at 
 org.apache.lucene.index.IndexWriter.doFlushInternal(IndexWriter.java:3381)
   at org.apache.lucene.index.IndexWriter.doFlush(IndexWriter.java:3221)
   at org.apache.lucene.index.IndexWriter.flush(IndexWriter.java:3211)
   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2345)
   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2323)
   at org.apache.lucene.index.IndexWriter.optimize(IndexWriter.java:2293)
   at 
 org.apache.lucene.TestExternalCodecs.testPerFieldCodec(TestExternalCodecs.java:645)
   at 
 org.apache.lucene.util.LuceneTestCase.runBare(LuceneTestCase.java:381)
   at org.apache.lucene.util.LuceneTestCase.run(LuceneTestCase.java:373)
 Standard Output
 NOTE: random codec of testcase 'testPerFieldCodec' was: 
 MockFixedIntBlock(blockSize=1327)
 NOTE: random locale of testcase 'testPerFieldCodec' was: lt_LT
 NOTE: random timezone of testcase 'testPerFieldCodec' was: Africa/Lusaka
 NOTE: random seed of testcase 'testPerFieldCodec' was: 812019387131615618




[jira] Reopened: (LUCENE-2588) terms index should not store useless suffixes

2010-09-15 Thread Simon Willnauer (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Willnauer reopened LUCENE-2588:
-


Reopening this because the optimization is not safe for all BytesRef 
comparators; reverse unicode order already breaks it. See LUCENE-2622 for 
details. The non-distinguishing suffix must be determined by the actually used 
Comparator, otherwise the assumption might be wrong for a non-standard sort order.

 terms index should not store useless suffixes
 -

 Key: LUCENE-2588
 URL: https://issues.apache.org/jira/browse/LUCENE-2588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2588.patch, LUCENE-2588.patch


 This idea came up when discussing w/ Robert how to improve our terms index...
 The terms dict index today simply grabs whatever term was at a 0 mod 128 
 index (by default).
 But this is wasteful because you often don't need the suffix of the term at 
 that point.
 EG if the 127th term is "aa" and the 128th (indexed) term is "abcd123456789", 
 instead of storing that full term you only need to store "ab".  The suffix is 
 useless, and uses up RAM since we load the terms index into RAM.
 The patch is very simple.  The optimization is particularly easy because 
 terms are now byte[] and we sort in binary order.
 I tested on the first 10M 1KB Wikipedia docs, and this reduces the terms index 
 (tii) file from 3.9 MB -> 3.3 MB = 16% smaller (using StandardAnalyzer, 
 indexing the body field tokenized but the title / date fields untokenized).  I 
 expect that on noisier terms dicts, especially ones w/ bad terms accidentally 
 indexed, the savings will be even more.
 In the future we could do crazier things.  EG there's no real reason why the 
 indexed terms must be regular (every N terms), so we could instead pick 
 terms more carefully: say, approximately every N, but favoring terms that have 
 a smaller net prefix.  We can also index more sparsely in regions where the 
 net docFreq is lowish, since we can afford somewhat higher seek+scan time to 
 these terms since enuming their docs will be much faster.
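
A minimal sketch of the truncation rule described above (illustrative only, not 
the actual patch code): keep the shared prefix with the previous term plus one 
distinguishing character.

{code}
public class ShortestIndexTerm {
    // e.g. shortest("aa", "abcd123456789") -> "ab"
    static String shortest(String prev, String next) {
        int i = 0;
        while (i < prev.length() && i < next.length()
                && prev.charAt(i) == next.charAt(i)) {
            i++;
        }
        // one character past the common prefix is enough to distinguish the terms
        return next.substring(0, Math.min(i + 1, next.length()));
    }

    public static void main(String[] args) {
        System.out.println(shortest("aa", "abcd123456789")); // prints "ab"
    }
}
{code}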




[jira] Commented: (LUCENE-2588) terms index should not store useless suffixes

2010-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909699#action_12909699
 ] 

Robert Muir commented on LUCENE-2588:
-

Should we really change StandardCodec to support this [non-binary order]?

Really, if you have anything but regular unicode order, other things in Lucene 
will break too, such as queries.
The test just doesn't exercise these. Try changing the order of PreFlexCodec's 
comparator...

Can't we just fix the test not to use StandardCodec? I mean we aren't taking 
any feature away here.

 terms index should not store useless suffixes
 -

 Key: LUCENE-2588
 URL: https://issues.apache.org/jira/browse/LUCENE-2588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2588.patch, LUCENE-2588.patch


 This idea came up when discussing w/ Robert how to improve our terms index...
 The terms dict index today simply grabs whatever term was at a 0 mod 128 
 index (by default).
 But this is wasteful because you often don't need the suffix of the term at 
 that point.
 EG if the 127th term is "aa" and the 128th (indexed) term is "abcd123456789", 
 instead of storing that full term you only need to store "ab".  The suffix is 
 useless, and uses up RAM since we load the terms index into RAM.
 The patch is very simple.  The optimization is particularly easy because 
 terms are now byte[] and we sort in binary order.
 I tested on the first 10M 1KB Wikipedia docs, and this reduces the terms index 
 (tii) file from 3.9 MB -> 3.3 MB = 16% smaller (using StandardAnalyzer, 
 indexing the body field tokenized but the title / date fields untokenized).  I 
 expect that on noisier terms dicts, especially ones w/ bad terms accidentally 
 indexed, the savings will be even more.
 In the future we could do crazier things.  EG there's no real reason why the 
 indexed terms must be regular (every N terms), so we could instead pick 
 terms more carefully: say, approximately every N, but favoring terms that have 
 a smaller net prefix.  We can also index more sparsely in regions where the 
 net docFreq is lowish, since we can afford somewhat higher seek+scan time to 
 these terms since enuming their docs will be much faster.




Re: Current trunk example woes...

2010-09-15 Thread Erick Erickson
Well, maybe. I'm sure I got Solr, but maybe not Lucene. What I'm sure
I *hadn't* done was clean the Lucene tree before building
the Solr example. Which, if I'd been thinking, would have been logical...

Doing both fixes my self-generated problem, and all's well now. I was having a
hard time imagining that I was the first one to run into such an egregious
error, but it had been a long day by last night...

Never mind & thanks

Erick

On Tue, Sep 14, 2010 at 9:46 PM, Yonik Seeley yo...@lucidimagination.com wrote:

 On Tue, Sep 14, 2010 at 8:16 PM, Erick Erickson erickerick...@gmail.com
 wrote:
  If I check out the current trunk, and from solr do an ant clean example
  all is well, even up to starting Solr. But trying to hit anything on the
  site gives a response in the browser starting with:
 
  org.apache.solr.common.SolrException: Plugin init failure for
 [schema.xml]
  fieldType:Error loading class 'solr.SpatialTileField'
 
  Commenting the relevant fieldType out of schema.xml fixes this.
 Should
  I open a Jira or does someone want to jump on it?

 Hmmm, I can't reproduce this.
 Something like http://localhost:8983/solr/select?q=solr seems to work
 fine.

 Did you do an svn up at the trunk level (i.e. get lucene too)?

 -Yonik
 http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8





Re: Current trunk example woes...

2010-09-15 Thread Mark Miller
This can happen fairly easily/often these days. We probably still want
to consider having a Solr clean call Lucene clean.

- Mark

On 9/15/10 7:54 AM, Erick Erickson wrote:
 Well, maybe. I'm sure I got Solr, but maybe not Lucene. What I'm sure
 I *hadn't* done was clean the Lucene tree before building
 the Solr example. Which, if I'd been thinking, would have been logical...
 
 Doing both fixes my self-generated problem, and all's well now. I was having a
 hard time imagining that I was the first one to run into such an egregious
 error, but it had been a long day by last night...
 
 Never mind & thanks
 
 Erick 
 
 On Tue, Sep 14, 2010 at 9:46 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 
 On Tue, Sep 14, 2010 at 8:16 PM, Erick Erickson erickerick...@gmail.com wrote:
  If I check out the current trunk, and from solr do an ant clean
 example
  all is well, even up to starting Solr. But trying to hit anything
 on the
  site gives a response in the browser starting with:
 
  org.apache.solr.common.SolrException: Plugin init failure for
 [schema.xml]
  fieldType:Error loading class 'solr.SpatialTileField'
 
  Commenting the relevant fieldType out of schema.xml fixes
 this. Should
  I open a Jira or does someone want to jump on it?
 
 Hmmm, I can't reproduce this.
 Something like http://localhost:8983/solr/select?q=solr seems to
 work fine.
 
 Did you do an svn up at the trunk level (i.e. get lucene too)?
 
 -Yonik
 http://lucenerevolution.org  Lucene/Solr Conference, Boston Oct 7-8
 
 
 





Re: exceptions from solr/contrib/dataimporthandler and solr/contrib/extraction

2010-09-15 Thread Jan Høydahl / Cominvent
 What I want you to do is, I want you to find the guys who are putting
 all the bugs in the code, and I want you to FIRE THEM!
 
 He who is without bugs in their code may be the first to fire.

Did no one fire you? Neither does the ASF. Go away, and skip unit tests no more...




[jira] Commented: (LUCENE-2588) terms index should not store useless suffixes

2010-09-15 Thread Simon Willnauer (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909717#action_12909717
 ] 

Simon Willnauer commented on LUCENE-2588:
-

bq. Should we really change StandardCodec to support this [non-binary order]?

I'm not sure if we should, but we should at least document the limitation. 
People who work at that level do read doc strings - if they don't, let them be 
doomed; but if you run into the bug we had in LUCENE-2622, you will have a 
super hard time figuring out what is going on without knowing Lucene very, 
very well.


bq. Can't we just fix the test not to use StandardCodec? I mean we aren't 
taking any feature away here. 

+1 - I think we should fix this test ASAP, either by using byte sort order or 
by adding some MockCodec (as Robert suggested). 


 terms index should not store useless suffixes
 -

 Key: LUCENE-2588
 URL: https://issues.apache.org/jira/browse/LUCENE-2588
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2588.patch, LUCENE-2588.patch


 This idea came up when discussing w/ Robert how to improve our terms index...
 The terms dict index today simply grabs whatever term was at a 0 mod 128 
 index (by default).
 But this is wasteful because you often don't need the suffix of the term at 
 that point.
 EG if the 127th term is "aa" and the 128th (indexed) term is "abcd123456789", 
 instead of storing that full term you only need to store "ab".  The suffix is 
 useless, and uses up RAM since we load the terms index into RAM.
 The patch is very simple.  The optimization is particularly easy because 
 terms are now byte[] and we sort in binary order.
 I tested on the first 10M 1KB Wikipedia docs, and this reduces the terms index 
 (tii) file from 3.9 MB -> 3.3 MB = 16% smaller (using StandardAnalyzer, 
 indexing the body field tokenized but the title / date fields untokenized).  I 
 expect that on noisier terms dicts, especially ones w/ bad terms accidentally 
 indexed, the savings will be even more.
 In the future we could do crazier things.  EG there's no real reason why the 
 indexed terms must be regular (every N terms), so we could instead pick 
 terms more carefully: say, approximately every N, but favoring terms that have 
 a smaller net prefix.  We can also index more sparsely in regions where the 
 net docFreq is lowish, since we can afford somewhat higher seek+scan time to 
 these terms since enuming their docs will be much faster.




[jira] Issue Comment Edited: (LUCENE-2504) sorting performance regression

2010-09-15 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909407#action_12909407
 ] 

Yonik Seeley edited comment on LUCENE-2504 at 9/15/10 9:00 AM:
---

bq. The open question is whether this hotspot fickleness is particular to 
Oracle's java impl, or, is somehow endemic to bytecode VMs (.NET included).

I tried IBM's latest Java6 (SR8 FP1, 20100624). It seems to have some of the 
same pitfalls as Oracle's JVM, just different ones. The first run does not 
differ from the second run in the same JVM as it does with Oracle, but the 
first run itself has much more variation.  The worst case is worse, and just 
like the Oracle JVM, it gets stuck in its worst case.

Each run (of the complete set of fields) was done in a separate JVM, since two 
runs in the same JVM didn't really differ as they did in the Oracle JVM.


branch_3x:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|10|129|128|130|109|98|128|135
|1|128|123|127|127|98|128|135
|1000|129|130|130|128|98|130|136
|100|128|133|133|130|100|132|139
|10|150|153|153|154|122|153|159

trunk:
|unique terms in field|median sort time of 100 sorts in ms|another run|another run|another run|another run|another run|another run
|10|217|81|383|99|79|78|215
|1|254|73|346|101|106|108|267
|1000|253|74|347|99|107|108|258
|100|253|107|394|98|107|102|255
|10|251|107|388|99|106|98|257

The second way of testing is to completely mix fields (no serial correlation 
between what field is sorted on).  This is the test that is very predictable 
with the Oracle JVM, but I still see wide variability with the IBM JVM.  Here 
is the list of different runs for the IBM JVM (ms):

branch_3x
|128|129|123|120|128|100|95|74|130|91|120

trunk
|106|89|168|116|155|119|108|118|112|169|165

To my eye, it looks like we have more variability in trunk, due to increased 
use of abstractions?

edit: corrected the table description - all times in this message are for the 
IBM JVM.


 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, 
 LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch


 sorting can be much slower on trunk than branch_3x




[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909759#action_12909759
 ] 

Robert Muir commented on LUCENE-2167:
-

bq. I think this is ready to commit.

I think so too; I applied the svn moves and the patch, and all tests pass.

One last question: it might be reasonable to move ClassicTokenizer and friends 
to a .classic package?
There is nothing standards-based about them at all, and it makes the .standard 
directory a little confusing.

To do this I would have to make StandardTokenizerInterface public, but it could 
be marked @lucene.internal.


 Implement StandardTokenizer with the UAX#29 Standard
 

 Key: LUCENE-2167
 URL: https://issues.apache.org/jira/browse/LUCENE-2167
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 standard.zip, StandardTokenizerImpl.jflex

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It would be really nice for StandardTokenizer to adhere straight to the 
 standard as much as we can with jflex. Then its name would actually make 
 sense.
 Such a transition would involve renaming the old StandardTokenizer to 
 EuropeanTokenizer, as its javadoc claims:
 bq. This should be a good tokenizer for most European-language documents
 The new StandardTokenizer could then say
 bq. This should be a good tokenizer for most languages.
 All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
 can stay with that EuropeanTokenizer, and it could be used by the european 
 analyzers.




[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909760#action_12909760
 ] 

Robert Muir commented on LUCENE-2167:
-

bq. One last question: it might be reasonable to move ClassicTokenizer and 
friends to a .classic package?

By the way, if we decide this is best, I would like to open a new issue for it. 
We don't have to do everything in one step, and currently this patch cleanly 
applies with the svn move instructions.

So I would like to commit this patch as-is in a few days if there are no 
objections.

If we want to improve packaging, let's open a follow-up issue.


 Implement StandardTokenizer with the UAX#29 Standard
 

 Key: LUCENE-2167
 URL: https://issues.apache.org/jira/browse/LUCENE-2167
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 standard.zip, StandardTokenizerImpl.jflex

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It would be really nice for StandardTokenizer to adhere straight to the 
 standard as much as we can with jflex. Then its name would actually make 
 sense.
 Such a transition would involve renaming the old StandardTokenizer to 
 EuropeanTokenizer, as its javadoc claims:
 bq. This should be a good tokenizer for most European-language documents
 The new StandardTokenizer could then say
 bq. This should be a good tokenizer for most languages.
 All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
 can stay with that EuropeanTokenizer, and it could be used by the european 
 analyzers.




[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-09-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909770#action_12909770
 ] 

Steven Rowe commented on LUCENE-2167:
-

bq. One last question: it might be reasonable to move ClassicTokenizer and 
friends to a .classic package?

I agree with your arguments about moving to .classic package.  I think new 
users won't care about what StandardTokenizer/Analyzer used to be.

My only concern here is existing users' upgrade experience - users should be 
able to continue using the ClassicTokenizer if they want to keep current 
behavior.  Right now, they can do that by setting Version to 3.0 in the 
constructor to StandardTokenizer/Analyzer.  I think this should remain the case 
until Lucene version 5.0.
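
For illustration (added here, not part of the original comment), the upgrade 
path described above in code - the Version constant passed to the constructor 
selects the old vs. the new tokenization:

{code}
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.util.Version;

public class VersionExample {
    public static void main(String[] args) {
        // keep the pre-UAX#29 (Classic) behavior after upgrading:
        StandardAnalyzer classic = new StandardAnalyzer(Version.LUCENE_30);
        // opt in to the new UAX#29-based behavior (assuming the 3.1 constant):
        StandardAnalyzer uax29 = new StandardAnalyzer(Version.LUCENE_31);
    }
}
{code}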


 Implement StandardTokenizer with the UAX#29 Standard
 

 Key: LUCENE-2167
 URL: https://issues.apache.org/jira/browse/LUCENE-2167
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 standard.zip, StandardTokenizerImpl.jflex

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It would be really nice for StandardTokenizer to adhere straight to the 
 standard as much as we can with jflex. Then its name would actually make 
 sense.
 Such a transition would involve renaming the old StandardTokenizer to 
 EuropeanTokenizer, as its javadoc claims:
 bq. This should be a good tokenizer for most European-language documents
 The new StandardTokenizer could then say
 bq. This should be a good tokenizer for most languages.
 All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
 can stay with that EuropeanTokenizer, and it could be used by the european 
 analyzers.




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-15 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909771#action_12909771
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

Because of the way byte slices work (e.g., the size of a slice must be known
up front before iterating over it), we can't simply point into the middle of
a slice and read without probably iterating over the forwarding address.

It seems the skip list will need to point to the beginning of a slice. This
will make the interval iteration in the RAM buffer skip list writer a little
more complicated than today, in that it will need to store positions that are
the start of byte slices. In other words, the intervals will be slightly
uneven at times.

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
 LUCENE-2575.patch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something that has a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept, it seems like it'd be easier to implement a seekable
 random access file like API. One'd seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?




[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-09-15 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909772#action_12909772
 ] 

Robert Muir commented on LUCENE-2167:
-

{quote}
My only concern here is existing users' upgrade experience - users should be 
able to continue using the ClassicTokenizer if they want to keep current 
behavior. Right now, they can do that by setting Version to 3.0 in the 
constructor to StandardTokenizer/Analyzer. I think this should remain the case 
until Lucene version 5.0.
{quote}

I agree completely; I think we can do this even with the Classic stuff in a 
separate package? (i.e., we can have both)

 Implement StandardTokenizer with the UAX#29 Standard
 

 Key: LUCENE-2167
 URL: https://issues.apache.org/jira/browse/LUCENE-2167
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 standard.zip, StandardTokenizerImpl.jflex

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It would be really nice for StandardTokenizer to adhere straight to the 
 standard as much as we can with jflex. Then its name would actually make 
 sense.
 Such a transition would involve renaming the old StandardTokenizer to 
 EuropeanTokenizer, as its javadoc claims:
 bq. This should be a good tokenizer for most European-language documents
 The new StandardTokenizer could then say
 bq. This should be a good tokenizer for most languages.
 All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
 can stay with that EuropeanTokenizer, and it could be used by the european 
 analyzers.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2167) Implement StandardTokenizer with the UAX#29 Standard

2010-09-15 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2167?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909776#action_12909776
 ] 

Steven Rowe commented on LUCENE-2167:
-

bq. I agree completely; I think we can do this even with the Classic stuff in 
a separate package? (i.e., we can have both)

Right, I didn't mean to say that moving the Classic stuff out of .standard was 
antithetical to preserving Classic functionality in StandardTokenizer - I just 
wanted to make sure that we agreed that that move doesn't mean complete 
separation (yet).  Sounds like we agree.

 Implement StandardTokenizer with the UAX#29 Standard
 

 Key: LUCENE-2167
 URL: https://issues.apache.org/jira/browse/LUCENE-2167
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/analyzers
Affects Versions: 3.1
Reporter: Shyamal Prasad
Assignee: Robert Muir
Priority: Minor
 Attachments: LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-jflex-tld-macro-gen.patch, LUCENE-2167-jflex-tld-macro-gen.patch, 
 LUCENE-2167-lucene-buildhelper-maven-plugin.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.benchmark.patch, 
 LUCENE-2167.benchmark.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, LUCENE-2167.patch, 
 standard.zip, StandardTokenizerImpl.jflex

   Original Estimate: 0.5h
  Remaining Estimate: 0.5h

 It would be really nice for StandardTokenizer to adhere straight to the 
 standard as much as we can with jflex. Then its name would actually make 
 sense.
 Such a transition would involve renaming the old StandardTokenizer to 
 EuropeanTokenizer, as its javadoc claims:
 bq. This should be a good tokenizer for most European-language documents
 The new StandardTokenizer could then say
 bq. This should be a good tokenizer for most languages.
 All the english/euro-centric stuff like the acronym/company/apostrophe stuff 
 can stay with that EuropeanTokenizer, and it could be used by the european 
 analyzers.




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-15 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909839#action_12909839
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

Is there a way to know the level of a slice given only the forwarding 
address/position?  It doesn't look like it.  Hmm... This could mean encoding 
the level or the size of the slice into the slice itself, which would elongate 
slices in general. I suppose, though, that the level index would only add one 
byte, and that would be okay. 

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
 LUCENE-2575.patch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something that has a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept, it seems like it'd be easier to implement a seekable
 random access file like API. One'd seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?




[jira] Commented: (LUCENE-2575) Concurrent byte and int block implementations

2010-09-15 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909852#action_12909852
 ] 

Jason Rutherglen commented on LUCENE-2575:
--

In the following line of ByteBlockPool.allocSlice we're recording the slice 
level; however, it's at the end of the slice rather than the beginning, which is 
where we'll need to write the level in order to implement slice seek.  I'm not 
immediately sure what reads the level at this end position of the byte[].

{code}
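// Context (added for clarity, not part of the quoted line): the writer treats
// any non-zero byte as an end-of-slice marker. OR'ing with 16 keeps the marker
// non-zero even for level 0, and the low 4 bits tell allocSlice which level
// (and therefore which size) to use for the next slice.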
buffer[byteUpto-1] = (byte) (16|newLevel);
{code}

 Concurrent byte and int block implementations
 -

 Key: LUCENE-2575
 URL: https://issues.apache.org/jira/browse/LUCENE-2575
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: Realtime Branch
Reporter: Jason Rutherglen
 Fix For: Realtime Branch

 Attachments: LUCENE-2575.patch, LUCENE-2575.patch, LUCENE-2575.patch, 
 LUCENE-2575.patch


 The current *BlockPool implementations aren't quite concurrent.
 We really need something that has a locking flush method, where
 flush is called at the end of adding a document. Once flushed,
 the newly written data would be available to all other reading
 threads (ie, postings etc). I'm not sure I understand the slices
 concept, it seems like it'd be easier to implement a seekable
 random access file like API. One'd seek to a given position,
 then read or write from there. The underlying management of byte
 arrays could then be hidden?




[jira] Updated: (SOLR-2098) Search Grouping: Facet support

2010-09-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-2098:
---

Attachment: SOLR-2098.patch

Attaching patch that makes faceting work with field collapsing.

 Search Grouping: Facet support
 --

 Key: SOLR-2098
 URL: https://issues.apache.org/jira/browse/SOLR-2098
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Attachments: SOLR-2098.patch







[jira] Resolved: (SOLR-2098) Search Grouping: Facet support

2010-09-15 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-2098.


Fix Version/s: 4.0
   Resolution: Fixed

committed.

 Search Grouping: Facet support
 --

 Key: SOLR-2098
 URL: https://issues.apache.org/jira/browse/SOLR-2098
 Project: Solr
  Issue Type: Sub-task
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: SOLR-2098.patch







[jira] Created: (LUCENE-2646) Implement the Military Grid Reference System for tiling

2010-09-15 Thread Grant Ingersoll (JIRA)
Implement the Military Grid Reference System for tiling


 Key: LUCENE-2646
 URL: https://issues.apache.org/jira/browse/LUCENE-2646
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spatial
Reporter: Grant Ingersoll


The current tile-based system in Lucene is broken.  We should standardize on a 
common way of labeling grids and provide that as an option.  Based on previous 
conversations with Ryan McKinley and Chris Male, it seems the Military Grid 
Reference System (http://en.wikipedia.org/wiki/Military_grid_reference_system) 
is a good candidate for the replacement, due to its standard use of metric tiles 
of increasing orders of magnitude (1, 10, 100, 1000, etc.).
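
Not from the issue itself - a toy sketch of what order-of-magnitude metric 
tiling buys (a simplification, not real MGRS labeling): a point's tile id at 
each level is just its easting/northing truncated to that tile size, so coarser 
tiles nest the finer ones.

{code}
public class TileExample {
    // e.g. tileId(4477123.4, 5554321.9, 100) -> "100m_44771_55543"
    static String tileId(double easting, double northing, int metersPerTile) {
        long e = (long) Math.floor(easting / metersPerTile);
        long n = (long) Math.floor(northing / metersPerTile);
        return metersPerTile + "m_" + e + "_" + n;
    }

    public static void main(String[] args) {
        // the 1000m tile contains the 100m tile: ids nest by magnitude
        System.out.println(tileId(4477123.4, 5554321.9, 1000)); // 1000m_4477_5554
        System.out.println(tileId(4477123.4, 5554321.9, 100));  // 100m_44771_55543
    }
}
{code}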




[jira] Commented: (LUCENE-2646) Implement the Military Grid Reference System for tiling

2010-09-15 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909962#action_12909962
 ] 

Chris Male commented on LUCENE-2646:


+1

Do you have an implementation in mind/already started?

 Implement the Military Grid Reference System for tiling
 

 Key: LUCENE-2646
 URL: https://issues.apache.org/jira/browse/LUCENE-2646
 Project: Lucene - Java
  Issue Type: New Feature
  Components: contrib/spatial
Reporter: Grant Ingersoll

 The current tile-based system in Lucene is broken.  We should standardize on 
 a common way of labeling grids and provide that as an option.  Based on 
 previous conversations with Ryan McKinley and Chris Male, it seems the 
 Military Grid Reference System 
 (http://en.wikipedia.org/wiki/Military_grid_reference_system) is a good 
 candidate for the replacement, due to its standard use of metric tiles of 
 increasing orders of magnitude (1, 10, 100, 1000, etc.).




[jira] Commented: (LUCENE-2562) Make Luke a Lucene/Solr Module

2010-09-15 Thread Mark Miller (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12909980#action_12909980
 ] 

Mark Miller commented on LUCENE-2562:
-

I haven't had any time to really work on this in a while, but I did bite the 
bullet and join the pivot mailing list and figured out my issues with making a 
fluid resizing layout - which is sweet and will hopefully motivate me to make 
some progress here soon.

 Make Luke a Lucene/Solr Module
 --

 Key: LUCENE-2562
 URL: https://issues.apache.org/jira/browse/LUCENE-2562
 Project: Lucene - Java
  Issue Type: Task
Reporter: Mark Miller
 Attachments: luke1.jpg, luke2.jpg, luke3.jpg


 see
 http://search.lucidimagination.com/search/document/ee0e048c6b56ee2/luke_in_need_of_maintainer
 http://search.lucidimagination.com/search/document/5e53136b7dcb609b/web_based_luke
 I think it would be great if there was a version of Luke that always worked 
 with trunk - and it would also be great if it was easier to match Luke jars 
 with Lucene versions.
 While I'd like to get GWT Luke into the mix as well, I think the easiest 
 starting point is to straight port Luke to another UI toolkit before 
 abstracting out DTO objects that both GWT Luke and Pivot Luke could share.
 I've started slowly converting Luke's use of Thinlet to Apache Pivot. I 
 haven't/don't have a lot of time for this at the moment, but I've plugged 
 away here and there over the past week or two. There is still a *lot* to do.




Re: Build failed in Hudson: Lucene-3.x #116

2010-09-15 Thread Robert Muir
This is unrelated to the clover problem.

The problem is @RunWith(LuceneTestCase.LocalizedTestCaseRunner.class).
As you can see, clover thinks we added 6210 core tests (see
https://hudson.apache.org/hudson/view/Lucene/job/Lucene-3.x/115/testReport/).

We do not do many iterations; we run every test just once, the same as always.
It's not a parameter thing.

Try ant test -Dtestcase=TestQueryParser to see what I mean, then comment out
that @RunWith.

On Wed, Sep 15, 2010 at 9:35 PM, Uwe Schindler u...@thetaphi.de wrote:

 Maybe we should reduce the iterations in the clover case. Clover should
 only test coverage, and that does not need to try all the random variants. For
 coverage, a single run of each test should be fine.



 How about removing the -Dtests. from the clover part of the build file?



 -

 Uwe Schindler

 H.-H.-Meier-Allee 63, D-28213 Bremen

 http://www.thetaphi.de

 eMail: u...@thetaphi.de



 *From:* Robert Muir [mailto:rcm...@gmail.com]
 *Sent:* Wednesday, September 15, 2010 5:44 PM
 *To:* dev@lucene.apache.org
 *Subject:* Re: Build failed in Hudson: Lucene-3.x #116



 Hi, I think the switch of all tests to Junit4 may be causing a clover
 issue.

 For example, TestQueryParser now thinks it has over 5000 tests.



 The reason is that it runs each test under every locale and junit4 counts
 them this way. It does the same with MultiCodecRunner.

 I wonder, now that we vary these in the tests anyway, whether we should
 consider commenting out the Localized/MultiCodec runners?



 We could keep them available (but not used) in case you want to quickly run
 a test under every single Locale/Codec



 On Wed, Sep 15, 2010 at 8:34 PM, Apache Hudson Server 
 hud...@hudson.apache.org wrote:

 See https://hudson.apache.org/hudson/job/Lucene-3.x/116/changes

 Changes:

 [mikemccand] don't close reader prematurely

 [rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version

 --
 [...truncated 18329 lines...]
[junit] Testsuite: org.apache.lucene.search.TestTimeLimitingCollector
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 2.721 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTopDocsCollector
[junit] Tests run: 8, Failures: 0, Errors: 0, Time elapsed: 0.016 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestTopScoreDocCollector
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.TestWildcard
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.037 sec
[junit]
[junit] Testsuite:
 org.apache.lucene.search.function.TestCustomScoreQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.241 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestDocValues
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.005 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestFieldScoreQuery
[junit] Tests run: 12, Failures: 0, Errors: 0, Time elapsed: 0.267 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.function.TestOrdValues
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.108 sec
[junit]
[junit] Testsuite:
 org.apache.lucene.search.payloads.TestPayloadNearQuery
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.72 sec
[junit]
[junit] Testsuite:
 org.apache.lucene.search.payloads.TestPayloadTermQuery
[junit] Tests run: 6, Failures: 0, Errors: 0, Time elapsed: 0.973 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestBasics
[junit] Tests run: 20, Failures: 0, Errors: 0, Time elapsed: 12.966 sec
[junit]
[junit] Testsuite:
 org.apache.lucene.search.spans.TestFieldMaskingSpanQuery
[junit] Tests run: 11, Failures: 0, Errors: 0, Time elapsed: 0.667 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestNearSpansOrdered
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 0.073 sec
[junit]
[junit] Testsuite: org.apache.lucene.search.spans.TestPayloadSpans
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 1.4 sec
[junit]
[junit] - Standard Output ---
[junit]
[junit] Spans Dump --
[junit] payloads for span:2
[junit] doc:0 s:3 e:6 three:Noise:5
[junit] doc:0 s:3 e:6 one:Entity:3
[junit]
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit]
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:1 s:0 e:4 rr:Noise:3
[junit] doc:1 s:0 e:4 xx:Entity:0
[junit] doc:1 s:0 e:4 yy:Noise:1
[junit]
[junit] Spans Dump --
[junit] payloads for span:3
[junit] doc:0 s:0 e:3 rr:Noise:1
[junit] doc:0 s:0 e:3 yy:Noise:2
[junit] doc:0 s:0 e:3 xx:Entity:0
[junit]
[junit] Spans Dump --
 

[jira] Commented: (SOLR-792) Tree Faceting Component

2010-09-15 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-792?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12910008#action_12910008
 ] 

Lance Norskog commented on SOLR-792:


Can this be back-ported (easily) to Solr 1.4.1? Is it dependent on new features?

 Tree Faceting Component
 ---

 Key: SOLR-792
 URL: https://issues.apache.org/jira/browse/SOLR-792
 Project: Solr
  Issue Type: New Feature
Reporter: Erik Hatcher
Assignee: Ryan McKinley
Priority: Minor
 Attachments: SOLR-792-PivotFaceting.patch, 
 SOLR-792-PivotFaceting.patch, SOLR-792-PivotFaceting.patch, 
 SOLR-792-PivotFaceting.patch, SOLR-792.patch, SOLR-792.patch, SOLR-792.patch, 
 SOLR-792.patch, SOLR-792.patch, SOLR-792.patch


 A component to do multi-level faceting.




Build failed in Hudson: Lucene-trunk #1289

2010-09-15 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Lucene-trunk/1289/changes

Changes:

[mikemccand] don't close reader prematurely

[rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version

[rmuir] LUCENE-2630: fix intl test bugs that rely on cldr version

--
[...truncated 14620 lines...]
common.init:

build-lucene:

init:

compile-test:
 [echo] Building swing...

compile-analyzers-common:

common.init:

build-lucene:

init:

clover.setup:
[clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778)
[clover-setup] Loaded from: 
/export/home/hudson/tools/clover/clover2latest/clover-2.6.3.jar
[clover-setup] Clover: Open Source License registered to Apache.
[clover-setup] Clover is enabled with initstring 
'https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/test/clover/db/lucene_coverage.db'

clover.info:

clover:

common.compile-core:

compile-core:

common.compile-test:

junit-mkdir:
[mkdir] Created dir: 
https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/swing/test

junit-sequential:
[junit] Testsuite: org.apache.lucene.swing.models.TestBasicList
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 3.846 sec
[junit] 
[junit] Testsuite: org.apache.lucene.swing.models.TestBasicTable
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.143 sec
[junit] 
[junit] Testsuite: org.apache.lucene.swing.models.TestSearchingList
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.303 sec
[junit] 
[junit] Testsuite: org.apache.lucene.swing.models.TestSearchingTable
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.136 sec
[junit] 
[junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingList
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 0.48 sec
[junit] 
[junit] Testsuite: org.apache.lucene.swing.models.TestUpdatingTable
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.616 sec
[junit] 

junit-parallel:

common.test:
 [echo] Building wordnet...

common.init:

build-lucene:

init:

test:
 [echo] Building wordnet...

common.init:

build-lucene:

init:

compile-test:
 [echo] Building wordnet...

compile-analyzers-common:

common.init:

build-lucene:

init:

clover.setup:
[clover-setup] Clover Version 2.6.3, built on November 20 2009 (build-778)
[clover-setup] Loaded from: 
/export/home/hudson/tools/clover/clover2latest/clover-2.6.3.jar
[clover-setup] Clover: Open Source License registered to Apache.
[clover-setup] Clover is enabled with initstring 
'https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/test/clover/db/lucene_coverage.db'

clover.info:

clover:

common.compile-core:

compile-core:

common.compile-test:

junit-mkdir:
[mkdir] Created dir: 
https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/test

junit-sequential:
[junit] Testsuite: org.apache.lucene.wordnet.TestSynonymTokenFilter
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 8.607 sec
[junit] 
[junit] Testsuite: org.apache.lucene.wordnet.TestWordnet
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.946 sec
[junit] 
[junit] - Standard Output ---
[junit] Opening Prolog file 
https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt
[junit] [1/2] Parsing 
https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt
[junit] 2 s(10001,1,'woods',n,1,0). 0 0 ndecent=0
[junit] 4 s(10001,3,'forest',n,1,0). 2 1 ndecent=0
[junit] 8 s(10003,2,'baron',n,1,1). 6 3 ndecent=0
[junit] [2/2] Building index to store synonyms,  map sizes are 8 and 4
[junit] row=1/8 doc= Document<stored,omitNorms<syn:king> stored,indexed<word:baron>>
[junit] row=2/8 doc= Document<stored,omitNorms<syn:wood> stored,omitNorms<syn:woods> stored,indexed<word:forest>>
[junit] row=4/8 doc= Document<stored,omitNorms<syn:wolfish> stored,indexed<word:ravenous>>
[junit] Optimizing..
[junit] Opening Prolog file 
https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt
[junit] [1/2] Parsing 
https://hudson.apache.org/hudson/job/Lucene-trunk/ws/lucene/build/contrib/wordnet/classes/test/org/apache/lucene/wordnet/testSynonyms.txt
[junit] 2 s(10001,1,'woods',n,1,0). 0 0 ndecent=0
[junit] 4 s(10001,3,'forest',n,1,0). 2 1 ndecent=0
[junit] 8 s(10003,2,'baron',n,1,1). 6 3 ndecent=0
[junit] [2/2] Building index to store synonyms,  map sizes are 8 and 4
[junit] row=1/8 doc= Document<stored,omitNorms<syn:king> stored,indexed<word:baron>>
[junit] row=2/8 doc= Document<stored,omitNorms<syn:wood