Re: Build failed in Hudson: Solr-3.x #111

2010-09-23 Thread Robert Muir
On Thu, Sep 23, 2010 at 1:32 AM, Apache Hudson Server 
hud...@hudson.apache.org wrote:

[junit] Testsuite: org.apache.solr.handler.dataimport.TestEvaluatorBag
[junit] Testcase:
 testGetDateFormatEvaluator(org.apache.solr.handler.dataimport.TestEvaluatorBag):
  Caused an ERROR
[junit] expected:<2010-09-21 [10]:31> but was:<2010-09-21 [09]:31>
[junit] at
 org.apache.solr.handler.dataimport.TestEvaluatorBag.testGetDateFormatEvaluator(TestEvaluatorBag.java:131)
[junit] at
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:704)
[junit] at
 org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:677)
[junit]
[junit]
[junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.039 sec
[junit]
[junit] - Standard Output ---
[junit] NOTE: random locale of testcase 'testGetDateFormatEvaluator'
 was: fr_BE
[junit] NOTE: random timezone of testcase 'testGetDateFormatEvaluator'
 was: PLT
[junit] -  ---
[junit] TEST org.apache.solr.handler.dataimport.TestEvaluatorBag FAILED


Some kind of race condition? I seem to hit this error randomly sometimes,
but I can never reproduce it, even with the same locale/timezone.

-- 
Robert Muir
rcm...@gmail.com


[jira] Resolved: (SOLR-2125) Spatial filter is not accurate

2010-09-23 Thread Grant Ingersoll (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-2125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Grant Ingersoll resolved SOLR-2125.
---

Fix Version/s: 3.1
   4.0
   Resolution: Fixed

Committed to trunk and 3.x

 Spatial filter is not accurate
 --

 Key: SOLR-2125
 URL: https://issues.apache.org/jira/browse/SOLR-2125
 Project: Solr
  Issue Type: Bug
  Components: Build
Affects Versions: 1.5
Reporter: Bill Bell
Assignee: Grant Ingersoll
 Fix For: 3.1, 4.0

 Attachments: Distance.diff, SOLR-2125.patch, solrspatial.xlsx


 The calculations of distance appear to be off.
 Note: The radius of the sphere to be used when calculating distances on a 
 sphere (i.e. haversine). Default is the Earth's mean radius in kilometers 
 (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) 
 which is set to 3,958.761458084784856. Most applications will not need to set 
 this.
 The radius of the earth is 6371.009 km (≈ 3958.761 mi).
 Also filtering distance appears to be off - example data:
 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 
 miles = 220 kilometers
 http://../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348)) asc
 Nothing shows. d=285 shows results. This is off by a lot.
 Bill
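For reference, the mismatch reported above is consistent with a unit mix-up in the radius constant: the straight-line haversine distance between the two quoted points is roughly 197 km, which comes out near 122 if the mile value of the Earth's radius (3958.761...) is plugged in where kilometers are expected. A minimal standalone sketch of the haversine computation (illustrative only, not Solr's actual implementation; class and method names are made up):

```java
// Haversine great-circle distance, in whatever units the radius is given in.
// Constants mirror the two values quoted in the report above.
public class HaversineSketch {
    static final double EARTH_RADIUS_KM = 6371.009;              // mean radius in km
    static final double QUOTED_MILES_VALUE = 3958.761458084784856; // the mile value from the constant

    static double haversine(double lat1, double lon1, double lat2, double lon2, double radius) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return radius * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }

    public static void main(String[] args) {
        double km = haversine(45.17614, -93.87341, 44.9369054, -91.3929348, EARTH_RADIUS_KM);
        double mi = haversine(45.17614, -93.87341, 44.9369054, -91.3929348, QUOTED_MILES_VALUE);
        System.out.printf("~%.0f km / ~%.0f mi%n", km, mi); // roughly 197 km / 122 mi
    }
}
```

Whichever radius the filter uses, the threshold shifts by the km/mile ratio (~1.609), which would explain filter distances that are "off by a lot".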

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Created: (SOLR-2131) Solr Boost returns unexpected results

2010-09-23 Thread Jayant Patil (JIRA)
Solr Boost returns unexpected results
-

 Key: SOLR-2131
 URL: https://issues.apache.org/jira/browse/SOLR-2131
 Project: Solr
  Issue Type: Wish
Reporter: Jayant Patil


Hi,

We are using Solr for our searches. We are facing issues while applying boost 
on particular fields.
E.g. 
We have a field Category, which contains values like Electronics, Computers, 
Home Appliances, Mobile Phones etc. 
We want to boost the categories Electronics and Mobile Phones, so we are using the 
following query:
(category:Electronics^2 OR category:"Mobile Phones"^1 OR category:[* TO *]^0)

The results are unexpected: the category Mobile Phones scores higher than 
Electronics, even though we specify a boost factor of 2 for Electronics and 1 
for Mobile Phones respectively.
On debugging we found that docFreq is influencing the scores and hence 
affecting the overall boost. The number of docs for Mobile Phones is much lower 
than that for Electronics, and Solr gives a higher score to Mobile Phones for 
this reason.

Please suggest a solution.
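[Editorial note] The docFreq effect described above is the expected behavior of classic TF-IDF scoring: Lucene's DefaultSimilarity computes idf(t) = 1 + ln(numDocs / (docFreq + 1)), and with other factors held equal a single-term clause contributes roughly boost × idf² to the score, so a rare term can out-score a common one despite a smaller boost. A sketch with hypothetical doc counts:

```java
// Shows how idf can dominate an explicit query boost in TF-IDF scoring.
// Doc counts are hypothetical; the idf formula follows Lucene's classic
// DefaultSimilarity: idf(t) = 1 + ln(numDocs / (docFreq + 1)).
public class IdfVsBoost {
    static double idf(int numDocs, int docFreq) {
        return 1.0 + Math.log((double) numDocs / (docFreq + 1));
    }

    public static void main(String[] args) {
        int numDocs = 100_000;
        int electronicsDocs = 40_000; // common category
        int mobilePhonesDocs = 500;   // rare category

        // Score contribution ~ boost * idf^2, other factors held equal.
        double electronics = 2.0 * Math.pow(idf(numDocs, electronicsDocs), 2);
        double mobile      = 1.0 * Math.pow(idf(numDocs, mobilePhonesDocs), 2);

        // Despite the 2x boost, the rare "Mobile Phones" term wins on idf.
        System.out.printf("electronics=%.2f mobile=%.2f%n", electronics, mobile);
    }
}
```

A common workaround is a custom Similarity that flattens idf for such category fields, so the explicit boosts dominate.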




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914044#action_12914044
 ] 

Yonik Seeley commented on LUCENE-2649:
--

Now we're talking!

Q: why aren't the CachePopulator methods just directly on EntryConfig - was it 
easier to share implementations that way or something?

Also:
- It doesn't seem like we need two methods, fillValidBits and fillByteValues - 
shouldn't it just be one method that looks at the config and fills in the 
appropriate entries based on cacheValidBits() and cacheValues()?
- We should allow an implementation to create subclasses of ByteValues, etc...  
what about this method:
   public abstract CachedArray  fillEntry( CachedArray vals, IndexReader 
reader, String field, EntryConfig creator )
That way, an existing entry can be filled in (i.e. vals != null) or a new entry 
can be created.
Oh, wait, I see further down a ByteValues createValue() - if that's meant to 
be a method on CachePopulator, I guess it's all good - my main concern was 
being able to create subclasses of ByteValues and friends.

Anyway, all that's off the top of my head - I'm sure you've thought about it 
more at this point.

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.
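The ambiguity the issue describes can be seen without any Lucene machinery: from a bare values array alone, a doc whose slot holds 0 is indistinguishable from a doc that has no value at all. A small self-contained sketch of pairing values with a valid-docs bit set (names are illustrative, not the patch's API):

```java
import java.util.BitSet;

// Doc 1 genuinely stores the value 0; doc 2 has no value at all. Both read
// as 0 from the values array, so a BitSet is needed to tell them apart.
public class ValidBitsSketch {
    final int[] values;
    final BitSet validBits; // set bit = doc actually has a value

    ValidBitsSketch(int[] values, BitSet validBits) {
        this.values = values;
        this.validBits = validBits;
    }

    boolean hasValue(int doc) { return validBits.get(doc); }

    public static void main(String[] args) {
        int[] values = {7, 0, 0};       // doc 2's slot is just the array default
        BitSet valid = new BitSet();
        valid.set(0);
        valid.set(1);                   // doc 1 really stores 0; doc 2 does not
        ValidBitsSketch cache = new ValidBitsSketch(values, valid);

        System.out.println(cache.values[1] == cache.values[2]); // indistinguishable by value
        System.out.println(cache.hasValue(1)); // doc 1 has a value
        System.out.println(cache.hasValue(2)); // doc 2 does not
    }
}
```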




Re: Build failed in Hudson: Solr-3.x #111

2010-09-23 Thread Chris Hostetter

Wild guess: it's coming from a test that seems to deal with dates -- maybe 
it's code that uses a DateFormatter (parsing or formatting) in a 
non-thread-safe way?

in which case it may be likely to show up in parallel tests but not when 
running tests individually
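[Editorial note] For context, java.text.SimpleDateFormat keeps mutable state internally and is documented as not thread-safe; concurrent use can silently produce shifted output like the one-hour difference above. A common mitigation is to confine each formatter instance to one thread (sketch below; uses Java 8+ ThreadLocal.withInitial, and the pattern string is illustrative):

```java
import java.text.SimpleDateFormat;
import java.util.Date;

// SimpleDateFormat is not thread-safe, so sharing one instance across
// threads can corrupt results. ThreadLocal gives each thread its own copy.
public class SafeDateFormatting {
    private static final ThreadLocal<SimpleDateFormat> FORMAT =
        ThreadLocal.withInitial(() -> new SimpleDateFormat("yyyy-MM-dd HH:mm"));

    public static String format(Date d) {
        return FORMAT.get().format(d); // safe: each thread uses a private instance
    }

    public static void main(String[] args) {
        System.out.println(format(new Date(0L))); // epoch in the JVM's default time zone
    }
}
```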

: Date: Thu, 23 Sep 2010 08:37:36 -0400
: From: Robert Muir rcm...@gmail.com
: Reply-To: dev@lucene.apache.org
: To: dev@lucene.apache.org
: Subject: Re: Build failed in Hudson: Solr-3.x #111
: 
: On Thu, Sep 23, 2010 at 1:32 AM, Apache Hudson Server 
: hud...@hudson.apache.org wrote:
: 
: [junit] Testsuite: org.apache.solr.handler.dataimport.TestEvaluatorBag
: [junit] Testcase:
:  
testGetDateFormatEvaluator(org.apache.solr.handler.dataimport.TestEvaluatorBag):
:   Caused an ERROR
: [junit] expected:<2010-09-21 [10]:31> but was:<2010-09-21 [09]:31>
: [junit] at
:  
org.apache.solr.handler.dataimport.TestEvaluatorBag.testGetDateFormatEvaluator(TestEvaluatorBag.java:131)
: [junit] at
:  
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:704)
: [junit] at
:  
org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:677)
: [junit]
: [junit]
: [junit] Tests run: 5, Failures: 0, Errors: 1, Time elapsed: 0.039 sec
: [junit]
: [junit] - Standard Output ---
: [junit] NOTE: random locale of testcase 'testGetDateFormatEvaluator'
:  was: fr_BE
: [junit] NOTE: random timezone of testcase 'testGetDateFormatEvaluator'
:  was: PLT
: [junit] -  ---
: [junit] TEST org.apache.solr.handler.dataimport.TestEvaluatorBag FAILED
: 
: 
: Some kind of race condition? I seem to hit this error randomly sometimes,
: but I can never reproduce it, even with same locale/timezone.
: 
: -- 
: Robert Muir
: rcm...@gmail.com
: 

-Hoss

--
http://lucenerevolution.org/  ...  October 7-8, Boston
http://bit.ly/stump-hoss  ...  Stump The Chump!





Re: Build failed in Hudson: Solr-3.x #111

2010-09-23 Thread Robert Muir
On Thu, Sep 23, 2010 at 11:21 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:


 Wild guess: it's coming from a test that seems to deal with dates -- maybe
 it's code that uses a DateFormatter (parsing or formating) in a
 non-ThreadSafe way?


possibly, good idea: i will apply some pressure to the test (-Dtests.iter)
and see what happens



 in which case it may be likely to show up in parallel tests but not when
 running tests individually


just for reference: our parallel tests are not parallel with threads but
completely separate JVMs, so the tests don't need to be thread-safe.

Example:
TestClassA, TestClassB, TestClassC, TestClassD

with 2 threads we just spawn 2 jvms (jvm1 and jvm2):
jvm1 executes TestClassA, then TestClassB
jvm2 executes TestClassC, then TestClassD

the other thing parallel tests do is give jvm1 and jvm2 unique base temp
directories, so that if they are working with the filesystem they won't step
over each other.

so parallel tests are rather safe, but we do have to be aware of statics
(since jvm1 will run TestClassA, then TestClassB sequentially in the same
jvm)
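The per-JVM scheduling described above can be sketched as a simple contiguous partition of the test-class list, with a unique base temp directory per JVM (illustrative only, not the actual Ant task logic; uses Java 9+ List.of):

```java
import java.util.ArrayList;
import java.util.List;

// Partitions test classes into one contiguous batch per JVM, matching the
// example above: with 2 JVMs, jvm1 gets A and B, jvm2 gets C and D.
public class TestPartitioner {
    static List<List<String>> partition(List<String> classes, int jvms) {
        List<List<String>> batches = new ArrayList<>();
        for (int i = 0; i < jvms; i++) batches.add(new ArrayList<>());
        for (int i = 0; i < classes.size(); i++) {
            batches.get(i * jvms / classes.size()).add(classes.get(i));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> tests = List.of("TestClassA", "TestClassB", "TestClassC", "TestClassD");
        List<List<String>> batches = partition(tests, 2);
        for (int jvm = 0; jvm < batches.size(); jvm++) {
            String tempDir = "build/test/jvm-" + (jvm + 1); // unique temp dir per JVM
            System.out.println("jvm" + (jvm + 1) + " runs " + batches.get(jvm) + " in " + tempDir);
        }
    }
}
```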

-- 
Robert Muir
rcm...@gmail.com


Re: Build failed in Hudson: Solr-3.x #111

2010-09-23 Thread Robert Muir
On Thu, Sep 23, 2010 at 11:21 AM, Chris Hostetter
hossman_luc...@fucit.org wrote:

 in which case it may be likely to show up in parallel tests but not when
 running tests individually


another note for reference: the Solr contribs don't use parallel testing yet
(they will, once I finish the SOLR-2002 patch)

but this isn't a big deal since there aren't many of them, and most don't
have many tests.

-- 
Robert Muir
rcm...@gmail.com


[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914082#action_12914082
 ] 

Ryan McKinley commented on LUCENE-2649:
---

bq. Q: why aren't the CachePopulator methods just directly on EntryConfig - was 
it easier to share implementations that way or something?

Two reasons (but I can be talked out of it)
1. this approach separates what you are asking for (bits/values/etc) from how 
they are actually generated (the populator).  Something makes me 
uncomfortable about the caller asking for Values needing to also know how they 
are generated.  Seems easy to mess up.  With this approach the 'populator' is 
attached to the field cache and defines how stuff is read, vs the 
'EntryConfig' that defines what the user is asking for (particularly since they 
may change what they are asking for in subsequent calls)

2. The 'populator' is attached to the FieldCache so it has consistent behavior 
across subsequent calls to getXxxxValues().  Note that with this approach, if 
you ask the field cache for just the 'values' and then later want the 'bits', it 
uses the same populator and adds the results to the existing CachedArray value.


bq. It doesn't seem like we need two methods fillValidBits , fillByteValues

The 'fillValidBits' just fills up the valid bits w/o actually parsing (or 
caching) the values.  This is useful when:
1. you only want the ValidBits, but not the values (Mike seems to want this)
2. you first ask for just values, then later want the bits.  

Thinking some more, I think the populator should look like this:
{code:java}
public abstract class CachePopulator
{
  public abstract ByteValues   createByteValues(   IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract ShortValues  createShortValues(  IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract IntValues    createIntValues(    IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract FloatValues  createFloatValues(  IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract DoubleValues createDoubleValues( IndexReader reader, String field, EntryConfig config ) throws IOException;

  public abstract void fillByteValues(   ByteValues   vals, IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillShortValues(  ShortValues  vals, IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillIntValues(    IntValues    vals, IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillFloatValues(  FloatValues  vals, IndexReader reader, String field, EntryConfig config ) throws IOException;
  public abstract void fillDoubleValues( DoubleValues vals, IndexReader reader, String field, EntryConfig config ) throws IOException;

  // This will only fill in the ValidBits w/o parsing any actual values
  public abstract void fillValidBits( CachedArray vals, IndexReader reader, String field, EntryConfig config ) throws IOException;
}
{code}

The default 'create' implementation could look something like this:

{code:java}
@Override
public ShortValues createShortValues( IndexReader reader, String field, EntryConfig config ) throws IOException
{
  if( config == null ) {
    config = new SimpleEntryConfig();
  }
  ShortValues vals = new ShortValues();
  if( config.cacheValues() ) {
    this.fillShortValues(vals, reader, field, config);
  }
  else if( config.cacheValidBits() ) {
    this.fillValidBits(vals, reader, field, config);
  }
  else {
    throw new RuntimeException( "the config must cache values and/or bits" );
  }
  return vals;
}
{code}

And the Cache 'createValue' would look something like this:
{code:java}
static final class ByteCache extends Cache {
  ByteCache(FieldCache wrapper) {
    super(wrapper);
  }

  @Override
  protected final ByteValues createValue(IndexReader reader, Entry entry, CachePopulator populator) throws IOException {
    String field = entry.field;
    EntryConfig config = (EntryConfig)entry.custom;
    if (config == null) {
      return wrapper.getByteValues(reader, field, new SimpleEntryConfig());
    }
    return populator.createByteValues(reader, field, config);
  }
}
{code}

thoughts?  This would open up lots more of the field cache... so if we go this 
route, let's make sure it addresses the other issues people have with 
FieldCache.  IIUC, the other big request is to load the values from an external 
source -- that should be possible with this approach.


[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914089#action_12914089
 ] 

Yonik Seeley commented on LUCENE-2649:
--

Oh... I mis-matched the parens when I was looking at your proposal (hence the 
confusion).

I think getCachePopulator() should be under EntryConfig - that way people can 
provide their own (and extend ByteValues to include more info).
Otherwise, we'll forever be locked into a lowest common denominator of only 
adding info that everyone can agree on.






[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914095#action_12914095
 ] 

Ryan McKinley commented on LUCENE-2649:
---

{quote}
I think getCachePopulator() should be under EntryConfig - that way people can 
provide their own (and extend ByteValues to include more info)
{quote}

So you think it is better for *each* call to define how the cache works rather 
than having that as an attribute of the FieldCache (that could be extended).  
The one thing that concerns me is that it forces all users of the FieldCache 
to be in sync.

In this proposal, you could set the CachePopulator on the FieldCache. 

{quote}
Otherwise, we'll forever be locked into a lowest common denominator of only 
adding info that everyone can agree on.
{quote}

This is why I just added the 'createXxxxValues' functions on CachePopulator -- 
a subclass could add other values.



It looks like the basic difference between what we are thinking is that the 
Populator is attached to the FieldCache rather than to each call to the 
FieldCache.  From my point of view, this would make it easier for a system with a 
schema (like Solr) to have consistent results across all calls, rather than making 
each request to the FieldCache need to know about the schema -> parsers -> 
populator chain.

but I can always be convinced ;)





[jira] Assigned: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2010-09-23 Thread Steven Rowe (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steven Rowe reassigned LUCENE-2657:
---

Assignee: Steven Rowe

 Replace Maven POM templates with full POMs, and change documentation 
 accordingly
 

 Key: LUCENE-2657
 URL: https://issues.apache.org/jira/browse/LUCENE-2657
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Reporter: Steven Rowe
Assignee: Steven Rowe
 Fix For: 3.1, 4.0


 The current Maven POM templates only contain dependency information, the bare 
 bones necessary for uploading artifacts to the Maven repository.
 Full Maven POMs will include the information necessary to run a multi-module 
 Maven build, in addition to serving the same purpose as the current POM 
 templates.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914100#action_12914100
 ] 

Yonik Seeley commented on LUCENE-2649:
--

bq. In this proposal, you could set the CachePopulator on the FieldCache. 

Hmmm, OK, as long as it's possible.

bq. From my point of view, this would make it easier for a system with a schema 
(like Solr) to have consistent results across all calls, rather than making each 
request to the FieldCache need to know about the schema -> parsers -> populator

I think this may make it a lot harder from Solr's point of view.
- it's essentially a static... so it had better not ever be configurable from 
the schema or solrconfig, or it will break multi-core.
- if we ever *did* want to treat fields differently (load some values from a 
DB, etc), we'd want to look that up in the schema - but we don't have a 
reference to the schema in the populator, and we wouldn't want to store one 
there (again, we have multiple schemas).  So... we could essentially create a 
custom EntryConfig object and then our custom CachePopulator could delegate to 
the entry config (and we've essentially re-invented a way to be able to 
specify the populator on a per-field basis).

Are EntryConfig objects stored as keys anywhere?   We need to be very careful 
about memory leaks.





[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914109#action_12914109
 ] 

Yonik Seeley commented on LUCENE-2649:
--

It's all doable though I guess - even if EntryConfig objects are used as cache 
keys, we could store a weak reference to the solr core.
So I say, proceed with what you think will make it easy for Lucene users - and 
don't focus on what will be easy for Solr.





[jira] Commented: (LUCENE-2657) Replace Maven POM templates with full POMs, and change documentation accordingly

2010-09-23 Thread Steven Rowe (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914112#action_12914112
 ] 

Steven Rowe commented on LUCENE-2657:
-

{quote}
bq. Full Maven POMs will include the information necessary to run a 
multi-module Maven build

That sort of sounds like a parallel build process (i.e. you would be able to 
build lucene/solr itself with maven). Is it?
We've avoided that type of thing in the past.
{quote}

Yes, it constitutes a parallel build process, but the Ant build process would 
remain the official build.  For example, the artifacts produced by the Maven 
build will not be exactly the same as those produced by the Ant build process, 
and so cannot be used as release artifacts.

It has been noted elsewhere that a Maven build has never been produced for 
Lucene.  I hope in this issue to provide that, so that the discussion about 
whether or not to include it in the source tree has a concrete reference point.






[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914115#action_12914115
 ] 

Ryan McKinley commented on LUCENE-2649:
---

preface: I don't really know how FieldCache is used, so my assumptions could be 
way off...

In Solr, is there one FieldCache for all cores, or does each core get its 
own FieldCache?

I figured each core would create a single CachePopulator (with a reference to 
the schema) and attach it to the FieldCache.  If that is not possible, then ya, 
it will be better to put that in the request.

bq. Are EntryConfig objects stored as keys anywhere? We need to be very careful 
about memory leaks.

Yes, the EntryConfig is part of the 'Entry' and gets stored as a key.  






[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914123#action_12914123
 ] 

Yonik Seeley commented on LUCENE-2649:
--

bq. In Solr, is there one FieldCache for all cores, or does each core get 
its own FieldCache?

There is a single FieldCache for all cores (same as in Lucene).





[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914136#action_12914136
 ] 

Yonik Seeley commented on LUCENE-2649:
--

Passing it in would also allow a way to get rid of the StopFillCacheException 
hack for NumericField in the future.





[jira] Created: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Robert Muir (JIRA)
wrong exception from NativeFSLockFactory (LIA2 test case)
-

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0


As part of integrating the Lucene in Action 2 test cases (LUCENE-2661), I found that one 
of the test cases fails.

The test is pretty simple, and passes on 3.0. The exception you get instead 
(LockReleaseFailedException) is 
pretty confusing, and I think we should fix it.





[jira] Updated: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2663:


Attachment: LUCENE-2663_test.patch

 wrong exception from NativeFSLockFactory (LIA2 test case)
 -

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2663_test.patch


 As part of integrating Lucene In Action 2 test cases (LUCENE-2661), I found 
 one of the test cases fail
 the test is pretty simple, and passes on 3.0. The exception you get instead 
 (LockReleaseFailedException) is 
 pretty confusing and I think we should fix it.




[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914139#action_12914139
 ] 

Robert Muir commented on LUCENE-2663:
-

For trunk, just change the test to use MockAnalyzer instead of simple... I made 
the patch from 3.x.

 wrong exception from NativeFSLockFactory (LIA2 test case)
 -

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2663_test.patch


 As part of integrating Lucene In Action 2 test cases (LUCENE-2661), I found 
 one of the test cases fail
 the test is pretty simple, and passes on 3.0. The exception you get instead 
 (LockReleaseFailedException) is 
 pretty confusing and I think we should fix it.




[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914141#action_12914141
 ] 

Uwe Schindler commented on LUCENE-2663:
---

Could this be a randomization issue? If both index writers use a different 
LockFactory, it can easily fail! So at least within one test run the LockFactory 
should be identical; otherwise the locking can produce wrong failures/messages. 
SimpleFSLock and NativeFSLock can only interact in a very limited way - there are 
only some safety checks - so locking will fail if you mix the implementations.
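For background, NativeFSLockFactory acquires OS-level locks through java.nio, whose within-JVM behavior differs from SimpleFSLockFactory's lock-file check. A minimal standalone demo of the native semantics (plain JDK code, not Lucene's actual implementation; the class name is made up):

```java
import java.io.File;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.StandardOpenOption;

// Native file locks are held per JVM: trying to lock a region that this JVM
// already holds fails immediately with OverlappingFileLockException, whereas
// SimpleFSLockFactory only tests for the existence of a lock file on disk.
public class NativeLockDemo {
    public static boolean doubleLockFails(File f) throws Exception {
        try (FileChannel ch = FileChannel.open(f.toPath(),
                StandardOpenOption.CREATE, StandardOpenOption.WRITE)) {
            FileLock first = ch.tryLock();
            if (first == null) {
                return false; // another process already holds the lock
            }
            try {
                ch.tryLock();          // same JVM, same region
                return false;          // unexpectedly succeeded
            } catch (OverlappingFileLockException e) {
                return true;           // the native lock is already held
            } finally {
                first.release();
            }
        }
    }
}
```

This is only meant to show why two writers sharing one native lock behave differently from two writers using lock files.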

 wrong exception from NativeFSLockFactory (LIA2 test case)
 -

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2663_test.patch


 As part of integrating Lucene In Action 2 test cases (LUCENE-2661), I found 
 one of the test cases fail
 the test is pretty simple, and passes on 3.0. The exception you get instead 
 (LockReleaseFailedException) is 
 pretty confusing and I think we should fix it.




[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914144#action_12914144
 ] 

Robert Muir commented on LUCENE-2663:
-

The test isn't random; it uses FSDirectory.open.

 wrong exception from NativeFSLockFactory (LIA2 test case)
 -

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2663_test.patch


 As part of integrating Lucene In Action 2 test cases (LUCENE-2661), I found 
 one of the test cases fail
 the test is pretty simple, and passes on 3.0. The exception you get instead 
 (LockReleaseFailedException) is 
 pretty confusing and I think we should fix it.




[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914146#action_12914146
 ] 

Uwe Schindler commented on LUCENE-2663:
---

Sorry, you are right. The exception thrown is clearly wrong :-) Maybe it's 
related to Shai's changes in NativeFSLockFactory (which is the default)?

 wrong exception from NativeFSLockFactory (LIA2 test case)
 -

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2663_test.patch


 As part of integrating Lucene In Action 2 test cases (LUCENE-2661), I found 
 one of the test cases fail
 the test is pretty simple, and passes on 3.0. The exception you get instead 
 (LockReleaseFailedException) is 
 pretty confusing and I think we should fix it.




[jira] Commented: (LUCENE-2663) wrong exception from NativeFSLockFactory (LIA2 test case)

2010-09-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914160#action_12914160
 ] 

Michael McCandless commented on LUCENE-2663:


I think it is the recent changes to NativeFSLockFactory...

IW first tries to clear the lock, if create=true, and that attempt is causing 
the exception.  Really this is a holdover from SimpleFSLockFactory, which can 
leave orphaned locks... so maybe we shouldn't do this for other lock 
factories?

 wrong exception from NativeFSLockFactory (LIA2 test case)
 -

 Key: LUCENE-2663
 URL: https://issues.apache.org/jira/browse/LUCENE-2663
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Reporter: Robert Muir
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2663_test.patch


 As part of integrating Lucene In Action 2 test cases (LUCENE-2661), I found 
 one of the test cases fail
 the test is pretty simple, and passes on 3.0. The exception you get instead 
 (LockReleaseFailedException) is 
 pretty confusing and I think we should fix it.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914163#action_12914163
 ] 

Ryan McKinley commented on LUCENE-2649:
---

Ok, as I look more, I think it may be worth some even bigger changes!  

Is there any advantage to having a different map for each type?  The double 
(and triple) cache can get a bit crazy and lead to so much duplication.

What about moving to a FieldCache that is centered around the very basic API:

{code:java}
public <T> T get(IndexReader reader, String field, EntryCreator<T> creator)
{code}

Entry creator would be something like
{code:java}
public abstract static class EntryCreator<T> implements Serializable 
  {
public abstract T create( IndexReader reader, String field );
public abstract void validate( T entry, IndexReader reader, String field );

/**
 * NOTE: the hashCode is used as part of the cache Key, so make sure it 
 * only changes if you want different entries for the same field
 */
@Override
public int hashCode()
{
  return EntryCreator.class.hashCode();
}
  }
{code}

We could add all the utility functions that cast stuff to ByteValues etc.  We 
would also make sure that the Map does not use the EntryCreator as a key, but 
uses it to generate a key.

A sample EntryCreator would look like this:
{code:java}

class BytesEntryCreator extends FieldCache.EntryCreator<ByteValues> {

  @Override
  public ByteValues create(IndexReader reader, String field) 
  {
// all the normal walking stuff using whatever parameters we have specified
  }

  @Override
  public void validate(ByteValues entry, IndexReader reader, String field) 
  {
// all the normal walking stuff using whatever parameters we have specified
  }  
}
{code}

Thoughts on this approach?  


Crazy how a seemingly simple issue just explodes :(
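To make the single-map idea above concrete, here is a minimal, self-contained sketch (hypothetical names; a plain Object stands in for IndexReader) showing how one get() method plus an EntryCreator-derived key could replace the per-type caches:

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the proposed single-map cache. "EntryCreator" and the
// key scheme here are illustrative, not the actual patch.
abstract class EntryCreator<T> {
    public abstract T create(Object reader, String field);

    // The hashCode takes part in the cache key, so it should only differ
    // when different entries are wanted for the same field.
    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
}

class SimpleFieldCache {
    private final Map<Long, Object> cache = new HashMap<>();

    // The single core API from the proposal: one get() for every value type.
    @SuppressWarnings("unchecked")
    public <T> T get(Object reader, String field, EntryCreator<T> creator) {
        // Key combines reader identity, field name, and creator hash, so the
        // EntryCreator generates the key without being the key itself.
        long key = ((long) System.identityHashCode(reader) << 32)
                 ^ (long) (field.hashCode() ^ creator.hashCode());
        Object entry = cache.get(key);
        if (entry == null) {
            entry = creator.create(reader, field);
            cache.put(key, entry);
        }
        return (T) entry;
    }
}
```

Repeated calls with the same reader, field, and creator class then return the same cached object, which is the behavior the per-type maps provide today.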

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914169#action_12914169
 ] 

Yonik Seeley commented on LUCENE-2649:
--

Hmmm, that would also seem to transform the FieldCache into a more generic 
index reader cache - not a bad idea!

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Updated: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1568:
---

Attachment: SOLR-1568.patch

Here's a patch that changes sfilt to use getParam(), which first gets from local 
params but then also checks global request params if missing.  I also added 
"sfield", since getting "fl" from global params as the spatial field is 
obviously wrong.

Since a single solr request may have multiple spatial related things (distance 
function, sort by distance, filter) it allows easier sharing.

example:  sfield=store&pt=10,20&d=200&fq={!sfilt}&sort=sdist() asc

(I made up sdist, it doesn't exist yet... but you get the idea).

 Implement Spatial Filter
 

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: CartesianTierQParserPlugin.java, 
 SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch


 Given an index with spatial information (either as a geohash, 
 SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be 
 able to pass in a filter query that takes in the field name, lat, lon and 
 distance and produces an appropriate Filter (i.e. one that is aware of the 
 underlying field type for use by Solr. 
 The interface _could_ look like:
 {code}
 fq={!sfilt dist=20}location:49.32,-79.0
 {code}
 or it could be:
 {code}
 fq={!sfilt lat=49.32 lat=-79.0 f=location dist=20}
 {code}
 or:
 {code}
 fq={!sfilt p=49.32,-79.0 f=location dist=20}
 {code}
 or:
 {code}
 fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20}
 {code}




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914178#action_12914178
 ] 

Uwe Schindler commented on LUCENE-2649:
---

bq. Anyone know what the deal is with IndexReader:

It is to share the cache for clones of IndexReaders, or when SegmentReaders are 
reopened with different deleted docs. In this case the underlying reader is 
the same, so it should use its cache (e.g. when deleted docs are added, you 
don't need to invalidate the cache).

For more info, ask Mike McCandless!

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Created: (LUCENE-2664) Add SimpleText codec

2010-09-23 Thread Michael McCandless (JIRA)
Add SimpleText codec


 Key: LUCENE-2664
 URL: https://issues.apache.org/jira/browse/LUCENE-2664
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0


Inspired by Sahin Buyrukbilen's question here:

  
http://www.lucidimagination.com/search/document/b68846e383824653/how_to_export_lucene_index_to_a_simple_text_file#b68846e383824653

I made a simple read/write codec that stores all postings data into a
single text file (_X.pst), looking like this:

{noformat}
field contents
  term file
doc 0
  pos 5
  term is
doc 0
  pos 1
  term second
doc 0
  pos 3
  term test
doc 0
  pos 4
  term the
doc 0
  pos 2
  term this
doc 0
  pos 0
END
{noformat}

The codec is fully functional -- all Lucene & Solr tests pass with
-Dtests.codec=SimpleText -- but its performance is obviously poor.

However, it should be useful for debugging, transparency,
understanding just what Lucene stores in its index, etc.  And it's a
quick way to gain some understanding on how a codec works...
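As an illustration of how approachable the format is, here is a toy parser for the dump shown above (illustration only; the real codec reads its _X.pst files through Lucene's codec API, and these class names are made up):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy reader for the text dump: collects term -> positions for the single
// document in the sample ("doc 0" lines are skipped for simplicity).
class SimpleTextDump {
    static Map<String, List<Integer>> parse(String text) {
        Map<String, List<Integer>> postings = new LinkedHashMap<>();
        List<Integer> current = null;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.startsWith("term ")) {
                current = new ArrayList<>();
                postings.put(line.substring(5), current);
            } else if (line.startsWith("pos ") && current != null) {
                current.add(Integer.parseInt(line.substring(4)));
            }
        }
        return postings;
    }
}
```

A few lines of string handling recover the full postings structure, which is exactly the transparency argument for the codec.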





[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914185#action_12914185
 ] 

Uwe Schindler commented on LUCENE-2649:
---

Yonik: I was expecting this answer...

The reason is that my current contact (it's also yours) has exactly that 
problem with norms (but also FC): they want to lazily load values for 
sorting/norms (see the very old issue LUCENE-505). At least we should have a 
TopFieldDocCollector that can, as an alternative to native arrays, also use a 
ValueSource-like approach with getter methods - so you could sort against a CSF. 
Even if it is 20% slower, in some cases that's the only way to get a suitable 
search experience. Speed is not always the most important thing; sometimes 
space requirements or warmup times matter, too. I would have no problem with 
providing both and choosing the implementation that is most speed-effective: 
if no native arrays are provided by the FieldCache, use getter methods.

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Commented: (LUCENE-2664) Add SimpleText codec

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914186#action_12914186
 ] 

Yonik Seeley commented on LUCENE-2664:
--

heh - cool!

 Add SimpleText codec
 

 Key: LUCENE-2664
 URL: https://issues.apache.org/jira/browse/LUCENE-2664
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2664.patch


 Inspired by Sahin Buyrukbilen's question here:
   
 http://www.lucidimagination.com/search/document/b68846e383824653/how_to_export_lucene_index_to_a_simple_text_file#b68846e383824653
 I made a simple read/write codec that stores all postings data into a
 single text file (_X.pst), looking like this:
 {noformat}
 field contents
   term file
 doc 0
   pos 5
   term is
 doc 0
   pos 1
   term second
 doc 0
   pos 3
   term test
 doc 0
   pos 4
   term the
 doc 0
   pos 2
   term this
 doc 0
   pos 0
 END
 {noformat}
 The codec is fully functional -- all Lucene & Solr tests pass with
 -Dtests.codec=SimpleText -- but its performance is obviously poor.
 However, it should be useful for debugging, transparency,
 understanding just what Lucene stores in its index, etc.  And it's a
 quick way to gain some understanding on how a codec works...




[jira] Updated: (LUCENE-2664) Add SimpleText codec

2010-09-23 Thread Michael McCandless (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael McCandless updated LUCENE-2664:
---

Attachment: LUCENE-2664.patch

 Add SimpleText codec
 

 Key: LUCENE-2664
 URL: https://issues.apache.org/jira/browse/LUCENE-2664
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2664.patch


 Inspired by Sahin Buyrukbilen's question here:
   
 http://www.lucidimagination.com/search/document/b68846e383824653/how_to_export_lucene_index_to_a_simple_text_file#b68846e383824653
 I made a simple read/write codec that stores all postings data into a
 single text file (_X.pst), looking like this:
 {noformat}
 field contents
   term file
 doc 0
   pos 5
   term is
 doc 0
   pos 1
   term second
 doc 0
   pos 3
   term test
 doc 0
   pos 4
   term the
 doc 0
   pos 2
   term this
 doc 0
   pos 0
 END
 {noformat}
 The codec is fully functional -- all Lucene & Solr tests pass with
 -Dtests.codec=SimpleText -- but its performance is obviously poor.
 However, it should be useful for debugging, transparency,
 understanding just what Lucene stores in its index, etc.  And it's a
 quick way to gain some understanding on how a codec works...




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Michael McCandless (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914188#action_12914188
 ] 

Michael McCandless commented on LUCENE-2649:


bq. I see no relief for this issue on the horizon.

We need to specialize the code... either manually or automatically...

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914191#action_12914191
 ] 

Yonik Seeley commented on LUCENE-2649:
--

bq. If we like the more general cache, that probably needs its own issue

Correct - with a more general FieldCache, one could implement alternatives that 
mmap files, etc.  But those alternative implementations certainly should not be 
included in this issue.

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Commented: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914194#action_12914194
 ] 

Uwe Schindler commented on LUCENE-2649:
---

I just wanted to mention that, so the design of the new FC is more flexible in 
that case. I am just pissed off because of these arrays and no flexibility :-(

 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Issue Comment Edited: (LUCENE-2649) FieldCache should include a BitSet for matching docs

2010-09-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914194#action_12914194
 ] 

Uwe Schindler edited comment on LUCENE-2649 at 9/23/10 3:49 PM:


I just wanted to mention that, so the design of the new FC is more flexible in 
that case. I am just pissed off because of these arrays and no flexibility :-(

The FC impl should be in line with the CSF approach from LUCENE-2186.

  was (Author: thetaphi):
I just wanted to mention that, so the design of the new FC is more flexible 
in that case. I am just pissed of because of these arrays and no flexibility :-(
  
 FieldCache should include a BitSet for matching docs
 

 Key: LUCENE-2649
 URL: https://issues.apache.org/jira/browse/LUCENE-2649
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Fix For: 4.0

 Attachments: LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, 
 LUCENE-2649-FieldCacheWithBitSet.patch, LUCENE-2649-FieldCacheWithBitSet.patch


 The FieldCache returns an array representing the values for each doc.  
 However there is no way to know if the doc actually has a value.
 This should be changed to return an object representing the values *and* a 
 BitSet for all valid docs.




[jira] Created: (LUCENE-2665) Rework FieldCache to be more flexible/general

2010-09-23 Thread Ryan McKinley (JIRA)
Rework FieldCache to be more flexible/general
-

 Key: LUCENE-2665
 URL: https://issues.apache.org/jira/browse/LUCENE-2665
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley


The existing FieldCache implementation is very rigid and does not allow much 
flexibility.  In trying to implement simple features, it points to much larger 
structural problems.

This patch aims to take a fresh approach to how we work with the FieldCache.






[jira] Updated: (LUCENE-2665) Rework FieldCache to be more flexible/general

2010-09-23 Thread Ryan McKinley (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ryan McKinley updated LUCENE-2665:
--

Attachment: LUCENE-2665-FieldCacheOverhaul.patch

This is a quick sketch of a more general FieldCache -- it only implements the 
ByteValues case, and is implemented in a different package.

The core API looks like this:
{code:java}
  public <T> T get(IndexReader reader, String field, EntryCreator<T> creator) 
throws IOException
{code}

and the EntryCreator looks like this:
{code:java}
public abstract class EntryCreator<T> implements Serializable
{
  public abstract T create( IndexReader reader, String field ) throws 
IOException;
  public abstract T validate( T entry, IndexReader reader, String field ) 
throws IOException;

  /**
   * Indicate if a cached value should be checked before usage.  
   * This is useful if an application wants to support subsequent calls
   * to the same cached object that may alter the cached object.  If
   * an application wants to avoid this (synchronized) check, it should 
   * return 'false'
   * 
   * @return 'true' if the Cache should call 'validate' before returning a 
cached object
   */
  public boolean shouldValidate() {
return true;
  }

  /**
   * @return A key to identify valid cache entries for subsequent requests
   */
  public Integer getCacheKey( IndexReader reader, String field )
  {
return new Integer(
EntryCreator.class.hashCode() ^ field.hashCode()
);
  }
}
{code}

For a real cleanup, I think it makes sense to move the Parser stuff to 
somewhere that deals with numerics -- I don't get why that is tied to the 
FieldCache.

Just a sketch to get us thinking...




 Rework FieldCache to be more flexible/general
 -

 Key: LUCENE-2665
 URL: https://issues.apache.org/jira/browse/LUCENE-2665
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Attachments: LUCENE-2665-FieldCacheOverhaul.patch


 The existing FieldCache implementation is very rigid and does not allow much 
 flexibility.  In trying to implement simple features, it points to much 
 larger structural problems.
 This patch aims to take a fresh approach to how we work with the FieldCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2665) Rework FieldCache to be more flexible/general

2010-09-23 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914270#action_12914270
 ] 

Ryan McKinley commented on LUCENE-2665:
---

While we are thinking about it... perhaps the best option is to add a Cache to 
the IndexReader itself.

This would be nice since it would drop the WeakHashMap<IndexReader> that is 
used all over.  I just like hard references better!

The one (ok maybe there are more) hitch I can think of is that some Cache 
instances don't care about deleted docs (FieldCache) and others do.  Perhaps 
the EntryCreator knows and the Cache could behave differently... or am i getting 
ahead of myself!
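A minimal sketch of the per-reader idea, assuming nothing about Lucene's actual classes: the reader owns its cache as a plain field, so entries are reachable through hard references and die with the reader, with no global weak map keyed by reader instance:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: a reader that carries its own cache, instead of
// a shared WeakHashMap keyed by reader. Names are hypothetical.
public class PerReaderCacheDemo {

    static class ReaderWithCache {
        private final Map<String, Object> cache = new HashMap<String, Object>();

        Object getCached(String key) { return cache.get(key); }
        void putCached(String key, Object value) { cache.put(key, value); }
    }

    public static void main(String[] args) {
        ReaderWithCache reader = new ReaderWithCache();
        reader.putCached("price", new int[] {10, 20});
        int[] values = (int[]) reader.getCached("price");
        System.out.println(values[1]);
    }
}
```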




 Rework FieldCache to be more flexible/general
 -

 Key: LUCENE-2665
 URL: https://issues.apache.org/jira/browse/LUCENE-2665
 Project: Lucene - Java
  Issue Type: Improvement
Reporter: Ryan McKinley
 Attachments: LUCENE-2665-FieldCacheOverhaul.patch


 The existing FieldCache implementation is very rigid and does not allow much 
 flexibility.  In trying to implement simple features, it points to much 
 larger structural problems.
 This patch aims to take a fresh approach to how we work with the FieldCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2444) move contrib/analyzers to modules/analysis

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2444.
-

  Assignee: Robert Muir
Resolution: Fixed

 move contrib/analyzers to modules/analysis
 --

 Key: LUCENE-2444
 URL: https://issues.apache.org/jira/browse/LUCENE-2444
 Project: Lucene - Java
  Issue Type: Task
  Components: Build
Affects Versions: 4.0
Reporter: Robert Muir
Assignee: Robert Muir
 Attachments: LUCENE-2444.patch, LUCENE-2444.patch, 
 LUCENE-2444_boilerplate.patch


 This is a patch to move contrib/analyzers under modules/analysis
 We can then continue consolidating (LUCENE-2413)... in truth this will sorta 
 be 
 an ongoing thing anyway, as we try to distance indexing from analysis, etc

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2651) Add support for MMapDirectory's unmap in Apache Harmony

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2651:


Component/s: Store

 Add support for MMapDirectory's unmap in Apache Harmony
 ---

 Key: LUCENE-2651
 URL: https://issues.apache.org/jira/browse/LUCENE-2651
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Uwe Schindler
Assignee: Uwe Schindler
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2651.patch, LUCENE-2651.patch, LUCENE-2651.patch, 
 LUCENE-2651.patch


 The setUseUnmap does not work on Apache Harmony; this patch adds support for 
 it. It also fixes a small problem where unmapping a clone may cause a SIGSEGV.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2471) Supporting bulk copies in Directory

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2471:


Component/s: Store

 Supporting bulk copies in Directory
 ---

 Key: LUCENE-2471
 URL: https://issues.apache.org/jira/browse/LUCENE-2471
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Store
Reporter: Earwin Burrfoot
 Fix For: 3.1, 4.0


 A method can be added to IndexOutput that accepts IndexInput, and writes 
 bytes using it as a source.
 This should be used for bulk-merge cases (offhand - norms, docstores?). Some 
 Directories can then override default impl and skip intermediate buffers 
 (NIO, MMap, RAM?).
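A sketch of what the default implementation of such a method might look like, using plain java.io streams as stand-ins for Lucene's IndexInput/IndexOutput (the method name and buffer size are assumptions); a Directory that can do better would override it and skip the intermediate buffer:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

// Illustrative default bulk-copy: read 'len' bytes from the source and
// write them to the destination through a fixed-size buffer.
public class BulkCopyDemo {

    static void copyBytes(InputStream in, OutputStream out, long len) throws IOException {
        byte[] buffer = new byte[4096];
        while (len > 0) {
            int chunk = (int) Math.min(buffer.length, len);
            int read = in.read(buffer, 0, chunk);
            if (read == -1) throw new EOFException("unexpected end of input");
            out.write(buffer, 0, read);
            len -= read;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello bulk copy".getBytes("UTF-8");
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copyBytes(in, out, 5);  // copy only the first 5 bytes
        System.out.println(out.toString("UTF-8"));
    }
}
```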

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2528) CFSFileDirectory: Allow a Compound Index file to be deployed as a complete index without segment files

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2528:


Component/s: Store

 CFSFileDirectory: Allow a Compound Index file to be deployed as a complete 
 index without segment files
 --

 Key: LUCENE-2528
 URL: https://issues.apache.org/jira/browse/LUCENE-2528
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Store
Reporter: Lance Norskog
Priority: Minor
 Attachments: LUCENE-2528.patch


 This patch presents a compound index file as a Lucene Directory class. This 
 allows you to deploy one file to a query server instead of deploying a 
 directory with the compound file and two segment files.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2586) move intblock/sep codecs into test

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2586:


Component/s: Index

 move intblock/sep codecs into test
 --

 Key: LUCENE-2586
 URL: https://issues.apache.org/jira/browse/LUCENE-2586
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Affects Versions: 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2586.patch


 The intblock and sep codecs in core exist to make it easy for people to try 
 different low-level algos for encoding ints.
 Sep breaks docs, freqs, pos, skip data, payloads into 5 separate files (vs 2 
 files that standard codec uses).
 Intblock further enables the docs, freqs, pos files to encode fixed-sized 
 blocks of ints at a time.
 So an app can easily subclass these codecs, using their own int encoder.
 But these codecs are now concrete, and they use dummy low-level block int 
 encoder (eg encoding 128 ints as separate vints).
 I'd like to change these to be abstract, and move these dummy codecs into 
 test.
 The tests would still test these dummy codecs, by rotating them in randomly 
 for all tests.
 I'd also like to rename IntBlock -> FixedIntBlock, because I'm trying to get 
 a VariableIntBlock working well (for int encoders like Simple9, Simple16, 
 whose block size varies depending on the particular values).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2571) Indexing performance tests with realtime branch

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2571:


Component/s: Index

 Indexing performance tests with realtime branch
 ---

 Key: LUCENE-2571
 URL: https://issues.apache.org/jira/browse/LUCENE-2571
 Project: Lucene - Java
  Issue Type: Task
  Components: Index
Reporter: Michael Busch
Priority: Minor
 Fix For: Realtime Branch


 We should run indexing performance tests with the DWPT changes and compare to 
 trunk.
 We need to test both single-threaded and multi-threaded performance.
 NOTE:  flush by RAM isn't implemented just yet, so either we wait with the 
 tests or flush by doc count.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2529) always apply position increment gap between values

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2529:


Component/s: Index

 always apply position increment gap between values
 --

 Key: LUCENE-2529
 URL: https://issues.apache.org/jira/browse/LUCENE-2529
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
 Environment: (I don't know which version to say this affects since 
 it's some quasi trunk release and the new versioning scheme confuses me.)
Reporter: David Smiley
 Fix For: 3.1, 4.0

 Attachments: 
 LUCENE-2529_always_apply_position_increment_gap_between_values.patch

   Original Estimate: 1h
  Remaining Estimate: 1h

 I'm doing some fancy stuff with span queries that is very sensitive to term 
 positions.  I discovered that the position increment gap on indexing is only 
 applied between values when there are existing terms indexed for the 
 document.  I suspect this logic wasn't deliberate, it's just how it's always 
 been for no particular reason.  I think it should always apply the gap 
 between fields.  Reference DocInverterPerField.java line 82:
 if (fieldState.length > 0)
   fieldState.position += docState.analyzer.getPositionIncrementGap(fieldInfo.name);
 This is checking fieldState.length.  I think the condition should simply be:
 if (i > 0).
 I don't think this change will affect anyone at all but it will certainly 
 help me.  Presently, I can either change this line in Lucene, or I can put in 
 a hack so that the first value for the document is some dummy value which is 
 wasteful.
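The difference between the two conditions can be sketched with a toy position assignment loop. Here the first value of a multi-valued field yields no terms; `i > 0` applies the gap before every value after the first, while the current `length > 0` check would skip it. The names and the gap of 100 are illustrative:

```java
// Toy simulation of position assignment across a multi-valued field.
public class PositionGapDemo {
    public static void main(String[] args) {
        String[][] values = { {}, {"foo", "bar"} };  // first value has no terms
        int gap = 100, position = 0, length = 0;
        StringBuilder positions = new StringBuilder();
        for (int i = 0; i < values.length; i++) {
            if (i > 0) position += gap;           // proposed: gap between all values
            // if (length > 0) position += gap;   // current: gap only once terms exist
            for (String term : values[i]) {
                positions.append(term).append('@').append(position++).append(' ');
                length++;
            }
        }
        System.out.println(positions.toString().trim());
    }
}
```

With the proposed condition the terms land at foo@100 bar@101; with the current one they would land at foo@0 bar@1, as if the empty first value were absent.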

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2573:


Component/s: Index

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?
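The linear-step idea above can be computed directly: with n DWPTs, the i-th flush threshold is interpolated between the low and high water marks. This is only a sketch of the arithmetic in the description, not code from the branch:

```java
import java.util.Locale;

// Interpolate flush thresholds between the low and high water marks.
public class WaterMarkDemo {

    static double[] thresholds(int numThreads, double low, double high) {
        // assumes numThreads >= 2; a single DWPT would just use 'high'
        double[] marks = new double[numThreads];
        for (int i = 0; i < numThreads; i++) {
            marks[i] = low + i * (high - low) / (numThreads - 1);
        }
        return marks;
    }

    public static void main(String[] args) {
        // 5 DWPTs between 90% and 110% of the RAM budget
        for (double m : thresholds(5, 0.90, 1.10)) {
            System.out.println(String.format(Locale.ROOT, "%.2f", m));
        }
    }
}
```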

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2607) IndexWriter.isLocked() fails on a read-only directory

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2607:


Component/s: Index

 IndexWriter.isLocked() fails on a read-only directory
 -

 Key: LUCENE-2607
 URL: https://issues.apache.org/jira/browse/LUCENE-2607
 Project: Lucene - Java
  Issue Type: Bug
  Components: Index
Affects Versions: 2.9.2
Reporter: Trejkaz

 This appears to be a regression of some sort because the issue was only 
 discovered by us some time after upgrading to the 2.9 series, and was not 
 present when we were using 2.3 (big gap between those two, though.)
 We had some code like:
 {code}
 if (IndexWriter.isLocked(directory))
 {
 IndexWriter.unlock(directory);
 }
 {code}
 And now we get an exception when this code runs on a read-only location:
 {noformat}
 java.lang.RuntimeException: Failed to acquire random test lock; please verify 
 filesystem for lock directory 'X:\Data\Index' supports locking
   at org.apache.lucene.store.NativeFSLockFactory.acquireTestLock(NativeFSLockFactory.java:99)
   at org.apache.lucene.store.NativeFSLockFactory.makeLock(NativeFSLockFactory.java:137)
   at org.apache.lucene.store.Directory.makeLock(Directory.java:131)
   at org.apache.lucene.index.IndexWriter.isLocked(IndexWriter.java:5672)
 {noformat}
 I think it makes more logical sense to return *false* - if locking is not 
 possible then it cannot be locked, therefore isLocked should always return 
 false.
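The proposed semantics can be sketched as a defensive check: if testing the lock throws because the location does not support locking, report "not locked" rather than propagating the failure. The Directory interface and method names here are illustrative stand-ins, not Lucene's API:

```java
// Illustrative sketch: "cannot test the lock" is treated as "not locked".
public class LockCheckDemo {

    interface Directory {
        // returns true if a test lock could be acquired (i.e. nothing holds it)
        boolean acquireTestLock();
    }

    static boolean isLocked(Directory dir) {
        try {
            return !dir.acquireTestLock();
        } catch (RuntimeException e) {
            // e.g. read-only filesystem: locking unsupported => cannot be locked
            return false;
        }
    }

    public static void main(String[] args) {
        Directory readOnly = () -> { throw new RuntimeException("read-only"); };
        System.out.println(isLocked(readOnly));
    }
}
```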

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2527) FieldCache.getTermsIndex should cache fasterButMoreRAM=true|false to the same cache key

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2527:


Component/s: Search

 FieldCache.getTermsIndex should cache fasterButMoreRAM=true|false to the same 
 cache key
 ---

 Key: LUCENE-2527
 URL: https://issues.apache.org/jira/browse/LUCENE-2527
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0


 When we cutover FieldCache to use shared byte[] blocks, we added the boolean 
 fasterButMoreRAM option, so you could tradeoff time/space.
 It defaults to true.
 The thinking is that an expert user, who wants to use false, could 
 pre-populate FieldCache by loading the field with false, and then later when 
 sorting on that field it'd use that same entry.
 But there's a bug -- when sorting, it then loads a 2nd entry with true.  
 This is because the Entry.custom in FieldCache participates in 
 equals/hashCode.
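The effect of the bug can be sketched with a toy cache key: when the custom option participates in equals/hashCode, the same field loaded with false and then true occupies two entries instead of one. The Entry class below is illustrative, not the FieldCache's actual Entry:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Toy demonstration: a cache key whose 'custom' flag takes part in equality.
public class CacheKeyDemo {

    static final class Entry {
        final String field;
        final boolean custom;  // participates in equality -> keys differ

        Entry(String field, boolean custom) {
            this.field = field;
            this.custom = custom;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof Entry
                && ((Entry) o).field.equals(field)
                && ((Entry) o).custom == custom;
        }

        @Override
        public int hashCode() {
            return Objects.hash(field, custom);
        }
    }

    public static void main(String[] args) {
        Map<Entry, String> cache = new HashMap<Entry, String>();
        cache.put(new Entry("title", false), "terms index, small");  // pre-populate
        cache.put(new Entry("title", true), "terms index, fast");    // sort path
        System.out.println(cache.size());  // 2: the same field is loaded twice
    }
}
```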

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2605) queryparser parses on whitespace

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2605?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2605:


Component/s: QueryParser

 queryparser parses on whitespace
 

 Key: LUCENE-2605
 URL: https://issues.apache.org/jira/browse/LUCENE-2605
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Reporter: Robert Muir
 Fix For: 3.1, 4.0


 The queryparser parses input on whitespace, and sends each whitespace 
 separated term to its own independent token stream.
 This breaks the following at query-time, because they can't see across 
 whitespace boundaries:
 * n-gram analysis
 * shingles 
 * synonyms (especially multi-word for whitespace-separated languages)
 * languages where a 'word' can contain whitespace (e.g. vietnamese)
 It's also rather unexpected, as users think their 
 charfilters/tokenizers/tokenfilters will do the same thing at index and 
 querytime, but
 in many cases they can't. Instead, preferably the queryparser would parse 
 around only real 'operators'.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2597) Query scorers should not use MultiFields

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2597:


Component/s: Search

 Query scorers should not use MultiFields
 

 Key: LUCENE-2597
 URL: https://issues.apache.org/jira/browse/LUCENE-2597
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Michael McCandless
Assignee: Michael McCandless
 Fix For: 4.0

 Attachments: LUCENE-2597.patch


 Lucene does all searching/filtering per-segment, today, but there are a 
 number of tests that directly invoke Scorer.scorer or Filter.getDocIdSet on a 
 composite reader.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2504) sorting performance regression

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2504:


Component/s: Search

 sorting performance regression
 --

 Key: LUCENE-2504
 URL: https://issues.apache.org/jira/browse/LUCENE-2504
 Project: Lucene - Java
  Issue Type: Bug
  Components: Search
Affects Versions: 4.0
Reporter: Yonik Seeley
 Fix For: 4.0

 Attachments: LUCENE-2504.patch, LUCENE-2504.patch, LUCENE-2504.patch, 
 LUCENE-2504.zip, LUCENE-2504_SortMissingLast.patch


 sorting can be much slower on trunk than branch_3x

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2665) Rework FieldCache to be more flexible/general

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2665:


Component/s: Search

 Rework FieldCache to be more flexible/general
 -

 Key: LUCENE-2665
 URL: https://issues.apache.org/jira/browse/LUCENE-2665
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Search
Reporter: Ryan McKinley
 Attachments: LUCENE-2665-FieldCacheOverhaul.patch


 The existing FieldCache implementation is very rigid and does not allow much 
 flexibility.  In trying to implement simple features, it points to much 
 larger structural problems.
 This patch aims to take a fresh approach to how we work with the FieldCache.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2566) + - operators allow any amount of whitespace

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2566?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2566:


Component/s: QueryParser

 + - operators allow any amount of whitespace
 

 Key: LUCENE-2566
 URL: https://issues.apache.org/jira/browse/LUCENE-2566
 Project: Lucene - Java
  Issue Type: Bug
  Components: QueryParser
Reporter: Yonik Seeley
Priority: Minor

 As an example, (foo - bar) is treated like (foo -bar).
 It seems like for +- to be treated as unary operators, they should be 
 immediately followed by the operand.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



bulk change 'don't email'

2010-09-23 Thread Robert Muir
didn't there used to be an option in jira not to email for bulk changes?

it seems to be gone... i was trying to clean up jira some.

sorry for the noise

-- 
Robert Muir
rcm...@gmail.com


[jira] Resolved: (LUCENE-712) Build with GCJ fail

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-712.


Resolution: Not A Problem

I compiled lucene with gcj, it builds fine.

However, many tests fail. gcj's classpath appears to be a dead project, and 
personally i won't go anywhere near their source code.
I don't recommend using lucene with gcj.

 Build with GCJ fail
 ---

 Key: LUCENE-712
 URL: https://issues.apache.org/jira/browse/LUCENE-712
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Affects Versions: 2.0.0
Reporter: Nicolas Lalevée
Priority: Minor
 Attachments: patch


 just needs a little fix in the jar name and an issue with an anonymous 
 constructor

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-472) Some fixes to let gcj build lucene using ant gcj target

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-472.


Resolution: Not A Problem

I compiled lucene with gcj, it builds fine.

However, many tests fail. gcj's classpath appears to be a dead project, and 
personally i won't go anywhere near their source code.
I don't recommend using lucene with gcj.

 Some fixes to let gcj build lucene using ant gcj target
 ---

 Key: LUCENE-472
 URL: https://issues.apache.org/jira/browse/LUCENE-472
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Affects Versions: CVS Nightly - Specify date in submission
Reporter: Michele Bini
Priority: Minor
 Attachments: gcj-build.diff


 I'm attaching a patch that fixes two problems with the gcj build.
 First, some imports in lucene.search.FieldCacheImpl.java that gcj requires 
 but jdk doesn't were missing.
 Second, the Makefile uses the wrong name for the lucene-core .jar file.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-471) gcj ant target doesn't work on windows

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-471.


Resolution: Fixed

I compiled lucene with gcj, it builds fine.

However, many tests fail. gcj's classpath appears to be a dead project, and 
personally i won't go anywhere near their source code.
I don't recommend using lucene with gcj.

 gcj ant target doesn't work on windows
 --

 Key: LUCENE-471
 URL: https://issues.apache.org/jira/browse/LUCENE-471
 Project: Lucene - Java
  Issue Type: Bug
  Components: Build
Affects Versions: CVS Nightly - Specify date in submission
 Environment: Windows with MinGW (http://www.mingw.org/) and unixutils 
 (http://unxutils.sourceforge.net/)
Reporter: Michele Bini
Priority: Minor
 Attachments: win-makefile.diff, win-mmap.diff


 In order to fix it I made two changes, both really simple.
 First I added to org/apache/lucene/store/GCJIndexInput.cc some code to use 
 windows memory-mapped I/O instead than unix mmap().
 Then I had to rearrange the link order in the Makefile in order to avoid 
 unresolved symbol errors. Also to build repeatedly I had to instruct make to 
 ignore the return code for the mkdir command as on windows it fails if the 
 directory already exists.
 I'm attaching two patches corresponding to the changes; please note that with 
 the patches applied, the gcj target still works on linux. Both patches apply 
 cleanly to the current svn head.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

2010-09-23 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914296#action_12914296
 ] 

Jason Rutherglen commented on LUCENE-2573:
--

I was hoping something clever would come to me about how to unit test this, 
nothing has.  We can do the slowdown of writes to the file(s) via a 
Thread.sleep, however this will only emulate a real file system in RAM, what 
then?  I thought about testing the percentage however is it going to be exact?  
We could test a percentage range of each of the segments flushed?  I guess I 
just need to run all of the unit tests, however some of those will fail 
because deletes aren't working properly yet.  

 Tiered flushing of DWPTs by RAM with low/high water marks
 -

 Key: LUCENE-2573
 URL: https://issues.apache.org/jira/browse/LUCENE-2573
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Index
Reporter: Michael Busch
Assignee: Michael Busch
Priority: Minor
 Fix For: Realtime Branch

 Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch


 Now that we have DocumentsWriterPerThreads we need to track total consumed 
 RAM across all DWPTs.
 A flushing strategy idea that was discussed in LUCENE-2324 was to use a 
 tiered approach:  
 - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
 - Flush all DWPTs at a high water mark (e.g. at 110%)
 - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are 
 used, flush at 90%, 95%, 100%, 105% and 110%.
 Should we allow the user to configure the low and high water mark values 
 explicitly using total values (e.g. low water mark at 120MB, high water mark 
 at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() 
 config method and use something like 90% and 110% for the water marks?

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Resolved: (LUCENE-2070) document LengthFilter wrt Unicode 4.0

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2070?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir resolved LUCENE-2070.
-

 Assignee: Robert Muir
Fix Version/s: 3.1
   Resolution: Fixed

Committed revision 1000675, 1000678 (3x)

 document LengthFilter wrt Unicode 4.0
 -

 Key: LUCENE-2070
 URL: https://issues.apache.org/jira/browse/LUCENE-2070
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/analyzers
Reporter: Robert Muir
Assignee: Robert Muir
Priority: Trivial
 Fix For: 3.1, 4.0

 Attachments: LUCENE-2070.patch


 LengthFilter calculates its min/max length from TermAttribute.termLength()
 This is not characters, but instead UTF-16 code units.
 In my opinion this should not be changed, merely documented.
 If we changed it, it would have an adverse performance impact because we 
 would have to actually calculate Character.codePointCount() on the text.
 If you feel strongly otherwise, fixing it to count codepoints would be a 
 trivial patch, but I'd rather not hurt performance.
 I admit I don't fully understand all the use cases for this filter.
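The distinction being documented shows up as soon as a term contains a supplementary character: termLength() counts UTF-16 code units, while Character.codePointCount counts actual characters. A minimal standalone illustration:

```java
// A term with one BMP character plus one supplementary character:
// 3 UTF-16 code units, but only 2 Unicode code points.
public class LengthDemo {
    public static void main(String[] args) {
        String term = "a\uD834\uDD1E";  // 'a' + U+1D11E (musical G clef)
        System.out.println(term.length());                          // code units
        System.out.println(term.codePointCount(0, term.length()));  // code points
    }
}
```

A LengthFilter working in code units would therefore see this term as length 3, which is the behavior the issue proposes to document rather than change.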

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Commented: (LUCENE-1540) Improvements to contrib.benchmark for TREC collections

2010-09-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-1540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914301#action_12914301
 ] 

Robert Muir commented on LUCENE-1540:
-

Tim, if you have modified benchmark to work with various formats of older TREC 
collections, that would be really nice.

 Improvements to contrib.benchmark for TREC collections
 --

 Key: LUCENE-1540
 URL: https://issues.apache.org/jira/browse/LUCENE-1540
 Project: Lucene - Java
  Issue Type: Improvement
  Components: contrib/benchmark
Affects Versions: 2.4
Reporter: Tim Armstrong
Priority: Minor

 The benchmarking utilities for  TREC test collections (http://trec.nist.gov) 
 are quite limited and do not support some of the variations in format of 
 older TREC collections.  
 I have been doing some benchmarking work with Lucene and have had to modify 
 the package to support:
 * Older TREC document formats, which the current parser fails on due to 
 missing document headers.
 * Variations in query format - newlines after title tag causing the query 
 parser to get confused.
 * Ability to detect and read in uncompressed text collections
 * Storage of document numbers by default without storing full text.
 I can submit a patch if there is interest, although I will probably want to 
 write unit tests for the new functionality first.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] Updated: (LUCENE-2246) While indexing Turkish web pages, Parse Aborted: Lexical error.... occurs

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-2246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-2246:


Component/s: Examples
 (was: Index)

 While indexing Turkish web pages, Parse Aborted: Lexical error occurs
 ---

 Key: LUCENE-2246
 URL: https://issues.apache.org/jira/browse/LUCENE-2246
 Project: Lucene - Java
  Issue Type: Bug
  Components: Examples
Affects Versions: 3.0
Reporter: Selim Nadi

 When I try to index a Turkish page, and there is a Turkish-specific character 
 inside an HTML tag, the HTML parser gives a Parse Aborted: Lexical error on ... 
 line error.
 For example, <IMG SRC="../images/head.jpg" WIDTH=570 HEIGHT=47 BORDER=0 
 ALT="ş"> raises the exception at the ş character (character code 351), as does 
 an ı character in a title attribute:
 <a title="(ııı)">
 Turkish characters in the document content do not cause any problem.




[jira] Commented: (LUCENE-851) Pruning

2010-09-23 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/LUCENE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914304#action_12914304
 ] 

Robert Muir commented on LUCENE-851:


Marvin: is this functionality addressed with LUCENE-2482 ?

 Pruning
 ---

 Key: LUCENE-851
 URL: https://issues.apache.org/jira/browse/LUCENE-851
 Project: Lucene - Java
  Issue Type: New Feature
  Components: Index, Search
Reporter: Marvin Humphrey
Priority: Minor

 Greets,
 A thread on java-dev a couple of months ago drew my attention to a technique 
 used by Nutch for cutting down the number of hits that have to be processed:  
 if you have an algorithm for ordering documents by importance, and you sort 
 them so that the lowest document numbers have the highest rank, then most of 
 your high-scoring hits are going to occur early on in the hit-collection 
 process.  Say you're looking for the top 100 matches -- the odds are pretty 
 good that after you've found 1000 hits, you've gotten most of the good stuff. 
  It may not be necessary to score the other e.g. 5,000,000 hits.
 To pull this off in Nutch, they run the index through a post process whereby 
 documents are re-ordered by page score using the IndexSorter class.  
 Unfortunately, post-processing does not live happily with incremental 
 indexing.  
 However, if we ensure that document numbers are ordered according to our 
 criteria within each segment, that's almost as good.
 Say we're looking for 100 hits, as before; what we do is collect a maximum of 
 1000 hits per segment.  If we are dealing with an index made up of 25 
 segments, that's 25,000 hits max we'll have to process fully -- the rest we 
 can skip over.  That's not as quick as only processing 1000 hits then 
 stopping in a fully optimized index, but it's a lot better than churning 
 through all 5,000,000 hits.
 A lot of those hits from the smallest segments will be garbage; we'll get 
 most of our good hits from a few large segments most of the time.  But that's 
 fine -- the cost to process any one segment is small.
 Writing a low-level scoring loop which implements pruning per segment is 
 straightforward.  KinoSearch's version (in C) is below.
 To control the amount of pruning, we need a high-level 
 Searcher.setPruneFactor API, which sets a multiplier; the number of 
 hits-per-segment which must be processed is determined by multiplying the 
 number of hits you need by pruneFactor.  Here's code from KS for deriving 
 hits-per-seg:
 # process prune_factor if supplied
 my $seg_starts;
 my $hits_per_seg = 2**31 - 1;
 if ( defined $self->{prune_factor} and defined $args{num_wanted} ) {
     my $prune_count = $self->{prune_factor} * $args{num_wanted};
     if ( $prune_count < $hits_per_seg ) {    # don't exceed I32_MAX
         $hits_per_seg = $prune_count;
         $seg_starts   = $reader->get_seg_starts;
     }
 }
 What I have not yet written is the index-time mechanism for sorting 
 documents.  
 In Nutch, they use the norms from a known indexed, non-tokenized field 
 (site).  However, in Lucene and KS, we can't count on any existing fields.  
 Document boost isn't stored directly, either.  The obvious answer is to start 
 storing it, which would suffice for Nutch-like uses.  However, it may make 
 sense to avoid coupling document ordering to boost in order to influence 
 pruning without affecting scores.
 The sort ordering information needs a permanent home in the index, since it 
 will be needed whenever segment merging occurs.  The fixed-width per-document 
 storage in Lucene's .fdx file seems like a good place.  If we use one float 
 per document, we can simply put it before or after the 64-bit file pointer 
 and seek into the file after multiplying the doc num by 12 rather than 8.  
 During indexing, we'd keep the ordering info in an array; after all documents 
 for a segment have been added, we create an array of sorted document numbers. 
  When flushing the postings, their document numbers get remapped using the 
 sorted array.  Then we rewrite the .fdx file (and also the .tvx file), moving 
 the file pointers (and ordering info) to remapped locations.  The fact that 
 the .fdt file is now out of order isn't a problem -- optimizing 
 sequential access to that file isn't important.
 This issue is closely tied to LUCENE-843, Improve how IndexWriter uses RAM 
 to buffer added documents, and LUCENE-847, Factor merge policy out of 
 IndexWriter.  Michael McCandless, Steven Parks, Ning Li, anybody else... 
 comments?  Suggestions?
 Marvin Humphrey
 Rectangular Research
 http://www.rectangular.com/
 
 void
 Scorer_collect(Scorer *self, HitCollector *hc, u32_t start, u32_t end,
u32_t hits_per_seg, VArray *seg_starts)
 {
 
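The per-segment pruning loop described above can be sketched in Java as follows. This is a minimal, self-contained illustration, not Lucene's actual collector machinery: the float-array "segments" and the topScores method are hypothetical stand-ins, assuming each segment is internally ordered so higher-quality documents come first.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class PrunedCollect {
    /**
     * Collect the top numWanted scores, examining at most
     * pruneFactor * numWanted hits per segment (the rest are pruned).
     */
    static List<Float> topScores(float[][] segments, int numWanted, int pruneFactor) {
        int hitsPerSeg = pruneFactor * numWanted;
        PriorityQueue<Float> pq = new PriorityQueue<>(); // min-heap of best scores so far
        for (float[] segment : segments) {
            int examined = 0;
            for (float score : segment) {
                if (examined++ >= hitsPerSeg) break;     // skip the rest of this segment
                pq.offer(score);
                if (pq.size() > numWanted) pq.poll();    // keep only the top numWanted
            }
        }
        List<Float> top = new ArrayList<>(pq);
        top.sort(null);               // ascending natural order
        Collections.reverse(top);     // best first
        return top;
    }

    public static void main(String[] args) {
        float[][] segments = {
            {9.0f, 8.5f, 3.0f, 2.0f, 1.0f},
            {7.0f, 6.5f, 0.5f}
        };
        // top 2 with pruneFactor 1: only 2 hits per segment are examined
        System.out.println(topScores(segments, 2, 1)); // [9.0, 8.5]
    }
}
```

With 25 segments capped at 1000 hits each, at most 25,000 hits are fully scored regardless of total hit count, which is the trade-off the message describes.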

[jira] Closed: (LUCENE-851) Pruning

2010-09-23 Thread Marvin Humphrey (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-851?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Marvin Humphrey closed LUCENE-851.
--

Resolution: Duplicate

Yes, LUCENE-2482 introduces the index sorter from Nutch that I referred to in
this issue. The termination mechanism is slightly different
(TimeLimitedCollector vs. X hits per segment), but it's the sorter that really
matters.

I'm closing as Duplicate. Thanks for digging this up!


[jira] Updated: (LUCENE-1840) QueryUtils should check that equals properly handles null

2010-09-23 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/LUCENE-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated LUCENE-1840:


Attachment: LUCENE-1840.patch

 QueryUtils should check that equals properly handles null
 -

 Key: LUCENE-1840
 URL: https://issues.apache.org/jira/browse/LUCENE-1840
 Project: Lucene - Java
  Issue Type: Improvement
  Components: Build
Reporter: Mark Miller
Priority: Trivial
 Attachments: LUCENE-1840.patch


 Its part of the equals contract, but many classes currently violate




[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914319#action_12914319
 ] 

Yonik Seeley commented on SOLR-1568:


Hmmm, I'm not understanding the purpose of the checks for if we cross the 
equator, and
addEquatorialBoundary().  It looks like it creates two ranges... [min TO 0] and 
[0 to max], where
[min TO max] should always work.  Is there something special about the equator 
I'm missing?

 Implement Spatial Filter
 

 Key: SOLR-1568
 URL: https://issues.apache.org/jira/browse/SOLR-1568
 Project: Solr
  Issue Type: New Feature
Reporter: Grant Ingersoll
Assignee: Grant Ingersoll
Priority: Minor
 Fix For: 3.1, 4.0

 Attachments: CartesianTierQParserPlugin.java, 
 SOLR-1568.Mattmann.031010.patch.txt, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch, 
 SOLR-1568.patch, SOLR-1568.patch, SOLR-1568.patch


 Given an index with spatial information (either as a geohash, 
 SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be 
 able to pass in a filter query that takes in the field name, lat, lon and 
 distance and produces an appropriate Filter (i.e. one that is aware of the 
 underlying field type) for use by Solr. 
 The interface _could_ look like:
 {code}
 fq={!sfilt dist=20}location:49.32,-79.0
 {code}
 or it could be:
 {code}
 fq={!sfilt lat=49.32 lon=-79.0 f=location dist=20}
 {code}
 or:
 {code}
 fq={!sfilt p=49.32,-79.0 f=location dist=20}
 {code}
 or:
 {code}
 fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20}
 {code}




[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914321#action_12914321
 ] 

Yonik Seeley commented on SOLR-1568:


I'm further confused by the following:
{code}
   else if (ll[LONG] < 0.0 && ur[LONG] > 0.0) { // crosses the prime meridian (0 degrees)
{code}

I don't see what's special about crossing the 0 deg longitude line here.
It seems like the only special case for longitude is if ll > ur, in which case 
you went over the +-180 line and need two ranges to cover it: [ll TO 180] OR 
[-180 TO ur] ?






[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914324#action_12914324
 ] 

Yonik Seeley commented on SOLR-1568:


FYI, I'm in the middle of refactoring this code, so please give feedback as 
comments, not patches for now...





[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Chris Male (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914326#action_12914326
 ] 

Chris Male commented on SOLR-1568:
--

Yonik, I recommend you glance over 
[http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates] (which I think 
Bill mentioned earlier).  In addition to proposing quite a nice bounding box 
algorithm, he also discusses how to handle the poles and the 180 degree line.  
I recommend we follow the behaviour that he suggests.





Fwd: distributed search on duplicate shards

2010-09-23 Thread mike anderson
Just wanted to poke this since it got buried under a dozen or so Jira
updates. I also sent it to the deprecated list, though I think it should
have forwarded.

-mike

-- Forwarded message --
From: mike anderson saidthero...@gmail.com
Date: Thu, Sep 23, 2010 at 7:06 PM
Subject: distributed search on duplicate shards
To: solr-...@lucene.apache.org


Hi all,

My company is currently running a distributed Solr cluster with about 15
shards. We occasionally find that one shard will be relatively slow and thus
hold up the entire response. To remedy this we thought it might be useful to
have a system such that:

1. We can duplicate each shard, and thus have sets of shards, each with
the same index
2. We can pass in these sets of shards along with the query (for instance,
if ! is the delimiter, shards=solr1a!solr1b,solr2a!solr2b)
3. The request goes out to /all/ shards (unlike load balancing in Solr
Cloud)
4. The first shard from a set (solr1a, solr1b) to successfully return is
honored, and the other requests (solr1b, if solr1a responds first, for
instance) are removed/ignored
5. The response is completed and returned as soon as one shard from each set
responds
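The replica-racing scheme in the steps above can be sketched with CompletableFuture. This is a simplified, self-contained sketch; parseShardSets and queryShard are hypothetical names (the real patch works inside Solr's SearchHandler, and a real queryShard would issue an HTTP request rather than sleep):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public class ShardRace {
    /** Parse "solr1a!solr1b,solr2a!solr2b" into replica sets ('!' delimits replicas). */
    static List<String[]> parseShardSets(String shards) {
        List<String[]> sets = new ArrayList<>();
        for (String set : shards.split(",")) {
            sets.add(set.split("!"));
        }
        return sets;
    }

    /** Stand-in for an async shard request; latencyMs simulates shard slowness. */
    static CompletableFuture<String> queryShard(String shard, long latencyMs) {
        return CompletableFuture.supplyAsync(() -> {
            try { Thread.sleep(latencyMs); } catch (InterruptedException e) { throw new RuntimeException(e); }
            return shard;
        });
    }

    public static void main(String[] args) {
        List<String[]> sets = parseShardSets("solr1a!solr1b,solr2a!solr2b");
        List<String> winners = new ArrayList<>();
        for (String[] replicas : sets) {
            // Fire the request at every replica in the set; the first to
            // complete wins, and the slower replicas' results are ignored.
            CompletableFuture<?>[] inFlight = new CompletableFuture<?>[replicas.length];
            for (int i = 0; i < replicas.length; i++) {
                inFlight[i] = queryShard(replicas[i], (i + 1) * 50L);
            }
            winners.add((String) CompletableFuture.anyOf(inFlight).join());
        }
        // With these simulated latencies the faster replica of each set wins.
        System.out.println(winners);
    }
}
```

Note that anyOf does not cancel the losing requests; ignoring (or explicitly cancelling) them is the bookkeeping the patch keeps in its hash map of shard sets.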


I've written a patch to accomplish this, but have a few questions

1. What are the known disadvantages to such a strategy? (we've thought of a
few, like sets being out of sync, but they don't bother us too much)
2. What would this type of a feature be called? This way I can open a Jira
ticket for it
3. Is there a preferred way to do this? My current patch (which I can post
soon) works in the HTTPClient portion of SearchHandler. I keep a hash map of
the shard sets and cancel the Future<ShardResponse>s in the corresponding
set when each response comes back.

Thanks in advance,
Mike

P.S I'd like to write a test for this feature but it wasn't clear from the
distributed test how to do so. Could somebody point me in the right
direction (an existing test, perhaps) for how to accomplish this?


Re: bulk change 'don't email'

2010-09-23 Thread Ryan McKinley
On Thu, Sep 23, 2010 at 8:17 PM, Robert Muir rcm...@gmail.com wrote:
 didnt there used to be an option in jira not to email for bulk changes?
 it seems to be gone... i was trying to clean up jira some.
 sorry for the noise


nonsense!  I think we all appreciate your recent efforts!




RE: bulk change 'don't email'

2010-09-23 Thread Uwe Schindler
As far as I know, if there is one issue in the bulk that has status “closed”, 
your possibilities are limited. Maybe that happened here. The last time I 
released Lucene (June), it was all possible without sending emails!

 

Uwe

 

-

Uwe Schindler

H.-H.-Meier-Allee 63, D-28213 Bremen

http://www.thetaphi.de/

eMail: u...@thetaphi.de

 

From: Robert Muir [mailto:rcm...@gmail.com] 
Sent: Thursday, September 23, 2010 5:17 PM
To: dev@lucene.apache.org
Subject: bulk change 'don't email'

 

didnt there used to be an option in jira not to email for bulk changes?

 

it seems to be gone... i was trying to clean up jira some.

 

sorry for the noise

-- 
Robert Muir
rcm...@gmail.com



Hudson build is back to normal : Solr-3.x #112

2010-09-23 Thread Apache Hudson Server
See https://hudson.apache.org/hudson/job/Solr-3.x/112/changes






[jira] Updated: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1568:
---

Attachment: SOLR-1568.patch

OK, so I didn't touch the current math, but I heavily re-factored (and fixed I 
hope) the logic.

It breaks down like this:  we always have exactly one latitude constraint, and 
between 0 and 2 longitude constraints.  (0 when doing a polar cap, 1 normal, 2 
when we cross the +-180 long line)

The part of the code that calculates the ranges and builds the query is now 
much shorter (ignore the other work in progress, SpatialDistanceQuery, for now).
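The breakdown above (always one latitude range; 0, 1, or 2 longitude ranges) can be sketched as follows. This is a simplified illustration, not the patch's actual code; lonRanges and its polarCap flag are hypothetical names:

```java
import java.util.ArrayList;
import java.util.List;

public class LonRanges {
    /**
     * Longitude ranges for a bounding box [minLon, maxLon] in degrees:
     *   0 ranges when the box is a polar cap (every longitude matches),
     *   1 range in the normal case,
     *   2 ranges when the box crosses the +-180 meridian (minLon > maxLon).
     */
    static List<double[]> lonRanges(double minLon, double maxLon, boolean polarCap) {
        List<double[]> ranges = new ArrayList<>();
        if (polarCap) return ranges;                   // no longitude constraint at all
        if (minLon <= maxLon) {
            ranges.add(new double[]{minLon, maxLon});  // ordinary box
        } else {
            ranges.add(new double[]{minLon, 180.0});   // wrapped around +-180:
            ranges.add(new double[]{-180.0, maxLon});  // split into two ranges
        }
        return ranges;
    }

    public static void main(String[] args) {
        System.out.println(lonRanges(-10, 10, false).size());   // 1
        System.out.println(lonRanges(170, -170, false).size()); // 2 (crosses +-180)
        System.out.println(lonRanges(0, 0, true).size());       // 0 (polar cap)
    }
}
```

Crossing the equator or the prime meridian needs no such split, which matches the earlier comments questioning those special cases.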





[jira] Commented: (SOLR-1568) Implement Spatial Filter

2010-09-23 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12914340#action_12914340
 ] 

Yonik Seeley commented on SOLR-1568:


bq. I recommend we follow the behaviour that he suggests.

Looks like a really nice page.  I don't have time to figure out and redo any 
math right now, but I hope in the future that we can have random tests to 
verify this stuff.

