[jira] Commented: (SOLR-64) strict hierarchical facets

2009-10-16 Thread JIRA

[ 
https://issues.apache.org/jira/browse/SOLR-64?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766443#action_12766443
 ] 

Anıl Çetin commented on SOLR-64:


Sorry, I don't think I explained it well:

for: 
a_level1/a_level2/a_level3/a_level4
and
b_level1/b_level2/b_level3/b_level4

result:
a_level1
-- a_level1/a_level2
-- a_level1/a_level2/a_level3
--- a_level1/a_level2/a_level3/level4

b_level1/a_level1/a_level2/a_level3/level4
-- b_level1 / b_level2 / a_level1/a_level2/a_level3/level4
-- b_level1/b_level2/b_level3/ a_level1/a_level2/a_level3/level4
--- 
b_level1/b_level2/b_level3/b_level4/a_level1/a_level2/a_level3/level4
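
A minimal sketch of the presumably intended result, assuming a strict hierarchy should yield one facet entry per prefix of each path and never combine segments from two different paths. The class and method names here are hypothetical and are not taken from the SOLR-64 patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Solr: expands one hierarchical value
// into the list of its path prefixes, shortest first.
class PathPrefixes {
    static List<String> prefixes(String path) {
        List<String> out = new ArrayList<String>();
        int idx = path.indexOf('/');
        while (idx >= 0) {
            // Everything before this separator is one ancestor level.
            out.add(path.substring(0, idx));
            idx = path.indexOf('/', idx + 1);
        }
        // The full path is the deepest level.
        out.add(path);
        return out;
    }

    public static void main(String[] args) {
        for (String p : prefixes("a_level1/a_level2/a_level3/a_level4")) {
            System.out.println(p);
        }
    }
}
```

Under that assumption, the b_ path would expand only to b_level1, b_level1/b_level2 and so on, with no a_ segments in it.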


It actually mixes the fields. Also, on some indexes (I don't know what
causes it) it gives this error:

HTTP Status 500 - null

java.lang.StackOverflowError
    at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:242)
    at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:209)
    at sun.nio.ch.IOUtil.read(IOUtil.java:227)
    at sun.nio.ch.FileChannelImpl.read(FileChannelImpl.java:663)
    at org.apache.lucene.store.NIOFSDirectory$NIOFSIndexInput.readInternal(NIOFSDirectory.java:161)
    at org.apache.lucene.store.BufferedIndexInput.refill(BufferedIndexInput.java:157)
    at org.apache.lucene.store.BufferedIndexInput.readByte(BufferedIndexInput.java:38)
    at org.apache.lucene.store.IndexInput.readVInt(IndexInput.java:80)
    at org.apache.lucene.index.SegmentTermDocs.readNoTf(SegmentTermDocs.java:166)
    at org.apache.lucene.index.SegmentTermDocs.read(SegmentTermDocs.java:139)
    at org.apache.solr.search.SolrIndexSearcher.getDocSetNC(SolrIndexSearcher.java:685)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:595)
    at org.apache.solr.search.SolrIndexSearcher.getPositiveDocSet(SolrIndexSearcher.java:585)
    at org.apache.solr.search.SolrIndexSearcher.numDocs(SolrIndexSearcher.java:1514)
    at org.apache.solr.request.SimpleFacets.getHierarchicalFacetCounts(SimpleFacets.java:619)
    at org.apache.solr.request.SimpleFacets.getHierarchicalFacetCounts(SimpleFacets.java:641)
    ...
    ..
    at org.apache.solr.request.SimpleFacets.getHierarchicalFacetCounts(SimpleFacets.java:641)
 


 strict hierarchical facets
 --

 Key: SOLR-64
 URL: https://issues.apache.org/jira/browse/SOLR-64
 Project: Solr
  Issue Type: New Feature
  Components: search
Reporter: Yonik Seeley
 Fix For: 1.5

 Attachments: SOLR-64.patch, SOLR-64.patch


 Strict Facet Hierarchies... each tag has at most one parent (a tree).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1431) CommComponent abstracted

2009-10-16 Thread Noble Paul (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1431?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Noble Paul updated SOLR-1431:
-

Attachment: SOLR-1431.patch

 CommComponent abstracted
 

 Key: SOLR-1431
 URL: https://issues.apache.org/jira/browse/SOLR-1431
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Assignee: Noble Paul
Priority: Trivial
 Fix For: 1.5

 Attachments: SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch, 
 SOLR-1431.patch, SOLR-1431.patch, SOLR-1431.patch

   Original Estimate: 24h
  Remaining Estimate: 24h

 We'll abstract CommComponent in this issue.




Build failed in Hudson: Solr-trunk #957

2009-10-16 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/957/changes

Changes:

[ryan] SOLR-1512 -- point luke launcher to 0.9.9

--
[...truncated 2216 lines...]
[junit] Running org.apache.solr.client.solrj.SolrExceptionTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 2.523 sec
[junit] Running org.apache.solr.client.solrj.SolrQueryTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.381 sec
[junit] Running org.apache.solr.client.solrj.TestBatchUpdate
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 22.664 sec
[junit] Running org.apache.solr.client.solrj.TestLBHttpSolrServer
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 21.242 sec
[junit] Running org.apache.solr.client.solrj.beans.TestDocumentObjectBinder
[junit] Tests run: 4, Failures: 0, Errors: 0, Time elapsed: 1.004 sec
[junit] Running org.apache.solr.client.solrj.embedded.JettyWebappTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 10.548 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeBinaryJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 9.76 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.988 sec
[junit] Running org.apache.solr.client.solrj.embedded.LargeVolumeJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 16.476 sec
[junit] Running org.apache.solr.client.solrj.embedded.MergeIndexesEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 4.612 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreEmbeddedTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 3.86 sec
[junit] Running org.apache.solr.client.solrj.embedded.MultiCoreExampleJettyTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 12.701 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 34.867 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleJettyTest
[junit] Tests run: 10, Failures: 0, Errors: 0, Time elapsed: 49.322 sec
[junit] Running org.apache.solr.client.solrj.embedded.SolrExampleStreamingTest
[junit] Tests run: 9, Failures: 0, Errors: 0, Time elapsed: 59.263 sec
[junit] Running org.apache.solr.client.solrj.embedded.TestSolrProperties
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 8.779 sec
[junit] Running org.apache.solr.client.solrj.request.TestUpdateRequestCodec
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.097 sec
[junit] Running org.apache.solr.client.solrj.response.AnlysisResponseBaseTest
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 0.558 sec
[junit] Running org.apache.solr.client.solrj.response.DocumentAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 1.089 sec
[junit] Running org.apache.solr.client.solrj.response.FieldAnalysisResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.467 sec
[junit] Running org.apache.solr.client.solrj.response.QueryResponseTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.6 sec
[junit] Running org.apache.solr.client.solrj.response.TestSpellCheckResponse
[junit] Tests run: 2, Failures: 0, Errors: 0, Time elapsed: 18.451 sec
[junit] Running org.apache.solr.client.solrj.util.ClientUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.419 sec
[junit] Running org.apache.solr.common.SolrDocumentTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.428 sec
[junit] Running org.apache.solr.common.params.ModifiableSolrParamsTest
[junit] Tests run: 5, Failures: 0, Errors: 0, Time elapsed: 0.387 sec
[junit] Running org.apache.solr.common.params.SolrParamTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.471 sec
[junit] Running org.apache.solr.common.util.ContentStreamTest
[junit] Tests run: 3, Failures: 0, Errors: 0, Time elapsed: 0.538 sec
[junit] Running org.apache.solr.common.util.DOMUtilTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.506 sec
[junit] Running org.apache.solr.common.util.FileUtilsTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.532 sec
[junit] Running org.apache.solr.common.util.IteratorChainTest
[junit] Tests run: 7, Failures: 0, Errors: 0, Time elapsed: 0.409 sec
[junit] Running org.apache.solr.common.util.NamedListTest
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.419 sec
[junit] Running org.apache.solr.common.util.TestFastInputStream
[junit] Tests run: 1, Failures: 0, Errors: 0, Time elapsed: 0.377 sec
[junit] Running 

DIH wiki page reverted

2009-10-16 Thread Noble Paul നോബിള്‍ नोब्ळ्
I have reverted the DIH wiki page to revision 212. See
https://issues.apache.org/jira/browse/INFRA-2270

The wiki has not sent any mail yet.

So all changes made after revision 212 are lost. Please go
through the page and check whether your changes are lost.



-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


[jira] Created: (SOLR-1514) Facet search results contain 0:0 entries although '0' values were not indexed.

2009-10-16 Thread Renata Perkowska (JIRA)
Facet search results contain 0:0 entries although '0' values were not indexed.
--

 Key: SOLR-1514
 URL: https://issues.apache.org/jira/browse/SOLR-1514
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Solr is on: Linux  2.6.18-92.1.13.el5xen
Reporter: Renata Perkowska


Hi,
in my JMeter ATs I can see that under some circumstances facet search results
contain '0' both as a key and as a value for the integer field called 'year',
although I never index zeros.

When I do a normal search, I don't see any indexed fields with zeros.

When I run my facet test (using JMeter) in isolation, everything works fine. The
problem appears only when the test runs after other tests (and other
indexing/deleting). Yet the other indexing should not influence this test:
at the end of each test I delete the indexed documents, so the index is empty
before the facet test runs.

My facet test looks as follows:
 1. Index group of documents
 2. Perform search on facets
 3. Remove documents from the index.

The results that I'm getting for an integer field 'year':

 1990:4
 1995:4
 0:0
 1991:0
 1992:0
 1993:0
 1994:0
 1996:0
 1997:0
 1998:0

I'm indexing only values 1990-1999, so there certainly shouldn't be any '0'
keys in the result set.

The index is not optimized after each document deletion, only when an index is
loaded/unloaded, so optimization won't solve the problem in this case.
If facet.mincount is provided, I no longer get 0:0, but the other entries
with '0' values are gone as well:

1990:4
1995:4
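
A minimal sketch of why a minimum-count filter cannot separate the bogus 0:0 entry from the legitimate zero-count years, assuming the filter simply drops every entry whose count is below the threshold. The class below is hypothetical and only mimics the effect on the counts listed above; it does not call Solr.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a facet.mincount-style filter; not Solr code.
class MinCountFilter {
    // Keeps only the entries whose count is at least minCount, preserving order.
    static Map<String, Integer> filter(Map<String, Integer> counts, int minCount) {
        Map<String, Integer> kept = new LinkedHashMap<String, Integer>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minCount) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    // The facet counts reported in this issue, in the reported order.
    static Map<String, Integer> reportedCounts() {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("1990", 4);
        counts.put("1995", 4);
        counts.put("0", 0);
        for (String year : new String[] {"1991", "1992", "1993", "1994", "1996", "1997", "1998"}) {
            counts.put(year, 0);
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(filter(reportedCounts(), 1));
    }
}
```

The filter only looks at the count, so the key '0' and the keys 1991-1998 are indistinguishable to it; hiding one hides the others.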





[jira] Updated: (SOLR-1514) Facet search results contain 0:0 entries although '0' values were not indexed.

2009-10-16 Thread Renata Perkowska (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renata Perkowska updated SOLR-1514:
---

Description: 
Hi,
in my JMeter ATs I can see that under some circumstances facet search results
contain '0' both as a key and as a value for the integer field called 'year',
although I never index zeros.

When I do a normal search, I don't see any indexed fields with zeros.

When I run my facet test (using JMeter) in isolation, everything works fine. The
problem appears only when the test runs after other tests (and other
indexing/deleting). Yet the other indexing should not influence this test:
at the end of each test I delete the indexed documents, so the index is empty
before the facet test runs.

My facet test looks as follows:
 1. Index group of documents
 2. Perform search on facets
 3. Remove documents from the index.

The results that I'm getting for an integer field 'year':

 1990:4
 1995:4
 0:0
 1991:0
 1992:0
 1993:0
 1994:0
 1996:0
 1997:0
 1998:0

I'm indexing only values 1990-1999, so there certainly shouldn't be any '0'
keys in the result set.

The index is not optimized after each document deletion, only when an index is
loaded/unloaded, so optimization won't solve the problem in this case.
If facet.mincount>0 is provided, I no longer get 0:0, but the other
entries with '0' values are gone as well:

1990:4
1995:4




 Facet search results contain 0:0 entries although '0' values were not indexed.
 --

 Key: SOLR-1514
 URL: https://issues.apache.org/jira/browse/SOLR-1514
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Solr is on: Linux  2.6.18-92.1.13.el5xen
Reporter: Renata Perkowska





[jira] Updated: (SOLR-1514) Facet search results contain 0:0 entries although '0' values were not indexed.

2009-10-16 Thread Renata Perkowska (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Renata Perkowska updated SOLR-1514:
---

Description: 
Hi,
in my JMeter ATs I can see that under some circumstances facet search results
contain '0' both as a key and as a value for the integer field called 'year',
although I never index zeros.

When I do a normal search, I don't see any indexed fields with zeros.

When I run my facet test (using JMeter) in isolation, everything works fine. The
problem appears only when the test runs after other tests (and other
indexing/deleting). Yet the other indexing should not influence this test:
at the end of each test I delete the indexed documents, so the index is empty
before the facet test runs.

My facet test looks as follows:
 1. Index group of documents
 2. Perform search on facets
 3. Remove documents from the index.

The results that I'm getting for an integer field 'year':

 1990:4
 1995:4
 0:0
 1991:0
 1992:0
 1993:0
 1994:0
 1996:0
 1997:0
 1998:0

I'm indexing only values 1990-1999, so there certainly shouldn't be any '0'
keys in the result set.

The index is not optimized after each document deletion, only when an index is
loaded/unloaded, so optimization won't solve the problem in this case.
If facet.mincount>0 is provided, I no longer get 0:0, but the other
entries with '0' values are gone as well:

1990:4
1995:4

I'm also indexing text fields, but I don't see a similar situation in this 
case. This bug only happens for integer fields.




 Facet search results contain 0:0 entries although '0' values were not indexed.
 --

 Key: SOLR-1514
 URL: https://issues.apache.org/jira/browse/SOLR-1514
 Project: Solr
  Issue Type: Bug
  Components: search
Affects Versions: 1.3
 Environment: Solr is on: Linux  2.6.18-92.1.13.el5xen
Reporter: Renata Perkowska



[jira] Commented: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-16 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766641#action_12766641
 ] 

Jason Rutherglen commented on SOLR-1513:


Noble, before implementing, I was wondering if there's performance testing code
for ConcurrentLRUCache, in case Google Collections somehow slows things down?

 Use Google Collections in ConcurrentLRUCache
 

 Key: SOLR-1513
 URL: https://issues.apache.org/jira/browse/SOLR-1513
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5


 ConcurrentHashMap is used in ConcurrentLRUCache.  The Google Collections 
 concurrent map implementation allows for soft values that are great for 
 caches that potentially exceed the allocated heap.  Though I suppose Solr 
 caches usually don't use too much RAM?
 http://code.google.com/p/google-collections/




[jira] Updated: (SOLR-1513) Use Google Collections in ConcurrentLRUCache

2009-10-16 Thread Jason Rutherglen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Rutherglen updated SOLR-1513:
---

Attachment: google-collect-snapshot.jar
SOLR-1513.patch

Here's a basic implementation. It needs testing for performance
and for what happens if a value is removed before its key (in which
case the map could return null?). There are a number of
configurable params, so we'll add those as options for solrconfig.
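
A minimal sketch of the "value removed before its key" case, assuming soft values behave like java.lang.ref.SoftReference wrappers around the map values. It uses only JDK classes as a stand-in for the Google Collections map; SoftValueCache is a hypothetical name and is not the code in SOLR-1513.patch.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical JDK-only stand-in for a concurrent map with soft values.
class SoftValueCache<K, V> {
    private final ConcurrentMap<K, SoftReference<V>> map =
            new ConcurrentHashMap<K, SoftReference<V>>();

    void put(K key, V value) {
        map.put(key, new SoftReference<V>(value));
    }

    // Returns null both for a missing key and for a key whose value the
    // garbage collector has already cleared, so callers must treat null as a miss.
    V get(K key) {
        SoftReference<V> ref = map.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // Drop the stale entry, but only if nobody replaced it meanwhile.
            map.remove(key, ref);
        }
        return value;
    }

    int size() {
        return map.size();
    }
}
```

Callers would treat a null from get() as an ordinary cache miss and recompute the value, which is presumably the behavior the cache would need to document.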



 Use Google Collections in ConcurrentLRUCache
 

 Key: SOLR-1513
 URL: https://issues.apache.org/jira/browse/SOLR-1513
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
Reporter: Jason Rutherglen
Priority: Minor
 Fix For: 1.5

 Attachments: google-collect-snapshot.jar, SOLR-1513.patch


 ConcurrentHashMap is used in ConcurrentLRUCache.  The Google Collections 
 concurrent map implementation allows for soft values that are great for 
 caches that potentially exceed the allocated heap.  Though I suppose Solr 
 caches usually don't use too much RAM?
 http://code.google.com/p/google-collections/




[jira] Commented: (SOLR-1511) Problems in feeding XPathEntityProcessor with FieldReaderDataSource

2009-10-16 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766687#action_12766687
 ] 

Shalin Shekhar Mangar commented on SOLR-1511:
-

bq. DataImporter#loadDataConfig should be made private and all of the unit 
tests changed to use DataImporter#loadAndInit.

Done. Thanks!

Committed revision 826074.

 Problems in feeding XPathEntityProcessor with FieldReaderDataSource
 ---

 Key: SOLR-1511
 URL: https://issues.apache.org/jira/browse/SOLR-1511
 Project: Solr
  Issue Type: Bug
  Components: contrib - DataImportHandler
Reporter: Shalin Shekhar Mangar
Priority: Minor
 Fix For: 1.4

 Attachments: SOLR-1511.patch


 Reported by Lance on solr-user
 http://www.lucidimagination.com/search/document/e6a13b612b969143




[jira] Resolved: (SOLR-201) bad exceptions when some options aren't in solrconfig.xml

2009-10-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-201.


   Resolution: Fixed
Fix Version/s: 1.4

The HashDocSet maxSize is now 3000 by default instead of -1.

I've verified that Solr starts up fine without errors with a blank
solrconfig.xml (just <config></config>). I'm not sure when this was fixed, so
I'm marking it as 1.4.

 bad exceptions when some options aren't in solrconfig.xml
 -

 Key: SOLR-201
 URL: https://issues.apache.org/jira/browse/SOLR-201
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Fix For: 1.4


 someone at work migrating from a pre-apache version of solr to the latest 
 release ran into a nasty NegativeArraySizeException
 today because of these lines in DocSetHitCollector...
   static int HASHDOCSET_MAXSIZE = SolrConfig.config.getInt("//HashDocSet/@maxSize", -1);
   final int[] scratch = new int[HASHDOCSET_MAXSIZE];
 (apparently in the version she was migrating from, not having that option set 
 worked fine, i'm guessing it just never used HashDocSets)
 we should fix that so it either has a sensible default, or at the very least 
 throws a useful error message .. it's probably worth auditing all of the 
 config options as well.  
 (i'm imagining a unit test using a solrconfig.xml file that is completely 
 blank, and verifies that Solr starts up fine without exceptions, logs a bunch 
 of warnings, and then sits there unable to do anything, since no request 
 handlers are registered)
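
A minimal sketch of the "sensible default" option, assuming a negative configured value means the option was absent. The class and method names are hypothetical; the 3000 default is the value mentioned in the resolution comment above, and the rest is illustration rather than the actual Solr fix.

```java
// Hypothetical illustration; not the actual DocSetHitCollector code.
class MaxSizeDefault {
    // Default taken from the resolution comment ("maxSize is now 3000 by default").
    static final int DEFAULT_MAX_SIZE = 3000;

    // Falls back to the default when the configured value is negative,
    // which is what a missing option looked like with the old -1 default.
    static int resolveMaxSize(int configured) {
        return configured < 0 ? DEFAULT_MAX_SIZE : configured;
    }

    static int[] allocateScratch(int configured) {
        return new int[resolveMaxSize(configured)];
    }

    public static void main(String[] args) {
        int missing = -1;
        try {
            int[] broken = new int[missing]; // what the old code effectively did
            System.out.println(broken.length);
        } catch (NegativeArraySizeException e) {
            System.out.println("old behavior: " + e);
        }
        System.out.println("new length: " + allocateScratch(missing).length);
    }
}
```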




[jira] Updated: (SOLR-416) need to audit all methods that might be using default Locale

2009-10-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-416:
---

Fix Version/s: 1.5

 need to audit all methods that might be using default Locale
 

 Key: SOLR-416
 URL: https://issues.apache.org/jira/browse/SOLR-416
 Project: Solr
  Issue Type: Bug
Reporter: Hoss Man
 Fix For: 1.5


 As discussed on the mailing list, there are places in Solr where java methods 
 that rely on the default locale are used to compare input with constants ... 
 the specific use case that prompted this bug being string comparison after 
 calling toUpperCase() ... this won't do what it should in some Locales...
 http://www.nabble.com/Invalid-value-%27explicit%27-for-echoParams-parameter-tf4837914.html
 we should audit the code as much as possible and try to replace these use 
 cases in a way that will work for everyone
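
A minimal sketch of the failure mode, assuming the JVM's default locale is Turkish. The helper class is hypothetical; String.toUpperCase(Locale) and the Locale constants are standard JDK API. Passing the locale explicitly makes the example independent of the machine it runs on.

```java
import java.util.Locale;

// Hypothetical demo class; only JDK calls are used.
class UpperCaseLocaleDemo {
    // Upper-cases the input under the given locale and compares it to a constant.
    static boolean matches(String input, String constant, Locale locale) {
        return input.toUpperCase(locale).equals(constant);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // In Turkish, 'i' upper-cases to a dotted capital I (U+0130), not 'I'.
        System.out.println(matches("explicit", "EXPLICIT", turkish));
        // A fixed locale gives the same answer on every machine.
        System.out.println(matches("explicit", "EXPLICIT", Locale.ENGLISH));
    }
}
```

Calling toUpperCase() with no argument behaves like the first call whenever the default locale happens to be Turkish, which is why the audit would look for argument-less toUpperCase() and toLowerCase() calls.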




[jira] Updated: (SOLR-1073) StrField should allow locale sensitive sorting

2009-10-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-1073:


Fix Version/s: 1.5

 StrField should allow locale sensitive sorting
 --

 Key: SOLR-1073
 URL: https://issues.apache.org/jira/browse/SOLR-1073
 Project: Solr
  Issue Type: Improvement
 Environment: All
Reporter: Sachin
 Fix For: 1.5

 Attachments: LocaleStrField.java


 Currently, StrField does not take a parameter which it can pass to the ctor of 
 SortField, so StrField's sorting relies on the locale of the JVM.  
 Ideally, StrField should allow setting the locale in the schema.xml and use 
 it to create a new instance of the SortField in the getSortField() method, 
 something like:
 snip:
   public SortField getSortField(SchemaField field, boolean reverse)
   {
     ...
     Locale locale = new Locale(lang, country);
     return new SortField(field.getName(), locale, reverse);
   }
 More details about this issue here:
 http://www.nabble.com/CJKAnalyzer-and-Chinese-Text-sort-td22374195.html
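
A minimal sketch of what locale-sensitive ordering changes, using the JDK's java.text.Collator as a stand-in. It does not touch Lucene's SortField or Solr's StrField, and the class name is hypothetical. The English example is chosen only because its result is easy to check by eye.

```java
import java.text.Collator;
import java.util.Arrays;
import java.util.Locale;

// Hypothetical demo class; Collator and Arrays.sort are standard JDK API.
class CollatorSortDemo {
    // Sorts a copy of the values by raw UTF-16 code unit order (String.compareTo).
    static String[] sortNaturally(String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy);
        return copy;
    }

    // Sorts a copy of the values using the collation rules of the given locale.
    static String[] sortWith(Locale locale, String... values) {
        String[] copy = values.clone();
        Arrays.sort(copy, Collator.getInstance(locale));
        return copy;
    }

    public static void main(String[] args) {
        String[] words = {"cherry", "Banana", "apple"};
        System.out.println(Arrays.toString(sortNaturally(words)));
        System.out.println(Arrays.toString(sortWith(Locale.ENGLISH, words)));
    }
}
```

Code unit order puts the capitalized "Banana" first, while the English collator orders by letter before case. A locale set in schema.xml, as proposed above, would let the schema rather than the JVM pick those rules.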




[jira] Resolved: (SOLR-53) Allow symbolic links and rsync over ssh in snap scripts

2009-10-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-53?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar resolved SOLR-53.
---

Resolution: Won't Fix

Reading the comments, it seems we are not able to support this across platforms.
The Java based replication should be able to work with a symbolic link to the
index directory. Closing this issue for now; we can open it again if needed.

 Allow symbolic links and rsync over ssh in snap scripts
 ---

 Key: SOLR-53
 URL: https://issues.apache.org/jira/browse/SOLR-53
 Project: Solr
  Issue Type: Improvement
  Components: update
Reporter: Lee Marlow
 Attachments: symbolic_links_and_rsync_over_ssh.diff


 Our index directories are symbolic links to a shared location because we use 
 capistrano to deploy our Ruby on Rails application.  This caused problems 
 with snappuller and snapinstaller, so I added the -L option to the find 
 command to make it work.
 I also modified snappuller to rsync over ssh, which means there is no need to 
 run the rsync daemon - one less service to run.
 I will attach the patch.




[jira] Updated: (SOLR-656) better error message when data/index is completely empty

2009-10-16 Thread Shalin Shekhar Mangar (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shalin Shekhar Mangar updated SOLR-656:
---

Fix Version/s: 1.5

 better error message when data/index is completely empty
 --

 Key: SOLR-656
 URL: https://issues.apache.org/jira/browse/SOLR-656
 Project: Solr
  Issue Type: Wish
Reporter: Hoss Man
 Fix For: 1.5


 Solr's normal behavior is to create an index dir in the dataDir if one 
 does not already exist, but if index does exist it is used as is, warts and 
 all ... if the index is corrupt in some way, and Solr can't create an 
 IndexWriter or IndexReader, that error is propagated up to the user.
 I don't think this should change: Solr shouldn't attempt to do anything 
 special if there is a low level problem with the index, but something that 
 i've seen happen more than a few times is that people unwittingly rm 
 index/* when they should rm -r index, and as a result Solr+Lucene gives 
 them an error instead of just giving them an empty index.
 when checking if an existing index dir exists, it would probably be 
 worthwhile to add a little one line sanity test that it contains some files, and 
 log a warning.
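
A minimal sketch of such a sanity test, assuming "completely empty" means the directory exists and lists no entries. The class name and the warning text are hypothetical and this is not Solr code; only java.io.File and java.util.logging from the JDK are used.

```java
import java.io.File;
import java.util.logging.Logger;

// Hypothetical helper; not part of Solr.
class IndexDirCheck {
    private static final Logger log = Logger.getLogger(IndexDirCheck.class.getName());

    // True only when the path is an existing directory that contains no entries.
    static boolean existsButEmpty(File indexDir) {
        if (!indexDir.isDirectory()) {
            return false;
        }
        String[] entries = indexDir.list(); // null if the directory cannot be read
        return entries != null && entries.length == 0;
    }

    // Logs a warning for the empty-directory case and reports whether it did.
    static boolean warnIfEmpty(File indexDir) {
        boolean empty = existsButEmpty(indexDir);
        if (empty) {
            log.warning("Index directory " + indexDir + " exists but contains no files");
        }
        return empty;
    }
}
```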




[jira] Commented: (SOLR-1394) HTML stripper is splitting tokens

2009-10-16 Thread Anders Melchiorsen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12766735#action_12766735
 ] 

Anders Melchiorsen commented on SOLR-1394:
--

Thanks, that sounds great.

There is an existing off-by-one error in the numWhitespace calculation with 
hexadecimal numeric entities.

I noticed that while reworking the patch, but did not bother to report it
here because I was annoyed at being ignored. Now you've got me in a better mood,
so I can fix that error if you like?


 HTML stripper is splitting tokens
 -

 Key: SOLR-1394
 URL: https://issues.apache.org/jira/browse/SOLR-1394
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Anders Melchiorsen
 Attachments: SOLR-1394.patch, SOLR-1394.patch


 The Solr HTML stripper is replacing any removed HTML with whitespace. This is 
 to keep offsets correct for highlighting.
 However, as was already pointed out in SOLR-42, this means that any token 
 containing an HTML entity will be split into several tokens. That makes the 
 HTML stripper completely unreliable for international text (and any text is 
 potentially international).
 The current code is actually deficient for BOTH highlighting and indexing, 
 where the previous incarnation (that did not insert spaces) only had problems 
 with highlighting.
 The only workaround is to not use entities at all, which is impossible in 
 some situations and inconvenient in most. If the client is required to 
 transform entities before handing the text to Solr, it might as well be 
 required to also strip tags, and then the HTML stripper would not be needed 
 at all.
 Today, we have a better solution that can be used: offset correction. We can 
 then avoid inserting extra whitespace, but still get correct offsets. The 
 attached patch implements just that.
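
To illustrate the idea of offset correction, here is a minimal sketch. It is not the attached patch and not Solr's HTML stripper: the class OffsetCorrectingStripper and its methods are hypothetical, it handles only simple tags and numeric entities in the Basic Multilingual Plane, and it stores one input offset per output character for clarity rather than efficiency.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of offset correction; not Solr code.
final class OffsetCorrectingStripper {
    private final StringBuilder out = new StringBuilder();
    // inputOffsets.get(k) is the input position that produced output char k.
    private final List<Integer> inputOffsets = new ArrayList<>();
    private final int inputLength;

    OffsetCorrectingStripper(String input) {
        inputLength = input.length();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '<') {
                int end = input.indexOf('>', i);
                if (end >= 0) { i = end + 1; continue; } // drop the tag, no padding
            } else if (c == '&') {
                int end = input.indexOf(';', i);
                int decoded = end > i ? decodeNumeric(input.substring(i + 1, end)) : -1;
                if (decoded >= 0) {
                    emit((char) decoded, i); // one char replaces the whole entity
                    i = end + 1;
                    continue;
                }
            }
            emit(c, i); // ordinary character, or an unrecognised '<' or '&'
            i++;
        }
    }

    private void emit(char c, int inputOffset) {
        out.append(c);
        inputOffsets.add(inputOffset);
    }

    /** Decodes "#233" or "#xE9" style bodies; returns -1 for anything else. */
    private static int decodeNumeric(String body) {
        try {
            int v = -1;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                v = Integer.parseInt(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                v = Integer.parseInt(body.substring(1), 10);
            }
            return (v > 0 && v <= 0xFFFF) ? v : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    String text() { return out.toString(); }

    /** Maps an offset in the stripped text back to an offset in the input. */
    int correctOffset(int outputOffset) {
        return outputOffset < inputOffsets.size()
            ? inputOffsets.get(outputOffset)
            : inputLength;
    }
}
```

For the input "caf&#xE9; au lait" the stripped text keeps "café" as a single token at offsets 0 to 4, and correctOffset maps those to 0 and 9 in the input, so a highlighter can still mark the original "caf&#xE9;" span. An end offset maps to the input position of the next surviving character, which is a simplification of this sketch.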




[jira] Resolved: (SOLR-1394) HTML stripper is splitting tokens

2009-10-16 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley resolved SOLR-1394.


   Resolution: Fixed
Fix Version/s: 1.4

Committed.  Thanks Anders!

 HTML stripper is splitting tokens
 -

 Key: SOLR-1394
 URL: https://issues.apache.org/jira/browse/SOLR-1394
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Anders Melchiorsen
 Fix For: 1.4

 Attachments: SOLR-1394.patch, SOLR-1394.patch





[jira] Commented: (SOLR-1394) HTML stripper is splitting tokens

2009-10-16 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766739#action_12766739
 ] 

Yonik Seeley commented on SOLR-1394:


bq. Now you got me in a better mood, so I can fix that error if you like?

Didn't see your message before the commit - yes, a patch would be great! Could 
possibly make it into 1.4 still if you're quick :-)

 HTML stripper is splitting tokens
 -

 Key: SOLR-1394
 URL: https://issues.apache.org/jira/browse/SOLR-1394
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Anders Melchiorsen
 Fix For: 1.4

 Attachments: SOLR-1394.patch, SOLR-1394.patch





[jira] Updated: (SOLR-1394) HTML stripper is splitting tokens

2009-10-16 Thread Anders Melchiorsen (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anders Melchiorsen updated SOLR-1394:
-

Attachment: hex-entity.patch

I did not even test that this patch compiles, but it should show what I had in 
mind.

I will not have time to work on this during the weekend.


 HTML stripper is splitting tokens
 -

 Key: SOLR-1394
 URL: https://issues.apache.org/jira/browse/SOLR-1394
 Project: Solr
  Issue Type: Bug
  Components: Analysis
Affects Versions: 1.4
Reporter: Anders Melchiorsen
 Fix For: 1.4

 Attachments: hex-entity.patch, SOLR-1394.patch, SOLR-1394.patch





[jira] Commented: (SOLR-1510) EmbeddedSolrServer should support multiple cores

2009-10-16 Thread Lance Norskog (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12766823#action_12766823
 ] 

Lance Norskog commented on SOLR-1510:
-

What about having solr:// and solr://core URI formats? These would 
correspond to an embedded server.

There could be one master SolrJ connection factory that takes either solr:// 
or http://... URIs.
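
A rough sketch of how such a factory might tell the two URI forms apart is below. Everything in it is an assumption made for illustration: SolrTarget, its Kind enum and parse() are hypothetical names, not SolrJ API, and the sketch only classifies the URI string without creating any server object.

```java
// Hypothetical illustration of the proposed URI scheme; not SolrJ API.
final class SolrTarget {
    enum Kind { EMBEDDED, HTTP }

    final Kind kind;
    final String coreName; // "" means the default core; null for HTTP targets
    final String url;      // null for embedded targets

    private SolrTarget(Kind kind, String coreName, String url) {
        this.kind = kind;
        this.coreName = coreName;
        this.url = url;
    }

    /**
     * Classifies a connection string. Plain prefix checks are used because
     * java.net.URI rejects a bare "solr://" that has no authority part.
     */
    static SolrTarget parse(String uri) {
        final String embeddedPrefix = "solr://";
        if (uri.startsWith(embeddedPrefix)) {
            // "solr://" selects the default core, "solr://core" a named one.
            return new SolrTarget(Kind.EMBEDDED,
                uri.substring(embeddedPrefix.length()), null);
        }
        if (uri.startsWith("http://") || uri.startsWith("https://")) {
            return new SolrTarget(Kind.HTTP, null, uri);
        }
        throw new IllegalArgumentException("Unsupported Solr URI: " + uri);
    }
}
```

A real factory would then construct an embedded or an HTTP-backed server from the parsed target. That step is left out here because it depends on how multi-core support ends up being exposed.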

 EmbeddedSolrServer should support multiple cores
 

 Key: SOLR-1510
 URL: https://issues.apache.org/jira/browse/SOLR-1510
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
Priority: Minor
 Fix For: 1.5


 Currently, EmbeddedSolrServer can be started only with a single core. This 
 restriction should be removed.
