[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-01-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797572#action_12797572
 ] 

Noble Paul commented on SOLR-1602:
--

bq. but there have also been some threads out there in the past pointing out 
that using FQNs can speed up core initialization 

This was resolved by SOLR-921.

 Refactor SOLR package structure to include o.a.solr.response and move 
 QueryResponseWriters in there
 ---

 Key: SOLR-1602
 URL: https://issues.apache.org/jira/browse/SOLR-1602
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
 SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config


 Currently all o.a.solr.request.QueryResponseWriter implementations are 
 curiously located in the o.a.solr.request package. Not only is this package 
 getting big (30+ classes); many of the classes in it are also misplaced. There 
 should be a first-class o.a.solr.response package, and the response-related 
 classes should be given a home there. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2010-01-07 Thread Paul taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797601#action_12797601
 ] 

Paul taylor commented on SOLR-1653:
---

Hi, I'm using this outside Solr in an analyzer, and I think there may be a 
performance issue because you cannot pass a compiled Pattern. In the 
reusableTokenStream() method you cannot reset a CharFilter like you can a 
Tokenizer, so it has to recompile the pattern every time,

i.e. 
  public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
      streams = new SavedStreams();
      setPreviousTokenStream(streams);
      streams.tokenStream = new StandardTokenizer(Version.LUCENE_CURRENT,
          new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
      streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
      streams.filteredTokenStream = new AccentFilter(streams.filteredTokenStream);
      streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
    }
    else {
      streams.tokenStream.reset(new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
    }
    return streams.filteredTokenStream;
  }
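
For reference, here is a minimal, self-contained sketch of the cost described above: compiling the regex on every call versus compiling it once and reusing it. The class and the loop are purely illustrative and are not part of the attached patch.

{code}
import java.util.regex.Pattern;

public class RegexCompileCostSketch {
  // compiled once, the way a cached Pattern could be reused across calls
  private static final Pattern CACHED = Pattern.compile("(no\\.) ([0-9]+)");

  public static void main(String[] args) {
    String input = "no. 42";
    long t0 = System.nanoTime();
    for (int i = 0; i < 100000; i++) {
      // what reusableTokenStream() effectively does today: compile per call
      Pattern.compile("(no\\.) ([0-9]+)").matcher(input).replaceAll("$1$2");
    }
    long t1 = System.nanoTime();
    for (int i = 0; i < 100000; i++) {
      CACHED.matcher(input).replaceAll("$1$2");
    }
    long t2 = System.nanoTime();
    System.out.println("compile per call: " + (t1 - t0) / 1000000 + " ms, cached: "
        + (t2 - t1) / 1000000 + " ms");
  }
}
{code}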

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch, SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to match the target string 
 to be replaced in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1708) Allowing import / update of a specific document using the data import handler

2010-01-07 Thread Simon Lachinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Lachinger updated SOLR-1708:
--

Attachment: 02-single-update.patch

 Allowing import / update of a specific document using the data import handler
 -

 Key: SOLR-1708
 URL: https://issues.apache.org/jira/browse/SOLR-1708
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Simon Lachinger
 Attachments: 02-single-update.patch


 There is a need for changes or new documents to be added to the Solr index 
 immediately. This could easily be done via the update handler; however, when 
 using the DataImportHandler it shouldn't be necessary to specify the data 
 extraction for the DataImportHandler and then also duplicate it by feeding the 
 documents into the update handler. It should be centralized.
 Having to run a delta query to identify changes, when the IDs of the updated 
 documents are already known to the application, is a rather costly (in terms of 
 database load) way to solve this.
 The attached patch allows one or more query parameters, named 'root-pk', to be 
 specified for the delta-import command, identifying the document(s) to be 
 updated or added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1708) Allowing import / update of a specific document using the data import handler

2010-01-07 Thread Simon Lachinger (JIRA)
Allowing import / update of a specific document using the data import handler
-

 Key: SOLR-1708
 URL: https://issues.apache.org/jira/browse/SOLR-1708
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Simon Lachinger
 Attachments: 02-single-update.patch

There is a need for changes or new documents to be added to the Solr index 
immediately. This could easily be done via the update handler; however, when 
using the DataImportHandler it shouldn't be necessary to specify the data 
extraction for the DataImportHandler and then also duplicate it by feeding the 
documents into the update handler. It should be centralized.

Having to run a delta query to identify changes, when the IDs of the updated 
documents are already known to the application, is a rather costly (in terms of 
database load) way to solve this.

The attached patch allows one or more query parameters, named 'root-pk', to be 
specified for the delta-import command, identifying the document(s) to be 
updated or added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

2010-01-07 Thread Attila Babo
While inserting a large pile of documents using
StreamingUpdateSolrServer I've found a race condition in which all Runner
instances stopped while the blocking queue was full. The attached
patch solves the problem; to keep it small, all indentation has been
removed.

Index: 
src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
===
--- src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java  
(revision
888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java  
(working
copy)
@@ -82,6 +82,7 @@
   log.info( "starting runner: {}", this );
   PostMethod method = null;
   try {
+do {
 RequestEntity request = new RequestEntity() {
   // we don't know the length
   public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
   msg.append( "request: " + method.getURI() );
   handleError( new Exception( msg.toString() ) );
 }
+}  while( ! queue.isEmpty());
   }
   catch (Throwable e) {
 handleError( e );
@@ -149,6 +151,7 @@
   finally {
 try {
   // make sure to release the connection
+  if(method != null)
   method.releaseConnection();
 }
 catch( Exception ex ){}
@@ -195,11 +198,11 @@

   queue.put( req );

+synchronized( runners ) {
   if( runners.isEmpty()
|| (queue.remainingCapacity() < queue.size()
   && runners.size() < threadCount) )
   {
-synchronized( runners ) {
   Runner r = new Runner();
   scheduler.execute( r );
   runners.add( r );

===

This patch has been tested with millions of documents inserted into Solr;
before that I was unable to inject all of our documents because the
following scenario happened. We have a BlockingQueue called runners to
handle requests; at one point the queue was emptied by the Runner
threads, and they all stopped processing new items while they sent the
collected items to Solr. Solr was busy, so that took a long time, and
during that time the client filled the queue again. As all worker threads
had already been instantiated, there was no way to create new Runners to
handle the queue, so it grew to its upper limit. When the next item was
about to be put into the queue, the put blocked and the race condition
happened.

Patch 1, 2:
Inside the Runner.run method I've added a do/while loop to prevent the
Runner from quitting while there are new requests; this handles the problem
of new requests being added while the Runner is sending the previous batch.

Patch 3:
The null check of the method variable is not strictly necessary, just a
code cleanup.

Patch 4:
The last part of the patch moves the synchronized block outside of the
conditional, to avoid a situation where runners changes while the condition
is being evaluated. The same change is shown as straight-line code below.
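
For readability, here is the post-patch check-then-act section as straight-line code rather than a diff; the identifiers come from the patch above, the rest is just a sketch of the intent:

  synchronized (runners) {
    // the size checks and the decision to start a new Runner now happen under one
    // lock, so another thread cannot change runners between the test and the action
    if (runners.isEmpty()
        || (queue.remainingCapacity() < queue.size() && runners.size() < threadCount)) {
      Runner r = new Runner();
      scheduler.execute(r);
      runners.add(r);
    }
  }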

Your comments and critique are welcome!

Attila


[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors

2010-01-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797666#action_12797666
 ] 

Grant Ingersoll commented on SOLR-1680:
---

Why not broaden this and allow people to pass in their own collectors?  

Also, can you explain a bit more the use case specifically for Field Collapse?  

Alternatively, given something like LUCENE-2127, we may want Solr to be able to 
make query time decisions about what Collector to use.

 Provide an API to specify custom Collectors
 ---

 Key: SOLR-1680
 URL: https://issues.apache.org/jira/browse/SOLR-1680
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
 Fix For: 1.5

 Attachments: field-collapse-core.patch, SOLR-1680.patch


 This issue is dedicated to incorporating field collapsing's changes into Solr's 
 core code. 
 We want to make it possible for components to specify custom Collectors in 
 SolrIndexSearcher methods.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-01-07 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797677#action_12797677
 ] 

Ryan McKinley commented on SOLR-1602:
-

bq. Besides which: even if it's just an example it would be pretty shitty to 
break that example in the very next release.

Agreed -- we will make sure old FQNs work (until the next major release), but 
moving forward, we should remove FQN from schema.xml so this is less of an 
issue in the future.



 Refactor SOLR package structure to include o.a.solr.response and move 
 QueryResponseWriters in there
 ---

 Key: SOLR-1602
 URL: https://issues.apache.org/jira/browse/SOLR-1602
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
 SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config


 Currently all o.a.solr.request.QueryResponseWriter implementations are 
 curiously located in the o.a.solr.request package. Not only is this package 
 getting big (30+ classes); many of the classes in it are also misplaced. There 
 should be a first-class o.a.solr.response package, and the response-related 
 classes should be given a home there. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-01-07 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797678#action_12797678
 ] 

Ryan McKinley commented on SOLR-1602:
-

Noble, this issue is assigned to you; do you want to take care of it?  If not 
I can...

A patch won't work well here, since it will take a few steps in svn to make sure 
the history is maintained:
1. svn move the files to a new location, update references, etc.
2. commit
3. add stub files in the location where the old files were
4. commit

 Refactor SOLR package structure to include o.a.solr.response and move 
 QueryResponseWriters in there
 ---

 Key: SOLR-1602
 URL: https://issues.apache.org/jira/browse/SOLR-1602
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
 SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config


 Currently all o.a.solr.request.QueryResponseWriter implementations are 
 curiously located in the o.a.solr.request package. Not only is this package 
 getting big (30+ classes); many of the classes in it are also misplaced. There 
 should be a first-class o.a.solr.response package, and the response-related 
 classes should be given a home there. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**

2010-01-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797687#action_12797687
 ] 

Yonik Seeley commented on SOLR-1707:


True immutability?  What does that mean over Collections.unmodifiableMap()?
And how do we know these are faster or more memory efficient?
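
(For context, the usual distinction drawn between the two, independent of the speed/memory question, is view versus copy: Collections.unmodifiableMap() is a read-only view of the backing map, while the google-collections ImmutableMap.copyOf() takes a defensive copy. A minimal sketch:)

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.google.common.collect.ImmutableMap;

public class ImmutabilitySketch {
  public static void main(String[] args) {
    Map<String, String> backing = new HashMap<String, String>();
    backing.put("k", "v1");

    Map<String, String> view = Collections.unmodifiableMap(backing); // read-only view
    Map<String, String> copy = ImmutableMap.copyOf(backing);         // defensive copy

    backing.put("k", "v2"); // whoever still holds the backing map can mutate it

    System.out.println(view.get("k")); // prints v2 -- the view reflects the change
    System.out.println(copy.get("k")); // prints v1 -- the copy does not
  }
}
{code}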

 Use google collections immutable collections instead of 
 Collections.unmodifiable**
 --

 Key: SOLR-1707
 URL: https://issues.apache.org/jira/browse/SOLR-1707
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1707.patch


 Google Collections offers true immutability and better memory efficiency.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**

2010-01-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797713#action_12797713
 ] 

Yonik Seeley commented on SOLR-1707:


OK, I whipped up a quick test with String keys, many small maps (anywhere from 
1 to 20 keys per map).  Java6 -server 64 bit, Win7_x64

Size:
 Collections.unmodifiableMap:  7.4% bigger than HashMap
  google immutable map: 22.4% bigger than HashMap

Speed:
  Collections.unmodifiableMap: 4.2% slower than HashMap
  google immutable map:  26.0% slower than HashMap

For best space and speed, looks like we should stick with straight HashMap.

 Use google collections immutable collections instead of 
 Collections.unmodifiable**
 --

 Key: SOLR-1707
 URL: https://issues.apache.org/jira/browse/SOLR-1707
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1707.patch


 Google Collections offers true immutability and better memory efficiency.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-01-07 Thread Patrick Jungermann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797716#action_12797716
 ] 

Patrick Jungermann commented on SOLR-236:
-

Hi all,

we are using Solr's trunk with the latest patch from {{2009-12-24 09:54 AM}}. 
Within the index, there are ~3.5 million documents with string-based 
identifiers of lengths up to 50 chars.

The result document of our prefix query, which was at position 1 without 
collapsing, was not even within the top 10 results with collapsing. We were 
using the option {{collapse.maxdocs=150}}; after changing this option to the 
value 15000, the results seem to be as expected. Because of that, we concluded 
that there has to be a problem with the sorting of the uncollapsed docset.


Also, we noticed a huge memory leak problem when using collapsing. We 
configured the component with {{<searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>}}.
Without setting the option {{collapse.field}}, it works normally and there are 
no memory problems at all. If requests with collapsing enabled are received by 
the Solr server, the whole memory (old gen could not be freed; eden space is 
heavily in use; ...) fills up after a few requests. Using a profiler, we noticed 
that the filterCache was extraordinarily large. We suspected that there could be 
a caching problem (the collapseCache was not enabled).


Additionally, it would be very useful if the parameter {{collapse=true|false}} 
worked again and could be used to enable/disable the collapsing functionality. 
Currently, the presence of a field chosen for collapsing enables this feature, 
and there is no way to configure the fields for collapsing within the request 
handlers. With that parameter, we could configure collapsing once and only 
enable/disable it per request, as is conveniently done with other components 
(highlighting, faceting, ...).


Patrick

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated 'more documents from 
 this site' link. See also duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&id=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors

2010-01-07 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797771#action_12797771
 ] 

Shalin Shekhar Mangar commented on SOLR-1680:
-

bq. Why not broaden this and allow people to pass in their own collectors? 

Yes, that is the general idea, though it would be API driven rather than 
configuration driven. Any component should be able to pass a Collector to the 
various SolrIndexSearcher methods.

bq. Also, can you explain a bit more the use case specifically for Field 
Collapse? 

Field Collapsing needs to use a custom collector. Right now the collector is 
hard coded inside SolrIndexSearcher.

bq. Alternatively, given something like LUCENE-2127, we may want Solr to be 
able to make query time decisions about what Collector to use.

I guess that decision should be made by QueryComponent? If so, then the ability 
to pass a custom Collector to SolrIndexSearcher methods should be enough.
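
As a rough illustration of what passing a custom Collector means here, below is a minimal counting Collector written against the Lucene 2.9-style API. This is a sketch only; how such a collector would actually be handed to the SolrIndexSearcher methods is exactly what this issue is about.

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Counts matching docs without keeping them. A component could pass something
// like this (or a field-collapsing collector) into a SolrIndexSearcher search method.
public class CountingCollector extends Collector {
  private int count;
  private int docBase;

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    // scores are not needed for counting
  }

  @Override
  public void collect(int doc) throws IOException {
    count++; // doc is relative to docBase; the absolute id would be docBase + doc
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    this.docBase = docBase;
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;
  }

  public int getCount() {
    return count;
  }
}
{code}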

 Provide an API to specify custom Collectors
 ---

 Key: SOLR-1680
 URL: https://issues.apache.org/jira/browse/SOLR-1680
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
 Fix For: 1.5

 Attachments: field-collapse-core.patch, SOLR-1680.patch


 This issue is dedicated to incorporating field collapsing's changes into Solr's 
 core code. 
 We want to make it possible for components to specify custom Collectors in 
 SolrIndexSearcher methods.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors

2010-01-07 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797774#action_12797774
 ] 

patrick o'leary commented on SOLR-1680:
---

We've just done something like this recently and found the simplest way was to 
modify 
ResponseBuilder with setCustomCollector / getCustomCollector and
update the QueryCommand to include the custom collector.

It gets sticky in SolrIndexSearcher with caching, and IIRC there are about 4 
places that call the collector; the solution works, but is not in any way 
elegant.

It would be good to see if we could refactor SolrIndexSearcher first to make it 
more streamlined.  

 Provide an API to specify custom Collectors
 ---

 Key: SOLR-1680
 URL: https://issues.apache.org/jira/browse/SOLR-1680
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
 Fix For: 1.5

 Attachments: field-collapse-core.patch, SOLR-1680.patch


 This issue is dedicated to incorporating field collapsing's changes into Solr's 
 core code. 
 We want to make it possible for components to specify custom Collectors in 
 SolrIndexSearcher methods.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-01-07 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797794#action_12797794
 ] 

Martijn van Groningen commented on SOLR-236:


bq. The result document of our prefix query, which was at position 1 without 
collapsing, was with collapsing not even within the top 10 results. We using 
the option collapse.maxdocs=150 and after changing this option to the value 
15000, the results seem to be as expected. Because of that, we concluded, that 
there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing after the threshold is met, but it 
does that based on the uncollapsed docset, which is not sorted in any way. The 
result is that documents that would normally appear on the first page don't 
appear at all in the search result. In the end the collapse component uses the 
collapsed docset as the result set, not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem, when using collapsing. We 
configured the component with <searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>.
Without setting the option collapse.field, it works normally, there are far no 
memory problems. If requests with enabled collapsing are received by the Solr 
server, the whole memory (oldgen could not be freed; eden space is heavily in 
use; ...) gets full after some few requests. By using a profiler, we noticed 
that the filterCache was extraordinary large. We supposed that there could be a 
caching problem (collapeCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field 
collapse cache. This is something that has to be addressed, and certainly will 
be in the new field-collapse implementation. In the patch you're using, too much 
is being cached (some data could even be left out of the cache). Also, in some 
cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally it might be very useful, if the parameter collapse=true|false 
would work again and could be used to enabled/disable the collapsing 
functionality. Currently, the existence of a field choosen for collapsing 
enables this feature and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests like it will be conveniently used by 
other components (highlighting, faceting, ...).

That is actually a good argument for bringing back the collapse.enable parameter 
in the patch. 

Martijn

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated 'more documents from 
 this site' link. See also duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&id=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-01-07 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797794#action_12797794
 ] 

Martijn van Groningen edited comment on SOLR-236 at 1/7/10 9:28 PM:


bq. The result document of our prefix query, which was at position 1 without 
collapsing, was with collapsing not even within the top 10 results. We using 
the option collapse.maxdocs=150 and after changing this option to the value 
15000, the results seem to be as expected. Because of that, we concluded, that 
there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing after the threshold is met, but it 
does that based on the uncollapsed docset, which is not sorted in any way. The 
result is that documents that would normally appear on the first page don't 
appear at all in the search result. In the end the collapse component uses the 
collapsed docset as the result set, not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem, when using collapsing. We 
configured the component with <searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>. Without setting 
the option collapse.field, it works normally, there are far no memory problems. 
If requests with enabled collapsing are received by the Solr server, the whole 
memory (oldgen could not be freed; eden space is heavily in use; ...) gets full 
after some few requests. By using a profiler, we noticed that the filterCache 
was extraordinary large. We supposed that there could be a caching problem 
(collapeCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field 
collapse cache. This is something that has to be addressed, and certainly will 
be in the new field-collapse implementation. In the patch you're using, too much 
is being cached (some data could even be left out of the cache). Also, in some 
cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally it might be very useful, if the parameter collapse=true|false 
would work again and could be used to enabled/disable the collapsing 
functionality. Currently, the existence of a field choosen for collapsing 
enables this feature and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests like it will be conveniently used by 
other components (highlighting, faceting, ...).

That is actually a good argument for bringing back the collapse.enable parameter 
in the patch. 

Martijn

  was (Author: martijn):
bq. The result document of our prefix query, which was at position 1 
without collapsing, was with collapsing not even within the top 10 results. We 
using the option collapse.maxdocs=150 and after changing this option to the 
value 15000, the results seem to be as expected. Because of that, we concluded, 
that there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs aborts collapsing after the threshold is met, but it is 
doing that based on the uncollapsed docset which is not sorted in any way. The 
result of that is that documents that would normally appear in the first page 
don't appear at all in the search result. Eventually the collapse component 
uses the collapsed docset as the result set and not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem, when using collapsing. We 
configured the component with searchComponent name=query 
class=org.apache.solr.handler.component.CollapseComponent/.
Without setting the option collapse.field, it works normally, there are far no 
memory problems. If requests with enabled collapsing are received by the Solr 
server, the whole memory (oldgen could not be freed; eden space is heavily in 
use; ...) gets full after some few requests. By using a profiler, we noticed 
that the filterCache was extraordinary large. We supposed that there could be a 
caching problem (collapeCache was not enabled).

I agree it gets huge. This applies for both the filterCache and field collapse 
cache. This is something that has to be addressed and certainly will in the new 
field-collapse implementation. In the patch you're using too much is being 
cached (some data can even be neglected in the cache). Also in some cases 
strings are being cached that actually could be replaced with hashcodes.

bq. Additionally it might be very useful, if the parameter collapse=true|false 
would work again and could be used to enabled/disable the collapsing 
functionality. Currently, the existence of a field choosen for collapsing 
enables this feature and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests like it will be 

Re: Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

2010-01-07 Thread Ryan McKinley

Can you submit a patch to JIRA?


On Jan 7, 2010, at 10:23 AM, Attila Babo wrote:


While inserting a large pile of documents using
StreamingUpdateSolrServer I've found a race condition as all Runner
instances stopped while the blocking queue was full. The attached
patch solves the problem, to minify it all indentation has been
removed.

Index: src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java

===
--- src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java	(revision

888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java	(working

copy)
@@ -82,6 +82,7 @@
  log.info( "starting runner: {}", this );
  PostMethod method = null;
  try {
+do {
RequestEntity request = new RequestEntity() {
  // we don't know the length
  public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
  msg.append( "request: " + method.getURI() );
  handleError( new Exception( msg.toString() ) );
}
+}  while( ! queue.isEmpty());
  }
  catch (Throwable e) {
handleError( e );
@@ -149,6 +151,7 @@
  finally {
try {
  // make sure to release the connection
+  if(method != null)
  method.releaseConnection();
}
catch( Exception ex ){}
@@ -195,11 +198,11 @@

  queue.put( req );

+synchronized( runners ) {
  if( runners.isEmpty()
|| (queue.remainingCapacity() < queue.size()
  && runners.size() < threadCount) )
  {
-synchronized( runners ) {
  Runner r = new Runner();
  scheduler.execute( r );
  runners.add( r );

===

This patch has been tested with millions of document inserted to Solr,
before that I was unable to inject all of our documents as the
following scenario happened. We have a BlockingQueue called runners to
handle requests, at one point the queue was emptied by the Runner
threads, they all stopped processing new items but sent the collected
items to Solr. Solr was busy so that toke a long time, during that the
client filled the queue again. As all worker threads were instantiated
there were no way to create new Runners to handle the queue so it was
growing to upper limit. When the next item was about to put into the
queue it was blocked and the race condition just happened.

Patch 1, 2:
Inside the Runner.run method I've added a do while loop to prevent the
Runner to quit while there are new requests, this handles the problem
of new requests added while Runner is sending the previous batch.

Patch 3
Validity check of method variable is not strictly necessary, just a
code clean up.

Patch 4
The last part of the patch is to move synchronized outside of
conditional to avoid a situation where runners change while evaluating
it.

Your comments and critique are welcome!

Attila




[jira] Updated: (SOLR-1698) load balanced distributed search

2010-01-07 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1698:
---

Attachment: SOLR-1698.patch

Attaching new patch, still limited to LBHttpSolrServer at this point.
- includes tests
- adds a new expert-level API:
   public Rsp request(Req req) throws SolrServerException, IOException
   I chose objects (Rsp and Req) since I imagine we will need to continue to 
add new parameters and controls to both the request and the response (esp the 
request... things like timeout, max number of servers to query, etc).  The Rsp 
also contains info about which server returned the response and will allow us 
to stick with the same server for all phases of a distributed request.
- adds the concept of standard servers (those provided by the constructor or 
addServer)... a server on the zombie list that isn't a standard server won't be 
added to the alive list if it wakes up, and will not be pinged forever.
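
A rough usage sketch of the new expert-level API described above. The Req/Rsp names come from the patch, but the constructor and accessor shapes shown here are assumptions; only the attached patch is authoritative.

{code}
import java.util.Arrays;

import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer.Req;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer.Rsp;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class LBRequestSketch {
  public static void main(String[] args) throws Exception {
    // "standard" servers, registered via the constructor
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://host1:8983/solr", "http://host2:8983/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");

    // Req wraps the request plus the servers to try for this call (assumed shape);
    // Rsp is assumed to report which server actually answered
    Req req = new Req(new QueryRequest(params),
        Arrays.asList("http://host1:8983/solr", "http://host2:8983/solr"));
    Rsp rsp = lb.request(req);
    System.out.println("served by: " + rsp.getServer());
  }
}
{code}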


 load balanced distributed search
 

 Key: SOLR-1698
 URL: https://issues.apache.org/jira/browse/SOLR-1698
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Attachments: SOLR-1698.patch, SOLR-1698.patch


 Provide syntax and implementation of load-balancing across shard replicas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count

2010-01-07 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge resolved SOLR-1672.


Resolution: Fixed

Marking as resolved.


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 As suggested by Chris Hostetter, I have added an optional Comparator to the 
 BoundedTreeSet<Long> in the UnInvertedField class.
 This optional comparator is used when a new (and also optional) field facet 
 parameter called 'facet.sortorder' is set to the string 'dsc' 
 (e.g. f.facetname.facet.sortorder=dsc for per field, or 
 facet.sortorder=dsc for all facets).
 Note that this parameter has no effect if facet.method=enum.
 Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
 its default behaviour.
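 A hypothetical SolrJ request using the new parameter, purely to illustrate how 
 it would be set ("price" is just an example field here; SolrQuery itself is 
 standard SolrJ):
 {code}
 import org.apache.solr.client.solrj.SolrQuery;

 public class FacetSortOrderExample {
   public static SolrQuery build() {
     SolrQuery q = new SolrQuery("*:*");
     q.setFacet(true);
     q.addFacetField("price");
     q.set("f.price.facet.sortorder", "dsc"); // per-field form described above
     // q.set("facet.sortorder", "dsc");      // or for all facets
     return q;
   }
 }
 {code}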
  
 This change affects 2 source files:
  UnInvertedField.java
 [line 438] The getCounts() method signature is modified to add the 
 'facetSortOrder' parameter value to the end of the argument list.
  
 DIFF UnInvertedField.java:
 - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix) throws IOException {
 + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix, String facetSortOrder) throws IOException {
 [line 556] The getCounts() method is modified to create an overridden 
 BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter 
 equals 'dsc'.
 DIFF UnInvertedField.java:
 - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
 + final BoundedTreeSet<Long> queue = (sort.equals("count") || 
 sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
 BoundedTreeSet<Long>(maxsize, new Comparator()
 { @Override
 public int compare(Object o1, Object o2)
 {
   if (o1 == null || o2 == null)
 return 0;
   int result = ((Long) o1).compareTo((Long) o2);
   return (result != 0 ? (result < 0 ? -1 : 1) : 0); // lowest number first sort
 }}) : new BoundedTreeSet<Long>(maxsize)) : null;
  SimpleFacets.java
 [line 221] A getFieldParam(field, "facet.sortorder", "asc"); call is added to 
 retrieve the new parameter, if present. 'asc' used as a default value.
 DIFF SimpleFacets.java:
 + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
 "asc");
  
 [line 253] The call to uif.getCounts() in the getTermCounts() method is 
 modified to pass the 'facetSortOrder' value string.
 DIFF SimpleFacets.java:
 - counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix);
 + counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix, facetSortOrder);
 Implementation Notes:
 I have noted in testing that I was not able to retrieve any '0' counts as I 
 had expected.
 I believe this could be because there appear to be some optimizations in 
 SimpleFacets/count caching such that zero counts are not iterated (at least 
 not by default)
 as a performance enhancement.
 I could be wrong about this, and zero counts may appear under some other as 
 yet untested circumstances. Perhaps an expert familiar with this part of the 
 code can clarify.
 In fact, this is not such a bad thing (at least for my requirements), as a 
 whole bunch of zero counts is not necessarily useful (for my requirements, 
 starting at '1' is just right).
  
 There may, however, be instances where someone *will* want zero counts - e.g. 
 searching for zero product stock counts (e.g. 'what have we run out of'). I 
 was envisioning the facet.mincount field
 being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
 or possibly higher), but because of the caching/optimization, the behaviour 
 is somewhat different than expected.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1706) wrong tokens output from WordDelimiterFilter when english possessives are in the text

2010-01-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797829#action_12797829
 ] 

Robert Muir commented on SOLR-1706:
---

It's not just the concatenation, but also the subword generation.

In the case below, Autocoder should not be emitted, as only numeric subword 
generation is turned on.

{code}
  public void test128() throws Exception {
    assertWdf("word 1234 Super-Duper-XL500-42-Autocoder x'sbd123 a4b3c-", 
      0,1,0,0,0,0,0,0,0, null,
      new String[] { "word", "1234", "42", "Autocoder", "a4b3c" },
      new int[] { 0, 5, 28, 31, 50 },
      new int[] { 4, 9, 30, 40, 55 },
      new int[] { 1, 1, 1, 1, 2 });
  }
{code}

 wrong tokens output from WordDelimiterFilter when english possessives are in 
 the text
 -

 Key: SOLR-1706
 URL: https://issues.apache.org/jira/browse/SOLR-1706
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

 the WordDelimiterFilter English possessive stemming ('s removal, on by 
 default) unfortunately causes strange behavior:
 below you can see that when I have requested to only output numeric 
 concatenations (not words), these English possessive stems are still 
 sometimes output, ignoring the options I have provided, and even then, in a 
 very inconsistent way.
 {code}
   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42", "AutoCoder" },
 new int[] { 18, 21 },
 new int[] { 20, 30 },
 new int[] { 1, 1 });
   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42", "AutoCoder", "56" },
 new int[] { 18, 21, 33 },
 new int[] { 20, 30, 35 },
 new int[] { 1, 1, 1 });
   assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
 new String[] { },
 new int[] { },
 new int[] { },
 new int[] { });
   assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42" },
 new int[] { 18 },
 new int[] { 20 },
 new int[] { 1 });
 {code}
 where assertWdf is 
 {code}
   void assertWdf(String text, int generateWordParts, int generateNumberParts,
   int catenateWords, int catenateNumbers, int catenateAll,
   int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
   int stemEnglishPossessive, CharArraySet protWords, String expected[],
   int startOffsets[], int endOffsets[], String types[], int posIncs[])
   throws IOException {
 TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
 WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
 generateNumberParts, catenateWords, catenateNumbers, catenateAll,
 splitOnCaseChange, preserveOriginal, splitOnNumerics,
 stemEnglishPossessive, protWords);
 assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
 posIncs);
   }
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1706) wrong tokens output from WordDelimiterFilter depending upon options

2010-01-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1706:
--

Description: 
below you can see that when I have requested to only output numeric 
concatenations (not words), some words are still sometimes output, ignoring the 
options I have provided, and even then, in a very inconsistent way.

{code}
  assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
new String[] { "42", "AutoCoder" },
new int[] { 18, 21 },
new int[] { 20, 30 },
new int[] { 1, 1 });

  assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
new String[] { "42", "AutoCoder", "56" },
new int[] { 18, 21, 33 },
new int[] { 20, 30, 35 },
new int[] { 1, 1, 1 });

  assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
new String[] { },
new int[] { },
new int[] { },
new int[] { });

  assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
new String[] { "42" },
new int[] { 18 },
new int[] { 20 },
new int[] { 1 });
{code}

where assertWdf is 
{code}
  void assertWdf(String text, int generateWordParts, int generateNumberParts,
  int catenateWords, int catenateNumbers, int catenateAll,
  int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
  int stemEnglishPossessive, CharArraySet protWords, String expected[],
  int startOffsets[], int endOffsets[], String types[], int posIncs[])
  throws IOException {
TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
generateNumberParts, catenateWords, catenateNumbers, catenateAll,
splitOnCaseChange, preserveOriginal, splitOnNumerics,
stemEnglishPossessive, protWords);
assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
posIncs);
  }
{code}


  was:
the WordDelimiterFilter english possessive stemming 's  removal (on by 
default) unfortunately causes strange behavior:

below you can see that when I have requested to only output numeric 
concatenations (not words), these english possessive stems are still sometimes 
output, ignoring the options i have provided, and even then, in a very 
inconsistent way.

{code}
  assertWdf(Super-Duper-XL500-42-AutoCoder's, 0,0,0,1,0,0,0,0,1, null,
new String[] { 42, AutoCoder },
new int[] { 18, 21 },
new int[] { 20, 30 },
new int[] { 1, 1 });

  assertWdf(Super-Duper-XL500-42-AutoCoder's-56, 0,0,0,1,0,0,0,0,1, null,
new String[] { 42, AutoCoder, 56 },
new int[] { 18, 21, 33 },
new int[] { 20, 30, 35 },
new int[] { 1, 1, 1 });

  assertWdf(Super-Duper-XL500-AB-AutoCoder's, 0,0,0,1,0,0,0,0,1, null,
new String[] {  },
new int[] {  },
new int[] {  },
new int[] {  });

  assertWdf(Super-Duper-XL500-42-AutoCoder's-BC, 0,0,0,1,0,0,0,0,1, null,
new String[] { 42 },
new int[] { 18 },
new int[] { 20 },
new int[] { 1 });
{code}

where assertWdf is 
{code}
  void assertWdf(String text, int generateWordParts, int generateNumberParts,
  int catenateWords, int catenateNumbers, int catenateAll,
  int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
  int stemEnglishPossessive, CharArraySet protWords, String expected[],
  int startOffsets[], int endOffsets[], String types[], int posIncs[])
  throws IOException {
TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
generateNumberParts, catenateWords, catenateNumbers, catenateAll,
splitOnCaseChange, preserveOriginal, splitOnNumerics,
stemEnglishPossessive, protWords);
assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
posIncs);
  }
{code}


Summary: wrong tokens output from WordDelimiterFilter depending upon 
options  (was: wrong tokens output from WordDelimiterFilter when english 
possessives are in the text)

 wrong tokens output from WordDelimiterFilter depending upon options
 ---

 Key: SOLR-1706
 URL: https://issues.apache.org/jira/browse/SOLR-1706
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

 below you can see that when I have requested to only output numeric 
 concatenations (not words), some words are still sometimes output, ignoring 
 the options I have provided, and even then, in a very inconsistent way.
 {code}
   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42", "AutoCoder" },
 new int[] { 18, 21 },
 new int[] { 20, 30 },
 new int[] { 1, 1 });
   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 

[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-07 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797841#action_12797841
 ] 

Koji Sekiguchi commented on SOLR-1696:
--

Noble, thank you for opening this and attaching the patch! Are you planning to 
commit this shortly? I ask because I'm ready to commit SOLR-1268, which uses the 
old-style config. If you commit this first, I'll rewrite SOLR-1268. Or I can 
assign SOLR-1696 to myself.

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Peter Sturge (JIRA)
Distributed Date Faceting
-

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor


This patch is for adding support for date facets when using distributed 
searches.

Date faceting across multiple machines exposes some time-based issues that 
anyone interested in this behaviour should be aware of:
Any time and/or time-zone differences are not accounted for in the patch (i.e. 
merged date facets are at a time-of-day, not necessarily at a universal 
'instant-in-time', unless all shards are time-synced to the exact same time).
The implementation uses the first encountered shard's facet_dates as the basis 
for subsequent shards' data to be merged in.
This means that if subsequent shards' facet_dates are skewed in relation to the 
first by 1 'gap', these 'earlier' or 'later' facets will not be merged in.
There are several reasons for this:
  * Performance: It's faster to check facet_date lists against a single map's 
data, rather than against each other, particularly if there are many shards
  * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
time range larger than that which was requested
(e.g. a request for one hour's worth of facets could bring back 2, 3 or 
more hours of data)
This could be dealt with if timezone and skew information was added, and 
the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 
'now' parameters to the 'facet_dates' map. This would tell requesters what time 
and TZ the remote server thinks it is, and so multiple shards' time data can be 
normalized.
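
Purely as an illustration of the kind of normalization those optional 'timezone'/'now' parameters would enable, a small sketch (the names and the skew arithmetic are mine, not part of the patch):

{code}
import java.util.Date;

public class FacetDateSkewSketch {
  /**
   * Shift a date reported by a shard onto the coordinator's clock, assuming the
   * shard also reported what it thought "now" was when it computed its facets.
   */
  public static Date normalize(Date shardFacetDate, Date shardNow, Date coordinatorNow) {
    long skewMillis = coordinatorNow.getTime() - shardNow.getTime();
    return new Date(shardFacetDate.getTime() + skewMillis);
  }
}
{code}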

The patch affects 2 files in the Solr core:
  org.apache.solr.handler.component.FacetComponent.java
  org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the 
completed SimpleOrderedMap until the finishStage.
One possible enhancement is to perhaps make this an optional parameter, but 
really, if facet.date parameters are specified, it is assumed they are desired.
Comments & suggestions welcome.

As a favour to ask: if anyone could take my 2 source files and create a PATCH 
file from them, it would be greatly appreciated, as I'm having a bit of trouble 
with svn (don't shoot me, but my environment is a Redmond-based OS company).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797898#action_12797898
 ] 

Jason Rutherglen commented on SOLR-1709:


Tim,

Thanks for the patch...

bq. as I'm having a bit of trouble with svn (don't shoot me, but my environment 
is a Redmond-based os company).

TortoiseSVN works well on Windows, even for creating patches.  Have you tried 
it?  



 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor

 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal: Solr-trunk #1024

2010-01-07 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1024/changes