[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-01-07 Thread Noble Paul (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797572#action_12797572
 ] 

Noble Paul commented on SOLR-1602:
--

bq. but there have also been some threads out there in the past pointing out 
that using FQNs can speed up core initialization 

This was resolved by SOLR-921.

 Refactor SOLR package structure to include o.a.solr.response and move 
 QueryResponseWriters in there
 ---

 Key: SOLR-1602
 URL: https://issues.apache.org/jira/browse/SOLR-1602
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
 SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config


 Currently all o.a.solr.request.QueryResponseWriter implementations are 
 curiously located in the o.a.solr.request package. Not only is this package 
 getting big (30+ classes); many of the classes in it are also misplaced. There 
 should be a first-class o.a.solr.response package, and the response-related 
 classes should be given a home there. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1653) add PatternReplaceCharFilter

2010-01-07 Thread Paul taylor (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797601#action_12797601
 ] 

Paul taylor commented on SOLR-1653:
---

Hi, I'm using this outside Solr in an analyzer, and I think there may be a 
performance issue because you cannot pass a compiled Pattern. In the 
reusableTokenStream() method you cannot reset a CharFilter like you can a 
Tokenizer, so it has to recompile the pattern every time,

i.e. 
  public TokenStream reusableTokenStream(String fieldName, Reader reader) throws IOException {
    SavedStreams streams = (SavedStreams) getPreviousTokenStream();
    if (streams == null) {
      streams = new SavedStreams();
      setPreviousTokenStream(streams);
      streams.tokenStream = new StandardTokenizer(Version.LUCENE_CURRENT,
          new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
      streams.filteredTokenStream = new StandardFilter(streams.tokenStream);
      streams.filteredTokenStream = new AccentFilter(streams.filteredTokenStream);
      streams.filteredTokenStream = new LowerCaseFilter(streams.filteredTokenStream);
    }
    else {
      streams.tokenStream.reset(new PatternReplaceCharFilter("(no\\.) ([0-9]+)", "$1$2", reader));
    }
    return streams.filteredTokenStream;
  }
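
For reference, here is a minimal, self-contained sketch of the cost described above: compiling the regex on every call versus compiling it once and reusing it. The class and the loop are purely illustrative and are not part of the attached patch.

{code}
import java.util.regex.Pattern;

public class RegexCompileCostSketch {
  // compiled once, the way a cached Pattern could be reused across calls
  private static final Pattern CACHED = Pattern.compile("(no\\.) ([0-9]+)");

  public static void main(String[] args) {
    String input = "no. 42";
    long t0 = System.nanoTime();
    for (int i = 0; i < 100000; i++) {
      // what reusableTokenStream() effectively does today: compile per call
      Pattern.compile("(no\\.) ([0-9]+)").matcher(input).replaceAll("$1$2");
    }
    long t1 = System.nanoTime();
    for (int i = 0; i < 100000; i++) {
      CACHED.matcher(input).replaceAll("$1$2");
    }
    long t2 = System.nanoTime();
    System.out.println("compile per call: " + (t1 - t0) / 1000000 + " ms, cached: "
        + (t2 - t1) / 1000000 + " ms");
  }
}
{code}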

 add PatternReplaceCharFilter
 

 Key: SOLR-1653
 URL: https://issues.apache.org/jira/browse/SOLR-1653
 Project: Solr
  Issue Type: New Feature
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Koji Sekiguchi
Assignee: Koji Sekiguchi
Priority: Minor
 Fix For: 1.5

 Attachments: SOLR-1653.patch, SOLR-1653.patch


 Add a new CharFilter that uses a regular expression to match the target string 
 to be replaced in the character stream.
 Usage:
 {code:title=schema.xml}
 <fieldType name="textCharNorm" class="solr.TextField" positionIncrementGap="100">
   <analyzer>
     <charFilter class="solr.PatternReplaceCharFilterFactory"
                 groupedPattern="([nN][oO]\.)\s*(\d+)"
                 replaceGroups="1,2" blockDelimiters=":;"/>
     <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
   </analyzer>
 </fieldType>
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1708) Allowing import / update of a specific document using the data import handler

2010-01-07 Thread Simon Lachinger (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Simon Lachinger updated SOLR-1708:
--

Attachment: 02-single-update.patch

 Allowing import / update of a specific document using the data import handler
 -

 Key: SOLR-1708
 URL: https://issues.apache.org/jira/browse/SOLR-1708
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Simon Lachinger
 Attachments: 02-single-update.patch


 There is a need for changes or new documents to be added to the Solr index 
 immediately. This could easily be done via the update handler; however, when 
 using the DataImportHandler it shouldn't be necessary to specify the data 
 extraction for the DataImportHandler and then also duplicate it by feeding the 
 documents into the update handler. It should be centralized.
 Having to run a delta query to identify changes, when the IDs of the updated 
 documents are already known to the application, is a rather costly (in terms of 
 database load) way to solve this.
 The attached patch allows one or more query parameters, named 'root-pk', to be 
 specified for the delta-import command, identifying the document(s) to be 
 updated or added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1708) Allowing import / update of a specific document using the data import handler

2010-01-07 Thread Simon Lachinger (JIRA)
Allowing import / update of a specific document using the data import handler
-

 Key: SOLR-1708
 URL: https://issues.apache.org/jira/browse/SOLR-1708
 Project: Solr
  Issue Type: New Feature
  Components: contrib - DataImportHandler
Affects Versions: 1.4
Reporter: Simon Lachinger
 Attachments: 02-single-update.patch

There is a need for changes or new documents to be added to the Solr index 
immediately. This could easily be done via the update handler; however, when 
using the DataImportHandler it shouldn't be necessary to specify the data 
extraction for the DataImportHandler and then also duplicate it by feeding the 
documents into the update handler. It should be centralized.

Having to run a delta query to identify changes, when the IDs of the updated 
documents are already known to the application, is a rather costly (in terms of 
database load) way to solve this.

The attached patch allows one or more query parameters, named 'root-pk', to be 
specified for the delta-import command, identifying the document(s) to be 
updated or added.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

2010-01-07 Thread Attila Babo
While inserting a large pile of documents using
StreamingUpdateSolrServer I've found a race condition in which all Runner
instances stopped while the blocking queue was full. The attached
patch solves the problem; to keep it small, all indentation has been
removed.

Index: 
src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java
===
--- src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java  
(revision
888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.java  
(working
copy)
@@ -82,6 +82,7 @@
   log.info( "starting runner: {}", this );
   PostMethod method = null;
   try {
+do {
 RequestEntity request = new RequestEntity() {
   // we don't know the length
   public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
   msg.append( "request: " + method.getURI() );
   handleError( new Exception( msg.toString() ) );
 }
+}  while( ! queue.isEmpty());
   }
   catch (Throwable e) {
 handleError( e );
@@ -149,6 +151,7 @@
   finally {
 try {
   // make sure to release the connection
+  if(method != null)
   method.releaseConnection();
 }
 catch( Exception ex ){}
@@ -195,11 +198,11 @@

   queue.put( req );

+synchronized( runners ) {
   if( runners.isEmpty()
|| (queue.remainingCapacity() < queue.size()
   && runners.size() < threadCount) )
   {
-synchronized( runners ) {
   Runner r = new Runner();
   scheduler.execute( r );
   runners.add( r );

===

This patch has been tested with millions of documents inserted into Solr;
before that I was unable to inject all of our documents because the
following scenario happened. We have a BlockingQueue called runners to
handle requests; at one point the queue was emptied by the Runner
threads, and they all stopped processing new items while they sent the
collected items to Solr. Solr was busy, so that took a long time, and
during that time the client filled the queue again. As all worker threads
had already been instantiated, there was no way to create new Runners to
handle the queue, so it grew to its upper limit. When the next item was
about to be put into the queue, the put blocked and the race condition
happened.

Patch 1, 2:
Inside the Runner.run method I've added a do/while loop to prevent the
Runner from quitting while there are new requests; this handles the problem
of new requests being added while the Runner is sending the previous batch.

Patch 3:
The null check of the method variable is not strictly necessary, just a
code cleanup.

Patch 4:
The last part of the patch moves the synchronized block outside of the
conditional, to avoid a situation where runners changes while the condition
is being evaluated. The same change is shown as straight-line code below.
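
For readability, here is the post-patch check-then-act section as straight-line code rather than a diff; the identifiers come from the patch above, the rest is just a sketch of the intent:

  synchronized (runners) {
    // the size checks and the decision to start a new Runner now happen under one
    // lock, so another thread cannot change runners between the test and the action
    if (runners.isEmpty()
        || (queue.remainingCapacity() < queue.size() && runners.size() < threadCount)) {
      Runner r = new Runner();
      scheduler.execute(r);
      runners.add(r);
    }
  }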

Your comments and critique are welcome!

Attila


[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors

2010-01-07 Thread Grant Ingersoll (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797666#action_12797666
 ] 

Grant Ingersoll commented on SOLR-1680:
---

Why not broaden this and allow people to pass in their own collectors?  

Also, can you explain a bit more the use case specifically for Field Collapse?  

Alternatively, given something like LUCENE-2127, we may want Solr to be able to 
make query time decisions about what Collector to use.

 Provide an API to specify custom Collectors
 ---

 Key: SOLR-1680
 URL: https://issues.apache.org/jira/browse/SOLR-1680
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
 Fix For: 1.5

 Attachments: field-collapse-core.patch, SOLR-1680.patch


 This issue is dedicated to incorporating field collapsing's changes into Solr's 
 core code. 
 We want to make it possible for components to specify custom Collectors in 
 SolrIndexSearcher methods.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-01-07 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797677#action_12797677
 ] 

Ryan McKinley commented on SOLR-1602:
-

bq. Besides which: even if it's just an example it would be pretty shitty to 
break that example in the very next release.

Agreed -- we will make sure old FQNs work (until the next major release), but 
moving forward, we should remove FQN from schema.xml so this is less of an 
issue in the future.



 Refactor SOLR package structure to include o.a.solr.response and move 
 QueryResponseWriters in there
 ---

 Key: SOLR-1602
 URL: https://issues.apache.org/jira/browse/SOLR-1602
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
 SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config


 Currently all o.a.solr.request.QueryResponseWriter implementations are 
 curiously located in the o.a.solr.request package. Not only is this package 
 getting big (30+ classes); many of the classes in it are also misplaced. There 
 should be a first-class o.a.solr.response package, and the response-related 
 classes should be given a home there. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1602) Refactor SOLR package structure to include o.a.solr.response and move QueryResponseWriters in there

2010-01-07 Thread Ryan McKinley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797678#action_12797678
 ] 

Ryan McKinley commented on SOLR-1602:
-

Noble, this issue is assigned to you; do you want to take care of it?  If not 
I can...

A patch won't work well here, since it will take a few steps in svn to make sure 
the history is maintained:
1. svn move the files to a new location, update references, etc.
2. commit
3. add stub files in the location where the old files were
4. commit

 Refactor SOLR package structure to include o.a.solr.response and move 
 QueryResponseWriters in there
 ---

 Key: SOLR-1602
 URL: https://issues.apache.org/jira/browse/SOLR-1602
 Project: Solr
  Issue Type: Improvement
  Components: Response Writers
Affects Versions: 1.2, 1.3, 1.4
 Environment: independent of environment (code structure)
Reporter: Chris A. Mattmann
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1602.Mattmann.112509.patch.txt, 
 SOLR-1602.Mattmann.112509_02.patch.txt, upgrade_solr_config


 Currently all o.a.solr.request.QueryResponseWriter implementations are 
 curiously located in the o.a.solr.request package. Not only is this package 
 getting big (30+ classes); many of the classes in it are also misplaced. There 
 should be a first-class o.a.solr.response package, and the response-related 
 classes should be given a home there. Patch forthcoming.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**

2010-01-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797687#action_12797687
 ] 

Yonik Seeley commented on SOLR-1707:


True immutability?  What does that mean over Collections.unmodifiableMap()?
And how do we know these are faster or more memory efficient?
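
(For context, the usual distinction drawn between the two, independent of the speed/memory question, is view versus copy: Collections.unmodifiableMap() is a read-only view of the backing map, while the google-collections ImmutableMap.copyOf() takes a defensive copy. A minimal sketch:)

{code}
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import com.google.common.collect.ImmutableMap;

public class ImmutabilitySketch {
  public static void main(String[] args) {
    Map<String, String> backing = new HashMap<String, String>();
    backing.put("k", "v1");

    Map<String, String> view = Collections.unmodifiableMap(backing); // read-only view
    Map<String, String> copy = ImmutableMap.copyOf(backing);         // defensive copy

    backing.put("k", "v2"); // whoever still holds the backing map can mutate it

    System.out.println(view.get("k")); // prints v2 -- the view reflects the change
    System.out.println(copy.get("k")); // prints v1 -- the copy does not
  }
}
{code}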

 Use google collections immutable collections instead of 
 Collections.unmodifiable**
 --

 Key: SOLR-1707
 URL: https://issues.apache.org/jira/browse/SOLR-1707
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1707.patch


 Google Collections offers true immutability and better memory efficiency.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1707) Use google collections immutable collections instead of Collections.unmodifiable**

2010-01-07 Thread Yonik Seeley (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797713#action_12797713
 ] 

Yonik Seeley commented on SOLR-1707:


OK, I whipped up a quick test with String keys, many small maps (anywhere from 
1 to 20 keys per map).  Java6 -server 64 bit, Win7_x64

Size:
 Collections.unmodifiableMap:  7.4% bigger than HashMap
  google immutable map: 22.4% bigger than HashMap

Speed:
  Collections.unmodifiableMap: 4.2% slower than HashMap
  google immutable map:  26.0% slower than HashMap

For best space and speed, looks like we should stick with straight HashMap.

 Use google collections immutable collections instead of 
 Collections.unmodifiable**
 --

 Key: SOLR-1707
 URL: https://issues.apache.org/jira/browse/SOLR-1707
 Project: Solr
  Issue Type: Improvement
Reporter: Noble Paul
Assignee: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1707.patch


 Google Collections offers true immutability and better memory efficiency.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-01-07 Thread Patrick Jungermann (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797716#action_12797716
 ] 

Patrick Jungermann commented on SOLR-236:
-

Hi all,

we are using Solr's trunk with the latest patch from {{2009-12-24 09:54 AM}}. 
Within the index, there are ~3.5 million documents with string-based 
identifiers of lengths up to 50 chars.

The result document of our prefix query, which was at position 1 without 
collapsing, was not even within the top 10 results with collapsing. We were 
using the option {{collapse.maxdocs=150}}; after changing this option to the 
value 15000, the results seem to be as expected. Because of that, we concluded 
that there has to be a problem with the sorting of the uncollapsed docset.


Also, we noticed a huge memory leak problem when using collapsing. We 
configured the component with {{<searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>}}.
Without setting the option {{collapse.field}}, it works normally and there are 
no memory problems at all. If requests with collapsing enabled are received by 
the Solr server, the whole memory (old gen could not be freed; eden space is 
heavily in use; ...) fills up after a few requests. Using a profiler, we noticed 
that the filterCache was extraordinarily large. We suspected that there could be 
a caching problem (the collapseCache was not enabled).


Additionally, it would be very useful if the parameter {{collapse=true|false}} 
worked again and could be used to enable/disable the collapsing functionality. 
Currently, the presence of a field chosen for collapsing enables this feature, 
and there is no way to configure the fields for collapsing within the request 
handlers. With that parameter, we could configure collapsing once and only 
enable/disable it per request, as is conveniently done with other components 
(highlighting, faceting, ...).


Patrick

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated 'more documents from 
 this site' link. See also duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&id=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on source code)
 - Test cases
 Two patches:
 - field_collapsing.patch for current development version
 - field_collapsing_1.1.0.patch for Solr-1.1.0
 P.S.: Feedback and misspelling corrections are welcome ;-)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors

2010-01-07 Thread Shalin Shekhar Mangar (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797771#action_12797771
 ] 

Shalin Shekhar Mangar commented on SOLR-1680:
-

bq. Why not broaden this and allow people to pass in their own collectors? 

Yes, that is the general idea, though it would be API driven rather than 
configuration driven. Any component should be able to pass a Collector to the 
various SolrIndexSearcher methods.

bq. Also, can you explain a bit more the use case specifically for Field 
Collapse? 

Field Collapsing needs to use a custom collector. Right now the collector is 
hard coded inside SolrIndexSearcher.

bq. Alternatively, given something like LUCENE-2127, we may want Solr to be 
able to make query time decisions about what Collector to use.

I guess that decision should be made by QueryComponent? If so, then the ability 
to pass a custom Collector to SolrIndexSearcher methods should be enough.
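
As a rough illustration of what passing a custom Collector means here, below is a minimal counting Collector written against the Lucene 2.9-style API. This is a sketch only; how such a collector would actually be handed to the SolrIndexSearcher methods is exactly what this issue is about.

{code}
import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

// Counts matching docs without keeping them. A component could pass something
// like this (or a field-collapsing collector) into a SolrIndexSearcher search method.
public class CountingCollector extends Collector {
  private int count;
  private int docBase;

  @Override
  public void setScorer(Scorer scorer) throws IOException {
    // scores are not needed for counting
  }

  @Override
  public void collect(int doc) throws IOException {
    count++; // doc is relative to docBase; the absolute id would be docBase + doc
  }

  @Override
  public void setNextReader(IndexReader reader, int docBase) throws IOException {
    this.docBase = docBase;
  }

  @Override
  public boolean acceptsDocsOutOfOrder() {
    return true;
  }

  public int getCount() {
    return count;
  }
}
{code}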

 Provide an API to specify custom Collectors
 ---

 Key: SOLR-1680
 URL: https://issues.apache.org/jira/browse/SOLR-1680
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
 Fix For: 1.5

 Attachments: field-collapse-core.patch, SOLR-1680.patch


 This issue is dedicated to incorporating field collapsing's changes into Solr's 
 core code. 
 We want to make it possible for components to specify custom Collectors in 
 SolrIndexSearcher methods.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1680) Provide an API to specify custom Collectors

2010-01-07 Thread patrick o'leary (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797774#action_12797774
 ] 

patrick o'leary commented on SOLR-1680:
---

We've just done something like this recently and found the simplest way was to 
modify 
ResponseBuilder with setCustomCollector / getCustomCollector and
update the QueryCommand to include the custom collector.

It gets sticky in SolrIndexSearcher with caching, and IIRC there are about 4 
places that call the collector; the solution works, but is not in any way 
elegant.

It would be good to see if we could refactor SolrIndexSearcher first to make it 
more streamlined.  

 Provide an API to specify custom Collectors
 ---

 Key: SOLR-1680
 URL: https://issues.apache.org/jira/browse/SOLR-1680
 Project: Solr
  Issue Type: Sub-task
  Components: search
Affects Versions: 1.3
Reporter: Martijn van Groningen
 Fix For: 1.5

 Attachments: field-collapse-core.patch, SOLR-1680.patch


 This issue is dedicated to incorporating field collapsing's changes into Solr's 
 core code. 
 We want to make it possible for components to specify custom Collectors in 
 SolrIndexSearcher methods.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-236) Field collapsing

2010-01-07 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797794#action_12797794
 ] 

Martijn van Groningen commented on SOLR-236:


bq. The result document of our prefix query, which was at position 1 without 
collapsing, was with collapsing not even within the top 10 results. We using 
the option collapse.maxdocs=150 and after changing this option to the value 
15000, the results seem to be as expected. Because of that, we concluded, that 
there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing after the threshold is met, but it 
does that based on the uncollapsed docset, which is not sorted in any way. The 
result is that documents that would normally appear on the first page don't 
appear at all in the search result. In the end the collapse component uses the 
collapsed docset as the result set, not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem, when using collapsing. We 
configured the component with <searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>.
Without setting the option collapse.field, it works normally, there are far no 
memory problems. If requests with enabled collapsing are received by the Solr 
server, the whole memory (oldgen could not be freed; eden space is heavily in 
use; ...) gets full after some few requests. By using a profiler, we noticed 
that the filterCache was extraordinary large. We supposed that there could be a 
caching problem (collapeCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field 
collapse cache. This is something that has to be addressed, and certainly will 
be in the new field-collapse implementation. In the patch you're using, too much 
is being cached (some data could even be left out of the cache). Also, in some 
cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally it might be very useful, if the parameter collapse=true|false 
would work again and could be used to enabled/disable the collapsing 
functionality. Currently, the existence of a field choosen for collapsing 
enables this feature and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests like it will be conveniently used by 
other components (highlighting, faceting, ...).

That is actually a good argument for bringing back the collapse.enable parameter 
in the patch. 

Martijn

 Field collapsing
 

 Key: SOLR-236
 URL: https://issues.apache.org/jira/browse/SOLR-236
 Project: Solr
  Issue Type: New Feature
  Components: search
Affects Versions: 1.3
Reporter: Emmanuel Keller
Assignee: Shalin Shekhar Mangar
 Fix For: 1.5

 Attachments: collapsing-patch-to-1.3.0-dieter.patch, 
 collapsing-patch-to-1.3.0-ivan.patch, collapsing-patch-to-1.3.0-ivan_2.patch, 
 collapsing-patch-to-1.3.0-ivan_3.patch, field-collapse-3.patch, 
 field-collapse-4-with-solrj.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-5.patch, field-collapse-5.patch, 
 field-collapse-solr-236-2.patch, field-collapse-solr-236.patch, 
 field-collapsing-extended-592129.patch, field_collapsing_1.1.0.patch, 
 field_collapsing_1.3.patch, field_collapsing_dsteigerwald.diff, 
 field_collapsing_dsteigerwald.diff, field_collapsing_dsteigerwald.diff, 
 quasidistributed.additional.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236-FieldCollapsing.patch, SOLR-236-FieldCollapsing.patch, 
 SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, SOLR-236.patch, 
 SOLR-236.patch, solr-236.patch, SOLR-236_collapsing.patch, 
 SOLR-236_collapsing.patch


 This patch includes a new feature called field collapsing.
 It is used to collapse a group of results with a similar value for a given 
 field into a single entry in the result set. Site collapsing is a special case 
 of this, where all results for a given web site are collapsed into one or two 
 entries in the result set, typically with an associated 'more documents from 
 this site' link. See also duplicate detection.
 http://www.fastsearch.com/glossary.aspx?m=48&id=299
 The implementation adds 3 new query parameters (SolrParams):
 collapse.field to choose the field used to group results
 collapse.type normal (default value) or adjacent
 collapse.max to select how many continuous results are allowed before 
 collapsing
 TODO (in progress):
 - More documentation (on 

[jira] Issue Comment Edited: (SOLR-236) Field collapsing

2010-01-07 Thread Martijn van Groningen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-236?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797794#action_12797794
 ] 

Martijn van Groningen edited comment on SOLR-236 at 1/7/10 9:28 PM:


bq. The result document of our prefix query, which was at position 1 without 
collapsing, was with collapsing not even within the top 10 results. We using 
the option collapse.maxdocs=150 and after changing this option to the value 
15000, the results seem to be as expected. Because of that, we concluded, that 
there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs option aborts collapsing after the threshold is met, but it 
does that based on the uncollapsed docset, which is not sorted in any way. The 
result is that documents that would normally appear on the first page don't 
appear at all in the search result. In the end the collapse component uses the 
collapsed docset as the result set, not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem, when using collapsing. We 
configured the component with <searchComponent name="query" 
class="org.apache.solr.handler.component.CollapseComponent"/>. Without setting 
the option collapse.field, it works normally, there are far no memory problems. 
If requests with enabled collapsing are received by the Solr server, the whole 
memory (oldgen could not be freed; eden space is heavily in use; ...) gets full 
after some few requests. By using a profiler, we noticed that the filterCache 
was extraordinary large. We supposed that there could be a caching problem 
(collapeCache was not enabled).

I agree it gets huge. This applies to both the filterCache and the field 
collapse cache. This is something that has to be addressed, and certainly will 
be in the new field-collapse implementation. In the patch you're using, too much 
is being cached (some data could even be left out of the cache). Also, in some 
cases strings are being cached that could actually be replaced with hashcodes.

bq. Additionally it might be very useful, if the parameter collapse=true|false 
would work again and could be used to enabled/disable the collapsing 
functionality. Currently, the existence of a field choosen for collapsing 
enables this feature and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests like it will be conveniently used by 
other components (highlighting, faceting, ...).

That is actually a good argument for bringing back the collapse.enable parameter 
in the patch. 

Martijn

  was (Author: martijn):
bq. The result document of our prefix query, which was at position 1 
without collapsing, was with collapsing not even within the top 10 results. We 
using the option collapse.maxdocs=150 and after changing this option to the 
value 15000, the results seem to be as expected. Because of that, we concluded, 
that there has to be a problem with the sorting of the uncollapsed docset.

The collapse.maxdocs aborts collapsing after the threshold is met, but it is 
doing that based on the uncollapsed docset which is not sorted in any way. The 
result of that is that documents that would normally appear in the first page 
don't appear at all in the search result. Eventually the collapse component 
uses the collapsed docset as the result set and not the uncollapsed docset.

bq. Also, we noticed a huge memory leak problem, when using collapsing. We 
configured the component with searchComponent name=query 
class=org.apache.solr.handler.component.CollapseComponent/.
Without setting the option collapse.field, it works normally, there are far no 
memory problems. If requests with enabled collapsing are received by the Solr 
server, the whole memory (oldgen could not be freed; eden space is heavily in 
use; ...) gets full after some few requests. By using a profiler, we noticed 
that the filterCache was extraordinary large. We supposed that there could be a 
caching problem (collapeCache was not enabled).

I agree it gets huge. This applies for both the filterCache and field collapse 
cache. This is something that has to be addressed and certainly will in the new 
field-collapse implementation. In the patch you're using too much is being 
cached (some data can even be neglected in the cache). Also in some cases 
strings are being cached that actually could be replaced with hashcodes.

bq. Additionally it might be very useful, if the parameter collapse=true|false 
would work again and could be used to enabled/disable the collapsing 
functionality. Currently, the existence of a field choosen for collapsing 
enables this feature and there is no possibility to configure the fields for 
collapsing within the request handlers. With that, we could configure it and 
only enable/disable it within the requests like it will be 

Re: Race condition in org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer

2010-01-07 Thread Ryan McKinley

Can you submit a patch to JIRA?


On Jan 7, 2010, at 10:23 AM, Attila Babo wrote:


While inserting a large pile of documents using
StreamingUpdateSolrServer I've found a race condition as all Runner
instances stopped while the blocking queue was full. The attached
patch solves the problem, to minify it all indentation has been
removed.

Index: src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java

===
--- src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java	(revision

888167)
+++ src/solrj/org/apache/solr/client/solrj/impl/ 
StreamingUpdateSolrServer.java	(working

copy)
@@ -82,6 +82,7 @@
  log.info( "starting runner: {}", this );
  PostMethod method = null;
  try {
+do {
RequestEntity request = new RequestEntity() {
  // we don't know the length
  public long getContentLength() { return -1; }
@@ -142,6 +143,7 @@
  msg.append( "request: " + method.getURI() );
  handleError( new Exception( msg.toString() ) );
}
+}  while( ! queue.isEmpty());
  }
  catch (Throwable e) {
handleError( e );
@@ -149,6 +151,7 @@
  finally {
try {
  // make sure to release the connection
+  if(method != null)
  method.releaseConnection();
}
catch( Exception ex ){}
@@ -195,11 +198,11 @@

  queue.put( req );

+synchronized( runners ) {
  if( runners.isEmpty()
|| (queue.remainingCapacity() < queue.size()
  && runners.size() < threadCount) )
  {
-synchronized( runners ) {
  Runner r = new Runner();
  scheduler.execute( r );
  runners.add( r );

===

This patch has been tested with millions of document inserted to Solr,
before that I was unable to inject all of our documents as the
following scenario happened. We have a BlockingQueue called runners to
handle requests, at one point the queue was emptied by the Runner
threads, they all stopped processing new items but sent the collected
items to Solr. Solr was busy so that toke a long time, during that the
client filled the queue again. As all worker threads were instantiated
there were no way to create new Runners to handle the queue so it was
growing to upper limit. When the next item was about to put into the
queue it was blocked and the race condition just happened.

Patch 1, 2:
Inside the Runner.run method I've added a do while loop to prevent the
Runner to quit while there are new requests, this handles the problem
of new requests added while Runner is sending the previous batch.

Patch 3
Validity check of method variable is not strictly necessary, just a
code clean up.

Patch 4
The last part of the patch is to move synchronized outside of
conditional to avoid a situation where runners change while evaluating
it.

Your comments and critique are welcome!

Attila




[jira] Updated: (SOLR-1698) load balanced distributed search

2010-01-07 Thread Yonik Seeley (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1698?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yonik Seeley updated SOLR-1698:
---

Attachment: SOLR-1698.patch

Attaching new patch, still limited to LBHttpSolrServer at this point.
- includes tests
- adds a new expert-level API:
   public Rsp request(Req req) throws SolrServerException, IOException
   I chose objects (Rsp and Req) since I imagine we will need to continue to 
add new parameters and controls to both the request and the response (esp the 
request... things like timeout, max number of servers to query, etc).  The Rsp 
also contains info about which server returned the response and will allow us 
to stick with the same server for all phases of a distributed request.
- adds the concept of standard servers (those provided by the constructor or 
addServer)... a server on the zombie list that isn't a standard server won't be 
added to the alive list if it wakes up, and will not be pinged forever.
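
A rough usage sketch of the new expert-level API described above. The Req/Rsp names come from the patch, but the constructor and accessor shapes shown here are assumptions; only the attached patch is authoritative.

{code}
import java.util.Arrays;

import org.apache.solr.client.solrj.impl.LBHttpSolrServer;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer.Req;
import org.apache.solr.client.solrj.impl.LBHttpSolrServer.Rsp;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.common.params.ModifiableSolrParams;

public class LBRequestSketch {
  public static void main(String[] args) throws Exception {
    // "standard" servers, registered via the constructor
    LBHttpSolrServer lb = new LBHttpSolrServer(
        "http://host1:8983/solr", "http://host2:8983/solr");

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");

    // Req wraps the request plus the servers to try for this call (assumed shape);
    // Rsp is assumed to report which server actually answered
    Req req = new Req(new QueryRequest(params),
        Arrays.asList("http://host1:8983/solr", "http://host2:8983/solr"));
    Rsp rsp = lb.request(req);
    System.out.println("served by: " + rsp.getServer());
  }
}
{code}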


 load balanced distributed search
 

 Key: SOLR-1698
 URL: https://issues.apache.org/jira/browse/SOLR-1698
 Project: Solr
  Issue Type: Improvement
Reporter: Yonik Seeley
 Attachments: SOLR-1698.patch, SOLR-1698.patch


 Provide syntax and implementation of load-balancing across shard replicas.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (SOLR-1672) RFE: facet reverse sort count

2010-01-07 Thread Peter Sturge (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Sturge resolved SOLR-1672.


Resolution: Fixed

Marking as resolved.


 RFE: facet reverse sort count
 -

 Key: SOLR-1672
 URL: https://issues.apache.org/jira/browse/SOLR-1672
 Project: Solr
  Issue Type: Improvement
  Components: search
Affects Versions: 1.4
 Environment: Java, Solrj, http
Reporter: Peter Sturge
Priority: Minor
 Attachments: SOLR-1672.patch

   Original Estimate: 0h
  Remaining Estimate: 0h

 As suggested by Chris Hostetter, I have added an optional Comparator to the 
 BoundedTreeSet<Long> in the UnInvertedField class.
 This optional comparator is used when a new (and also optional) field facet 
 parameter called 'facet.sortorder' is set to the string 'dsc' 
 (e.g. f.facetname.facet.sortorder=dsc for per field, or 
 facet.sortorder=dsc for all facets).
 Note that this parameter has no effect if facet.method=enum.
 Any value other than 'dsc' (including no value) reverts the BoundedTreeSet to 
 its default behaviour.
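 A hypothetical SolrJ request using the new parameter, purely to illustrate how 
 it would be set ("price" is just an example field here; SolrQuery itself is 
 standard SolrJ):
 {code}
 import org.apache.solr.client.solrj.SolrQuery;

 public class FacetSortOrderExample {
   public static SolrQuery build() {
     SolrQuery q = new SolrQuery("*:*");
     q.setFacet(true);
     q.addFacetField("price");
     q.set("f.price.facet.sortorder", "dsc"); // per-field form described above
     // q.set("facet.sortorder", "dsc");      // or for all facets
     return q;
   }
 }
 {code}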
  
 This change affects 2 source files:
  UnInvertedField.java
 [line 438] The getCounts() method signature is modified to add the 
 'facetSortOrder' parameter value to the end of the argument list.
  
 DIFF UnInvertedField.java:
 - public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix) throws IOException {
 + public NamedList getCounts(SolrIndexSearcher searcher, DocSet baseDocs, int 
 offset, int limit, Integer mincount, boolean missing, String sort, String 
 prefix, String facetSortOrder) throws IOException {
 [line 556] The getCounts() method is modified to create an overridden 
 BoundedTreeSet<Long>(int, Comparator) if the 'facetSortOrder' parameter 
 equals 'dsc'.
 DIFF UnInvertedField.java:
 - final BoundedTreeSet<Long> queue = new BoundedTreeSet<Long>(maxsize);
 + final BoundedTreeSet<Long> queue = (sort.equals("count") || 
 sort.equals("true")) ? (facetSortOrder.equals("dsc") ? new 
 BoundedTreeSet<Long>(maxsize, new Comparator()
 { @Override
 public int compare(Object o1, Object o2)
 {
   if (o1 == null || o2 == null)
 return 0;
   int result = ((Long) o1).compareTo((Long) o2);
   return (result != 0 ? (result < 0 ? -1 : 1) : 0); // lowest number first sort
 }}) : new BoundedTreeSet<Long>(maxsize)) : null;
  SimpleFacets.java
 [line 221] A getFieldParam(field, "facet.sortorder", "asc"); call is added to 
 retrieve the new parameter, if present. 'asc' used as a default value.
 DIFF SimpleFacets.java:
 + String facetSortOrder = params.getFieldParam(field, "facet.sortorder", 
 "asc");
  
 [line 253] The call to uif.getCounts() in the getTermCounts() method is 
 modified to pass the 'facetSortOrder' value string.
 DIFF SimpleFacets.java:
 - counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix);
 + counts = uif.getCounts(searcher, base, offset, limit, 
 mincount,missing,sort,prefix, facetSortOrder);
 Implementation Notes:
 I have noted in testing that I was not able to retrieve any '0' counts as I 
 had expected.
 I believe this could be because there appear to be some optimizations in 
 SimpleFacets/count caching such that zero counts are not iterated (at least 
 not by default)
 as a performance enhancement.
 I could be wrong about this, and zero counts may appear under some other as 
 yet untested circumstances. Perhaps an expert familiar with this part of the 
 code can clarify.
 In fact, this is not such a bad thing (at least for my requirements), as a 
 whole bunch of zero counts is not necessarily useful (for my requirements, 
 starting at '1' is just right).
  
 There may, however, be instances where someone *will* want zero counts - e.g. 
 searching for zero product stock counts (e.g. 'what have we run out of'). I 
 was envisioning the facet.mincount field
 being the preferred place to set where the 'lowest value' begins (e.g. 0 or 1 
 or possibly higher), but because of the caching/optimization, the behaviour 
 is somewhat different than expected.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1706) wrong tokens output from WordDelimiterFilter when english possessives are in the text

2010-01-07 Thread Robert Muir (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797829#action_12797829
 ] 

Robert Muir commented on SOLR-1706:
---

It's not just the concatenation, but also the subword generation.

In the case below, Autocoder should not be emitted, as only numeric subword 
generation is turned on.

{code}
  public void test128() throws Exception {
    assertWdf("word 1234 Super-Duper-XL500-42-Autocoder x'sbd123 a4b3c-", 
      0,1,0,0,0,0,0,0,0, null,
      new String[] { "word", "1234", "42", "Autocoder", "a4b3c" },
      new int[] { 0, 5, 28, 31, 50 },
      new int[] { 4, 9, 30, 40, 55 },
      new int[] { 1, 1, 1, 1, 2 });
  }
{code}

 wrong tokens output from WordDelimiterFilter when english possessives are in 
 the text
 -

 Key: SOLR-1706
 URL: https://issues.apache.org/jira/browse/SOLR-1706
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

 the WordDelimiterFilter English possessive stemming ('s removal, on by 
 default) unfortunately causes strange behavior:
 below you can see that when I have requested to only output numeric 
 concatenations (not words), these English possessive stems are still 
 sometimes output, ignoring the options I have provided, and even then, in a 
 very inconsistent way.
 {code}
   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42", "AutoCoder" },
 new int[] { 18, 21 },
 new int[] { 20, 30 },
 new int[] { 1, 1 });
   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42", "AutoCoder", "56" },
 new int[] { 18, 21, 33 },
 new int[] { 20, 30, 35 },
 new int[] { 1, 1, 1 });
   assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
 new String[] { },
 new int[] { },
 new int[] { },
 new int[] { });
   assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42" },
 new int[] { 18 },
 new int[] { 20 },
 new int[] { 1 });
 {code}
 where assertWdf is 
 {code}
   void assertWdf(String text, int generateWordParts, int generateNumberParts,
   int catenateWords, int catenateNumbers, int catenateAll,
   int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
   int stemEnglishPossessive, CharArraySet protWords, String expected[],
   int startOffsets[], int endOffsets[], String types[], int posIncs[])
   throws IOException {
 TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
 WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
 generateNumberParts, catenateWords, catenateNumbers, catenateAll,
 splitOnCaseChange, preserveOriginal, splitOnNumerics,
 stemEnglishPossessive, protWords);
 assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
 posIncs);
   }
 {code}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Updated: (SOLR-1706) wrong tokens output from WordDelimiterFilter depending upon options

2010-01-07 Thread Robert Muir (JIRA)

 [ 
https://issues.apache.org/jira/browse/SOLR-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Robert Muir updated SOLR-1706:
--

Description: 
below you can see that when I have requested to only output numeric 
concatenations (not words), some words are still sometimes output, ignoring the 
options I have provided, and even then, in a very inconsistent way.

{code}
  assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
new String[] { "42", "AutoCoder" },
new int[] { 18, 21 },
new int[] { 20, 30 },
new int[] { 1, 1 });

  assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 0,0,0,1,0,0,0,0,1, null,
new String[] { "42", "AutoCoder", "56" },
new int[] { 18, 21, 33 },
new int[] { 20, 30, 35 },
new int[] { 1, 1, 1 });

  assertWdf("Super-Duper-XL500-AB-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
new String[] { },
new int[] { },
new int[] { },
new int[] { });

  assertWdf("Super-Duper-XL500-42-AutoCoder's-BC", 0,0,0,1,0,0,0,0,1, null,
new String[] { "42" },
new int[] { 18 },
new int[] { 20 },
new int[] { 1 });
{code}

where assertWdf is 
{code}
  void assertWdf(String text, int generateWordParts, int generateNumberParts,
  int catenateWords, int catenateNumbers, int catenateAll,
  int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
  int stemEnglishPossessive, CharArraySet protWords, String expected[],
  int startOffsets[], int endOffsets[], String types[], int posIncs[])
  throws IOException {
TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
generateNumberParts, catenateWords, catenateNumbers, catenateAll,
splitOnCaseChange, preserveOriginal, splitOnNumerics,
stemEnglishPossessive, protWords);
assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
posIncs);
  }
{code}


  was:
the WordDelimiterFilter english possessive stemming 's  removal (on by 
default) unfortunately causes strange behavior:

below you can see that when I have requested to only output numeric 
concatenations (not words), these english possessive stems are still sometimes 
output, ignoring the options i have provided, and even then, in a very 
inconsistent way.

{code}
  assertWdf(Super-Duper-XL500-42-AutoCoder's, 0,0,0,1,0,0,0,0,1, null,
new String[] { 42, AutoCoder },
new int[] { 18, 21 },
new int[] { 20, 30 },
new int[] { 1, 1 });

  assertWdf(Super-Duper-XL500-42-AutoCoder's-56, 0,0,0,1,0,0,0,0,1, null,
new String[] { 42, AutoCoder, 56 },
new int[] { 18, 21, 33 },
new int[] { 20, 30, 35 },
new int[] { 1, 1, 1 });

  assertWdf(Super-Duper-XL500-AB-AutoCoder's, 0,0,0,1,0,0,0,0,1, null,
new String[] {  },
new int[] {  },
new int[] {  },
new int[] {  });

  assertWdf(Super-Duper-XL500-42-AutoCoder's-BC, 0,0,0,1,0,0,0,0,1, null,
new String[] { 42 },
new int[] { 18 },
new int[] { 20 },
new int[] { 1 });
{code}

where assertWdf is 
{code}
  void assertWdf(String text, int generateWordParts, int generateNumberParts,
  int catenateWords, int catenateNumbers, int catenateAll,
  int splitOnCaseChange, int preserveOriginal, int splitOnNumerics,
  int stemEnglishPossessive, CharArraySet protWords, String expected[],
  int startOffsets[], int endOffsets[], String types[], int posIncs[])
  throws IOException {
TokenStream ts = new WhitespaceTokenizer(new StringReader(text));
WordDelimiterFilter wdf = new WordDelimiterFilter(ts, generateWordParts,
generateNumberParts, catenateWords, catenateNumbers, catenateAll,
splitOnCaseChange, preserveOriginal, splitOnNumerics,
stemEnglishPossessive, protWords);
assertTokenStreamContents(wdf, expected, startOffsets, endOffsets, types,
posIncs);
  }
{code}


Summary: wrong tokens output from WordDelimiterFilter depending upon 
options  (was: wrong tokens output from WordDelimiterFilter when english 
possessives are in the text)

 wrong tokens output from WordDelimiterFilter depending upon options
 ---

 Key: SOLR-1706
 URL: https://issues.apache.org/jira/browse/SOLR-1706
 Project: Solr
  Issue Type: Bug
  Components: Schema and Analysis
Affects Versions: 1.4
Reporter: Robert Muir

 below you can see that when I have requested to only output numeric 
 concatenations (not words), some words are still sometimes output, ignoring 
 the options I have provided, and even then, in a very inconsistent way.
 {code}
   assertWdf("Super-Duper-XL500-42-AutoCoder's", 0,0,0,1,0,0,0,0,1, null,
 new String[] { "42", "AutoCoder" },
 new int[] { 18, 21 },
 new int[] { 20, 30 },
 new int[] { 1, 1 });
   assertWdf("Super-Duper-XL500-42-AutoCoder's-56", 

[jira] Commented: (SOLR-1696) Deprecate old highlighting syntax and move configuration to HighlightComponent

2010-01-07 Thread Koji Sekiguchi (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797841#action_12797841
 ] 

Koji Sekiguchi commented on SOLR-1696:
--

Noble, thank you for opening this and attaching the patch! Are you planning to 
commit this shortly? I ask because I'm ready to commit SOLR-1268, which uses the 
old-style config. If you commit this first, I'll rewrite SOLR-1268. Or I can 
assign SOLR-1696 to myself.

 Deprecate old highlighting syntax and move configuration to 
 HighlightComponent
 

 Key: SOLR-1696
 URL: https://issues.apache.org/jira/browse/SOLR-1696
 Project: Solr
  Issue Type: Improvement
  Components: highlighter
Reporter: Noble Paul
 Fix For: 1.5

 Attachments: SOLR-1696.patch


 There is no reason why we should have a custom syntax for highlighter 
 configuration.
 It can be treated like any other SearchComponent and all the configuration 
 can go in there.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Peter Sturge (JIRA)
Distributed Date Faceting
-

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor


This patch is for adding support for date facets when using distributed 
searches.

Date faceting across multiple machines exposes some time-based issues that 
anyone interested in this behaviour should be aware of:
Any time and/or time-zone differences are not accounted for in the patch (i.e. 
merged date facets are at a time-of-day, not necessarily at a universal 
'instant-in-time', unless all shards are time-synced to the exact same time).
The implementation uses the first encountered shard's facet_dates as the basis 
for subsequent shards' data to be merged in.
This means that if subsequent shards' facet_dates are skewed in relation to the 
first by 1 'gap', these 'earlier' or 'later' facets will not be merged in.
There are several reasons for this:
  * Performance: It's faster to check facet_date lists against a single map's 
data, rather than against each other, particularly if there are many shards
  * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
time range larger than that which was requested
(e.g. a request for one hour's worth of facets could bring back 2, 3 or 
more hours of data)
This could be dealt with if timezone and skew information was added, and 
the dates were normalized.
One possibility for adding such support is to [optionally] add 'timezone' and 
'now' parameters to the 'facet_dates' map. This would tell requesters what time 
and TZ the remote server thinks it is, and so multiple shards' time data can be 
normalized.
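
Purely as an illustration of the kind of normalization those optional 'timezone'/'now' parameters would enable, a small sketch (the names and the skew arithmetic are mine, not part of the patch):

{code}
import java.util.Date;

public class FacetDateSkewSketch {
  /**
   * Shift a date reported by a shard onto the coordinator's clock, assuming the
   * shard also reported what it thought "now" was when it computed its facets.
   */
  public static Date normalize(Date shardFacetDate, Date shardNow, Date coordinatorNow) {
    long skewMillis = coordinatorNow.getTime() - shardNow.getTime();
    return new Date(shardFacetDate.getTime() + skewMillis);
  }
}
{code}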

The patch affects 2 files in the Solr core:
  org.apache.solr.handler.component.FacetComponent.java
  org.apache.solr.handler.component.ResponseBuilder.java

The main changes are in FacetComponent - ResponseBuilder is just to hold the 
completed SimpleOrderedMap until the finishStage.
One possible enhancement is to perhaps make this an optional parameter, but 
really, if facet.date parameters are specified, it is assumed they are desired.
Comments & suggestions welcome.

As a favour to ask: if anyone could take my 2 source files and create a PATCH 
file from them, it would be greatly appreciated, as I'm having a bit of trouble 
with svn (don't shoot me, but my environment is a Redmond-based OS company).


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (SOLR-1709) Distributed Date Faceting

2010-01-07 Thread Jason Rutherglen (JIRA)

[ 
https://issues.apache.org/jira/browse/SOLR-1709?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12797898#action_12797898
 ] 

Jason Rutherglen commented on SOLR-1709:


Tim,

Thanks for the patch...

bq. as I'm having a bit of trouble with svn (don't shoot me, but my environment 
is a Redmond-based os company).

TortoiseSVN works well on Windows, even for creating patches.  Have you tried 
it?  



 Distributed Date Faceting
 -

 Key: SOLR-1709
 URL: https://issues.apache.org/jira/browse/SOLR-1709
 Project: Solr
  Issue Type: Improvement
  Components: SearchComponents - other
Affects Versions: 1.4
Reporter: Peter Sturge
Priority: Minor

 This patch is for adding support for date facets when using distributed 
 searches.
 Date faceting across multiple machines exposes some time-based issues that 
 anyone interested in this behaviour should be aware of:
 Any time and/or time-zone differences are not accounted for in the patch 
 (i.e. merged date facets are at a time-of-day, not necessarily at a universal 
 'instant-in-time', unless all shards are time-synced to the exact same time).
 The implementation uses the first encountered shard's facet_dates as the 
 basis for subsequent shards' data to be merged in.
 This means that if subsequent shards' facet_dates are skewed in relation to 
 the first by 1 'gap', these 'earlier' or 'later' facets will not be merged 
 in.
 There are several reasons for this:
   * Performance: It's faster to check facet_date lists against a single map's 
 data, rather than against each other, particularly if there are many shards
   * If 'earlier' and/or 'later' facet_dates are added in, this will make the 
 time range larger than that which was requested
 (e.g. a request for one hour's worth of facets could bring back 2, 3 
 or more hours of data)
 This could be dealt with if timezone and skew information was added, and 
 the dates were normalized.
 One possibility for adding such support is to [optionally] add 'timezone' and 
 'now' parameters to the 'facet_dates' map. This would tell requesters what 
 time and TZ the remote server thinks it is, and so multiple shards' time data 
 can be normalized.
 The patch affects 2 files in the Solr core:
   org.apache.solr.handler.component.FacetComponent.java
   org.apache.solr.handler.component.ResponseBuilder.java
 The main changes are in FacetComponent - ResponseBuilder is just to hold the 
 completed SimpleOrderedMap until the finishStage.
 One possible enhancement is to perhaps make this an optional parameter, but 
 really, if facet.date parameters are specified, it is assumed they are 
 desired.
 Comments & suggestions welcome.
 As a favour to ask, if anyone could take my 2 source files and create a PATCH 
 file from it, it would be greatly appreciated, as I'm having a bit of trouble 
 with svn (don't shoot me, but my environment is a Redmond-based os company).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



Hudson build is back to normal: Solr-trunk #1024

2010-01-07 Thread Apache Hudson Server
See http://hudson.zones.apache.org/hudson/job/Solr-trunk/1024/changes