Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change 
notification.

The "DocumentProcessing" page has been changed by JanHoydahl.
The comment on this change is: Clarification.
http://wiki.apache.org/solr/DocumentProcessing?action=diff&rev1=12&rev2=13

--------------------------------------------------

  
  = Anti-patterns =
   * Do not over-architect the way Eclipse SMILA and others have done, going 
crazy with ESB etc.
+  * Do not try to be a connector framework as well. Let ManifoldCF do that 
job. Focus on the pipeline!
+  * Do not keep the source private (although Apache licensed) as DieselPoint 
did with OpenPipeline - create a community!
  
  = Proposed architecture =
  
[[https://docs.google.com/drawings/edit?id=1rVsy-p7sexSw3wrald2_fHtkLk6opYp5qzllvOHOB8c&hl=en|Architecture
 diagram]]
@@ -66, +68 @@

  Glue code to hook the pipeline into Solr could be an UpdateRequestProcessor 
which can either work in "local" mode, executing the pipeline locally 
in-thread, or in "distributed" mode which would dispatch the batch to an 
available node in a document processing cluster.
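The two modes can be illustrated with a small, language-neutral sketch. This is not Solr code: `Pipeline`, `ProcessingNode` and `PipelineUpdateProcessor` are hypothetical names invented for the illustration, and the "remote" node is just an in-process object standing in for a network call. Picking the first node flagged as available is likewise only an assumption about what "available" might mean.

```python
from typing import Callable, Dict, List, Optional

Document = Dict[str, object]
Stage = Callable[[Document], Document]


class Pipeline:
    """Runs a fixed sequence of stages over every document in a batch."""

    def __init__(self, stages: List[Stage]) -> None:
        self.stages = list(stages)

    def process(self, batch: List[Document]) -> List[Document]:
        processed = []
        for doc in batch:
            for stage in self.stages:
                doc = stage(doc)
            processed.append(doc)
        return processed


class ProcessingNode:
    """Hypothetical stand-in for a node in the document processing cluster."""

    def __init__(self, name: str, pipeline: Pipeline, available: bool = True) -> None:
        self.name = name
        self.pipeline = pipeline
        self.available = available
        self.batches_handled = 0

    def process(self, batch: List[Document]) -> List[Document]:
        # A real node would receive the batch over the network.
        self.batches_handled += 1
        return self.pipeline.process(batch)


class PipelineUpdateProcessor:
    """Hypothetical glue: 'local' runs in-thread, 'distributed' dispatches."""

    def __init__(self, pipeline: Pipeline, mode: str = "local",
                 nodes: Optional[List[ProcessingNode]] = None) -> None:
        if mode not in ("local", "distributed"):
            raise ValueError(f"unknown mode: {mode}")
        self.pipeline = pipeline
        self.mode = mode
        self.nodes = nodes or []

    def process_batch(self, batch: List[Document]) -> List[Document]:
        if self.mode == "local":
            # Execute the pipeline in the calling thread.
            return self.pipeline.process(batch)
        # Distributed: hand the whole batch to the first available node.
        for node in self.nodes:
            if node.available:
                return node.process(batch)
        raise RuntimeError("no document processing node available")
```

The point of the sketch is that the caller only sees `process_batch`; whether the work happens in-thread or on another node is a configuration choice.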
  
  I envision that the whole pipeline could (in addition to running standalone) 
be wrapped in a Solr RequestHandler, i.e. a document-processing-only node would 
be an instance of Solr with a new BinaryDocumentRequestHandler, without a local 
index. When processing is finished, the documents are routed to the final 
destination for indexing (perhaps using 
[[https://issues.apache.org/jira/browse/SOLR-2358|SOLR-2358]]).
+ 
+ The architecture diagram above shows the local and the fully distributed 
cases. Another option would be to feed documents round-robin to the set of 
pipeline nodes directly (not needing a BinaryDocumentRequestHandler), and let 
them do the distributed indexing as the last UpdateProcessor.
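As a rough illustration of the round-robin option, the sketch below rotates batches over a fixed set of pipeline nodes, each of which indexes as its final step. All names (`PipelineNode`, `RoundRobinFeeder`, `index_batch`) are hypothetical and no real Solr API is used; the indexing step is an arbitrary callable standing in for distributed indexing.

```python
from itertools import cycle
from typing import Callable, Dict, List

Document = Dict[str, object]
Stage = Callable[[Document], Document]


class PipelineNode:
    """Hypothetical pipeline node that indexes as its last step."""

    def __init__(self, name: str, stages: List[Stage],
                 index_batch: Callable[[List[Document]], None]) -> None:
        self.name = name
        self.stages = list(stages)
        self.index_batch = index_batch

    def feed(self, batch: List[Document]) -> None:
        processed = []
        for doc in batch:
            for stage in self.stages:
                doc = stage(doc)
            processed.append(doc)
        # Last step: hand the processed documents on for indexing.
        self.index_batch(processed)


class RoundRobinFeeder:
    """Sends each batch to the next node in a fixed rotation."""

    def __init__(self, nodes: List[PipelineNode]) -> None:
        if not nodes:
            raise ValueError("at least one pipeline node is required")
        self._rotation = cycle(nodes)

    def feed(self, batch: List[Document]) -> str:
        node = next(self._rotation)
        node.feed(batch)
        return node.name  # name of the node that handled the batch
```

Plain rotation ignores node load and failures; a real feeder would need to account for both.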
  
  = Risks =
   * Automated distributed indexing 
[[https://issues.apache.org/jira/browse/SOLR-2358|SOLR-2358]] needs to work 
with this
