[ https://issues.apache.org/jira/browse/SOLR-1045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12678200#action_12678200 ]
Ning Li commented on SOLR-1045:
-------------------------------
The purpose of this simple initial version is to give people an idea of the
functionality. It uses Hadoop contrib/index, which in turn uses the Hadoop
mapred package. Future versions will be very different from this one. The main
difference is that in this version, after a Solr input document is converted
to a Lucene document, a Lucene index writer is used to build the index. In
future versions, a Solr writer/core will be used instead.
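To make the flow above concrete, here is a toy, standard-library-only sketch of
the idea: each task converts its input documents and feeds them to its own
index writer, producing one index shard per task. It does not use the Hadoop,
Lucene, or Solr APIs and is not the code in the attached patch. Every name in
it (to_index_document, ShardWriter, map_task, build_shards) is hypothetical,
and the "index" is just an in-memory inverted index.

```python
from collections import defaultdict


def to_index_document(solr_doc):
    """Hypothetical stand-in for the Solr-input-document -> Lucene-document
    conversion: lowercases and whitespace-tokenizes every field value."""
    return {field: str(value).lower().split() for field, value in solr_doc.items()}


class ShardWriter:
    """Hypothetical stand-in for an index writer: builds an in-memory
    inverted index for a single shard."""

    def __init__(self):
        self.postings = defaultdict(set)  # (field, token) -> set of doc ids
        self.doc_count = 0

    def add_document(self, doc_id, index_doc):
        for field, tokens in index_doc.items():
            for token in tokens:
                self.postings[(field, token)].add(doc_id)
        self.doc_count += 1


def map_task(split):
    """One simulated task: convert each input document and index it locally."""
    writer = ShardWriter()
    for doc_id, solr_doc in split:
        writer.add_document(doc_id, to_index_document(solr_doc))
    return writer


def build_shards(docs, num_tasks):
    """Deal the input round-robin into num_tasks splits and run one task per
    split. Run sequentially here; a real job would run the tasks in parallel."""
    splits = [docs[i::num_tasks] for i in range(num_tasks)]
    return [map_task(split) for split in splits]
```

With three input documents and two tasks, build_shards returns two ShardWriter
objects whose doc_count values add up to three. Swapping ShardWriter for a
Solr writer/core is the change that the future versions described above would
make.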
Here are some prerequisites for this issue:
- Hadoop 0.20, which has not been released yet. Two features in 0.20 are
  important for this issue.
  First is the new mapreduce package. The flexibility of the new mapreduce
  API makes it possible to use a Solr writer/core in mapper tasks.
  Second is the upgrade to Jetty 6 (6.1.14). The current release, 0.19, uses
  Jetty 5.
- A couple of changes are required in Solr.
  First is to make SolrCore support an indexing-only mode (i.e. no search).
  Only then is it feasible to use it for indexing in a map task.
  Second is to upgrade from Jetty 6.1.3 to Jetty 6.1.14. Hadoop 0.20 uses a
  feature that is not available in 6.1.3.
What do you think about making "SolrCore support an indexing-only mode"?
> Build Solr index using Hadoop MapReduce
> ---------------------------------------
>
> Key: SOLR-1045
> URL: https://issues.apache.org/jira/browse/SOLR-1045
> Project: Solr
> Issue Type: New Feature
> Reporter: Ning Li
> Attachments: SOLR-1045.0.patch
>
>
> The goal is a contrib module that builds a Solr index using Hadoop MapReduce.
> It is different from the Solr support in Nutch. The Solr support in Nutch
> sends a document to a Solr server in a reduce task. Here, the goal is to
> build/update the Solr index within map/reduce tasks. This approach also
> achieves better parallelism when the number of map tasks is greater than the
> number of reduce tasks, which is usually the case.