[ https://issues.apache.org/jira/browse/SOLR-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784212#action_12784212 ]
Andrzej Bialecki commented on SOLR-1614:
-----------------------------------------

If query performance is not a concern, then why not execute it directly on HDFS (using e.g. Nutch FsDirectory to read indexes from HDFS)?

> Search in Hadoop
> ----------------
>
>                 Key: SOLR-1614
>                 URL: https://issues.apache.org/jira/browse/SOLR-1614
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>
>
> What's the use case? Sometimes queries are expensive (such as
> regex), or one has indexes located in HDFS that then need to be
> searched. By leveraging Hadoop, these non-time-sensitive
> queries may be executed without dynamically deploying the
> indexes to new Solr servers.
> We'll download the indexes out of HDFS (assuming they're zipped),
> perform the queries in a batch on each index shard, then merge
> the results either using a Solr query results priority queue, or
> simply using Hadoop's built-in merge sorting.
> The query file will be encoded in JSON format (ID, query,
> numresults, fields). The shards file will simply contain
> newline-delimited paths (HDFS or otherwise). The output can be a
> Solr-encoded results file per query.
> I'm hoping to add an actual Hadoop unit test.
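
As a rough illustration of the batch flow described in the issue (parse the query file, run each query per shard, merge the per-shard hits with a priority queue), here is a minimal sketch in plain Python. It is not Solr or Hadoop code. The key names `id`, `query`, `numresults` and `fields`, the one-JSON-object-per-line layout, and the `(score, doc_id)` hit tuples are all assumptions made for the example; the issue only says the query file is JSON with those four pieces of information.

```python
import heapq
import json


def parse_queries(lines):
    """Parse a query file with one JSON object per line (layout assumed).

    Each object is assumed to carry the keys id, query, numresults, fields.
    Returns a dict mapping query id to the parsed object.
    """
    queries = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        q = json.loads(line)
        queries[q["id"]] = q
    return queries


def merge_shard_hits(shard_hits, numresults):
    """Merge per-shard hit lists into one global top-N list.

    shard_hits is a list of lists of (score, doc_id) tuples (hypothetical
    shape). heapq.nlargest keeps a bounded heap, i.e. a priority queue,
    and returns the hits in descending score order.
    """
    all_hits = (hit for hits in shard_hits for hit in hits)
    return heapq.nlargest(numresults, all_hits, key=lambda h: h[0])


if __name__ == "__main__":
    query_lines = [
        '{"id": "q1", "query": "title:hadoop", "numresults": 2, "fields": ["id", "title"]}',
    ]
    queries = parse_queries(query_lines)
    # Hits that two shards might return for q1 (made-up values).
    shard_a = [(0.9, "a1"), (0.4, "a2")]
    shard_b = [(0.7, "b1"), (0.1, "b2")]
    print(merge_shard_hits([shard_a, shard_b], queries["q1"]["numresults"]))
```

In a real Hadoop job the merge step could instead be left to the framework's sort phase, as the issue suggests; the in-memory heap above only shows the priority-queue alternative.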