[ 
https://issues.apache.org/jira/browse/SOLR-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784212#action_12784212
 ] 

Andrzej Bialecki  commented on SOLR-1614:
-----------------------------------------

If query performance is not a concern, then why not execute it directly on HDFS 
(using e.g. Nutch FsDirectory to read indexes from HDFS)?

> Search in Hadoop
> ----------------
>
>                 Key: SOLR-1614
>                 URL: https://issues.apache.org/jira/browse/SOLR-1614
>             Project: Solr
>          Issue Type: New Feature
>          Components: search
>    Affects Versions: 1.4
>            Reporter: Jason Rutherglen
>            Priority: Minor
>             Fix For: 1.5
>
>
> What's the use case? Sometimes queries are expensive (such as
> regex queries), or the indexes are located in HDFS and need to
> be searched. By leveraging Hadoop, these non-time-sensitive
> queries may be executed without dynamically deploying the
> indexes to new Solr servers.
>
> We'll download the indexes out of HDFS (assuming they're zipped),
> perform the queries in a batch on each index shard, then merge
> the results either using a Solr query results priority queue or
> simply using Hadoop's built-in merge sorting.
>
> The query file will be encoded in JSON format (ID, query,
> numresults, fields). The shards file will simply contain
> newline-delimited paths (HDFS or otherwise). The output can be a
> Solr-encoded results file per query.
>
> I'm hoping to add an actual Hadoop unit test.
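
For illustration only, here is a rough sketch of the input files and the
merge step described in the issue. It is not the patch attached to
SOLR-1614. It assumes one JSON object per line in the query file with the
hypothetical keys "id", "query", "numresults" and "fields", and it uses a
plain heap in place of Solr's results priority queue. The function names
and the (score, doc_id) hit layout are made up for the example.

```python
import heapq
import json


def parse_queries(text):
    """Parse the query file, assuming one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def parse_shards(text):
    """Parse the shards file: newline-delimited paths (HDFS or otherwise)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def merge_top_k(per_shard_hits, k):
    """Merge per-shard hit lists into one global top-k list.

    per_shard_hits is a list with one entry per shard; each entry is a
    list of (score, doc_id) tuples. This hit layout is an assumption made
    for the sketch, not Solr's result encoding.
    """
    all_hits = (hit for hits in per_shard_hits for hit in hits)
    # Keep the k highest-scoring hits across all shards.
    return heapq.nlargest(k, all_hits, key=lambda hit: hit[0])
```

Each Hadoop task would run the parsed queries against one shard from the
shards file and emit its hits; merge_top_k then stands in for the final
merge, with k taken from the query's "numresults" value.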

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
