I haven't seen anyone use Solr as an input format.  I think you'll have
issues controlling parallel reads across the different splits of the data.
You could write a custom Hadoop input format that first queries Solr for
a document count and then queries Solr for the actual documents, with
each split specifying which document number to start at and how many
documents to return.
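
As a rough sketch of the split arithmetic only (in Python for brevity; a
real input format would be written in Java against the Hadoop API), the
function below turns a document count into (start, rows) pairs, one per
split. The name plan_splits is made up for illustration. The pairs are meant
to map onto Solr's start and rows query parameters, and this assumes the
query has a stable sort order so that pages don't overlap between splits.

```python
def plan_splits(total_docs, num_splits):
    """Divide total_docs into at most num_splits contiguous (start, rows) ranges.

    Hypothetical helper: each pair is intended to become the `start` and
    `rows` parameters of one Solr query, issued by one split.
    """
    if total_docs < 0 or num_splits < 1:
        raise ValueError("total_docs must be >= 0 and num_splits >= 1")

    # Every split gets `base` documents; the first `extra` splits get one more.
    base, extra = divmod(total_docs, num_splits)

    splits = []
    start = 0
    for i in range(num_splits):
        rows = base + (1 if i < extra else 0)
        if rows == 0:
            break  # fewer documents than splits: skip the empty ones
        splits.append((start, rows))
        start += rows
    return splits
```

For example, 10 documents over 3 splits gives (0, 4), (4, 3), (7, 3).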

Depending on your need for performance, automation, and such, I think your
best bet would be to just write a custom tool that reads all the documents
you want from Solr and writes them to a text file that Pig can easily
read.
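
Something along these lines might do as a starting point. It is a sketch,
not tested against a real index: it assumes Solr's select handler is
reachable over HTTP and returns JSON (wt=json) with the usual
response/numFound/docs layout. The function names, URL, query, and field
names are placeholders I made up. It pages through the results and writes
one tab-separated line per document, since tab is the default delimiter for
Pig's PigStorage loader.

```python
import json
import urllib.parse
import urllib.request


def solr_fetcher(base_url, query):
    """Return fetch_page(start, rows), which queries Solr's select handler.

    Assumes base_url points at a Solr instance or core that serves /select.
    """
    def fetch_page(start, rows):
        params = urllib.parse.urlencode(
            {"q": query, "start": start, "rows": rows, "wt": "json"}
        )
        with urllib.request.urlopen(base_url + "/select?" + params) as resp:
            return json.loads(resp.read().decode("utf-8"))["response"]
    return fetch_page


def clean(value):
    """Flatten a field value so it cannot break the tab-separated layout."""
    if value is None:
        return ""
    if isinstance(value, list):  # multi-valued field: join with commas
        value = ",".join(str(v) for v in value)
    return str(value).replace("\t", " ").replace("\n", " ").replace("\r", " ")


def export_docs(fetch_page, out, fields, page_size=1000):
    """Page through all results; write one tab-separated line per document.

    Returns the number of documents written.
    """
    start = 0
    written = 0
    while True:
        page = fetch_page(start, page_size)
        docs = page["docs"]
        if not docs:
            break
        for doc in docs:
            out.write("\t".join(clean(doc.get(f)) for f in fields) + "\n")
        written += len(docs)
        start += len(docs)
        if start >= page["numFound"]:
            break
    return written


# Example use (placeholder URL, query, and field names):
#   fetch = solr_fetcher("http://localhost:8983/solr", "*:*")
#   with open("solr_docs.tsv", "w", encoding="utf-8") as f:
#       export_docs(fetch, f, ["id", "title"])
```

The resulting file can then be loaded in Pig with PigStorage, declaring the
same fields in the same order in the AS clause.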


On Wed, Nov 30, 2011 at 4:42 AM, kumar swami <[email protected]> wrote:

> Hi friends,
>
> I am new to the Pig library. I need help with how to read data from Solr
> using Pig. If you have any code samples, please share them.
>
> Thanks, swami
>



-- 

Thanks,
John C
