I haven't seen anyone use Solr as an input format. I think you'll have trouble controlling parallel reads across the different splits of the data. You could write a custom Hadoop InputFormat that first queries Solr for a document count and then queries Solr for the actual documents, with each split specifying which document number to start on and how many documents to return.
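To make the count-then-page idea above concrete, here is a rough sketch in Python of just the split arithmetic (not an actual Hadoop InputFormat). It assumes you have already obtained the total document count, for example from the numFound value of a query sent with rows=0. The names plan_splits and split_params are made up for illustration, and the sort field "id" is an assumption about your schema's unique key; some stable sort is needed so that the pages do not overlap.

```python
from typing import Dict, List, Tuple


def plan_splits(num_found: int, num_splits: int) -> List[Tuple[int, int]]:
    """Return (start, rows) pairs that together cover num_found documents.

    Hypothetical helper: the first few splits absorb the remainder so that
    split sizes differ by at most one document. Empty splits are dropped.
    """
    if num_found < 0 or num_splits < 1:
        raise ValueError("num_found must be >= 0 and num_splits >= 1")
    base, extra = divmod(num_found, num_splits)
    splits = []
    start = 0
    for i in range(num_splits):
        rows = base + (1 if i < extra else 0)
        if rows == 0:
            break  # fewer documents than splits; nothing left to assign
        splits.append((start, rows))
        start += rows
    return splits


def split_params(query: str, start: int, rows: int) -> Dict[str, object]:
    """Build the Solr request parameters for one split.

    q, start, rows, sort and wt are standard Solr query parameters.
    The "id asc" sort is an assumption about the schema's unique key field.
    """
    return {"q": query, "start": start, "rows": rows,
            "sort": "id asc", "wt": "json"}


if __name__ == "__main__":
    # Example: 10 matching documents divided among 3 parallel readers.
    for start, rows in plan_splits(10, 3):
        print(split_params("*:*", start, rows))
```

Each reader would then issue its own query with the start and rows values for its split. Two caveats: if the index changes between the count query and the document queries, the splits can miss or duplicate documents, and large start offsets tend to get expensive on the Solr side.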
Depending on your need for performance, automation, and so on, I think your best bet would be to just write a custom tool that reads all the documents you want from Solr and writes them to a text file that Pig can easily read.

On Wed, Nov 30, 2011 at 4:42 AM, kumar swami <[email protected]> wrote:
> Hi friends,
>
> I am new to Pig library. I need help on how to read data from solr using
> pig?. If you have any code samples please provide me.
>
> Thanks, swami
>

--
Thanks, John C
