Ankur,

In this case, there is no data on the grid a priori; the data has to come into the grid from a DB. So what would the C/M mappers run on? Is there a way to run, say, 5 mappers without having 5 blocks of data on HDFS?
Just trying to wrap my head around this; please excuse me if I'm missing something obvious.

Thanks,
Jai

On 11/3/10 7:48 PM, "Ankur C. Goel" <[email protected]> wrote:

Hitting the database from multiple mappers is not such a great idea if there are hundreds/thousands of mappers involved, processing hundreds of GBs of data. This could easily saturate the I/O bandwidth of the database server, creating a bottleneck in the processing. Export and dump to HDFS is a better option.

-...@nkur

On 11/3/10 5:02 PM, "Anze" <[email protected]> wrote:

Sonal,

Thanks for answering! Hiho sounds nice, but from what I gathered, it is more of a low-level interface for efficient loading from and storing to SQL DBs? (In other words, there is no loader and storage for Pig yet.)

I wrote a batch to export the DB to local files and then copy them to HDFS, so there is no gain for me in using another type of export (unless it can be used directly from Pig and/or keeps the schema intact), but it's nice to know it exists.

It just seems weird that there is no DB loader for Pig yet. I tried writing one, but it would take more time than I have at the moment... I have a problem to solve ASAP. :)

Thanks,
Anze

On Wednesday 03 November 2010, Sonal Goyal wrote:
> Anze,
>
> You can check hiho as well:
>
> http://code.google.com/p/hiho/wiki/DatabaseImportFAQ
>
> Let me know if you need any help.
>
> Thanks and Regards,
> Sonal
>
> Sonal Goyal | Founder and CEO | Nube Technologies LLP
> http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal
>
> 2010/11/3 Anze <[email protected]>
>
> > Alejandro, thanks for answering!
> >
> > I was hoping it could be done directly from Pig, but... :)
> >
> > I'll take a look at Sqoop then, and if that doesn't help, I'll just
> > write a simple batch to export data to TXT/CSV. Thanks for the pointer!
> >
> > Anze
> >
> > On Wednesday 03 November 2010, Alejandro Abdelnur wrote:
> > > Not a 100% Pig solution, but you could use Sqoop to get the data in
> > > as a pre-processing step. And if you want to handle it all as a
> > > single job, you could use Oozie to create a workflow that does Sqoop
> > > and then your Pig processing.
> > >
> > > Alejandro
> > >
> > > On Wed, Nov 3, 2010 at 3:22 PM, Anze <[email protected]> wrote:
> > > > Hi!
> > > >
> > > > Part of the data I have resides in MySQL. Is there a loader that
> > > > would allow loading directly from it?
> > > >
> > > > I can't find anything on the net, but it seems to me this must be
> > > > quite a common problem.
> > > > I checked piggybank, but there is only DBStorage (and no DBLoader).
> > > >
> > > > Is some DBLoader out there too?
> > > >
> > > > Thanks,
> > > >
> > > > Anze
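[Editor's note] The export-then-load route discussed in this thread (Sqoop as a pre-processing step, per Alejandro, or a plain dump to HDFS, per Ankur) might look roughly like the sketch below. The connect string, credentials, table, column names, and paths are all hypothetical, and the Pig schema must be declared by hand since the MySQL schema is not carried over automatically:

```shell
# Hypothetical sketch: pull a MySQL table onto HDFS with Sqoop,
# then load the export in Pig. All names and paths are made up.
sqoop import \
  --connect jdbc:mysql://dbhost/mydb \
  --username etl -P \
  --table orders \
  --split-by id \
  -m 5 \
  --fields-terminated-by ',' \
  --target-dir /user/etl/orders

# Load the comma-separated export in Pig, re-declaring the schema:
pig -e "orders = LOAD '/user/etl/orders' USING PigStorage(',') \
        AS (id:int, customer:chararray, total:double); DUMP orders;"
```

The `-m 5` flag caps the import at 5 parallel tasks, which also limits the concurrent load on the database server that Ankur warns about.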
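[Editor's note] On Jai's question upthread about running 5 mappers without 5 blocks on HDFS: Sqoop (via Hadoop's DBInputFormat-style machinery) derives its input splits from ranges of a numeric key column in the table, not from HDFS blocks, so no data needs to exist on the grid beforehand. A minimal Python sketch of that splitting idea follows; the key bounds and even-division arithmetic are illustrative, not Sqoop's exact implementation:

```python
# Sketch: how N mappers can each get work with no HDFS blocks at all.
# The bounds (1, 1000) stand in for SELECT MIN(id), MAX(id) FROM table.

def make_splits(lo, hi, num_mappers):
    """Divide the inclusive key range [lo, hi] into num_mappers
    (start, end) ranges, one per mapper. Each mapper then runs its own
    bounded query: SELECT ... WHERE id >= start AND id <= end."""
    total = hi - lo + 1
    base, extra = divmod(total, num_mappers)
    splits, start = [], lo
    for i in range(num_mappers):
        size = base + (1 if i < extra else 0)
        splits.append((start, start + size - 1))
        start += size
    return splits

print(make_splits(1, 1000, 5))
# Five disjoint key ranges covering every row, one per mapper.
```

This is also why the thread recommends bounding the mapper count: each range still turns into a live query against the one database server.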
