> In this case, there is no data on the grid a priori; the data has to come
> into the grid from a DB. So what would the C/M mappers run on?  Is there a
> way to run say 5 mappers without having 5 blocks of data on HDFS?

No, but the data doesn't have to come from the DB. The first copy does, yes, 
but the other 4 can be replicated within the cluster.

Which is exactly what export + dump does - except that it is cumbersome to 
use, so it would be better if it could be done automatically (within the Pig 
loader).
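
For reference, the manual workflow I mean is roughly this (just a sketch -
the table, columns, paths and credentials are all made up):

    # 1) export the table to a local tab-separated file
    mysql -h dbhost -u pig -p -B -N \
      -e "SELECT id, name FROM mytable" mydb > mytable.tsv

    # 2) copy it into HDFS; replication (e.g. -D dfs.replication=5)
    #    then spreads the copies inside the cluster, so the mappers
    #    read from HDFS instead of hammering MySQL
    hadoop fs -put mytable.tsv /user/anze/mytable.tsv

    # 3) only then load it in Pig, retyping the schema by hand:
    #    A = LOAD '/user/anze/mytable.tsv' USING PigStorage('\t')
    #          AS (id:int, name:chararray);

Three manual steps (plus the schema retyped by hand) for what a DB loader
could do in a single LOAD.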

That's how I see it at least... :)

Anze


On Thursday 04 November 2010, Jai Krishna wrote:
> Ankur,
> 
> In this case, there is no data on the grid a priori; the data has to come
> into the grid from a DB. So what would the C/M mappers run on?  Is there a
> way to run say 5 mappers without having 5 blocks of data on HDFS?
> 
> Just trying to wrap my head around this; please excuse me if I'm missing
> something obvious.
> 
> Thanks
> Jai
> 
> On 11/3/10 7:48 PM, "Ankur C. Goel" <[email protected]> wrote:
> 
> Hitting the database from multiple mappers is not such a great idea IF
> there are hundreds/thousands of mappers involved, processing hundreds of
> GBs of data. This could easily saturate the I/O bandwidth of the database
> server, creating a bottleneck in the processing. Export and dump to HDFS
> is a better option.
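> 
> (As a rough illustration: 500 concurrent mappers each pulling even a
> modest 2 MB/s would together ask about 1 GB/s of a single database
> server.)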
> 
> -...@nkur
> 
> On 11/3/10 5:02 PM, "Anze" <[email protected]> wrote:
> 
> Sonal,
> 
> Thanks for answering!
> 
> Hiho sounds nice, but from what I gathered, it is more of a low-level
> interface for efficient loading from and storing to SQL DBs?
> (in other words, there is no loader or storage function for Pig yet)
> 
> I wrote a batch script to export the DB to local files and then copy them to
> HDFS, so there is no gain for me in using another type of export (unless it
> can be used directly from Pig and/or keeps the schema intact), but it's nice
> to know it exists.
> 
> It just seems weird that there is no DB loader for Pig yet. I tried writing
> one, but it would take more time than I have at the moment... I have a
> problem to solve ASAP. :)
> 
> Thanks,
> 
> Anze
> 
> On Wednesday 03 November 2010, Sonal Goyal wrote:
> > Anze,
> > 
> > You can check hiho as well:
> > 
> > http://code.google.com/p/hiho/wiki/DatabaseImportFAQ
> > 
> > Let me know if you need any help.
> > 
> > Thanks and Regards,
> > Sonal
> > 
> > Sonal Goyal | Founder and CEO | Nube Technologies LLP
> > http://www.nubetech.co | http://in.linkedin.com/in/sonalgoyal
> > 
> > 
> > 
> > 
> > 
> > 2010/11/3 Anze <[email protected]>
> > 
> > > Alejandro, thanks for answering!
> > > 
> > > I was hoping it could be done directly from Pig, but... :)
> > > 
> > > I'll take a look at Sqoop then, and if that doesn't help, I'll just
> > > write a simple batch to export data to TXT/CSV. Thanks for the
> > > pointer!
> > > 
> > > Anze
> > > 
> > > On Wednesday 03 November 2010, Alejandro Abdelnur wrote:
> > > > Not a 100% Pig solution, but you could use Sqoop to get the data in
> > > > as a pre-processing step. And if you want to handle it all as a
> > > > single job, you could use Oozie to create a workflow that does Sqoop
> > > > and then your Pig processing.
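> > > > 
> > > > Roughly like this (an untested sketch - the connection string, table
> > > > and paths are placeholders; the parallel import assumes the table
> > > > has a primary key to split on):
> > > > 
> > > >     # import the table into HDFS with 5 parallel mappers
> > > >     sqoop import \
> > > >       --connect jdbc:mysql://dbhost/mydb \
> > > >       --username pig --password '...' \
> > > >       --table mytable \
> > > >       --target-dir /user/anze/mytable \
> > > >       --num-mappers 5
> > > > 
> > > >     # Sqoop writes comma-delimited text, so afterwards in Pig:
> > > >     # A = LOAD '/user/anze/mytable' USING PigStorage(',');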
> > > > 
> > > > Alejandro
> > > > 
> > > > On Wed, Nov 3, 2010 at 3:22 PM, Anze <[email protected]> wrote:
> > > > > Hi!
> > > > > 
> > > > > Part of the data I have resides in MySQL. Is there a loader that
> > > > > would allow loading directly from it?
> > > > > 
> > > > > I can't find anything on the net, but it seems to me this must be a
> > > > > quite common problem.
> > > > > I checked piggybank but there is only DBStorage (and no DBLoader).
> > > > > 
> > > > > Is some DBLoader out there too?
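> > > > > 
> > > > > (The storing direction does seem covered - if I read the piggybank
> > > > > source right, something like this should write back to MySQL; the
> > > > > connection details here are made up:
> > > > > 
> > > > >     -- needs: REGISTER piggybank.jar
> > > > >     A = LOAD 'some_input' AS (id:int, name:chararray);
> > > > >     STORE A INTO 'ignored' USING
> > > > >       org.apache.pig.piggybank.storage.DBStorage(
> > > > >         'com.mysql.jdbc.Driver',
> > > > >         'jdbc:mysql://dbhost/mydb', 'user', 'pass',
> > > > >         'INSERT INTO mytable (id, name) VALUES (?, ?)');
> > > > > 
> > > > > It is the LOAD counterpart of this that seems to be missing.)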
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Anze
