Hi Noora, welcome to Apache Gora in particular =)

+1, well said about Apache Gora, Tim.

- Henry

On Wed, Apr 30, 2014 at 6:06 AM, Tim Robertson <[email protected]> wrote:
> Hi Noora,
>
> Welcome to the world of Hadoop. It is a vast ecosystem and is quite
> daunting at first.
>
> Perhaps if I summarize a few of the key technologies which build on each
> other it might help you navigate things:
>
> a) Hadoop DFS - the distributed file system
> b) Hadoop MapReduce (MR) - a distributed framework for processing where you
> write Maps and Reduces. It is batch oriented, with 30+ sec latency to start
> even the smallest jobs, so not ideally suited to interactive operations
> c) Sqoop is a library that allows you to run MR jobs that either suck data
> from a DB to HDFS or vice versa. It supports a variety of formats, such as
> Avro (a data format where the schema is embedded)
> d) You didn't mention it but Hive is a SQL layer that allows you to run
> SQL as MR jobs. A common use is MySQL -> Sqoop -> HDFS -> Hive
> e) HBase - a "big table" technology that allows you to have a column
> oriented data store, and you can GET or PUT by key, or perform limited
> operations.
>
> So what is Gora?
> Gora is effectively an Object Relational Mapper that allows you to define
> the table definition using Avro format, and provide a mapping of how each
> field is stored against the backend system. Gora then takes care of CRUD
> operations and mediation with the backend, without the caller actually
> knowing how to use the backend API. Various backends are supported. Thus I
> can do Person p = new Person("Tim") and then "gora save Tim" - Gora will
> then take care of saving my object in (e.g.) HBase. There are connectors
> that allow you to run MR jobs over Gora stores as well. Gora is similar to
> the likes of MyBATIS if you are familiar with that, but supports "Hadoop
> technologies" as backends, and provides MR capability allowing you to MR
> across various backends consistently.
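
To make the "new Person, then save" idea above a bit more concrete, here is a minimal sketch of the pattern in plain Java. It is not Gora's API: the Store interface, InMemoryStore and this hand-written Person class are hypothetical names made up for illustration, and the HashMap merely stands in for a real backend such as HBase. The point is only that the caller talks to one backend-neutral interface and never touches the backend's own API.

```java
import java.util.HashMap;
import java.util.Map;

public class StoreSketch {
    /** Hypothetical persistent object; with Gora this would come from an Avro schema. */
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    /** Hypothetical backend-neutral CRUD interface (an assumption, not Gora's API). */
    interface Store<K, V> {
        void put(K key, V value);
        V get(K key);
        boolean delete(K key);
    }

    /** Stand-in backend: keeps objects in a HashMap instead of a real data store. */
    static final class InMemoryStore<K, V> implements Store<K, V> {
        private final Map<K, V> rows = new HashMap<>();
        public void put(K key, V value) { rows.put(key, value); }
        public V get(K key) { return rows.get(key); }
        public boolean delete(K key) { return rows.remove(key) != null; }
    }

    public static void main(String[] args) {
        // The caller only sees Store; swapping the backend would not change this code.
        Store<String, Person> store = new InMemoryStore<>();
        Person p = new Person("Tim");
        store.put("tim", p);                       // create / update
        System.out.println(store.get("tim").name); // read
        System.out.println(store.delete("tim"));   // delete, reports whether a row existed
        System.out.println(store.get("tim"));      // null once the row is gone
    }
}
```

In this sketch, replacing InMemoryStore with another implementation of Store would leave main() untouched, which is roughly the kind of mediation described above.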
>
> So is Gora real time or not? Yes, it is real time for CRUD, but MR type jobs
> are batch operations, with reasonably high latency.
> Does Gora block? That depends on the backend. With HBase updates, for
> example, you typically either overwrite, or fail the update on a race
> condition, and scans are non-blocking.
>
> Perhaps if you explain what you are trying to do, the list can help advise
> you if Gora is a suitable option, or could suggest the appropriate Hadoop
> list to ask?
>
> I hope this helps,
> Tim
>
> On Wed, Apr 30, 2014 at 2:25 PM, Noora <[email protected]> wrote:
>>
>> Hi All,
>>
>> I want to integrate MySQL and HDFS in my Hadoop project. I searched a lot
>> about different ways and found two approaches: real time using "mysql applier
>> for hadoop", and "apache sqoop" for non real time uses.
>>
>> Then I found that Gora has this ability too, but I could not find any
>> information about how it works.
>>
>> Is Gora real time or not? What is the difference between Gora and mysql
>> applier or sqoop? If real time, is the db process blocking or not?
>> For integration of Hadoop and MySQL, does it need any NoSQL db as
>> an interface?
>>
>> thanx

