Hi Noora, welcome to Apache Gora in particular =)

+1 well said about Apache Gora, Tim

- Henry
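
P.S. To make Tim's "Person p = new Person("Tim")" example below a bit more concrete, here is a rough, self-contained sketch of the pattern he describes: the caller does CRUD against a backend-neutral store and never touches the backend's own API. This is only an illustration under stated assumptions. StoreSketch, SimpleStore, InMemoryStore and this Person class are hypothetical names, not Gora's actual API, and the HashMap merely stands in for a real backend such as HBase.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the pattern described in the thread; NOT Gora's actual API.
class StoreSketch {

    // A plain data object. In Gora the bean would be generated from an Avro schema.
    static final class Person {
        final String name;
        Person(String name) { this.name = name; }
    }

    // Backend-neutral CRUD contract (hypothetical interface).
    interface SimpleStore<K, V> {
        void put(K key, V value);
        V get(K key);          // returns null when the key is absent
        boolean delete(K key); // true if an entry was removed
    }

    // In-memory stand-in for a real backend such as HBase (assumption for the demo).
    static final class InMemoryStore<K, V> implements SimpleStore<K, V> {
        private final Map<K, V> rows = new HashMap<>();
        public void put(K key, V value) { rows.put(key, value); }
        public V get(K key) { return rows.get(key); }
        public boolean delete(K key) { return rows.remove(key) != null; }
    }

    public static void main(String[] args) {
        // The caller only sees the store contract, never the backend.
        SimpleStore<String, Person> store = new InMemoryStore<>();
        store.put("tim", new Person("Tim"));
        System.out.println(store.get("tim").name);
        System.out.println(store.delete("tim"));
    }
}
```

Swapping InMemoryStore for another implementation of the same interface would leave the calling code unchanged, which is the point Tim makes about the caller not needing to know the backend API.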

On Wed, Apr 30, 2014 at 6:06 AM, Tim Robertson
<[email protected]> wrote:
> Hi Noora,
>
> Welcome to the world of Hadoop. It is a vast ecosystem and is quite
> daunting at first.
>
> Perhaps if I summarize a few of the key technologies which build on each
> other it might help you navigate things:
>
> a) Hadoop DFS - the distributed file system
> b) Hadoop MapReduce (MR) - a distributed framework for processing where you
> write Maps and Reduces.  It is batch oriented, with 30+ sec latency to start
> even the smallest jobs, so not ideally suited to interactive operations
> c) Sqoop is a library that allows you to run MR jobs that either suck data
> from a DB to HDFS or vice versa.  It supports a variety of formats, such as
> Avro (a data format where the schema is embedded)
> d) You didn't mention it but Hive is a SQL layer that allows you to run
> SQL as MR jobs.  A common use is MySQL -> Sqoop -> HDFS -> Hive
> e) HBase - a "big table" technology that allows you to have a column
> oriented data store, and you can GET or PUT by key, or perform limited
> operations.
>
> So what is Gora?
> Gora is effectively an Object Relational Mapper that allows you to define
> the table definition using Avro format, and provide a mapping of how each
> field is stored against the backend system and then Gora takes care of CRUD
> operations and mediation with the backend, without the caller actually
> knowing how to use the backend API.  Various backends are supported.  Thus I
> can do Person p = new Person("Tim") and then "gora save Tim" - Gora will
> then take care of saving my object in (e.g.) HBase.  There are connectors
> that allow you to run MR jobs over Gora stores as well.  Gora is similar to
> the likes of MyBATIS if you are familiar with that, but supports "Hadoop
> technologies" as backends, and provides MR capability allowing you to MR
> across various backends consistently.
>
> So is Gora real time or not? Yes, it is real time for CRUD, but MR type jobs
> are batch operations, with reasonably high latency.
> Does Gora block? That depends on the backend... With HBase updates for
> example, you typically either overwrite, or fail the update on a race
> condition, and scans are non blocking.
>
> Perhaps if you explain what you are trying to do, the list can help advise
> you if Gora is a suitable option, or could suggest the appropriate Hadoop
> list to ask?
>
> I hope this helps,
> Tim
>
>
>
>
>
>
> On Wed, Apr 30, 2014 at 2:25 PM, Noora <[email protected]> wrote:
>>
>> Hi All,
>>
>> I want to integrate mysql and hdfs in my hadoop project. I searched a lot
>> about different ways, there were two approaches: real time using "mysql applier
>> for hadoop" and "apache sqoop" for non real time uses.
>>
>> Then I found that Gora has this ability too but I could not find any
>> information about how it works.
>>
>> Is Gora real time or not? What is the difference between gora and mysql
>> applier or sqoop? If realtime, is db process blocking or not?
>> For integration of hadoop and mysql, does it need any nosql db as
>> interface?
>>
>> thanx
>
>
