Has anyone tried using an RDBMS with Hadoop? If the data is stored in
a database, is there any way we can use MapReduce with the database
instead of HDFS?
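For context, Hadoop's DBInputFormat (contributed under HADOOP-2536) reads map input from a JDBC source by carving the table into row ranges, one per mapper. The sketch below illustrates only that splitting idea in plain Java, with no Hadoop dependency; the class and method names are illustrative, not Hadoop's actual API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of how a DB-backed input format can carve a table into
// per-mapper row ranges: each mapper runs its own bounded query
// ("SELECT ... LIMIT n OFFSET k") against the shared table.
public class DbSplitSketch {

    // Build one bounded query per split; the last split absorbs
    // any remainder rows.
    public static List<String> splitQueries(String baseQuery,
                                            long totalRows,
                                            int numSplits) {
        List<String> queries = new ArrayList<>();
        long chunk = totalRows / numSplits;
        for (int i = 0; i < numSplits; i++) {
            long offset = i * chunk;
            long limit = (i == numSplits - 1) ? totalRows - offset : chunk;
            queries.add(baseQuery + " LIMIT " + limit + " OFFSET " + offset);
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String q : splitQueries("SELECT id, payload FROM events", 10, 3)) {
            System.out.println(q); // three bounded queries covering all 10 rows
        }
    }
}
```

Each query can then be executed independently by one map task, so the database plays the role HDFS blocks normally play.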
--
Lakshmi Narayanan
Xuan Dzung Doan wrote:
Hi,
I'm a Hadoop newbie. My question is as follows:
The level of parallelism of a job, with respect to mappers, is largely the
number of map tasks spawned, which is equal to the number of InputSplits. But
within each InputSplit, there may be many records (many input key-value pairs),
each is p
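To make the split/record distinction concrete: parallelism comes from running one map task per InputSplit, while within a split the map function is invoked once per record, sequentially, inside that single task. A minimal sketch of both facts, assuming FileInputFormat's default size-based splitting:

```java
import java.util.Arrays;
import java.util.List;

// Split-level parallelism vs. record-level iteration:
// map tasks ~ number of splits; map() calls ~ records per split.
public class SplitVsRecord {

    // Number of map tasks is roughly ceil(fileBytes / splitBytes),
    // mirroring size-based input splitting.
    public static long numSplits(long fileBytes, long splitBytes) {
        return (fileBytes + splitBytes - 1) / splitBytes;
    }

    // Inside one split, every record goes through the same map
    // function, one after another, in a single task.
    public static int mapRecords(List<String> recordsInSplit) {
        int mapped = 0;
        for (String record : recordsInSplit) {
            mapped++; // a real mapper would emit (key, value) pairs here
        }
        return mapped;
    }

    public static void main(String[] args) {
        // A 200 MB file with 64 MB splits yields 4 map tasks.
        System.out.println(numSplits(200L << 20, 64L << 20)); // 4
        // One of those tasks maps its records sequentially.
        System.out.println(mapRecords(Arrays.asList("a\t1", "b\t2", "c\t3"))); // 3
    }
}
```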
https://issues.apache.org/jira/browse/HADOOP-1700
"过佳" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Does HDFS support it? I need it to be synchronized, e.g. I have many
clients writing a lot of IntWritables to one file.
Best.
Jarvis.
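For what it's worth, HDFS grants a single-writer lease per file, so many clients cannot write to one file concurrently. A common workaround is to funnel all producers through one writer thread; the sketch below shows that pattern with a BlockingQueue standing in for a real SequenceFile.Writer (the names and structure are illustrative, not an HDFS API).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Many producers, one writer: clients enqueue values and a single
// thread of control drains them to the one open file.
public class SingleWriterFunnel {

    public static List<Integer> collect(int producers, int intsPerProducer) {
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>();
        List<Thread> threads = new ArrayList<>();
        for (int p = 0; p < producers; p++) {
            final int base = p * intsPerProducer;
            Thread t = new Thread(() -> {
                for (int i = 0; i < intsPerProducer; i++) {
                    queue.add(base + i); // each "client" enqueues its ints
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // The single writer drains the queue and would append to one file.
        List<Integer> written = new ArrayList<>();
        queue.drainTo(written);
        return written;
    }

    public static void main(String[] args) {
        System.out.println(collect(4, 100).size()); // 400
    }
}
```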
We wrote some custom tools that poll for new data and launch jobs
periodically.
Matt
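One way the "poll for new data, launch jobs periodically" approach can be sketched (these names are illustrative, not Matt's actual tools): a scheduled task scans an input directory and submits a job for any path it has not yet seen.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Periodic poller: remembers which input paths it has already handled
// and reports only the new ones, which a real implementation would
// hand to JobClient on each tick of a ScheduledExecutorService.
public class JobPoller {
    private final Set<String> seen = new HashSet<>();

    public List<String> newInputs(List<String> currentListing) {
        List<String> toLaunch = new ArrayList<>();
        for (String path : currentListing) {
            if (seen.add(path)) {
                toLaunch.add(path); // first time we've seen this path
            }
        }
        return toLaunch;
    }

    public static void main(String[] args) {
        JobPoller poller = new JobPoller();
        System.out.println(poller.newInputs(List.of("/in/a", "/in/b"))); // both new
        System.out.println(poller.newInputs(List.of("/in/a", "/in/b", "/in/c"))); // only /in/c
    }
}
```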
On Tue, 2008-06-24 at 09:27 -0700, Vadim Zaliva wrote:
> Matt,
>
> How do you manage your tasks? Do you launch them periodically or keep
> them somehow running and feed them data?
>
> Vadim
>
>
> On Mon, Jun 23, 2008 at 21:54, Matt Kent <[EMAIL PROTECTED]> wrote:
On Jun 23, 2008, at 9:54 PM, Matt Kent wrote:
Unless you have a significant amount of work to be done, I wouldn't
recommend using Hadoop because it's not worth the overhead of
launching
the jobs and moving the data around.
I think part of the tradeoff is having a system that is resilient t
Matt,
How do you manage your tasks? Do you launch them periodically or keep
them somehow running and feed them data?
Vadim
On Mon, Jun 23, 2008 at 21:54, Matt Kent <[EMAIL PROTECTED]> wrote:
> We use Hadoop in a similar manner, to process batches of data in
> real-time every few minutes. However
Fernando Padilla wrote:
One use case I have a question about is using Hadoop to power a web
search or other query, where the full job must finish in under a second,
from start to finish.
I don't think you should be using Hadoop to answer a user's search
query directly.
You should be loo
Matt Kent wrote:
We use Hadoop in a similar manner, to process batches of data in
real-time every few minutes. However, we do substantial amounts of
processing on that data, so we use Hadoop to distribute our computation.
Unless you have a significant amount of work to be done, I wouldn't
recommend using Hadoop because it's not worth the overhead of launching
the jobs and moving the data around.