Re: Spark vs Tez

2014-10-19 Thread Mohan Radhakrishnan
Is Tez's architecture similar to Akka's distributed architecture? I recall that Jonas Bonér mentioned Akka's support for protocols like Raft during a presentation on distributed computing. What makes Tez more scalable in this regard? Thanks, Mohan On Sun, Oct 19, 2014 at

Re: Spark vs Tez

2014-10-18 Thread Mohan Radhakrishnan
I remember Spark uses Akka clusters. Isn't that totally different from other distributed technologies? Thanks, Mohan On Sat, Oct 18, 2014 at 1:52 PM, Niels Basjes ni...@basjes.nl wrote: It is my understanding that one of the big differences between Tez and Spark is that a Tez-based query

Re: Hadoop and Open Data (CKAN.org).

2014-09-04 Thread Mohan Radhakrishnan
I understand that coding MR jobs in a programming language is required, but if we are just processing large amounts of data (machine learning, for example) we could use Pig. I recently processed 0.25 TB on AWS clusters in a reasonably short time. In this case the development effort is very low. Thanks,

Re: Started learning Hadoop. Which distribution is best for native install in pseudo distributed mode?

2014-08-15 Thread Mohan Radhakrishnan
Actually, there was another thread about using MR for ML, but I didn't see many responses. I use Octave or R for this, but it would be useful to know how this is solved using Hadoop. The closest community with an interest in this could be H2O, but they have implemented MR for their engine to

Re: Managed File Transfer

2014-07-09 Thread Mohan Radhakrishnan
there, but I am not sure if you want to use it. Regards, *Stanley Shi,* On Mon, Jul 7, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote: Hi, We used a commercial FT and scheduler tool in clustered mode. This was a traditional active-active cluster

Managed File Transfer

2014-07-07 Thread Mohan Radhakrishnan
Hi, We used a commercial FT and scheduler tool in clustered mode. This was a traditional active-active cluster that supported multiple protocols like FTPS etc. Now I am interested in evaluating a distributed way of crawling FTP sites and downloading files using Hadoop. I thought

Practical examples

2014-04-28 Thread Mohan Radhakrishnan
Hi, I have been reading the definitive guide and taking online courses. Now I would like to understand how Hadoop is used for more real-time scenarios. Are machine learning, language processing and fraud detection examples available? What are the other practical use cases? I am familiar

Re: Practical examples

2014-04-28 Thread Mohan Radhakrishnan
framework. Regards, Shahab On Mon, Apr 28, 2014 at 10:02 PM, Mohan Radhakrishnan radhakrishnan.mo...@gmail.com wrote: Hi, I have been reading the definitive guide and taking online courses. Now I would like to understand how Hadoop is used for more real-time scenarios

Re: calling mapreduce from webservice

2014-04-18 Thread Mohan Radhakrishnan
The Play framework is reactive and uses push channels. It may be useful here if the UI has to be asynchronous and reactive. Mohan On Sat, Apr 19, 2014 at 4:37 AM, Shahab Yunus shahab.yu...@gmail.com wrote: As far as I know there is no API to kick off M/R jobs. There is for M/R v2, a REST API to

Hadoop distribution (2-node cluster)

2014-04-14 Thread Mohan Radhakrishnan
Hi, As the subject implies, I have 2 nodes: one runs OS X and the other Linux. How is a distributed cluster installed in this case? What other networking equipment do I need? Thanks, Mohan

2-node cluster

2014-04-14 Thread Mohan Radhakrishnan
Hi, I have 2 nodes: one runs OS X and the other Linux. How is a distributed cluster installed in this case? What other networking equipment do I need? Can I ask for pointers to instructions? I am new. Thanks, Mohan