Copy data between hosts using HDFS Proxy.

2013-05-29 Thread Pedro Sá da Costa
Hi, I want to copy data between hosts in Hadoop 2.0.4, but the hosts are using HDFS Proxy on port 3888. I tried the hftp, httpfs, and hdfs protocols; none of the examples worked. hadoop distcp hftp://host1:3888/user/out/part-m-00029 hftp://host2:3888/ Any suggestion?

Re: confirm subscribe to user@hadoop.apache.org

2013-05-29 Thread Sreenivasulu Malinedi
confirm me On Wed, May 29, 2013 at 11:30 AM, user-h...@hadoop.apache.org wrote: Hi! This is the ezmlm program. I'm managing the user@hadoop.apache.org mailing list. To confirm that you would like msreenivasulu.had...@gmail.com added to the user mailing list, please send a short

Hadoop-based product recommendations.

2013-05-29 Thread Sai Sai
Just wondering if anyone would have any suggestions. We are a bunch of developers on the bench for a few months, trained on Hadoop but without any projects to work on. We would like to develop a Hadoop/Hive/Pig based product for our company so we can be of value to the company and not be scared of lay

RE: confirm subscribe to user@hadoop.apache.org

2013-05-29 Thread Job Thomas
Respected sir/madam, Best Regards, Job M Thomas From: Sreenivasulu Malinedi [mailto:msreenivasulu.had...@gmail.com] Sent: Wed 5/29/2013 11:31 AM To:

Re: Not saving any output

2013-05-29 Thread Pramod N
*Sqoop* is often used in this scenario. You might also want to look at https://github.com/mongodb/mongo-hadoop *MongoDBHadoop Connector*. More on streaming support can be found here http://api.mongodb.org/hadoop/Hadoop+Streaming+Support.html There are pros and cons. Choose what suits you the

Re: Please help me with heartbeat storm

2013-05-29 Thread Philippe Signoret
This might be relevant: https://issues.apache.org/jira/browse/MAPREDUCE-4478 There are two configuration items that control the TaskTracker's heartbeat interval. One is *mapreduce.tasktracker.outofband.heartbeat*. The other is *mapreduce.tasktracker.outofband.heartbeat.damper*. If we set *
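For reference, the two properties named above would typically go in mapred-site.xml. This is only an illustrative sketch — the values shown are examples, not settings recommended anywhere in this thread:

```xml
<!-- mapred-site.xml: illustrative values only -->
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat</name>
  <value>false</value>
</property>
<property>
  <name>mapreduce.tasktracker.outofband.heartbeat.damper</name>
  <value>1000000</value>
</property>
```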

What else can be built on top of YARN.

2013-05-29 Thread Rahul Bhattacharjee
Hi all, I was going through the motivation behind YARN. Splitting the responsibility of the JT is the major concern. Ultimately the base (YARN) was built in a generic way for building other generic distributed applications too. I am not able to think of any other parallel processing use case that

Reduce side question on MR

2013-05-29 Thread Rahul Bhattacharjee
Hi, I have one question related to the reduce phase of MR jobs. The intermediate outputs of map tasks are pulled from the nodes which ran the map tasks to the node where the reducer is going to run, and that intermediate data is written to the reducer's local fs. My question is that if there is a job

OpenJDK?

2013-05-29 Thread John Lilley
I am having trouble finding a definitive answer about OpenJDK vs. the Sun JDK in regards to building Hadoop. This: http://wiki.apache.org/hadoop/HadoopJavaVersions indicates that OpenJDK is not recommended, but is that an authoritative answer? BUILDING.txt states no preference. Thanks, John

Re: What else can be built on top of YARN.

2013-05-29 Thread Krishna Kishore Bonagiri
Hi Rahul, I am porting a distributed application that runs on a fixed set of given resources to YARN, with the aim of being able to run it on dynamically selected resources, whichever are available at the time of running the application. Thanks, Kishore On Wed, May 29, 2013 at 8:04 PM,

Re: OpenJDK?

2013-05-29 Thread Lenin Raj
Yes, use the Sun/Oracle JDK. I have had memory issues while using Oozie; when I replaced OpenJDK with Sun JDK 6, the memory issue was resolved. Thanks, Lenin On Wed, May 29, 2013 at 8:22 PM, John Lilley john.lil...@redpoint.net wrote: I am having trouble finding a definitive answer about OpenJDK

Unsubscribe

2013-05-29 Thread sahil soni
Unsubscribe

RE: OpenJDK?

2013-05-29 Thread John Lilley
Great, that's what I've done. At least I think so. This is JRE6 right? # java -version java version 1.6.0_43 Java(TM) SE Runtime Environment (build 1.6.0_43-b01) Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode) john From: Lenin Raj [mailto:emaille...@gmail.com] Sent: Wednesday,

Re: OpenJDK?

2013-05-29 Thread Lenin Raj
Yup, that's right. Thanks, Lenin On Wed, May 29, 2013 at 10:23 PM, John Lilley john.lil...@redpoint.net wrote: Great, that's what I've done. At least I think so. This is JRE6, right? # java -version java version 1.6.0_43 Java(TM) SE Runtime Environment (build

Re: What else can be built on top of YARN.

2013-05-29 Thread Rahul Bhattacharjee
Thanks for the response, Krishna. I was wondering whether it would be possible to use MR to solve your problem instead of building the whole stack on top of YARN. Most likely it's not possible, and that's why you are building it. I wanted to know why that is. I am just trying to find out the need or

Re: What else can be built on top of YARN.

2013-05-29 Thread John Conwell
Two scenarios I can think of are re-implementations of Twitter's Storm ( http://storm-project.net/) and DryadLinq ( http://research.microsoft.com/en-us/projects/dryadlinq/). Storm, a distributed realtime computation framework used for analyzing realtime streams of data, doesn't really need to be

Help: error in hadoop build

2013-05-29 Thread John Lilley
Sorry if this is a dumb question, but I'm not sure where to start. I am following the BUILDING.txt instructions for source checked out today using git: git clone git://git.apache.org/hadoop-common.git Hadoop Following the build steps and adding -X for more logging: mvn compile -X But I get this error

Re: Help: error in hadoop build

2013-05-29 Thread Ted Yu
What's the output of: protoc --version You should be using 2.4.1. Cheers On Wed, May 29, 2013 at 11:33 AM, John Lilley john.lil...@redpoint.net wrote: Sorry if this is a dumb question, but I'm not sure where to start. I am following BUILDING.txt instructions for source checked out today

Re: What else can be built on top of YARN.

2013-05-29 Thread Viral Bajaria
There is a project at Yahoo which makes it possible to run Storm on Yarn. I think the team behind it is going to give a talk at Hadoop Summit and plan to open source it after that. -Viral On Wed, May 29, 2013 at 11:04 AM, John Conwell j...@iamjohn.me wrote: Storm, a distributed realtime

Writing data in db instead of hdfs

2013-05-29 Thread jamal sasha
Hi, Is it possible to save data in a database (HBase, Cassandra?) directly from Hadoop, so that there is no output in HDFS and the job writes data directly into the db? If I want to modify the wordcount example to achieve this, what/where should I make these modifications? Any help/suggestions.

Re: Writing data in db instead of hdfs

2013-05-29 Thread Mohammad Tariq
Hello Jamal, Yes, it is possible. You could use TableReducer to do that. Use it instead of the normal reducer in your wordcount example. Alternatively you could use HFileOutputFormat to write directly to HFiles. Warm Regards, Tariq cloudfront.blogspot.com On Thu, May 30, 2013 at 2:08 AM,

Re: Reading json format input

2013-05-29 Thread Russell Jurney
Seriously consider Pig (free answer, 4 LOC): my_data = LOAD 'my_data.json' USING com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[]; words = FOREACH my_data GENERATE $0#'author' as author, FLATTEN(TOKENIZE($0#'text')) as word; word_counts = FOREACH (GROUP words BY word) GENERATE group

Re: Reading json format input

2013-05-29 Thread Michael Segel
Yeah, I have to agree with Russell. Pig is definitely the way to go on this. If you want to do it as a Java program you will have to do some work on the input string, but that too should be trivial. How formal do you want to go? Do you want to strip it down or just find the quote after the text

Re: Reading json format input

2013-05-29 Thread Rishi Yadav
Hi Jamal, I took your input, put it in the sample wordcount program, and it's working just fine, giving this output: author 3 foo234 1 text 3 foo 1 foo123 1 hello 3 this 1 world 2 When we split using String[] words = input.split("\\W+"); it takes care of all non-alphanumeric characters.
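A minimal, standalone illustration of the split behavior described above — the sample line is hypothetical, not Jamal's actual data. It shows why JSON keys like "author" and "text" end up counted as words:

```java
public class SplitDemo {
    public static void main(String[] args) {
        String input = "{\"author\":\"foo\", \"text\":\"hello world\"}";
        // Splitting on runs of non-word characters (\W+) discards braces,
        // quotes, colons and commas, leaving only alphanumeric tokens.
        // Note: the first element is an empty string, because the line
        // starts with a non-word character ('{').
        String[] words = input.split("\\W+");
        for (String word : words) {
            System.out.println(word);
        }
    }
}
```

This is why the word counts above include "author 3" and "text 3": the JSON keys themselves survive the split and get counted alongside the field values.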

Re: Reading json format input

2013-05-29 Thread jamal sasha
Hi, For some reason, this has to be in Java :( I am trying to use the org.json library, something like (in the mapper): JSONObject jsn = new JSONObject(value.toString()); String text = (String) jsn.get("text"); StringTokenizer itr = new StringTokenizer(text); But it's not working :( It would be better to

Re: Reading json format input

2013-05-29 Thread jamal sasha
Hi Rishi, But I don't want the word count of all the words. In the json, there is a field "text", and those are the words I wish to count. On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav ri...@infoobjects.com wrote: Hi Jamal, I took your input and put it in sample wordcount program and it's

Re: Reading json format input

2013-05-29 Thread Rishi Yadav
for that, you have to only write intermediate data if the word equals "text": String[] words = line.split("\\W+"); for (String word : words) { if (word.equals("text")) context.write(new Text(word), new IntWritable(1)); } I am assuming you have huge volume of data for it, otherwise
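A plain-Java illustration of the filter in the snippet above, stripped of the Hadoop context so it can run standalone: split each line on non-word characters and count only the token "text". The sample line is hypothetical:

```java
public class FilterCountDemo {
    // Count how many tokens in the line equal the target word,
    // mirroring what the filtered mapper above would emit as (word, 1) pairs.
    static int countToken(String line, String target) {
        int count = 0;
        for (String word : line.split("\\W+")) {
            if (word.equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String line = "{\"author\":\"foo\",\"text\":\"some text here\"}";
        // 2: the JSON key "text" and the word "text" inside the value
        System.out.println(countToken(line, "text"));
    }
}
```

Note the caveat visible in the output comment: because the split is applied to the raw line, the JSON key "text" itself matches the filter, which is part of why Jamal's follow-up asks about counting only the words inside the "text" field.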

Re: issue launching mapreduce job with kerberos secured hadoop

2013-05-29 Thread Robert Molina
Hi Neeraj, This error doesn't look to be kerberos related initially. Can you verify if 192.168.49.51 has the tasktracker process running? Regards, Robert On Tue, May 28, 2013 at 7:58 PM, Rahul Bhattacharjee rahul.rec@gmail.com wrote: The error looks a little low level , network level .

Re: Reading json format input

2013-05-29 Thread Rahul Bhattacharjee
Whatever you have mentioned should work, Jamal; you can debug this. Thanks, Rahul On Thu, May 30, 2013 at 5:14 AM, jamal sasha jamalsha...@gmail.com wrote: Hi, For some reason, this have to be in java :( I am trying to use org.json library, something like (in mapper) JSONObject jsn = new

Re: What else can be built on top of YARN.

2013-05-29 Thread Vinod Kumar Vavilapalli
Historically, many applications/frameworks wanted to take advantage of just the resource management capabilities and failure handling of Hadoop (via JobTracker/TaskTracker), but were forced to use MapReduce even though they didn't have to. Obvious examples are graph processing (Giraph),