Hi,
I want to copy data between hosts in Hadoop 2.0.4, but the hosts are
using HDFS Proxy on port 3888. I tried the hftp, httpfs, and hdfs
protocols; none of them worked. For example:
hadoop distcp hftp://host1:3888/user/out/part-m-00029 hftp://host2:3888/
Any suggestions?
confirm me
On Wed, May 29, 2013 at 11:30 AM, user-h...@hadoop.apache.org wrote:
Hi! This is the ezmlm program. I'm managing the
user@hadoop.apache.org mailing list.
To confirm that you would like
msreenivasulu.had...@gmail.com
added to the user mailing list, please send
a short
Just wondering if anyone would have any suggestions.
We are a bunch of developers who have been on the bench for a few months, trained on Hadoop but
without any projects to work on.
We would like to develop a Hadoop/Hive/Pig based product for our company so we
can be of value to the company and not be scared of layoffs.
Respected sir/madam,
Best Regards,
Job M Thomas
From: Sreenivasulu Malinedi [mailto:msreenivasulu.had...@gmail.com]
Sent: Wed 5/29/2013 11:31 AM
To:
Sqoop is often used in this scenario.
You might also want to look at the MongoDB Hadoop Connector:
https://github.com/mongodb/mongo-hadoop
More on streaming support can be found here:
http://api.mongodb.org/hadoop/Hadoop+Streaming+Support.html
There are pros and cons. Choose what suits you the
This might be relevant: https://issues.apache.org/jira/browse/MAPREDUCE-4478
There are two configuration items to control the TaskTracker's heartbeat
interval. One is mapreduce.tasktracker.outofband.heartbeat. The other is
mapreduce.tasktracker.outofband.heartbeat.damper. If we set
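For reference, a minimal sketch of setting these two properties programmatically on a Hadoop Configuration (the values below are only illustrative; on a real cluster they normally belong in mapred-site.xml on the TaskTracker nodes):

import org.apache.hadoop.conf.Configuration;

public class HeartbeatConfigSketch {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Send a heartbeat as soon as a task completes, instead of waiting for
    // the next regular heartbeat interval (illustrative value).
    conf.setBoolean("mapreduce.tasktracker.outofband.heartbeat", true);
    // Damping factor applied to out-of-band heartbeats (illustrative value).
    conf.setInt("mapreduce.tasktracker.outofband.heartbeat.damper", 1000000);
    System.out.println(conf.get("mapreduce.tasktracker.outofband.heartbeat"));
  }
}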
Hi all,
I was going through the motivation behind YARN. Splitting the
responsibilities of the JobTracker is the major concern. Ultimately the base (YARN) was
built in a generic way so that other generic distributed applications can be built on it
too.
I am not able to think of any other parallel processing use case that
Hi,
I have one question related to the reduce phase of MR jobs.
The intermediate outputs of the map tasks are pulled from the nodes which
ran the map tasks to the node where the reducer is going to run, and that
intermediate data is written to the reducer's local filesystem. My question is:
if there is a job
I am having trouble finding a definitive answer about OpenJDK vs. Sun JDK
with regard to building Hadoop. This page:
http://wiki.apache.org/hadoop/HadoopJavaVersions
indicates that OpenJDK is not recommended, but is that an authoritative answer?
BUILDING.txt states no preference.
Thanks
John
Hi Rahul,
I am porting a distributed application that runs on a fixed set of given
resources to YARN, with the aim of being able to run it on dynamically
selected resources, whichever are available at the time the application
runs.
Thanks,
Kishore
On Wed, May 29, 2013 at 8:04 PM,
Yes. Use the Sun/Oracle JDK.
I have had memory issues while using Oozie. When I replaced OpenJDK with
Sun JDK 6, the memory issue was resolved.
Thanks,
Lenin
On Wed, May 29, 2013 at 8:22 PM, John Lilley john.lil...@redpoint.net wrote:
I am having trouble finding a definitive answer about OpenJDK
Unsubscribe
Great, that's what I've done. At least I think so. This is JRE6 right?
# java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build 1.6.0_43-b01)
Java HotSpot(TM) 64-Bit Server VM (build 20.14-b01, mixed mode)
john
From: Lenin Raj [mailto:emaille...@gmail.com]
Sent: Wednesday,
Yup. That's right.
Thanks,
Lenin
On Wed, May 29, 2013 at 10:23 PM, John Lilley john.lil...@redpoint.net wrote:
Great, that’s what I’ve done. At least I think so. This is JRE6 right?
# java -version
java version "1.6.0_43"
Java(TM) SE Runtime Environment (build
Thanks for the response, Krishna.
I was wondering if it would be possible to use MR to solve your problem
instead of building the whole stack on top of YARN.
Most likely it's not possible, and that's why you are building it. I wanted to
know why that is.
I am just trying to find out the need or
Two scenarios I can think of are re-implementations of Twitter's Storm (
http://storm-project.net/) and DryadLinq (
http://research.microsoft.com/en-us/projects/dryadlinq/).
Storm, a distributed realtime computation framework used for analyzing
realtime streams of data, doesn't really need to be
Sorry if this is a dumb question, but I'm not sure where to start. I am
following BUILDING.txt instructions for source checked out today using git:
git clone git://git.apache.org/hadoop-common.git Hadoop
Following the build steps and adding -X for more logging:
mvn compile -X
But I get this error
What's the output of:
protoc --version
You should be using 2.4.1
Cheers
On Wed, May 29, 2013 at 11:33 AM, John Lilley john.lil...@redpoint.net wrote:
Sorry if this is a dumb question, but I’m not sure where to start. I am
following BUILDING.txt instructions for source checked out today
There is a project at Yahoo which makes it possible to run Storm on YARN. I
think the team behind it is going to give a talk at Hadoop Summit and plans
to open source it after that.
-Viral
On Wed, May 29, 2013 at 11:04 AM, John Conwell j...@iamjohn.me wrote:
Storm, a distributed realtime
Hi,
Is it possible to save data into a database (HBase, Cassandra?) directly
from Hadoop,
so that there is no output in HDFS and the job writes data directly into
the database?
If I want to modify the wordcount example to achieve this, what/where should I
make these modifications?
Any help/suggestions?
Hello Jamal,
Yes, it is possible. You could use TableReducer to do that. Use it
instead of the normal reducer in your wordcount example. Alternatively you
could use HFileOutputFormat to write directly to HFiles.
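A rough, untested sketch of what such a TableReducer might look like for wordcount (the table name, column family and qualifier below are placeholders):

import java.io.IOException;

import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableReducer;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;

public class WordCountTableReducer
    extends TableReducer<Text, IntWritable, ImmutableBytesWritable> {

  @Override
  protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable c : counts) {
      sum += c.get();
    }
    // One row per word; the count goes into the (placeholder) column cf:count.
    byte[] row = Bytes.toBytes(word.toString());
    Put put = new Put(row);
    put.add(Bytes.toBytes("cf"), Bytes.toBytes("count"), Bytes.toBytes(sum));
    context.write(new ImmutableBytesWritable(row), put);
  }
}

In the driver, instead of setting an HDFS output path, you would wire the job to the table with something like TableMapReduceUtil.initTableReducerJob("wordcount", WordCountTableReducer.class, job); where "wordcount" is again a placeholder table name.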
Warm Regards,
Tariq
cloudfront.blogspot.com
On Thu, May 30, 2013 at 2:08 AM,
Seriously consider Pig (free answer, 4 LOC):
my_data = LOAD 'my_data.json' USING com.twitter.elephantbird.pig.load.JsonLoader() AS json:map[];
words = FOREACH my_data GENERATE $0#'author' AS author, FLATTEN(TOKENIZE($0#'text')) AS word;
word_counts = FOREACH (GROUP words BY word) GENERATE group AS word, COUNT(words) AS word_count;
Yeah,
I have to agree with Russell. Pig is definitely the way to go on this.
If you want to do it as a Java program, you will have to do some work on the
input string, but it too should be trivial.
How formal do you want to go?
Do you want to strip it down or just find the quote after the text
Hi Jamal,
I took your input and put it in the sample wordcount program, and it's working
just fine, giving this output:
author 3
foo234 1
text 3
foo 1
foo123 1
hello 3
this 1
world 2
When we split using
String[] words = input.split("\\W+");
it takes care of all non-alphanumeric characters.
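To see that splitting behaviour in isolation, here is a tiny standalone check (the sample line is made up, just to mimic the kind of JSON input discussed in this thread):

public class SplitCheck {
  public static void main(String[] args) {
    // Hypothetical input line in the same shape as the JSON in this thread.
    String input = "{\"author\": \"foo123\", \"text\": \"hello world\"}";
    // \W+ splits on any run of non-word characters (anything other than
    // letters, digits and underscore), so quotes, braces and commas disappear.
    String[] words = input.split("\\W+");
    for (String word : words) {
      if (!word.isEmpty()) {   // the leading '{' produces one empty token
        System.out.println(word);
      }
    }
  }
}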
Hi,
For some reason, this has to be in Java :(
I am trying to use the org.json library, something like this (in the mapper):
JSONObject jsn = new JSONObject(value.toString());
String text = (String) jsn.get("text");
StringTokenizer itr = new StringTokenizer(text);
But it's not working :(
It would be better to
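For what it's worth, a complete mapper along those lines might look roughly like this (untested sketch; it assumes the org.json jar is on the job's classpath and that each input line is a single JSON object with a "text" field):

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.json.JSONException;
import org.json.JSONObject;

public class JsonWordCountMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final IntWritable ONE = new IntWritable(1);
  private final Text word = new Text();

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    try {
      // Parse the whole line as JSON and pull out the "text" field.
      JSONObject json = new JSONObject(value.toString());
      String text = json.getString("text");
      // Emit (word, 1) for every whitespace-separated token in that field.
      StringTokenizer itr = new StringTokenizer(text);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    } catch (JSONException e) {
      // Skip malformed records instead of failing the whole task.
    }
  }
}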
Hi Rishi,
But I don't want the word count of all the words.
In the JSON there is a field "text", and those are the words I wish to count.
On Wed, May 29, 2013 at 4:43 PM, Rishi Yadav ri...@infoobjects.com wrote:
Hi Jamal,
I took your input and put it in sample wordcount program and it's
For that, you only have to write the intermediate data if the word is "text":
String[] words = line.split("\\W+");
for (String word : words) {
    if (word.equals("text"))
        context.write(new Text(word), new IntWritable(1));
}
I am assuming you have a huge volume of data for it, otherwise
Hi Neeraj,
This error doesn't look to be Kerberos-related at first glance. Can you
verify that 192.168.49.51
has the TaskTracker process running?
Regards,
Robert
On Tue, May 28, 2013 at 7:58 PM, Rahul Bhattacharjee
rahul.rec@gmail.com wrote:
The error looks a little low level, at the network level.
Whatever you have mentioned, Jamal, should work. You can debug this.
Thanks,
Rahul
On Thu, May 30, 2013 at 5:14 AM, jamal sasha jamalsha...@gmail.com wrote:
Hi,
For some reason, this has to be in Java :(
I am trying to use the org.json library, something like this (in the mapper):
JSONObject jsn = new
Historically, many applications/frameworks wanted to take advantage of just the
resource management capabilities and failure handling of Hadoop (via
JobTracker/TaskTracker), but were forced to use MapReduce even though they
didn't have to. Obvious examples are graph processing (Giraph),