Thanks, Harsh!
That means basically both the APIs and the Hadoop client commands allow only
serial writes.
I was wondering what other ways there might be to write data to HDFS in
parallel, other than using multiple parallel threads.
Thanks,
JJ
Sent from my iPhone
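Since a single HDFS file has exactly one writer, any parallelism has to come
from writing several independent files at once. A minimal sketch of that idea,
assuming one thread per local file; the pool size and target directory are
made up for illustration:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParallelUpload {
  public static void main(String[] args) throws Exception {
    final FileSystem fs = FileSystem.get(new Configuration());
    ExecutorService pool = Executors.newFixedThreadPool(4);
    for (final String name : args) {
      pool.submit(new Runnable() {
        public void run() {
          try {
            // Each thread writes its own file; HDFS still writes each
            // individual file serially, end-to-end.
            fs.copyFromLocalFile(new Path(name), new Path("/user/jj/" + name));
          } catch (Exception e) {
            e.printStackTrace();
          }
        }
      });
    }
    pool.shutdown();
    pool.awaitTermination(1, TimeUnit.HOURS);
  }
}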
On May 17, 2011, at 10:59 PM, Harsh J wrote:
Hello,
Adding to Joey's response, copyFromLocal's current implementation is serial
given a list of files.
On Wed, May 18, 2011 at 9:57 AM, Mapred Learn wrote:
Thanks, Joey!
I will try to find out more about copyFromLocal. Looks like the Hadoop APIs
write serially, as you pointed out.
Thanks,
-JJ
On May 17, 2011, at 8:32 PM, Joey Echeverria wrote:
The sequence file writer definitely does it serially as you can only
ever write to the end of a file in Hadoop.
Doing copyFromLocal could write multiple files in parallel (I'm not
sure if it does or not), but a single file would be written serially.
-Joey
On Tue, May 17, 2011 at 5:44 PM, Mapred Learn wrote:
Try using the Apache Mahout code that solves exactly this problem.
Mahout has a distributed row-wise matrix that is read one row at a time.
Dot products with the vector are computed and the results are collected.
This capability is used extensively in the large-scale SVDs in Mahout.
On Tue, Ma
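If I remember right, the relevant class is DistributedRowMatrix in Mahout's
math-hadoop module; a rough usage sketch, where the paths and the row count
are placeholders, so treat this as an outline rather than gospel:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.mahout.math.Vector;
import org.apache.mahout.math.hadoop.DistributedRowMatrix;

public class MatVecWithMahout {
  public static Vector multiply(Configuration conf, Vector v) {
    // Rows of the matrix live in a SequenceFile on HDFS.
    DistributedRowMatrix m = new DistributedRowMatrix(
        new Path("/data/matrix"), new Path("/tmp/matrix-work"),
        1000000, v.size());
    m.setConf(conf);
    // Runs a MapReduce pass computing one dot product per row.
    return m.times(v);
  }
}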
Hi,
My question is: when I run a command from an HDFS client, e.g. hadoop fs
-copyFromLocal, or create a sequence file writer in Java code and append
key/values to it through the Hadoop APIs, does it internally transfer/write
the data to HDFS serially or in parallel?
Thanks in advance,
-JJ
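For reference, the append path in question looks roughly like this: a single
writer instance appends records to the end of one file, so writes within a
file are inherently serial. The path and key/value types here are just
examples:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

public class SeqWriteExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    SequenceFile.Writer writer = SequenceFile.createWriter(
        fs, conf, new Path("/user/jj/data.seq"),
        Text.class, IntWritable.class);
    try {
      // Every append goes to the end of the file, one record at a time;
      // there is no way to have two writers on the same file.
      writer.append(new Text("key"), new IntWritable(42));
    } finally {
      writer.close();
    }
  }
}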
or conf.setBoolean("mapred.task.profile", true);
Mark
On Tue, May 17, 2011 at 4:49 PM, Mark question wrote:
I usually do this setting inside my Java program (in the run function) as
follows:
JobConf conf = new JobConf(this.getConf(), My.class);
conf.set("mapred.task.profile", "true");
then I'll see some output files in that same working directory.
Hope that helps,
Mark
On Tue,
I am running a Hadoop Java program in local single-JVM mode via an IDE
(IntelliJ). I want to do performance profiling of it. Following the
instructions in chapter 5 of *Hadoop: the Definitive Guide*, I added the
following properties to my job configuration file.
mapred.task.profile
t
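For what it's worth, JobConf also has convenience setters for these
properties; a sketch of what could go in the driver's run() method. The HPROF
string is just the stock example, and MyJob is a placeholder class name:

import org.apache.hadoop.mapred.JobConf;

public class MyJob {
  public JobConf configureProfiling(JobConf conf) {
    conf.setProfileEnabled(true);               // mapred.task.profile=true
    conf.setProfileTaskRange(true, "0-2");      // profile map tasks 0-2
    conf.setProfileTaskRange(false, "0-2");     // profile reduce tasks 0-2
    conf.setProfileParams("-agentlib:hprof=cpu=samples,heap=sites,"
        + "depth=6,force=n,thread=y,verbose=n,file=%s");
    return conf;
  }
}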
Thanks for the inputs, but I'm running on a university cluster, not my own,
so are assumptions such as each task (mapper/reducer) taking 1 GB still
valid?
So I guess to tune performance I should try running the job multiple times
and rely on execution time as an indicator of success.
Thank
Also, it seems like Ganglia would be very well complemented by Nagios,
allowing you to monitor the overall health of your cluster.
--
Take care,
Konstantin (Cos) Boudnik
2CAC 8312 4870 D885 8616 6115 220F 6980 1F27 E622
Disclaimer: Opinions expressed in this email are those of the author,
and do
On May 17, 2011, at 3:11 PM, Mark question wrote:
> So what other memory consumption tools do you suggest? I don't want to do it
> manually and dump statistics into file because IO will affect performance
> too.
We watch memory with Ganglia. We also tune our systems such that a
task wi
So what other memory consumption tools do you suggest? I don't want to do it
manually and dump statistics into file because IO will affect performance
too.
Thanks,
Mark
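One lightweight alternative, in case it helps: have each task report its own
heap numbers through the Reporter instead of writing files, so there is no
extra I/O inside the task. A sketch; the reporting interval and the key/value
types are arbitrary:

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MemoryAwareMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, LongWritable> {
  private long records = 0;

  public void map(LongWritable key, Text value,
                  OutputCollector<Text, LongWritable> out, Reporter reporter)
      throws IOException {
    // ... real map logic would go here ...
    if (++records % 10000 == 0) {
      Runtime rt = Runtime.getRuntime();
      long usedMb = (rt.totalMemory() - rt.freeMemory()) >> 20;
      // Shows up in the task's status column on the JobTracker web UI.
      reporter.setStatus("records=" + records + ", heapUsedMB=" + usedMb);
    }
  }
}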
On Tue, May 17, 2011 at 2:58 PM, Allen Wittenauer wrote:
Hi Shah,
You've not mentioned which version of log4j you're using, so I'm going to
guess 1.2. I'm also not an expert, but I'll give it a go.
I don't think you can set a max number of files to keep with the
DailyRollingFileAppender. You can with RollingFileAppender.
This seems to be a relatively c
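To make the RollingFileAppender route concrete, here is roughly what the
programmatic setup looks like in log4j 1.2; the file name, size cap, and
pattern are invented for the example:

import org.apache.log4j.Logger;
import org.apache.log4j.PatternLayout;
import org.apache.log4j.RollingFileAppender;

public class LogSetup {
  public static void main(String[] args) {
    RollingFileAppender appender = new RollingFileAppender();
    appender.setFile("logs/app.log");
    appender.setMaxFileSize("10MB");      // roll when the file hits 10 MB
    appender.setMaxBackupIndex(5);        // keep at most 5 rolled-over files
    appender.setLayout(new PatternLayout("%d %-5p %c - %m%n"));
    appender.activateOptions();           // apply the settings
    Logger.getRootLogger().addAppender(appender);
    Logger.getRootLogger().info("logging configured");
  }
}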
On May 17, 2011, at 1:01 PM, Mark question wrote:
> Hi
>
> I need to use hadoop-tool-kit for monitoring. So I followed
> http://code.google.com/p/hadoop-toolkit/source/checkout
>
> and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
Looking at the code, be awa
Sorry for the spam, but I didn't see my previous email yet.
I need to use hadoop-tool-kit for monitoring. So I followed
http://code.google.com/p/hadoop-toolkit/source/checkout
and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
and set a property "mapred.performance.
Hi all,
I was wondering how to go about doing a matrix-vector multiplication using
Hadoop. I have my matrix in one file and my vector in another. All the map
tasks will need the vector file... basically, they need to share it.
I want my map function to output key-value pairs (i,m[i,j]*v(
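A sketch of the map side of that, assuming the vector is shipped to every
task via DistributedCache (one value per line) and the matrix input has one
entry per line as "i j m[i][j]"; the file layout and class names are made up:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class MatVecMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, IntWritable, DoubleWritable> {
  private final List<Double> v = new ArrayList<Double>();

  @Override
  public void configure(JobConf job) {
    try {
      // The vector file was added with DistributedCache.addCacheFile(...)
      Path[] cached = DistributedCache.getLocalCacheFiles(job);
      BufferedReader in =
          new BufferedReader(new FileReader(cached[0].toString()));
      String line;
      while ((line = in.readLine()) != null) {
        v.add(Double.parseDouble(line.trim()));
      }
      in.close();
    } catch (IOException e) {
      throw new RuntimeException("failed to load vector", e);
    }
  }

  public void map(LongWritable key, Text value,
                  OutputCollector<IntWritable, DoubleWritable> out,
                  Reporter reporter) throws IOException {
    // One input line per matrix entry: "i j m[i][j]"
    String[] parts = value.toString().split("\\s+");
    int i = Integer.parseInt(parts[0]);
    int j = Integer.parseInt(parts[1]);
    double mij = Double.parseDouble(parts[2]);
    // Emit the partial product; a sum reducer produces result[i].
    out.collect(new IntWritable(i), new DoubleWritable(mij * v.get(j)));
  }
}

A reducer that sums the values for each key i then yields the result vector,
one entry per key.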
Hi
I need to use hadoop-tool-kit for monitoring. So I followed
http://code.google.com/p/hadoop-toolkit/source/checkout
and applied the patch in my hadoop.20.2 directory as: patch -p0 < patch.20.2
and set a property "mapred.performance.diagnose" to true in
mapred-site.xml.
but I don't se
I reinstalled everything and am able to start everything other than the
jobtracker. The jobtracker still says the port is in use, even though I
verified with netstat that the port is not in use.
ipedited:/usr/lib/hadoop-0.20/logs/history # /usr/java/jdk1.6.0_25/bin/jps
7435 SecondaryNameNode
7517 TaskT
Hi Harsh,
I tried changing the port to 8023 and tried again, without luck: it still
says port 8023 is in use, but when I ran netstat, 8023 was not listed.
I also have Oozie configured on the system. While trying to work with
Oozie, the permissions of some of the directories got ch
Deepak,
From the logs it appears as if some service on your machine is already
using the specified port 8021. Try shutting down whatever might be using
it if possible, or switch your JT's port to something else.
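Also note that plain netstat can omit listening sockets; netstat -an is more
reliable. If still in doubt, a trivial Java check of whether a port can be
bound:

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class PortCheck {
  public static void main(String[] args) throws IOException {
    int port = Integer.parseInt(args[0]);
    ServerSocket s = new ServerSocket();
    try {
      s.bind(new InetSocketAddress(port));  // throws BindException if taken
      System.out.println("port " + port + " is free");
    } finally {
      s.close();
    }
  }
}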
On Tue, May 17, 2011 at 9:19 PM, Subhramanian, Deepak wrote:
Hi,
I am using CDH3 in pseudo-distributed mode and getting the following error
while starting the task tracker and job tracker. Any suggestions?
Error for Task Tracker
2011-05-17 13:28:10,234 INFO org.apache.hadoop.mapred.TaskTracker:
STARTUP_MSG:
/
I need some help figuring out why my job failed. I built a single-node
cluster just to try it out. I followed the example link
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/
Everything seems to be working correctly. I formatted the namenode. Able to
con
Please help me out with this.
Hey Lạc Trung,
I don't see a Configuration instance used in your code, but you're
using the Configured class. Do you instantiate CopyFiles using
Hadoop's ReflectionUtils utility class? Unless that's done, getConf()
will return null, which is probably what's causing the issue.
On Sat, May 14, 2011 at
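To make that concrete, a minimal sketch of the difference (CopyFiles is the
class from your mail; everything else is illustrative):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ReflectionUtils;

public class Demo {
  public static void main(String[] args) {
    Configuration conf = new Configuration();

    // Plain construction: nothing ever calls setConf(), so getConf()
    // inside CopyFiles returns null.
    CopyFiles broken = new CopyFiles();

    // ReflectionUtils injects the Configuration after construction,
    // so getConf() returns it as expected.
    CopyFiles ok = ReflectionUtils.newInstance(CopyFiles.class, conf);
  }
}

Running the tool through ToolRunner.run(new Configuration(), new CopyFiles(),
args) achieves the same thing, since ToolRunner calls setConf() on the tool.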
On 16/05/11 21:12, Lạc Trung wrote:
I'm using Hadoop-0.21.
---
hut.edu.vn
At the top, it's your code, so you get to fix it. The good thing about
open source is you can go all the way in.
This is what I would do in the same situation:
- Grab the 0.21 source JAR
- add it to your IDE
- have a look at
Hello,
I'm not sure about the latter part of your need, but you can add a filter
atop the UIs. One good example extension is provided at:
https://issues.apache.org/jira/browse/HADOOP-7119
There's also Hue, which provides this user-management functionality along
with a lot of other goodies. Read mor
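For the filter route, the UI-side piece is a standard servlet Filter that
Hadoop wires in through the hadoop.http.filter.initializers mechanism, as in
the HADOOP-7119 example. A bare-bones sketch of just the Filter half; the
parameter check is purely illustrative:

import java.io.IOException;
import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

public class SimpleAuthFilter implements Filter {
  public void init(FilterConfig conf) throws ServletException {}

  public void doFilter(ServletRequest req, ServletResponse resp,
                       FilterChain chain) throws IOException, ServletException {
    HttpServletRequest http = (HttpServletRequest) req;
    // Illustrative check only: require a 'user' query parameter.
    if (http.getParameter("user") == null) {
      ((HttpServletResponse) resp).sendError(HttpServletResponse.SC_FORBIDDEN);
      return;
    }
    chain.doFilter(req, resp);
  }

  public void destroy() {}
}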