You could always write your own properties file and read it as a resource.
On Tue, Sep 25, 2012 at 12:10 AM, Hemanth Yamijala yhema...@gmail.comwrote:
By Java environment variables, do you mean the ones passed as
-Dkey=value? That's one way of passing them. I suppose another way is
to have a
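For what it's worth, here is a minimal sketch of the properties-file approach suggested above, combined with a -D system-property override. The file name, key, and driver class are hypothetical:

// Somewhere in the job driver; job.properties would be bundled on the classpath.
java.util.Properties props = new java.util.Properties();
java.io.InputStream in = MyJobDriver.class.getResourceAsStream("/job.properties");
if (in != null) {
    props.load(in);
    in.close();
}
// A -Dmy.key=value passed to the JVM wins over the value from the file.
String value = System.getProperty("my.key", props.getProperty("my.key", "default"));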
It would be helpful to see some statistics from both jobs, like bytes
read, bytes written, number of errors, etc.
On Thu, Aug 16, 2012 at 8:02 PM, Raj Vishwanathan rajv...@yahoo.com wrote:
You probably have speculative execution on. Extra map and reduce tasks
are run in case some of them fail
creation.
Thanks!
On Tue, Aug 7, 2012 at 11:56 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
In the Mapper I often use a global Text object and throughout the map
processing
I just call set() on it. My question is, what happens if the collector
receives
a similar byte array value? Does the last one
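For reference, reusing a single Text instance is safe because the framework serializes the key/value into its output buffer at the moment write()/collect() is called, so later set() calls do not change records already emitted. A minimal sketch of the pattern (class and field names are illustrative):

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class ReuseTextMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private final Text word = new Text();
    private final IntWritable one = new IntWritable(1);

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        word.set(value.toString().trim());
        context.write(word, one);   // 'word' is copied into the output buffer here
    }
}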
I am trying to write a test on the local file system, but this test keeps
picking up the XML files on the classpath even though I am setting a different
Configuration object. Is there a way for me to override it? I thought the way I am doing
it overrides the configuration, but it doesn't seem to be working:
@Test
conf = new
JobConf(getConf()) and I don't pass in any configuration, then is the data
from the XML files on the classpath used? I want this to work for all scenarios.
On Wed, Aug 8, 2012 at 1:10 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I am trying to write a test on local file system
I just wrote a test where fs.default.name is file:/// and
mapred.job.tracker is set to local. The test ran fine, and I also see the mapper
and reducer were invoked, but what I am trying to understand is how this
ran without specifying the job tracker port, and which port the task
tracker connected to.
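For reference, when mapred.job.tracker is set to local, the LocalJobRunner executes the map and reduce tasks inside the client JVM, so no JobTracker or TaskTracker ports are involved. A minimal sketch of such a test configuration (the test class name is a placeholder); constructing the Configuration with loadDefaults=false is one way to avoid picking up *-site.xml files from the classpath:

Configuration conf = new Configuration(false);   // false: don't load default/classpath resources
conf.set("fs.default.name", "file:///");
conf.set("mapred.job.tracker", "local");
JobConf job = new JobConf(conf, MyLocalModeTest.class);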
should be able to read older
data as well. Try it out. It is very straightforward.
Hope this helps!
Thanks! I am new to Avro; what's the best place to see some examples of how
Avro deals with schema changes? I am trying to find some examples.
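As a starting point, here is a minimal sketch of Avro schema resolution using the generic API. The field names mirror the field1..field4 example discussed later in this thread and are illustrative: the writer schema has three fields, the reader schema adds field4 with a default, so records written with the old schema are still readable.

import java.io.ByteArrayOutputStream;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.Decoder;
import org.apache.avro.io.DecoderFactory;
import org.apache.avro.io.EncoderFactory;

// Writer schema: the original three fields.
Schema writer = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":["
  + "{\"name\":\"field1\",\"type\":\"string\"},"
  + "{\"name\":\"field2\",\"type\":\"string\"},"
  + "{\"name\":\"field3\",\"type\":\"string\"}]}");
// Reader schema: adds field4 with a default, so old data still resolves.
Schema reader = new Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"Rec\",\"fields\":["
  + "{\"name\":\"field1\",\"type\":\"string\"},"
  + "{\"name\":\"field2\",\"type\":\"string\"},"
  + "{\"name\":\"field3\",\"type\":\"string\"},"
  + "{\"name\":\"field4\",\"type\":\"string\",\"default\":\"\"}]}");

GenericRecord rec = new GenericData.Record(writer);
rec.put("field1", "a"); rec.put("field2", "b"); rec.put("field3", "c");

ByteArrayOutputStream out = new ByteArrayOutputStream();
BinaryEncoder enc = EncoderFactory.get().binaryEncoder(out, null);
new GenericDatumWriter<GenericRecord>(writer).write(rec, enc);
enc.flush();

Decoder dec = DecoderFactory.get().binaryDecoder(out.toByteArray(), null);
GenericRecord evolved = new GenericDatumReader<GenericRecord>(writer, reader).read(null, dec);
// evolved.get("field4") now returns the declared default ("").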
On Sun, Aug 5, 2012 at 12:01 AM, Mohit Anchlia
Is the compression done on the client side or on the server side? If I run
hadoop fs -text, is the client decompressing the file for me?
I am wondering what's the right way to go about designing the reading of input and
output where the file format may change over time. For instance, we might
start with field1,field2,field3 but at some point add a new field4 to
the input. What's the best way to deal with such scenarios? Keep a catalog
of
On Sun, Jun 10, 2012 at 9:39 AM, Harsh J ha...@cloudera.com wrote:
Mohit,
On Sat, Jun 9, 2012 at 11:11 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Thanks Harsh for the detailed info. It clears things up. The only thing from
those
pages that is concerning is what happens when the client crashes
(), HBase
can survive potential failures caused by major power failure cases
(among others).
Let us know if this clears it up for you!
On Sat, Jun 9, 2012 at 4:58 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I am wondering about the role of sync in the replication of data to other nodes.
Say
client
I am wondering about the role of sync in the replication of data to other nodes. Say a
client writes a line to a file in Hadoop; at this point the file handle is open
and sync has not been called. In this scenario, is the data also replicated to
other nodes as defined by the replication factor? I am wondering
We have a continuous flow of data into the sequence file. I am wondering what
would be the ideal file size before the file gets rolled over. I know too many
small files are not good, but could someone tell me what the ideal size would be
such that it doesn't overload the NameNode?
issues with the
NameNode, but rather an increase in processing time if there are too many
small files. Looks like I need to find that balance.
It would also be interesting to see how others solve this problem when not
using Flume.
On Wed, Jun 6, 2012 at 7:00 AM, Mohit Anchlia mohitanch...@gmail.com
seek.
Thanks Harsh. Does Flume also provide an API on top? I am getting this data
as HTTP calls; how would I go about using Flume with HTTP calls?
On Fri, May 25, 2012 at 8:24 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
We get click data through API calls. I now need to send this data to our
Please see:
http://hbase.apache.org/book.html#dfs.datanode.max.xcievers
On Fri, May 4, 2012 at 5:46 AM, madhu phatak phatak@gmail.com wrote:
Hi,
We are running a three-node cluster. For the last two days, whenever we copy a file
to HDFS, it throws java.io.IOException: Bad connect ack with
Is there a way for map-only jobs to compress the map output that gets
stored on HDFS as part-m-* files? In Pig I used:
set output.compression.enabled true;
set output.compression.codec org.apache.hadoop.io.compress.SnappyCodec;
Would these work for plain MapReduce jobs as well?
those properties in your job conf.
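For a plain map-only MapReduce job, a rough sketch of the equivalent settings in the driver (the job class name is a placeholder) would be:

JobConf conf = new JobConf(MyMapOnlyJob.class);
conf.setNumReduceTasks(0);                          // map-only: part-m-* files are the final output
FileOutputFormat.setCompressOutput(conf, true);     // same effect as mapred.output.compress=true
FileOutputFormat.setOutputCompressorClass(conf, org.apache.hadoop.io.compress.SnappyCodec.class);
// or equivalently:
// conf.setBoolean("mapred.output.compress", true);
// conf.set("mapred.output.compression.codec", "org.apache.hadoop.io.compress.SnappyCodec");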
On Mon, Apr 30, 2012 at 5:25 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Is there a way for map-only jobs to compress the map output that gets
stored on HDFS as part-m-* files? In Pig I used:
Would these work for plain MapReduce jobs as well
are also
available at:
http://hadoop.apache.org/common/docs/current/mapred-default.html
(core-default.html, hdfs-default.html)
On Tue, May 1, 2012 at 6:36 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Thanks! When I tried to search for this property I couldn't find it. Is
there a page that has
:
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
To your DNs' config/hdfs-site.xml and restart the DNs.
On Mon, Apr 30, 2012 at 1:35 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I even tried to lower the number of parallel jobs even further, but I still
get
or get) command?
If yes, how about a wordcount example?
'<path>/hadoop jar <path>hadoop-*examples*.jar wordcount input output'
-Original Message-
From: Mohit Anchlia mohitanch...@gmail.com
Reply-To: common-user@hadoop.apache.org common-user@hadoop.apache.org
Date: Fri, 27 Apr 2012 14:36:49
Any suggestions or pointers would be helpful. Are there any best practices?
On Mon, Apr 23, 2012 at 3:27 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
I just wanted to check how people design their storage directories for
data that is sent to the system continuously. For eg: for a given
I had 20 mappers in parallel reading 20 gz files, each around 30-40 MB,
over 5 Hadoop nodes, and then writing to the analytics
database. Almost midway it started getting this error:
2012-04-26 16:13:53,723 [Thread-8] INFO org.apache.hadoop.hdfs.DFSClient -
Exception in
I just wanted to check how people design their storage directories for
data that is sent to the system continuously. For eg: for a given
functionality we get a data feed continuously written to a sequence file, which is
then converted to a more structured format using MapReduce and stored in tab
I think if you called getInputFormat on JobConf and then called getSplits
you would at least get the locations.
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/InputSplit.html
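A rough sketch of that idea with the old mapred API (the job class and input path are placeholders):

JobConf conf = new JobConf(MyJob.class);
FileInputFormat.setInputPaths(conf, new Path("/data/input"));
InputFormat format = conf.getInputFormat();          // defaults to TextInputFormat
InputSplit[] splits = format.getSplits(conf, 1);      // 1 = suggested number of splits
for (InputSplit split : splits) {
    String[] hosts = split.getLocations();            // nodes holding the split's data
    System.out.println(split + " -> " + java.util.Arrays.toString(hosts));
}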
On Sun, Apr 8, 2012 at 9:16 AM, Deepak Nettem deepaknet...@gmail.comwrote:
Hi,
Is it
to understand the rationale
behind using local disk for final output.
Prashant
On Apr 4, 2012, at 9:55 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
On Wed, Apr 4, 2012 at 8:42 PM, Harsh J ha...@cloudera.com wrote:
Hi Mohit,
On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia mohitanch
I am going through the chapter "How MapReduce Works" and have some
confusion:
1) The description of the Mapper below says that reducers get the output file using an
HTTP call. But the description under "The Reduce Side" doesn't specifically
say whether it's copied using HTTP. So, first confusion: is the output copied
On Wed, Apr 4, 2012 at 8:42 PM, Harsh J ha...@cloudera.com wrote:
Hi Mohit,
On Thu, Apr 5, 2012 at 5:26 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I am going through the chapter "How MapReduce Works" and have some
confusion:
1) The description of the Mapper below says that reducers get
Could someone please help me answer this question?
On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia mohitanch...@gmail.comwrote:
What is the corresponding system property for setNumTasks? Can it be used
explicitly as a system property, like mapred.tasks.?
as setNumTasks. There is, however,
setNumReduceTasks, which sets mapred.reduce.tasks.
Does this answer your question?
On Thu, Mar 22, 2012 at 8:21 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
Could someone please help me answer this question?
On Wed, Mar 14, 2012 at 8:06 AM, Mohit Anchlia
in case of MR
tasks as well.
Regards
Bejoy.K.S
On Thu, Mar 15, 2012 at 6:17 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I have a client program that creates a sequence file, which essentially
merges
small files into a big file. I was wondering how the sequence file splits
the data
This is actually just a Hadoop job over HDFS. I am assuming you also know why
this is erroring out?
On Thu, Mar 15, 2012 at 1:02 PM, Gopal absoft...@gmail.com wrote:
On 03/15/2012 03:06 PM, Mohit Anchlia wrote:
When I start a job to read data from HDFS I start getting these errors.
Does
I have a client program that creates a sequence file, which essentially merges
small files into a big file. I was wondering how the sequence file splits
the data across nodes. When I start, the sequence file is empty. Does it
get split when it reaches dfs.block.size? If so, does that mean
, Mar 9, 2012 at 7:32 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I have mapred.tasktracker.map.tasks.maximum set to 2 in my job and I
have 5
nodes. I was expecting this to allow only 10 concurrent map tasks. But I have
30
mappers running. Does Hadoop ignore this setting when supplied from
What's the difference between mapred.tasktracker.reduce.tasks.maximum and
mapred.map.tasks
I want my data to be split against only 10 mappers in the entire cluster.
Can I do that using one of the above parameters?
default number of reduce (map) tasks your
job will have.
To set the number of mappers in your application, you can write:
configuration.setNumMapTasks(theNumberYouWant);
Chen
Actually, you can just use configuration.set()
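To make the distinction concrete, a small sketch (the job class is a placeholder): mapred.map.tasks is only a hint to the framework, since the real map count comes from the number of input splits, while mapred.reduce.tasks is honored exactly; mapred.tasktracker.map.tasks.maximum is a per-tasktracker slot limit, not a per-job cap.

JobConf conf = new JobConf(MyJob.class);
conf.setNumMapTasks(10);      // same as conf.set("mapred.map.tasks", "10"); a hint only
conf.setNumReduceTasks(10);   // same as conf.set("mapred.reduce.tasks", "10"); honored exactly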
On Fri, Mar 9, 2012 at 6:42 PM, Mohit Anchlia mohitanch
I have mapred.tasktracker.map.tasks.maximum set to 2 in my job and I have 5
nodes. I was expecting this to allow only 10 concurrent map tasks. But I have 30
mappers running. Does Hadoop ignore this setting when it is supplied from the
job?
file.
On Fri, Mar 9, 2012 at 7:19 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
What's the difference between setNumMapTasks and mapred.map.tasks?
On Fri, Mar 9, 2012 at 5:00 PM, Chen He airb...@gmail.com wrote:
Hi Mohit
mapred.tasktracker.reduce(map).tasks.maximum means how
Can you check which user you are running this process as and compare it
with the ownership on the directory?
On Thu, Mar 8, 2012 at 3:13 PM, Leonardo Urbina lurb...@mit.edu wrote:
Does anyone have any idea how to solve this problem? Regardless of whether
I'm using plain HPROF or profiling
I am still trying to see how to narrow this down. Is it possible to set
the -XX:+HeapDumpOnOutOfMemoryError option on these individual tasks?
On Mon, Mar 5, 2012 at 5:49 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
Sorry for multiple emails. I did find:
2012-03-05 17:26:35,636 INFO
, vs a flexible infrastructure that could use a local
cluster or cluster on a different cloud provider.
Thanks for your input. I am assuming HDFS is created on ephemeral disks
and not EBS. Also, is it possible to share some of your findings?
On Sun, Mar 4, 2012 at 8:51 AM, Mohit Anchlia
, 2012 at 5:03 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
I currently have java.opts.mapred set to 512MB and I am getting heap space
errors. How should I go about debugging heap space issues?
, Mohit Anchlia mohitanch...@gmail.comwrote:
All I see in the logs is:
2012-03-05 17:26:36,889 FATAL org.apache.hadoop.mapred.TaskTracker: Task:
attempt_201203051722_0001_m_30_1 - Killed : Java heap space
Looks like the task tracker is killing the tasks. Not sure why. I increased
the heap from
slow.
The setup is done pretty fast and there are some configuration parameters
you can bypass - for example block sizes, etc. - but in the end, IMHO, setting
up EC2 instances by copying images is the better alternative.
Kind Regards
Hannes
On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia mohitanch
I think I found the answer to this question. However, it's still not clear whether
HDFS is on local disks or EBS volumes. Does anyone know?
On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
Just want to check how many are using AWS mapreduce and understand the
pros and cons
+1
On Fri, Mar 2, 2012 at 4:09 PM, Harsh J ha...@cloudera.com wrote:
Since you ask about anything in general, when I forayed into using
Hadoop, my biggest pain was lack of documentation clarity and
completeness over the MR and DFS user APIs (and other little points).
It would be nice to
When I try kill -QUIT for a job it doesn't send the stacktrace to the log
files. Does anyone know why or if I am doing something wrong?
I find the job using ps -ef|grep attempt. I then go to
logs/userLogs/jobid/attemptid/
Is this the right procedure to add nodes? I took some of it from the Hadoop wiki FAQ:
http://wiki.apache.org/hadoop/FAQ
1. Update conf/slaves
2. On the slave nodes, start the datanode and tasktracker
3. hadoop balancer
Do I also need to run dfsadmin -refreshNodes?
, at 18:29, Mohit Anchlia mohitanch...@gmail.com wrote:
Is this the right procedure to add nodes? I took some of it from the Hadoop wiki
FAQ:
http://wiki.apache.org/hadoop/FAQ
1. Update conf/slaves
2. On the slave nodes, start the datanode and tasktracker
3. hadoop balancer
Do I also need to run
/jobtracker know there is a new
node in the cluster. Is it initiated by the namenode when the slaves file is edited?
Or is it initiated by the tasktracker when the tasktracker is started?
Sent from my iPhone
On Mar 1, 2012, at 18:49, Mohit Anchlia mohitanch...@gmail.com wrote:
On Thu, Mar 1, 2012 at 4:46 PM, Joey
) then you would need to edit these files and
issue the refresh command.
On Mar 1, 2012, at 5:35 PM, Mohit Anchlia wrote:
On Thu, Mar 1, 2012 at 4:57 PM, Joey Echeverria j...@cloudera.com
wrote:
Not quite. Datanodes get the namenode host from fs.default.name in
core-site.xml. Task trackers
can take a look at what you are doing in the UDF vs
the Mapper.
100x slower does not make sense for the same job/logic; it's either the Mapper
code, or maybe the cluster was busy at the time you scheduled the MapReduce
job?
Thanks,
Prashant
On Tue, Feb 28, 2012 at 4:11 PM, Mohit Anchlia mohitanch
and couldn't find one. Does anyone know where
stacktraces are generally sent?
On Wed, Feb 29, 2012 at 1:08 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
I can't seem to find what's causing this slowness. Nothing in the logs.
It's just painfully slow. However, the Pig job is awesome in performance, which
has
Guide, 2nd ed.).
On Wed, Feb 29, 2012 at 1:40 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
It looks like adding this line causes an invocation exception. I looked in
HDFS and I see that file at that path:
DistributedCache.addFileToClassPath(new Path("/jars/common.jar"), conf);
I have
I commented out both the reducer and the combiner, and I still see the same exception.
Could it be because I have 2 jars being added?
On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote:
On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
For some reason I
); but this works just fine.
On Tue, Feb 28, 2012 at 11:44 AM, Mohit Anchlia mohitanch...@gmail.comwrote:
I commented out both the reducer and the combiner, and I still see the same exception.
Could it be because I have 2 jars being added?
On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku
I am comparing the runtime of similar logic. The entire logic is exactly the same,
but surprisingly the MapReduce job that I submit is 100x slower. For Pig I use a
UDF, and for Hadoop I use a mapper only, with the same logic as the Pig version. Even the
splits on the admin page are the same. Not sure why it's so slow. I am
submitting
Can someone please suggest whether parameters like dfs.block.size and
mapred.tasktracker.map.tasks.maximum are only cluster-wide settings, or can
these be set per client job configuration?
On Sat, Feb 25, 2012 at 5:43 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
If I want to change the block size
I submitted a MapReduce job that had 9 tasks killed out of 139. But I
don't see any errors on the admin page. The entire job, however, has
SUCCEEDED. How can I track down the reason?
Also, how do I determine if this is something to worry about?
How do I verify the block size of a given file? Is there a command?
On Mon, Feb 27, 2012 at 7:59 AM, Joey Echeverria j...@cloudera.com wrote:
dfs.block.size can be set per job.
mapred.tasktracker.map.tasks.maximum is per tasktracker.
-Joey
On Mon, Feb 27, 2012 at 10:19 AM, Mohit Anchlia
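To make that concrete, a small sketch (the file path is hypothetical): the per-job setting only affects files the job writes, and an existing file's block size can be read back through the FileSystem API.

Configuration conf = new Configuration();
conf.setLong("dfs.block.size", 128L * 1024 * 1024);   // block size for files this job writes

FileSystem fs = FileSystem.get(conf);
long blockSize = fs.getFileStatus(new Path("/data/part-m-00000")).getBlockSize();
System.out.println("block size: " + blockSize);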
What's the best way to write records to a different file? I am doing XML
processing, and during processing I might come across an invalid XML format.
Currently I have it in a try/catch block and write to log4j. But I think
it would be better to just write it to an output file that just contains
Does it matter if a reducer is set even if the number of reducers is 0? Is there
a way to get a clearer reason?
On Mon, Feb 27, 2012 at 8:23 PM, Subir S subir.sasiku...@gmail.com wrote:
On Tue, Feb 28, 2012 at 4:30 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
For some reason I am getting
to the topic in that book where I'll find this
information?
Sent from my iPhone
On Feb 27, 2012, at 8:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
Does it matter if a reducer is set even if the number of reducers is 0? Is
there
a way to get a clearer reason?
On Mon, Feb 27, 2012 at 8:23
:
Mohit,
Use the MultipleOutputs API:
http://hadoop.apache.org/common/docs/current/api/org/apache/hadoop/mapred/lib/MultipleOutputs.html
to have a named output of bad records. There is an example of use
detailed on the link.
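As a rough sketch of that suggestion with the old mapred API (the named output "bad", the XmlMapper class, and the key/value choices are illustrative, not the exact example from the link):

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextOutputFormat;
import org.apache.hadoop.mapred.lib.MultipleOutputs;

// In the driver, register a named output for records that fail parsing:
//   MultipleOutputs.addNamedOutput(conf, "bad",
//       TextOutputFormat.class, NullWritable.class, Text.class);

public class XmlMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {
    private MultipleOutputs mos;

    public void configure(JobConf job) {
        mos = new MultipleOutputs(job);
    }

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
        try {
            // ... parse the XML in 'value' and emit normal records via 'out' ...
        } catch (Exception e) {
            // Route the offending record to the "bad" named output instead of log4j.
            mos.getCollector("bad", reporter).collect(NullWritable.get(), value);
        }
    }

    public void close() throws IOException {
        mos.close();
    }
}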
On Tue, Feb 28, 2012 at 3:48 AM, Mohit Anchlia mohitanch
Eugen Stan stan.ieu...@gmail.com
wrote:
2012/2/26 Mohit Anchlia mohitanch...@gmail.com:
Thanks. Does it mean LZO is not installed by default? How can I install
LZO?
The LZO library is released under GPL and I believe it can't be
included in most distributions of Hadoop because
If I want to change the block size, can I use Configuration in the
MapReduce job and set it when writing to the sequence file, or does it need
to be a cluster-wide setting in the .xml files?
Also, is there a way to check the block size of a given file?
Thanks. Does it mean LZO is not installed by default? How can I install LZO?
On Sat, Feb 25, 2012 at 6:27 PM, Shi Yu sh...@uchicago.edu wrote:
Yes, it is supported by Hadoop sequence file. It is splittable
by default. If you have installed and specified LZO correctly,
use these:
I am looking at some Hadoop tuning parameters like io.sort.mb,
mapred.child.java.opts, etc.
- My question is where to look for the current settings
- Are these settings configured cluster-wide or per job?
- What's the best way to look at the reasons for slow performance?
'/examples/testfile5.txt' using
org.apache.pig.piggybank.storage.XMLLoader('abc') as (document:chararray);
dump raw;
--Original Message--
From: Mohit Anchlia
To: common-user@hadoop.apache.org
ReplyTo: common-user@hadoop.apache.org
Subject: Splitting files on new line using hadoop fs
.
--
*From: *Mohit Anchlia mohitanch...@gmail.com
*Date: *Wed, 22 Feb 2012 12:29:26 -0800
*To: *common-user@hadoop.apache.org; bejoy.had...@gmail.com
*Subject: *Re: Splitting files on new line using hadoop fs
On Wed, Feb 22, 2012 at 12:23 PM, bejoy.had...@gmail.com wrote
Streaming job just seems to be hanging
12/02/22 17:35:50 INFO streaming.StreamJob: map 0% reduce 0%
-
On the admin page I see that it created 551 input splits. Could someone
suggest a way to find out what might be causing it to hang? I increased
io.sort.mb to 200 MB.
I am using 5 data
org.apache.pig.piggybank.storage.XMLLoader
for processing. Would it work with a sequence file?
This text file that I was referring to would be in HDFS itself. Is it still
different from using a sequence file?
Regards
Bejoy.K.S
On Tue, Feb 21, 2012 at 10:45 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
We
, 2012 at 10:45 PM, Mohit Anchlia
mohitanch...@gmail.com
wrote:
We have small XML files. Currently I am planning to append these small
files into one file in HDFS so that I can take advantage of splits, larger
blocks, and sequential I/O. What I am unsure about is whether it's OK to append
one
of them with different input paths.
Hope this helps!
Cheers
Arko
On Tue, Feb 21, 2012 at 1:27 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I am trying to look for examples that demonstrate using sequence files,
including writing to them and then running MapReduce on them, but I am unable to find
I am past this error. Looks like I needed to use the CDH libraries. I changed
my Maven repo. Now I am stuck at
org.apache.hadoop.security.AccessControlException, since I am not writing
as the user that owns the file. Looking online for solutions
On Tue, Feb 21, 2012 at 12:48 PM, Mohit Anchlia
, Mohit Anchlia mohitanch...@gmail.com
wrote:
Sorry, maybe it's something obvious, but I was wondering: when map or
reduce
gets called, what would be the class used for the key and value? If I used
org.apache.hadoop.io.Text
value = new org.apache.hadoop.io.Text(); would the map be called
It looks like in the mapper the values are coming as binary instead of Text. Is
this expected from a sequence file? I initially wrote the SequenceFile with Text
values.
On Tue, Feb 21, 2012 at 4:13 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
Need some more help. I wrote sequence file using below code
Finally figured it out. I needed to use SequenceFileAsTextInputFormat.
There is just a lack of examples, which makes it difficult when you start.
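For anyone else searching, a minimal sketch of the relevant driver lines (the job class and input path are placeholders):

JobConf conf = new JobConf(MySeqFileJob.class);
conf.setInputFormat(SequenceFileAsTextInputFormat.class);   // keys and values reach the mapper as Text
FileInputFormat.setInputPaths(conf, new Path("/data/sequence-input"));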
On Tue, Feb 21, 2012 at 4:50 PM, Mohit Anchlia mohitanch...@gmail.comwrote:
It looks like in mapper values are coming as binary instead of Text
can't seem to find examples of how to do XML processing in Pig. Can you
please send me some pointers? Basically I need to convert my XML to a more
structured format using Hadoop and write it to a database.
On Sat, Feb 18, 2012 at 1:18 AM, Mohit Anchlia mohitanch...@gmail.com
wrote:
On Tue, Feb 14
, as always, are well worth reading.
Tom Deutsch
Program Director
Information Management
Big Data Technologies
IBM
3565 Harbor Blvd
Costa Mesa, CA 92626-1420
tdeut...@us.ibm.com
Mohit Anchlia mohitanch...@gmail.com
02/18/2012 06:24 AM
On Tue, Feb 14, 2012 at 10:56 AM, W.P. McNeill bill...@gmail.com wrote:
I'm not sure what you mean by flat format here.
In my scenario, I have a file input.xml that looks like this:
<myfile>
  <section>
    <value>1</value>
  </section>
  <section>
    <value>2</value>
  </section>
</myfile>
On Sun, Feb 12, 2012 at 9:24 AM, W.P. McNeill bill...@gmail.com wrote:
I've used the Mahout XMLInputFormat. It is the right tool if you have an
XML file with one type of section repeated over and over again and want to
turn that into a sequence file where each repeated section is a value. I've
I use Eclipse. Is this http://wiki.apache.org/hadoop/EclipsePlugIn
still the best way to develop MapReduce programs in Hadoop? Just want
to make sure before I go down this path.
Or should I just add the Hadoop jars to my Eclipse classpath and create
my own MapReduce programs?
Thanks
This process of managing looks like more pain long term. Would it be
easier to store the data in HBase, which has a smaller block size?
What's the avg. file size?
On Sun, Oct 2, 2011 at 7:34 PM, Vitthal Suhas Gogate
gog...@hortonworks.com wrote:
Agree with Bejoy, although to minimize the processing latency
On Thu, Sep 1, 2011 at 1:25 AM, Dieter Plaetinck
dieter.plaeti...@intec.ugent.be wrote:
On Wed, 31 Aug 2011 08:44:42 -0700
Mohit Anchlia mohitanch...@gmail.com wrote:
Does map-reduce work well with binary contents in the file? This
binary content is basically some CAD files and map reduce
Does map-reduce work well with binary contents in the file? This
binary content is basically some CAD files, and the map reduce program needs
to read these files using some proprietary tool, extract values, and do
some processing. Wondering if there are others doing a similar type of
processing. Best
On Thu, Aug 11, 2011 at 3:26 PM, Charles Wimmer cwim...@yahoo-inc.com wrote:
We currently use P410s in a 12-disk system. Each disk is set up as a RAID0
volume. Performance is at least as good as a bare disk.
Can you please share what throughput you see with P410s? Are these SATA or SAS?
On
On Fri, Aug 5, 2011 at 3:42 PM, Stevens, Keith D. steven...@llnl.gov wrote:
The Mapper and Reducer classes in org.apache.hadoop.mapreduce implement the
identity function, so you should be able to just do
conf.setMapperClass(org.apache.hadoop.mapreduce.Mapper.class);
Assuming everything is up, this solution still will not scale given the latency,
TCP/IP buffers, sliding window, etc. See BDP (bandwidth-delay product).
Sent from my iPad
On Aug 1, 2011, at 4:57 PM, Michael Segel michael_se...@hotmail.com wrote:
Yeah, what he said.
It's never a good idea.
Forget about losing a NN or a
Is this what you are looking for?
http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
search for jobConf
On Fri, Jul 29, 2011 at 1:51 PM, Roger Chen rogc...@ucdavis.edu wrote:
Thanks for the response! However, I'm having an issue with this line
Path[] cacheFiles =
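Assuming that truncated line is the usual DistributedCache lookup, the typical pattern looks roughly like this (the cached file name is hypothetical):

// In the driver, before submitting the job:
DistributedCache.addCacheFile(new URI("/cache/lookup.txt"), conf);

// In the mapper's or reducer's configure():
Path[] cacheFiles = DistributedCache.getLocalCacheFiles(conf);
if (cacheFiles != null && cacheFiles.length > 0) {
    // cacheFiles[0] points at the local copy on the task node
    BufferedReader reader = new BufferedReader(new FileReader(cacheFiles[0].toString()));
}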
operation?
I am assuming there will be some errors in this case.
On Thu, Jul 28, 2011 at 5:08 AM, Mohit Anchlia mohitanch...@gmail.com wrote:
Just trying to understand what happens if there are 3 nodes with
replication set to 3 and one node fails. Does it fail the writes too?
If there is a link
Just trying to understand what happens if there are 3 nodes with
replication set to 3 and one node fails. Does it fail the writes too?
If there is a link that I can look at, that would be great. I tried searching
but didn't find any definitive answer.
Thanks,
Mohit
fire up some nix commands and pack together that file
onto itself a bunch of times, and then put it back into HDFS and let 'er
rip
Sent from my mobile. Please excuse the typos.
On 2011-05-26, at 4:56 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
I think I understand that from the last 2 replies.
I am new to Hadoop, and from what I understand, by default Hadoop splits
the input into blocks. Now this might result in splitting a line of a
record into 2 pieces that get spread across 2 maps. For eg: the line
abcd might get split into ab and cd. How can one prevent this in
Hadoop and Pig? I am
, does not happen; and the way the splitting is done
for Text files is explained in good detail here:
http://wiki.apache.org/hadoop/HadoopMapReduce
Hope this solves your doubt :)
On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I am new to hadoop and from what I
announcing something.
What you describe does not happen; and the way the splitting is done
for Text files is explained in good detail here:
http://wiki.apache.org/hadoop/HadoopMapReduce
Hope this solves your doubt :)
On Fri, May 27, 2011 at 10:25 PM, Mohit Anchlia mohitanch...@gmail.com
wrote
If I have to overwrite a file I generally use
hadoop dfs -rm file
hadoop dfs -copyFromLocal or -put file
Is there a command to overwrite/replace the file instead of doing rm first?
I sent this to the Pig Apache user mailing list but have gotten no response.
Not sure if that list is still active.
Thought I would post here in case someone is able to help me.
I am in the process of installing and learning Pig. I have a Hadoop
cluster, and when I try to run Pig in mapreduce mode it errors out:
basically an issue of Hadoop version differences between the one the
Pig 0.8.1 release got bundled with vs. the Hadoop 0.20.203 release, which
is newer.
On Thu, May 26, 2011 at 10:26 PM, Mohit Anchlia mohitanch...@gmail.com
wrote:
I sent this to pig apache user mailing list but have got no response
/antlr-runtime-3.2.jar;
Is this a Windows command? Sorry, I have not used this before.
2011/5/26 Mohit Anchlia mohitanch...@gmail.com
For some reason I don't see that reply from Jonathan in my inbox. I'll
try to Google it.
What should be my next step in that case? I can't use Pig then?
On Thu