Hi Madhav,
The behaviour sounds normal to me.
If the block size is 128 MB there could possibly be ~24 mappers (i.e.,
containers used).
You cannot use the entire cluster, as the blocks may reside only on the nodes
being used.
You should not try to use the entire cluster's resources, for the following reason:
The
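As a rough illustration of where an estimate like ~24 mappers comes from (the 3 GB input size below is only an assumption chosen to match that number; with FileInputFormat one map task is normally launched per input split, and a split usually corresponds to one HDFS block):

    public class MapperCountEstimate {
        public static void main(String[] args) {
            long inputSize = 3L * 1024 * 1024 * 1024;      // ~3 GB of input (assumption)
            long blockSize = 128L * 1024 * 1024;           // 128 MB block size
            long mappers   = (inputSize + blockSize - 1) / blockSize;  // ceiling division
            System.out.println("Expected mappers: " + mappers);        // prints 24
        }
    }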
Hi Alex,
Can you please attach your code and the sample input data?
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Apr 30, 2013 at 2:29 AM, wrote:
>
> Hello,
>
> I am trying to write a MapReduce program in Hadoop 1.0.4 using the mapred libs. I have
> a map function which ge
Can you do the following,
hadoop fs -copyToLocal
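If the Java API is preferred, a minimal sketch of the same staging step (both paths are hypothetical); once the data is on local disk it can be written to tape with whatever tool the device provides:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CopyToLocalExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            // Stage the HDFS data on local disk first; archive it to tape from there.
            fs.copyToLocalFile(new Path("/data/to/export"),       // HDFS source (hypothetical)
                               new Path("/mnt/staging/export"));  // local target (hypothetical)
        }
    }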
Best,
Mahesh Balija,
CalsoftLabs.
On Wed, Apr 24, 2013 at 12:12 PM, G, Prashanthi wrote:
> I want to load my HDFS data directly to a tape or external storage
> device.
>
> Please let me know if there is any wa
whole program.
Best,
Mahesh Balija,
Calsoft Labs.
On Wed, Apr 24, 2013 at 12:37 PM, Rahul Bhattacharjee <
rahul.rec@gmail.com> wrote:
> Thanks for the response, Mahesh. I thought of this, but do not know why
> this limitation exists.
>
> While sampling to pick up certain records
Can you manually go into the directory configured for hadoop.tmp.dir in
core-site.xml and do an ls -l to find the disk usage details? It will contain
fsimage, edits, fstime, and VERSION.
Or use the basic commands like:
hadoop fs -du
hadoop fsck
On Wed, Apr 24, 2013 at 7:56 AM, 自己 wrote:
> Hi, I would
based on the Mapper outkey type.
Best,
Mahesh Balija,
CalsoftLabs.
On Tue, Apr 23, 2013 at 4:12 PM, Rahul Bhattacharjee <
rahul.rec@gmail.com> wrote:
> + mapred dev
>
>
> On Tue, Apr 16, 2013 at 2:19 PM, Rahul Bhattacharjee <
> rahul.rec@gmail.com> wrote:
>
Mahout is an alternative to R, in case you are not aware of it.
Thanks,
Mahesh Balija,
CalsoftLabs.
On Thu, Apr 11, 2013 at 12:25 AM, Ted Yu wrote:
> There is RHadoop.
>
> Maybe there are other platforms.
>
>
> On Wed, Apr 10, 2013 at 11:49 AM, Shah, Rahul1 wrote:
>
>>
ght
be faster by up to 66%. In order to speed up your program you may either have
to use more reducers or make your reducer code as optimized as
possible.
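As a minimal sketch of the first option (a bare new-API driver; the paths, job name and reducer count are assumptions, and a real job would set its own mapper and reducer classes):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class MoreReducersDriver {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "more-reducers");
            job.setJarByClass(MoreReducersDriver.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            // Spread the reduce work across more containers; 8 is just an example.
            job.setNumReduceTasks(8);
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }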
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Mar 5, 2013 at 1:27 AM, Austin Chungath wrote:
> Hi all,
>
> I have 1 reduce
.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Mar 5, 2013 at 10:43 AM, AMARNATH, Balachandar <
balachandar.amarn...@airbus.com> wrote:
>
> Hi,
>
> I am new to hdfs. In my java application, I need to perform ‘similar
> operation’ over large number of files. I would like to
different cases.
Harsh, please correct me if I am wrong.
Best,
Mahesh Balija,
Calsoft Labs.
On Mon, Mar 4, 2013 at 8:32 PM, Vikas Jadhav wrote:
> Thank You for reply
>
> Can you please elaborate, because I am not getting what the following means
> in a programming environment
>
>
> y
Does passing dfs.block.size=134217728 resolve your issue? Or did something
else fix your problem?
On Tue, Feb 26, 2013 at 6:04 PM, Arindam Choudhury <
arindamchoudhu...@gmail.com> wrote:
> sorry my bad, it solved
>
>
> On Tue, Feb 26, 2013 at 1:22 PM, Arindam Choudhury <
> arindamchoudhu
keys are sorted; because of this
implementation the records are read directly from the stream and sorted
without the need to deserialize them into objects.
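To make the idea concrete, here is a minimal sketch of a raw comparator (assuming IntWritable keys purely for illustration; this is not from the original thread) that orders serialized keys by reading their bytes directly, which is what lets the framework sort without deserializing:

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.WritableComparator;

    // Compares serialized IntWritable keys straight from the byte buffers,
    // so records never need to be turned back into objects during the sort.
    public class RawIntComparator extends WritableComparator {
        protected RawIntComparator() {
            super(IntWritable.class);
        }

        @Override
        public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
            int left  = readInt(b1, s1);   // first 4 bytes of the serialized key
            int right = readInt(b2, s2);
            return Integer.compare(left, right);
        }
    }

It could then be registered on a job with job.setSortComparatorClass(RawIntComparator.class).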
Best,
Mahesh Balija,
CalsoftLabs.
On Sun, Feb 24, 2013 at 5:01 PM, Sai Sai wrote:
> Thanks Mahesh for your help.
>
> Wondering
Please check the in-line answers...
On Sat, Feb 23, 2013 at 6:22 PM, Sai Sai wrote:
>
> Hello
>
> I have a question about how Mapreduce sorting works internally with
> multiple columns.
>
> Below are my classes using 2 columns in an input file given below.
>
> 1st question: About the method hashCo
versa
I am NOT sure whether this is the optimal solution; you can probably
check for other approaches.
Case 2: After case 1 you can build Hive tables on the HDFS
(Cluster2).
Best,
Mahesh Balija,
CalsoftLabs.
On Fri, Feb 22, 2013 at 12:07 PM, samir das mohapatra
ks in the Hadoop eco-system include Mahout, Hive,
Pig, etc., each with their own applications.
One important note is that Hadoop runs on commodity hardware.
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Feb 15, 2013 at 12:08 PM, SrinivasaRao Kongar
wrote:
>
> Hi sir,
>
> What is Hadoop te
Hi Vikas,
You can get the FileSystem instance by calling
FileSystem.get(Configuration);
Once you get the FileSystem instance you can use
FileSystem.listStatus(InputPath); to get the FileStatus instances.
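Putting those two calls together, a minimal sketch (the input path is hypothetical):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ListInputFiles {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);
            // List every file/directory under the given path and print its size.
            FileStatus[] statuses = fs.listStatus(new Path("/user/vikas/input"));
            for (FileStatus status : statuses) {
                System.out.println(status.getPath() + " : " + status.getLen() + " bytes");
            }
        }
    }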
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Feb 12, 2013
The best way is to first learn the concepts thoroughly and then if you like
you can also contribute to Hadoop projects.
After that, it is probably better to find some BigData-based projects.
Best,
Mahesh Balija,
CalsoftLabs.
On Mon, Feb 11, 2013 at 10:32 AM, Monkey2Code wrote:
> Hi, I am a fresher
key, value. You should get to
know them through the API documentation.
So make sure that you are using the right key/value pairs.
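As a minimal sketch of what "the right key/value pairs" usually means on the job side (the Text/IntWritable types here are only an assumed example, not from the original thread); a mismatch between these declarations and what the mapper/reducer actually emit only shows up at runtime:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;

    public class OutputTypesExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "output-types");
            // Declare exactly the types the mapper and reducer write.
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
        }
    }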
Thanks,
Mahesh Balija,
CalsoftLabs.
On Fri, Feb 1, 2013 at 10:41 PM, Anbarasan Murthy wrote:
> I am getting the following Exception message when i try to output T
instances based on how you are defining the MR job.
Best,
Mahesh Balija,
CalsoftLabs.
On Fri, Feb 1, 2013 at 6:37 PM, Anbarasan Murthy wrote:
> By default SequenceFileOutputFormat expects the
>
> Input – LongWritable
>
> Output – Text
>
>
> I wo
mapred.tasktracker.reduce.tasks.maximum.
Also they run in parallel.
Best,
Mahesh Balija,
CalsoftLabs.
On Fri, Jan 25, 2013 at 1:16 PM, jamal sasha wrote:
> Hi.
> A very very lame question.
> Does the number of mappers depend on the number of nodes I have?
> How I imagine map-reduce is
Hi Steve,
On top of Harsh's answer: other than backup, there is a feature
called Snapshot offered by some third-party vendors like MapR.
Though it is not really a backup, it is a point to which you
can revert at any time.
Best,
Mahesh Balija,
CalsoftLabs
data collection and aggregation framework and NOT a file transfer tool, so it
may NOT be a good choice when you actually want to copy the files as-is
onto your cluster (NOT 100% sure, as I am also working on that).
Thanks,
Mahesh Balija,
CalsoftLabs.
On Fri, Jan 25, 2013 at 6:39 AM, Panshul Whisper
Hi Mirko,
Thanks for your reply. It works for me as well.
Now I am able to mount the folder on the master node and have
configured Flume so that it can either poll for logs in real time or
retrieve them periodically.
Thanks,
Mahesh Balija.
Calsoft Labs.
On Thu, Jan 17, 2013
atever you have tried??
>
> Warm Regards,
> Tariq
> https://mtariq.jux.com/
> cloudfront.blogspot.com
>
>
> On Thu, Jan 17, 2013 at 4:09 PM, Mahesh Balija wrote:
>
>> I have studied Flume but I didn't find anything useful in my case.
>> My requirement
wrote:
> Give Flume (http://flume.apache.org/) a chance to collect your data.
>
> Mirko
>
>
>
> 2013/1/17 sirenfei
>
>> ftp auto upload?
>>
>>
>> 2013/1/17 Mahesh Balija :
>> > the Hadoop cluster (HDFS) either in synchronous or asynchronou
>>
>
>
Hi,
My log files are generated and saved on a Windows machine.
Now I have to move those remote files to the Hadoop cluster (HDFS)
in either a synchronous or asynchronous way.
I have gone through Flume (various source types) but it was not helpful.
Please suggest whether there I
client is responsible for processing each individual file in order.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Jan 15, 2013 at 7:55 PM, Panshul Whisper wrote:
> Hello,
>
> I was wondering if hadoop performs the map reduce operations on the data
> while maintaining the order or sequence of d
Hi Smith,
In my experience, the actual processing usually occurs in the first 40%
to around 70%; the remaining time is devoted to writing/flushing the data
to the output files, and this may take more time.
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Jan 11, 2013 at 9:32 AM, Roy Smith
cause these kinds of issues, based on the operation you
do in your reducer.
Can you put some logs in your reducer and try to trace
what is happening?
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Jan 11, 2013 at 8:53 AM, yaotian wrote:
> I have 1 hadoop master which name node locates an
ll be some constant,
say 1 -> the graph and 2 -> changes, and the value will be the actual value.
Now the only thing left for you is to append your changes to the
actual key and emit the final result.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Jan 8, 2013 at 5:47 AM, jamal sasha wrote:
>
changes in the 0.20 API; maybe for backward
compatibility the mapred package is still in existence.
There are a few classes which exist in the 0.19 API that are not
supported in the 0.20.* version.
Best,
Mahesh Balija,
Calsoft Labs.
On Mon, Jan 7, 2013 at 11:44 PM, Oleg Zhurakousky <
oleg.zhura
t one.
Best,
Mahesh Balija,
CalSoft Labs.
On Tue, Dec 11, 2012 at 11:29 AM, Ivan Ryndin wrote:
> Hi all,
>
> I have following question:
> What are the best practices working with files in Hadoop?
>
> I need to process a lot of log files, that arrive to Hadoop every minute.
> An
ut of the fast-running ones
or early-completing tasks.
Best,
Mahesh Balija,
Calsoft Labs.
On Thu, Dec 6, 2012 at 8:27 PM, Ajay Srivastava
wrote:
> Hi,
>
> What is the behavior of jobTracker if speculative execution is off and a
> task on data node is running extremely slow?
> Will t
cluster.
This can be one possible reason why there are fluctuations in your
job performance.
Best,
Mahesh Balija,
Calsoft Labs.
On Mon, Dec 3, 2012 at 8:57 PM, Cogan, Peter (Peter) <
peter.co...@alcatel-lucent.com> wrote:
> Hi there,
>
> I've been doing some performance
generates key-value pairs.
InputFormat also handles records that may be split on the
FileSplit boundary (i.e., across different blocks).
Please check this link for more information,
http://wiki.apache.org/hadoop/HadoopMapReduce
Best,
Mahesh Balija,
Calsoft Labs.
On Mon, Dec 3, 2012
.
Also, you can try your luck by running the job on both the old and new
versions.
Best,
Mahesh Balija,
Calsoft Labs.
On Fri, Nov 30, 2012 at 2:16 AM, Sandeep Jangra wrote:
> Hi Harsh,
>
> I tried putting the generic option first, but it throws a "file not
> found" exception.
>
Hi Sandeep,
For me everything seems to be alright.
Can you tell us how you are running this job?
Best,
Mahesh.B.
Calsoft Labs.
On Thu, Nov 29, 2012 at 9:01 PM, Sandeep Jangra wrote:
> Hello everyone,
>
> Like most others I am also running into some problems while running
HDFS data is
compressed/sequence data.
Best,
Mahesh Balija,
Calsoft Labs.
On Thu, Nov 29, 2012 at 8:48 PM, Kartashov, Andy wrote:
> I also show some discrepancy Sqoop'ing data from MySQL. Both MySQL
> "select count(*) from.." and "sqoop -eval -query "select count(
Hi Pedro,
You can get the JobInProgress instance from JobTracker.
JobInProgress getJob(JobID jobid);
Best,
Mahesh Balija,
Calsoft Labs.
On Wed, Nov 28, 2012 at 10:41 PM, Pedro Sá da Costa wrote:
> I'm building a Java class and given a JobID, how can I
Hi Chris,
Can you try the following on your local machine:
du -b myfile.txt
and compare this with hadoop fs -du myfile.txt?
Best,
Mahesh Balija,
Calsoft Labs.
On Wed, Nov 28, 2012 at 7:43 PM, wrote:
>
> Hi all,
>
> I wonder why there is
r being so vague.
>
-> It's better to start learning the basics of HDFS and MapReduce architectures, and
then concepts like combiners, partitioners, record readers, input formats,
output formats, etc.
Best,
Mahesh Balija,
Calsoft Labs.
all nodes in your cluster.
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Nov 27, 2012 at 6:49 PM, dyuti a wrote:
> Hi Bharath,
> yes i have added all those jars.
>
> Thanks,
> dti
>
> On Tue, Nov 27, 2012 at 6:35 PM, bharath vissapragada <
> bharathvissapragada1..
this does not work for you, please tell us what you are
trying to do.
Thanks,
Mahesh Balija,
Calsoft Labs.
On Tue, Nov 27, 2012 at 5:37 PM, GHui wrote:
>
> I call the statement "JobID id = new JobID()" of the Hadoop API via JNI. But
> when my program runs to this statement, it exits. And
Hi AK,
I don't really understand what is stopping you from using the
job.getConfiguration() method to pass the configuration instance to
DistributedCache.addCacheFile(URI, job.getConfiguration()).
The only thing you need to do is pass the URI and the configuration
object (getting it from o
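A minimal sketch of those calls put together (the cached file path is hypothetical):

    import java.net.URI;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.mapreduce.Job;

    public class CacheFileExample {
        public static void main(String[] args) throws Exception {
            Job job = new Job(new Configuration(), "cache-example");
            // Pass the job's own Configuration instance to DistributedCache;
            // tasks can later read the file via DistributedCache.getLocalCacheFiles(conf).
            DistributedCache.addCacheFile(new URI("/user/ak/lookup.txt"),
                                          job.getConfiguration());
        }
    }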
g the HDFS files directly,
then you have to use a commercial Hadoop package like MapR, which
supports updating HDFS files.
Best,
Mahesh Balija,
Calsoft Labs.
On Sun, Nov 25, 2012 at 9:40 AM, bharath vissapragada <
bharathvissapragada1...@gmail.com> wrote:
> Hi Jeff,
>
> Ple
delete
the current path.
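A minimal sketch of that flow (all paths are hypothetical): rename() returns false instead of throwing, so it helps to make sure the destination's parent directory exists and nothing is already sitting at the destination path before the call:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class MoveHdfsFile {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            Path from = new Path("/user/dparks/output/part-00000");
            Path to   = new Path("/user/dparks/archive/part-00000");

            fs.mkdirs(to.getParent());        // make sure the target directory exists
            if (fs.exists(to)) {
                fs.delete(to, false);         // remove a stale file at the destination
            }
            boolean moved = fs.rename(from, to);
            System.out.println("rename succeeded: " + moved);
        }
    }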
Best,
Mahesh Balija,
Calsoft Labs.
On Sun, Nov 25, 2012 at 8:04 AM, David Parks wrote:
> I want to move a file in HDFS after a job using the Java API, I'm trying
> this command but I always get false (could not rename):
>
> Path from = new
> Path(&qu
am NOT sure about Python, but one suggestion is: can you run
your Python code (map unit & reduce unit) locally on your input data and
see whether your logic has any issues?
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Nov 20, 2012 at 6:50 AM, jamal sasha wrote:
>
>
>
> Hi,
> This
/LongWritable and the value will be Text,
so when the framework tries to pass those LongWritables to
your mapper it throws a ClassCastException at runtime.
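A minimal sketch of a mapper whose input types match what the framework actually delivers for text input, byte offset as LongWritable and line as Text (the Text/IntWritable output types are just an assumed example):

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class LineMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);

        @Override
        protected void map(LongWritable offset, Text line, Context context)
                throws IOException, InterruptedException {
            // Emit the trimmed line once; the input types above must match the input format.
            context.write(new Text(line.toString().trim()), ONE);
        }
    }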
Best,
Mahesh Balija,
Calsoft Labs.
On Tue, Nov 20, 2012 at 12:41 AM, Harsh J wrote:
> Hi,
>
> 1. Map/Reduce in 1.x.
Hi Prabhu,
For Twitter there are different feed types,
like "gardenhose" and "FireHose", etc.
Some may be free and some are paid; similarly, you can look
at options for other social media.
Best,
Mahesh Balija,
Calsoft Labs.
On Thu, Nov
associated with a given key and sends the key and List of values to the
reducer function.
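A minimal sketch matching that description (the Text/IntWritable types are only an assumed example): the framework groups all values for a key and hands the key plus an Iterable of those values to reduce():

    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Reducer;

    public class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();          // combine every value associated with this key
            }
            context.write(key, new IntWritable(sum));
        }
    }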
Best,
Mahesh Balija.
On Wed, Nov 7, 2012 at 6:09 PM, Ramasubramanian Narayanan <
ramasubramanian.naraya...@gmail.com> wrote:
> Hi,
>
> Which of the following is correct w.r.t mapper.
>
> (a) It