Hi Eli,
Moving this to cdh-u...@cloudera.org as it's a CDH-specific question.
You'll get better answers from the community there. You are CC'd; to
subscribe to the CDH users community, head to
https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user. I've
bcc'd common-user@ here.
What yo
Hi Folks,
A coworker of mine recently set up a new CDH3 cluster with 4 machines (3
data nodes, one namenode that doubles as a jobtracker). I started
looking through it using "hadoop fs -ls", and that went fine with
everything displaying all right. Next, I decided to test out some simple
pig jobs.
AFAIK there is no way to disable this "feature". It is an optimization: it
happens because, in your case, the node generating the data is also a data node.
Raj
>
> From: Stijn De Weirdt
>To: common-user@hadoop.apache.org
>Sent: Monday, April 2, 2012 12:18 PM
thanks serge.
is there a way to disable this "feature" (i.e. always place the first
block on the local node)?
and is this because the local node is a datanode? or is there always a
"local node" in data transfers?
many thanks,
stijn
The local node is the node you are copying data from,
if, let's say, you are using the -copyFromLocal option.
Regards
Serge
On 4/2/12 11:53 AM, "Stijn De Weirdt" wrote:
>hi raj,
>
>what is a "local node"? is it relative to the tasks that are started?
>
>
>stijn
>
>On 04/02/2012 07:28 PM, Raj Vishw
hi raj,
what is a "local node"? is it relative to the tasks that are started?
stijn
On 04/02/2012 07:28 PM, Raj Vishwanathan wrote:
Stijn,
The first block of the data is always stored on the local node. Assuming that
you had a replication factor of 3, the node that generates the data will
Per your jps, you don't have a DataNode running.
> hduser@sujit:~/Desktop/data$ jps
> 6022 NameNode
> 7100 Jps
> 6569 JobTracker
> 6798 TaskTracker
> 6491 SecondaryNameNode
Please read http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo to
solve this. You most likely need to also read:
http://s
Stijn,
The first block of the data is always stored on the local node. Assuming that
you had a replication factor of 3, the node that generates the data will get
about 10GB of data and the other 20GB will be distributed among the other nodes.
Raj
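Raj's description of the default placement policy can be sketched as a toy simulation. This is a simplification under stated assumptions: real HDFS placement is also rack-aware, and the node names here are made up:

```python
import random

def place_replicas(writer_node, datanodes, replication=3):
    """Toy sketch of HDFS default block placement: the first replica
    stays on the local node whenever the writer is itself a datanode;
    the remaining replicas go to other nodes. (Real HDFS additionally
    considers racks and free space.)"""
    replicas = []
    if writer_node in datanodes:
        replicas.append(writer_node)          # first copy stays local
    others = [n for n in datanodes if n not in replicas]
    random.shuffle(others)
    replicas.extend(others[:replication - len(replicas)])
    return replicas

# The writer is a datanode, so every block's first replica lands on it:
nodes = ["node1", "node2", "node3", "node4", "node5"]
placement = place_replicas("node1", nodes)
print(placement[0])  # node1
```

This is why the generating node ends up holding one full copy of the data it writes, while the remaining replicas spread across the cluster.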
>
> From: St
hi all,
i've just started to play around with hdfs+mapred. i'm currently playing
with teragen/sort/validate to see if i understand it all.
the test setup involves 5 nodes that are all tasktracker and datanode
(and one node that is also jobtracker and namenode on top of that; this
one node is ru
On 04/02/2012 04:18 PM, Harsh J wrote:
HADOOP_OPTS isn't applied for Task JVMs.
For Task JVMs, set "mapred.child.java.opts" in mapred-site.xml (Or via
Configuration for per-job tuning), to the opts string you want it to
have. For example "-Xmx200m -Dsomesysprop=abc".
thanks!
stijn
On Mon, A
+common-user@hadoop.apache.org
Hi Harsh,
Thanks for the information.
Is there any way to differentiate between a client-side property and a
server-side property? Or a document which lists whether a property is
server- or client-side? Many times I have to speculate over this and try
out test runs.
Thanks J: just curious about how you came to hypothesize (1) (i.e.
regarding the fact that threads and the
API components aren't thread safe in my hadoop version).
I think that's a really good guess, and I would like to be able to make
those sorts of intelligent hypotheses
myself. Any reading you
Can someone please look into the below issue?
Thanks in Advance
On Wed, Mar 7, 2012 at 9:09 AM, Sujit Dhamale wrote:
> Hadoop version: hadoop-0.20.203.0rc1.tar
> Operating System: Ubuntu 11.10
>
>
>
> On Wed, Mar 7, 2012 at 12:19 AM, Harsh J wrote:
>
>> Hi Sujit,
>>
>> Please also tell us whic
HADOOP_OPTS isn't applied for Task JVMs.
For Task JVMs, set "mapred.child.java.opts" in mapred-site.xml (Or via
Configuration for per-job tuning), to the opts string you want it to
have. For example "-Xmx200m -Dsomesysprop=abc".
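As a concrete reference, Harsh's suggestion would look like this as a mapred-site.xml entry (the -Xmx value and system property are just the example values from his message, not recommendations):

```xml
<!-- mapred-site.xml: JVM options applied to each launched task child JVM -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Dsomesysprop=abc</value>
</property>
```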
On Mon, Apr 2, 2012 at 7:47 PM, Stijn De Weirdt wrote:
> hi all,
>
Jay,
Without seeing the whole stack trace, all I can say about the cause of that
exception from a job is:
1. You're using threads, and the API components you are using aren't
thread safe in your version of Hadoop.
2. Files are being written out to HDFS directories without following
the OC rules. (This is
hi all,
is it normal that HADOOP_OPTS is not passed to the actual tasks (i.e. the
java processes running as children of the tasktracker)? the tasktracker
process uses it correctly.
is there a way to set general java options for each started task?
many thanks,
stijn
No, my job does not write files directly to disk. It simply goes to some
web pages, reads data (in the reducer phase), and parses JSON into Thrift
objects which are emitted via the standard MultipleOutputs API to hdfs
files.
Any idea why hadoop would throw the "AlreadyBeingCreatedException" ?
O
Jay,
What does your job do? Create files directly on HDFS? If so, do you
follow this method?:
http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F
A local filesystem may not complain if you re-create an existing file.
HDFS' behavio
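The gist of that FAQ entry is that each task attempt must write to its own unique path, so that speculative or retried attempts never open the same HDFS file (which is what triggers AlreadyBeingCreatedException). A minimal sketch of the idea; the helper name and directory layout here are hypothetical illustrations, not Hadoop API:

```python
def task_side_file(output_dir, attempt_id, name):
    """Hypothetical helper: build a per-attempt file path so two
    concurrent attempts of the same task never collide on one file.
    Hadoop's output committers use the same trick with a _temporary
    subtree that is promoted on commit."""
    return f"{output_dir}/_temporary/{attempt_id}/{name}"

p = task_side_file("/out", "attempt_201204021200_0001_r_000003_0", "part-r-00003")
print(p)  # /out/_temporary/attempt_201204021200_0001_r_000003_0/part-r-00003
```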
Is this a problem of proxy settings? Because even after specifying the group
name, I am not able to run it. It's still giving me the same error.
Thanks,
Praveenesh
On Mon, Apr 2, 2012 at 6:05 PM, Alejandro Abdelnur wrote:
> multiple value are comma separated. keep in mind that valid values for
>
multiple values are comma-separated. keep in mind that valid values for
proxyuser groups, as the property name states, are GROUPS, not USERS.
thxs.
Alejandro
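A sketch of what Alejandro describes, as core-site.xml entries; the proxy user name "oozie" and the host/group names below are illustrative assumptions:

```xml
<!-- core-site.xml: explicit hosts, and comma-separated GROUPS (not users) -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>host1.example.com,host2.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>group1,group2</value>
</property>
```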
On Mon, Apr 2, 2012 at 2:27 PM, praveenesh kumar wrote:
> How can I specify multiple users /groups for proxy user setting ?
> Can I give co
How can I specify multiple users/groups for the proxy user setting?
Can I give comma-separated values in these settings?
Thanks,
Praveenesh
On Mon, Apr 2, 2012 at 5:52 PM, Alejandro Abdelnur wrote:
> Praveenesh,
>
> If I'm not mistaken 0.20.205 does not support wildcards for the proxyuser
> (host
Praveenesh,
If I'm not mistaken, 0.20.205 does not support wildcards for the proxyuser
(hosts/groups) settings. You have to use explicit hosts/groups.
Thxs.
Alejandro
PS: please follow up this thread in the oozie-us...@incubator.apache.org
On Mon, Apr 2, 2012 at 2:15 PM, praveenesh kumar wrote:
Thank you that worked!
Juan
On Mon, Apr 2, 2012 at 12:55 PM, Harsh J wrote:
> For 1.0, the right property is "mapred.reduce.child.java.opts". The
> "mapreduce.*" style would apply to MR in 2.0 and above.
>
> On Mon, Apr 2, 2012 at 3:00 PM, Juan Pino
> wrote:
> > Hello,
> >
> > I have a job tha
For 1.0, the right property is "mapred.reduce.child.java.opts". The
"mapreduce.*" style would apply to MR in 2.0 and above.
On Mon, Apr 2, 2012 at 3:00 PM, Juan Pino wrote:
> Hello,
>
> I have a job that requires a bit more memory than the default for the
> reducer (not for the mapper).
> So for
Ya, I understand that we need to write the processing logic. What I want to
know is: are there any APIs that can be used for image processing?
I was reading about HIPI; is this the right API, or should WebGL be used?
Any other suggestions are welcome.
Thanks and Regards,
Shreya
This doesn't sound like a mapreduce[1] sort of problem. Now, of course,
you can store files in HDFS and retrieve them. But it's up to your
application to interpret them. MapReduce cannot "display the
corresponding door image"; it is a computation scheme that performs
calculations you provide.
[
Hi Ondrej,
On 02.04.2012 13:00, Ondřej Klimpera wrote:
Ok, thanks.
I missed the setup() method because I'm using an older version of Hadoop, so
I suppose the configure() method does the same in Hadoop 0.20.203.
Aha, if it's possible, try upgrading. I don't know how support is for
versions older t
Hi,
My scenario is:
There are some images of structures (building plans etc.) that have to be
stored in HDFS. If the user clicks on a door of that building, I want to
use mapreduce to display the corresponding door image stored in HDFS
and all the information related to it. In a nutshell, an image
Hi Shreya,
Image files are binary files. Use the SequenceFile format to store the images
in hdfs and SequenceFileInputFormat to read the bytes. You can use a
TwoDArrayWritable to store the matrix for an image.
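To illustrate why a container format helps here, a toy Python sketch of the idea behind a SequenceFile: many small binary blobs (e.g. images) packed as length-prefixed key/value records in one large file, instead of many small HDFS files. This is not the real SequenceFile wire format, just the concept:

```python
import struct

def pack(records):
    """Pack (name, bytes) pairs as length-prefixed key/value records,
    in the spirit of a SequenceFile container (toy format, big-endian
    4-byte lengths; no sync markers or compression)."""
    out = bytearray()
    for key, value in records:
        k = key.encode()
        out += struct.pack(">II", len(k), len(value)) + k + value
    return bytes(out)

def unpack(blob):
    """Read the length-prefixed records back out."""
    records, i = [], 0
    while i < len(blob):
        klen, vlen = struct.unpack_from(">II", blob, i); i += 8
        key = blob[i:i + klen].decode(); i += klen
        records.append((key, blob[i:i + vlen])); i += vlen
    return records

data = pack([("door1.png", b"\x89PNG..."), ("door2.png", b"\x89PNG...")])
print(unpack(data)[0][0])  # door1.png
```

The design point is the same one the real SequenceFile solves: HDFS and the NameNode handle a few large files far better than millions of small ones.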
On Mon, Apr 2, 2012 at 3:36 PM, Sujit Dhamale wrote:
> Shreya can u please Explain your scenario .
>
>
> On Mo
Shreya, can you please explain your scenario?
On Mon, Apr 2, 2012 at 3:02 PM, wrote:
>
>
> Hi,
>
>
>
> Can someone point me to some info on Image processing using Hadoop?
>
>
>
> Regards,
>
> Shreya
>
>
> This e-mail and any files transmitted with it are for the sole use of the
> intended recipi
Ok, thanks.
I missed the setup() method because I'm using an older version of Hadoop, so
I suppose the configure() method does the same in Hadoop 0.20.203.
Now I'm able to load a map file inside the configure() method into a
MapFile.Reader instance as a private class variable; all works fine,
just wonderin
Hi Ondrej,
On 30.03.2012 14:30, Ondřej Klimpera wrote:
And one more question: is it even possible to add a MapFile (as it
consists of an index file and a data file) to the distributed cache?
Thanks
Should be no problem, they are just two files.
On 03/30/2012 01:15 PM, Ondřej Klimpera wrote:
Hello,
I'm
Hi,
Can someone point me to some info on Image processing using Hadoop?
Regards,
Shreya
Hello,
I have a job that requires a bit more memory than the default for the
reducer (not for the mapper).
So for this I have this property in my configuration file:
mapreduce.reduce.java.opts=-Xmx4000m
When I run the job, I can see its configuration in the web interface and I
see that indeed I
Gaurav
NN memory might have hit its upper bound. As a benchmark, for every
1 million files/blocks/directories, 1GB of memory is required on the NN. The
number of files in your cluster might have grown beyond this threshold. So
the options left for you would be
- If there are large number of s
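That rule of thumb is easy to turn into a quick estimate. The 1GB-per-million figure is the approximation quoted in this message, not an exact bound:

```python
def namenode_heap_gb(total_objects, gb_per_million=1.0):
    """Estimate NameNode heap from the count of files + blocks +
    directories, using the ~1GB per million objects rule of thumb."""
    return total_objects / 1_000_000 * gb_per_million

# e.g. a cluster holding 30 million files/blocks/directories:
print(namenode_heap_gb(30_000_000))  # 30.0
```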