Re: Compression codec org.apache.hadoop.io.compress.DeflateCodec not found.

2012-04-02 Thread Harsh J
Hi Eli, Moving this to cdh-u...@cloudera.org as it's a CDH-specific question. You'll get better answers from the community there. You are CC'd, but to subscribe to the CDH users community, head to https://groups.google.com/a/cloudera.org/forum/#!forum/cdh-user. I've bcc'd common-user@ here. What yo
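For readers hitting the same error: the codec classes Hadoop can resolve by name come from io.compression.codecs in core-site.xml. A sketch of such an entry, assuming a Hadoop build that actually ships DeflateCodec (older stock 0.20 releases do not, which is one common cause of this "not found" error):

```xml
<!-- core-site.xml (sketch): codec classes the framework will load.
     DeflateCodec must appear in this list, and its class must be on
     the classpath, for jobs that reference it to find it. -->
<property>
  <name>io.compression.codecs</name>
  <value>org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.BZip2Codec,org.apache.hadoop.io.compress.DeflateCodec</value>
</property>
```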

Compression codec org.apache.hadoop.io.compress.DeflateCodec not found.

2012-04-02 Thread Eli Finkelshteyn
Hi Folks, A coworker of mine recently set up a new CDH3 cluster with 4 machines (3 data nodes, one namenode that doubles as a jobtracker). I started looking through it using "hadoop fs -ls", and that went fine with everything displaying all right. Next, I decided to test out some simple Pig jobs.

Re: data distribution in HDFS

2012-04-02 Thread Raj Vishwanathan
AFAIK there is no way to disable this "feature". This is an optimization. It happens because in your case the node generating the data is also a data node. Raj > > From: Stijn De Weirdt >To: common-user@hadoop.apache.org >Sent: Monday, April 2, 2012 12:18 PM

Re: data distribution in HDFS

2012-04-02 Thread Stijn De Weirdt
thanks serge. is there a way to disable this "feature" (i.e. always place the first block on the local node)? and is this because the local node is a datanode? or is there always a "local node" with data transfers? many thanks, stijn Local node is a node from where you are copying data from If lets

Re: data distribution in HDFS

2012-04-02 Thread Serge Blazhievsky
The local node is the node you are copying data from, e.g. if you are using the -copyFromLocal option. Regards Serge On 4/2/12 11:53 AM, "Stijn De Weirdt" wrote: >hi raj, > >what is a "local node"? is it relative to the tasks that are started? > > >stijn > >On 04/02/2012 07:28 PM, Raj Vishw

Re: data distribution in HDFS

2012-04-02 Thread Stijn De Weirdt
hi raj, what is a "local node"? is it relative to the tasks that are started? stijn On 04/02/2012 07:28 PM, Raj Vishwanathan wrote: Stijn, The first block of the data is always stored in the local node. Assuming that you had a replication factor of 3, the node that generates the data will

Re: Getting RemoteException: while copying data from Local machine to HDFS

2012-04-02 Thread Harsh J
Per your jps, you don't have a DataNode running. > hduser@sujit:~/Desktop/data$ jps > 6022 NameNode > 7100 Jps > 6569 JobTracker > 6798 TaskTracker > 6491 SecondaryNameNode Please read http://wiki.apache.org/hadoop/CouldOnlyBeReplicatedTo to solve this. You most likely need to also read: http://s

Re: data distribution in HDFS

2012-04-02 Thread Raj Vishwanathan
Stijn, The first block of the data is always stored in the local node. Assuming that you had a replication factor of 3, the node that generates the data will get about 10GB of data and the other 20GB will be distributed among other nodes. Raj > > From: St

data distribution in HDFS

2012-04-02 Thread Stijn De Weirdt
hi all, i just started to play around with hdfs+mapred. i'm currently playing with teragen/sort/validate to see if i understand it all. the test setup involves 5 nodes that are all tasktracker and datanode, and one node that is also jobtracker and namenode on top of that (this one node is ru

Re: HADOOP_OPTS to tasks

2012-04-02 Thread Stijn De Weirdt
On 04/02/2012 04:18 PM, Harsh J wrote: HADOOP_OPTS isn't applied for Task JVMs. For Task JVMs, set "mapred.child.java.opts" in mapred-site.xml (Or via Configuration for per-job tuning), to the opts string you want it to have. For example "-Xmx200m -Dsomesysprop=abc". thanks! stijn On Mon, A

Re: HBase bulk loader doing speculative execution when it set to false in mapred-site.xml

2012-04-02 Thread anil gupta
+common-user@hadoop.apache.org Hi Harsh, Thanks for the information. Is there any way to differentiate between a client-side property and a server-side property? Or a document which lists whether a property is server- or client-side? Many times I have to speculate over this and try out test runs.

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

2012-04-02 Thread Jay Vyas
Thanks J: just curious about how you came to hypothesize (1) (i.e. regarding the fact that threads and the API components aren't thread-safe in my hadoop version). I think that's a really good guess, and I would like to be able to make those sorts of intelligent hypotheses myself. Any reading you

Re: getting NullPointerException while running Word cont example

2012-04-02 Thread Sujit Dhamale
Can someone please look into the issue below? Thanks in advance. On Wed, Mar 7, 2012 at 9:09 AM, Sujit Dhamale wrote: > Hadoop version: hadoop-0.20.203.0rc1.tar > Operating System: Ubuntu 11.10 > > > > On Wed, Mar 7, 2012 at 12:19 AM, Harsh J wrote: > >> Hi Sujit, >> >> Please also tell us whic

Re: HADOOP_OPTS to tasks

2012-04-02 Thread Harsh J
HADOOP_OPTS isn't applied for Task JVMs. For Task JVMs, set "mapred.child.java.opts" in mapred-site.xml (Or via Configuration for per-job tuning), to the opts string you want it to have. For example "-Xmx200m -Dsomesysprop=abc". On Mon, Apr 2, 2012 at 7:47 PM, Stijn De Weirdt wrote: > hi all, >
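A minimal mapred-site.xml sketch of what Harsh describes, using the example opts string from his reply:

```xml
<!-- mapred-site.xml: JVM options applied to every spawned task JVM.
     HADOOP_OPTS affects only the daemons (tasktracker etc.),
     not the task children. -->
<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx200m -Dsomesysprop=abc</value>
</property>
```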

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

2012-04-02 Thread Harsh J
Jay, Without seeing the whole stack trace, all I can say as cause for that exception from a job is: 1. You're using threads and the API components you are using aren't thread-safe in your version of Hadoop. 2. Files are being written out to HDFS directories without following the OC rules. (This is

HADOOP_OPTS to tasks

2012-04-02 Thread Stijn De Weirdt
hi all, is it normal that HADOOP_OPTS are not passed to the actual tasks (i.e. the java processes running as children of the tasktracker)? the tasktracker process uses them correctly. is there a way to set general java options for each started task? many thanks, stijn

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

2012-04-02 Thread Jay Vyas
No, my job does not write files directly to disk. It simply goes to some web pages, reads data (in the reducer phase), and parses JSON into Thrift objects which are emitted via the standard MultipleOutputs API to HDFS files. Any idea why hadoop would throw the "AlreadyBeingCreatedException"? O

Re: org.apache.hadoop.ipc.RemoteException: org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException:'

2012-04-02 Thread Harsh J
Jay, What does your job do? Create files directly on HDFS? If so, do you follow this method?: http://wiki.apache.org/hadoop/FAQ#Can_I_write_create.2BAC8-write-to_hdfs_files_directly_from_map.2BAC8-reduce_tasks.3F A local filesystem may not complain if you re-create an existent file. HDFS' behavio

Re: How can I configure oozie to submit different workflows from different users ?

2012-04-02 Thread praveenesh kumar
Is this a problem of proxy settings? Because even after specifying the group name, I am not able to run it. It's still giving me the same error. Thanks, Praveenesh On Mon, Apr 2, 2012 at 6:05 PM, Alejandro Abdelnur wrote: > multiple value are comma separated. keep in mind that valid values for >

Re: How can I configure oozie to submit different workflows from different users ?

2012-04-02 Thread Alejandro Abdelnur
Multiple values are comma-separated. Keep in mind that valid values for proxyuser groups, as the property name states, are GROUPS, not USERS. thxs. Alejandro On Mon, Apr 2, 2012 at 2:27 PM, praveenesh kumar wrote: > How can I specify multiple users /groups for proxy user setting ? > Can I give co
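As a sketch, the Hadoop-side proxyuser entries might look like the following (assuming the Oozie service runs as user "oozie"; the host and group names are placeholders, and per the thread, 0.20.205 needs explicit values rather than wildcards):

```xml
<!-- core-site.xml on the cluster: let user "oozie" impersonate
     members of the listed GROUPS when connecting from the listed
     hosts. Multiple values are comma-separated. -->
<property>
  <name>hadoop.proxyuser.oozie.hosts</name>
  <value>oozie-host.example.com</value>
</property>
<property>
  <name>hadoop.proxyuser.oozie.groups</name>
  <value>staff,analysts</value>
</property>
```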

Re: How can I configure oozie to submit different workflows from different users ?

2012-04-02 Thread praveenesh kumar
How can I specify multiple users/groups for the proxy user setting? Can I give comma-separated values in these settings? Thanks, Praveenesh On Mon, Apr 2, 2012 at 5:52 PM, Alejandro Abdelnur wrote: > Praveenesh, > > If I'm not mistaken 0.20.205 does not support wildcards for the proxyuser > (host

Re: How can I configure oozie to submit different workflows from different users ?

2012-04-02 Thread Alejandro Abdelnur
Praveenesh, If I'm not mistaken 0.20.205 does not support wildcards for the proxyuser (hosts/groups) settings. You have to use explicit hosts/groups. Thxs. Alejandro PS: please follow up this thread in the oozie-us...@incubator.apache.org On Mon, Apr 2, 2012 at 2:15 PM, praveenesh kumar wrote:

Re: mapred.child.java.opts and mapreduce.reduce.java.opts

2012-04-02 Thread Juan Pino
Thank you that worked! Juan On Mon, Apr 2, 2012 at 12:55 PM, Harsh J wrote: > For 1.0, the right property is "mapred.reduce.child.java.opts". The > "mapreduce.*" style would apply to MR in 2.0 and above. > > On Mon, Apr 2, 2012 at 3:00 PM, Juan Pino > wrote: > > Hello, > > > > I have a job tha

Re: mapred.child.java.opts and mapreduce.reduce.java.opts

2012-04-02 Thread Harsh J
For 1.0, the right property is "mapred.reduce.child.java.opts". The "mapreduce.*" style would apply to MR in 2.0 and above. On Mon, Apr 2, 2012 at 3:00 PM, Juan Pino wrote: > Hello, > > I have a job that requires a bit more memory than the default for the > reducer (not for the mapper). > So for
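In config form, the 1.0-era property Harsh names, with the -Xmx4000m value from Juan's original question:

```xml
<!-- mapred-site.xml, Hadoop 1.0 property naming. In MR2 and above
     the equivalent property is mapreduce.reduce.java.opts. -->
<property>
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx4000m</value>
</property>
```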

RE: Image Processing in Hadoop

2012-04-02 Thread Shreya.Pal
Ya I understand that we need to write the processing logic. What I want to know is: are there any kind of APIs that can be used for image processing? I was reading about HIPI; is this the right API, or should WebGL be used? Any other suggestions are welcome. Thanks and Regards, Shreya -Original

RE: Image Processing in Hadoop

2012-04-02 Thread Darren Govoni
This doesn't sound like a mapreduce[1] sort of problem. Now, of course, you can store files in HDFS and retrieve them. But it's up to your application to interpret them. MapReduce cannot "display the corresponding door image"; it is a computation scheme and performs calculations that you provide. [

Re: Working with MapFiles

2012-04-02 Thread Ioan Eugen Stan
Hi Ondrej, On 02.04.2012 13:00, Ondřej Klimpera wrote: Ok, thanks. I missed the setup() method because of using an older version of hadoop, so I suppose that the configure() method does the same in hadoop 0.20.203. Aha, if it's possible, try upgrading. I don't know how support is for versions older t

RE: Image Processing in Hadoop

2012-04-02 Thread Shreya.Pal
Hi, My scenario is: there are some images of structures (building plans etc.) that have to be stored in HDFS. If the user clicks on a door of that building, I want to use mapreduce to display the corresponding door image stored in HDFS and all the information related to it. In a nutshell, an image

Re: Image Processing in Hadoop

2012-04-02 Thread madhu phatak
Hi Shreya, Image files are binary files. Use the SequenceFile format to store the images in HDFS and SequenceFileInputFormat to read the bytes. You can use TwoDArrayWritable to store a matrix for the image. On Mon, Apr 2, 2012 at 3:36 PM, Sujit Dhamale wrote: > Shreya can u please Explain your scenario . > > > On Mo

Re: Image Processing in Hadoop

2012-04-02 Thread Sujit Dhamale
Shreya, can u please explain your scenario. On Mon, Apr 2, 2012 at 3:02 PM, wrote: > > > Hi, > > > > Can someone point me to some info on Image processing using Hadoop? > > > > Regards, > > Shreya > > > This e-mail and any files transmitted with it are for the sole use of the > intended recipi

Re: Working with MapFiles

2012-04-02 Thread Ondřej Klimpera
Ok, thanks. I missed the setup() method because of using an older version of hadoop, so I suppose that the configure() method does the same in hadoop 0.20.203. Now I'm able to load a map file inside the configure() method into a MapFile.Reader instance as a class private variable; all works fine, just wonderin

Re: Working with MapFiles

2012-04-02 Thread Ioan Eugen Stan
Hi Ondrej, On 30.03.2012 14:30, Ondřej Klimpera wrote: And one more question: is it even possible to add a MapFile (as it consists of an index and a data file) to the Distributed Cache? Thanks Should be no problem, they are just two files. On 03/30/2012 01:15 PM, Ondřej Klimpera wrote: Hello, I'm

Image Processing in Hadoop

2012-04-02 Thread Shreya.Pal
Hi, Can someone point me to some info on Image processing using Hadoop? Regards, Shreya

mapred.child.java.opts and mapreduce.reduce.java.opts

2012-04-02 Thread Juan Pino
Hello, I have a job that requires a bit more memory than the default for the reducer (not for the mapper). So for this I have this property in my configuration file: mapreduce.reduce.java.opts=-Xmx4000m When I run the job, I can see its configuration in the web interface and I see that indeed I

Re: 0 tasktrackers in jobtracker but all datanodes present

2012-04-02 Thread Bejoy Ks
Gaurav, NN memory might have hit its upper bound. As a benchmark, for every 1 million files/blocks/directories, 1GB of memory is required on the NN. The number of files in your cluster might have grown beyond this threshold. So the options left for you would be: - If there are large number of s
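Bejoy's benchmark can be turned into a rough heap estimate. A sketch (the class and method names here are illustrative, and the 1 GB per million objects figure is a rule of thumb, not an exact limit):

```java
public class NnHeapEstimate {
    // Rule of thumb from the thread: ~1 GB of NameNode heap per
    // 1 million files + blocks + directories.
    static long heapGbNeeded(long filesBlocksDirs) {
        // Round up to the next whole gigabyte.
        return (filesBlocksDirs + 999_999) / 1_000_000;
    }

    public static void main(String[] args) {
        // e.g. 2.5 million namespace objects -> at least 3 GB of NN heap
        System.out.println(heapGbNeeded(2_500_000) + " GB");
    }
}
```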