Re: How does Hadoop compile a program written in a language other than Java?
Hi,

the streaming API doesn't compile the streaming scripts. The PHP/Perl/Python/Ruby scripts you create as mapper and reducer are called as external programs. The input key/value pairs are sent to your script via stdin, and the output is collected from its stdout. So there is no compilation; the scripts are simply executed.

Kai

On 04.03.2012 at 15:42, Lac Trung wrote:

> Hi everyone!
>
> Hadoop is written in Java, so MapReduce programs are written in Java, too. But Hadoop provides an API to MapReduce that allows you to write your map and reduce functions in languages other than Java (e.g. Python), called Hadoop Streaming. I read the Hadoop Streaming guide here:
> http://www.hadoop.apache.org/common/docs/r0.15.2/streaming.html#More+usage+examples
> but I haven't seen anything there about converting other languages to Java. Can anybody tell me how Hadoop compiles a program written in a language other than Java?
>
> Thank you!
>
> --
> Lac Trung

--
Kai Voigt k...@123.org
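To make the stdin/stdout contract concrete, here is a minimal word-count sketch in Python 2 (a sketch only: the file names, input/output paths, and the streaming jar location below are assumptions, and the jar path varies by Hadoop version):

    #!/usr/bin/env python
    # mapper.py -- minimal streaming mapper sketch: reads raw text lines
    # from stdin and emits "word<TAB>1" pairs on stdout.
    import sys

    for line in sys.stdin:
        for word in line.split():
            sys.stdout.write("%s\t1\n" % word)

    #!/usr/bin/env python
    # reducer.py -- minimal streaming reducer sketch: the framework delivers
    # lines sorted by key, so equal words arrive adjacent to each other;
    # sum the counts per word.
    import sys

    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word != current_word:
            if current_word is not None:
                sys.stdout.write("%s\t%d\n" % (current_word, count))
            current_word, count = word, 0
        count += int(value)
    if current_word is not None:
        sys.stdout.write("%s\t%d\n" % (current_word, count))

A typical invocation on a 0.20-era install might look like this (paths assumed):

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar \
        -input in -output out \
        -mapper mapper.py -reducer reducer.py \
        -file mapper.py -file reducer.py

Because the contract is just stdin/stdout, you can also dry-run the pipeline without Hadoop at all: cat input.txt | ./mapper.py | sort | ./reducer.py approximates what the framework does.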
Re: AWS MapReduce
Hi,

yes, it's loaded from S3. IMHO, Amazon AWS MapReduce is pretty slow. The setup is done quickly, and there are some configuration parameters you can override - block sizes, for example - but in the end, IMHO, setting up EC2 instances yourself by copying images is the better alternative.

Kind Regards

Hannes

On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

> I think I found the answer to this question. However, it's still not clear whether HDFS is on local disks or on EBS volumes. Does anyone know?
>
> On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
>
>> I just want to check how many people are using AWS MapReduce, and to understand the pros and cons of Amazon's MapReduce machines. Is it true that these MapReduce machines really read and write from S3 instead of local disks? Has anyone found issues with Amazon MapReduce, and how does it compare with running MapReduce on locally attached disks rather than S3?

---
www.informera.de Hadoop Big Data Services
Re: How does Hadoop compile a program written in a language other than Java?
Can you give one or a few examples of this, Kai? I haven't understood how Hadoop runs a MapReduce program in another language :D

At 02:21 on March 4, 2012, Kai Voigt k...@123.org wrote:

> Hi,
>
> the streaming API doesn't compile the streaming scripts. The PHP/Perl/Python/Ruby scripts you create as mapper and reducer are called as external programs. The input key/value pairs are sent to your script via stdin, and the output is collected from its stdout. So there is no compilation; the scripts are simply executed.
>
> Kai

--
Lạc Trung
20083535
Re: How does Hadoop compile a program written in a language other than Java?
Hi,

please read http://developer.yahoo.com/hadoop/tutorial/module4.html#streaming for more explanation and examples.

Kai

On 04.03.2012 at 16:10, Lac Trung wrote:

> Can you give one or a few examples of this, Kai? I haven't understood how Hadoop runs a MapReduce program in another language :D

--
Kai Voigt k...@123.org
Re: How does Hadoop compile a program written in a language other than Java?
On 3/4/2012 5:44 AM, Kai Voigt wrote:

> Hi,
>
> please read http://developer.yahoo.com/hadoop/tutorial/module4.html#streaming for more explanation and examples.
>
> Kai

It's nothing out of the ordinary. The Java routines written for your specific language (JRuby, Jython, etc.) will interpret the code and run it, no different from how Perl or PHP work.

Thanks
Re: How does Hadoop compile a program written in a language other than Java?
Oh, I read it twice, but it is similar to this: http://hadoop.apache.org/common/docs/r0.15.2/streaming.html. Now I think I should read about Hadoop Pipes first. You are so pro, Kai.

At 02:44 on March 4, 2012, Kai Voigt k...@123.org wrote:

> Hi,
>
> please read http://developer.yahoo.com/hadoop/tutorial/module4.html#streaming for more explanation and examples.
>
> Kai

--
Lạc Trung
20083535
Re: How does Hadoop compile a program written in a language other than Java?
Can anyone give me a diagram that describes the process from the given input data to the output data?

At 02:57 on March 4, 2012, Lac Trung trungnb3...@gmail.com wrote:

> Oh, I read it twice, but it is similar to this: http://hadoop.apache.org/common/docs/r0.15.2/streaming.html. Now I think I should read about Hadoop Pipes first. You are so pro, Kai.

--
Lạc Trung
20083535
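For what it's worth, here is a rough ASCII sketch of the streaming data flow, reconstructed from Kai's description (not an official diagram):

    input files in HDFS
          |
    InputFormat turns each split into (key, value) records
          |
    mapper process (your script): records are fed to its stdin as
    "key<TAB>value" lines; every line it prints to stdout becomes
    an intermediate (key, value) pair
          |
    shuffle & sort (the framework groups and sorts pairs by key)
          |
    reducer process (your script): sorted "key<TAB>value" lines arrive
    on stdin; its stdout lines become the final (key, value) pairs
          |
    output files in HDFS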
Re: AWS MapReduce
As far as I can see in the docs, it looks like you could also use HDFS instead of S3. What I am not sure about is whether those disks are local or EBS volumes.

On Sun, Mar 4, 2012 at 2:27 AM, Hannes Carl Meyer hannesc...@googlemail.com wrote:

> Hi,
>
> yes, it's loaded from S3. IMHO, Amazon AWS MapReduce is pretty slow. The setup is done quickly, and there are some configuration parameters you can override - block sizes, for example - but in the end, IMHO, setting up EC2 instances yourself by copying images is the better alternative.
>
> Kind Regards
>
> Hannes
> ---
> www.informera.de Hadoop Big Data Services
Re: Hadoop pain points?
On 2012/3/2, Kunaal wrote:

> I am doing a general poll on the most prevalent pain points that people run into with Hadoop. These could be performance related (memory usage, IO latencies), usage related, or anything really.

My biggest frustration with core Hadoop over the last year or so has been not being able to efficiently implement the so-called analytic functions in general with MapReduce. These are not what one would guess from the name alone, by the way - see Oracle's analytic functions for an example of what I mean. Their big advantage is that they often let you avoid expensive self-joins, which can make a huge difference performance-wise.

(I would say that 80% of the analytic functions can be implemented with a UDF or a UDAF in Hive -- things like lead(), lag(), first(), or rank() -- but it is the other 20% that would knock the ball out of the park.)
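To illustrate the 80% case, here is a hedged sketch of lag() partitioned by a key, written as a Python 2 streaming reducer. It assumes the reducer input is already sorted by (key, timestamp), which you would have to arrange yourself (for example with streaming's key-field comparator options); every file and field name below is illustrative:

    #!/usr/bin/env python
    # lag_reducer.py -- sketch of lag(value, 1) over rows grouped by key.
    # Expects tab-separated lines "key<TAB>timestamp<TAB>value", already
    # sorted by (key, timestamp) by the time they reach this script.
    import sys

    prev_key, prev_value = None, None
    for line in sys.stdin:
        key, ts, value = line.rstrip("\n").split("\t", 2)
        if key != prev_key:
            prev_value = None  # new partition: there is no previous row yet
        lagged = prev_value if prev_value is not None else "NULL"
        sys.stdout.write("%s\t%s\t%s\t%s\n" % (key, ts, value, lagged))
        prev_key, prev_value = key, value

The self-join this avoids is the one that would pair each row with its predecessor; a single sorted pass produces the same answer.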
Fwd: nutch log
Hi all,

I use Solr 1.4.1 and Nutch 1.4. I have no error in the Tomcat log and no error in the Nutch log. I see only this in the Cygwin window:

    Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl/segments/20120303171628/parse_data
        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)

Why, in your opinion?

Thanks again,
Alessio

On 3 March 2012 at 16:43, Koji Sekiguchi k...@r.email.ne.jp wrote:

> (12/03/04 0:09), alessio crisantemi wrote:
>> it is true. This is the Solr problem:
>> mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log
>> Grave: org.apache.solr.common.SolrException: invalid boolean value:
>
> Solr said that there was an erroneous boolean value in your solrconfig.xml. Check the values of <bool>...</bool> in your Solr plugins in solrconfig.xml. Those should be one of true/false/on/off/...
>
> koji
> --
> Query Log Visualizer for Apache Solr
> http://soleami.com/
Re: Hadoop pain points?
On Mar 2, 2012, at 4:09 PM, Harsh J wrote:

> Since you ask about anything in general: when I forayed into using Hadoop, my biggest pain was the lack of clarity and completeness in the documentation of the MR and DFS user APIs (and other little points). It would be nice to have some work done to add one example or semi-example for every single Input/OutputFormat and Mapper/Reducer implementation, etc. to the javadocs. I believe examples and snippets help new devs out a ton (tons more than explaining behavior alone).

Good points, Harsh. Would you like to contribute some documentation patches?

> On Fri, Mar 2, 2012 at 9:45 PM, Kunaal kunalbha...@alumni.cmu.edu wrote:
>> I am doing a general poll on the most prevalent pain points that people run into with Hadoop. These could be performance related (memory usage, IO latencies), usage related, or anything really. The goal is to look for the areas where this platform could benefit the most in the near future. Any feedback is much appreciated.
>>
>> Thanks, Kunal.
>
> --
> Harsh J

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/
Re: Hadoop pain points?
What? The lack of documentation is what made Hadoop - really, HBase - a lot of fun :-)

You know what they say... No guts, no glory...

I'm sorry - while I agree with Harsh, I just don't want to sound like some old guy talking about how, when they were young, they had to walk through chest-high snow, in a blizzard, uphill (both ways) to and from school... and how you newbies have it so much better... ;-P

Sent from my iPhone

On Mar 2, 2012, at 6:42 PM, Russell Jurney russell.jur...@gmail.com wrote:

> +2
>
> Russell Jurney http://datasyndrome.com
>
> On Mar 2, 2012, at 4:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
>
>> +1
>>
>> On Fri, Mar 2, 2012 at 4:09 PM, Harsh J ha...@cloudera.com wrote:
>>
>>> Since you ask about anything in general: when I forayed into using Hadoop, my biggest pain was the lack of clarity and completeness in the documentation of the MR and DFS user APIs (and other little points).
Custom data structures in Hadoop
Hi,

I have the following issue in Hadoop 0.20.2: when I try to use inheritance with WritableComparables, the job fails.

Example: if I create a base writable called Shape,

    public abstract class ShapeWritable<T> implements WritableComparable<T> {
    }

and then extend it in a concrete class called CircleWritable, set the mapper output class to ShapeWritable in the job configuration, and write a CircleWritable in OutputCollector.collect(), the map fails with a class mismatch. When I looked into the source code of MapTask.java, I saw the following:

    public synchronized void collect(K key, V value, int partition)
        throws IOException {
      reporter.progress();
      if (key.getClass() != keyClass) {
        throw new IOException("Type mismatch in key from map: expected "
            + keyClass.getName() + ", recieved " + key.getClass().getName());
      }
      if (value.getClass() != valClass) {
        throw new IOException("Type mismatch in value from map: expected "
            + valClass.getName() + ", recieved " + value.getClass().getName());
      }

Here the classes are compared directly, which means inheritance doesn't work. Can anyone tell me why it is implemented this way? Has it changed in newer versions of Hadoop?

--
Join me at http://hadoopworkshop.eventbrite.com/
Re: Custom Seq File Loader: ClassNotFoundException
Hi,

please make sure that your CustomWritable has a default constructor.

On Sat, Mar 3, 2012 at 4:56 AM, Mark question markq2...@gmail.com wrote:

> Hello,
>
> I'm trying to debug my code through Eclipse, which worked fine with the bundled Hadoop applications (e.g. wordcount), but as soon as I run it on my application with my custom sequence input file/types, I get:
>
>     java.lang.RuntimeException: java.io.IOException (WritableName can't load class)
>         at SequenceFile$Reader.getValueClass(SequenceFile.java)
>
> because my value class is custom. In other words, how can I add/build my CustomWritable class so it works like Hadoop's LongWritable, IntWritable, etc.? Has anyone used Eclipse for this?
>
> Mark

--
Join me at http://hadoopworkshop.eventbrite.com/