Re: How does Hadoop compile the program written in language other than Java ?

2012-03-04 Thread Kai Voigt
Hi,

The streaming API doesn't compile the streaming scripts.

The PHP/Perl/Python/Ruby scripts you supply as mapper and reducer are
called as external programs.

The input key/value pairs are sent to your scripts on stdin, and the output
is collected from their stdout.

So there is no compilation; the scripts are simply executed.
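Such a streaming script can be sketched like this (a hypothetical Python word-count; the script name, the `map`/`reduce` argument convention, and the invocation below are all illustrative, not part of the streaming API):

```python
#!/usr/bin/env python
# Hypothetical word-count mapper/reducer for Hadoop Streaming.
# Hadoop runs this file as an ordinary external process: input records
# arrive on stdin, and key/value pairs leave on stdout as tab-separated
# lines. No compilation is involved.
import sys

def map_line(line):
    """Mapper step: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def reduce_pairs(pairs):
    """Reducer step: sum the counts per key.

    The streaming framework delivers keys to the reducer already sorted;
    summing into a dict also works for this sketch.
    """
    totals = {}
    for key, count in pairs:
        totals[key] = totals.get(key, 0) + count
    return totals

if __name__ == "__main__" and len(sys.argv) > 1:
    if sys.argv[1] == "reduce":
        pairs = (line.rstrip("\n").split("\t", 1) for line in sys.stdin)
        for key, total in sorted(reduce_pairs((k, int(v)) for k, v in pairs).items()):
            print(key + "\t" + str(total))
    else:  # "map"
        for line in sys.stdin:
            for key, count in map_line(line):
                print(key + "\t" + str(count))
```

A job is then typically launched through the streaming jar, along the lines of `hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming-*.jar -input in -output out -mapper 'wc.py map' -reducer 'wc.py reduce'` (the jar's name and location vary across Hadoop versions).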

Kai

On 04.03.2012 at 15:42, Lac Trung wrote:

 Hi everyone!
 
 Hadoop is written in Java, so MapReduce programs are written in Java, too.
 But Hadoop provides an API to MapReduce that allows you to write your map
 and reduce functions in languages other than Java (e.g. Python), called
 Hadoop Streaming.
 I read the Hadoop Streaming guide here:
 http://www.hadoop.apache.org/common/docs/r0.15.2/streaming.html#More+usage+examples
 but I haven't seen any paragraph that writes about converting the language
 to Java.
 Can anybody tell me how Hadoop compiles a program written in a language
 other than Java?
 
 Thank you!
 -- 
 Lac Trung

-- 
Kai Voigt
k...@123.org






Re: AWS MapReduce

2012-03-04 Thread Hannes Carl Meyer
Hi,

yes, it's loaded from S3. IMHO, Amazon's AWS MapReduce is pretty slow.
The setup is done pretty fast and there are some configuration parameters
you can override - for example block sizes etc. - but in the end, IMHO,
setting up EC2 instances by copying images is the better alternative.

Kind Regards

Hannes

On Sun, Mar 4, 2012 at 2:31 AM, Mohit Anchlia mohitanch...@gmail.com wrote:

 I think I found the answer to this question. However, it's still not clear
 whether HDFS is on local disks or EBS volumes. Does anyone know?

 On Sat, Mar 3, 2012 at 3:54 PM, Mohit Anchlia mohitanch...@gmail.com
 wrote:

  Just want to check how many are using AWS MapReduce and understand the
  pros and cons of Amazon's MapReduce machines. Is it true that these
  MapReduce machines are really reading and writing from S3 instead of local
  disks? Has anyone found issues with Amazon MapReduce, and how does it
  compare with running MapReduce on locally attached disks rather than S3?


---
www.informera.de
Hadoop & Big Data Services


Re: How does Hadoop compile the program written in language other than Java ?

2012-03-04 Thread Lac Trung
Can you give one or some examples of this, Kai?
I haven't understood how Hadoop runs a MapReduce program in another language :D



-- 
Lạc Trung
20083535


Re: How does Hadoop compile the program written in language other than Java ?

2012-03-04 Thread Kai Voigt
Hi,

please read http://developer.yahoo.com/hadoop/tutorial/module4.html#streaming for more 
explanation and examples.

Kai


-- 
Kai Voigt
k...@123.org






Re: How does Hadoop compile the program written in language other than Java ?

2012-03-04 Thread Gopal


It's nothing out of the ordinary.

The Java routines written for your specific language (JRuby, Jython, etc.)
will interpret the code and run it.

No different from how Perl or PHP work.

Thanks


Re: How does Hadoop compile the program written in language other than Java ?

2012-03-04 Thread Lac Trung
Oh, I read it twice, but it is similar to this:
http://hadoop.apache.org/common/docs/r0.15.2/streaming.html.
Now I think I should read Hadoop Pipes first. You are so pro, Kai.



-- 
Lạc Trung
20083535


Re: How does Hadoop compile the program written in language other than Java ?

2012-03-04 Thread Lac Trung
Can anyone give me a diagram that describes the process from the given input
data to the output data?


-- 
Lạc Trung
20083535


Re: AWS MapReduce

2012-03-04 Thread Mohit Anchlia
As far as I can see in the docs, it looks like you could also use HDFS instead
of S3. But what I'm not sure about is whether these are local disks or EBS.




Re: Hadoop pain points?

2012-03-04 Thread robert
2012/3/2 Kunaal wrote:
 I am doing a general poll: what are the most prevalent pain points that
 people run into with Hadoop? These could be performance related (memory
 usage, IO latencies), usage related, or anything really.


My biggest frustration with core Hadoop over the last year or so has been
the lack of a way to efficiently implement the so-called analytic functions
with MapReduce.

These are not what one would think from the name alone, by the way - see
Oracle's analytic functions for an example of what I mean. Their big
advantage is that they often let you avoid expensive self-joins, which can
make a huge difference performance-wise.

(I would say that 80% of the analytic functions can be implemented with a
UDF or a UDA in Hive -- things like lead(), lag(), first(), or rank() --
but it is the other 20% that would knock the ball out of the park.)
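For instance, a lag()-style function (one of the 80% the post calls feasible) reduces, per key, to indexing into the ordered row group. A sketch in Python, assuming the rows for one key already arrive sorted by the window's ordering column:

```python
def lag(values, offset=1, default=None):
    """lag(): for each row, return the value `offset` rows earlier in the
    ordered group, or `default` when no such row exists.

    Inside a reducer this runs once per key group, which is what lets the
    analytic function replace an expensive self-join on the same table.
    """
    return [
        values[i - offset] if i >= offset else default
        for i in range(len(values))
    ]
```

For example, `lag([10, 20, 30])` yields `[None, 10, 20]`; lead() is the same idea with the offset applied in the other direction.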




Fwd: nutch log

2012-03-04 Thread alessio crisantemi
Hi all,
I use Solr 1.4.1 and Nutch 1.4.
I see no errors in the Tomcat log and no errors in the Nutch log.
I only see this output in the Cygwin window:

Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException:
Input path does not exist:
file:/C:/temp/apache-nutch-1.4-bin/runtime/local/crawl/segments/20120303171628/parse_data

        at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:190)
        at org.apache.hadoop.mapred.SequenceFileInputFormat.listStatus(SequenceFileInputFormat.java:44)
        at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
        at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
        at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:175)
        at org.apache.nutch.crawl.LinkDb.invert(LinkDb.java:149)
        at org.apache.nutch.crawl.Crawl.run(Crawl.java:143)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)


Why does this happen, in your opinion?
Thanks again,
Alessio

On 03 March 2012 at 16:43, Koji Sekiguchi k...@r.email.ne.jp wrote:

(12/03/04 0:09), alessio crisantemi wrote:

 It is true.
 This is the Solr problem:
 Mar 03, 2012 12:08:04 PM org.apache.solr.common.SolrException log
 Grave: org.apache.solr.common.SolrException: invalid boolean value:


Solr said that there was an erroneous boolean value in your solrconfig.xml.
Check the values of the <bool>...</bool> elements of your Solr plugins in
solrconfig.xml. Those should be one of true/false/on/off/...
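For reference, a well-formed boolean in solrconfig.xml looks like the following (the handler and parameter names here are purely illustrative):

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- the element text must be a valid boolean such as true/false/on/off,
         not an empty string or other value -->
    <bool name="hl">true</bool>
  </lst>
</requestHandler>
```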


 koji
 --
 Query Log Visualizer for Apache Solr
 http://soleami.com/



Re: Hadoop pain points?

2012-03-04 Thread Arun C Murthy
On Mar 2, 2012, at 4:09 PM, Harsh J wrote:

 Since you ask about anything in general: when I forayed into using
 Hadoop, my biggest pain was the lack of clarity and completeness in the
 documentation for the MR and DFS user APIs (and other little points).
 
 It would be nice to have some work done to add one example or
 semi-example for every single Input/OutputFormat and Mapper/Reducer
 implementation, etc. to the javadocs.
 
 I believe examples and snippets help new devs a ton (tons more than
 explaining just the behavior).

Good points, Harsh. Would you like to contribute some documentation patches?


 
 On Fri, Mar 2, 2012 at 9:45 PM, Kunaal kunalbha...@alumni.cmu.edu wrote:
 I am doing a general poll: what are the most prevalent pain points that
 people run into with Hadoop? These could be performance related (memory
 usage, IO latencies), usage related, or anything really.
 
 The goal is to identify the areas where this platform could benefit the
 most in the near future.
 
 Any feedback is much appreciated.
 
 Thanks,
 Kunal.
 
 
 
 -- 
 Harsh J

--
Arun C. Murthy
Hortonworks Inc.
http://hortonworks.com/




Re: Hadoop pain points?

2012-03-04 Thread Michael Segel
What?
The lack of documentation is what made Hadoop - really HBase - a lot of fun :-)
You know what they say... No guts, no glory...

I'm sorry - while I agree with Harsh, I just don't want to sound like some old
guy talking about how, when they were young, they had to walk to and from
school through chest-high snow, in a blizzard, uphill (both ways)... and how
you newbies have it so much better...

;-P

Sent from my iPhone

On Mar 2, 2012, at 6:42 PM, Russell Jurney russell.jur...@gmail.com wrote:

 +2
 
 Russell Jurney http://datasyndrome.com
 
 On Mar 2, 2012, at 4:38 PM, Mohit Anchlia mohitanch...@gmail.com wrote:
 
 +1
 


Custom data structures in Hadoop

2012-03-04 Thread madhu phatak
Hi,
 I have the following issue with Hadoop 0.20.2. When I try to use inheritance
with WritableComparables, the job fails. For example, if I create a base
writable called ShapeWritable:

  public abstract class ShapeWritable<T> implements WritableComparable<T> {

  }

then extend it in a concrete class called CircleWritable, set the mapper
output class to ShapeWritable in the job configuration, and write a
CircleWritable in output.collect(), the map fails with a class mismatch. When
I looked into the source code of MapTask.java, I saw the following code:

  public synchronized void collect(K key, V value, int partition)
      throws IOException {
    reporter.progress();
    if (key.getClass() != keyClass) {
      throw new IOException("Type mismatch in key from map: expected "
          + keyClass.getName() + ", recieved "
          + key.getClass().getName());
    }
    if (value.getClass() != valClass) {
      throw new IOException("Type mismatch in value from map: expected "
          + valClass.getName() + ", recieved "
          + value.getClass().getName());
    }

Here, the classes are compared directly, which means inheritance doesn't work.
Can anyone tell me why it is implemented this way? Has it changed in newer
versions of Hadoop?
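The effect of that identity comparison can be illustrated with a small Python analogue (the class names mirror the post; this is a sketch of the check's logic, not Hadoop code):

```python
# Analogue of MapTask's exact-class check: the collector compares the
# value's runtime class for identity, so a subclass instance is rejected
# even though it "is a" ShapeWritable.
class ShapeWritable:
    """Stand-in for the abstract base writable."""

class CircleWritable(ShapeWritable):
    """Stand-in for the concrete subclass."""

def collect(value, val_class):
    # Mirrors: if (value.getClass() != valClass) throw new IOException(...)
    if type(value) is not val_class:
        raise TypeError(
            "Type mismatch in value from map: expected "
            + val_class.__name__ + ", received " + type(value).__name__
        )
    return value

circle = CircleWritable()
assert isinstance(circle, ShapeWritable)   # inheritance holds...
try:
    collect(circle, ShapeWritable)         # ...but the identity check rejects it
    rejected = False
except TypeError:
    rejected = True
assert rejected
# Declaring the concrete class as the map output type satisfies the check.
assert collect(circle, CircleWritable) is circle
```

The usual workaround, accordingly, is to declare the concrete subclass (here CircleWritable) as the map output class rather than the abstract base.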


-- 
Join me at http://hadoopworkshop.eventbrite.com/


Re: Custom Seq File Loader: ClassNotFoundException

2012-03-04 Thread madhu phatak
Hi,
 Please make sure that your CustomWritable has a default constructor.

On Sat, Mar 3, 2012 at 4:56 AM, Mark question markq2...@gmail.com wrote:

 Hello,

   I'm trying to debug my code through Eclipse, which worked fine with the
 bundled Hadoop applications (e.g. wordcount), but as soon as I run my
 application with my custom sequence input file/types, I get:

   java.lang.RuntimeException: java.io.IOException: WritableName can't load class
   at SequenceFile$Reader.getValueClass(SequenceFile.java)

 because my value class is custom. In other words, how can I add/build my
 CustomWritable class so it can be loaded like Hadoop's LongWritable,
 IntWritable, etc.?

 Has anyone done this in Eclipse?

 Mark




-- 
Join me at http://hadoopworkshop.eventbrite.com/