Re: copy data from one hadoop cluster to another hadoop cluster + can't use distcp

2015-06-20 Thread SF Hadoop
Really depends on your requirements for the format of the data.

The easiest way I can think of is to stream batches of data into a pub/sub
system that the target system can access and then consume.

Verify each batch and then discard it.

You can size the intermediary infrastructure based on your batch sizes.

That seems like the most efficient approach.
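
As a minimal sketch (the intermediary host and HDFS paths below are made up,
and this assumes a host that both clusters can reach):

# On cluster 1 (secured zone): pull one batch out of HDFS and push it to the intermediary
hdfs dfs -get /data/batch_0001 /staging/batch_0001
scp -r /staging/batch_0001 transfer-host:/transfer/

# On cluster 2 (non-secured zone): fetch the batch and load it into HDFS
hdfs dfs -put /staging/batch_0001 /data/batch_0001

# Verify the batch (e.g. compare sizes), then clean it off the intermediary
hdfs dfs -du -s /data/batch_0001
ssh transfer-host rm -rf /transfer/batch_0001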

On Thursday, June 18, 2015, Divya Gehlot divya.htco...@gmail.com wrote:

 Hi,
 I need to copy data from the first hadoop cluster to the second hadoop cluster.
 I can't access the second hadoop cluster from the first hadoop cluster due to some
 security issues.
 Can anyone point me to how I can do this, apart from the distcp command?
 For instance:
 Cluster 1 (secured zone) - copy HDFS data to - Cluster 2 (non-secured
 zone)



 Thanks,
 Divya





Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
I'm not sure if this is an HBase issue or a Hadoop issue, so if this is
off-topic please forgive me.

I am having a problem with Hadoop maxing out drive space on a select few
nodes when I am running an HBase job.  The scenario is this:

- The job is a data import using Map/Reduce / HBase
- The data is being imported to one table
- The table only has a couple of regions
- As the job runs, HBase? / Hadoop? begins placing the data in HDFS on the
datanode / regionserver that is hosting  the regions
- As the job progresses (and more data is imported) the two datanodes
hosting the regions start to get full and eventually drive space hits 100%
utilization whilst the other nodes in the cluster are at 40% or less drive
space utilization
- The job in Hadoop then begins to hang with multiple out of space errors
and eventually fails.

I have tried running the hadoop balancer during the job run, and this helped,
but it only really delayed the eventual job failure.
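
(For reference, the balancer invocation looks roughly like this; the threshold
percentage is just an example:)

# Run the HDFS balancer until no node deviates more than 10% from average utilization
hdfs balancer -threshold 10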

How can I get Hadoop / HBase to distribute the data to HDFS more evenly
when it is favoring the nodes that the regions are on?

Am I missing something here?

Thanks for any help.


Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
This doesn't help because the space is simply reserved for the OS. Hadoop
still maxes out its quota and spits out out-of-space errors.
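
A quick way to confirm the per-node imbalance is the dfsadmin report, which
prints capacity, DFS used, and DFS remaining for every datanode:

# Per-datanode capacity and usage summary
hdfs dfsadmin -report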

Thanks

On Wednesday, October 8, 2014, Bing Jiang jiangbinglo...@gmail.com wrote:

 Could you set reserved room for non-DFS usage, just to avoid the disk
 getting full? In hdfs-site.xml:

 <property>
   <name>dfs.datanode.du.reserved</name>
   <value></value>
   <description>Reserved space in bytes per volume. Always leave this much
   space free for non dfs use.
   </description>
 </property>

 2014-10-09 14:01 GMT+08:00 SF Hadoop sfhad...@gmail.com:

 I'm not sure if this is an HBase issue or an Hadoop issue so if this is
 off-topic please forgive.

 I am having a problem with Hadoop maxing out drive space on a select few
 nodes when I am running an HBase job.  The scenario is this:

 - The job is a data import using Map/Reduce / HBase
 - The data is being imported to one table
 - The table only has a couple of regions
 - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on
 the datanode / regionserver that is hosting  the regions
 - As the job progresses (and more data is imported) the two datanodes
 hosting the regions start to get full and eventually drive space hits 100%
 utilization whilst the other nodes in the cluster are at 40% or less drive
 space utilization
 - The job in Hadoop then begins to hang with multiple out of space
 errors and eventually fails.

 I have tried running hadoop balancer during the job run and this helped
 but only really succeeded in prolonging the eventual job failure.

 How can I get Hadoop / HBase to distribute the data to HDFS more evenly
 when it is favoring the nodes that the regions are on?

 Am I missing something here?

 Thanks for any help.




 --
 Bing Jiang




Re: Hadoop / HBase hotspotting / overloading specific nodes

2014-10-09 Thread SF Hadoop
Haven't tried this. I'll give it a shot.
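
For the archive, a rough sketch of what I understand the suggestion to be,
from the HBase shell (the table name below is made up):

# Split the table's regions, then let the HBase balancer spread them across regionservers
echo "split 'mytable'" | hbase shell
echo "balance_switch true" | hbase shell
echo "balancer" | hbase shell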

Thanks

On Thursday, October 9, 2014, Ted Yu yuzhih...@gmail.com wrote:

 Looks like the number of regions is lower than the number of nodes in the
 cluster.

 Can you split the table such that, after the hbase balancer is run, there is
 a region hosted by every node?

 Cheers

 On Oct 8, 2014, at 11:01 PM, SF Hadoop sfhad...@gmail.com wrote:

  I'm not sure if this is an HBase issue or an Hadoop issue so if this is
 off-topic please forgive.
 
  I am having a problem with Hadoop maxing out drive space on a select few
 nodes when I am running an HBase job.  The scenario is this:
 
  - The job is a data import using Map/Reduce / HBase
  - The data is being imported to one table
  - The table only has a couple of regions
  - As the job runs, HBase? / Hadoop? begins placing the data in HDFS on
 the datanode / regionserver that is hosting  the regions
  - As the job progresses (and more data is imported) the two datanodes
 hosting the regions start to get full and eventually drive space hits 100%
 utilization whilst the other nodes in the cluster are at 40% or less drive
 space utilization
  - The job in Hadoop then begins to hang with multiple out of space
 errors and eventually fails.
 
  I have tried running hadoop balancer during the job run and this helped
 but only really succeeded in prolonging the eventual job failure.
 
  How can I get Hadoop / HBase to distribute the data to HDFS more evenly
 when it is favoring the nodes that the regions are on?
 
  Am I missing something here?
 
  Thanks for any help.



Re: Hadoop configuration for cluster machines with different memory capacity / # of cores etc.

2014-10-09 Thread SF Hadoop
Yes, you are correct.  Just keep in mind that for every spec-X machine you
have to have a version-X set of hadoop configs (which resides only on spec-X
machines).  Version-Y configs reside only on version-Y machines, and so on.

But yes, it is possible.
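
One way to confirm that nodes registered with different capacities is to ask
the ResourceManager (the node ID below is made up):

# List the nodemanagers known to the RM, then inspect one node's registered memory/vcores
yarn node -list
yarn node -status datanode01:45454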

On Thu, Oct 9, 2014 at 9:40 AM, Manoj Samel manojsamelt...@gmail.com
wrote:

 So, in that case, the resource manager will allocate containers of
 different capacity based on node capacity ?

 Thanks,

 On Wed, Oct 8, 2014 at 9:42 PM, Nitin Pawar nitinpawar...@gmail.com
 wrote:

 you can have different values on different nodes

 On Thu, Oct 9, 2014 at 4:15 AM, Manoj Samel manojsamelt...@gmail.com
 wrote:

 In a hadoop cluster where different machines have different memory
 capacity and/or different # of cores etc., is it required that
 memory/core-related parameters be set to the SAME values for all nodes? Or is
 it possible to set different values for different nodes?

 E.g. can yarn.nodemanager.resource.memory-mb
 and yarn.nodemanager.resource.cpu-vcores have different values for
 different nodes ?

 Thanks,





 --
 Nitin Pawar





Re: Standby Namenode and Datanode coexistence

2014-10-09 Thread SF Hadoop
You can run any of the daemons on any machine you want; you just have to be
aware of the trade-offs you are making with RAM allocation.

I am hoping this is a DEV cluster.  This is definitely not a configuration
you would want to use in production.  If you are asking in regard to a
production cluster, the NNs should live apart from the datanodes, though it
is perfectly fine to run the journal node and zookeeper instances on the
NNs.  But again, you should NEVER have the NN and DN on the same machine
(unless you are in a DEV cluster and experimenting).
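
Once QJM HA is set up, you can sanity-check which namenode is active from the
command line (nn1/nn2 are whatever service IDs you define in hdfs-site.xml):

# Report the HA state (active / standby) of each configured namenode
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2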


On Thu, Oct 9, 2014 at 4:19 AM, oc tsdb oc.t...@gmail.com wrote:

 Hi,

 We have a cluster with 3 nodes (1 namenode + 2 datanodes).
 The cluster is running Hadoop version 2.4.0.

 We would like to add High Availability(HA) to Namenode using the Quorum
 Journal Manager.

 As per the link below, we need two NN machines with the same configuration.


 http://hadoop.apache.org/docs/r2.4.0/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html#Hardware_resources

 Our query is:

 As we have an existing cluster with 3 nodes (1 namenode + 2 datanodes), can
 we configure the standby namenode on one of the datanodes? Will there be any
 issues if we run the standby namenode and a datanode together?
 Or should we add one more machine and configure it as the standby namenode?

 Regarding the journal node, can we run it on any machine (datanode or
 namenode)?

 Thanks in advance.

 Thanks
 oc.tsdb






Re: MapReduce jobs start only on the PC they are typed on

2014-10-09 Thread SF Hadoop
What is in /etc/hadoop/conf/slaves?

Something tells me it just says 'localhost'.  You need to specify your
slaves in that file.
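
As a sketch, the slaves file is just one worker hostname per line (the
hostnames below are made up):

$ cat /etc/hadoop/conf/slaves
desktop1
desktop2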

On Thu, Oct 9, 2014 at 2:24 PM, Piotr Kubaj pku...@riseup.net wrote:

 Hi. I'm trying to run Hadoop on a 2-PC cluster (I need to do some
 benchmarks for my bachelor thesis) and it works, but jobs start only on
 the PC I typed the command on (it doesn't matter which one has better specs
 or where the data physically is, since I am computing Pi). My mapred-site.xml
 is:

 <configuration>
 <property>
   <name>mapred.job.tracker</name>
   <value>10.0.0.1:54311</value>
   <description>The host and port that the MapReduce job tracker runs
   at. If local, then jobs are run in-process as a single map
   and reduce task.
   </description>
 </property>
 <property>
   <name>mapred.framework.name</name>
   <value>yarn</value>
 </property>
 <property>
   <name>mapred.map.tasks</name>
   <value>20</value>
 </property>
 <property>
   <name>mapred.reduce.tasks</name>
   <value>20</value>
 </property>
 <property>
   <name>mapreduce.tasktracker.map.tasks.maximum</name>
   <value>20</value>
 </property>
 <property>
   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
   <value>20</value>
 </property>
 <property>
   <name>mapreduce.tasktracker.map.tasks.maximum</name>
   <value>30</value>
   <final>true</final>
 </property>
 <property>
   <name>mapreduce.tasktracker.reduce.tasks.maximum</name>
   <value>30</value>
 </property>
 <property>
   <name>mapreduce.job.maps</name>
   <value>3500</value>
 </property>
 <property>
   <name>mapreduce.job.reduces</name>
   <value>3500</value>
 </property>
 <property>
   <name>mapred.child.java.opts</name>
   <value>-Xmx2048m</value>
 </property>
 <property>
   <name>mapreduce.reduce.shuffle.parallelcopies</name>
   <value>10</value>
 </property>
 <property>
   <name>mapreduce.jobhistory.address</name>
   <value>DESKTOP1:10020</value>
 </property>
 <property>
   <name>mapreduce.jobhistory.webapp.address</name>
   <value>DESKTOP1:19888</value>
 </property>
 </configuration>

 And yarn-site.xml:
 <configuration>

 <property>
   <name>yarn.nodemanager.local-dirs</name>
   <value>/var/cache/hadoop-hdfs/hdfs</value>
   <description>Comma separated list of paths. Use the list of directories
   from $YARN_LOCAL_DIR.
   For example,
   /grid/hadoop/hdfs/yarn,/grid1/hadoop/hdfs/yarn.</description>
 </property>

 <property>
   <name>yarn.nodemanager.log-dirs</name>
   <value>/var/log/hadoop/yarn</value>
   <description>Use the list of directories from $YARN_LOG_DIR.
   For example, /var/log/hadoop/yarn.</description>
 </property>

 <property>
   <name>yarn.resourcemanager.hostname</name>
   <value>10.0.0.1</value>
 </property>

 <property>
   <name>yarn.resourcemanager.address</name>
   <value>${yarn.resourcemanager.hostname}:8032</value>
 </property>

 <property>
   <name>yarn.resourcemanager.scheduler.address</name>
   <value>${yarn.resourcemanager.hostname}:8030</value>
 </property>

 <property>
   <name>yarn.resourcemanager.resource-tracker.address</name>
   <value>${yarn.resourcemanager.hostname}:8031</value>
 </property>

 <property>
   <name>yarn.resourcemanager.admin.address</name>
   <value>${yarn.resourcemanager.hostname}:8033</value>
 </property>

 <property>
   <description>The address of the RM web application.</description>
   <name>yarn.resourcemanager.webapp.address</name>
   <value>${yarn.resourcemanager.hostname}:8088</value>
 </property>

 <property>
   <name>yarn.scheduler.maximum-allocation-mb</name>
   <value>131072</value>
 </property>

 <property>
   <name>yarn.nodemanager.resource.memory-mb</name>
   <value>131072</value>
 </property>

 <property>
   <description>Number of CPU cores that can be allocated
   for containers.</description>
   <name>yarn.nodemanager.resource.cpu-vcores</name>
   <value>8</value>
 </property>

 <property>
   <name>yarn.resourcemanager.am.max-attempts</name>
   <value>3</value>
 </property>
 <property>
   <name>yarn.log-aggregation-enable</name>
   <value>true</value>
 </property>
 <property>
   <name>yarn.log-aggregation.retain-seconds</name>
   <value>604800</value>
 </property>

 </configuration>






Block placement without rack aware

2014-10-02 Thread SF Hadoop
What block placement policy does Hadoop follow when rack awareness is not
enabled?

Does it just round robin?

Thanks.


Re: Block placement without rack aware

2014-10-02 Thread SF Hadoop
Thanks for the info.  Exactly what I needed.
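
In case it helps anyone else: you can see where the blocks of a file actually
landed with fsck (the path below is made up):

# Print every block of the file and the datanodes holding its replicas
hdfs fsck /data/myfile -files -blocks -locations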

Cheers.

On Thu, Oct 2, 2014 at 4:21 PM, Pradeep Gollakota pradeep...@gmail.com
wrote:

 It appears to be randomly chosen. I just came across this blog post from
 Lars George about HBase file locality in HDFS
 http://www.larsgeorge.com/2010/05/hbase-file-locality-in-hdfs.html

 On Thu, Oct 2, 2014 at 4:12 PM, SF Hadoop sfhad...@gmail.com wrote:

 What is the block placement policy hadoop follows when rack aware is not
 enabled?

 Does it just round robin?

 Thanks.





Re: Data node with multiple disks

2014-05-17 Thread SF Hadoop
Just set your replication factor to 1 and you will be fine.
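
Two rough ways to do that from the command line (the paths and file names
below are made up):

# Write new data with a single replica
hdfs dfs -D dfs.replication=1 -put localfile /data/
# Or drop existing data down to one replica
hdfs dfs -setrep -w 1 /data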


On Tue, May 13, 2014 at 8:12 AM, Marcos Sousa falecom...@marcossousa.com wrote:

 Yes,

 I don't want to replicate, just use them as one disk. Isn't it possible to
 make this work?

 Best regards,

 Marcos


 On Tue, May 13, 2014 at 6:55 AM, Rahul Chaudhari 
 rahulchaudhari0...@gmail.com wrote:

 Marcos,
 While configuring hadoop, the dfs.datanode.data.dir property in
 hdfs-default.xml should have this list of disks specified on separate lines.
 If you specify a comma separated list, it will replicate on all those
 disks/partitions.

 _Rahul
 Sent from my iPad

  On 13-May-2014, at 12:22 am, Marcos Sousa falecom...@marcossousa.com
 wrote:
 
  Hi,
 
  I have 20 servers with 10 HD with 400GB SATA. I'd like to use them to
 be my datanode:
 
  /vol1/hadoop/data
  /vol2/hadoop/data
  /vol3/hadoop/data
  /volN/hadoop/data
 
  How do user those distinct discs not to replicate?
 
  Best regards,
 
  --
  Marcos Sousa




 --
 Marcos Sousa
 www.marcossousa.com Enjoy it!



Re: Data node with multiple disks

2014-05-13 Thread SF Hadoop
Your question is unclear. Please restate and describe what you are
attempting to do.

Thanks.


On Monday, May 12, 2014, Marcos Sousa falecom...@marcossousa.com wrote:

 Hi,

 I have 20 servers with 10 HD with 400GB SATA. I'd like to use them to be
 my datanode:

 /vol1/hadoop/data
 /vol2/hadoop/data
 /vol3/hadoop/data
 /volN/hadoop/data

 How do user those distinct discs not to replicate?

 Best regards,

 --
 Marcos Sousa



Re: No information in Job History UI

2014-03-04 Thread SF Hadoop
That explains a lot.  Thanks for the information.  I appreciate your help.
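
For anyone else who lands on this thread: with log aggregation enabled, the
command from option 2 looks roughly like this (the application ID below is
made up):

# Fetch the aggregated container logs for a finished application
yarn logs -applicationId application_1393891234567_0001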


On Mon, Mar 3, 2014 at 7:47 PM, Jian He j...@hortonworks.com wrote:

  You said, "there are no job logs generated on the server that is
 running the job."
 That was quoting your previous sentence to answer your question.

  If I were to run a job and I wanted to tail the job log as it was
 running, where would I find that log?
 1) Set yarn.nodemanager.delete.debug-delay-sec to a larger value, and
 look for logs in the local dirs specified by yarn.nodemanager.log-dirs.
 Or
 2) Enable log aggregation (yarn.log-aggregation-enable). Log aggregation
 aggregates those NM local logs and uploads them to HDFS once the application
 is finished. Then you can use the yarn logs command or simply go to the
 history UI to see the logs.
 You can find good explanation from
 http://hortonworks.com/blog/simplifying-user-logs-management-and-access-in-yarn/

 Thanks.


 On Mon, Mar 3, 2014 at 4:29 PM, SF Hadoop sfhad...@gmail.com wrote:

 Thanks for that info Jian.

 You said, there are no job logs generated on the server that is running
 the job..  So am I correct in assuming the logs will be in the dir
 specified by yarn.nodemanager.log-dirs on the datanodes?

 I am quite confused as to where the logs for each specific part of the
 ecosystem reside.

 If I were to run a job and I wanted to tail the job log as it was
 running, where would I find that log?

 Thanks for your help.


  On Mon, Mar 3, 2014 at 11:46 AM, Jian He j...@hortonworks.com wrote:

  Note that node manager will not keep the finished applications and
 only show running apps,  so the UI won't show the finished apps.
  Conversely, job history server UI will only show the finished apps but
 not the running apps.

 bq. there are no job logs generated on the server that is running the
 job.
 by default, the local logs will be deleted after job finished.  you can
 config yarn.nodemanager.delete.debug-delay-sec, to delay the deletion
 of the logs.

 Jian


 On Mon, Mar 3, 2014 at 10:45 AM, SF Hadoop sfhad...@gmail.com wrote:

 Hadoop 2.2.0
 CentOS 6.4
 Viewing UI in various browsers.

 I am having a problem where no information is visible in my Job History
 UI.  I run test jobs, they complete without error, but no information ever
 populates the nodemanager or jobhistory server UI.

 Also, there are no job logs generated on the server that is running the
 job.

 I have the following settings configured:
 yarn.nodemanager.local-dirs
 yarn.nodemanager.log-dirs
 yarn.log.server.url

 ...plus the basic yarn log dir.  I get output in regards to the daemons
 but very little in regards to the job.  All I get that refers to the
 jobhistory server is the following (so it appears to be functioning
 properly):

 2014-02-18 11:43:06,824 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound to port 19888
 2014-02-18 11:43:06,824 INFO org.mortbay.log: jetty-6.1.26
 2014-02-18 11:43:06,847 INFO org.mortbay.log: Extract
 jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.1.0.2.0.5.0-67.jar!/webapps/jobhistory
 to /tmp/Jetty_server_19888_jobhistoryv7gnnv/webapp
 2014-02-18 11:43:07,085 INFO org.mortbay.log: Started
 SelectChannelConnector@server:19888
 2014-02-18 11:43:07,085 INFO org.apache.hadoop.yarn.webapp.WebApps: Web
 app /jobhistory started at 19888
 2014-02-18 11:43:07,477 INFO org.apache.hadoop.yarn.webapp.WebApps:
 Registered webapp guice modules

 I have a feeling this is a misconfiguration but I cannot figure out
 what setting is missing or wrong.

 Other than not being able to see any of the jobs in the UIs, everything
 appears to be working correctly so this is quite confusing.

 Any help is appreciated.






No information in Job History UI

2014-03-03 Thread SF Hadoop
Hadoop 2.2.0
CentOS 6.4
Viewing UI in various browsers.

I am having a problem where no information is visible in my Job History UI.
 I run test jobs, they complete without error, but no information ever
populates the nodemanager or jobhistory server UI.

Also, there are no job logs generated on the server that is running the job.

I have the following settings configured:
yarn.nodemanager.local-dirs
yarn.nodemanager.log-dirs
yarn.log.server.url

...plus the basic yarn log dir.  I get output regarding the daemons but
very little regarding the job.  All I get that refers to the jobhistory
server is the following (so it appears to be functioning properly):

2014-02-18 11:43:06,824 INFO org.apache.hadoop.http.HttpServer: Jetty bound
to port 19888
2014-02-18 11:43:06,824 INFO org.mortbay.log: jetty-6.1.26
2014-02-18 11:43:06,847 INFO org.mortbay.log: Extract
jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.1.0.2.0.5.0-67.jar!/webapps/jobhistory
to /tmp/Jetty_server_19888_jobhistoryv7gnnv/webapp
2014-02-18 11:43:07,085 INFO org.mortbay.log: Started
SelectChannelConnector@server:19888
2014-02-18 11:43:07,085 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app
/jobhistory started at 19888
2014-02-18 11:43:07,477 INFO org.apache.hadoop.yarn.webapp.WebApps:
Registered webapp guice modules

I have a feeling this is a misconfiguration but I cannot figure out what
setting is missing or wrong.

Other than not being able to see any of the jobs in the UIs, everything
appears to be working correctly so this is quite confusing.

Any help is appreciated.


Re: No information in Job History UI

2014-03-03 Thread SF Hadoop
Thanks for that info Jian.

You said, "there are no job logs generated on the server that is running
the job."  So am I correct in assuming the logs will be in the dir
specified by yarn.nodemanager.log-dirs on the datanodes?

I am quite confused as to where the logs for each specific part of the
ecosystem reside.

If I were to run a job and I wanted to tail the job log as it was running,
where would I find that log?

Thanks for your help.


On Mon, Mar 3, 2014 at 11:46 AM, Jian He j...@hortonworks.com wrote:

 Note that the node manager will not keep finished applications and will only
 show running apps, so its UI won't show the finished apps.
 Conversely, the job history server UI will only show the finished apps but
 not the running apps.

 bq. there are no job logs generated on the server that is running the job.
 By default, the local logs are deleted after the job finishes.  You can
 configure yarn.nodemanager.delete.debug-delay-sec to delay the deletion of
 the logs.

 Jian


 On Mon, Mar 3, 2014 at 10:45 AM, SF Hadoop sfhad...@gmail.com wrote:

 Hadoop 2.2.0
 CentOS 6.4
 Viewing UI in various browsers.

 I am having a problem where no information is visible in my Job History
 UI.  I run test jobs, they complete without error, but no information ever
 populates the nodemanager or jobhistory server UI.

 Also, there are no job logs generated on the server that is running the
 job.

 I have the following settings configured:
 yarn.nodemanager.local-dirs
 yarn.nodemanager.log-dirs
 yarn.log.server.url

 ...plus the basic yarn log dir.  I get output in regards to the daemons
 but very little in regards to the job.  All I get that refers to the
 jobhistory server is the following (so it appears to be functioning
 properly):

 2014-02-18 11:43:06,824 INFO org.apache.hadoop.http.HttpServer: Jetty
 bound to port 19888
 2014-02-18 11:43:06,824 INFO org.mortbay.log: jetty-6.1.26
 2014-02-18 11:43:06,847 INFO org.mortbay.log: Extract
 jar:file:/usr/lib/hadoop-yarn/hadoop-yarn-common-2.1.0.2.0.5.0-67.jar!/webapps/jobhistory
 to /tmp/Jetty_server_19888_jobhistoryv7gnnv/webapp
 2014-02-18 11:43:07,085 INFO org.mortbay.log: Started
 SelectChannelConnector@server:19888
 2014-02-18 11:43:07,085 INFO org.apache.hadoop.yarn.webapp.WebApps: Web
 app /jobhistory started at 19888
 2014-02-18 11:43:07,477 INFO org.apache.hadoop.yarn.webapp.WebApps:
 Registered webapp guice modules

 I have a feeling this is a misconfiguration but I cannot figure out what
 setting is missing or wrong.

 Other than not being able to see any of the jobs in the UIs, everything
 appears to be working correctly so this is quite confusing.

 Any help is appreciated.





Java version with Hadoop 2.0

2013-10-09 Thread SF Hadoop
I am preparing to deploy multiple clusters / distros of Hadoop for testing /
benchmarking.

In my research I have noticed discrepancies in the version of the JDK that
various groups are using.  For example: Hortonworks suggests JDK 6u31, CDH
recommends either 6 or 7 provided you stick to some guidelines for each, and
Apache Hadoop seems to be somewhat of a no-man's-land, with a lot of people
using a lot of different versions.

Does anyone have any insight they could share about how to approach
choosing the best JDK release?  (I'm a total Java newb, so any info /
further reading you guys can provide is appreciated.)

Thanks.

sf


Re: Java version with Hadoop 2.0

2013-10-09 Thread SF Hadoop
I hadn't.  Thank you!!!  Very helpful.

Andy


On Wed, Oct 9, 2013 at 2:25 PM, Patai Sangbutsarakum 
patai.sangbutsara...@turn.com wrote:

  maybe you've already seen this.

  http://wiki.apache.org/hadoop/HadoopJavaVersions


  On Oct 9, 2013, at 2:16 PM, SF Hadoop sfhad...@gmail.com
  wrote:

  I am preparing to deploy multiple cluster / distros of Hadoop for
 testing / benchmarking.

  In my research I have noticed discrepancies in the version of the JDK
 that various groups are using.  Example:  Hortonworks is suggesting
 JDK6u31, CDH recommends either 6 or 7 providing you stick to some
 guidelines for each and Apache Hadoop seems to be somewhat of a no mans
 land; a lot of people using a lot of different versions.

  Does anyone have any insight they could share about how to approach
 choosing the best JDK release?  (I'm a total Java newb, so any info /
 further reading you guys can provide is appreciated.)

  Thanks.

  sf