Re: How to handle sensitive data

2013-02-15 Thread Marcos Ortiz Valmaseda
Regards, abhishek.
I agree with Michael. You can encrypt your incoming data from your 
application.
I recommend using HBase too.
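As a rough illustration of that approach, here is a minimal sketch of encrypting a 
file on the client side before it lands in HDFS (the key file and paths below are 
made up, and OpenSSL is just one option; field-level encryption would instead be 
done inside your application code before the write):

    openssl enc -aes-256-cbc -salt -pass file:/path/to/keyfile \
        -in records.csv -out records.csv.enc
    hadoop fs -put records.csv.enc /data/incoming/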

- Original Message -

From: Michael Segel michael_se...@hotmail.com
To: common-user@hadoop.apache.org
CC: cdh-u...@cloudera.org
Sent: Friday, February 15, 2013 8:47:16
Subject: Re: How to handle sensitive data

Simple, have your app encrypt the field prior to writing to HDFS.

Also consider HBase.

On Feb 14, 2013, at 10:35 AM, abhishek abhishek.dod...@gmail.com wrote:


 Hi all,

 We have some sensitive data in particular fields (columns). Can I 
 ask how to handle sensitive data in Hadoop?

 How do different people handle sensitive data in Hadoop?

 Thanks
 Abhi


Michael Segel | (m) 312.755.9623

Segel and Associates





--

Marcos Ortiz Valmaseda,
Product Manager & Data Scientist at UCI
Blog : http://marcosluis2186.posterous.com
LinkedIn: http://www.linkedin.com/in/marcosluis2186
Twitter : @marcosluis2186


Re: Which hardware to choose

2012-10-02 Thread Marcos Ortiz

What would be a reasonable number for this hardware?

On 10/02/2012 09:40 PM, Michael Segel wrote:

I think he's saying that it's 24 maps and 8 reducers per node, and at 48GB that could 
be too many mappers.
Especially if they want to run HBase.
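As a rough back-of-envelope check of that point (assuming roughly 1 GB of child JVM 
heap per task slot; these figures are illustrative, not from the thread):

    24 map slots    * 1 GB ~ 24 GB
     8 reduce slots * 1 GB ~  8 GB
    DataNode + TaskTracker daemons ~ 2 GB
    HBase RegionServer heap        ~ 8 GB (or more)
    -----------------------------------------------
    ~ 42 GB of the 48 GB, before the OS and page cache get anything.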

On Oct 2, 2012, at 8:14 PM, hadoopman hadoop...@gmail.com wrote:


Only 24 map and 8 reduce tasks for 38 data nodes?  Are you sure that's right?  
Sounds VERY low for a cluster that size.

We have only 10 c2100's and are running I believe 140 map and 70 reduce slots 
so far with pretty decent performance.



On 10/02/2012 12:55 PM, Alexander Pivovarov wrote:

38 data nodes + 2 Name Nodes

  
Data Node:
Dell PowerEdge C2100 series
2 x XEON x5670
48 GB RAM ECC  (12x4GB 1333MHz)
12 x 2 TB  7200 RPM SATA HDD (with hot swap)  JBOD
Intel Gigabit ET Dual port PCIe x4
Redundant Power Supply
Hadoop CDH3
max map tasks 24
max reduce tasks 8






--

Marcos Luis Ortíz Valmaseda
*Data Engineer & Sr. System Administrator at UCI*
about.me/marcosortiz http://about.me/marcosortiz
My Blog http://marcosluis2186.posterous.com
Tumblr's blog http://marcosortiz.tumblr.com/
@marcosluis2186 http://twitter.com/marcosluis2186




Re: Hadoop 1.0.3 setup

2012-07-09 Thread Marcos Ortiz


On 07/09/2012 09:58 AM, prabhu K wrote:

Yes, I have configured a multinode setup, 1 master and 2 slaves.

I have formatted the namenode and then I ran the start-dfs.sh script and
start-mapred.sh script.

I ran the bin/hadoop fs -put input input command and got the following error
on my terminal.

hduser@md-trngpoc1:/usr/local/hadoop_dir/hadoop$ bin/hadoop fs -put input
input
Warning: $HADOOP_HOME is deprecated.
put: org.apache.hadoop.security.AccessControlException: Permission denied:
user=hduser, access=WRITE, inode=:root:supergroup:rwxr-xr-x
and executed the below command, getting the /hadoop-install/hadoop directory; I
couldn't understand what I am doing wrong.
Well, this error says that you have the wrong permissions on the 
Hadoop directory:
the owner and group that you have are root:supergroup, and the correct 
values for it are:

 hduser:supergroup
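A minimal sketch of one way to fix it (the path is an example; run the commands as 
the HDFS superuser, i.e. the user that formatted and started the NameNode, which 
appears to be root here):

    bin/hadoop fs -mkdir /user/hduser
    bin/hadoop fs -chown hduser:supergroup /user/hduser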


hduser@md-trngpoc1:/usr/local/hadoop_dir/hadoop$ echo $HADOOP_HOME
/hadoop-install/hadoop

*Namenode log:*
==

java.lang.InterruptedException: sleep interrupted
 at java.lang.Thread.sleep(Native Method)
 at
org.apache.hadoop.hdfs.server.namenode.DecommissionManager$Monitor.run(DecommissionManager.java:65)
 at java.lang.Thread.run(Thread.java:662)
2012-07-09 19:02:12,696 ERROR
org.apache.hadoop.hdfs.server.namenode.NameNode: java.net.BindException:
Problem binding to md-trngpoc1/10.5.114.110:54310 : Address already in use

It seems that that address:port is already in use.
Use these commands:
netstat -puta | grep namenode
netstat -puta | grep datanode

to check which ports the NN and DN are using.
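If those greps return nothing (the process column from netstat usually shows java 
rather than the daemon name), you can also check the port from the log directly; a 
sketch, assuming the default NameNode port 54310 seen above:

    sudo netstat -nltp | grep 54310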

 at org.apache.hadoop.ipc.Server.bind(Server.java:227)
 at org.apache.hadoop.ipc.Server$Listener.init(Server.java:301)
 at org.apache.hadoop.ipc.Server.init(Server.java:1483)
 at org.apache.hadoop.ipc.RPC$Server.init(RPC.java:545)
 at org.apache.hadoop.ipc.RPC.getServer(RPC.java:506)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:294)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.init(NameNode.java:496)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1279)
 at
org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1288)
Caused by: java.net.BindException: Address already in use
 at sun.nio.ch.Net.bind(Native Method)
 at
sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:126)
 at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:59)
 at org.apache.hadoop.ipc.Server.bind(Server.java:225)
 ... 8 more
*Datanode log*
=
2012-07-09 18:44:39,949 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = md-trngpoc3/10.5.114.168
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1335192; compiled by 'hortonfo' on Tue May  8 20:31:25 UTC 2012
/
2012-07-09 18:44:40,039 INFO org.apache.hadoop.metrics2.impl.MetricsConfig:
loaded properties from hadoop-metrics2.properties
2012-07-09 18:44:40,047 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source
MetricsSystem,sub=Stats registered.
2012-07-09 18:44:40,048 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot
period at 10 second(s).
2012-07-09 18:44:40,048 INFO
org.apache.hadoop.metrics2.impl.MetricsSystemImpl: DataNode metrics system
started
2012-07-09 18:44:40,125 INFO
org.apache.hadoop.metrics2.impl.MetricsSourceAdapter: MBean for source ugi
registered.
2012-07-09 18:44:40,163 WARN
org.apache.hadoop.hdfs.server.datanode.DataNode: Invalid directory in
dfs.data.dir: can not create directory: /app/hadoop_dir/hadoop/tmp/dfs/data
2012-07-09 18:44:40,163 ERROR
org.apache.hadoop.hdfs.server.datanode.DataNode: All directories in
dfs.data.dir are invalid.
2012-07-09 18:44:40,163 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: Exiting Datanode
2012-07-09 18:44:40,164 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
/
SHUTDOWN_MSG: Shutting down DataNode at md-trngpoc3/10.5.114.168
/
2012-07-09 18:46:09,586 INFO
org.apache.hadoop.hdfs.server.datanode.DataNode: STARTUP_MSG:
/
STARTUP_MSG: Starting DataNode
STARTUP_MSG:   host = md-trngpoc3/10.5.114.168
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 1.0.3
STARTUP_MSG:   build =
https://svn.apache.org/repos/asf/hadoop/common/branches/branch-1.0 -r
1335192; compiled by 'hortonfo' on Tue May  8 20:31:25 UTC 2012

Re: Versions

2012-07-07 Thread Marcos Ortiz


On 07/07/2012 02:39 PM, Harsh J wrote:

The Apache Bigtop project was started for this very purpose (building
stable, well inter-operating version stacks). Take a read at
http://incubator.apache.org/bigtop/ and for 1.x Bigtop packages, see
https://cwiki.apache.org/confluence/display/BIGTOP/How+to+install+Hadoop+distribution+from+Bigtop

To specifically answer your question though, your list appears fine to
me. They 'should work', but I am not suggesting that I have tested
this stack completely myself.

On Sat, Jul 7, 2012 at 11:57 PM, prabhu K prabhu.had...@gmail.com wrote:

Hi users list,

I am planing to install following tools.

Hadoop 1.0.3
hive 0.9.0
flume 1.2.0
Hbase 0.92.1
sqoop 1.4.1
My only suggestion here is that you use the 0.94 version of HBase; it 
has a lot of improvements over 0.92.1.

See Cloudera's blog post about it:
http://www.cloudera.com/blog/2012/05/apache-hbase-0-94-is-now-released/

Best wishes



My questions are:

1. Are the above tools compatible with each other in these versions?

2. Does any tool need a different version?

3. Can you list all the tools with compatible versions?

Please suggest.





--

Marcos Luis Ortíz Valmaseda
*Data Engineer & Sr. System Administrator at UCI*




Re: set up Hadoop cluster on mixed OS

2012-07-06 Thread Marcos Ortiz
I have a mixed cluster too, with Linux (CentOS) and Solaris. The only 
recommendation that I can give you

is to use exactly the same Hadoop version on all machines.
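A quick way to verify that, assuming passwordless SSH to every node (the hostnames 
below are placeholders):

    for h in node1 node2 node3; do
        ssh "$h" 'hadoop version | head -1'
    done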

Best wishes
On 07/06/2012 05:31 AM, Senthil Kumar wrote:

You can setup hadoop cluster on mixed environment. We have a cluster with
Mac, Linux and Solaris.

Regards
Senthil

On Fri, Jul 6, 2012 at 1:50 PM, Yongwei Xing jdxyw2...@gmail.com wrote:


I have one MBP with 10.7.4 and one laptop with Ubuntu 12.04. Is it possible
to set up a hadoop cluster by such mixed environment?

Best Regards,

--
Welcome to my ET Blog http://www.jdxyw.com





--

Marcos Luis Ortíz Valmaseda
*Data Engineer & Sr. System Administrator at UCI*
about.me/marcosortiz http://about.me/marcosortiz
My Blog http://marcosluis2186.posterous.com
@marcosluis2186 http://twitter.com/marcosluis2186






Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

2012-06-13 Thread Marcos Ortiz
According to the CDH 4 official documentation, you should install a 
JobHistory server for your MRv2 (YARN)

cluster.
https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster

How to configure the HistoryServer
https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step3
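Something else worth checking (not mentioned in this reply, but a common reason a job 
falls back to the LocalJobRunner seen below) is that mapred-site.xml tells MRv2 to use 
YARN; a sketch, assuming the config lives in /etc/hadoop/conf:

    grep -A 1 'mapreduce.framework.name' /etc/hadoop/conf/mapred-site.xml
    # it should contain:
    #   <name>mapreduce.framework.name</name>
    #   <value>yarn</value>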



On 06/13/2012 03:16 PM, anil gupta wrote:

Hi All

I am using cdh4 for running an HBase cluster on CentOS 6.0. I have 5
nodes in my cluster (2 Admin Nodes and 3 DN).
My resourcemanager is up and running and showing that all three DN are
running the nodemanager. HDFS is also working fine and showing 3 DN's.

But when I fire the pi example job, it starts to run in Local mode.
Here is the console output:
sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-
examples.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/06/13 12:03:27 WARN conf.Configuration: session.id is deprecated.
Instead, use dfs.metrics.session-id
12/06/13 12:03:27 INFO jvm.JvmMetrics: Initializing JVM Metrics with
processName=JobTracker, sessionId=
12/06/13 12:03:27 INFO util.NativeCodeLoader: Loaded the native-hadoop
library
12/06/13 12:03:27 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
12/06/13 12:03:28 INFO mapred.FileInputFormat: Total input paths to
process : 10
12/06/13 12:03:29 INFO mapred.JobClient: Running job: job_local_0001
12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter set in
config null
12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
12/06/13 12:03:29 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
12/06/13 12:03:29 INFO util.ProcessTree: setsid exited with exit code
0
12/06/13 12:03:29 INFO mapred.Task:  Using ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3d46e381
12/06/13 12:03:29 WARN mapreduce.Counters: Counter name
MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as group
name and  BYTES_READ as counter name instead
12/06/13 12:03:29 INFO mapred.MapTask: numReduceTasks: 1
12/06/13 12:03:29 INFO mapred.MapTask: io.sort.mb = 100
12/06/13 12:03:30 INFO mapred.MapTask: data buffer = 79691776/99614720
12/06/13 12:03:30 INFO mapred.MapTask: record buffer = 262144/327680
12/06/13 12:03:30 INFO mapred.JobClient:  map 0% reduce 0%
12/06/13 12:03:35 INFO mapred.LocalJobRunner: Generated 95735000
samples.
12/06/13 12:03:36 INFO mapred.JobClient:  map 100% reduce 0%
12/06/13 12:03:38 INFO mapred.LocalJobRunner: Generated 151872000
samples.

Here is the content of yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/disk/yarn/local</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/disk/yarn/logs</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>
    <name>yarn.nodemanager.remote-app-log-dir</name>
    <value>/var/log/hadoop-yarn/apps</value>
  </property>

  <property>
    <description>Classpath for typical applications.</description>
    <name>yarn.application.classpath</name>
    <value>
      $HADOOP_CONF_DIR,
      $HADOOP_COMMON_HOME/*,$HADOOP_COMMON_HOME/lib/*,
      $HADOOP_HDFS_HOME/*,$HADOOP_HDFS_HOME/lib/*,
      $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,
      $YARN_HOME/*,$YARN_HOME/lib/*
    </value>
  </property>

  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>ihub-an-g1:8025</value>
  </property>

  <property>
    <name>yarn.resourcemanager.address</name>
    <value>ihub-an-g1:8040</value>
  </property>

  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>ihub-an-g1:8030</value>
  </property>

  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>ihub-an-g1:8141</value>
  </property>

  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>ihub-an-g1:8088</value>
  </property>

  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/disk/mapred/jobhistory/intermediate/done</value>
  </property>

  <property>

Re: Yarn job runs in Local Mode even though the cluster is running in Distributed Mode

2012-06-13 Thread Marcos Ortiz
Can you share with us in a pastebin all the conf files that you are using for 
YARN?



On 06/13/2012 05:26 PM, anil gupta wrote:

Hi Marcus,

Sorry, I forgot to mention that the Job History server is installed and 
running, and AFAIK the resourcemanager is responsible for running MR jobs. 
The historyserver is only used to get info about MR jobs.


Thanks,
Anil

On Wed, Jun 13, 2012 at 2:04 PM, Marcos Ortiz mlor...@uci.cu wrote:


According to the CDH 4 official documentation, you should install
a JobHistory server for your MRv2 (YARN)
cluster.

https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster

How to configure the HistoryServer

https://ccp.cloudera.com/display/CDH4DOC/Deploying+MapReduce+v2+%28YARN%29+on+a+Cluster#DeployingMapReducev2%28YARN%29onaCluster-Step3





On 06/13/2012 03:16 PM, anil gupta wrote:

Hi All

I am using cdh4 for running a HBase cluster on CentOs6.0. I have 5
nodes in my cluster(2 Admin Node and 3 DN).
My resourcemanager is up and running and showing that all
three DN are
running the nodemanager. HDFS is also working fine and showing
3 DN's.

But when i fire the pi example job. It starts to run in Local
mode.
Here is the console output:
sudo -u hdfs yarn jar /usr/lib/hadoop-mapreduce/hadoop-mapreduce-
examples.jar pi 10 10
Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
12/06/13 12:03:27 WARN conf.Configuration: session.id is deprecated.
Instead, use dfs.metrics.session-id
12/06/13 12:03:27 INFO jvm.JvmMetrics: Initializing JVM
Metrics with
processName=JobTracker, sessionId=
12/06/13 12:03:27 INFO util.NativeCodeLoader: Loaded the
native-hadoop
library
12/06/13 12:03:27 WARN mapred.JobClient: Use
GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the
same.
12/06/13 12:03:28 INFO mapred.FileInputFormat: Total input
paths to
process : 10
12/06/13 12:03:29 INFO mapred.JobClient: Running job:
job_local_0001
12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter
set in
config null
12/06/13 12:03:29 INFO mapred.LocalJobRunner: OutputCommitter is
org.apache.hadoop.mapred.FileOutputCommitter
12/06/13 12:03:29 WARN mapreduce.Counters: Group
org.apache.hadoop.mapred.Task$Counter is deprecated. Use
org.apache.hadoop.mapreduce.TaskCounter instead
12/06/13 12:03:29 INFO util.ProcessTree: setsid exited with
exit code
0
12/06/13 12:03:29 INFO mapred.Task:  Using
ResourceCalculatorPlugin :
org.apache.hadoop.util.LinuxResourceCalculatorPlugin@3d46e381
12/06/13 12:03:29 WARN mapreduce.Counters: Counter name
MAP_INPUT_BYTES is deprecated. Use FileInputFormatCounters as
group
name and  BYTES_READ as counter name instead
12/06/13 12:03:29 INFO mapred.MapTask: numReduceTasks: 1
12/06/13 12:03:29 INFO mapred.MapTask: io.sort.mb = 100
12/06/13 12:03:30 INFO mapred.MapTask: data buffer =
79691776/99614720
12/06/13 12:03:30 INFO mapred.MapTask: record buffer =
262144/327680
12/06/13 12:03:30 INFO mapred.JobClient:  map 0% reduce 0%
12/06/13 12:03:35 INFO mapred.LocalJobRunner: Generated 95735000
samples.
12/06/13 12:03:36 INFO mapred.JobClient:  map 100% reduce 0%
12/06/13 12:03:38 INFO mapred.LocalJobRunner: Generated 151872000
samples.

Here is the content of yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>

  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>

  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>

  <property>
    <description>List of directories to store localized files in.</description>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/disk/yarn/local</value>
  </property>

  <property>
    <description>Where to store container logs.</description>
    <name>yarn.nodemanager.log-dirs</name>
    <value>/disk/yarn/logs</value>
  </property>

  <property>
    <description>Where to aggregate logs to.</description>

Re: No space left on device

2012-05-28 Thread Marcos Ortiz

Do you have the JT and NN on the same node?
Look here at Lars Francke's post:
http://gbif.blogspot.com/2011/01/setting-up-hadoop-cluster-part-1-manual.html
It is a very good scheme for how to install Hadoop; look at the configuration 
that he used for the name and data directories.
If these directories are on the same disk and you don't have enough 
space for them, you can hit that exception.


My recommendation is to split these directories across separate disks, with a 
layout very similar to Lars's configuration.
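A quick sketch of how to see which local disk is filling up on that node (the log 
path below is just an example; logs and mapred.local.dir are the usual suspects):

    df -h                       # which filesystem is at 100%?
    du -sh /var/log/hadoop/*    # example path; check mapred.local.dir too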

Another recommendation is to check Hadoop's logs. Read about this here:
http://www.cloudera.com/blog/2010/11/hadoop-log-location-and-retention/

Regards

On 05/28/2012 02:20 AM, yingnan.ma wrote:

OK, I found it. The jobtracker server's disk is full.


2012-05-28



yingnan.ma



From: yingnan.ma
Sent: 2012-05-28  13:01:56
To: common-user
Cc:
Subject: No space left on device

Hi,
I encountered a problem like the following:
  Error - Job initialization failed:
org.apache.hadoop.fs.FSError: java.io.IOException: No space left on device
  at 
org.apache.hadoop.fs.RawLocalFileSystem$LocalFSFileOutputStream.write(RawLocalFileSystem.java:201)
 at 
java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:65)
 at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:123)
 at java.io.FilterOutputStream.close(FilterOutputStream.java:140)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
 at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
 at 
org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSOutputSummer.close(ChecksumFileSystem.java:348)
 at 
org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:61)
 at 
org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:86)
 at 
org.apache.hadoop.mapred.JobHistory$JobInfo.logSubmitted(JobHistory.java:1344)
 ..
So, I think that HDFS is full or something, but I cannot find a way to 
address the problem. If you have any suggestions, please share them. Thank you.
Best Regards


--
Marcos Luis Ortíz Valmaseda
 Data Engineer & Sr. System Administrator at UCI
 http://marcosluis2186.posterous.com
 http://www.linkedin.com/in/marcosluis2186
 Twitter: @marcosluis2186




Re: EOFException at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)......

2012-05-25 Thread Marcos Ortiz


Regards, waqas. I think that you have to ask the MapR experts.


On 05/25/2012 05:42 AM, waqas latif wrote:

Hi Experts,

I am fairly new to hadoop MapR and I was trying to run a matrix
multiplication example presented by Mr. Norstadt under the following link:
http://www.norstad.org/matrix-multiply/index.html. I can run it
successfully with hadoop 0.20.2, but when I tried to run it with hadoop 1.0.3
I got the following error. Is it a problem with my hadoop
configuration, or is it a compatibility problem in the code, which the author
wrote for hadoop 0.20? Also, please guide me on how I can fix this error
in either case. Here is the error I am getting.

The same code that you write for 0.20.2 should work in 1.0.3 too.



in thread main java.io.EOFException
 at java.io.DataInputStream.readFully(DataInputStream.java:180)
 at java.io.DataInputStream.readFully(DataInputStream.java:152)
 at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
 at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
 at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
 at
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
 at TestMatrixMultiply.fillMatrix(TestMatrixMultiply.java:60)
 at TestMatrixMultiply.readMatrix(TestMatrixMultiply.java:87)
 at TestMatrixMultiply.checkAnswer(TestMatrixMultiply.java:112)
 at TestMatrixMultiply.runOneTest(TestMatrixMultiply.java:150)
 at TestMatrixMultiply.testRandom(TestMatrixMultiply.java:278)
 at TestMatrixMultiply.main(TestMatrixMultiply.java:308)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
 at java.lang.reflect.Method.invoke(Method.java:597)
 at org.apache.hadoop.util.RunJar.main(RunJar.java:156)

Thanks in advance

Regards,
waqas

Can you post the complete log for this here?
Best wishes

--
Marcos Luis Ortíz Valmaseda
 Data Engineer & Sr. System Administrator at UCI
 http://marcosluis2186.posterous.com
 http://www.linkedin.com/in/marcosluis2186
 Twitter: @marcosluis2186




Re: While Running in cloudera version of hadoop getting error

2012-05-24 Thread Marcos Ortiz

Why don't you use the same Hadoop version in both clusters?
It will bring you fewer troubles.


On 05/24/2012 02:26 PM, samir das mohapatra wrote:

Hi
   I created an application jar and was trying to run it on a 2-node cluster using
the Cloudera .20 version, and it was running fine.
But when I run that same jar on the deployment server (Cloudera version
.20.x) with a 40-node cluster, I get an error.

Could anyone please help me with this?

12/05/24 09:39:09 WARN mapred.JobClient: Use GenericOptionsParser for
parsing the arguments. Applications should implement Tool for the same.

As it says here, you should implement Tool for your MapReduce job.


12/05/24 09:39:10 INFO mapred.FileInputFormat: Total input paths to process
: 1

12/05/24 09:39:10 INFO mapred.JobClient: Running job: job_201203231049_12426

12/05/24 09:39:11 INFO mapred.JobClient:  map 0% reduce 0%

12/05/24 09:39:20 INFO mapred.JobClient: Task Id :
attempt_201203231049_12426_m_00_0, Status : FAILED

java.lang.RuntimeException: Error in configuring object

 at
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)

 at
org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)

 at
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)

 at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:387)

 at org.apache.hadoop.mapred.MapTask.run(MapTask.java:325)

 at org.apache.hadoop.mapred.Child$4.run(Child.java:270)

 at java.security.AccessController.doPrivileged(Native Method)

 at javax.security.auth.Subject.doAs(Subject.java:396)

 at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1157)

 at org.apache.hadoop.mapred.Child.main(Child.java:264)

Caused by: java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

 at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.jav

attempt_201203231049_12426_m_00_0: getDefaultExtension()

12/05/24 09:39:20 INFO mapred.JobClient: Task Id :
attempt_201203231049_12426_m_01_0, Status : FAILED



Thanks

samir





--
Marcos Luis Ortíz Valmaseda
 Data Engineer & Sr. System Administrator at UCI
 http://marcosluis2186.posterous.com
 http://www.linkedin.com/in/marcosluis2186
 Twitter: @marcosluis2186




Re: Is it okay to upgrade from CDH3U2 to hadoop 1.0.2 and hbase 0.92.1?

2012-05-21 Thread Marcos Ortiz
I think that you should follow the CDH4 Beta 2 docs, specifically the 
known issues for this version:

https://ccp.cloudera.com/display/CDH4B2/Known+Issues+and+Work+Arounds+in+CDH4

Then, you should see the HBase installation and upgrade notes for this version:
https://ccp.cloudera.com/display/CDH4B2/HBase+Installation#HBaseInstallation-InstallingHBase

Another thing to keep in mind is that with HBase 0.92.1 you will need a full 
restart of your cluster, because
the wire protocol changed from 0.90 to 0.92, so rolling restarts do 
not work here.


Best wishes

On 05/21/2012 10:44 PM, edward choi wrote:

Hi,
I have used CDH3U2 for almost a year now. Since it is a quite old
distribution, there are certain glitches that keep bothering me.
So I was considering upgrading to Hadoop 1.0.3 and Hbase 0.92.1.

My concern is whether it is okay to just install the new packages and set
the configurations the same as before?
Or do I need to download all the files on HDFS to local hard drive and
upload them again once the new packages are installed? (that would be a
horrible job to do though)
Any advice will be helpful.
Thanks.

Ed




--
Marcos Luis Ortíz Valmaseda
 Data Engineer & Sr. System Administrator at UCI
 http://marcosluis2186.posterous.com
 http://www.linkedin.com/in/marcosluis2186
 Twitter: @marcosluis2186




Re: hadoop on fedora 15

2012-04-26 Thread Marcos Ortiz



On 04/26/2012 01:49 AM, john cohen wrote:

I had the same issue.  My problem was the use of VPN
connected to work, and at the same time working
with M/R jobs on my Mac.  It occurred to me that
maybe Hadoop was binding to the wrong IP (the IP
given to you after connecting through VPN),
bottom line, I disconnect from the VPN, and the M/R job
finished as expected after that.

This is logical because, after you connect to the VPN, your machines get
other IPs, assigned by the private network. You can test this by changing the IPs to the

new ones assigned by the VPN.
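A quick way to see which addresses the box currently has, before and after connecting 
to the VPN (works on both the Mac and Linux boxes mentioned here):

    ifconfig | grep 'inet '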

--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Yahoo Hadoop Tutorial with new APIs?

2012-04-04 Thread Marcos Ortiz

Regards to all the list.
There are many people that use the Hadoop Tutorial released by Yahoo at 
http://developer.yahoo.com/hadoop/tutorial/ 
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
The main issue here is that this tutorial is written with the old APIs 
(Hadoop 0.18, I think).
Is there a project to update this tutorial to the new APIs, for Hadoop 
1.0.2 or YARN (Hadoop 0.23)?


Best wishes

--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Re: Yahoo Hadoop Tutorial with new APIs?

2012-04-04 Thread Marcos Ortiz



On 04/04/2012 09:15 AM, Jagat Singh wrote:

Hello Marcos

Yes, the Yahoo tutorials are pretty old, but they still explain the 
concepts of MapReduce and HDFS beautifully. The way the tutorials 
have been divided into sub-sections, each building on the previous one, is 
awesome. I remember when I started I was dug in there for many 
days. The tutorials are lagging now from the new API point of view.
Yes, and for that reason, for its quality, this tutorial is read by many 
Hadoop newcomers, so I think that it needs an update.


Let's have a documentation session one day; I would love to 
volunteer to update those tutorials if the people at Yahoo take input from 
the outside world :)
I want to help with this too, so we need to talk with our Hadoop colleagues 
to do this.

Regards and best wishes


Regards,

Jagat


- Original Message -

From: Marcos Ortiz

Sent: 04/04/12 08:32 AM

To: common-user@hadoop.apache.org, 'hdfs-u...@hadoop.apache.org', 
mapreduce-u...@hadoop.apache.org


Subject: Yahoo Hadoop Tutorial with new APIs?


Regards to all the list.
There are many people that use the Hadoop Tutorial released by Yahoo 
at http://developer.yahoo.com/hadoop/tutorial/ 
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
The main issue here is that, this tutorial is written with the old 
APIs? (Hadoop 0.18 I think).
Is there a project for update this tutorial to the new APIs? to 
Hadoop 1.0.2 or YARN (Hadoop 0.23)


Best wishes

--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
  Data Engineer at UCI
  http://marcosluis2186.posterous.com  










--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Re: opensuse 12.1

2012-04-04 Thread Marcos Ortiz
Since openSUSE is an RPM-based distribution, you can try the Apache 
Bigtop project [1]: look for the RPM packages and give them a try.
Note that the RPM specification differs a little between openSUSE and Red 
Hat-based distributions, but it can be a starting point.

See the documentation for the project [2].

[1] http://incubator.apache.org/projects/bigtop.html
[2] 
https://cwiki.apache.org/confluence/display/BIGTOP/Index%3bjsessionid=AA31645DFDAE1F3282D0159DB9B6AE9A
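A sketch of what that could look like on openSUSE once a Bigtop RPM repository has 
been added (the repository URL and package names below are placeholders; check the 
Bigtop docs above for the real ones):

    sudo zypper addrepo http://example.org/bigtop/opensuse/ bigtop
    sudo zypper refresh
    sudo zypper search hadoop
    sudo zypper install hadoop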


Regards

On 04/04/2012 12:24 PM, Raj Vishwanathan wrote:

Lots of people seem to start with this.

http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-multi-node-cluster/ 



Raj





From: Barry, Sean F sean.f.ba...@intel.com
To: common-user@hadoop.apache.org
Sent: Wednesday, April 4, 2012 9:12 AM
Subject: FW: opensuse 12.1



-Original Message-
From: Barry, Sean F [mailto:sean.f.ba...@intel.com]
Sent: Wednesday, April 04, 2012 9:10 AM
To: common-user@hadoop.apache.org
Subject: opensuse 12.1

 What is the best way to install hadoop on opensuse 12.1 for a 
small two node cluster.

-SB








--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Re: Yahoo Hadoop Tutorial with new APIs?

2012-04-04 Thread Marcos Ortiz
OK, Robert, I will be waiting for you then. There are many folks that 
use this tutorial, so I think this is a good effort in favor of the Hadoop 
community. It would be nice
if Yahoo! donated this work, because I have some ideas beyond this, for 
example: releasing a Spanish version of the tutorial.

Regards and best wishes

On 04/04/2012 05:29 PM, Robert Evans wrote:
I am dropping the cross posts and leaving this on common-user with the 
others BCCed.


Marcos,

That is a great idea to be able to update the tutorial, especially if 
the community is interested in helping to do so.  We are looking into 
the best way to do this.  The idea right now is to donate this to the 
Hadoop project so that the community can keep it up to date, but we 
need some time to jump through all of the corporate hoops to get this 
to happen.  We have a lot going on right now, so if you don't see any 
progress on this please feel free to ping me and bug me about it.


--
Bobby Evans


On 4/4/12 8:15 AM, Jagat Singh jagatsi...@gmail.com wrote:

Hello Marcos

 Yes , Yahoo tutorials are pretty old but still they explain the
concepts of Map Reduce , HDFS beautifully. The way in which
tutorials have been defined into sub sections , each builing on
previous one is awesome. I remember when i started i was digged in
there for many days. The tutorials are lagging now from new API
point of view.

 Lets have some documentation session one day , I would love to
Volunteer to update those tutorials if people at Yahoo take input
from outside world :)

 Regards,

 Jagat

- Original Message -
From: Marcos Ortiz
Sent: 04/04/12 08:32 AM
To: common-user@hadoop.apache.org, 'hdfs-u...@hadoop.apache.org', mapreduce-u...@hadoop.apache.org
Subject: Yahoo Hadoop Tutorial with new APIs?

Regards to all the list.
 There are many people that use the Hadoop Tutorial released by
Yahoo at http://developer.yahoo.com/hadoop/tutorial/
http://developer.yahoo.com/hadoop/tutorial/module4.html#chaining
The main issue here is that, this tutorial is written with the old
APIs? (Hadoop 0.18 I think).
 Is there a project for update this tutorial to the new APIs? to
Hadoop 1.0.2 or YARN (Hadoop 0.23)

 Best wishes
 -- Marcos Luis Ortíz Valmaseda (@marcosluis2186) Data Engineer at
UCI http://marcosluis2186.posterous.com
http://www.uci.cu/





--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Re: Job tracker service start issue.

2012-03-26 Thread Marcos Ortiz



On 03/23/2012 06:57 AM, kasi subrahmanyam wrote:

Hi Oliver,

I am not sure whether my suggestion will solve your problem, or it might already
be solved on your side.
It seems the task tracker is having a problem accessing the tmp directory.
Try going to the core and mapred site XMLs and changing the tmp directory to a
new one.
If this is not working yet, then manually change the permissions of that
directory using:
chmod -R 777 tmp
Please don't do chmod -R 777 on the tmp directory. It's not recommended 
for production servers.

The first option is wiser:
1- change the tmp directory in the core and mapred site files
2- chown this new directory to the hadoop group, which contains the mapred and 
hdfs users
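A minimal sketch of those two steps on the local filesystem (the directory and group 
names are examples; adjust them to your install):

    sudo mkdir -p /data/hadoop/tmp
    sudo chown -R mapred:hadoop /data/hadoop/tmp
    sudo chmod -R 775 /data/hadoop/tmp
    # then point hadoop.tmp.dir (core-site.xml) and mapred.local.dir
    # (mapred-site.xml) at the new directory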


On Fri, Mar 23, 2012 at 3:33 PM, Olivier Sallou olivier.sal...@irisa.fr wrote:



On 3/23/12 8:50 AM, Manish Bhoge wrote:

I have Hadoop running on a standalone box. When I start the daemons for the
namenode, secondarynamenode, job tracker, task tracker and data node, they

all start gracefully. But soon after it starts the job tracker, it doesn't

show the job tracker service. When I run 'jps' it shows me all the
services, including the task tracker, except the Job Tracker.

Is there any time limit that needs to be set up, or is it going into safe
mode? When I looked at the job tracker log, this is what it shows; it looks
like it is starting but soon after it shuts down:

2012-03-22 23:26:04,061 INFO org.apache.hadoop.mapred.JobTracker:

STARTUP_MSG:

/
STARTUP_MSG: Starting JobTracker
STARTUP_MSG:   host = manish/10.131.18.119
STARTUP_MSG:   args = []
STARTUP_MSG:   version = 0.20.2-cdh3u3
STARTUP_MSG:   build =

file:///data/1/tmp/nightly_2012-02-16_09-46-24_3/hadoop-0.20-0.20.2+923.195-1~maverick
-r 217a3767c48ad11d4632e19a22897677268c40c4; compiled by 'root' on Thu Feb
16 10:22:53 PST 2012

/
2012-03-22 23:26:04,140 INFO

org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens

2012-03-22 23:26:04,141 INFO

org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Starting expired delegation token remover thread,
tokenRemoverScanInterval=60 min(s)

2012-03-22 23:26:04,141 INFO

org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager:
Updating the current master key for generating delegation tokens

2012-03-22 23:26:04,142 INFO org.apache.hadoop.mapred.JobTracker:

Scheduler configured with (memSizeForMapSlotOnJT, memSizeForReduceSlotOnJT,
limitMaxMemForMapTasks, limitMaxMemForReduceTasks) (-1, -1, -1, -1)

2012-03-22 23:26:04,143 INFO org.apache.hadoop.util.HostsFileReader:

Refreshing hosts (include/exclude) list

2012-03-22 23:26:04,186 INFO org.apache.hadoop.mapred.JobTracker:

Starting jobtracker with owner as mapred

2012-03-22 23:26:04,201 INFO org.apache.hadoop.ipc.Server: Starting

Socket Reader #1 for port 54311

2012-03-22 23:26:04,203 INFO org.apache.hadoop.ipc.metrics.RpcMetrics:

Initializing RPC Metrics with hostName=JobTracker, port=54311

2012-03-22 23:26:04,206 INFO

org.apache.hadoop.ipc.metrics.RpcDetailedMetrics: Initializing RPC Metrics
with hostName=JobTracker, port=54311

2012-03-22 23:26:09,250 INFO org.mortbay.log: Logging to

org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
org.mortbay.log.Slf4jLog

2012-03-22 23:26:09,298 INFO org.apache.hadoop.http.HttpServer: Added

global filtersafety
(class=org.apache.hadoop.http.HttpServer$QuotingInputFilter)

2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Port

returned by webServer.getConnectors()[0].getLocalPort() before open() is
-1. Opening the listener on 50030

2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer:

listener.getLocalPort() returned 50030
webServer.getConnectors()[0].getLocalPort() returned 50030

2012-03-22 23:26:09,318 INFO org.apache.hadoop.http.HttpServer: Jetty

bound to port 50030

2012-03-22 23:26:09,319 INFO org.mortbay.log: jetty-6.1.26.cloudera.1
2012-03-22 23:26:09,517 INFO org.mortbay.log: Started

SelectChannelConnector@0.0.0.0:50030

2012-03-22 23:26:09,519 INFO org.apache.hadoop.metrics.jvm.JvmMetrics:

Initializing JVM Metrics with processName=JobTracker, sessionId=

2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker:

JobTracker up at: 54311

2012-03-22 23:26:09,519 INFO org.apache.hadoop.mapred.JobTracker:

JobTracker webserver: 50030

2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: Failed

to operate on mapred.system.dir
(hdfs://localhost:54310/app/hadoop/tmp/mapred/system) because of
permissions.

2012-03-22 23:26:09,648 WARN org.apache.hadoop.mapred.JobTracker: This

directory should be owned by the user 'mapred (auth:SIMPLE)'

2012-03-22 23:26:09,650 WARN org.apache.hadoop.mapred.JobTracker:

Bailing out ...

org.apache.hadoop.security.AccessControlException: The systemdir


Apache Hadoop works with IPv6?

2012-03-21 Thread Marcos Ortiz

Regards.
I'm very interested to know if Apache Hadoop works with IPv6 hosts. One 
of my clients
has some hosts with this feature and they want to know if Hadoop 
supports this.

Has anyone tested this?

Best wishes

--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Re: Reduce copy speed too slow

2012-03-20 Thread Marcos Ortiz

Hi, Gayatri


On 03/20/2012 11:59 AM, Gayatri Rao wrote:

Hi all,

I am running a map reduce job in EC2 instances and it seems to be very
slow. It takes hours together for simple projection and aggregation of
data.

What filesystem are you using for data storage: HDFS in EC2 or Amazon S3?
What is the size of the data that you are analyzing?


Upon observation, I gathered that the reduce copy speed is 0.01 MB/sec. I
am new to Hadoop. Could anyone please share insights about what reduce
copy speeds
are good to work with? If anyone has experience, please share any tips on improving
it.
Hadoop Map/Reduce jobs shuffle lots of data, so the recommended 
configuration is to use 10Gbps networks for

the underlying connection (and dedicated switches on dual-gigabit networks).
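If you suspect the network itself, a quick way to sanity-check raw throughput 
between two instances is something like iperf (it has to be installed on both ends; 
the hostname is a placeholder):

    iperf -s                      # on one node
    iperf -c other-node-hostname  # on the other node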

Remember too that Hadoop is not a real-time system; if you need 
real-time random access to your data, use HBase:

http://hbase.apache.org

Regards


Thanks
Gayatri




--
Marcos Luis Ortíz Valmaseda (@marcosluis2186)
 Data Engineer at UCI
 http://marcosluis2186.posterous.com




Re: Best practice to setup Sqoop,Pig and Hive for a hadoop cluster ?

2012-03-15 Thread Marcos Ortiz



On 03/15/2012 09:22 AM, Manu S wrote:

Thanks a lot Bijoy, that makes sense :)

Suppose I have a MySQL database on some other node (not in the Hadoop 
cluster); can I import the tables into my HDFS using Sqoop?

Yes, this is the main purpose of Sqoop
On the Cloudera site you have the complete documentation for it:

Sqoop User Guide
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html

Sqoop installation
https://ccp.cloudera.com/display/CDHDOC/Sqoop+Installation

Sqoop for MySQL
http://archive.cloudera.com/cdh/3/sqoop/SqoopUserGuide.html#_mysql

Sqoop site on GitHub
http://github.com/cloudera/sqoop

Cloudera blog related post to Sqoop
http://www.cloudera.com/blog/category/sqoop/
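As a quick illustration of such an import (the host, database, table, and target 
directory below are made up; see the MySQL section of the user guide above for the 
details):

    sqoop import \
        --connect jdbc:mysql://dbhost/salesdb \
        --username dbuser -P \
        --table customers \
        --target-dir /user/manu/customers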


Best wishes




On Thu, Mar 15, 2012 at 6:27 PM, Bejoy Ks bejoy.had...@gmail.com wrote:


Hi Manu
 Please find my responses inline

I had read about we can install Pig, hive  Sqoop on the client
node, no
need to install it in cluster. What is the client node actually?
Can I use
my management-node as a client?

On larger clusters we have different node that is out of hadoop
cluster and
these stay in there. So user programs would be triggered from this
node.
This is the node refereed to as client node/ edge node etc . For your
cluster management node and client node can be the same

What is the best practice to install Pig, Hive,  Sqoop?

On a client node

For the fully distributed cluster do we need to install Pig,
Hive,  Sqoop
in each nodes?

No, can be on a client node or on any of the nodes

Mysql is needed for Hive as a metastore and sqoop can import
mysql database
to HDFS or hive or pig, so can we make use of mysql DB's residing on
another node?
Regarding your first point, SQOOP import is for a different purpose: to get
data from an RDBMS into HDFS. But the metastore is used by Hive in framing
the map reduce jobs corresponding to your hive query. Here SQOOP can't help
you much.
I recommend having the metastore db of Hive on the same node where Hive is
installed, as executing hive queries requires a lot of metadata lookups,
especially when your table has a large number of partitions.

Regards
Bejoy.K.S

On Thu, Mar 15, 2012 at 5:34 PM, Manu S manupk...@gmail.com wrote:

 Greetings All !!!

 I am using Cloudera CDH3 for Hadoop deployment. We have 7 nodes,
in which 5
 are used for a fully distributed cluster, 1 for
pseudo-distributed  1 as
 management-node.

 Fully distributed cluster: HDFS, Mapreduce  Hbase cluster
 Pseudo distributed mode: All

 I had read about we can install Pig, hive  Sqoop on the client
node, no
 need to install it in cluster. What is the client node actually?
Can I use
 my management-node as a client?

 What is the best practice to install Pig, Hive,  Sqoop?
 For the fully distributed cluster do we need to install Pig,
Hive,  Sqoop
 in each nodes?

 Mysql is needed for Hive as a metastore and sqoop can import
mysql database
 to HDFS or hive or pig, so can we make use of mysql DB's residing on
 another node?

 --
 Thanks  Regards
 
 Manu S
 SI Engineer - OpenSource  HPC
 Wipro Infotech
 Mob: +91 8861302855Skype: manuspkd
 www.opensourcetalk.co.in http://www.opensourcetalk.co.in





--
Thanks  Regards

Manu S
SI Engineer - OpenSource  HPC
Wipro Infotech
Mob: +91 8861302855Skype: manuspkd
www.opensourcetalk.co.in http://www.opensourcetalk.co.in





--
Marcos Luis Ortíz Valmaseda
 Sr. Software Engineer (UCI)
 http://marcosluis2186.posterous.com
 http://postgresql.uci.cu/blog/38




Re: hadoop branch-0.20-append Build error:build.xml:933: exec returned: 1

2011-04-12 Thread Marcos Ortiz

On 4/11/2011 10:45 PM, Alex Luya wrote:

BUILD FAILED
.../branch-0 .20-append/build.xml:927: The following error
occurred while executing this line:
../branch-0 .20-append/build.xml:933: exec returned: 1

Total time: 1 minute 17 seconds
+ RESULT=1
+ '[' 1 '!=' 0 ']'
+ echo 'Build Failed: 64-bit build not run'
Build Failed: 64-bit build not run
+ exit 1
-
I checked content in file build.xml:

line 927: <antcall target="cn-docs"/></target><target name="cn-docs"
depends="forrest.check, init" description="Generate forrest-based
Chinese documentation. To use, specify -Dforrest.home=&lt;base of Apache
Forrest installation&gt; on the command line." if="forrest.home">
line 933: <exec dir="${src.docs.cn}"
executable="${forrest.home}/bin/forrest" failonerror="true">
---
It seems to try to execute forrest; what is the problem here? I am running
64-bit Ubuntu, with 64+32-bit JDK 1.6 and 64-bit JDK 1.5 installed. Some
guys said there are some tricks on this
page: http://wiki.apache.org/hadoop/HowToRelease  to get the forrest build to
work, but I can't find any tricks on the page.
Any help is appreciated.


   

1- Which version of Java do you have in the JAVA_HOME variable?
You can browse the Forrest page to see how to build it: 
http://forrest.apache.org
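A quick check, just to be sure the build is picking up the JDK you expect:

    echo $JAVA_HOME
    $JAVA_HOME/bin/java -version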


2- Another question for you:
Do you actually need Forrest?

Regards

--
Marcos Luís Ortíz Valmaseda
 Software Engineer (Large-Scaled Distributed Systems)
 University of Information Sciences,
 La Habana, Cuba
 Linux User # 418229