Re: Restoring Data to HDFS with distcp from standard input /dev/stdin

2017-08-16 Thread Ravi Prakash
Hi Heitor!

Welcome to the Hadoop community.

Think of the "hadoop distcp" command as a script which launches other JAVA
programs on the Hadoop worker nodes. The script collects the list of
sources, divides it among the several worker nodes and waits for the worker
nodes to actually do the copying from source to target. The sources could
be hdfs://hadoop2:54310/source-folder or perhaps s3a://some-bucket/somepath
or adl://somepath etc.
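
For example, a typical invocation copies data that already exists on some
filesystem reachable from every worker node; the hosts and paths below are
only placeholders (hadoop3 is an imaginary second cluster):

===
# copy a folder between two HDFS clusters
hadoop distcp hdfs://hadoop2:54310/source-folder hdfs://hadoop3:54310/backup-folder

# or from HDFS into an object store (assuming s3a credentials are configured)
hadoop distcp hdfs://hadoop2:54310/source-folder s3a://some-bucket/somepath
===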

If you are trying to upload a file from the local file system to HDFS,
please take a look at the "hdfs dfs -put" command.
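
A minimal sketch for data arriving on standard input (assuming a Hadoop
release where "-put -" and "-appendToFile -" read from stdin; the target
path is only an example):

===
# stream data from a pipe straight into a new HDFS file
echo data | hdfs dfs -put - hdfs://hadoop2:54310/a

# or append the piped data to an existing file (Hadoop 2.x and later)
echo more-data | hdfs dfs -appendToFile - hdfs://hadoop2:54310/a
===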

HTH
Ravi



On Wed, Aug 16, 2017 at 9:22 AM, Heitor Faria  wrote:

> Hello, List,
>
> I'm new here and I hope you are all very fine.
> I'm trying different combinations of distcp in order to restore data that
> I receive from standard input. Example:
>
> ===
> echo data | /etc/hadoop/bin/hadoop distcp file:///dev/stdin
> hdfs://hadoop2:54310/a
> ===
>
> I tried different options of distcp but the MapReduce always stalls. E.g.:
>
> 
> 2017-08-15 06:59:26,665 INFO [main] org.apache.hadoop.metrics2.impl.MetricsConfig: loaded properties from hadoop-metrics2.properties
> 2017-08-15 06:59:26,802 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled snapshot period at 10 second(s).
> 2017-08-15 06:59:26,802 INFO [main] org.apache.hadoop.metrics2.impl.MetricsSystemImpl: MapTask metrics system started
> 2017-08-15 06:59:26,813 INFO [main] org.apache.hadoop.mapred.YarnChild: Executing with tokens:
> 2017-08-15 06:59:26,814 INFO [main] org.apache.hadoop.mapred.YarnChild: Kind: mapreduce.job, Service: job_1502794712113_0001, Ident: (org.apache.hadoop.mapreduce.security.token.JobTokenIdentifier@467f0da4)
> 2017-08-15 06:59:26,996 INFO [main] org.apache.hadoop.mapred.YarnChild: Sleeping for 0ms before retrying again. Got null now.
> 2017-08-15 06:59:27,518 INFO [main] org.apache.hadoop.mapred.YarnChild: mapreduce.cluster.local.dir for child: /root/hdfs/hadoop-tmp-dir/nm-local-dir/usercache/root/appcache/application_1502794712113_0001
> 2017-08-15 06:59:28,926 INFO [main] org.apache.hadoop.conf.Configuration.deprecation: session.id is deprecated. Instead, use dfs.metrics.session-id
> 2017-08-15 06:59:29,783 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2017-08-15 06:59:29,804 INFO [main] org.apache.hadoop.mapred.Task:  Using ResourceCalculatorProcessTree : [ ]
> 2017-08-15 06:59:30,139 INFO [main] org.apache.hadoop.mapred.MapTask: Processing split: /tmp/hadoop-yarn/staging/root/.staging/_distcp-298457134/fileList.seq:0+176
> 2017-08-15 06:59:30,145 INFO [main] org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter: File Output Committer Algorithm version is 1
> 2017-08-15 06:59:30,250 INFO [main] org.apache.hadoop.tools.mapred.CopyMapper: Copying file:/dev/stdin to hdfs://hadoop2:54310/aaa
> 2017-08-15 06:59:30,259 INFO [main] org.apache.hadoop.tools.mapred.RetriableFileCopyCommand: Creating temp file: hdfs://hadoop2:54310/.distcp.tmp.attempt_1502794712113_0001_m_00_0
> 
>
> Regards,
> --
> 
> ===
> Heitor Medrado de Faria | CEO Bacula do Brasil | EB-1 Visa | LPIC-III |
> EMC 05-001 | ITIL-F
> • Don't be billed by the size of your backups; get to know Bacula
> Enterprise: http://www.bacula.com.br/enterprise/
> • I provide in-company training and deployment of Bacula Community:
> http://www.bacula.com.br/in-company/
> +55 61 98268-4220 | www.bacula.com.br
> 
> 
> We also recommend these complementary training courses:
> • Basic Shell and Shell Programming  with
> Julio Neves.
> • Zabbix  with Adail Host.
> 
> 
>


Error connecting to ZooKeeper server

2017-08-16 Thread Michael Chen

Hi,

I've run into a ZooKeeper connection error during the execution of a
Nutch Hadoop job. The tasks stall on a connection error to the ZooKeeper
server. Here's what I know:


1. The ZK connection error is the only known problem; other logs report no issues.

2. The error message in the YARN NodeManager log on one of the slaves is:

2017-08-16 19:03:42,280 INFO [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2017-08-16 19:03:42,281 WARN [main-SendThread(localhost:2181)] org.apache.zookeeper.ClientCnxn: Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused

The connection attempts keep failing until they hit the 10-minute limit and
the task fails.


3. The ZooKeeper server is deployed only on the master.

4. The cluster is managed by Cloudera Manager 5.12.

Could a configuration on the Nutch side or the Cloudera Manager side be
missing? There are no ZK servers on the slaves, and the NodeManager should
be connecting to the ZK server on the master instead of localhost:2181.
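
For what it's worth, this is the kind of check I can run from a slave to
see where the quorum actually points (master-host below is a placeholder
for the real master hostname, and the zookeeper.quorum grep assumes Nutch
2.x storing into HBase via Gora):

# ZooKeeper's "ruok" four-letter command: should answer "imok" on the master,
# and be refused on the slave's localhost if no ZK server runs there
echo ruok | nc master-host 2181
echo ruok | nc localhost 2181

# check which quorum the client-side configuration actually carries
grep -r "zookeeper.quorum" /etc/hbase/conf/ /etc/zookeeper/conf/ 2>/dev/null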


Any suggestion or help is greatly appreciated!

Thank you,

Michael