Gents,
Two questions:
1. Say you have 5 folders with input data (fold1, fold2, fold3, ..., fold5)
in your HDFS on a pseudo-distributed cluster.
You write your MR job to access the files by listing the folders in:
FileInputFormat.addInputPaths(job, "fold1, fold2, fold3...,fold5");
Q: Is there a way to move the above folders under a parent folder, say
"the_folder", so that the dir structure becomes the_folder/fold1,
the_folder/fold2, ...? Would it then be possible to access the files with
something like: FileInputFormat.addInputPaths(job, "the_folder/*"); or similar?
I am asking in case the list of input folders grows too long. How can that be
kept manageable?
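To make question 1 concrete, here is a rough sketch of what I have in mind. The class and method names (InputPaths, join) are made up for illustration, and the two FileInputFormat calls appear only as comments so the snippet runs without Hadoop on the classpath. As far as I understand, addInputPaths splits its argument on commas without trimming, so the sketch builds the list without spaces; whether the glob form in the last comment is the right approach is exactly what I am asking.

```java
import java.util.StringJoiner;

public class InputPaths {
    // Hypothetical helper: builds "parent/fold1,parent/fold2,..." with no
    // spaces around the commas.
    static String join(String parent, int count) {
        StringJoiner joiner = new StringJoiner(",");
        for (int i = 1; i <= count; i++) {
            joiner.add(parent + "/fold" + i);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        String paths = join("the_folder", 5);
        System.out.println(paths);
        // In the job driver this string would be passed as:
        //   FileInputFormat.addInputPaths(job, paths);
        // The glob form I am asking about would instead be:
        //   FileInputFormat.addInputPath(job, new Path("the_folder/*"));
    }
}
```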
2. Hypothetically speaking, in a fully-distributed cluster your folders with
data are located as follows: Node1: (fold1, fold2, fold3) and Node2: (fold4,
fold5).
Q: Do we need to change the command below, or will the NN and JT take care of
locating those files?
FileInputFormat.addInputPaths(job, "fold1, fold2, fold3...,fold5");
2a. When using the data balancer, which splits input/moves data across the
additional DNs listed in conf/slaves, is it possible to run the "hdfs dfs -ls
-r " command on a slave node that runs a DN on a separate machine? I have
Cheers,
AK