Gents,
Two questions:

1.       Say you have 5 folders with input data (fold1,fold2,fold3,....,fold5) 
in your HDFS on a pseudo-distributed cluster.
You write your MR job to access the files by listing them in:
FileInputFormat.addInputPaths(job, "fold1, fold2, fold3...,fold5");
Q: Is there a way to move the above folders into a parent folder, say 
"the_folder", so that the directory structure becomes the_folder/fold1, 
the_folder/fold2, ...? Would it then be possible to access the files with 
something like: FileInputFormat.addInputPaths(job, "the_folder/*"); or similar?
I am asking in case the list of input folders grows too long. How can I curb that?
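
To make the concern concrete, here is a rough sketch of the workaround I would otherwise fall back on: building the comma-separated string in code instead of typing it out. The class InputPathList and the method joinUnderParent are hypothetical names of mine, not part of Hadoop, and the sketch assumes the folders are named fold1..foldN under a single parent.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Hadoop: builds the comma-separated
// string that would be passed to FileInputFormat.addInputPaths(job, ...).
class InputPathList {
    // Assumes the folders are named fold1..foldN under a common parent.
    static String joinUnderParent(String parent, int count) {
        List<String> paths = new ArrayList<>();
        for (int i = 1; i <= count; i++) {
            paths.add(parent + "/fold" + i);
        }
        // Join without spaces so each entry is a clean path string.
        return String.join(",", paths);
    }

    public static void main(String[] args) {
        // In the driver, this string would be handed to
        // FileInputFormat.addInputPaths(job, paths).
        System.out.println(joinUnderParent("the_folder", 5));
    }
}
```

This keeps the driver short, but it still hard-codes the naming scheme, which is why I would prefer a wildcard on the parent folder if that is supported.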

2.       Hypothetically speaking, in a fully distributed cluster your folders with 
data are located as follows: Node1: (fold1,fold2,fold3) and Node2: (fold4, 
fold5)

Q: Do we need to change the command below, or will the NN and JT take care of 
locating those files?
FileInputFormat.addInputPaths(job, "fold1, fold2, fold3...,fold5");
     2a.     Using the data balancer, which splits input/moves data across the 
additional DNs listed in conf/slaves, is it possible to run the 
"hdfs dfs -ls -r" command on a slave node that runs a DN on a separate machine? I have

Cheers,

AK
