Help with WebHDFS authentication: simple vs simple-dt

2016-09-27 Thread Benjamin Ross
All,
I'm in the process of setting up encryption at rest on a cluster, but I want to 
make sure that everything else remains permissive - otherwise it will break 
existing processes that we have in place.  I'm very close to getting this 
working - the last piece is that webhdfs is not permissive:

In my local setup where I have things working, webhdfs reports the following 
when trying to create a file (note t=simple):
$ curl -i -X PUT 'localhost:50070/webhdfs/v1/tmp/foo?op=CREATE=true=yarn'
HTTP/1.1 307 TEMPORARY_REDIRECT
Cache-Control: no-cache
Expires: Tue, 27 Sep 2016 14:52:06 GMT
Date: Tue, 27 Sep 2016 14:52:06 GMT
Pragma: no-cache
Expires: Tue, 27 Sep 2016 14:52:06 GMT
Date: Tue, 27 Sep 2016 14:52:06 GMT
Pragma: no-cache
Content-Type: application/octet-stream
Set-Cookie: hadoop.auth="u=yarn&p=yarn&t=simple&e=1475023926231&s=0wqlgqLNm50k/mN66qZwyCb4xUs="; Path=/; HttpOnly
Location: http://localhost:50075/webhdfs/v1/tmp/foo?op=CREATE=yarn=localhost:9000==true=true
Content-Length: 0
Server: Jetty(6.1.26.hwx)


On the cluster, however, it reports the following (note t=simple-dt):
$ curl -i -X PUT 'http://10.41.1.6:14000/webhdfs/v1/tmp/foo?op=CREATE=true=yarn'
HTTP/1.1 307 Temporary Redirect
Server: Apache-Coyote/1.1
Set-Cookie: hadoop.auth="u=yarn&p=yarn&t=simple-dt&e=1475023818932&s=9FteGx9VW06bh5dD1L9J+1ENWtY="; Path=/; HttpOnly
Location: http://10.41.1.6:14000/webhdfs/v1/tmp/foo?op=CREATE=yarn=true=true
Content-Type: application/json
Content-Length: 0
Date: Tue, 27 Sep 2016 14:50:18 GMT


Note that my local setup reports the authentication type as simple, whereas the 
cluster reports simple-dt.  This is why I get an authentication failure when 
trying to write a file to the cluster.  I don't want Kerberos or delegation 
tokens enabled.

Does anyone know what I need to change so that this becomes simple again?

Thanks in advance,
Ben
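
For comparing the two responses without reading the cookie by eye, here is a 
minimal Python sketch that pulls the authentication type out of a hadoop.auth 
cookie value.  It assumes the usual u=...&p=...&t=...&e=...&s=... layout of 
that cookie; the function name auth_type is made up for illustration.

```python
from typing import Optional


def auth_type(cookie_value: str) -> Optional[str]:
    """Return the t= field of a hadoop.auth cookie value, or None if absent.

    Assumes the value is a list of key=value fields joined by '&'.
    """
    fields = {}
    for part in cookie_value.strip().strip('"').split("&"):
        if "=" not in part:
            continue  # ignore anything that is not a key=value field
        # Split on the first '=' only: the signature itself may end in '='.
        key, value = part.split("=", 1)
        fields[key] = value
    return fields.get("t")


if __name__ == "__main__":
    cluster = '"u=yarn&p=yarn&t=simple-dt&e=1475023818932&s=9FteGx9VW06bh5dD1L9J+1ENWtY="'
    print(auth_type(cluster))
```

Run against the quoted cookie value from each Set-Cookie header, this makes the 
simple versus simple-dt difference easy to check in a script.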


read multiple files

2016-09-27 Thread Divya Gehlot
Hi,
The input data files for my Spark job are generated every five minutes, and the
file names follow an epoch time convention, as below:

InputFolder/batch-147495960
InputFolder/batch-147495990
InputFolder/batch-147496020
InputFolder/batch-147496050
InputFolder/batch-147496080
InputFolder/batch-147496110
InputFolder/batch-147496140
InputFolder/batch-147496170
InputFolder/batch-147496200
InputFolder/batch-147496230

The requirement is to read one month of data, counting back from the current
timestamp.

I would really appreciate it if anybody could help me.

Thanks,
Divya
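
One possible approach, sketched in Python under stated assumptions: list the
batch paths, parse the numeric suffix, and keep only those that fall inside the
last 30 days.  The helper name select_recent and the unit_seconds parameter are
hypothetical.  The suffix is assumed to be epoch seconds; the names listed above
have nine digits and step by 30, so unit_seconds=10 may be the right setting if
they really count in tens of seconds.

```python
import re
import time
from typing import Iterable, List, Optional

# Matches names such as InputFolder/batch-147495960 and captures the number.
BATCH_RE = re.compile(r"batch-(\d+)$")


def select_recent(paths: Iterable[str], now: Optional[float] = None,
                  days: int = 30, unit_seconds: int = 1) -> List[str]:
    """Return the paths whose numeric suffix lies within the last `days` days."""
    if now is None:
        now = time.time()
    cutoff = now - days * 24 * 3600
    selected = []
    for path in paths:
        match = BATCH_RE.search(path)
        if match is None:
            continue  # not a batch path, skip it
        # Convert the suffix to epoch seconds (assumption: see unit_seconds).
        stamp = int(match.group(1)) * unit_seconds
        if cutoff <= stamp <= now:
            selected.append(path)
    return selected


if __name__ == "__main__":
    names = ["InputFolder/batch-147495960", "InputFolder/batch-147495990"]
    # A fixed "now" keeps the example reproducible.
    print(select_recent(names, now=1474962300, unit_seconds=10))
```

The selected paths can then be joined with commas and handed to Spark in one
call, since sc.textFile accepts a comma-separated list of paths.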