Thanks, Peter Rudenko
Hi, I want to play with the Criteo 1 TB dataset. The files are located on Azure
storage. Here's a command to download them:
curl -O "http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_{`seq -s ',' 0 23`}.gz"
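(In case the backtick/brace expansion gets mangled by the mail client, here is the same thing as a plain loop that just prints the 24 URLs, which can then be piped to `xargs -n1 curl -O`. The URLs are the ones above; only the loop itself is new.)

```shell
# Print one download URL per day file (day_0.gz .. day_23.gz).
base="http://azuremlsampleexperiments.blob.core.windows.net/criteo/day"
for d in $(seq 0 23); do
  echo "${base}_${d}.gz"
done
# To download:  ... | xargs -n1 curl -O
```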
Is there any way to read the files over HTTP with Spark, without first
downloading them to HDFS? Something like this:
sc.textFile("http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_{0-23}.gz")

so it would have 24 partitions.
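If `textFile` can't do this directly, would a manual approach along these lines be the way to go? (Untested sketch for the spark-shell, using plain `java.net.URL` to fetch each gzipped file inside a task, one partition per day file:)

```scala
import java.net.URL
import java.util.zip.GZIPInputStream
import scala.io.Source

// One URL per day file; parallelize with 24 partitions so each
// task fetches and decompresses exactly one file over HTTP.
val urls = (0 to 23).map(d =>
  s"http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_$d.gz")

val lines = sc.parallelize(urls, urls.size).flatMap { u =>
  val in = new GZIPInputStream(new URL(u).openStream())
  Source.fromInputStream(in).getLines()
}
```

Not sure how this behaves with respect to retries if a task dies mid-download, though.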