Hi,

I want to experiment with the Criteo 1 TB dataset. The files are hosted on Azure storage, and this command downloads them:

curl -O http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_{`seq -s ',' 0 23`}.gz

Is there any way to read the files over HTTP with Spark, without first downloading them to HDFS? Something like this:

sc.textFile("http://azuremlsampleexperiments.blob.core.windows.net/criteo/day_{0-23}.gz")

so that the resulting RDD has 24 partitions.
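
To make the question concrete, here is a rough, untested sketch in PySpark of a workaround that might do what I am after. It assumes an existing SparkContext `sc`, that every executor can reach the storage host, and that the files are UTF-8 text. The helper names (`day_urls`, `read_gzip_lines`, `remote_text_rdd`) are made up for illustration and are not part of Spark. The idea is to parallelize the list of 24 URLs with one URL per partition and let each task stream and decompress its own file:

```python
import gzip
import urllib.request

BASE_URL = "http://azuremlsampleexperiments.blob.core.windows.net/criteo"


def day_urls(first=0, last=23):
    """Build the per-day file URLs, day_<first>.gz through day_<last>.gz."""
    return ["%s/day_%d.gz" % (BASE_URL, day) for day in range(first, last + 1)]


def read_gzip_lines(url):
    """Stream one remote .gz file and lazily yield its lines as text."""
    with urllib.request.urlopen(url) as response:
        # GzipFile decompresses on the fly from the HTTP response stream.
        with gzip.GzipFile(fileobj=response) as gz:
            for raw in gz:
                yield raw.decode("utf-8").rstrip("\n")


def remote_text_rdd(sc, urls):
    """Return an RDD of lines with one partition per URL.

    `sc` is assumed to be an already created SparkContext.
    """
    # numSlices == len(urls) places exactly one URL in each partition.
    return sc.parallelize(urls, len(urls)).flatMap(read_gzip_lines)
```

Usage would be something like `lines = remote_text_rdd(sc, day_urls())`. Since gzip is not splittable, each file would still be read by a single task, and a failed task would have to re-download its file from the beginning.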

Thanks,
Peter Rudenko
