Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-19 Thread Michel Sumbul
…using Spark with HDFS encrypted with KMS :-) Thanks, Michel. On Thu, Aug 13, 2020 at 14:32, Michel Sumbul wrote: "Hi guys, has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS…

Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-19 Thread Prashant Sharma
…using Spark with HDFS encrypted with KMS :-) Thanks, Michel. On Thu, Aug 13, 2020 at 14:32, Michel Sumbul wrote: "Hi guys, has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with Kerberos)?…

Re: Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-15 Thread Michel Sumbul
…:-) Thanks, Michel. On Thu, Aug 13, 2020 at 14:32, Michel Sumbul wrote: "Hi guys, has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with Kerberos)? I have a wordcount job running with Spark3 reading data on HDFS (Hadoop 3…

Spark3 on k8S reading encrypted data from HDFS with KMS in HA

2020-08-13 Thread Michel Sumbul
Hi guys, has anyone tried Spark3 on k8s reading data from HDFS encrypted with KMS in HA mode (with Kerberos)? I have a wordcount job running with Spark3, reading data on HDFS (Hadoop 3.1), with everything secured via Kerberos. Everything works fine if the data folder is not encrypted (Spark on k8s…
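For context, a hedged sketch of the kind of submission such a setup typically involves (the API server host, realm, class name, jar, and paths below are placeholders, not details from the thread; the KMS HA endpoints themselves live in the Hadoop configuration shipped with the image):

```shell
# Untested sketch: Spark 3 on Kubernetes reading Kerberos-secured HDFS.
spark-submit \
  --master k8s://https://k8s-apiserver.example.com:6443 \
  --deploy-mode cluster \
  --conf spark.kerberos.principal=user@EXAMPLE.COM \
  --conf spark.kerberos.keytab=/etc/security/keytabs/user.keytab \
  --class org.example.WordCount \
  local:///opt/spark/jars/wordcount.jar \
  hdfs:///encrypted/data
```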

Data from HDFS

2018-04-22 Thread Zois Theodoros
Hello, I am reading data from HDFS in a Spark application, and as far as I have read, each HDFS block becomes one Spark partition by default. Is there any way to select only one block from HDFS to read in my Spark application? Thank you, Thodoris
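There is no public API for addressing a single HDFS block directly, but since each block of a file maps to one input partition by default, one hedged workaround (a sketch under that assumption, with a placeholder path) is to keep only the first partition:

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("first-block"))

// Each HDFS block of the file becomes one partition by default, so
// dropping every partition except index 0 keeps (roughly) one block.
val firstBlock = sc.textFile("hdfs:///some/file")
  .mapPartitionsWithIndex { (idx, iter) =>
    if (idx == 0) iter else Iterator.empty
  }
```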

Re: Spark loads data from HDFS or S3

2017-12-13 Thread Jörn Franke
…philjj...@gmail.com> wrote: "Hi, I have a few questions about the structure of HDFS and S3 when Spark loads data from the two storage systems. Generally, when Spark loads data from HDFS, HDFS supports data locality and already holds the distributed files on the datanodes…

Re: Spark loads data from HDFS or S3

2017-12-13 Thread Sebastian Nagel
…"I have a few questions about the structure of HDFS and S3 when Spark loads data from the two storage systems. Generally, when Spark loads data from HDFS, HDFS supports data locality and already holds the distributed files on the datanodes, right? Spark could just process the data on the workers.…

Spark loads data from HDFS or S3

2017-12-13 Thread Philip Lee
Hi, I have a few questions about the structure of HDFS and S3 when Spark loads data from the two storage systems. Generally, when Spark loads data from HDFS, HDFS supports data locality and already holds the distributed files on the datanodes, right? Spark could just process the data on the workers. What about S3…

Can Spark read input data from HDFS centralized cache?

2016-01-25 Thread Jia Zou
I configured HDFS to cache files in HDFS's centralized cache, like the following: hdfs cacheadmin -addPool hibench, then hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench. But I didn't see much performance impact, no matter how I configured dfs.datanode.max.locked.memory. Is it possible that…
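A hedged sketch of checking whether the directive is actually caching anything (the pool and path come from the post; -listPools and -listDirectives are the standard CacheAdmin inspection subcommands, and the reported cached bytes should approach the needed bytes once the datanodes have locked the replicas in memory):

```shell
# Untested sketch: create the pool/directive, then inspect cache statistics.
hdfs cacheadmin -addPool hibench
hdfs cacheadmin -addDirective -path /HiBench/Kmeans/Input -pool hibench
hdfs cacheadmin -listPools -stats
hdfs cacheadmin -listDirectives -stats -path /HiBench/Kmeans/Input
```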

Re: Can Spark read input data from HDFS centralized cache?

2016-01-25 Thread Ted Yu
Have you read this thread? http://search-hadoop.com/m/uOzYttXZcg1M6oKf2/HDFS+cache=RE+hadoop+hdfs+cache+question+do+client+processes+share+cache+ Cheers. On Mon, Jan 25, 2016 at 1:23 PM, Jia Zou wrote: "I configured HDFS to cache files in HDFS's cache, like the following:…

Re: Can Spark read input data from HDFS centralized cache?

2016-01-25 Thread Ted Yu
Please see also: http://hadoop.apache.org/docs/r2.7.1/hadoop-project-dist/hadoop-hdfs/CentralizedCacheManagement.html According to Chris Nauroth, an HDFS committer, it is extremely difficult to use the feature correctly, and it also brings operational complexity. Since off-heap memory is…

Re: How to load partial data from HDFS using Spark SQL

2016-01-02 Thread swetha kasireddy
…from table where id = ") // filtered data frame; df.count. On Sat, Jan 2, 2016 at 11:56 AM, SRK <swethakasire...@gmail.com> wrote: "Hi, how can I load partial data from HDFS using Spark SQL? Suppose I want to load data based on a filter like…

How to load partial data from HDFS using Spark SQL

2016-01-01 Thread SRK
Hi, how can I load partial data from HDFS using Spark SQL? Suppose I want to load data based on a filter like "Select * from table where id = " using Spark SQL with DataFrames; how can that be done? The idea here is that I do not want to load the whole dataset into memory when I use the…

Re: How to load partial data from HDFS using Spark SQL

2016-01-01 Thread UMESH CHAUDHARY
OK, so what's wrong with using: var df = HiveContext.sql("Select * from table where id = ") // filtered data frame, followed by df.count? On Sat, Jan 2, 2016 at 11:56 AM, SRK <swethakasire...@gmail.com> wrote: "Hi, how can I load partial data from HDFS using Spark SQL? Suppose I…
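A minimal sketch of the filter approach this reply suggests, written against the modern SparkSession API (HiveContext is the Spark 1.x equivalent); the table name and id value are placeholders, and the memory-saving behavior assumes a format with predicate pushdown such as Parquet/ORC or a partitioned table:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partial-load")
  .enableHiveSupport()
  .getOrCreate()

// Only rows matching the predicate are materialized; with pushdown-capable
// formats the filter is applied at the source, so the whole table is never
// loaded into memory.
val df = spark.sql("SELECT * FROM some_table WHERE id = 42")
df.count()
```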

ClassCastException while reading data from HDFS through Spark

2015-10-07 Thread Vinoth Sankar
I'm just reading data from HDFS through Spark. It throws java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.BytesWritable at line 6. I never used LongWritable in my code; I have no idea how the data ended up in that format. Note: I'm not using…
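This exception typically means the InputFormat's key/value types don't match what the code expects: for plain text files on HDFS the keys are LongWritable byte offsets, not BytesWritable. A hedged sketch (the path is a placeholder, and this assumes the input is text rather than a sequence file) of reading with the matching types:

```scala
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapred.TextInputFormat
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("hdfs-read"))

// TextInputFormat yields (byte offset, line) pairs: the key is a
// LongWritable, so casting it to BytesWritable throws ClassCastException.
val lines = sc
  .hadoopFile[LongWritable, Text, TextInputFormat]("hdfs:///some/path")
  .map { case (_, text) => text.toString }
```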

Re: ClassCastException while reading data from HDFS through Spark

2015-10-07 Thread UMESH CHAUDHARY
…wrote: "I'm just reading data from HDFS through Spark. It throws java.lang.ClassCastException: org.apache.hadoop.io.LongWritable cannot be cast to org.apache.hadoop.io.BytesWritable at line 6. I never used LongWritable in my code; I have no idea how the data ended up in that format.…

Re: SparkSQL: Reading data from hdfs and storing into multiple paths

2015-10-02 Thread Michael Armbrust
Once you convert your data to a DataFrame (look at spark-csv), try df.write.partitionBy("", "mm").save("..."). On Thu, Oct 1, 2015 at 4:11 PM, haridass saisriram <haridass.saisri...@gmail.com> wrote: "Hi, I am trying to find a simple example of reading a data file on HDFS. The file…

SparkSQL: Reading data from hdfs and storing into multiple paths

2015-10-01 Thread haridass saisriram
Hi, I am trying to find a simple example of reading a data file on HDFS. The file has the following format: a header line "a , b , c ,,mm" followed by rows such as "a1,b1,c1,2015,09" and "a2,b2,c2,2014,08". I would like to read this file and store it in HDFS, partitioned by year and month, something like /path/to/hdfs//mm. I want to…
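A minimal sketch of the partitionBy approach suggested in the reply, using the CSV reader built into Spark since 2.x rather than the external spark-csv package; the column names "year" and "month" and both paths are placeholders standing in for the post's masked column names:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("partition-write")
  .getOrCreate()

val df = spark.read
  .option("header", "true")
  .csv("hdfs:///input/data.csv")

// Writes one directory per (year, month) combination,
// e.g. .../year=2015/month=09/part-*.csv
df.write
  .partitionBy("year", "month")
  .csv("hdfs:///output/partitioned")
```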