Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
-----Original Message----- > From: ÐΞ€ρ@Ҝ (๏̯͡๏) [deepuj...@gmail.com] > Sent: Thursday, August 06, 2015 12:41 AM Eastern Standard Time > To: Philip Weaver > Cc: user > Subject: Re: How to read gzip data in Spark - Simple question > > How do I persist the RDD to HDFS? > > On Wed,

RE: How to read gzip data in Spark - Simple question

2015-08-05 Thread Ganelin, Ilya
d Time To: Philip Weaver Cc: user Subject: Re: How to read gzip data in Spark - Simple question How do I persist the RDD to HDFS? On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver wrote: This message means that java.util.Date is not supported by Spark DataFr

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread Philip Weaver
I encourage you to find the answer to this on your own :). On Wed, Aug 5, 2015 at 9:43 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > Code: > > val summary = rowStructText.map(s => s.split(",")).map( > { > s => > Summary(formatStringAsDate(s(0)), > s(1).replaceAll("\"", "").toLong, >

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
Code: val summary = rowStructText.map(s => s.split(",")).map( { s => Summary(formatStringAsDate(s(0)), s(1).replaceAll("\"", "").toLong, s(3).replaceAll("\"", "").toLong, s(4).replaceAll("\"", "").toInt, s(5).replaceAll("\"", ""),

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
How do I persist the RDD to HDFS? On Wed, Aug 5, 2015 at 8:32 PM, Philip Weaver wrote: > This message means that java.util.Date is not supported by Spark > DataFrame. You'll need to use java.sql.Date, I believe. > > On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > >> That seem to be work

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread Philip Weaver
This message means that java.util.Date is not supported by Spark DataFrame. You'll need to use java.sql.Date, I believe. On Wed, Aug 5, 2015 at 8:29 PM, ÐΞ€ρ@Ҝ (๏̯͡๏) wrote: > That seem to be working. however i see a new exception > > Code: > def formatStringAsDate(dateStr: String) = new > Simpl
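As a rough illustration of the suggestion above, here is a minimal sketch of a date helper that returns java.sql.Date instead of java.util.Date. The name formatStringAsSqlDate is hypothetical, and the "yyyy-MM-dd" pattern is an assumption based on sample values such as 2015-07-27 in this thread, not something the thread confirms.

```scala
import java.sql.Date
import java.text.SimpleDateFormat

object SqlDateSketch {
  // Hypothetical helper: parse a "yyyy-MM-dd" string (assumed pattern) and
  // wrap the result in java.sql.Date rather than returning java.util.Date.
  def formatStringAsSqlDate(dateStr: String): Date = {
    // SimpleDateFormat is not thread-safe, so a new instance is created per call.
    val parsed = new SimpleDateFormat("yyyy-MM-dd").parse(dateStr)
    new Date(parsed.getTime)
  }

  def main(args: Array[String]): Unit = {
    val d = formatStringAsSqlDate("2015-07-27")
    println(d) // prints 2015-07-27
  }
}
```

For strings that are already in this form, Date.valueOf(dateStr) should give the same result without going through SimpleDateFormat.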

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread ๏̯͡๏
That seems to be working. However, I see a new exception. Code: def formatStringAsDate(dateStr: String) = new SimpleDateFormat("-MM-dd").parse(dateStr) //(2015-07-27,12459,,31242,6,Daily,-999,2099-01-01,2099-01-02,1,0,0.1,0,1,-1,isGeo,,,204,694.0,1.9236856708701322E-4,0.0,-4.48,0.0,0.0,0.0,) val

Re: How to read gzip data in Spark - Simple question

2015-08-05 Thread Philip Weaver
The parallelize method does not read the contents of a file. It simply takes a collection and distributes it to the cluster. In this case, the String is a collection of 67 characters. Use sc.textFile instead of sc.parallelize, and it should work as you want. On Wed, Aug 5, 2015 at 8:12 PM, ÐΞ€ρ@Ҝ (๏
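To make the difference concrete, here is a minimal sketch, assuming a local Spark installation on the classpath. The object name, the temporary file, and its two sample rows are made up for illustration and are not from the thread. It writes a small gzip file, then contrasts sc.parallelize on the path string with sc.textFile on the same path.

```scala
import java.io.{File, FileOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import java.util.zip.GZIPOutputStream

import org.apache.spark.{SparkConf, SparkContext}

object GzipReadSketch {
  // Write a tiny gzip-compressed text file so the sketch is self-contained.
  // The ".gz" suffix matters: the compression codec is picked by file extension.
  def writeSampleGzip(): String = {
    val file = File.createTempFile("summary-sample", ".gz")
    file.deleteOnExit()
    val out = new OutputStreamWriter(
      new GZIPOutputStream(new FileOutputStream(file)), StandardCharsets.UTF_8)
    try out.write("2015-07-27,12459\n2015-07-28,12460\n") finally out.close()
    file.toURI.toString // a file: URI, so the local filesystem is used
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("gzip-read-sketch").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      val path = writeSampleGzip()

      // parallelize distributes the collection it is given. A String is a
      // sequence of characters, so this yields an RDD[Char], not file contents.
      val chars = sc.parallelize(path.toSeq)
      println(s"parallelize: ${chars.count()} elements, one per character of the path")

      // textFile opens the path and returns one element per line,
      // decompressing the .gz file on the way.
      val lines = sc.textFile(path)
      println(s"textFile: ${lines.count()} lines")
      lines.collect().foreach(println)
    } finally {
      sc.stop()
    }
  }
}
```

One thing to keep in mind: a gzip file is not splittable, so sc.textFile reads each .gz file in a single partition. Calling repartition afterwards may help if the file is large.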