I haven't worked with Datasets, but would this help?
https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd
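For what it's worth, the core of that linked answer is just an implicit conversion. A minimal sketch, assuming a local SparkSession and an illustrative case class (the names `Record`, `toDataset` are mine, not from the thread):

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Illustrative row type; the thread does not say what the RDD holds.
case class Record(id: Int, value: String)

object RddToDatasetSketch {
  // Converts an RDD built from `rows` into a typed Dataset via .toDS().
  def toDataset(spark: SparkSession, rows: Seq[Record]): Dataset[Record] = {
    import spark.implicits._ // provides the Encoder and the .toDS() conversion
    spark.sparkContext.parallelize(rows).toDS()
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rdd-to-ds").getOrCreate()
    val ds = toDataset(spark, Seq(Record(1, "a"), Record(2, "b")))
    ds.show()
    spark.stop()
  }
}
```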
On Jun 23, 2017 5:43 PM, "Keith Chapman" wrote:
> Hi,
>
> I have code that does the following using RDDs,
>
> val outputPartitionCount = 300
> val part = ne[...]

[...]e. Sorry I can't be helpful,
> hopefully someone else will be able to explain exactly how this works.
>
--
Saliya Ekanayake, Ph.D
Applied Computer Scientist
Network Dynamics and Simulation Science Laboratory (NDSSL)
Virginia Tech, Blacksburg
[...] in any implementation based on Spark DataFrame.
>
>
> If you are using the "spark.ml" package, then most ML libraries in it are
> based on DataFrame. So you should set "spark.sql.shuffle.partitions"
> instead of "spark.default.parallelism".
>
>
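Concretely, the two settings live in different places. A minimal sketch, assuming a local SparkSession (the value 300 echoes the outputPartitionCount from the earlier mail):

```scala
import org.apache.spark.sql.SparkSession

object ShufflePartitionsSketch {
  // spark.sql.shuffle.partitions controls DataFrame/Dataset shuffles
  // (joins, groupBy); RDD operations fall back to spark.default.parallelism
  // unless a partition count is passed to the operation explicitly.
  def build(): SparkSession =
    SparkSession.builder()
      .master("local[*]")
      .appName("shuffle-partitions")
      .config("spark.sql.shuffle.partitions", "300")
      .getOrCreate()
}
```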
> Yon[...]
>
> On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake
> wrote:
> Thank you for the quick response. No, this is not Spark SQL. I am running
> the built-in PageRank.
>
> On Wed,[...]

Thank you for the quick response. No, this is not Spark SQL. I am running
the built-in PageRank.
On Wed, Jan 18, 2017 at 10:33 AM, wrote:
> Are you talking here of Spark SQL ?
>
> If yes, spark.sql.shuffle.partitions needs to be changed.
>
>
>
> *From:* Saliya E[...]

[...]deterministic way?
Thank you,
Saliya
Just realized the attached file has its text formatting wrong. The GitHub link
to the file is:
https://github.com/esaliya/graphxprimer/blob/master/src/main/scala-2.10/org/saliya/graphxprimer/PregelExample2.scala
On Tue, Nov 22, 2016 at 3:08 PM, Saliya Ekanayake wrote:
> Hi,
>
> I've c[...]

[...]'t clone, Spark would send the same array that it got
after the initial call.
Is there a way to turn off this caching effect?
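I don't know of a switch for this, but the usual workaround is to defensively copy the array before keeping a reference to it. The aliasing itself is plain JVM behavior, so it can be illustrated without Spark (a sketch; `withClone`/`withoutClone` are my names):

```scala
// Plain-Scala illustration of the aliasing problem: if the receiver does
// not clone, later mutations of the shared array are visible through every
// reference to it. In Spark the same effect shows up when a deserialized
// object is reused across calls instead of being rebuilt fresh.
object CloneSketch {
  def withoutClone(shared: Array[Int]): Array[Int] = shared         // keeps an alias
  def withClone(shared: Array[Int]): Array[Int]   = shared.clone()  // independent copy

  def main(args: Array[String]): Unit = {
    val shared  = Array(1, 2, 3)
    val aliased = withoutClone(shared)
    val copied  = withClone(shared)
    shared(0) = 99
    println(aliased(0)) // 99: the alias sees the mutation
    println(copied(0))  // 1: the clone is unaffected
  }
}
```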
Thank you,
Saliya
PregelEx[...]
Hi,
I have created a property graph using GraphX. Each vertex has an integer
array as a property. I'd like to update the values of these arrays without
creating new graph objects.
Is this possible in Spark?
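As far as I know, GraphX graphs are immutable, so the usual pattern is to derive a new graph with `mapVertices` rather than mutate in place. A sketch, assuming spark-graphx is on the classpath (the vertex data and names are illustrative):

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.sql.SparkSession

object UpdateVertexArraysSketch {
  // Produces a new graph whose vertex arrays are element-wise incremented.
  // mapVertices cannot mutate in place: it builds new vertex attributes,
  // which is why "updating without new graph objects" is not really possible.
  def incrementAll(graph: Graph[Array[Int], Int]): Graph[Array[Int], Int] =
    graph.mapVertices((_, arr) => arr.map(_ + 1))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("graphx-update").getOrCreate()
    val sc = spark.sparkContext
    val vertices = sc.parallelize(Seq((1L, Array(0, 0)), (2L, Array(5, 5))))
    val edges = sc.parallelize(Seq(Edge(1L, 2L, 1)))
    val updated = incrementAll(Graph(vertices, edges))
    updated.vertices.collect().foreach { case (id, arr) => println(s"$id -> ${arr.mkString(",")}") }
    spark.stop()
  }
}
```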
Thank you,
Saliya
[...]lowing similar partitioning on
> both RDDs
>
> On Wed, Sep 14, 2016 at 2:00 PM, Saliya Ekanayake
> wrote:
>
>> Thank you, but isn't that join going to be too expensive for this?
>>
>> On Tue, Sep 13, 2016 at 11:55 PM, ayan guha wrote:
>>
>>>
[...](filename, filecontent).
> 3. Join RDD1 and 2 based on some file name (or some other key).
>
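The join-based approach above can be sketched roughly like this, assuming the files are small enough for wholeTextFiles and that each record's first column names its file (paths, parsing, and identifiers are illustrative, not from the thread):

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SparkSession

object JoinRecordsWithFilesSketch {
  // Step 3: join (fileName, record) against (fileName, fileContent).
  def joinOnName(records: RDD[(String, String)],
                 files: RDD[(String, String)]): RDD[(String, (String, String))] =
    records.join(files)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("join-files").getOrCreate()
    val sc = spark.sparkContext

    // Step 1: RDD1 as (fileName, record), keyed by the record's first column.
    val records = sc.textFile("hdfs:///records.txt")
      .map { line => (line.split(",")(0), line) }

    // Step 2: RDD2 as (fileName, fileContent) over the many small files.
    val files = sc.wholeTextFiles("hdfs:///data/")
      .map { case (path, content) => (path.split("/").last, content) }

    joinOnName(records, files).take(5).foreach(println)
    spark.stop()
  }
}
```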
> On Wed, Sep 14, 2016 at 1:41 PM, Saliya Ekanayake
> wrote:
>
>> 1.) What needs to be parallelized is the work for each of those 6M rows,
>> not the 80K files. Let me elaborate thi[...]

[...]ile has 6M rows, but the total number of files is ~80K. Is
> there a scenario where there may not be a file in HDFS corresponding to the
> row in the first text file?
> 3. May be a follow up of 1, what is your end goal?
>
> On Wed, Sep 14, 2016 at 12:17 PM, Saliya Ekanayake
> wrote:
>
>
> On 13 Sep 2016 11:39 p.m., "Saliya Ekanayake" wrote:
>
>> Just wonder if this is possible with Spark?
>>
>> On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake
>> wrote:
>>
>>> Hi,
>>>
>>> I've got a text file where each line[...]
Just wonder if this is possible with Spark?
On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake
wrote:
> Hi,
>
> I've got a text file where each line is a record. For each record, I need
> to process a file in HDFS.
>
> So if I represent these records as an RDD and invo[...]

[...]ere a better solution to that?
Thank you,
Saliya
--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington