Re: Is there an api in Dataset/Dataframe that does repartitionAndSortWithinPartitions?

2017-06-24 Thread Saliya Ekanayake
I haven't worked with datasets but would this help https://stackoverflow.com/questions/37513667/how-to-create-a-spark-dataset-from-an-rdd ? On Jun 23, 2017 5:43 PM, "Keith Chapman" wrote: > Hi, > > I have code that does the following using RDDs, > > val outputPartitionCount = 300 > val part = ne

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
e. Sorry I can't be helpful, > hopefully someone else will be able to explain exactly how this works. > -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network Dynamics and Simulation Science Laboratory (NDSSL) Virginia Tech, Blacksburg

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
in any implementation based on Spark DataFrame. > > > If you are using the "spark.ml" package, then most ML libraries in it are > based on DataFrame. So you shouldn't use "spark.default.parallelism"; > use "spark.sql.shuffle.partitions" instead. > > > Yon

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
On Wed, 18 Jan, 2017 at 10:16 pm, Saliya Ekanayake > wrote: > Thank you for the quick response. No, this is not Spark SQL. I am running > the built-in PageRank. > > On Wed,

Re: Spark #cores

2017-01-18 Thread Saliya Ekanayake
Thank you for the quick response. No, this is not Spark SQL. I am running the built-in PageRank. On Wed, Jan 18, 2017 at 10:33 AM, wrote: > Are you talking here of Spark SQL? > > If yes, spark.sql.shuffle.partitions needs to be changed. > > > > *From:* Saliya E

Spark #cores

2017-01-18 Thread Saliya Ekanayake
terministic way? Thank you, Saliya -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network Dynamics and Simulation Science Laboratory (NDSSL) Virginia Tech, Blacksburg

Re: Pregel Question

2016-11-22 Thread Saliya Ekanayake
Just realized the attached file has text formatting wrong. The github link to the file is https://github.com/esaliya/graphxprimer/blob/master/src/main/scala-2.10/org/saliya/graphxprimer/PregelExample2.scala On Tue, Nov 22, 2016 at 3:08 PM, Saliya Ekanayake wrote: > Hi, > > I've c

Pregel Question

2016-11-22 Thread Saliya Ekanayake
't clone Spark would send the same array that it got after the initial call. Is there a way to turn off this caching effect? Thank you, Saliya -- Saliya Ekanayake, Ph.D Applied Computer Scientist Network Dynamics and Simulation Science Laboratory (NDSSL) Virginia Tech, Blacksburg PregelEx

GraphX updating vertex property

2016-11-15 Thread Saliya Ekanayake
Hi, I have created a property graph using GraphX. Each vertex has an integer array as a property. I'd like to update the values of these arrays without creating new graph objects. Is this possible in Spark? Thank you, Saliya -- Saliya Ekanayake, Ph.D Applied Computer Scientist Ne

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
lowing similar partitioning on > both RDDs > > On Wed, Sep 14, 2016 at 2:00 PM, Saliya Ekanayake > wrote: > >> Thank you, but isn't that join going to be too expensive for this? >> >> On Tue, Sep 13, 2016 at 11:55 PM, ayan guha wrote: >> >>>

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
filename,filecontent). > 3. Join RDD1 and 2 based on some file name (or some other key). > > On Wed, Sep 14, 2016 at 1:41 PM, Saliya Ekanayake > wrote: > >> 1.) What needs to be parallelized is the work for each of those 6M rows, >> not the 80K files. Let me elaborate thi

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
ile has 6M rows, but the total number of files is ~80K. Is > there a scenario where there may not be a file in HDFS corresponding to the > row in the first text file? > 3. Maybe a follow-up to 1: what is your end goal? > > On Wed, Sep 14, 2016 at 12:17 PM, Saliya Ekanayake > wrote: > >

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
On 13 Sep 2016 11:39 p.m., "Saliya Ekanayake" wrote: > >> Just wonder if this is possible with Spark? >> >> On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake >> wrote: >> >>> Hi, >>> >>> I've got a text file where each line

Re: Access HDFS within Spark Map Operation

2016-09-13 Thread Saliya Ekanayake
Just wonder if this is possible with Spark? On Mon, Sep 12, 2016 at 12:14 AM, Saliya Ekanayake wrote: > Hi, > > I've got a text file where each line is a record. For each record, I need > to process a file in HDFS. > > So if I represent these records as an RDD and invo

Access HDFS within Spark Map Operation

2016-09-11 Thread Saliya Ekanayake
ere a better solution to that? Thank you, Saliya -- Saliya Ekanayake Ph.D. Candidate | Research Assistant School of Informatics and Computing | Digital Science Center Indiana University, Bloomington