Launching multiple spark jobs within a main spark job.

2016-12-20 Thread Naveen
Hi Team, Is it OK to spawn multiple Spark jobs from within a main Spark job? My main Spark job's driver, which was launched on the YARN cluster, will do some preprocessing and, based on it, needs to launch multiple Spark jobs on the YARN cluster. I'm not sure if this is the right pattern. Please share your thoughts.
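
One common reading of this pattern is to keep a single SparkContext and run several independent jobs (actions) concurrently from the driver; launching entirely separate YARN applications from inside a driver is a different exercise. Below is a minimal sketch of the in-application variant, with placeholder table names and paths:

    import scala.concurrent.{Await, Future}
    import scala.concurrent.ExecutionContext.Implicits.global
    import scala.concurrent.duration.Duration
    import org.apache.spark.sql.SparkSession

    object MultiJobDriver {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().appName("multi-job-driver").getOrCreate()

        // Preprocessing step that decides which downstream work is needed
        // (the table names and paths here are placeholders).
        val inputs = Seq("tableA", "tableB", "tableC")

        // Each action becomes its own Spark job; wrapping the actions in Futures
        // lets the scheduler run them concurrently inside this single application.
        val jobs = inputs.map { name =>
          Future { spark.read.parquet(s"/data/$name").count() }
        }

        val counts = Await.result(Future.sequence(jobs), Duration.Inf)
        inputs.zip(counts).foreach { case (name, n) => println(s"$name -> $n rows") }
        spark.stop()
      }
    }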

Re: Aggregating over sorted data

2016-12-20 Thread Liang-Chi Hsieh
Hi, Can you try the combination of `repartition` + `sortWithinPartitions` on the dataset? E.g., val df = Seq((2, "b c a"), (1, "c a b"), (3, "a c b")).toDF("number", "letters") val df2 = df.explode('letters) { case Row(letters: String) => letters.split("
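
For reference, a minimal, self-contained sketch of the suggested combination (the column names and sample data are illustrative, not taken from the original thread):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("sorted-within-partitions").getOrCreate()
    import spark.implicits._

    val df = Seq((2, "b"), (1, "c"), (1, "a"), (2, "d")).toDF("number", "letter")

    // Rows with the same "number" land in the same partition and arrive sorted,
    // so a downstream mapPartitions or aggregation sees each group's rows in order.
    val sorted = df.repartition($"number").sortWithinPartitions($"number", $"letter")
    sorted.show()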

Re: Null pointer exception with RDD while computing a method, creating dataframe.

2016-12-20 Thread Liang-Chi Hsieh
Hi, You can't invoke any RDD actions/transformations inside another transformation; they must be invoked by the driver. If I understand your purpose correctly, you can partition your data (i.e., with `partitionBy`) when writing out to Parquet files. - Liang-Chi Hsieh | @viirya Spark
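
As an illustration of the suggested approach (the path and column name are placeholders):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("partitioned-write").getOrCreate()
    val df = spark.read.parquet("/path/to/input")

    // One output directory per zip_code (e.g. .../zip_code=94105/) is written,
    // instead of filtering the DataFrame once per value inside an RDD operation.
    df.write.partitionBy("zip_code").parquet("/path/to/output")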

Null pointer exception with RDD while computing a method, creating dataframe.

2016-12-20 Thread satyajit vegesna
Hi All, PFB sample code: val df = spark.read.parquet() df.registerTempTable("df") val zip = df.select("zip_code").distinct().as[String].rdd def comp(zipcode:String):Unit={ val zipval = "SELECT * FROM df WHERE zip_code='$zipvalrepl'".replace("$zipvalrepl", zipcode) val data =
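
A hedged sketch of one way to restructure this so nothing touches the SparkSession inside an RDD operation: collect the distinct zip codes to the driver and call comp in a plain loop (the paths and the write step are assumptions for illustration):

    import org.apache.spark.sql.{DataFrame, SparkSession}

    val spark = SparkSession.builder().appName("per-zip-queries").getOrCreate()
    import spark.implicits._

    val df = spark.read.parquet("/path/to/input")   // placeholder path
    df.createOrReplaceTempView("df")

    def comp(zipcode: String): DataFrame =
      spark.sql(s"SELECT * FROM df WHERE zip_code = '$zipcode'")

    // Collecting the distinct zip codes keeps the per-zip query on the driver.
    val zips = df.select("zip_code").distinct().as[String].collect()
    zips.foreach { z =>
      comp(z).write.mode("overwrite").parquet(s"/path/to/output/zip=$z")
    }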

Re: How to deal with string column data for spark mlib?

2016-12-20 Thread Triones,Deng(vip.com)
Hi spark dev, I am using Spark 2 to write ORC files to HDFS. I have a question about the save mode. My use case is this: when I write data into HDFS and one task fails, I'd like the file that the task created to be deleted so the retry task can write all the data, that is to

question about the data frame save mode to make the data exactly one

2016-12-20 Thread Triones,Deng(vip.com)
Hi spark dev, I am using Spark 2 to write ORC files to HDFS. I have a question about the save mode. My use case is this: when I write data into HDFS and one task fails, I'd like the file that the task created to be deleted so the retry task can write all the data, that is to
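
For reference, a minimal sketch of setting a save mode on an ORC write (the paths are placeholders); note that SaveMode governs what happens when the target already exists, while cleanup of a failed task's partial output is normally handled by the output committer:

    import org.apache.spark.sql.{SaveMode, SparkSession}

    val spark = SparkSession.builder().appName("orc-write").getOrCreate()
    val df = spark.read.parquet("/path/to/input")

    // Overwrite replaces any existing data at the target path; the other modes
    // are Append, ErrorIfExists (the default) and Ignore.
    df.write.mode(SaveMode.Overwrite).orc("/path/to/output")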

Reg: Any Dev member in and around Chennai / Tamilnadu

2016-12-20 Thread Sivanesan Govindaraj
Hi Dev, Sorry to bother you with a non-technical query. I wish to connect in person with any active contributor / committer in and around Chennai / Tamil Nadu. Is there a list of all committers' details in any location? Regs, Siva.

Re: Expand the Spark SQL programming guide?

2016-12-20 Thread Ricardo Almeida
The examples look great indeed, and they seem a good addition to the existing documentation. I understand the UDAF examples don't apply to Python, but is there any particular reason to skip the Python API altogether in this window functions documentation? On 20 December 2016 at 16:56, Jim Hughes

Re: Expand the Spark SQL programming guide?

2016-12-20 Thread Jim Hughes
Hi Anton, Your example and documentation look great! I left some comments suggesting a few additions, but the PR in its current state is a great improvement! Thanks, Jim On 12/18/2016 09:09 AM, Anton Okolnychyi wrote: Any comments/suggestions are more than welcome. Thanks, Anton

Re: Kafka Spark structured streaming latency benchmark.

2016-12-20 Thread Prashant Sharma
Hi Shixiong, Thanks for taking a look. I am trying to run and see whether making the ContextCleaner run more frequently and/or making it non-blocking will help. --Prashant On Tue, Dec 20, 2016 at 4:05 AM, Shixiong(Ryan) Zhu wrote: > Hey Prashant. Thanks for your codes. I did
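
For anyone following along, these appear to be the two ContextCleaner-related settings being referred to; the values below are only examples for experimentation, not recommendations:

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("cleaner-tuning")
      // How often the driver forces a GC so that weakly-referenced shuffle and
      // broadcast state gets queued for cleanup (default 30min).
      .config("spark.cleaner.periodicGC.interval", "5min")
      // Whether cleanup calls block the cleaning thread (default true); the
      // thread above considers making this non-blocking.
      .config("spark.cleaner.referenceTracking.blocking", "false")
      .getOrCreate()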

Re: Reduce memory usage of UnsafeInMemorySorter

2016-12-20 Thread Liang-Chi Hsieh
Hi Nick, The scope of the PR I submitted was reduced because we can't be sure it is really the root cause of the error you faced; you can check out the discussion on the PR. So for now I just change the assert in the code as shown in the PR. If you can get a repro, we can go back to see if it