SparkR driver side JNI

2015-08-06 Thread Renyi Xiong
Why did SparkR eventually choose an inter-process socket solution on the driver side, instead of the in-process JNI approach shown in one of its docs below (around page 20)? https://spark-summit.org/wp-content/uploads/2014/07/SparkR-Interactive-R-Programs-at-Scale-Shivaram-Vankataraman-Zongheng-Yang.pdf

Re: SparkR driver side JNI

2015-09-11 Thread Renyi Xiong
> Yeah in addition to the downside of having 2 JVMs the command line
> arguments and SparkConf etc. will be set by spark-submit in the first
> JVM which won't be available in the second JVM.
>
> Shivaram
>
> On Thu, Sep 10, 2015 at 5:18 PM, Renyi Xiong wrote: …

Re: SparkR driver side JNI

2015-09-11 Thread Renyi Xiong
…through JNI?
>
> On Thu, Sep 10, 2015 at 9:29 PM, Shivaram Venkataraman wrote:
>> Yeah in addition to the downside of having 2 JVMs the command line
>> arguments and SparkConf etc. will be set by spark-submit in the first …

pyspark streaming DStream compute

2015-09-15 Thread Renyi Xiong
Can anybody help me understand why pyspark streaming uses a py4j callback to execute Python code, while pyspark batch uses worker.py? Regarding pyspark streaming, is the py4j callback only used for DStream, with worker.py still used for RDD? Thanks, Renyi.

SparkR streaming source code

2015-09-16 Thread Renyi Xiong
SparkR streaming is mentioned at about page 17 in the pdf below; can anyone share the source code? (I could not find it on GitHub.) https://spark-summit.org/2015-east/wp-content/uploads/2015/03/SSE15-19-Hao-Lin-Haichuan-Wang.pdf Thanks, Renyi.

Re: SparkR streaming source code

2015-09-16 Thread Renyi Xiong
Reynold Xin wrote:
> You should reach out to the speakers directly.
>
> On Wed, Sep 16, 2015 at 9:52 AM, Renyi Xiong wrote:
>> SparkR streaming is mentioned at about page 17 in below pdf, can anyone
>> share source code? (could not find it on GitHub)

Spark streaming DStream state on worker

2015-09-16 Thread Renyi Xiong
Hi, I want to do a temporal join operation on a DStream across RDDs. My question is: are RDDs from the same DStream always computed on the same worker (except for failover)? Thanks, Renyi.
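
Whatever the locality answer turns out to be, the portable way to join across RDDs of a DStream is to key the state and carry it through the stream, rather than rely on RDDs landing on the same worker. Below is a minimal pure-Python model of updateStateByKey-style semantics; all names are illustrative, none of this is Spark API.

```python
def update_state(batches, update_func):
    """Fold a sequence of keyed batches into per-key state, the way
    updateStateByKey folds successive micro-batches."""
    state = {}
    for batch in batches:
        # group the new values by key, as Spark would per partition
        grouped = {}
        for key, value in batch:
            grouped.setdefault(key, []).append(value)
        # apply the user update function to (new_values, old_state)
        for key, values in grouped.items():
            state[key] = update_func(values, state.get(key))
    return state

# running count per key across three "micro-batches"
batches = [
    [("a", 1), ("b", 1)],
    [("a", 1)],
    [("a", 1), ("b", 1)],
]
counts = update_state(batches, lambda vals, old: (old or 0) + sum(vals))
print(counts)  # {'a': 3, 'b': 2}
```

Because the state travels with the keyed data, the scheduler is free to place each batch's tasks on any worker.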

failed to run spark sample on windows

2015-09-28 Thread Renyi Xiong
I tried to run the HdfsTest sample on windows with spark-1.4.0: bin\run-example org.apache.spark.examples.HdfsTest, but got the exception below. Anybody have an idea what went wrong here? 15/09/28 16:33:56.565 ERROR SparkContext: Error initializing SparkContext. java.lang.NullPointerException at java.lang…

Re: failed to run spark sample on windows

2015-09-29 Thread Renyi Xiong
…with the one which was used to build Spark 1.4.0?
>
> Cheers
>
> On Mon, Sep 28, 2015 at 4:36 PM, Renyi Xiong wrote:
>> I tried to run HdfsTest sample on windows spark-1.4.0
>>
>> bin\run-example org.apache.spark.examples.HdfsTest
>>
>> but got …

Re: failed to run spark sample on windows

2015-09-30 Thread Renyi Xiong
Thanks a lot, it works now after I set %HADOOP_HOME%. On Tue, Sep 29, 2015 at 1:22 PM, saurfang wrote:
> See
> http://stackoverflow.com/questions/26516865/is-it-possible-to-run-hadoop-jobs-like-the-wordcount-sample-in-the-local-mode ,
> https://issues.apache.org/jira/browse/SPARK-6961 and fin…
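
For reference, the fix tracked in SPARK-6961 is that on Windows HADOOP_HOME must point at a directory containing bin\winutils.exe, otherwise SparkContext initialization dies with a NullPointerException. Here is a small hypothetical helper (not part of Spark) that sanity-checks that setup before launching a sample:

```python
import os

def check_windows_hadoop_setup():
    """Check the Windows prerequisites from SPARK-6961: HADOOP_HOME
    must be set and contain bin\\winutils.exe."""
    hadoop_home = os.environ.get("HADOOP_HOME")
    if not hadoop_home:
        return "HADOOP_HOME is not set"
    winutils = os.path.join(hadoop_home, "bin", "winutils.exe")
    if not os.path.exists(winutils):
        return "winutils.exe missing under HADOOP_HOME/bin"
    return "ok"
```

Running such a check before bin\run-example turns the opaque NullPointerException into an actionable message.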

SparkR dataframe UDF

2015-10-02 Thread Renyi Xiong
Hi Shiva, is the DataFrame UDF implemented in SparkR yet? I could not find it in the URL below: https://github.com/hlin09/spark/tree/SparkR-streaming/R/pkg/R Thanks, Renyi.

Re: failure notice

2015-10-05 Thread Renyi Xiong
Date: Mon, 28 Sep 2015 16:31:42 -0700
Subject: Re: Spark streaming DStream state on worker
From: Renyi Xiong

Re: failure notice

2015-10-06 Thread Renyi Xiong
…should not be any node-specific long term state that
> you rely on unless you can recover that state on a different node.
>
> On Mon, Oct 5, 2015 at 3:03 PM, Renyi Xiong wrote:
>> if RDDs from same DStream not guaranteed to run on same worker, then the
>> question becom…

Re: [Streaming] join events in last 10 minutes

2015-10-14 Thread Renyi Xiong
Hi TD, the scenario here is to let events from topic1 wait a fixed 10 minutes for events with the same key from topic2 to arrive, and then left outer join them by the key. Does the query do what is expected? If not, what is the right way to achieve this? Thanks, Renyi. On Tue, Oct 13, 2015 at 5:14 PM, Danie…
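
To frame what "expected" means here, the intended result can be modelled in plain Python: every topic1 event is emitted exactly once, paired with any same-key topic2 events that arrived within the wait window, or with None otherwise. This is only a batch-level sketch of the desired join semantics, not streaming code:

```python
def delayed_left_outer_join(topic1, topic2):
    """Left outer join topic1 events against the topic2 events seen
    during the wait window, by key. topic1/topic2 are lists of
    (key, value) pairs; unmatched topic1 events keep None."""
    lookup = {}
    for key, value in topic2:
        lookup.setdefault(key, []).append(value)
    joined = []
    for key, value in topic1:
        if key in lookup:
            joined.extend((key, (value, v2)) for v2 in lookup[key])
        else:
            joined.append((key, (value, None)))
    return joined

result = delayed_left_outer_join(
    [("k1", "a"), ("k2", "b")],   # topic1 events held for the window
    [("k1", "x")],                # topic2 events seen in the window
)
print(result)  # [('k1', ('a', 'x')), ('k2', ('b', None))]
```

In a streaming query, the fixed 10-minute wait corresponds to how long topic1 events are buffered before the join is evaluated.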

let spark streaming sample come to stop

2015-11-13 Thread Renyi Xiong
Hi, I tried to run the following 1.4.1 sample by putting a words.txt under localdir:

bin\run-example org.apache.spark.examples.streaming.HdfsWordCount localdir

Two questions:
1. It does not pick up words.txt, because it's 'old' I guess. Is there any option to let it be picked up?
2. I managed to put a 'new' file …

DStream not initialized SparkException

2015-12-09 Thread Renyi Xiong
Hi, I hit the following exception when the driver program tried to recover from a checkpoint; it looks like the logic relies on zeroTime being set, which doesn't seem to happen here. Am I missing anything, or is it a bug in 1.4.1? org.apache.spark.SparkException: org.apache.spark.streaming.api.csharp.CSharp…

Re: let spark streaming sample come to stop

2015-12-09 Thread Renyi Xiong
…push them into a dstream. It is meant to be run
> indefinitely, unless interrupted by ctrl-c, for example.
>
> -bryan
>
> On Nov 13, 2015 10:52 AM, "Renyi Xiong" wrote:
>> Hi,
>>
>> I try to run the following 1.4.1 s…

Re: DStream not initialized SparkException

2015-12-09 Thread Renyi Xiong
bmit(SparkSubmit.scala:193) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) On Wed, Dec 9, 2015 at 12:45 PM, Renyi Xiong wrote: > hi, > > I met following exception when the driver program tried

Re: DStream not initialized SparkException

2015-12-09 Thread Renyi Xiong
Never mind, one of my peers corrected the driver program for me: all dstream operations need to be within the scope of the getOrCreate API. On Wed, Dec 9, 2015 at 3:32 PM, Renyi Xiong wrote:
> following scala program throws same exception, I know people are running
> streaming jobs against ka…

pyspark streaming 1.6 mapWithState?

2015-12-21 Thread Renyi Xiong
Hi TD, I noticed mapWithState is available in Spark 1.6. Is there any plan to enable it in pyspark as well? Thanks, Renyi.
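
For readers unfamiliar with the API: unlike updateStateByKey, mapWithState emits a mapped record per input while updating per-key state on the side. A pure-Python model of those semantics (illustrative only, not the pyspark API):

```python
def map_with_state(batches, func):
    """func(key, value, old_state) -> (mapped_record, new_state).
    Returns the mapped records per batch plus the final state map."""
    state = {}
    out = []
    for batch in batches:
        mapped = []
        for key, value in batch:
            record, new_state = func(key, value, state.get(key))
            state[key] = new_state
            mapped.append(record)
        out.append(mapped)
    return out, state

# emit (key, running_total) for every input record
def running_total(key, value, old):
    total = (old or 0) + value
    return (key, total), total

out, state = map_with_state([[("a", 1), ("b", 2)], [("a", 3)]], running_total)
print(out)    # [[('a', 1), ('b', 2)], [('a', 4)]]
print(state)  # {'a': 4, 'b': 2}
```

The per-record output stream is what distinguishes it from updateStateByKey, which only produces the state itself.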

Spark Streaming KafkaUtils missing Save API?

2016-01-15 Thread Renyi Xiong
Hi, we noticed there is no save method in KafkaUtils, yet we do have scenarios where we want to save an RDD back to a Kafka queue to be consumed by downstream streaming applications. I wonder if this is a common scenario; if yes, is there any plan to add it? Thanks, Renyi.
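
Until such an API exists, the usual workaround is rdd.foreachPartition with one producer per partition. A sketch of what a hypothetical per-partition save would do; the producer here is a stand-in object, not a real Kafka client:

```python
class FakeProducer:
    """Stand-in for a Kafka producer client (hypothetical interface)."""
    def __init__(self):
        self.sent = []
        self.flushed = False

    def send(self, topic, key, value):
        self.sent.append((topic, key, value))

    def flush(self):
        self.flushed = True

def save_partition_to_kafka(partition, producer, topic):
    """Send every (key, value) record of one partition through a
    single reused producer, then flush so buffered records survive."""
    count = 0
    for key, value in partition:
        producer.send(topic, key, value)
        count += 1
    producer.flush()
    return count

producer = FakeProducer()
n = save_partition_to_kafka([("k1", "v1"), ("k2", "v2")], producer, "out")
print(n)  # 2
```

Reusing one producer per partition (rather than per record) is the part that matters for throughput.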

pyspark worker concurrency

2016-02-06 Thread Renyi Xiong
Hi, is it a good idea to have 2 threads in the pyspark worker: a main thread responsible for receiving and sending data over the socket, while the other thread calls user functions to process the data? Since the CPU is idle (?) during network I/O, this should improve concurrency quite a bit. Can an expert answer t…
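
The proposed two-thread layout can be sketched in plain Python: a reader thread simulates blocking socket reads into a bounded queue while the main thread applies the user function, so CPU work overlaps I/O wait. This is only an illustration of the idea, not pyspark's actual worker:

```python
import queue
import threading

def pipelined_worker(records, process, slow_read):
    q = queue.Queue(maxsize=16)  # bounded, to apply backpressure
    sentinel = object()

    def reader():
        for record in records:
            slow_read()          # stand-in for a blocking socket recv
            q.put(record)
        q.put(sentinel)          # signal end of stream

    t = threading.Thread(target=reader)
    t.start()
    out = []
    while True:
        record = q.get()
        if record is sentinel:
            break
        out.append(process(record))  # user function overlaps I/O
    t.join()
    return out

result = pipelined_worker(range(5), lambda x: x * x, lambda: None)
print(result)  # [0, 1, 4, 9, 16]
```

With CPython's GIL the win comes specifically from overlapping I/O wait with computation; pure CPU work on two threads would not run in parallel.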

Re: pyspark worker concurrency

2016-02-08 Thread Renyi Xiong
Never mind, I think pyspark already does async socket read/write, but on the scala side in PythonRDD.scala. On Sat, Feb 6, 2016 at 6:27 PM, Renyi Xiong wrote:
> Hi,
>
> is it a good idea to have 2 threads in pyspark worker? - main thread
> responsible for receive and send data …

DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition

2016-03-10 Thread Renyi Xiong
Hi TD, Thanks a lot for offering to look at our PR (if we file one) at the conference in NYC. As we briefly discussed the issues of unbalanced and under-distributed Kafka partitions when developing Spark streaming applications in Mobius (C# for Spark), we're trying the option of repartitioning within…

Re: DynamicPartitionKafkaRDD - 1:n mapping between kafka and RDD partition

2016-03-14 Thread Renyi Xiong
…a lot of potential change already underway.
>
> See
>
> https://issues.apache.org/jira/browse/SPARK-12177
>
> On Thu, Mar 10, 2016 at 1:59 PM, Renyi Xiong wrote:
>> Hi TD,
>>
>> Thanks a lot for offering to look at our PR (if we file one) at the
>> conf…

Declare rest of @Experimental items non-experimental if they've existed since 1.2.0

2016-04-01 Thread Renyi Xiong
Hi Sean, we're upgrading Mobius (the C# binding for Spark) at Microsoft to align with Spark 1.6.2 and noticed some API changes you made in https://github.com/apache/spark/commit/6f81eae24f83df51a99d4bb2629dd7daadc01519, mostly on APIs with the Approx postfix (still marked as experimental in pyspark t…

RE: Declare rest of @Experimental items non-experimental if they've existed since 1.2.0

2016-04-01 Thread Renyi Xiong
Thanks a lot, Sean, really appreciate your comments. Sent from my Windows 10 phone.
From: Sean Owen
Sent: Friday, April 1, 2016 12:55 PM
To: Renyi Xiong
Cc: Tathagata Das; dev
Subject: Re: Declare rest of @Experimental items non-experimental if they've existed since 1.2.0
The change ther…

Spark Streaming UI reporting a different task duration

2016-04-05 Thread Renyi Xiong
Hi TD, we noticed that the Spark Streaming UI reports a different task duration from time to time. For example, here is the standard output of the application, which reports the duration of the longest task as about 3.3 minutes: 16/04/01 16:07:19 INFO TaskSetManager: Finished task 1077.0 in stage 0.0 (TI…

Spark streaming Kafka receiver WriteAheadLog question

2016-04-22 Thread Renyi Xiong
Hi, is it possible for the Kafka receiver generated WriteAheadLogBackedBlockRDD to hold the corresponding Kafka offset range, so that during recovery the RDD can refer back to the Kafka queue instead of paying the cost of the write ahead log? I guess there must be a reason here. Could anyone please help me underst…

Spark streaming concurrent job scheduling question

2016-04-28 Thread Renyi Xiong
Hi, I am trying to run an I/O-intensive RDD in parallel with a CPU-intensive RDD within an application, through a window like below: var ssc = new StreamingContext(sc, 1min); var ds1 = ...; var ds2 = ds1.Window(2min).ForeachRDD(...); ds1.ForeachRDD(...); I hope ds1 starts its job at a 1-min interval ev…

persist versus checkpoint

2016-04-30 Thread Renyi Xiong
Hi, is RDD.persist equivalent to RDD.checkpoint if they save the same number of copies (say 3) to disk? (I assume persist saves copies on different machines?) Thanks, Renyi.
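
They are not equivalent even with the same replication: persist (including replicated storage levels such as MEMORY_AND_DISK_2) stores blocks on executors and keeps the full lineage, so lost blocks are recomputed from the source, while checkpoint writes to reliable storage and truncates the lineage. A toy model of that difference, with RDDs as dicts carrying a "parent" pointer (illustrative only):

```python
def lineage_depth(rdd):
    """Count how many ancestors a (toy) RDD would have to recompute."""
    depth = 0
    while rdd["parent"] is not None:
        rdd = rdd["parent"]
        depth += 1
    return depth

source = {"parent": None}
mapped = {"parent": source}
filtered = {"parent": mapped}

persisted = filtered             # persist: data cached, lineage intact
checkpointed = {"parent": None}  # checkpoint: lineage cut at this RDD

print(lineage_depth(persisted), lineage_depth(checkpointed))  # 2 0
```

The truncated lineage is what lets a checkpointed RDD survive scenarios where recomputation is impossible, such as recovery of a streaming driver.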

MetadataFetchFailedException if executorLost when spark.speculation enabled ?

2016-05-01 Thread Renyi Xiong
Hi, we observed MetadataFetchFailedException during executorLost when spark.speculation is enabled. It looks like something is out of sync between the original task on the failed executor and its speculative counterpart? Is this a known issue? Please let me know if you need more details. Thanks, Renyi.

Re: Spark streaming Kafka receiver WriteAheadLog question

2016-05-02 Thread Renyi Xiong
> …inconsistencies between data reliably received by Spark Streaming and
> offsets tracked by Zookeeper.
>
> thanks
> Mario
>
> Renyi Xiong wrote on 01/05/2016 03:34:51 am: Hi, Thanks a lot, Cody and Mario, for your comments.

Spark shuffling OutOfMemoryError Java heap space

2016-05-15 Thread Renyi Xiong
Hi, I am consistently observing a driver OutOfMemoryError (Java heap space) during a shuffle operation, indicated by this log line: 16/05/14 21:57:03 INFO MapOutputTrackerMaster: Size of output statuses for shuffle 2 is 36060250 bytes -> the shuffle metadata size is big and the full metadata will be sent…