ta-support-in-apache-spark/
>
>
>
>
> https://spark.apache.org/docs/2.3.0/api/scala/index.html#org.apache.spark.ml.image.ImageSchema$
>
>
>
> There’s also a Spark package for Spark versions older than 2.3:
>
> https://github.com/Microsoft/spark-images
>
>
>
>
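For reference, a minimal sketch of the Spark 2.3 ImageSchema API linked above (the local run mode and image path are illustrative assumptions, not from the thread):

```scala
import org.apache.spark.ml.image.ImageSchema
import org.apache.spark.sql.SparkSession

object ReadImagesExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("read-images")
      .master("local[*]")   // assumption: local run for illustration
      .getOrCreate()

    // ImageSchema.readImages loads a directory of images into a DataFrame
    // with a single "image" struct column (origin, height, width, data, ...).
    val df = ImageSchema.readImages("/tmp/images")   // hypothetical path
    df.printSchema()

    spark.stop()
  }
}
```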
Hello experts,
I have a quick question: which API allows me to read image files or binary
files (for SparkSession.readStream) from a local/Hadoop file system in
Spark 2.3?
I have been browsing the following documentation and googling for it, and
didn't find a good example or documentation:
https://s
ocket for local communication or just directly read a part
> from the other's JVM shuffle file. But yes, it's not available in Spark out of
> the box.
>
> Thanks,
> Peter Rudenko
>
> On Fri, 19 Oct 2018 at 16:54, Peter Liu wrote:
>
>> Hi Peter,
>>
>
should get better
> performance.
>
> Thanks,
> Peter Rudenko
>
> On Thu, 18 Oct 2018 at 18:07, Peter Liu wrote:
>
>> I would be very interested in the initial question here:
>>
>> is there a production level implementation for memory only shuffle and
>> configur
I would be very interested in the initial question here:
is there a production-level implementation for memory-only shuffle,
configurable (similar to the MEMORY_ONLY and MEMORY_AND_DISK
storage levels), as mentioned in this ticket:
https://github.com/apache/spark/pull/5403 ?
It would be
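For context, the storage levels the question draws its analogy from are chosen per RDD via persist; a minimal sketch (app name and data are illustrative):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.storage.StorageLevel

object StorageLevelExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("storage-levels")
      .master("local[*]")   // assumption: local run for illustration
      .getOrCreate()

    val rdd = spark.sparkContext.parallelize(1 to 1000)

    // MEMORY_ONLY: keep partitions in memory; anything that doesn't fit
    // is recomputed from lineage when needed.
    rdd.persist(StorageLevel.MEMORY_ONLY)
    // MEMORY_AND_DISK: keep in memory, spill overflow partitions to disk.
    // (An RDD can only have one storage level, hence the comment.)
    // rdd.persist(StorageLevel.MEMORY_AND_DISK)

    println(rdd.count())
    spark.stop()
  }
}
```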
Hi there,
is there any best-practice guideline on YARN resource overcommit with CPU /
vcores, such as YARN config options, candidate cases ideal for
overcommitting vcores, etc.?
the slide deck below (from 2016) seems to address the memory overcommit topic
and hints at a "future" topic on CPU overcommit:
ht
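One knob commonly involved in vcore overcommit is the NodeManager's advertised vcore count: setting it above the number of physical cores overcommits CPU. A hedged yarn-site.xml sketch (the values are illustrative, not a recommendation):

```xml
<!-- yarn-site.xml (illustrative values) -->
<property>
  <!-- Advertise more vcores than physical cores to overcommit CPU,
       e.g. 32 vcores on a 16-core box = 2x overcommit. -->
  <name>yarn.nodemanager.resource.cpu-vcores</name>
  <value>32</value>
</property>
```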
. This is why it's important that
> your throughput is higher than your input rate. If it's not, batches will
> become bigger and bigger and take longer and longer until the application
> fails
>
>
>
> On Thu, Aug 2, 2018 at 2:43 PM Peter Liu wrote:
>
>> He
Hello there,
I'm new to Spark Streaming and have trouble understanding the Spark batch
"composition" (a google search keeps giving me an older Spark Streaming
concept). I would appreciate any help and clarifications.
I'm using spark 2.2.1 for a streaming workload (see quoted code in (a)
below). The general
Hello there,
I just upgraded to Spark 2.3.1 from Spark 2.2.1, ran my streaming workload,
and got an error (java.lang.AbstractMethodError) never seen before; check
the error stack attached in (a) below.
does anyone know if Spark 2.3.1 works well with Kafka via
spark-streaming-kafka-0-10?
this link spar
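A java.lang.AbstractMethodError after an upgrade is commonly caused by mixing artifacts compiled against different Spark versions; one thing to check is that the Kafka integration artifact matches the Spark runtime version. A build.sbt sketch (versions shown only as an example):

```scala
// build.sbt -- keep the Kafka integration artifact at the same version as
// Spark itself; a mismatch (e.g. 2.2.1 jars on a 2.3.1 runtime) is a
// common source of java.lang.AbstractMethodError.
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"                 % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming"            % "2.3.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.3.1"
)
```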
Hi there,
Working on streaming processing latency based on timestamps from
Kafka, I have two quick general questions triggered by looking at the Kafka
state-change log file:
(a) the partition state change from the OfflineReplica state to the
OnlinePartition state seems to take more than 20 sec
> https://about.me/JacekLaskowski
> Mastering Spark SQL https://bit.ly/mastering-spark-sql
> Spark Structured Streaming https://bit.ly/spark-structured-streaming
> Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
> Follow me at https://twitter.com/jaceklaskowsk
Hi there,
from my apache spark streaming website (see links below),
- the batch-interval is set when a spark StreamingContext is constructed
(see example (a) quoted below)
- the StreamingContext is available in older and new Spark version
(v1.6, v2.2 to v2.3.0) (see
https://spark.
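A minimal sketch of the point above: the batch interval is fixed when the StreamingContext is constructed (the interval, app name, and socket source are illustrative assumptions):

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object BatchIntervalExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("batch-interval-example")
      .setMaster("local[2]")   // assumption: local run for illustration

    // The batch interval is a constructor argument and cannot be changed
    // afterwards: here, one micro-batch every 10 seconds.
    val ssc = new StreamingContext(conf, Seconds(10))

    val lines = ssc.socketTextStream("localhost", 9999) // illustrative source
    lines.print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```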
Hi Dhaval,
I'm using the Yarn scheduler (without the need to specify the port in the
submit). Not sure why the port issue arises here.
Gerard seems to have a good point here about having the multiple topics managed
within your application (to avoid the port issue) - not sure if you're
using Spark Streaming or Spar
Hello there,
I have a quick question regarding how to share data (a small data
collection) between a Kafka producer and consumer using Spark Streaming
(Spark 2.2):
(A)
the data published by a Kafka producer is received in order on the Kafka
consumer side (see (a) copied below).
(B)
however, col
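For the consumer side of such a setup, the spark-streaming-kafka-0-10 direct stream is the usual entry point; a minimal sketch (the broker address, group id, and topic name are assumptions):

```scala
import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.{ConsumerStrategies, KafkaUtils, LocationStrategies}

object KafkaDirectStreamExample {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("kafka-direct").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(5))

    val kafkaParams = Map[String, Object](
      "bootstrap.servers"  -> "localhost:9092",   // assumption: local broker
      "key.deserializer"   -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id"           -> "example-group",    // hypothetical group id
      "auto.offset.reset"  -> "earliest"
    )

    // Kafka preserves ordering within a partition, which is what makes the
    // in-order delivery described in (A) possible for single-partition topics.
    val stream = KafkaUtils.createDirectStream[String, String](
      ssc,
      LocationStrategies.PreferConsistent,
      ConsumerStrategies.Subscribe[String, String](Seq("topic-a"), kafkaParams)
    )
    stream.map(record => (record.key, record.value)).print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```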