Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-14 Thread Kidong Lee
l be saved to iceberg table using spark session. I have tested this spark streaming job with transactional producers which send several millions of messages. They were consumed and saved to Iceberg tables correctly. - Kidong. On Sun, Apr 14, 2024 at 11:05 PM, Mich Talebzadeh wrote: > Interesting &

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-14 Thread Mich Talebzadeh
pr 2024 at 13:40, Kidong Lee wrote: > > Because spark streaming for kafka transaction does not work correctly to > suit my need, I moved to another approach using a raw kafka consumer which > handles read_committed messages from kafka correctly. > > My code looks like the followin

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-14 Thread Kidong Lee
Because spark streaming for kafka transaction does not work correctly to suit my need, I moved to another approach using a raw kafka consumer which handles read_committed messages from kafka correctly. My code looks like the following. JavaDStream stream = ssc.receiverStream(new CustomReceiver
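The isolation level at the center of this thread is set as a Kafka source option. A minimal sketch of the relevant configuration (broker address and topic are placeholders; running the commented line requires a SparkSession with the Kafka connector on the classpath):

```python
# Sketch only: options for Spark's Kafka source so the stream sees only
# committed transactional messages. Broker and topic names are placeholders.
kafka_options = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "my-topic",
    # Skip aborted transactions; read no further than the last stable offset
    "kafka.isolation.level": "read_committed",
    "startingOffsets": "earliest",
}

# With a live SparkSession this would become:
# df = spark.readStream.format("kafka").options(**kafka_options).load()
```

Consumer-level Kafka settings are passed through with the `kafka.` prefix, which is why the isolation level appears as `kafka.isolation.level`.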

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-13 Thread Kidong Lee
. On Sun, Apr 14, 2024 at 4:25 AM, Mich Talebzadeh wrote: > Hi Kidong, > > There may be a few potential reasons why the message counts from your Kafka > producer and Spark Streaming consumer might not match, especially with > transactional messages and read_committed isolation level. >

Re: Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-13 Thread Mich Talebzadeh
Hi Kidong, There may be a few potential reasons why the message counts from your Kafka producer and Spark Streaming consumer might not match, especially with transactional messages and read_committed isolation level. 1) Just ensure that both your Spark Streaming job and the Kafka consumer written

Spark streaming job for kafka transaction does not consume read_committed messages correctly.

2024-04-12 Thread Kidong Lee
Hi, I have a kafka producer which sends messages transactionally to kafka and spark streaming job which should consume read_committed messages from kafka. But there is a problem for spark streaming to consume read_committed messages. The count of messages sent by kafka producer transactionally

Kafka-based Spark Streaming and Vertex AI for Sentiment Analysis

2024-02-21 Thread Mich Talebzadeh
I am working on a pet project to implement a real-time sentiment analysis system for analyzing customer reviews. It leverages Kafka for data ingestion, Spark Structured Streaming (SSS) for real-time processing, and Vertex AI for sentiment analysis and potential action triggers. *Features* -

Re: [Streaming (DStream) ] : Does Spark Streaming support pause/resume consumption of messages from Kafka?

2023-12-01 Thread Mich Talebzadeh
OK, pause/resume throws up some challenges. The implication is to pause gracefully and resume the same. First have a look at this SPIP of mine: [SPARK-42485] SPIP: Shutting down spark structured streaming when the streaming process completed current process - ASF JIRA (apache.org)

[Streaming (DStream) ] : Does Spark Streaming support pause/resume consumption of messages from Kafka?

2023-12-01 Thread Saurabh Agrawal (180813)
Hi Spark Team, I am using Spark 3.4.0 version in my application which is used to consume messages from Kafka topics. I have the below queries: 1. Does DStream support pause/resume of streaming message consumption at runtime on a particular condition? If yes, please provide details. 2. I tried to revoke

Spark streaming sourceArchiveDir does not move file to archive directory

2023-09-19 Thread Yunus Emre Gürses
Hello everyone, I'm using Scala and Spark 3.4.1 on Windows 10. While streaming using Spark, I give the `cleanSource` option as "archive" and the `sourceArchiveDir` option as "archived" as in the code below. ``` spark.readStream .option("cleanSource", "archive")
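For reference, the two file-source cleanup options mentioned here can be sketched as plain option values (paths are placeholders; note that Spark expects the archive directory to live outside the source glob, otherwise archived files could be picked up again):

```python
# Sketch: file-source cleanup options from the message above.
stream_options = {
    "cleanSource": "archive",              # move processed files instead of deleting them
    "sourceArchiveDir": "/data/archived",  # placeholder; keep it outside the source directory
}

# With a live SparkSession this would become something like:
# df = (spark.readStream.options(**stream_options)
#       .schema(schema).csv("/data/incoming"))   # placeholder path/schema
```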

Re: [Spark streaming]: Microbatch id in logs

2023-06-26 Thread Mich Talebzadeh
In SSS writeStream. \ outputMode('append'). \ option("truncate", "false"). \ foreachBatch(SendToBigQuery). \ option('checkpointLocation', checkpoint_path). \ so this writeStream will call

[Spark streaming]: Microbatch id in logs

2023-06-25 Thread Anil Dasari
Hi, I am using the Spark 3.3.1 distribution and Spark Streaming in my application. Is there a way to add a microbatch id to all logs generated by Spark and Spark applications? Thanks.
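In Structured Streaming, the micro-batch id is handed to a `foreachBatch` callback, so one way to get it into application logs is to stamp it onto every line emitted while that batch runs. A toy sketch (the sink logic is hypothetical, not from the thread):

```python
def batch_log_prefix(batch_id: int) -> str:
    # A prefix that can be attached to every log line for one micro-batch
    return f"[microbatch={batch_id}]"

def process_batch(batch_df, batch_id):
    # foreachBatch passes (DataFrame, batch_id) for every micro-batch;
    # real code would route this through the app's logger instead of print
    print(batch_log_prefix(batch_id), "processing batch")

# With a live stream this would be wired up as:
# query = df.writeStream.foreachBatch(process_batch).start()
```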

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Mich Talebzadeh
evolving. So if anyone is interested, > please support the project. > > -- > Lingzhe Sun > Hirain Technologies > > > *From:* Mich Talebzadeh > *Date:* 2023-04-11 02:06 > *To:* Rajesh Katkar > *CC:* user > *Subject:* Re: spark streami

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread 孙令哲
Hi Rajesh, It's working fine, at least for now. But you'll need to build your own spark image using later versions. Lingzhe Sun Hirain Technologies Original: From:Rajesh Katkar Date:2023-04-12 21:36:52To:Lingzhe SunCc:Mich Talebzadeh , user Subject:Re: Re: spark streaming

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Yi Huang
;> >> >> *From:* Mich Talebzadeh >> *Date:* 2023-04-11 02:06 >> *To:* Rajesh Katkar >> *CC:* user >> *Subject:* Re: spark streaming and kinesis integration >> What I said was this >> "In so far as I know k8s does not support spark structured stre

Re: Re: spark streaming and kinesis integration

2023-04-12 Thread Rajesh Katkar
s interested, > please support the project. > > -- > Lingzhe Sun > Hirain Technologies > > > *From:* Mich Talebzadeh > *Date:* 2023-04-11 02:06 > *To:* Rajesh Katkar > *CC:* user > *Subject:* Re: spark streaming and kinesis integration

Re: Re: spark streaming and kinesis integration

2023-04-11 Thread Lingzhe Sun
for quite long time. Kind of worried that this project might finally become outdated as k8s is evolving. So if anyone is interested, please support the project. Lingzhe Sun Hirain Technologies From: Mich Talebzadeh Date: 2023-04-11 02:06 To: Rajesh Katkar CC: user Subject: Re: spark streaming

Re: spark streaming and kinesis integration

2023-04-10 Thread Mich Talebzadeh
oss, damage or destruction. > > > > > On Mon, 10 Apr 2023 at 06:31, Rajesh Katkar > wrote: > >> Do you have any link or ticket which justifies that k8s does not support >> spark streaming ? >> >> On Thu, 6 Apr, 2023, 9:15 pm Mich Talebzadeh, >>

Re: spark streaming and kinesis integration

2023-04-10 Thread Mich Talebzadeh
uch loss, damage or destruction. On Mon, 10 Apr 2023 at 06:31, Rajesh Katkar wrote: > Do you have any link or ticket which justifies that k8s does not support > spark streaming ? > > On Thu, 6 Apr, 2023, 9:15 pm Mich Talebzadeh, > wrote: > >> Do you have a high level diagram of the

Re: spark streaming and kinesis integration

2023-04-10 Thread Rajesh Katkar
Do you have any link or ticket which justifies that k8s does not support spark streaming ? On Thu, 6 Apr, 2023, 9:15 pm Mich Talebzadeh, wrote: > Do you have a high level diagram of the proposed solution? > > In so far as I know k8s does not support spark structured streaming?

Re: spark streaming and kinesis integration

2023-04-06 Thread Rajesh Katkar
Use case is, we want to read/write to kinesis streams using k8s. Officially I could not find the connector or reader for kinesis from spark like it has for kafka. Checking here if anyone has used the kinesis and spark streaming combination? On Thu, 6 Apr, 2023, 7:23 pm Mich Talebzadeh, wrote: >

RE: spark streaming and kinesis integration

2023-04-06 Thread Jonske, Kurt
kar Cc: u...@spark.incubator.apache.org Subject: Re: spark streaming and kinesis integration ⚠ [EXTERNAL EMAIL]: Use Caution Do you have a high level diagram of the proposed solution? In so far as I know k8s does not support spark structured streaming? Mich Talebzadeh, Lead Solutions

Re: spark streaming and kinesis integration

2023-04-06 Thread Mich Talebzadeh
kinesis from spark > like it has for kafka. > > Checking here if anyone used kinesis and spark streaming combination ? > > On Thu, 6 Apr, 2023, 7:23 pm Mich Talebzadeh, > wrote: > >> Hi Rajesh, >> >> What is the use case for Kinesis here? I have not used it pe

Re: spark streaming and kinesis integration

2023-04-06 Thread Mich Talebzadeh
elying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Thu, 6 Apr 2023 at 13:08, Rajesh Katkar wrote: > Hi Spark Team, > > We need to read/write the kinesis streams using

spark streaming and kinesis integration

2023-04-06 Thread Rajesh Katkar
Hi Spark Team, We need to read/write the kinesis streams using spark streaming. We checked the official documentation - https://spark.apache.org/docs/latest/streaming-kinesis-integration.html It does not mention kinesis connector. Alternative is - https://github.com/qubole/kinesis-sql which

Re: Re: should one ever make a spark streaming job in pyspark

2022-11-03 Thread Lingzhe Sun
-processing-structured-streaming.html Best Regards! ... Lingzhe Sun Hirain Technology / APIC From: Mich Talebzadeh Date: 2022-11-03 19:15 To: Joris Billen CC: User Subject: Re: should one every make a spark streaming job

Re: should one ever make a spark streaming job in pyspark

2022-11-03 Thread Mich Talebzadeh
Well your mileage varies so to speak. - Spark itself is written in Scala. However, that does not imply you should stick with Scala. - I have used both for spark streaming and spark structured streaming, they both work fine - PySpark has become popular with the widespread use

should one ever make a spark streaming job in pyspark

2022-11-02 Thread Joris Billen
Dear community, I had a general question about the use of Scala vs PySpark for spark streaming. I believe spark streaming will work most efficiently when written in Scala. I believe, however, that things can be implemented in PySpark. My question: 1) is it completely dumb to make a streaming job

[PySpark, Spark Streaming] Bug in timestamp handling in Structured Streaming?

2022-10-21 Thread kai-michael.roes...@sap.com.INVALID
Hi, I suspect I may have come across a bug in the handling of data containing timestamps in PySpark "Structured Streaming" using the foreach option. I'm "just" a user of PySpark, no Spark community member, so I don't know how to properly address the issue. I have posted a

Re: Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Sean Owen
; i...@ricobergmann.de> wrote: > Hi folks! > > > I'm trying to implement an update of a broadcast var in Spark Streaming. > The idea is that whenever some configuration value has changed (this is > periodically checked by the driver) the existing broadcast variable is > unpersiste

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup
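The unpersist-and-rebroadcast pattern described in this message can be sketched without a cluster by standing in for the SparkContext. The `Fake*` classes below are toys purely so the pattern is visible; with real Spark, `sc` would be the live SparkContext:

```python
class FakeBroadcast:
    """Toy stand-in for a Spark broadcast variable."""
    def __init__(self, value):
        self.value = value
    def unpersist(self, blocking=False):
        pass  # a real broadcast would drop its cached copies on the executors

class FakeSparkContext:
    """Toy stand-in exposing only broadcast()."""
    def broadcast(self, value):
        return FakeBroadcast(value)

class RefreshableBroadcast:
    """Wraps a broadcast var; update() unpersists and re-broadcasts."""
    def __init__(self, sc, value):
        self.sc = sc
        self.bc = sc.broadcast(value)
    def update(self, value):
        self.bc.unpersist()                 # invalidate the old copies
        self.bc = self.sc.broadcast(value)  # ship the new value to executors

rb = RefreshableBroadcast(FakeSparkContext(), {"threshold": 1})
rb.update({"threshold": 2})  # driver noticed a config change
```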

Re: Error - Spark STREAMING

2022-09-21 Thread Anupam Singh
Which version of spark are you using? On Tue, Sep 20, 2022, 1:57 PM Akash Vellukai wrote: > Hello, > > > py4j.protocol.Py4JJavaError: An error occurred while calling o80.load. : > java.lang.NoClassDefFoundError: > org/apache/spark/sql/internal/connector/SimpleTableProvider > > > May anyone

Error - Spark STREAMING

2022-09-20 Thread Akash Vellukai
Hello, py4j.protocol.Py4JJavaError: An error occurred while calling o80.load. : java.lang.NoClassDefFoundError: org/apache/spark/sql/internal/connector/SimpleTableProvider May anyone help Me to solve this issue. Thanks and regards Akash

Re: Spark streaming

2022-08-20 Thread Gourav Sengupta
sukumaran < sandrasukumara...@gmail.com> wrote: > Dear Sir, > > > > Is there any possible method to fetch MySQL database bin log, with > the help of spark streaming. > Kafka streaming is not applicable in this case. > > > > Thanks and regards > Sandra >

Re: [EXTERNAL] Re: Spark streaming

2022-08-20 Thread sandra sukumaran
a sukumaran > *Cc:* user@spark.apache.org > *Subject:* [EXTERNAL] Re: Spark streaming > > *Caution! This email originated outside of FedEx. Please do not open > attachments or click links from an unknown or suspicious origin*. > https://github.com/allwefantasy/spark-binlog

Re: [EXTERNAL] Re: Spark streaming

2022-08-19 Thread Saurabh Gulati
You can also try out https://debezium.io/documentation/reference/0.10/connectors/mysql.html From: Ajit Kumar Amit Sent: 19 August 2022 14:30 To: sandra sukumaran Cc: user@spark.apache.org Subject: [EXTERNAL] Re: Spark streaming Caution! This email originated

Re: Spark streaming

2022-08-19 Thread Ajit Kumar Amit
https://github.com/allwefantasy/spark-binlog Sent from my iPhone > On 19 Aug 2022, at 5:45 PM, sandra sukumaran > wrote: > >  > Dear Sir, > > > > Is there any possible method to fetch MySQL database bin log, with the > help of spark streaming. >

Spark streaming

2022-08-19 Thread sandra sukumaran
Dear Sir, Is there any possible method to fetch MySQL database bin log, with the help of spark streaming. Kafka streaming is not applicable in this case. Thanks and regards Sandra

Re: Spark streaming

2022-08-18 Thread ミユナ (alice)
> Dear sir, > > >I want to check the logs of MySQL database using spark streaming, can > someone help me with those listening queries. > > > Thanks and regards > Akash P > you can ingest logs with fluent-bit into Kafka, then set up Spark to read r

Spark streaming

2022-08-17 Thread Prajith Vellukkai
Dear sir, I want to check the logs of MySQL database using spark streaming, can someone help me with those listening queries. Thanks and regards Akash P

Re: [EXTERNAL] Re: Spark streaming - Data Ingestion

2022-08-17 Thread Akash Vellukai
I am a beginner with Spark. May I also know how to connect a MySQL database with Spark Streaming? Thanks and regards Akash P On Wed, 17 Aug, 2022, 8:28 pm Saurabh Gulati, wrote: > Another take: > >- Debezium ><https://debezium.io/documentation/reference/stable/connec

Re: [EXTERNAL] Re: Spark streaming - Data Ingestion

2022-08-17 Thread Gibson
The idea behind spark-streaming is to process change events as they occur, hence the suggestions above that require capturing change events using Debezium. But you can use jdbc drivers to connect Spark to relational databases On Wed, Aug 17, 2022 at 6:21 PM Akash Vellukai wrote: > I

Re: [EXTERNAL] Re: Spark streaming - Data Ingestion

2022-08-17 Thread Saurabh Gulati
Another take: * Debezium <https://debezium.io/documentation/reference/stable/connectors/mysql.html> to read write-ahead logs (WAL) and send to Kafka * Kafka Connect to write to cloud storage -> Hive * OR * Spark streaming to parse WAL -> Storage ->
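Whichever pipeline is chosen, the Debezium change events landing in Kafka carry the new row state in a JSON envelope. A minimal parser sketch, assuming the standard Debezium `payload.before` / `payload.after` shape with the optional schema block omitted:

```python
import json

def extract_after(event: str):
    # Debezium change events put the post-change row under payload.after
    # (it is null for deletes). Envelope shown without schema metadata.
    return json.loads(event)["payload"]["after"]

# A hand-written sample event in the assumed envelope shape:
sample = '{"payload": {"op": "c", "before": null, "after": {"id": 1, "name": "a"}}}'
row = extract_after(sample)
```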

Re: Spark streaming - Data Ingestion

2022-08-17 Thread Gibson
t; > Data ingestion from MySql to Hive with spark- streaming? > > Could you give me an overview. > > > Thanks and regards > Akash P >

Spark streaming - Data Ingestion

2022-08-17 Thread Akash Vellukai
Dear sir I have tried a lot on this could you help me with this? Data ingestion from MySql to Hive with spark- streaming? Could you give me an overview. Thanks and regards Akash P

Re: Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Sean Owen
in your code. On Fri, Jul 22, 2022 at 4:24 AM Dipl.-Inf. Rico Bergmann < i...@ricobergmann.de> wrote: > Hi folks! > > I'm trying to implement an update of a broadcast var in Spark Streaming. > The idea is that whenever some configuration value has changed (this is > periodically

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup

Re: Spark streaming pending microbatches queue max length

2022-07-13 Thread Anil Dasari
Retry. From: Anil Dasari Date: Tuesday, July 12, 2022 at 3:42 PM To: user@spark.apache.org Subject: Spark streaming pending microbatches queue max length Hello, Spark adds an entry to the pending microbatches queue at each batch interval. Is there a config to set the max size for pending

Spark streaming pending microbatches queue max length

2022-07-12 Thread Anil Dasari
Hello, Spark adds an entry to the pending microbatches queue at each periodic batch interval. Is there a config to set the max size of the pending microbatches queue? Thanks

Spark streaming / confluent Kafka- messages are empty

2022-06-09 Thread KhajaAsmath Mohammed
Hi, I am trying to read data from confluent Kafka using avro schema registry. Messages are always empty and stream always shows empty records. Any suggestion on this please ?? Thanks, Asmath - To unsubscribe e-mail:

Re: protobuf data as input to spark streaming

2022-05-30 Thread Kiran Biswal
Hello Stelios, friendly reminder if you could share any sample code/repo Are you using a schema registry? Thanks Kiran On Fri, Apr 8, 2022 at 4:37 PM Kiran Biswal wrote: > Hello Stelios > > Just a gentle follow up if you can share any sample code/repo > > Regards > Kiran > > On Wed, Apr 6,

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-20 Thread Xavier Gervilla
Thank you for the flatten function, it has a bigger functionality than what I need for my project but the examples (which were really, really useful) helped me find a solution. Instead of accessing the confidence and entity attributes (metadata.confidence and metadata.entity) I was accessing

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-20 Thread Bjørn Jørgensen
Glad to hear that it works :) Your dataframe is nested with map, array and struct types. I'm using this function to flatten a nested dataframe into rows and columns. from pyspark.sql.types import * from pyspark.sql.functions import * def flatten_test(df, sep="_"): """Returns a flattened
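The flattening idea can be illustrated outside Spark with plain dictionaries; this toy version mirrors the separator-joining behavior of the `flatten_test` helper mentioned above, without touching DataFrame schemas:

```python
def flatten(d: dict, sep: str = "_", prefix: str = "") -> dict:
    # Recursively flattens nested dicts: {"a": {"b": 1}} -> {"a_b": 1},
    # joining nested keys with `sep`, analogous to flattening struct columns.
    out = {}
    for k, v in d.items():
        key = f"{prefix}{sep}{k}" if prefix else k
        if isinstance(v, dict):
            out.update(flatten(v, sep, key))
        else:
            out[key] = v
    return out
```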

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-19 Thread Bjørn Jørgensen
https://github.com/JohnSnowLabs/spark-nlp#packages-cheatsheet change spark = sparknlp.start() to spark = sparknlp.start(spark32=True) On Tue, Apr 19, 2022 at 9:10 PM Bjørn Jørgensen wrote: > Yes, there are some that have that issue. > > Please open a new issue at >

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-19 Thread Bjørn Jørgensen
Yes, there are some that have that issue. Please open a new issue at https://github.com/JohnSnowLabs/spark-nlp/issues and they will help you. On Tue, Apr 19, 2022 at 8:33 PM Xavier Gervilla < xavier.gervi...@datapta.com> wrote: > Thank you for your advice, I had small knowledge of Spark NLP and

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-19 Thread Jungtaek Lim
I have no context on ML, but your "streaming" query exposes the possibility of memory issues. flattenedNER.registerTempTable("df") querySelect = "SELECT col as entity, avg(sentiment) as sentiment, count(col) as count FROM df GROUP BY col" finalDF =

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-18 Thread Bjørn Jørgensen
When did SpaCy have support for Spark? Try Spark NLP, it's made for Spark. They have a lot of notebooks at https://github.com/JohnSnowLabs/spark-nlp and they publish user guides at

Re: [Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-18 Thread Sean Owen
It looks good, are you sure it even starts? the problem I see is that you send a copy of the model from the driver for every task. Try broadcasting the model instead. I'm not sure if that resolves it but would be a good practice. On Mon, Apr 18, 2022 at 9:10 AM Xavier Gervilla wrote: > Hi Team,

[Spark Streaming] [Debug] Memory error when using NER model in Python

2022-04-18 Thread Xavier Gervilla
Hi Team, https://stackoverflow.com/questions/71841814/is-there-a-way-to-prevent-excessive-ram-consumption-with-the-spark-configuration I'm developing a project that retrieves tweets on a 'host' app, streams them to Spark and with different operations with DataFrames obtains the Sentiment of

[Spark Streaming]: Why planInputPartitions is called multiple times for each micro-batch in Spark 3?

2022-04-13 Thread Hussain, Saghir
Hi All, While upgrading our custom streaming data source from Spark 2.4.5 to Spark 3.2.1, we observed that the planInputPartitions() method in MicroBatchStream is being called multiple times (4 in our case) for each micro-batch in Spark 3. The Apache Spark documentation also says that: The

Re: protobuf data as input to spark streaming

2022-04-08 Thread Kiran Biswal
Hello Stelios Just a gentle follow up if you can share any sample code/repo Regards Kiran On Wed, Apr 6, 2022 at 3:19 PM Kiran Biswal wrote: > Hello Stelios > > Preferred language would have been Scala or pyspark but if Java is proven > I am open to using it > > Any sample reference or

Re: protobuf data as input to spark streaming

2022-04-06 Thread Kiran Biswal
Hello Stelios Preferred language would have been Scala or PySpark but if Java is proven I am open to using it. Any sample reference or example code link? How are you handling the protobuf to Spark dataframe conversion (serialization/deserialization)? Thanks Kiran On Wed, Apr 6, 2022, 2:38 PM

Re: protobuf data as input to spark streaming

2022-04-06 Thread Stelios Philippou
Yes we are currently using it as such. Code is in java. Will that work? On Wed, 6 Apr 2022 at 00:51, Kiran Biswal wrote: > Hello Experts > > Has anyone used protobuf (proto3) encoded data (from kafka) as input > source and been able to do spark structured streaming? > > I would appreciate if

protobuf data as input to spark streaming

2022-04-05 Thread Kiran Biswal
Hello Experts Has anyone used protobuf (proto3) encoded data (from kafka) as input source and been able to do spark structured streaming? I would appreciate if you can share any sample code/example Regards Kiran >

python API in Spark-streaming-kafka spark 3.2.1

2022-03-07 Thread Wiśniewski Michał
Hi, I've read in the documentation, that since spark 3.2.1 python API for spark-streaming-kafka is back in the game. https://spark.apache.org/docs/3.2.1/streaming-programming-guide.html#advanced-sources But in the Kafka Integration Guide there is no documentation for the python API. https

Re: Spark Streaming | Dynamic Action Support

2022-03-03 Thread Mich Talebzadeh
r any monetary damages arising from > such loss, damage or destruction. > > > > > On Thu, 3 Mar 2022 at 10:56, Pappu Yadav wrote: > >> Hi, >> >> Is there any way I can add/delete actions/jobs dynamically in a running >> spark streaming job. >> I will cal

Spark Streaming | Dynamic Action Support

2022-03-03 Thread Pappu Yadav
Hi, Is there any way I can add/delete actions/jobs dynamically in a running spark streaming job. I will call an API and execute only the configured actions in the system. Eg . In the first batch suppose there are 5 actions in the spark application. Now suppose some configuration is changed

Failed to construct kafka consumer, Failed to load SSL keystore + Spark Streaming

2022-02-12 Thread joyan sil
Hi All, I am trying to read from Kafka using spark streaming from spark-shell but getting the below error. Any suggestions to fix this is much appreciated. I am running from spark-shell hence it is client mode and the files are available in the local filesystem. I tried to access the files

Re: Kafka to spark streaming

2022-01-30 Thread Gourav Sengupta
Hi Amit, before answering your question, I am just trying to understand it. I am not exactly clear how do the Akka application, Kafka and SPARK Streaming application sit together, and what are you exactly trying to achieve? Can you please elaborate? Regards, Gourav On Fri, Jan 28, 2022 at 10

Re: Kafka to spark streaming

2022-01-29 Thread Amit Sharma
xplicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or destruction. > > > > > On Fri, 28 Jan 2022 at 22:14, Amit Sharma wrote: > >> Hello everyone, we have spark streaming application. We send request to

Re: Kafka to spark streaming

2022-01-29 Thread Mich Talebzadeh
which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Fri, 28 Jan 2022 at 22:14, Amit Sharma wrote: > Hello everyone, we have spark streaming ap

Kafka to spark streaming

2022-01-28 Thread Amit Sharma
Hello everyone, we have a spark streaming application. We send requests to the stream through an Akka actor using a Kafka topic. We wait for the response as it is real time. Just want a suggestion: is there any better option like Livy where we can send and receive requests to spark streaming. Thanks Amit

[spark streaming] how to connect to rabbitmq with spark streaming.

2021-10-04 Thread Joris Billen
Hi, I am looking for someone who has made a spark streaming job that connects to rabbitmq. There is a lot of documentation on how to make a connection with a java api (like here: https://www.rabbitmq.com/api-guide.html#connecting), but I am looking for a recent working example for spark streaming

Re: question regarding spark streaming continuous processing

2021-09-04 Thread Antonio Si
Hi all, I would like to followup on this question. Any information would be very helpful. Thanks. Antonio. On 2021/09/01 18:50:34, Antonio Si wrote: > Hi, > > Hi all, I have a couple questions regarding continuous processing: > > 1. What is the plan for continuous processing moving

spark streaming to jdbc

2021-09-03 Thread igyu
val lines = spark.readStream
  .format("socket")
  // .schema(StructType(schemas))
  .option("host", "10.3.87.23")
  .option("port", )
  .load()
  .selectExpr("CAST(value AS STRING)").as[(String)]
DF = lines.map(x => {
  val obj = JSON.parseObject(x)
  val ls = new util.ArrayList()

question regarding spark streaming continuous processing

2021-09-01 Thread Antonio Si
Hi, Hi all, I have a couple questions regarding continuous processing: 1. What is the plan for continuous processing moving forward? Will this eventually be released as a production feature as it seems it is still experimental? 2. In microbatch streaming, there is a StreamingQueryListener

Re: Spark Streaming with Files

2021-04-30 Thread muru
Yes, with trigger(once=True) set on all streaming sources it will be treated as batch mode. Then you can use any scheduler (e.g. Airflow) to run it in whatever time window. With checkpointing, the next run will start processing files from the last checkpoint. On Fri, Apr 23, 2021 at 8:13 AM Mich
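The run-once pattern described here boils down to one trigger setting plus a checkpoint location. A hedged sketch (paths are placeholders; the commented query needs a live SparkSession):

```python
# Sketch: run a streaming query as a scheduled batch job. The trigger below
# drains all currently available data and then stops; the checkpoint is what
# lets the next scheduled run (e.g. from Airflow) resume where this one ended.
trigger_conf = {"once": True}   # Spark 3.3+ also offers trigger(availableNow=True)

# query = (df.writeStream
#     .format("parquet")
#     .option("checkpointLocation", "/tmp/ckpt")   # placeholder path
#     .trigger(**trigger_conf)
#     .start("/tmp/out"))                          # placeholder path
```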

Re: Spark Streaming non functional requirements

2021-04-27 Thread Mich Talebzadeh
d all responsibility for any > loss, damage or destruction of data or any other property which may arise > from relying on this email's technical content is explicitly disclaimed. > The author will in no case be liable for any monetary damages arising from > such loss, damage or de

Re: Spark Streaming non functional requirements

2021-04-27 Thread ashok34...@yahoo.com.INVALID
, ashok34...@yahoo.com.INVALID wrote: Hello, When we design a typical spark streaming process, the focus is to get functional requirements. However, I have been asked to provide non-functional requirements as well. Likely things I can consider are Fault tolerance and Reliability (component

Re: Spark Streaming non functional requirements

2021-04-27 Thread Mich Talebzadeh
e from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Mon, 26 Apr 2021 at 17:16, ashok34...@yahoo.com.INVALID wrote: > Hello, > > When we design a typical

Spark Streaming non functional requirements

2021-04-26 Thread ashok34...@yahoo.com.INVALID
Hello, When we design a typical spark streaming process, the focus is to get the functional requirements. However, I have been asked to provide non-functional requirements as well. Likely things I can consider are fault tolerance and reliability (component failures). Is there a standard list

Re: [Spark-Streaming] moving average on categorical data with time windowing

2021-04-26 Thread Sean Owen
You might be able to do this with multiple aggregations on avg(col("col1") == "cat1") etc, but how about pivoting the DataFrame first so that you get columns like "cat1" being 1 or 0? You would end up with (columns x categories) new columns if you want to count all categories in all cols. But then

[Spark-Streaming] moving average on categorical data with time windowing

2021-04-26 Thread halil
Hello everyone, I am trying to apply moving average on categorical data like below, which is a synthetic data generated by myself. sqltimestamp,col1,col2,col3,col4,col5 1618574879,cat1,cat4,cat2,cat5,cat3 1618574880,cat1,cat3,cat4,cat2,cat5 1618574881,cat5,cat3,cat4,cat2,cat1
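The pivot-then-average idea suggested in the reply can be illustrated in plain Python: turn a categorical column into a 0/1 indicator series, then take a trailing moving average of the indicator. This is a toy sketch of the technique, not the Spark implementation:

```python
def one_hot(values, category):
    # Indicator series: 1 where the column equals the category, else 0
    return [1 if v == category else 0 for v in values]

def moving_avg(xs, window):
    # Trailing moving average over a fixed window (first result at index window-1)
    return [sum(xs[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(xs))]

# Toy data shaped like the synthetic rows in the message above
col1 = ["cat1", "cat1", "cat5", "cat1"]
freq = moving_avg(one_hot(col1, "cat1"), 2)  # fraction of 'cat1' per 2-row window
```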

Re: Spark Streaming with Files

2021-04-23 Thread Mich Talebzadeh
Interesting. If we go back to a classic Lambda architecture on premise, you could use the Flume API to Kafka to add files to HDFS on a time-series basis. Most major CDC vendors do exactly that. Oracle GoldenGate (OGG) classic gets data from Oracle redo logs and sends them to subscribers. One can deploy OGG

Spark Streaming with Files

2021-04-23 Thread ayan guha
Hi In one of the spark summit demos, it has been alluded that we should think of batch jobs in a streaming pattern, using "run once" on a schedule. I find this idea very interesting and I understand how this can be achieved for sources like kafka, kinesis or similar. in fact we have implemented this

Re: Spark streaming giving error for version 2.4

2021-03-16 Thread Attila Zsolt Piros
ogs... > > G > > > On Tue, Mar 16, 2021 at 6:03 AM Renu Yadav wrote: > >> Hi Team, >> >> >> I have upgraded my spark streaming from 2.2 to 2.4 but getting below >> error: >> >> >> spark-streaming-kafka_0-10.2.11_2.4.0 >> >>

Re: Spark streaming giving error for version 2.4

2021-03-16 Thread Gabor Somogyi
Well, this is not much. Please provide driver and executor logs... G On Tue, Mar 16, 2021 at 6:03 AM Renu Yadav wrote: > Hi Team, > > > I have upgraded my spark streaming from 2.2 to 2.4 but getting below error: > > > spark-streaming-kafka_0-10.2.11_2.4.0 > >

Spark streaming giving error for version 2.4

2021-03-15 Thread Renu Yadav
Hi Team, I have upgraded my spark streaming from 2.2 to 2.4 but getting below error: spark-streaming-kafka_0-10.2.11_2.4.0 scala 2.11 Any Idea? main" java.lang.AbstractMethodError at org.apache.spark.util.ListenerBus$class.$init$(ListenerBus.sca

Re: DB Config data update across multiple Spark Streaming Jobs

2021-03-15 Thread forece85
Any suggestion on this? How to update configuration data on all executors without downtime? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

DB Config data update across multiple Spark Streaming Jobs

2021-03-13 Thread forece85
Hi, We have multiple spark jobs running on a single EMR cluster. All jobs use the same business-related configuration, which is stored in Postgres. How to update this configuration data at all executors dynamically if any changes happen to the Postgres db data, without Spark restarts. We are using

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
Not sure if kinesis have such flexibility. What else possibilities are there at transformations level? -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail:

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
Any example for this please -- Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/ - To unsubscribe e-mail: user-unsubscr...@spark.apache.org

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread Sean Owen
ionner that makes it possible to send your data > with the same key to the same partition. Though, each task in your spark > streaming app will load data from the same partition in the same executor. > I think this is the simplest way to achieve what you want to do. > > Best regards, &g

Re: Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread Ali Gouta
Don't know Kinesis, but it looks like it works like Kafka. Your producer should implement a partitioner that makes it possible to send your data with the same key to the same partition. Then each task in your spark streaming app will load data from the same partition in the same executor. I
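The key-to-partition routing suggested here rests on a stable hash partitioner: hash the key, take it modulo the partition count, and every event with the same key lands in the same partition (and hence the same task). A minimal sketch using a deterministic checksum rather than Python's seeded `hash()`:

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # Stable hash partitioner: the same key always maps to the same partition,
    # so all events for one eventId are processed by the same task/executor.
    # crc32 is used because Python's built-in hash() is salted per process.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```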

Spark Streaming - Routing rdd to Executor based on Key

2021-03-09 Thread forece85
We are doing batch processing using Spark Streaming with Kinesis with a batch size of 5 mins. We want to send all events with same eventId to same executor for a batch so that we can do multiple events based grouping operations based on eventId. No previous batch or future batch data is concerned

Spark streaming with multiple Kafka topics

2021-03-05 Thread lalitha bandaru
Hi Team, I have a spark streaming application configured to consume events from 2 Kafka topics. But when I run the application locally, the messages are consumed from either of these topics only and not both. If the first event is published to say topic2 and second message to topic1 then only

Re: Converting spark batch to spark streaming

2021-01-08 Thread Jacek Laskowski
Hi, Start with DataStreamWriter.foreachBatch. Regards, Jacek Laskowski https://about.me/JacekLaskowski "The Internals Of" Online Books Follow me on https://twitter.com/jaceklaskowski On Thu, Jan 7, 2021 at 6:55 PM mhd wrk

Converting spark batch to spark streaming

2021-01-07 Thread mhd wrk
I'm trying to convert a spark batch application to a streaming application and wondering what function (or design pattern) I should use to execute a series of operations inside the driver upon arrival of each message (a text file inside an HDFS folder) before starting computation inside executors.
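The foreachBatch suggestion in the reply amounts to wrapping the existing batch logic in a callback that receives each micro-batch as an ordinary DataFrame. A hedged sketch (`existing_batch_logic` is a stand-in for the original batch computation, not code from the thread):

```python
def existing_batch_logic(rows):
    # Stand-in for the original batch computation on a list of strings
    return [r.upper() for r in rows]

def process_batch(batch_df, batch_id):
    # foreachBatch hands each micro-batch to ordinary batch code; here we
    # assume batch_df.collect() yields rows with a .value string field.
    rows = [row.value for row in batch_df.collect()]
    return existing_batch_logic(rows)

# With a live SparkSession this would be wired up roughly as:
# query = (spark.readStream.text("hdfs:///incoming")   # placeholder path
#          .writeStream.foreachBatch(process_batch).start())
```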

[Spark Streaming] Why is ZooKeeper LeaderElection Agent not being called by Spark Master?

2020-12-29 Thread Saloni Mehta
Hello, Request you to please help me out on the below queries: I have 2 spark masters and 3 zookeepers deployed on my system on separate virtual machines. The services come up online in the below sequence: 1. zookeeper-1 2. sparkmaster-1 3. sparkmaster-2 4. zookeeper-2 5.
