Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-13 Thread Mich Talebzadeh
+0 For reasons I outlined in the discussion thread https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/m

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Mich Talebzadeh
, it is evident that this approach functions more like dynamic scripts than traditional compiled stored procedures. HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-p

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-10 Thread Mich Talebzadeh
Hi, If the underlying table changes (DDL), if I recall from RDBMSs like Oracle, the stored procedure will be invalidated as it is a compiled object. How is this going to be handled? Does it follow the same mechanism? Thanks Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Mich Talebzadeh
Allocation: Spark allocates memory for the cached DataFrame. Depending on the cluster configuration and available memory, this allocation can take some time. HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin

Re: Why spark-submit works with package not with jar

2024-05-06 Thread Mich Talebzadeh
Thanks David. I wanted to explain the difference between Package and Jar with comments from the community on previous discussions back a few years ago. cheers Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
-alpha versions and are not expected to meet the same level of stability and completeness as release candidates or final releases. HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mi

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
@Wenchen Fan Thanks for the update! To clarify, is the vote for approving a specific preview build, or is it for moving towards an RC stage? I gather there is a distinction between these two? Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United

Re: ASF board report draft for May

2024-05-06 Thread Mich Talebzadeh
rsion available for evaluation as soon as it is feasible" HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybo

Re: [SparkListener] Accessing classes loaded via the '--packages' option

2024-05-04 Thread Mich Talebzadeh
and its dependencies listed in Maven *HTH* Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzade

Fwd: Why spark-submit works with package not with jar

2024-05-04 Thread Mich Talebzadeh
Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is c

Spark Materialized Views: Improve Query Performance and Data Management

2024-05-03 Thread Mich Talebzadeh
a look at the ticket and add your comments. Thanks Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh Disclaimer: The information provided is correct to the best

Re: Issue with Materialized Views in Spark SQL

2024-05-03 Thread Mich Talebzadeh
hat using materialized views with Spark Structured Streaming and Change Data Capture (CDC) is a potential solution for efficiently streaming view data updates in this scenario. Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view m

Issue with Materialized Views in Spark SQL

2024-05-02 Thread Mich Talebzadeh
ilar issue or if there are any insights into why this discrepancy exists between Spark SQL and Hive. Thanks Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: [DISCUSS] Spark 4.0.0 release

2024-05-02 Thread Mich Talebzadeh
Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to th

Re: Potential Impact of Hive Upgrades on Spark Tables

2024-05-01 Thread Mich Talebzadeh
ed to test the Spark applications thoroughly after a Hive upgrade, which will necessitate liaising with the Hive group as you are relying on their metadata. Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profil

Potential Impact of Hive Upgrades on Spark Tables

2024-04-30 Thread Mich Talebzadeh
, depending on the severity of the changes, the Hive metastore schema might change, which could require Spark code to be updated to handle these changes in how table metadata is represented. Is this assertion correct? Thanks Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Mich Talebzadeh
presented in Hortonworks meet-up. Hive on Spark Engine Versus Spark Using Hive Metastore <https://www.linkedin.com/pulse/hive-spark-engine-versus-using-metastore-mich-talebzadeh-ph-d-/> With regard to why I cast a +1 vote for one and -1 for the other, I think it is my prerogative how I v

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-28 Thread Mich Talebzadeh
it may require a number of changes to the old scripts. Hence my concern. As a matter of interest has anyone liaised with the Hive team to ensure they have introduced the additional changes you outlined? HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London U

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Mich Talebzadeh
rrant the importance of carefully evaluating the impact of changing the default behaviour. Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> ht

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-26 Thread Mich Talebzadeh
consideration. HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The infor

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
ok thanks got it Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The infor

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
durability when choosing a catalog solution for production deployments. In many cases, a combination of in-memory and disk-based catalog solutions may offer the best balance of performance and resilience for demanding large scale workloads. HTH Mich Talebzadeh, Technologist | Architect | Data Engineer

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
Well, I would be surprised, because the Derby database is single-threaded and won't be of much use here. Most Hive metastores in the commercial world use PostgreSQL or Oracle, which are battle-proven, replicated and backed up. Mich Talebzadeh, Technologist | Architect | Data Engineer

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
ng is that if I understand correctly, and I might be totally wrong here, the internal spark catalog is a local installation of hive metastore anyway, so I'm not sure what the catalog has to do with anything". I don't understand this. Do you mean a Derby database? HTH Mich Talebzadeh, Technologis

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
with Spark applications and libraries. 5) There seems to be some similarity with spark catalog and Databricks unity catalog, so that may favour the choice. HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Mich Talebzadeh
".. because we support better." Are you referring to the performance of Spark catalog (I believe it is internal) or integration with Spark? HTH Mich Talebzadeh, Technologist | Architect | Data Engineer | Generative AI | FinCrime London United Kingdom view my Linkedin profile <h

Re: Which version of spark version supports parquet version 2 ?

2024-04-17 Thread Mich Talebzadeh
(if possible) to see if it indirectly enables v2 writing with Spark 3.2.0. HTH Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Mich Talebzadeh
Hi Prem, Regrettably this is not my area of speciality. I trust another colleague will have a more informed idea. Alternatively you may raise an SPIP for it. Spark Project Improvement Proposals (SPIP) | Apache Spark <https://spark.apache.org/improvement-proposals.html> HTH Mich Tale

Re: Which version of spark version supports parquet version 2 ?

2024-04-16 Thread Mich Talebzadeh
klaws of diminishing returns, I would not advise that either. You can of course use gzip for compression, which may be more suitable for your needs. HTH Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https:

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Mich Talebzadeh
Sorry you have a point there. It was released in version 3.00. What version of spark are you using? Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Mich Talebzadeh
Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is c

Re: Which version of spark version supports parquet version 2 ?

2024-04-15 Thread Mich Talebzadeh
the library itself. However, you can have a look at this https://github.com/apache/parquet-mr/blob/master/CHANGES.md HTH Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/m

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-14 Thread Mich Talebzadeh
+ 1 for me It makes it more compatible with the other ANSI SQL compliant products. Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Mich Talebzadeh
additional security considerations. - integration and support in the cloud HTH Technologist | Solutions Architect | Data Engineer | Generative AI Mich Talebzadeh, London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.e

Re: External Spark shuffle service for k8s

2024-04-08 Thread Mich Talebzadeh
anks Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is cor

Fwd: Apache Spark 3.4.3 (?)

2024-04-07 Thread Mich Talebzadeh
Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is c

Re: External Spark shuffle service for k8s

2024-04-07 Thread Mich Talebzadeh
Thanks Cheng for the heads up. I will have a look. Cheers Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywi

Re: External Spark shuffle service for k8s

2024-04-07 Thread Mich Talebzadeh
a Kubernetes cluster. They can include these configurations in the Spark application code or pass them as command-line arguments or environment variables during application submission. HTH Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view

Re: External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
better performance and scalability for handling larger datasets efficiently. Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

External Spark shuffle service for k8s

2024-04-06 Thread Mich Talebzadeh
with these file systems come into it. I will be interested in hearing more about any progress on this. Thanks Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh

Re: Scheduling jobs using FAIR pool

2024-04-01 Thread Mich Talebzadeh
Hi, Have you put this question to the Databricks forum? Data Engineering - Databricks <https://community.databricks.com/t5/data-engineering/bd-p/data-engineering> Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin p

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread Mich Talebzadeh
looks fine except that processing all Unicode whitespace characters might add overhead to the parsing process, potentially impacting performance. Although I think this is a moot point +1 Mich Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Mich Talebzadeh
Talebzadeh, Technologist | Solutions Architect | Data Engineer | Generative AI London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to the best

Re: Improved Structured Streaming Documentation Proof-of-Concept

2024-03-25 Thread Mich Talebzadeh
issues brought up in the user group and otherwise). Perhaps using a section such as the proposed "Knowledge Sharing Hub" may become more relevant. Moreover, the examples have to reflect real-life scenarios; otherwise they will be of limited use. HTH Mich Talebzadeh, Technologist |

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
. Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to the best

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
ertain this idea. They seem to have a well defined structure for hosting topics. Let me know your thoughts Thanks <https://community.databricks.com/t5/knowledge-sharing-hub/bd-p/Knowledge-Sharing-Hub> Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
be that the information (topics) are provided as best efforts and cannot be guaranteed. Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywi

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
- Databricks <https://community.databricks.com/t5/knowledge-sharing-hub/bd-p/Knowledge-Sharing-Hub> Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/&

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is c

A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
hat should not be that difficult. If anyone is supportive of this proposal, let the usual +1, 0, -1 decide HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile https://en.everybodywiki.com/Mich_Talebzadeh Disclaimer: The informatio

Re: Enhanced Console Sink for Structured Streaming

2024-03-12 Thread Mich Talebzadeh
addBatch" : 37, "commitOffsets" : 41, "getBatch" : 0, "latestOffset" : 0, "queryPlanning" : 5, "triggerExecution" : 187, "walCommit" : 104 }, "stateOperators" : [ ], "sources" : [ { "description" :

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mich Talebzadeh
+1 Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is correct to th

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Mich Talebzadeh
Splendid. Thanks Gengliang Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information pr

SPARK-44951, Improve Spark Dynamic Allocation

2024-03-08 Thread Mich Talebzadeh
Hi all, On this ticket, Improve Spark Dynamic Allocation <https://issues.apache.org/jira/browse/SPARK-44951>, I see no movement since it was opened back in August 2023. I may be wrong, of course. Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-05 Thread Mich Talebzadeh
ly working with the filtered dataset, representing the partitions that would have hypothetically succeeded. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Mich Talebzadeh
ed reboots for whatever reason. Look at the host logs or run /usr/bin/dmesg to see what happened. Good luck Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mich Talebzadeh
clearer for everyone at first glance. Cheers Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-02 Thread Mich Talebzadeh
checks: Implement data validation checks after processing to identify potential duplicates or missing data. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>
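A minimal sketch of the kind of post-processing validation check mentioned above, written in plain Python rather than Spark so that it runs anywhere. The function name `validate_keys` and the sample keys are hypothetical, and the sketch assumes each record carries a unique key that is known before the job runs.

```python
from collections import Counter


def validate_keys(expected_keys, output_keys):
    """Return (duplicates, missing) for a processed output.

    duplicates: keys that appear more than once in the output
    missing:    expected keys that never appear in the output
    """
    counts = Counter(output_keys)
    duplicates = sorted(k for k, n in counts.items() if n > 1)
    missing = sorted(set(expected_keys) - set(counts))
    return duplicates, missing


# Hypothetical run: key 2 was written twice and key 3 was lost.
duplicates, missing = validate_keys([1, 2, 3, 4], [1, 2, 2, 4])
print("duplicates:", duplicates)
print("missing:", missing)
```

In a Spark job the same idea would normally be expressed as a group-by count on the key column plus an anti-join against the source keys; that translation is an assumption and is not taken from the thread.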

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Mich Talebzadeh
ds additional processing overhead but can ensure data integrity. HTH Mich Talebzadeh, Dad | Technologist London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Mich Talebzadeh
ira/browse/SPARK-24815> This will ensure everyone involved can benefit from your team's expertise and facilitate further collaboration. Thanks Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.co

Please unlock Jira ticket for SPARK-24815, Dynamic resource allocation for structured streaming

2024-02-26 Thread Mich Talebzadeh
tps://issues.apache.org/jira/browse/SPARK-24815> For now I have volunteered to mentor the team until a committer volunteers to take it over. This should not be that strenuous hopefully. Thanks Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Li

Proposal about moving on from the Shepherd terminology in SPIPs

2024-02-23 Thread Mich Talebzadeh
s or another alternative proposal). HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The inf

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
olved lately and would be missing a lot of context." So we need to improvise and see how best we can drive this and similar ones. We wait a short while for a response otherwise I am happy to give a hand if needed and work with you guys to drive this. It is something worthwhile. HTH T Mich Taleb

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-23 Thread Mich Talebzadeh
+1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The information provided is c

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
d give technical justifications. OK a shepherd from PMC members is required. Maybe Jungtaek Lee can kindly help the process cheers Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebz

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
Hi Pavan, Do you have a list of votes for this feature by any chance? Does it pass the required condition as approved? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-p

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-02-22 Thread Mich Talebzadeh
I can see it was closed. Was it because of inactivity? Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Tale

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Mich Talebzadeh
Ok thanks for your clarifications Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* The infor

Re: ASF board report draft for February

2024-02-18 Thread Mich Talebzadeh
Np, thanks for addressing the point promptly Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disc

Re: ASF board report draft for February

2024-02-18 Thread Mich Talebzadeh
" I would be inclined to leave that line out for now. The rest is fine. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywik

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-16 Thread Mich Talebzadeh
Hi Chao, As a cool feature - Compared to standard Spark, what kind of performance gains can be expected with Comet? - Can one use Comet on k8s in conjunction with something like a Volcano addon? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-15 Thread Mich Talebzadeh
Hi, I gather from the replies that the plugin is not currently available in the form expected although I am aware of the shell script. Also have you got some benchmark results from your tests that you can possibly share? Thanks, Mich Talebzadeh, Dad | Technologist | Solutions Architect

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Mich Talebzadeh
, the data is redistributed across partitions, and each partition may process its portion of the data independently, and that makes debugging distributed systems challenging. I hope that makes sense. Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Mich Talebzadeh
tition it into two partitions (repartition(num_partitions)). Repartitioning can shuffle the data across partitions, introducing a different order for the subsequent aggregation. The sum operation is then performed on the data in a different order, leading to a slightly different result from result1
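The effect described above, where summation order changes the result, can be reproduced without Spark. The sketch below is only an illustration in plain Python: `naive_sum` and `partitioned_sum` are hypothetical helpers, and the round-robin split is an assumption that stands in for whatever layout a real repartition would produce.

```python
import math


def naive_sum(values):
    """Add floats strictly left to right, with no compensation."""
    total = 0.0
    for v in values:
        total += v
    return total


def partitioned_sum(values, num_partitions):
    """Split values round-robin, sum each partition, then sum the partials."""
    partitions = [values[i::num_partitions] for i in range(num_partitions)]
    partials = [naive_sum(p) for p in partitions]
    return naive_sum(partials)


values = [0.1, 0.2, 0.3]

# Floating-point addition is not associative, so grouping matters.
left_to_right = (values[0] + values[1]) + values[2]
regrouped = values[0] + (values[1] + values[2])
print(left_to_right == regrouped)  # prints False

# The two results differ only by rounding error.
print(math.isclose(left_to_right, regrouped))  # prints True

# Single-pass sum versus a two-partition sum of the same data.
print(naive_sum(values), partitioned_sum(values, 2))
```

Because the difference is rounding error only, comparing aggregates with a tolerance (for example `math.isclose`) is usually safer than testing for exact equality.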

Re: Building an Event-Driven Real-Time Data Processor with Spark Structured Streaming and API Integration

2024-02-09 Thread Mich Talebzadeh
The full code is available from the link below https://github.com/michTalebzadeh/Event_Driven_Real_Time_data_processor_with_SSS_and_API_integration Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.

Re: Pyspark Write Batch Streaming Data to Snowflake Fails with more columns

2024-02-09 Thread Mich Talebzadeh
the columns in your PySpark DataFrame to the corresponding columns in the Snowflake table during the write operation. This can help avoid any implicit mappings that might be causing issues. 1. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom

Building an Event-Driven Real-Time Data Processor with Spark Structured Streaming and API Integration

2024-02-09 Thread Mich Talebzadeh
esktop> HTH, Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own ri

Re: Shuffle write and read phase optimizations for parquet+zstd write

2024-02-08 Thread Mich Talebzadeh
. If downstream tools or processes expect data in a specific format, the serialized format may require additional processing or conversion, impacting compatibility. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <ht

Re: Enhanced Console Sink for Structured Streaming

2024-02-05 Thread Mich Talebzadeh
I don't think adding this to the streaming flow (at micro level) will be that useful. However, this can be added to Spark UI as an enhancement to the Streaming Query Statistics page. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my

Re: Enhanced Console Sink for Structured Streaming

2024-02-03 Thread Mich Talebzadeh
. Whilst the proposed enhancements offer valuable insights into the behavior of Structured Streaming, we ought to think about the potential downsides, particularly in terms of increased verbosity, complexity, and the impact on user experience. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect

Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-30 Thread Mich Talebzadeh
Hi Alex, Well, that is just Justin's opinion vis-à-vis this matter. It is different from mine. Bottom line, you can always refer to Oracle or a copyright expert on this matter and see what they suggest. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom

Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-29 Thread Mich Talebzadeh
ontravened Spark references as in [1] in - spark <https://github.com/apache/spark/tree/master> - /sql <https://github.com/apache/spark/tree/master/sql> - /core <https://github.com/apache/spark/tree/master/sql/core> /pom.xml HTH Mich Talebzadeh, Dad | Technologist | Solutions Architec

Re: [EXTERNAL] Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Mich Talebzadeh
Ok, I made a request to access this document. Thanks Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Tale

Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Mich Talebzadeh
hanged since then please? Are you asking whether this doc is still relevant? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.every

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-19 Thread Mich Talebzadeh
Everyone's vote matters whether they are PMC or not. There is no monopoly here HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywi

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
+1 for me (non binding) *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
I think we have discussed this enough and I consider it a useful feature. I propose a vote on it. +1 for me Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-520

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread Mich Talebzadeh
ements. Cheers Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk.

Re: AutoReply: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
Hi, Please stop this acknowledgement email. It is spamming the forum unnecessarily! Thanks Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
+1 for me as well Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own ris

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
and performance of your Flask application. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at yo

Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
yone's benefit. Hopefully your comments will help me to improve it. Cheers Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mic

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Mich Talebzadeh
Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. Any and all responsibility f

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-02 Thread Mich Talebzadeh
on the processing time is crucial - Implementing mechanisms for graceful scaling operations, avoiding abrupt changes, can contribute to a smoother user experience. I do not know whether some of these points are already considered in your proposal? HTH Mich Talebzadeh, Dad | Technologist

Re: Validate spark sql

2023-12-24 Thread Mich Talebzadeh
don't have access to the actual data. In summary - This method validates syntax but will not catch semantic errors - If you need more comprehensive validation, consider using a testing framework and a small dataset. - For complex queries, using a linter or code analysis tool ca

Re: ShuffleManager and Speculative Execution

2023-12-21 Thread Mich Talebzadeh
proceed with the earliest available data, minimizing the impact of speculative execution on job completion time which is another important factor. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
of statistic Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view my Linkedin profile <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> https://en.everybodywiki.com/Mich_Talebzadeh *Disclaimer:* Use it at your own risk. A

Re: When and how does Spark use metastore statistics?

2023-12-11 Thread Mich Talebzadeh
to enable the CBO, or false to disable it. - spark.sql.cbo.strategy: Set to AUTO to use the CBO as the default optimizer, or NONE to disable it completely. HTH Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view my Linkedin pro

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-11 Thread Mich Talebzadeh
Thanks Zhou for your response to my points raised (private communication) If we start with a base model and cluster, minimal footprint for the tool, then we can establish the operational parameters needed. So +1 for me too. HTH view my Linkedin profile <https://www.linkedin.com/in/m

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Mich Talebzadeh
be a static image (docker file)? Another alternative would be that this docker file is created by the user through a set of scripts? These are the things that come to my mind. HTH Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view my
