[ANNOUNCE] Apache Kyuubi released 1.7.0

2023-03-07 Thread Cheng Pan
Hi all,

The Apache Kyuubi community is pleased to announce that Apache Kyuubi
1.7.0 has been released!

Apache Kyuubi is a distributed, multi-tenant Lakehouse gateway for
large-scale data processing and analytics. It is built on top of Apache
Spark, Apache Flink, and Trino, and also supports other computing engines.

Kyuubi provides a pure SQL gateway through a Thrift JDBC/ODBC interface
for end-users to manipulate large-scale data with pre-programmed and
extensible Spark SQL engines.
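Because that gateway speaks the HiveServer2-compatible Thrift protocol, any
Hive-compatible JDBC client can talk to it. As an illustration (host and user
below are placeholders, and 10009 is the default frontend port in recent
releases, so verify it for your deployment):

```
# Connect to a running Kyuubi server with the Hive beeline client
beeline -u 'jdbc:hive2://kyuubi-host:10009/' -n some_user

# Then issue ordinary Spark SQL, e.g.:
#   SELECT version();
```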

We are aiming to make Kyuubi an "out-of-the-box" tool for data warehouses
and data lakes.

This "out-of-the-box" model minimizes the barriers and costs for end-users
to use Spark at the client side.

On the server side, the multi-tenant architecture of the Kyuubi server and
engines provides administrators a way to achieve computing resource
isolation, data security, high availability, high client concurrency, etc.

The full release notes and download links are available at:
Release Notes: https://kyuubi.apache.org/release/1.7.0.html

To learn more about Apache Kyuubi, please see: https://kyuubi.apache.org/

Kyuubi Resources:
- Issue: https://github.com/apache/kyuubi/issues
- Mailing list: d...@kyuubi.apache.org

We would like to thank all contributors of the Kyuubi community who
made this release possible!

Thanks,
Cheng Pan, on behalf of Apache Kyuubi community


Re: Online classes for spark topics

2023-03-07 Thread Mich Talebzadeh
Hi,

This might be a worthwhile exercise, on the assumption that the
contributors will find the time and bandwidth to chip in, so to speak.

I am sure there are many, but off the top of my head I can think of Holden
Karau for K8s and Sean Owen for data science. They are both very
experienced.

Anyone else?

HTH



   view my Linkedin profile



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.




On Tue, 7 Mar 2023 at 19:17, ashok34...@yahoo.com.INVALID
 wrote:

> Hello gurus,
>
> Does Spark arrange online webinars for special topics like Spark on K8s,
> data science, and Spark Structured Streaming?
>
> I would be most grateful if experts could share their experiences with
> learners with intermediate knowledge like myself. Hopefully we will find
> the practical experiences shared valuable.
>
> Respectfully,
>
> AK
>


Online classes for spark topics

2023-03-07 Thread ashok34...@yahoo.com.INVALID
Hello gurus,
Does Spark arrange online webinars for special topics like Spark on K8s, data
science, and Spark Structured Streaming?
I would be most grateful if experts could share their experiences with
learners with intermediate knowledge like myself. Hopefully we will find the
practical experiences shared valuable.
Respectfully,
AK 

Re: [Spark Structured Streaming] Could we apply new options of readStream/writeStream without stopping spark application (zero downtime)?

2023-03-07 Thread Mich Talebzadeh
hm, interesting proposition. I guess you mean altering one of the following
parameters in flight:


  streamingDataFrame = self.spark \
      .readStream \
      .format("kafka") \
      .option("kafka.bootstrap.servers",
              config['MDVariables']['bootstrapServers']) \
      .option("schema.registry.url",
              config['MDVariables']['schemaRegistryURL']) \
      .option("group.id", config['common']['appName']) \
      .option("zookeeper.connection.timeout.ms",
              config['MDVariables']['zookeeperConnectionTimeoutMs']) \
      .option("rebalance.backoff.ms",
              config['MDVariables']['rebalanceBackoffMS']) \
      .option("zookeeper.session.timeout.ms",
              config['MDVariables']['zookeeperSessionTimeOutMs']) \
      .option("auto.commit.interval.ms",
              config['MDVariables']['autoCommitIntervalMS']) \
      .option("subscribe", config['MDVariables']['topic']) \
      .option("failOnDataLoss", "false") \
      .option("includeHeaders", "true") \
      .option("startingOffsets", "latest") \
      .load() \
      .select(from_json(col("value").cast("string"),
                        schema).alias("parsed_value"))

OK, one safe way of doing it is shutting down the streaming process
gracefully, without loss of data, and restarting it with the new options,
though that impacts consumers. The other method implies in-flight changes,
as suggested by the topic, with zero interruption. Interestingly, one of
our clients requested a similar solution. As solutions architect /
engineering manager I should come back with a few options; I am on the
case, so to speak. There is considerable interest in Spark Structured
Streaming across the board, especially in trading systems.
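For what it's worth, Structured Streaming does not pick up new
readStream/writeStream options on an already running query, so the common
workaround is a controlled stop-and-restart against the same checkpoint. A
minimal sketch of that control loop, with the actual Spark calls abstracted
behind start_query/stop_query callables (all names here are hypothetical, not
a Spark API):

```python
# Sketch of a stop-and-restart loop for applying new streaming options.
# start_query(options) is assumed to build and start a streaming query with
# the given options (reading from the same checkpoint) and return a handle;
# stop_query(handle) is assumed to stop the query gracefully, e.g. after the
# in-flight micro-batch completes.
def apply_new_options(current_handle, new_options, start_query, stop_query):
    """Gracefully restart a streaming query with updated options."""
    stop_query(current_handle)       # drain and stop the running query
    return start_query(new_options)  # restart with the new options

# Dummy stand-ins to illustrate the call sequence without a Spark cluster:
events = []

def start_query(options):
    events.append(("start", dict(options)))
    return {"options": dict(options)}

def stop_query(handle):
    events.append(("stop", handle["options"]))

q = start_query({"maxOffsetsPerTrigger": "1000"})
q = apply_new_options(q, {"maxOffsetsPerTrigger": "5000"},
                      start_query, stop_query)
```

The key design point is that the restart reuses the checkpoint location, so
offsets and state carry over even though the query object is new.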

HTH






On Thu, 16 Feb 2023 at 04:12, hueiyuan su  wrote:

> *Component*: Spark Structured Streaming
> *Level*: Advanced
> *Scenario*: How-to
>
> -
> *Problems Description*
> I would like to confirm: can we directly apply new options of
> readStream/writeStream without stopping a currently running Spark
> Structured Streaming application? For example, suppose we just want to
> adjust the throughput properties of readStream with Kafka. Is there a
> method to adjust them without stopping the application? If you have any
> ideas, please let me know; I would appreciate your answer.
>
>
> --
> Best Regards,
>
> Mars Su
> *Phone*: 0988-661-013
> *Email*: hueiyua...@gmail.com
>


Re: Build SPARK from source with SBT failed

2023-03-07 Thread Tufan Rakshit
I use an M1 Apple Silicon machine with Java 11 from Zulu, and run SBT-based
build jobs in Kubernetes.

Best
Tufan

On Tue, 7 Mar 2023 at 16:11, Sean Owen  wrote:

> No, it's that JAVA_HOME wasn't set to .../Home. It is simply not finding
> javac, in the error. Zulu supports M1.


Re: Build SPARK from source with SBT failed

2023-03-07 Thread Sean Owen
No, it's that JAVA_HOME wasn't set to .../Home. It is simply not finding
javac, in the error. Zulu supports M1.

On Tue, Mar 7, 2023 at 9:05 AM Artemis User  wrote:

> Looks like Maven build did find the javac, just can't run it.  So it's not
> a path problem but a compatibility problem.  Are you doing this on a Mac
> with M1/M2?  I don't think that Zulu JDK supports Apple silicon.   Your
> best option would be to use homebrew to install the dev tools (including
> OpenJDK) on Mac.  On Ubuntu, it seems still the compatibility problem.  Try
> to use the apt to install your dev tools, don't do it manually.  If you
> manually install JDK, it doesn't install hardware-optimized JVM libraries.


Re: Build SPARK from source with SBT failed

2023-03-07 Thread Artemis User
Looks like the Maven build did find javac; it just can't run it. So it's
not a path problem but a compatibility problem. Are you doing this on a
Mac with M1/M2? I don't think that the Zulu JDK supports Apple silicon.
Your best option would be to use Homebrew to install the dev tools
(including OpenJDK) on the Mac. On Ubuntu, it still seems to be a
compatibility problem; try using apt to install your dev tools, and don't
do it manually. If you install a JDK manually, it doesn't install
hardware-optimized JVM libraries.


On 3/7/23 8:21 AM, ckgppl_...@sina.cn wrote:
No. I haven't installed Apple Developer Tools. I have installed Zulu 
OpenJDK 11.0.17 manually.

So I need to install Apple Developer Tools?


Re: Build SPARK from source with SBT failed

2023-03-07 Thread ckgppl_yan
No. I haven't installed Apple Developer Tools. I have installed Zulu OpenJDK
11.0.17 manually. So, do I need to install Apple Developer Tools?
----- Original Message -----
From: Sean Owen 
To: ckgppl_...@sina.cn
Cc: user 
Subject: Re: Build SPARK from source with SBT failed
Date: 2023-03-07 20:58

This says you don't have the java compiler installed. Did you install the Apple 
Developer Tools package?


Re: Pandas UDFs vs Inbuilt pyspark functions

2023-03-07 Thread Sean Owen
It's hard to evaluate without knowing what you're doing. Generally, using a
built-in function will be fastest. pandas UDFs can be faster than normal
UDFs if you can take advantage of processing multiple rows at once.
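To illustrate the batching point: a Spark pandas UDF receives whole pandas
Series batches, so per-row Python call overhead is paid once per batch rather
than once per row, and the arithmetic itself runs in vectorized C code. The
difference can be sketched outside Spark in plain pandas (the column name and
the tax-rate function are made-up examples):

```python
import pandas as pd

# "Normal UDF" style: a scalar function called once per row.
def add_tax_scalar(price):
    return price * 1.07

# "pandas UDF" style: a Series-to-Series function called once per batch,
# letting pandas/NumPy do the multiply in vectorized C code.
def add_tax_vectorized(prices: pd.Series) -> pd.Series:
    return prices * 1.07

df = pd.DataFrame({"price": [10.0, 20.0, 30.0]})

per_row = df["price"].map(add_tax_scalar)    # one Python call per row
per_batch = add_tax_vectorized(df["price"])  # one Python call total

assert per_row.equals(per_batch)             # same result, far fewer calls
```

In Spark the same shape applies via pyspark.sql.functions.pandas_udf, but a
built-in expression (here, simply `col("price") * 1.07`) avoids the
JVM-to-Python round trip entirely, which is why built-ins are usually fastest.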

On Tue, Mar 7, 2023 at 6:47 AM neha garde  wrote:

> Hello All,
>
> I need help deciding which is better: pandas UDFs or built-in functions.
> I have to perform a transformation where I managed to compare the two for
> a few thousand records, and pandas_udf in fact performed better.
> Given the complexity of the transformation, I also found pandas_udf makes
> it more readable.
> I also found a lot of comparisons made between normal UDFs and pandas_udfs.
>
> What I want to know is whether pandas_udfs will perform like normal
> PySpark built-in functions. How do pandas_udfs work internally, and will
> they be equally performant on bigger sets of data?
> I did go through a few documents but wasn't able to get a clear idea.
> I am mainly looking at this from the performance perspective.
>
> Thanks in advance
>
>
> Regards,
> Neha R.Garde.
>


Re: Build SPARK from source with SBT failed

2023-03-07 Thread Sean Owen
This says you don't have the java compiler installed. Did you install the
Apple Developer Tools package?

On Tue, Mar 7, 2023 at 1:42 AM  wrote:

> Hello,
>
> I have tried to build the Spark source code with SBT in my local dev
> environment (macOS 13.2.1), but it reported the following error:
> [error] java.io.IOException: Cannot run program
> "/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/bin/javac" (in
> directory "/Users/username/spark-remotemaster"): error=2, No such file or
> directory
>
> [error] at
> java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1128)
>
> [error] at
> java.base/java.lang.ProcessBuilder.start(ProcessBuilder.java:1071)
>
> [error] at
> scala.sys.process.ProcessBuilderImpl$Simple.run(ProcessBuilderImpl.scala:75)
> [error] at
> scala.sys.process.ProcessBuilderImpl$AbstractBuilder.run(ProcessBuilderImpl.scala:106)
>
> I need to export JAVA_HOME to let it run successfully, but if I use Maven
> then I don't need to export JAVA_HOME. I have also tried to build Spark
> with SBT in an Ubuntu x86_64 environment; it reported a similar error.
>
> The official Spark documentation doesn't mention the export JAVA_HOME
> step, so I think this is a bug that needs a documentation or script change.
> Please correct me if I am wrong.
>
> Thanks
>
> Liang
>
>
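For reference, the error path above ends in Contents/bin/javac, while a
macOS JDK's actual home is under Contents/Home, which matches Sean's point
that JAVA_HOME wasn't set to .../Home. Pointing JAVA_HOME at the JDK home
before invoking the SBT build works around this (the exact paths below are
assumptions for typical installs, so adjust for your machine):

```
# macOS, manually installed Zulu 11 (path is an assumption):
export JAVA_HOME=/Library/Java/JavaVirtualMachines/zulu-11.jdk/Contents/Home

# Ubuntu x86_64, distribution OpenJDK 11 (path is an assumption):
# export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64

./build/sbt package
```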


Pandas UDFs vs Inbuilt pyspark functions

2023-03-07 Thread neha garde
Hello All,

I need help deciding which is better: pandas UDFs or built-in functions.
I have to perform a transformation where I managed to compare the two for a
few thousand records, and pandas_udf in fact performed better.
Given the complexity of the transformation, I also found pandas_udf makes
it more readable.
I also found a lot of comparisons made between normal UDFs and pandas_udfs.

What I want to know is whether pandas_udfs will perform like normal
PySpark built-in functions. How do pandas_udfs work internally, and will
they be equally performant on bigger sets of data?
I did go through a few documents but wasn't able to get a clear idea.
I am mainly looking at this from the performance perspective.

Thanks in advance


Regards,
Neha R.Garde.