Re: Creating a Spark 3 Connector

2022-11-23 Thread Jungtaek Lim
Bjørn, that is the "Spark Connect" project, which decouples the client and server from the Spark driver. It is not related to data sources (also known as connectors). Mitch, as far as I understand, we unfortunately don't have dedicated documentation for implementing data sources/connectors. It's

Re: Creating a Spark 3 Connector

2022-11-23 Thread Bjørn Jørgensen
This is from the vote for the Spark connector. Is this what you are looking for? The goal of the SPIP is to introduce a DataFrame-based client/server API for Spark. Please also refer to: - Previous discussion in dev mailing list: [DISCUSS] SPIP: Spark Connect - A client and server interface for Apache

Creating a Spark 3 Connector

2022-11-23 Thread Mitch Shepherd
Hello, I’m wondering if anyone can point me in the right direction for a Spark connector developer guide. I’m looking for information on writing a new connector for Spark to move data between Apache Spark and other systems. Any information would be helpful. I found a similar thing for

Re: Unable to use GPU with pyspark in windows

2022-11-23 Thread Sean Owen
Using a GPU is unrelated to Spark. You can run code that uses GPUs. This error indicates that something failed when you ran your code (GPU OOM?) and you need to investigate why. On Wed, Nov 23, 2022 at 7:51 AM Vajiha Begum S A <vajihabegu...@maestrowiz.com> wrote: > Hi Sean Owen, > I'm using

Stack Overflow Question

2022-11-23 Thread liang...@yy.com
Spark SQL syntax: how do I point the cache to a third-party path? https://stackoverflow.com/q/74544827/10237864?sem=2 liang...@yy.com

Unable to use GPU with pyspark in windows

2022-11-23 Thread Vajiha Begum S A
Hi Sean Owen, I'm using a Windows system with an NVIDIA Quadro K1200 (GPU memory 20GB, Intel Core: 8 cores). Installed: CUDAF 0.14 jar file, Rapid 4 Spark 2.12-22.10.0 jar file, CUDA Toolkit 11.8.0 (Windows version). Also installed: WSL 2.0 (since I'm using a Windows system). I'm running only single

[sparklyR] broadcast table for temporary table -> can you compute statistics for temporary table?

2022-11-23 Thread Joris Billen
Hi, a question about using the R API for Spark: we load some files from Oracle (through JDBC) and register them as a temporary table in Spark. I see a lot of shuffling, but we have joins between large and small tables, so I probably need to broadcast the small tables. Normally autobroadcasting