Re: Exposing functions to pyspark

2019-10-08 Andrew Melo
Hello again,

Is it possible to grab a handle to the underlying DataSourceReader
backing a DataFrame? Since there's no nice way to add extra methods to
Dataset, being able to grab the DataSource backing the DataFrame would
be a good escape hatch.
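
Concretely, here's the sort of thing I'm picturing from the Python side --
a rough sketch through the py4j gateway, assuming Spark 2.4, where the
analyzed plan's leaf for a DSv2 scan is a DataSourceV2Relation whose
'source' field holds the DataSourceV2 instance:

def v2_source(df):
    # Walk the analyzed logical plan via py4j and return the DataSourceV2
    # object backing df, or None if df isn't DSv2-backed.
    plan = df._jdf.queryExecution().analyzed()  # JVM LogicalPlan
    leaves = plan.collectLeaves()               # Scala Seq[LogicalPlan]
    for i in range(leaves.size()):
        leaf = leaves.apply(i)
        if leaf.getClass().getName().endswith("DataSourceV2Relation"):
            return leaf.source()  # py4j handle to the DataSourceV2
    return None

The returned py4j object would let me call whatever extra methods my Java
class defines, though it feels fragile since those plan node classes are
internal.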

Cheers
Andrew





Exposing functions to pyspark

2019-09-30 Andrew Melo
Hello,

I'm working on a DSv2 implementation with a user base that is 100% PySpark-based.

There's some interesting additional data-source-level functionality I'd
like to expose from the Java side to PySpark -- e.g. I/O metrics, which
source site provided the data, and so on.
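
For a sense of the surface I'm after, this is the sort of call I'd like
our Python library to offer (everything here is aspirational -- none of
it exists yet):

# Hypothetical user-facing half in our Python library.
from laurelin import datasource_info
info = datasource_info(df)  # look up the DS-level info for this DataFrame
print(info.bytes_read, info.source_site)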

Does anyone have an example of how to expose that to PySpark? We
provide a Python library for scientists to use, so I can supply the
Python half as well; I just don't know where to begin. Part of the
mental block I'm having is that when a user does the following in
PySpark:

df = spark.read.format('edu.vanderbilt.accre.laurelin.Root') \
    .option("tree", "tree") \
    .load('small-flat-tree.root')

They never hold a reference to any of my data source objects -- "df" is
a DataFrame, a class I don't own.
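
The only escape hatch I can see is reaching through the py4j gateway to a
static method on my Java class -- a rough sketch (the class path is mine,
but getLastReadMetrics() and bytesRead() are hypothetical methods I'd have
to add):

jvm = spark.sparkContext._jvm
metrics = jvm.edu.vanderbilt.accre.laurelin.Root.getLastReadMetrics()
print(metrics.bytesRead())

The trouble is that a static registry like this isn't tied to any
particular DataFrame, which is exactly the association I'm missing.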

Does anyone have a tip?
Thanks
Andrew
