Re: Adding Custom finalize method to RDDs.

2019-06-13 Thread Phillip Henry
> handled by JVM. Thus I am not sure try and finalize will help. > > Thus I wanted to use some mechanism to cleanup of some temporary data > which is created by RDD immediately as soon as it goes out of scope. > > > > Any ideas ? > > > > Thanks, > > Nasrul

Re: Adding Custom finalize method to RDDs.

2019-06-12 Thread Phillip Henry
That's not the kind of thing a finalize method was ever supposed to do. Use a try/finally block instead. Phillip On Wed, 12 Jun 2019, 00:01 Nasrulla Khan Haris, wrote: > I want to delete some files which I created In my datasource api, as soon > as the RDD is cleaned up. > > > > Thanks, > >
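A minimal sketch of the try/finally approach suggested above, in plain Python rather than against the Spark datasource API. The function name `process_with_temp_file` and the scratch-file layout are hypothetical. It only covers files created and consumed in the same process, which is an assumption; it says nothing about data that lives on executors for the lifetime of an RDD.

```python
import os
import tempfile


def process_with_temp_file(records):
    """Write records to a scratch file, read them back, and always delete the file.

    Hypothetical helper for illustration only; not part of any Spark API.
    """
    # mkstemp returns an open OS-level file descriptor and the file's path.
    fd, path = tempfile.mkstemp(suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as handle:
            for record in records:
                handle.write(f"{record}\n")
        with open(path) as handle:
            lines = [line.strip() for line in handle]
        return path, lines
    finally:
        # Runs whether the body returned normally or raised an exception,
        # so the cleanup does not depend on garbage collection.
        if os.path.exists(path):
            os.remove(path)
```

The point of the design is that cleanup is tied to a lexical scope the caller controls, whereas a finalizer runs only if and when the garbage collector gets round to it.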

Re: Hyperparameter Optimization via Randomization

2021-01-30 Thread Phillip Henry
course it helps a lot if you're doing a smarter search over the space, > like what hyperopt does. For that, I mean, one can just use hyperopt + > Spark ML already if desired. > > On Fri, Jan 29, 2021 at 9:01 AM Phillip Henry > wrote: > >> Thanks, Sean! I hope to offer a PR ne

Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
Hi, I have no work at the moment so I was wondering if anybody would be interested in me contributing code that generates an Array[ParamMap] for random hyperparameters? Apparently, this technique can find a hyperparameter in the top 5% of parameter space in fewer than 60 iterations with 95%
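The "fewer than 60 iterations" figure is consistent with a standard argument, assuming that is what the truncated sentence refers to: if each random draw independently lands in the top 5% of the space with probability 0.05, the chance that at least one of n draws does so is 1 - 0.95^n. The sketch below works through that arithmetic and draws random parameter settings as plain dictionaries, a stand-in for `Array[ParamMap]`. The function names, the parameter names `regParam` and `elasticNetParam` as dictionary keys, and the sampling ranges are illustrative assumptions, not the proposed Spark code.

```python
import math
import random


def prob_at_least_one_hit(n_trials, top_fraction=0.05):
    """Probability that at least one of n independent draws lands in the top fraction."""
    return 1.0 - (1.0 - top_fraction) ** n_trials


def trials_needed(confidence=0.95, top_fraction=0.05):
    """Smallest n for which prob_at_least_one_hit(n) reaches the given confidence."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - top_fraction))


def random_param_maps(n, seed=0):
    """Draw n random settings; the ranges here are arbitrary examples."""
    rng = random.Random(seed)
    return [
        {
            # Log-uniform between 1e-4 and 1, a common choice for regularisation strength.
            "regParam": 10 ** rng.uniform(-4, 0),
            # Uniform between 0 and 1.
            "elasticNetParam": rng.uniform(0.0, 1.0),
        }
        for _ in range(n)
    ]


n = trials_needed()
candidates = random_param_maps(n)
```

Under these assumptions `trials_needed()` gives the number of draws, and each dictionary in `candidates` would be evaluated in the same way as one point of a grid.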

Re: Hyperparameter Optimization via Randomization

2021-02-08 Thread Phillip Henry
out to > also be pretty simple. > > On Sat, Jan 30, 2021 at 4:42 AM Phillip Henry > wrote: > >> Hi, Sean. >> >> Perhaps I don't understand. As I see it, ParamGridBuilder builds an >> Array[ParamMap]. What I am proposing is a new class that also builds an

Re: Hyperparameter Optimization via Randomization

2021-02-09 Thread Phillip Henry
. > But the API change isn't significant so maybe just fine. > > On Mon, Feb 8, 2021 at 3:49 AM Phillip Henry > wrote: > >> Hi, Sean. >> >> I don't think sampling from a grid is a good idea as the min/max may lie >> between grid points. Unconstrained rando

Re: Hyperparameter Optimization via Randomization

2021-01-29 Thread Phillip Henry
> the grid search process that says what fraction of all possible > combinations you want to randomly test. > > On Fri, Jan 29, 2021 at 5:52 AM Phillip Henry > wrote: > >> Hi, >> >> I have no work at the moment so I was wondering if anybody would be >> intereste

K8s integration test failure ("credentials Jenkins is using is probably wrong...")

2021-02-23 Thread Phillip Henry
Hi, Silly question: the Jenkins build for my PR is failing but it seems outside of my control. What must I do to remedy this? I've submitted https://github.com/apache/spark/pull/31535 but Spark QA is telling me "Kubernetes integration test status failure". The Jenkins job says "SUCCESS" but

Log likelihood in GeneralizedLinearRegression

2022-01-22 Thread Phillip Henry
Hi, As far as I know, there is no function to generate the log likelihood from a GeneralizedLinearRegression model. Are there any plans to implement one? I've coded my own in PySpark and in testing it agrees with the values we get from the Python library StatsModels to one part in a million.
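The message does not say which family or link the author's implementation covers, so the following is only a sketch of what such a calculation looks like for one case, the Poisson family, in plain Python rather than PySpark. Given observed counts y_i and fitted means mu_i, the log likelihood is the sum of y_i * log(mu_i) - mu_i - log(y_i!). The function name `poisson_log_likelihood` is hypothetical.

```python
import math


def poisson_log_likelihood(observed, fitted_means):
    """Sum of Poisson log-probabilities of the observed counts given the fitted means.

    Hypothetical helper: assumes non-negative integer counts and strictly positive means.
    """
    total = 0.0
    for y, mu in zip(observed, fitted_means):
        # lgamma(y + 1) equals log(y!) and stays stable for large counts.
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total


ll = poisson_log_likelihood([0, 1, 2], [1.0, 1.0, 1.0])
```

In a distributed setting the same per-row term would be computed as a column expression and summed, with `fitted_means` coming from the model's predictions.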

SPARK-24156: Kafka messages left behind in Spark Structured Streaming

2023-10-19 Thread Phillip Henry
Hi, folks, A few years ago, I asked about SSS not processing the final batch left on a Kafka topic when using groupBy, OutputMode.Append and withWatermark. At the time, Jungtaek Lim kindly pointed out (27/7/20) that this was expected behaviour, that (if I have this correct) a message needs to

Data Contracts

2023-06-12 Thread Phillip Henry
Hi, folks. There currently seems to be a buzz around "data contracts". From what I can tell, these mainly advocate a cultural solution. But instead, could big data tools be used to enforce these contracts? My questions really are: are there any plans to implement data constraints in Spark (eg,

Re: Data Contracts

2023-06-13 Thread Phillip Henry
in Avro that can evaluate compatible and incompatible >>> changes to the schema, from the perspective of the reader, writer, or both. >>> This provides some potential degree of enforcement, and means to >>> communicate a contract. Interestingly I believe this approach

Re: Data Contracts

2023-07-16 Thread Phillip Henry
gonna to be simple in any terms . > Thanks for sharing the git Philip . > Will definitely go through it . > > Thanks > Deepak > > On Mon, 19 Jun 2023 at 3:47 PM, Phillip Henry > wrote: > >> I think it might be a bit more complicated than this (but happy to be >>

Re: Data Contracts

2023-06-19 Thread Phillip Henry
in the yaml > . > > Thanks > Deepak > > On Mon, 19 Jun 2023 at 1:49 PM, Phillip Henry > wrote: > >> For my part, I'm not too concerned about the mechanism used to implement >> the validation as long as it's rich enough to express the constraints. >> >>

Re: Data Contracts

2023-06-19 Thread Phillip Henry
neering Lead >> Palantir Technologies Limited >> London >> United Kingdom >> >>view my Linkedin profile >> <https://www.linkedin.com/in/mich-talebzadeh-ph-d-5205b2/> >> >> >> https://en.everybodywiki.com/Mich_Talebzadeh >> >>