@Ian: This is a very interesting use/test case for the work you are
doing at UCI on the new more dynamic deployment model, and on how the
underlying UDF infrastructure can best support ML-model-based UDFs...!
On 11/17/19 4:56 PM, Xikui Wang wrote:
I wonder what the deployment-initialization would do?
btw, the UDF does have a deinitialize() method which is expected to be
invoked when the UDF is deinitialized, but that is ignored for now, as
IScalarEvaluator in general does not deinitialize. To make that work, we
would need a bigger
It seems it would be nice if we had a step (similar to the
initialization step) in the deployment lifecycle as well.
And I guess we'd need a corresponding clean-up step for
un-deployment as well.
Does that make sense? If so, should we file an improvement for this?
Cheers,
Till
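The deploy/un-deploy hooks proposed above could look roughly like the sketch below. This is an illustrative Python sketch, not the actual AsterixDB API (the real UDF framework is Java); the names LibraryDeployment, on_deploy, and on_undeploy are hypothetical:

```python
import pickle

class LibraryDeployment:
    """Hypothetical sketch: a library whose expensive state (e.g. an ML
    model) is built once at deployment time and released at un-deployment,
    instead of being rebuilt for every query."""

    def __init__(self, model_bytes):
        self._model_bytes = model_bytes  # serialized, offline-trained model
        self.model = None

    def on_deploy(self):
        # Expensive, one-time work: deserialize the trained model.
        self.model = pickle.loads(self._model_bytes)

    def on_undeploy(self):
        # Corresponding clean-up step: drop the model reference.
        self.model = None

# Deploy once, serve many queries, then un-deploy.
deployment = LibraryDeployment(pickle.dumps({"weights": [0.1, 0.2]}))
deployment.on_deploy()
assert deployment.model == {"weights": [0.1, 0.2]}
deployment.on_undeploy()
assert deployment.model is None
```

The point of the design is that the deserialization cost is paid once per deployment rather than once per query or per record.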
The UDF interface has an initialize method which is invoked once per
lifecycle. Putting the model-loading code in there can probably solve your
problem. The initialization is done per query (Hyracks job). For example, if
you do
SELECT mylib#myudf(t) FROM Tweets t;
in which there are 100 tweets
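The initialize-per-job, evaluate-per-record pattern described above can be sketched as follows. This is an illustrative Python sketch, not the actual AsterixDB Java UDF interface; the class MyUDF and the model contents are hypothetical:

```python
import pickle

class MyUDF:
    """Sketch of the lifecycle: initialize() runs once per query (job),
    evaluate() runs once per input record, so the expensive model
    deserialization happens once rather than once per tweet."""

    def __init__(self, model_bytes):
        self._model_bytes = model_bytes
        self.model = None
        self.init_calls = 0  # for illustration only

    def initialize(self):
        # Called once when the job starts: deserialize the trained model.
        self.model = pickle.loads(self._model_bytes)
        self.init_calls += 1

    def evaluate(self, record):
        # Called for every input record; the model is already in memory.
        return self.model["label"]

udf = MyUDF(pickle.dumps({"label": "positive"}))
udf.initialize()                                  # once per job
results = [udf.evaluate(t) for t in range(100)]   # e.g. 100 tweets
assert udf.init_calls == 1 and len(results) == 100
```

With 100 tweets in the dataset, evaluate() runs 100 times but the deserialization in initialize() runs only once for the whole job.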
Everything you said was correct, the server accepted my large UDF now, thank
you!
Best wishes,
Torsten Bergh Moss
From: Murtadha Hubail
Sent: Sunday, November 17, 2019 4:29 PM
To: Torsten Bergh Moss; dev@asterixdb.apache.org
Subject: Re: Large UDFs
Yes, and I believe it should go under the [common] config section. You will
need to restart the asterixdb instance after that for the change to take
effect. This property is configured in bytes. For example, if you want to set
it to 100MB, it would be something like this:
[common]
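The config example above is cut off after the section header. A hedged completion for illustration, assuming the parameter is max.web.request.size (that name is an assumption, not confirmed by this thread; check the truncated earlier message and the AsterixDB configuration docs):

```ini
[common]
; 100MB in bytes (100 * 1024 * 1024); the property name is an assumption
max.web.request.size=104857600
```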
Thanks Murtadha,
Do I configure this property under [cc] inside cc.conf?
Best wishes,
Torsten
From: Murtadha Hubail
Sent: Sunday, November 17, 2019 1:50 PM
To: Torsten Bergh Moss; dev@asterixdb.apache.org
Subject: Re: Large UDFs
Torsten,
The maximum
Dear developers,
I am trying to build a machine-learning-based UDF for classification. This
involves loading a model that has been trained offline, which in practice
is basically deserialization of a big object. This deserialization
takes a significant amount of time, but it