Re: Large UDFs

2019-11-17 Thread Torsten Bergh Moss
Everything you said was correct; the server accepts my large UDF now. Thank 
you!

Best wishes,
Torsten Bergh Moss

From: Murtadha Hubail 
Sent: Sunday, November 17, 2019 4:29 PM
To: Torsten Bergh Moss; dev@asterixdb.apache.org
Subject: Re: Large UDFs

Yes, and I believe it should go under the [common] config section. You will 
need to restart the AsterixDB instance after that for the change to take 
effect. This property is configured in bytes. For example, to set it to 
100MB, it would be something like this:

[common]
max.web.request.size=104857600

Cheers,
Murtadha

On 11/17/2019, 6:17 PM, "Torsten Bergh Moss"  wrote:

Thanks Murtadha,

Do I configure this property under [cc] inside cc.conf?

Best wishes,
Torsten

From: Murtadha Hubail 
Sent: Sunday, November 17, 2019 1:50 PM
To: Torsten Bergh Moss; dev@asterixdb.apache.org
Subject: Re: Large UDFs

Torsten,

The maximum HTTP request size is configurable using the property 
(max.web.request.size) and by default it is set to 50MB.

Cheers,
Murtadha

On 11/17/2019, 3:34 PM, "Torsten Bergh Moss" wrote:

I must say that I feel really confident that the problem has to do with 
the size of the UDF.

I realized a lot of the dependencies actually were related to Asterix, 
thus redundant, so I solved the dependency problem by unapologetically cloning 
the repos for the external libraries my UDF is explicitly using and adding the 
code to the repo. It worked.

However, my UDF is based on machine learning (Naive Bayes for sentiment 
analysis of Tweets), and is trained on about 900 000 tweets. The trained model 
manifests as large dictionaries containing term frequencies for the different 
classes/sentiments. So in order to use my UDF I either have to upload it with 
the training data or serialized versions of these dictionaries.

And I can see that if I mvn package my UDF without these large files 
(.csv or .ser) it is "accepted" by the server when I send it via POST, but if I 
add these large files to the repo and then mvn package the UDF then the server 
rejects it because of file size. In other words, it seems to solely depend on 
the presence of these big files. And I mean it kind of makes sense as that is 
exactly what the cc.log file is saying: "A large request encountered. Closing 
the channel."

Best wishes,
Torsten


From: Xikui Wang 
Sent: Sunday, November 17, 2019 12:21 AM
To: dev@asterixdb.apache.org
Subject: Re: Large UDFs

I think the warning message that you see is probably orthogonal to the
dependencies that you are trying to add, since the installation of a UDF
merely copies the jar files to a designated location for AsterixDB to
discover. It shouldn't touch the code that raises the warning message.
Maybe that's related to how you interacted with the system? Not sure...

As for handling large dependency libraries, besides making a fat jar, you
can also copy the dependency jar files into the
"apache-asterixdb-0.9.5-SNAPSHOT/repo" folder, so these jars can be
deployed to the cluster together with AsterixDB and then be used by UDFs
directly.

Best,
Xikui

On Sat, Nov 16, 2019 at 2:55 PM Ian Maxon  wrote:

> Sounds like a bug, can you share the UDF in question so I can debug it?
>
> > On Nov 16, 2019, at 05:17, Torsten Bergh Moss wrote:
> >
> > Greetings devs,
> >
> > Hope you are all enjoying your weekends.
> >
> > I am trying to build a GPU-based UDF, and this UDF relies on a bunch of
> > dependencies (one of them being the GPU-framework). In order to "bake"
> > these dependencies into the UDF I am packaging it as a
> > jar-with-dependencies, however, this jar ends up being too big to deploy
> > as a UDF as the Hyracks Http Server cries out
> >
> > [nioEventLoopGroup-5-7] WARN org.apache.hyracks.http.server.HttpRequestAggregator - A large request encountered. Closing the channel.
> >
> > Is there any way to adjust these file size limits, or should UDFs with
> > dependencies be handled some other way? I looked into the
> > HttpRequestAggregator.java file and tried following some trails, but I
> > can't seem to discover where the limit is actually set.
> >
> > Best wishes,
> >
> > Torsten


Re: Large UDFs

2019-11-17 Thread Murtadha Hubail
Yes, and I believe it should go under the [common] config section. You will 
need to restart the AsterixDB instance after that for the change to take 
effect. This property is configured in bytes. For example, to set it to 
100MB, it would be something like this:

[common]
max.web.request.size=104857600

Cheers,
Murtadha
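
The byte value in the example above can be derived rather than typed by hand. 
This is a minimal sketch, assuming "MB" means 1024 * 1024 bytes (which matches 
the 100MB value 104857600 in the message). The helper names mb_to_bytes and 
render_common_section are hypothetical and are not part of AsterixDB.

```python
# Assumption: 1 MB = 1024 * 1024 bytes, consistent with the example value
# 104857600 given for 100MB in the message above.
BYTES_PER_MB = 1024 * 1024


def mb_to_bytes(megabytes: int) -> int:
    """Convert a size in MB to the byte count the property expects."""
    return megabytes * BYTES_PER_MB


def render_common_section(limit_mb: int) -> str:
    """Build the [common] fragment for a given limit (hypothetical helper)."""
    return "[common]\nmax.web.request.size={}\n".format(mb_to_bytes(limit_mb))


if __name__ == "__main__":
    # Prints the same two-line fragment shown in the message for 100MB.
    print(render_common_section(100))
```

The output is only a config fragment; it still has to be placed in the 
configuration file and the instance restarted, as described in the message.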


Re: Large UDFs

2019-11-17 Thread Torsten Bergh Moss
Thanks Murtadha,

Do I configure this property under [cc] inside cc.conf?

Best wishes,
Torsten


Re: Large UDFs

2019-11-17 Thread Murtadha Hubail
Torsten,

The maximum HTTP request size is configurable using the property 
(max.web.request.size) and by default it is set to 50MB.

Cheers,
Murtadha
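
Before uploading, a UDF archive can be compared against the limit locally. 
This is a rough sketch, assuming the 50MB default corresponds to 
50 * 1024 * 1024 bytes. The function name exceeds_request_limit is hypothetical, 
and the check only looks at the file itself, not the full HTTP request.

```python
import os

# Assumption: the 50MB default mentioned above equals 50 * 1024 * 1024 bytes.
DEFAULT_MAX_WEB_REQUEST_SIZE = 50 * 1024 * 1024


def exceeds_request_limit(path: str,
                          limit_bytes: int = DEFAULT_MAX_WEB_REQUEST_SIZE) -> bool:
    """Return True if the file alone is larger than the request size limit.

    The request may carry extra bytes beyond the file, so a file that is
    only slightly under the limit could still be rejected by the server.
    """
    return os.path.getsize(path) > limit_bytes
```

For example, exceeds_request_limit("my-udf.zip") (a hypothetical file name) 
gives a quick hint about whether max.web.request.size needs to be raised.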


Re: Large UDFs

2019-11-17 Thread Torsten Bergh Moss
I must say that I feel really confident that the problem has to do with the 
size of the UDF. 

I realized a lot of the dependencies actually were related to Asterix, thus 
redundant, so I solved the dependency problem by unapologetically cloning the 
repos for the external libraries my UDF is explicitly using and adding the code 
to the repo. It worked.

However, my UDF is based on machine learning (Naive Bayes for sentiment 
analysis of Tweets), and is trained on about 900 000 tweets. The trained model 
manifests as large dictionaries containing term frequencies for the different 
classes/sentiments. So in order to use my UDF I either have to upload it with 
the training data or serialized versions of these dictionaries. 

And I can see that if I mvn package my UDF without these large files (.csv or 
.ser) it is "accepted" by the server when I send it via POST, but if I add 
these large files to the repo and then mvn package the UDF then the server 
rejects it because of file size. In other words, it seems to solely depend on 
the presence of these big files. And I mean it kind of makes sense as that is 
exactly what the cc.log file is saying: "A large request encountered. Closing 
the channel."

Best wishes,
Torsten
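
One way to confirm which files dominate the packaged UDF is to list the 
largest entries in the archive. This is a minimal sketch, assuming the package 
is a jar or zip file (both are zip archives). The function name largest_entries 
is hypothetical.

```python
import zipfile


def largest_entries(jar_path: str, top_n: int = 5):
    """Return (name, uncompressed_size, compressed_size) for the biggest entries.

    Entries are ranked by compressed size, since that is what contributes
    to the size of the archive being uploaded.
    """
    with zipfile.ZipFile(jar_path) as jar:
        infos = sorted(jar.infolist(),
                       key=lambda info: info.compress_size,
                       reverse=True)
        return [(info.filename, info.file_size, info.compress_size)
                for info in infos[:top_n]]
```

If the .csv or .ser files appear at the top of this list, that supports the 
observation that they alone push the upload over the limit.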



Re: Large UDFs

2019-11-16 Thread Xikui Wang
I think the warning message that you see is probably orthogonal to the
dependencies that you are trying to add, since the installation of a UDF
merely copies the jar files to a designated location for AsterixDB to
discover. It shouldn't touch the code that raises the warning message.
Maybe that's related to how you interacted with the system? Not sure...

As for handling large dependency libraries, besides making a fat jar, you
can also copy the dependency jar files into the
"apache-asterixdb-0.9.5-SNAPSHOT/repo" folder, so these jars can be
deployed to the cluster together with AsterixDB and then be used by UDFs
directly.

Best,
Xikui
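
The copy step suggested above could be scripted roughly as follows. This is 
only a sketch: the directory arguments are placeholders to be filled in with 
the real locations (for example the "apache-asterixdb-0.9.5-SNAPSHOT/repo" 
folder named in the message), and the function name copy_dependency_jars is 
hypothetical.

```python
import shutil
from pathlib import Path


def copy_dependency_jars(deps_dir: str, repo_dir: str):
    """Copy every *.jar in deps_dir into repo_dir and return the copied names."""
    target = Path(repo_dir)
    if not target.is_dir():
        # Fail early instead of silently creating an unexpected folder.
        raise FileNotFoundError("repo folder not found: {}".format(target))
    copied = []
    for jar in sorted(Path(deps_dir).glob("*.jar")):
        shutil.copy2(jar, target / jar.name)
        copied.append(jar.name)
    return copied
```

Only files ending in .jar directly inside deps_dir are copied; anything else 
in that directory is left alone.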


Re: Large UDFs

2019-11-16 Thread Ian Maxon
Sounds like a bug. Can you share the UDF in question so I can debug it?


Large UDFs

2019-11-16 Thread Torsten Bergh Moss
Greetings devs,


Hope you are all enjoying your weekends.


I am trying to build a GPU-based UDF, and this UDF relies on a bunch of 
dependencies (one of them being the GPU framework). In order to "bake" these 
dependencies into the UDF I am packaging it as a jar-with-dependencies. 
However, this jar ends up being too big to deploy as a UDF, as the Hyracks 
HTTP server cries out:


[nioEventLoopGroup-5-7] WARN org.apache.hyracks.http.server.HttpRequestAggregator - A large request encountered. Closing the channel.


Is there any way to adjust these file size limits, or should UDFs with 
dependencies be handled some other way? I looked into the 
HttpRequestAggregator.java file and tried following some trails, but I can't 
seem to discover where the limit is actually set.


Best wishes,

Torsten