Re: Large UDFs
Everything you said was correct, the server accepted my large UDF now, thank you!

Best wishes,
Torsten Bergh Moss

From: Murtadha Hubail
Sent: Sunday, November 17, 2019 4:29 PM
To: Torsten Bergh Moss; dev@asterixdb.apache.org
Subject: Re: Large UDFs

Yes, and I believe it should go under the [common] config section. You will need to restart the asterixdb instance after that for the change to take effect. This property is configured in bytes. For example, if you want to set it to 100MB, it would be something like this:

    [common]
    max.web.request.size=104857600

Cheers,
Murtadha

On 11/17/2019, 6:17 PM, "Torsten Bergh Moss" wrote:

Thanks Murtadha,

Do I configure this property under [cc] inside cc.conf?

Best wishes,
Torsten

From: Murtadha Hubail
Sent: Sunday, November 17, 2019 1:50 PM
To: Torsten Bergh Moss; dev@asterixdb.apache.org
Subject: Re: Large UDFs

Torsten,

The maximum HTTP request size is configurable using the property (max.web.request.size) and by default it is set to 50MB.

Cheers,
Murtadha

On 11/17/2019, 3:34 PM, "Torsten Bergh Moss" wrote:

I must say that I feel really confident that the problem has to do with the size of the UDF. I realized a lot of the dependencies actually were related to Asterix, thus redundant, so I solved the dependency problem by unapologetically cloning the repos for the external libraries my UDF is explicitly using and adding the code to the repo. It worked.

However, my UDF is based on machine learning (Naive Bayes for sentiment analysis of Tweets), and is trained on about 900 000 tweets. The trained model manifests as large dictionaries containing term frequencies for the different classes/sentiments. So in order to use my UDF I either have to upload it with the training data or serialized versions of these dictionaries.

And I can see that if I mvn package my UDF without these large files (.csv or .ser) it is "accepted" by the server when I send it via POST, but if I add these large files to the repo and then mvn package the UDF then the server rejects it because of file size. In other words, it seems to solely depend on the presence of these big files. And I mean it kind of makes sense as that is exactly what the cc.log file is saying: "A large request encountered. Closing channel."

Best wishes,
Torsten

From: Xikui Wang
Sent: Sunday, November 17, 2019 12:21 AM
To: dev@asterixdb.apache.org
Subject: Re: Large UDFs

I think the warning message that you see probably is orthogonal to the dependencies that you are trying to add, since the installation of UDF merely copies the jar files to a designated location for AsterixDB to discover. It shouldn't touch the code that raises the warning message. Maybe that's related to how you interacted with system? Not sure...

As for handling large dependency libraries, besides making a fat jar, you can also copy the dependency jar files into the "apache-asterixdb-0.9.5-SNAPSHOT/repo" folder, so these jars can be deployed to the cluster together with AsterixDB and then be used by UDFs directly.

Best,
Xikui

On Sat, Nov 16, 2019 at 2:55 PM Ian Maxon wrote:

> Sounds like a bug, can you share the UDF in question so I can debug it?
>
> > On Nov 16, 2019, at 05:17, Torsten Bergh Moss wrote:
> >
> > Greetings devs,
> >
> > Hope you are all enjoying your weekends.
> >
> > I am trying to build a GPU-based UDF, and this UDF relies on a bunch of dependencies (one of them being the GPU-framework). In order to "bake" these dependencies into the UDF I am packaging it as a jar-with-dependencies, however, this jar ends up being too big to deploy as a UDF as the Hyracks Http Server cries out
> >
> > [nioEventLoopGroup-5-7] WARN org.apache.hyracks.http.server.HttpRequestAggregator - A large request encountered. Closing the channel.
> >
> > Is there any way to adjust these file size limits, or should UDFs with dependencies be handled some other way? I looked into the HttpRequestAggregator.java file and tried following some trails, but I can't seem to discover where the limit is actually set.
> >
> > Best wishes,
> >
> > Torsten
Re: Large UDFs
Yes, and I believe it should go under the [common] config section. You will need to restart the asterixdb instance after that for the change to take effect. This property is configured in bytes. For example, if you want to set it to 100MB, it would be something like this:

    [common]
    max.web.request.size=104857600

Cheers,
Murtadha
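The arithmetic behind the value above can be checked with a short sketch. It assumes, as the 100MB example implies, that 1MB means 1024 * 1024 bytes. The helper names `mb_to_bytes` and `common_section` are hypothetical and are not part of AsterixDB; the script only prints a config fragment to paste into the instance's configuration file.

```python
def mb_to_bytes(megabytes: int) -> int:
    """Convert megabytes to bytes, assuming 1 MB = 1024 * 1024 bytes."""
    return megabytes * 1024 * 1024


def common_section(limit_mb: int) -> str:
    """Render a [common] fragment; the property value is expressed in bytes."""
    return f"[common]\nmax.web.request.size={mb_to_bytes(limit_mb)}\n"


if __name__ == "__main__":
    # Prints the fragment for a 100MB limit, matching the example in the reply.
    print(common_section(100))
```

As noted in the reply, the instance has to be restarted before a changed value takes effect.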
Re: Large UDFs
Thanks Murtadha,

Do I configure this property under [cc] inside cc.conf?

Best wishes,
Torsten
Re: Large UDFs
Torsten,

The maximum HTTP request size is configurable using the property (max.web.request.size) and by default it is set to 50MB.

Cheers,
Murtadha
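Given the 50MB default mentioned above, a packaged UDF can be checked locally before it is sent. This is a minimal sketch under two assumptions: that 50MB means 50 * 1024 * 1024 bytes, and that the jar size is a good proxy for the request size (the actual HTTP request is slightly larger because of headers and any multipart framing). The function name and the jar path are hypothetical.

```python
import os

# Assumed default: 50MB as stated in the thread, read as 50 * 1024 * 1024 bytes.
DEFAULT_LIMIT_BYTES = 50 * 1024 * 1024


def exceeds_limit(path: str, limit_bytes: int = DEFAULT_LIMIT_BYTES) -> bool:
    """Return True if the file at `path` is larger than `limit_bytes`."""
    return os.path.getsize(path) > limit_bytes


if __name__ == "__main__":
    jar = "target/my-udf-jar-with-dependencies.jar"  # hypothetical path
    if os.path.exists(jar):
        print(jar, "exceeds the assumed default limit:", exceeds_limit(jar))
```

A jar close to the limit may still be rejected, so it is safer to leave some headroom when choosing a value for max.web.request.size.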
Re: Large UDFs
I must say that I feel really confident that the problem has to do with the size of the UDF. I realized a lot of the dependencies actually were related to Asterix, thus redundant, so I solved the dependency problem by unapologetically cloning the repos for the external libraries my UDF is explicitly using and adding the code to the repo. It worked.

However, my UDF is based on machine learning (Naive Bayes for sentiment analysis of Tweets), and is trained on about 900 000 tweets. The trained model manifests as large dictionaries containing term frequencies for the different classes/sentiments. So in order to use my UDF I either have to upload it with the training data or serialized versions of these dictionaries.

And I can see that if I mvn package my UDF without these large files (.csv or .ser) it is "accepted" by the server when I send it via POST, but if I add these large files to the repo and then mvn package the UDF then the server rejects it because of file size. In other words, it seems to solely depend on the presence of these big files. And I mean it kind of makes sense as that is exactly what the cc.log file is saying: "A large request encountered. Closing channel."

Best wishes,
Torsten
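To confirm which files inflate the packaged UDF, as described in the message above, the jar can be inspected as an ordinary zip archive. The sketch below uses only Python's standard `zipfile` module; the function name `largest_entries` and the jar path are hypothetical. It ranks entries by compressed size, since that is what contributes to the size of the uploaded file.

```python
import zipfile
from typing import List, Tuple


def largest_entries(jar_path: str, top: int = 5) -> List[Tuple[str, int, int]]:
    """Return (name, uncompressed_bytes, compressed_bytes) for the biggest entries."""
    with zipfile.ZipFile(jar_path) as jar:
        # Sort by compressed size: this is what counts toward the jar's size on disk.
        infos = sorted(jar.infolist(), key=lambda info: info.compress_size, reverse=True)
    return [(info.filename, info.file_size, info.compress_size) for info in infos[:top]]


if __name__ == "__main__":
    import os

    jar = "target/my-udf.jar"  # hypothetical path
    if os.path.exists(jar):
        for name, raw, packed in largest_entries(jar):
            print(f"{packed:>12} bytes packed  {raw:>12} bytes raw  {name}")
```

If the .csv or .ser files dominate the listing, that matches the observation that the rejection depends solely on their presence.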
Re: Large UDFs
I think the warning message that you see probably is orthogonal to the dependencies that you are trying to add, since the installation of UDF merely copies the jar files to a designated location for AsterixDB to discover. It shouldn't touch the code that raises the warning message. Maybe that's related to how you interacted with system? Not sure...

As for handling large dependency libraries, besides making a fat jar, you can also copy the dependency jar files into the "apache-asterixdb-0.9.5-SNAPSHOT/repo" folder, so these jars can be deployed to the cluster together with AsterixDB and then be used by UDFs directly.

Best,
Xikui
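The alternative to a fat jar suggested above, copying dependency jars into the repo folder, could be scripted roughly as follows. This is a sketch only: the function name `copy_dependency_jars` is hypothetical, the source directory "target/dependency" assumes the jars were collected there beforehand (for example with Maven's dependency plugin), and the repo path is the one quoted in the message.

```python
import shutil
from pathlib import Path
from typing import List


def copy_dependency_jars(dependency_dir: str, repo_dir: str) -> List[str]:
    """Copy every *.jar from dependency_dir into repo_dir; return the copied file names."""
    target = Path(repo_dir)
    if not target.is_dir():
        raise FileNotFoundError(f"repo folder not found: {target}")
    copied = []
    for jar in sorted(Path(dependency_dir).glob("*.jar")):
        shutil.copy2(jar, target / jar.name)  # copy2 also preserves timestamps
        copied.append(jar.name)
    return copied


if __name__ == "__main__":
    source = "target/dependency"  # assumed location of the dependency jars
    repo = "apache-asterixdb-0.9.5-SNAPSHOT/repo"  # folder named in the thread
    if Path(source).is_dir() and Path(repo).is_dir():
        print(copy_dependency_jars(source, repo))
```

Dependencies that AsterixDB already ships should be left out, since the thread notes that many of the UDF's dependencies were Asterix-related and therefore redundant.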
Re: Large UDFs
Sounds like a bug, can you share the UDF in question so I can debug it?
Large UDFs
Greetings devs,

Hope you are all enjoying your weekends.

I am trying to build a GPU-based UDF, and this UDF relies on a bunch of dependencies (one of them being the GPU-framework). In order to "bake" these dependencies into the UDF I am packaging it as a jar-with-dependencies, however, this jar ends up being too big to deploy as a UDF as the Hyracks Http Server cries out

    [nioEventLoopGroup-5-7] WARN org.apache.hyracks.http.server.HttpRequestAggregator - A large request encountered. Closing the channel.

Is there any way to adjust these file size limits, or should UDFs with dependencies be handled some other way? I looked into the HttpRequestAggregator.java file and tried following some trails, but I can't seem to discover where the limit is actually set.

Best wishes,
Torsten