RE: Resources/Distributed Cache on Spark
Sorry for the resend, but does anyone know who I might best talk to about this? Would it be worthwhile to bring this question to the dev list?

Thanks again for the help,
Ray

From: Ray Navarette [mailto:ray.navare...@pb.com]
Sent: Thursday, February 8, 2018 6:33 PM
To: user@hive.apache.org
Subject: RE: Resources/Distributed Cache on Spark

Without using add files, we’d have to make sure these resources exist on every node, and would configure a hive session like this:

set myCustomProperty=/path/to/directory/someSubDir/;
select myCustomUDF('param1','param2');

With the shared resources, we can do this instead, at least with MR engine:

add files file:///path/to/directory;
set myCustomProperty=someSubDir/;
select myCustomUDF('param1','param2');

In both cases, the property myCustomProperty is accessed inside the custom UDF, interpreted as a path, and used to read the content of a file within “someSubDir”. This works fine whenever we have the full path, or with the relative path in the MR engine when using add resources.

I’m wondering if perhaps I’m getting lucky in that the MR engine is downloading the files to the working directory, and so the relative path is being properly resolved there, but some different behavior is happening in spark? I can give a full path if I know ahead of time where this file will be available on the remote node, hopefully by property, like ${hive.localResourceDir}/someSubDir.

Thanks for the quick response and your help with this.
Ray

From: Sahil Takiar [mailto:takiar.sa...@gmail.com]
Sent: Thursday, February 8, 2018 12:45 PM
To: user@hive.apache.org
Subject: Re: Resources/Distributed Cache on Spark

It should work. We have tests such as groupby_bigdata.q that run on HoS and work. They use the "add file" command. What are the exact commands you are running? What error are you seeing?

On Thu, Feb 8, 2018 at 6:28 AM, Ray Navarette <ray.navare...@pb.com> wrote:

Hello,

I’m hoping to find some information about using “ADD FILES ” when using the spark execution engine. I’ve seen some jira tickets reference this functionality, but little else. We have written some custom UDFs which require some external resources. When using the MR execution engine, we can reference the file paths using a relative path and they are properly distributed and resolved. When I try to do the same under spark engine, I receive an error saying the file is unavailable.

Does “ADD FILES ” work on spark, and if so, how should I properly reference those files in order to read them in the executors?

Thanks much for your help,
Ray

--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309
RE: Resources/Distributed Cache on Spark
Without using add files, we’d have to make sure these resources exist on every node, and would configure a hive session like this:

set myCustomProperty=/path/to/directory/someSubDir/;
select myCustomUDF('param1','param2');

With the shared resources, we can do this instead, at least with MR engine:

add files file:///path/to/directory;
set myCustomProperty=someSubDir/;
select myCustomUDF('param1','param2');

In both cases, the property myCustomProperty is accessed inside the custom UDF, interpreted as a path, and used to read the content of a file within “someSubDir”. This works fine whenever we have the full path, or with the relative path in the MR engine when using add resources.

I’m wondering if perhaps I’m getting lucky in that the MR engine is downloading the files to the working directory, and so the relative path is being properly resolved there, but some different behavior is happening in spark? I can give a full path if I know ahead of time where this file will be available on the remote node, hopefully by property, like ${hive.localResourceDir}/someSubDir.

Thanks for the quick response and your help with this.
Ray

From: Sahil Takiar [mailto:takiar.sa...@gmail.com]
Sent: Thursday, February 8, 2018 12:45 PM
To: user@hive.apache.org
Subject: Re: Resources/Distributed Cache on Spark

It should work. We have tests such as groupby_bigdata.q that run on HoS and work. They use the "add file" command. What are the exact commands you are running? What error are you seeing?

On Thu, Feb 8, 2018 at 6:28 AM, Ray Navarette <ray.navare...@pb.com> wrote:

Hello,

I’m hoping to find some information about using “ADD FILES ” when using the spark execution engine. I’ve seen some jira tickets reference this functionality, but little else. We have written some custom UDFs which require some external resources. When using the MR execution engine, we can reference the file paths using a relative path and they are properly distributed and resolved. When I try to do the same under spark engine, I receive an error saying the file is unavailable.

Does “ADD FILES ” work on spark, and if so, how should I properly reference those files in order to read them in the executors?

Thanks much for your help,
Ray

--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309
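
The relative-path behavior described in this message can be illustrated with a small sketch. It assumes, as the message itself speculates, that the MR engine leaves added files in the task's working directory while on Spark they may sit somewhere else. ResourceResolver and its resolve method are hypothetical names, not part of Hive; the helper only tries a configured path against a list of candidate base directories and returns the first match that exists.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/**
 * Hypothetical helper (not a Hive API): resolves a configured resource
 * path, such as the value of myCustomProperty, against candidate
 * base directories.
 */
public final class ResourceResolver {
    private ResourceResolver() {}

    /**
     * Returns the first existing location for the configured path.
     * Absolute paths are checked as-is; relative paths are tried
     * against each base directory in the order given.
     */
    public static Optional<Path> resolve(String configured, List<Path> baseDirs) {
        Path p = Paths.get(configured);
        if (p.isAbsolute()) {
            return Files.exists(p) ? Optional.of(p) : Optional.empty();
        }
        for (Path base : baseDirs) {
            Path candidate = base.resolve(p).normalize();
            if (Files.exists(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Assumption: the current working directory is the only candidate.
        List<Path> bases = Arrays.asList(Paths.get("").toAbsolutePath());
        System.out.println(resolve("someSubDir/", bases));
    }
}
```

Inside a UDF, the candidate list could start with the working directory and, when running on Spark, add the directory returned by org.apache.spark.SparkFiles.getRootDirectory(). Whether Hive on Spark actually places "add files" resources there is an assumption to verify against the Hive and Spark versions in use.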
Re: Resources/Distributed Cache on Spark
It should work. We have tests such as groupby_bigdata.q that run on HoS and work. They use the "add file" command. What are the exact commands you are running? What error are you seeing?

On Thu, Feb 8, 2018 at 6:28 AM, Ray Navarette wrote:

> Hello,
>
> I’m hoping to find some information about using “ADD FILES ” when
> using the spark execution engine. I’ve seen some jira tickets reference
> this functionality, but little else. We have written some custom UDFs
> which require some external resources. When using the MR execution engine,
> we can reference the file paths using a relative path and they are properly
> distributed and resolved. When I try to do the same under spark engine, I
> receive an error saying the file is unavailable.
>
> Does “ADD FILES ” work on spark, and if so, how should I properly
> reference those files in order to read them in the executors?
>
> Thanks much for your help,
>
> Ray

--
Sahil Takiar
Software Engineer
takiar.sa...@gmail.com | (510) 673-0309