Rohini, thanks a lot. I'll check the parameter.
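For the archives, here is roughly what I am going to try: writing the serialized filter file myself with explicitly open permissions, instead of relying on the job-wide umask. This is only a sketch under my assumptions; the path and class name are hypothetical, and note that user hbase also needs execute permission on every parent directory, which is exactly what the staging directory denies in the error below:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.fs.permission.FsPermission;

    public class FilterFileWriter {
        /**
         * Writes the serialized filter to a world-readable HDFS location.
         * The location is hypothetical; the point is the explicit FsPermission.
         */
        public static Path write(Configuration conf, byte[] serializedFilter)
                throws Exception {
            FileSystem fs = FileSystem.get(conf);
            Path path = new Path("/tmp/pairs-tmp"); // hypothetical location
            FSDataOutputStream out = fs.create(path, true);
            try {
                out.write(serializedFilter);
            } finally {
                out.close();
            }
            // 644: owner read/write, group and others read-only.
            // Parent directories must also be traversable (o+x) by user hbase.
            fs.setPermission(path, new FsPermission((short) 0644));
            return path;
        }
    }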
On Wed, Feb 20, 2013 at 1:39 AM, Rohini Palaniswamy <[email protected]> wrote:

> Eugene,
>    As I said earlier, you can use a different dfs.umaskmode. Running pig
> with -Ddfs.umaskmode=022 will give read access to all (755 instead of
> 700). But then all files output by the pig script will have those
> permissions.
>
> A better approach would be, when you write the serialized file in the
> step below, to write it with more accessible permissions:
> 2. After that the client side builds the filter, serializes it, and moves
> it to the server side.
>
> Regards,
> Rohini
>
>
> On Tue, Feb 19, 2013 at 4:26 AM, Eugene Morozov <[email protected]> wrote:
>
> > Rohini,
> >
> > Sorry for the confusion around users in my previous e-mails. Here is a
> > clearer explanation of my issue.
> >
> > This is what I got when I tried to run it.
> >
> > The file was successfully copied using "tmpfiles":
> >
> > 2013-02-08 13:38:56,533 INFO
> > org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: File
> > [/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging/job_201302081322_0001/files/pairs-tmp#pairs-tmp]
> > has been found
> > 2013-02-08 13:38:56,539 ERROR
> > org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: Cannot read
> > file:
> > [/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging/job_201302081322_0001/files/pairs-tmp#pairs-tmp]
> > org.apache.hadoop.security.AccessControlException: Permission denied:
> > user=hbase, access=EXECUTE,
> > inode="/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging":vagrant:supergroup:drwx------
> >
> > org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile is my
> > filter; it just lives in the org.apache... package.
> >
> > 1. The user vagrant runs the pig script.
> > 2. After that the client side builds the filter, serializes it, and
> > moves it to the server side.
> > 3. The RegionServer comes into play here: it deserializes the filter
> > and uses it while reading the table.
> > 4. The filter in turn tries to read the file, but since the
> > RegionServer was started under the system user "hbase", the filter
> > authenticates as that user and cannot access a file written by another
> > user.
> >
> > Any ideas of what to try?
> >
> > On Sun, Feb 17, 2013 at 8:22 AM, Rohini Palaniswamy <[email protected]> wrote:
> >
> > > Hi Eugene,
> > >    Sorry. I missed your reply earlier.
> > >
> > >    tmpfiles has been around for a while and will not be removed from
> > > hadoop anytime soon, so don't worry about it. The hadoop
> > > configurations have never been fully documented, and people look at
> > > the code and use them. Settings are usually deprecated for years
> > > before being removed.
> > >
> > >    The file will be created with permissions based on the
> > > dfs.umaskmode setting (or fs.permissions.umask-mode in Hadoop
> > > 0.23/2.x), and the owner of the file will be the user who runs the
> > > pig script. The map job will be launched as the same user by the pig
> > > script. I don't understand what you mean by "the user that runs the
> > > map task does not have permissions". What kind of hadoop
> > > authentication are you doing such that the file is created as one
> > > user and the map job is launched as another user?
> > >
> > > Regards,
> > > Rohini
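[Note for later readers of this thread: the umask can also be set programmatically on the job configuration rather than on the pig command line. A minimal sketch, assuming you build the Configuration yourself; as Rohini says above, the key name differs between Hadoop versions:

    import org.apache.hadoop.conf.Configuration;

    public class UmaskSettings {
        public static void applyOpenUmask(Configuration conf) {
            // Hadoop 0.20/1.x key:
            conf.set("dfs.umaskmode", "022");
            // Hadoop 0.23/2.x key:
            conf.set("fs.permissions.umask-mode", "022");
            // With umask 022, new directories get 755 and new files 644
            // instead of the 700/600 produced by a restrictive umask.
        }
    }
]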
> >
> > On Sun, Feb 10, 2013 at 10:26 PM, Eugene Morozov <[email protected]> wrote:
> >
> > > Hi, again.
> > >
> > > I've been able to successfully use the trick with DistributedCache
> > > and "tmpfiles": during a run of my pig script the files are copied by
> > > the JobClient to the job cache.
> > >
> > > But here is the issue. The files are there, but they have permission
> > > 700, and the user that runs the map task (I suppose it's hbase)
> > > doesn't have permission to read them. The files are owned by my
> > > current OS user.
> > >
> > > First, it looks like a bug, doesn't it?
> > > Second, what can I do about it?
> > >
> > > On Thu, Feb 7, 2013 at 11:42 AM, Eugene Morozov <[email protected]> wrote:
> > >
> > > > Rohini,
> > > >
> > > > thank you for the reply.
> > > >
> > > > Isn't it kind of a hack to use "tmpfiles"? It's neither API nor
> > > > well-known practice; it's an internal detail. How safe is it to use
> > > > such a trick? I mean, after a month or so we will probably update
> > > > our CDH4 to whatever is current. Will it still work? Will it be
> > > > safe for the cluster or for my job? Who knows what will be
> > > > implemented there?
> > > >
> > > > You see, I can understand the code and find such a solution, but I
> > > > won't be able to keep all of these details in mind to check them
> > > > when we update the cluster.
> > > >
> > > > On Thu, Feb 7, 2013 at 1:23 AM, Rohini Palaniswamy <[email protected]> wrote:
> > > >
> > > > > You should be fine using tmpfiles, and that's the way to do it.
> > > > >
> > > > > Otherwise you will have to copy the file to hdfs and call
> > > > > DistributedCache.addFileToClassPath yourself (basically what the
> > > > > tmpfiles setting is doing). But the problem there, as you
> > > > > mentioned, is cleaning up the hdfs file after the job completes.
> > > > > If you use tmpfiles, the file is copied to the job's staging
> > > > > directory in the user's home and gets cleaned up automatically
> > > > > when the job completes. If the file is not going to change
> > > > > between jobs, I would advise creating it in hdfs once in a fixed
> > > > > location and reusing it across jobs, doing only
> > > > > DistributedCache.addFileToClassPath(). But if it is dynamic and
> > > > > differs from job to job, tmpfiles is your choice.
> > > > >
> > > > > Regards,
> > > > > Rohini
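[Note for later readers: a sketch of the "fixed location" variant Rohini describes above. The paths and class name are hypothetical; the real API calls are FileSystem.copyFromLocalFile and DistributedCache.addFileToClassPath:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.filecache.DistributedCache;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FilterFileCache {
        public static void register(Configuration conf) throws Exception {
            FileSystem fs = FileSystem.get(conf);
            // Hypothetical fixed hdfs location, reused across jobs.
            Path cached = new Path("/apps/hbase-filter/pairs-tmp");
            if (!fs.exists(cached)) {
                // Upload once; later jobs skip this step. No automatic
                // cleanup: the file stays in hdfs until you delete it.
                fs.copyFromLocalFile(new Path("/tmp/pairs-tmp"), cached);
            }
            // Ships the file to every task and puts it on the task classpath.
            DistributedCache.addFileToClassPath(cached, conf);
        }
    }
]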
> > > > >
> > > > > On Mon, Feb 4, 2013 at 1:26 PM, Eugene Morozov <[email protected]> wrote:
> > > > >
> > > > > > Hello, folks!
> > > > > >
> > > > > > I'm using a heavily customized HBaseStorage in my pig script,
> > > > > > and during HBaseStorage.setLocation() I prepare a file with the
> > > > > > values that will be the source for my filter. The filter is
> > > > > > used during HBaseStorage.getNext().
> > > > > >
> > > > > > Since a pig script is basically an MR job with many mappers, my
> > > > > > values file must be accessible to all my map tasks. There is
> > > > > > the DistributedCache, which is supposed to copy files across
> > > > > > the cluster so that they are local to every map task. I don't
> > > > > > want to write my file to HDFS in the first place, because there
> > > > > > is no way to clean it up after the MR job is done (maybe you
> > > > > > can point me in the right direction). On the other hand, if I
> > > > > > write the file to the local file system "/tmp", then I can
> > > > > > either call deleteOnExit() or just forget about it; linux will
> > > > > > take care of its local "/tmp".
> > > > > >
> > > > > > But here is a small problem. The DistributedCache copies files
> > > > > > only when it is used with a command line parameter like
> > > > > > "-files". In that case GenericOptionsParser copies all the
> > > > > > files, but the DistributedCache API itself only records
> > > > > > parameters in the jobConf; it doesn't actually do the copying.
> > > > > >
> > > > > > I've found that GenericOptionsParser sets the property
> > > > > > "tmpfiles", which is used by the JobClient to copy files before
> > > > > > it runs the MR job. And I've been able to set the same property
> > > > > > in the jobConf from my HBaseStorage. It does the trick, but
> > > > > > it's a hack.
> > > > > > Is there any other correct way to achieve the goal?
> > > > > >
> > > > > > Thanks in advance.

--
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
[email protected]
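P.S. (for the archives) The "tmpfiles" trick from the bottom of the thread, as a sketch. "tmpfiles" is an internal property consumed by JobClient, not a public API, so treat it as unsupported; everything here other than the property name itself is a hypothetical illustration:

    import org.apache.hadoop.conf.Configuration;

    public class TmpFilesTrick {
        // Called from something like HBaseStorage.setLocation(), where the
        // job Configuration is available before the job is submitted.
        public static void shipWithJob(Configuration conf, String localFile) {
            // "tmpfiles" holds a comma-separated list of files that
            // JobClient copies into the job's staging directory and
            // registers in the DistributedCache (and that are cleaned up
            // with the staging directory when the job completes).
            String existing = conf.get("tmpfiles");
            String entry = "file://" + localFile; // assumes an absolute local path
            conf.set("tmpfiles",
                     existing == null ? entry : existing + "," + entry);
        }
    }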
