Rohini,

Thanks a lot, I'll check the parameter.

On Wed, Feb 20, 2013 at 1:39 AM, Rohini Palaniswamy <[email protected]
> wrote:

> Eugene,
>   As I said earlier, you can use a different dfs.umaskmode. Running pig
> with -Ddfs.umaskmode=022 will give read access to all (755 instead of 700).
> But all the files output by the pig script will have those permissions.
>
> A better approach would be, when you write the serialized file in the step
> below, to write it with more accessible permissions.
> 2. The client side then builds the filter, serializes it, and moves it to
> the server side.
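For the local-file side of that step, a minimal sketch of writing the serialized filter data with world-readable permissions (the file name and bytes here are hypothetical; for a file already on HDFS the analogous call would be FileSystem.setPermission):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

public class WriteFilterFile {
    public static void main(String[] args) throws IOException {
        // Hypothetical serialized filter bytes; in the real script these
        // would come from serializing the custom filter's value list.
        byte[] serialized = {0x01, 0x02, 0x03};

        Path out = Files.createTempFile("pairs-tmp", null);
        Files.write(out, serialized);

        // Open the file up to rw-r--r-- so a process running as another
        // user (e.g. "hbase") can at least read it.
        Set<PosixFilePermission> perms =
                PosixFilePermissions.fromString("rw-r--r--");
        Files.setPosixFilePermissions(out, perms);

        // prints rw-r--r--
        System.out.println(
                PosixFilePermissions.toString(Files.getPosixFilePermissions(out)));
    }
}
```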
>
> Regards,
> Rohini
>
>
> On Tue, Feb 19, 2013 at 4:26 AM, Eugene Morozov
> <[email protected]>wrote:
>
> > Rohini,
> >
> > Sorry for the confusion about these users in my previous e-mails. Here is
> > a fuller explanation of my issue.
> >
> > This is what I got when I tried to run it.
> >
> > The file was successfully copied via "tmpfiles":
> > 2013-02-08 13:38:56,533 INFO
> > org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: File
> >
> >
> [/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging/job_201302081322_0001/files/pairs-tmp#pairs-tmp]
> > has been found
> > 2013-02-08 13:38:56,539 ERROR
> > org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile: Cannot read
> > file:
> >
> >
> [/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging/job_201302081322_0001/files/pairs-tmp#pairs-tmp]
> > org.apache.hadoop.security.AccessControlException: Permission denied:
> > user=hbase, access=EXECUTE,
> >
> >
> inode="/var/lib/hadoop-hdfs/cache/mapred/mapred/staging/vagrant/.staging":vagrant:supergroup:drwx------
> >
> >
> > org.apache.hadoop.hbase.filter.PrefixFuzzyRowFilterWithFile is my own
> > filter; it just lives in the org.apache... package.
> >
> > 1. I have a user "vagrant", and this user runs the pig script.
> > 2. The client side then builds the filter, serializes it, and moves it to
> > the server side.
> > 3. The RegionServer comes into play here: it deserializes the filter and
> > tries to use it while reading the table.
> > 4. The filter in turn tries to read the file, but since the RegionServer
> > was started under the system user "hbase", the filter also runs with that
> > user's credentials and cannot access the file, which was written by
> > another user.
> >
> > Any ideas of what to try?
> >
> > On Sun, Feb 17, 2013 at 8:22 AM, Rohini Palaniswamy <
> > [email protected]
> > > wrote:
> >
> > > Hi Eugene,
> > >       Sorry. Missed your reply earlier.
> > >
> > >     tmpfiles has been around for a while and will not be removed in
> > > hadoop anytime soon, so don't worry about it. The hadoop configurations
> > > have never been fully documented, and people look at the code and use
> > > them. They usually deprecate things for years before removing them.
> > >
> > >   The file will be created with permissions based on the dfs.umaskmode
> > > setting (or fs.permissions.umask-mode in Hadoop 0.23/2.x), and the owner
> > > of the file will be the user who runs the pig script. The map job will
> > > be launched as the same user by the pig script. I don't understand what
> > > you mean by "the user that runs the map task does not have permissions".
> > > What kind of hadoop authentication are you doing such that the file is
> > > created as one user and the map job is launched as another user?
> > >
> > > Regards,
> > > Rohini
> > >
> > >
> > > On Sun, Feb 10, 2013 at 10:26 PM, Eugene Morozov
> > > <[email protected]>wrote:
> > >
> > > > Hi, again.
> > > >
> > > > I've been able to successfully use the trick with DistributedCache and
> > > > "tmpfiles": during the run of my Pig script the files are copied by
> > > > JobClient to the job cache.
> > > >
> > > > But here is the issue. The files are there, but they have permission
> > > > 700, and the user that runs the map task (I suppose it's hbase)
> > > > doesn't have permission to read them. The files are owned by my
> > > > current OS user.
> > > >
> > > > First, this looks like a bug, doesn't it?
> > > > Second, what can I do about it?
> > > >
> > > >
> > > > On Thu, Feb 7, 2013 at 11:42 AM, Eugene Morozov
> > > > <[email protected]>wrote:
> > > >
> > > > > Rohini,
> > > > >
> > > > > thank you for the reply.
> > > > >
> > > > > Isn't using "tmpfiles" kind of a hack? It's neither an API nor a
> > > > > well-known practice; it's an internal detail. How safe is it to use
> > > > > such a trick? I mean, in a month or so we will probably update our
> > > > > CDH4 to whatever is current. Will it still work? Will it be safe for
> > > > > the cluster or for my job? Who knows what will be implemented there?
> > > > >
> > > > > You see, I can understand the code and find such a solution, but I
> > > > > won't be able to keep all of these details in mind to check when we
> > > > > update the cluster.
> > > > >
> > > > >
> > > > > On Thu, Feb 7, 2013 at 1:23 AM, Rohini Palaniswamy <
> > > > > [email protected]> wrote:
> > > > >
> > > > >> You should be fine using tmpfiles and that's the way to do it.
> > > > >>
> > > > >>  Otherwise, you will have to copy the file to hdfs and call
> > > > >> DistributedCache.addFileToClassPath yourself (basically what the
> > > > >> tmpfiles setting is doing). But the problem there, as you
> > > > >> mentioned, is cleaning up the hdfs file after the job completes.
> > > > >> If you use tmpfiles, it is copied to the job's staging directory
> > > > >> in the user's home and gets cleaned up automatically when the job
> > > > >> completes. If the file is not going to change between jobs, I
> > > > >> would advise creating it in hdfs once in a fixed location and
> > > > >> reusing it across jobs, doing only
> > > > >> DistributedCache.addFileToClassPath(). But if it is dynamic and
> > > > >> differs from job to job, tmpfiles is your choice.
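A sketch of that fixed-location variant; the paths are hypothetical, and this assumes the Hadoop 1.x / CDH4-era org.apache.hadoop.filecache.DistributedCache API:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CacheSetup {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Copy the values file to a fixed, reusable HDFS location once.
        Path cached = new Path("/user/vagrant/cache/pairs-tmp");
        fs.copyFromLocalFile(new Path("/tmp/pairs-tmp"), cached);

        // Register it so every map task gets a local copy on its classpath.
        DistributedCache.addFileToClassPath(cached, conf);
    }
}
```

This needs a cluster (or at least a Hadoop classpath) to run, so treat it as a job-setup fragment rather than a standalone program.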
> > > > >>
> > > > >> Regards,
> > > > >> Rohini
> > > > >>
> > > > >>
> > > > >> On Mon, Feb 4, 2013 at 1:26 PM, Eugene Morozov <
> > > > [email protected]
> > > > >> >wrote:
> > > > >>
> > > > >> > Hello, folks!
> > > > >> >
> > > > >> > I'm using a heavily customized HBaseStorage in my pig script.
> > > > >> > During HBaseStorage.setLocation() I prepare a file with values
> > > > >> > that serve as the source for my filter. The filter is used during
> > > > >> > HBaseStorage.getNext().
> > > > >> >
> > > > >> > Since a Pig script is basically an MR job with many mappers, my
> > > > >> > values file must be accessible to all my map tasks. There is
> > > > >> > DistributedCache, which should copy files across the cluster so
> > > > >> > they are local to every map task. I don't want to write my file
> > > > >> > to HDFS in the first place, because there is no way to clean it
> > > > >> > up after the MR job is done (maybe you can point me in the right
> > > > >> > direction). On the other hand, if I write the file to the local
> > > > >> > file system under "/tmp", then I can either call deleteOnExit()
> > > > >> > or just forget about it; Linux will take care of its local "/tmp".
> > > > >> >
> > > > >> > But here is a small problem. DistributedCache copies files only
> > > > >> > when it is used with a command-line parameter like "-files". In
> > > > >> > that case GenericOptionsParser copies all the files, but the
> > > > >> > DistributedCache API itself only lets you specify parameters in
> > > > >> > jobConf; it doesn't actually do the copying.
> > > > >> >
> > > > >> > I've found that GenericOptionsParser sets the property "tmpfiles",
> > > > >> > which JobClient uses to copy files before it runs the MR job. And
> > > > >> > I've been able to set the same property in jobConf from my
> > > > >> > HBaseStorage. It does the trick, but it's a hack.
> > > > >> > Is there any other, more correct way to achieve the goal?
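The trick described above boils down to one configuration line; roughly (the "tmpfiles" property name is internal to GenericOptionsParser/JobClient, not public API, and the path is just an example):

```java
// Inside the customized HBaseStorage.setLocation(String, Job) -- a sketch.
// Comma-separated URIs work too, e.g. "file:///tmp/a,file:///tmp/b".
Configuration conf = job.getConfiguration();
conf.set("tmpfiles", "file:///tmp/pairs-tmp");
```

This is a configuration fragment, not a runnable program; JobClient picks the property up at submit time and copies the listed files into the job's staging directory.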
> > > > >> >
> > > > >> > Thanks in advance.
> > > > >> >
> > > > >>
> > > > >
> > > > >
> > > > >
> > > > >
> > > >
> > > >
> > > >
> > > >
> > >
> >
> >
> >
> >
>



-- 
Evgeny Morozov
Developer Grid Dynamics
Skype: morozov.evgeny
www.griddynamics.com
[email protected]
