[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2019-05-21 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16845100#comment-16845100
 ] 

Sergey Shelukhin commented on HDFS-14498:
-

Fixing it to reduce logging (i.e. to log rarely) might make it easier to get any 
useful info for a repro. We've hit it a few times, and every time all the useful 
logs were already gone.
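
As a strawman, a minimal sketch of what "log rarely" could look like in the 
monitor loop (a fragment with made-up names; not the actual LeaseManager code):
{noformat}
// Hypothetical sketch: log recovery failures for a given lease at most once
// per hour instead of on every monitor iteration. lastLoggedMs and LOG are
// assumed fields of the monitor.
private final Map<String, Long> lastLoggedMs = new HashMap<>();
private static final long LOG_INTERVAL_MS = 60L * 60 * 1000;

private void logRetryRateLimited(String leaseHolder, String msg) {
  long now = Time.monotonicNow();
  Long last = lastLoggedMs.get(leaseHolder);
  if (last == null || now - last >= LOG_INTERVAL_MS) {
    lastLoggedMs.put(leaseHolder, now);
    LOG.warn(msg + " (similar messages suppressed for up to 1h)");
  }
}
{noformat}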







[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2019-05-20 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16844169#comment-16844169
 ] 

Sergey Shelukhin commented on HDFS-14498:
-

The version is a somewhat modified 2.9 (with no modifications in this area, as 
far as I can see).
Unfortunately, once this thing starts spamming, our NN logs roll to the limit 
in about 2-3 hours, so all the useful logs for this file are lost.
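
As a stopgap for the rollover itself, the RFA appender limits in Hadoop's stock 
conf/log4j.properties can be raised so the pre-spam history survives longer 
(the values here are illustrative):
{noformat}
# log4j.properties: keep larger and more NN log files
log4j.appender.RFA.MaxFileSize=256MB
log4j.appender.RFA.MaxBackupIndex=50
{noformat}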







[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2019-05-17 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16842339#comment-16842339
 ] 

Sergey Shelukhin commented on HDFS-14498:
-

I don't have a repro because this issue also basically destroys the namenode 
logs.
Judging by the fact that the file's blocks cannot be replicated, I'm assuming 
the write failed at some stage and/or the client died. The file only has one 
block, and that block is the one without replicas.
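
For anyone trying to confirm the same state, the block/replica situation of 
such a file can be inspected with fsck (the path is a placeholder):
{noformat}
$ hdfs fsck /path/to/stuck/file -files -blocks -locations -openforwrite
{noformat}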







[jira] [Commented] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2019-05-16 Thread Sergey Shelukhin (JIRA)


[ 
https://issues.apache.org/jira/browse/HDFS-14498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16841839#comment-16841839
 ] 

Sergey Shelukhin commented on HDFS-14498:
-

cc [~elgoiri] [~ashlhud] [~raviprak]







[jira] [Created] (HDFS-14498) LeaseManager can loop forever on the file for which create has failed

2019-05-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-14498:
---

 Summary: LeaseManager can loop forever on the file for which 
create has failed 
 Key: HDFS-14498
 URL: https://issues.apache.org/jira/browse/HDFS-14498
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.9.0
Reporter: Sergey Shelukhin


The logs from the file creation are long gone due to the infinite lease 
logging; however, the create presumably failed... the client that was trying to 
write this file is definitely long dead.
The version includes HDFS-4882.
We get this log pattern repeating infinitely:
{noformat}
2019-05-16 14:00:16,893 INFO 
[org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: [Lease.  Holder: 
DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1] has expired hard limit
2019-05-16 14:00:16,893 INFO 
[org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Recovering [Lease.  
Holder: DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 1], src=
2019-05-16 14:00:16,893 WARN 
[org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.internalReleaseLease: 
Failed to release lease for file . Committed blocks are waiting to be 
minimally replicated. Try again later.
2019-05-16 14:00:16,893 WARN 
[org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor@b27557f] 
org.apache.hadoop.hdfs.server.namenode.LeaseManager: Cannot release the path 
 in the lease [Lease.  Holder: DFSClient_NONMAPREDUCE_-20898906_61, 
pending creates: 1]. It will be retried.
org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException: DIR* 
NameSystem.internalReleaseLease: Failed to release lease for file . 
Committed blocks are waiting to be minimally replicated. Try again later.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.internalReleaseLease(FSNamesystem.java:3357)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager.checkLeases(LeaseManager.java:573)
at 
org.apache.hadoop.hdfs.server.namenode.LeaseManager$Monitor.run(LeaseManager.java:509)
at java.lang.Thread.run(Thread.java:745)



$  grep -c "Recovering.*DFSClient_NONMAPREDUCE_-20898906_61, pending creates: 
1" hdfs_nn*
hdfs_nn.log:1068035
hdfs_nn.log.2019-05-16-14:1516179
hdfs_nn.log.2019-05-16-15:1538350
{noformat}

Aside from an actual bug fix, it might make sense to make LeaseManager not log 
so much, in case there are more bugs like this...






[jira] [Created] (HDFS-14387) create a client-side override for dfs.namenode.block-placement-policy.default.prefer-local-node

2019-03-22 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-14387:
---

 Summary: create a client-side override for 
dfs.namenode.block-placement-policy.default.prefer-local-node 
 Key: HDFS-14387
 URL: https://issues.apache.org/jira/browse/HDFS-14387
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin


It should be possible for a service to decide whether it wants to use the 
local-node preference. As it stands, if 
dfs.namenode.block-placement-policy.default.prefer-local-node is enabled, 
services that run far fewer instances than there are DNs in the cluster 
unnecessarily concentrate their write load; the only way around this seems to 
be disabling prefer-local-node globally.
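
For reference, the global workaround is a cluster-wide hdfs-site.xml setting 
(the property name is the one above; this sketch just shows it disabled for 
everyone, which is exactly the problem):
{noformat}
<property>
  <!-- Disables the local-node write preference for ALL clients; there is
       currently no per-client override, which is what this issue requests. -->
  <name>dfs.namenode.block-placement-policy.default.prefer-local-node</name>
  <value>false</value>
</property>
{noformat}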






[jira] [Commented] (HDFS-7878) API - expose a unique file identifier

2017-10-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16227160#comment-16227160
 ] 

Sergey Shelukhin commented on HDFS-7878:


Thank you for all the work getting this in! I thought it wouldn't actually ever 
happen :)

> API - expose a unique file identifier
> -
>
> Key: HDFS-7878
> URL: https://issues.apache.org/jira/browse/HDFS-7878
> Project: Hadoop HDFS
>  Issue Type: Improvement
>Reporter: Sergey Shelukhin
>Assignee: Chris Douglas
>  Labels: BB2015-05-TBR
> Fix For: 3.0.0
>
> Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, 
> HDFS-7878.03.patch, HDFS-7878.04.patch, HDFS-7878.05.patch, 
> HDFS-7878.06.patch, HDFS-7878.07.patch, HDFS-7878.08.patch, 
> HDFS-7878.09.patch, HDFS-7878.10.patch, HDFS-7878.11.patch, 
> HDFS-7878.12.patch, HDFS-7878.13.patch, HDFS-7878.14.patch, 
> HDFS-7878.15.patch, HDFS-7878.16.patch, HDFS-7878.17.patch, 
> HDFS-7878.18.patch, HDFS-7878.19.patch, HDFS-7878.20.patch, 
> HDFS-7878.21.patch, HDFS-7878.patch
>
>
> See HDFS-487.
> Even though that is resolved as a duplicate, the ID is actually not exposed 
> by the JIRA it supposedly duplicates.
> The INode ID for the file should be easy to expose; alternatively, the ID 
> could be derived from block IDs, to account for appends...
> This is useful e.g. as a cache key by file, to make sure the cache stays 
> correct when the file is overwritten.






[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2017-08-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16138848#comment-16138848
 ] 

Sergey Shelukhin commented on HDFS-7878:


I'd like to attempt to resurrect this. However, from reading the comments 
above, I don't sense a consensus here... At this point I really don't care at 
all about the class and method structure anymore, as long as there's a usable 
API.
Can someone please summarize the above discussion, or guide one of the patch 
versions through? It reads to me like every approach taken w.r.t. the 
classes/etc. was rejected by at least one person, so I'm not sure which one to 
take.







[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2016-09-01 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15456174#comment-15456174
 ] 

Sergey Shelukhin commented on HDFS-7878:


I meant when the callers are in different processes/nodes, e.g. Hive generating 
splits for remote tasks. We can work around it by getting the status and 
verifying the file ID, then opening from the status. However, a fileId-based 
open would be more convenient (and would potentially save a call - at least 
from the caller's perspective; not sure if it's the same difference from the 
NN's perspective)...
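
Roughly, the workaround is the following (a fragment; getFileId() stands in 
for whatever accessor ends up exposing the inode ID, which is exactly what 
this issue is about, so it is hypothetical):
{noformat}
// Sketch: record the file ID at split-generation time, then verify it again
// when the remote task opens the file. fs and path are assumed in scope.
FileStatus planned = fs.getFileStatus(path);
long expectedId = getFileId(planned);        // hypothetical accessor

// ...later, in the remote task...
FSDataInputStream in = fs.open(path);
if (getFileId(fs.getFileStatus(path)) != expectedId) {
  in.close();                                // the file was overwritten
  throw new IOException(path + " no longer matches the recorded file ID");
}
{noformat}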







[jira] [Comment Edited] (HDFS-7878) API - expose an unique file identifier

2016-08-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453828#comment-15453828
 ] 

Sergey Shelukhin edited comment on HDFS-7878 at 9/1/16 12:13 AM:
-

One idea behind open(long/InodeId) is to be able to open files consistently; 
e.g. for partial caching (one needs to be sure that the cached data and the 
data read from FS are for the same file, guarding against overwrites). File ID 
is easy to propagate between different readers for this purpose, but it seems 
that FileStatus would be rather inconvenient. It forces the caller who is 
dealing with the FS to get the status by name first (which also only works if 
the name is known; in our case we do know the name) and verify that fileId is 
consistent.
Is it possible to keep both APIs?


was (Author: sershe):
One idea behind open(long/InodeId) is to be able to open files consistently; 
e.g. for partial caching (one needs to be sure that the cached data and the 
data read from FS are for the same file, guarding against overwrites). File ID 
is easy to propagate between different readers for this purpose, but it seems 
that FileStatus would be rather inconvenient. It forces the caller to get the 
status by name first (which also only works if the name is known; in our case 
we do know the name) and verify that fileId is consistent.
Is it possible to keep both APIs?







[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2016-08-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453828#comment-15453828
 ] 

Sergey Shelukhin commented on HDFS-7878:


One idea behind open(long/InodeId) is to be able to open files consistently; 
e.g. for partial caching (one needs to be sure that the cached data and the 
data read from FS are for the same file, guarding against overwrites). File ID 
is easy to propagate between different readers for this purpose, but it seems 
that FileStatus would be rather inconvenient. It forces the caller to get the 
status by name first (which also only works if the name is known; in our case 
we do know the name) and verify that fileId is consistent.
Is it possible to keep both APIs?







[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2016-08-31 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15453830#comment-15453830
 ] 

Sergey Shelukhin commented on HDFS-7878:


Thanks for picking this up by the way!







[jira] [Updated] (HDFS-10757) KMSClientProvider combined with KeyProviderCache can result in wrong UGI being used

2016-08-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-10757:

Summary: KMSClientProvider combined with KeyProviderCache can result in 
wrong UGI being used  (was: KMSClientProvider combined with KeyProviderCache 
results in wrong UGI being used)







[jira] [Updated] (HDFS-10757) KMSClientProvider combined with KeyProviderCache results in wrong UGI being used

2016-08-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-10757:

Description: 
ClientContext::get gets the context from CACHE via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI from the configuration, too. These would return the same KeyProvider 
regardless of current UGI.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user, via the above cache.

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.

Side note: the comment in createConnection that purports to handle the 
different UGI doesn't seem to cover what it says it covers. In our case, we 
have two unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, 
including a KMS token, added.

  was:
ClientContext::get gets the context from CACHE via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI from the configuration, too. These would return the same KeyProvider 
regardless of current UGI.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user, via the above cache.

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.

Side note: the comment in createConnection that purports to handle the 
different UGI doesn't seem to cover it says it covers. In our case, we have two 
unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, including 
a KMS token, added.








[jira] [Updated] (HDFS-10757) KMSClientProvider combined with KeyProviderCache results in wrong UGI being used

2016-08-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-10757:

Description: 
ClientContext::get gets the context from CACHE via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI from the configuration, too. These would return the same KeyProvider 
regardless of current UGI.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user, via the above cache.

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.

Side note: the comment in createConnection that purports to handle the 
different UGI doesn't seem to cover it says it covers. In our case, we have two 
unrelated UGIs with no auth (createRemoteUser) with bunch of tokens, including 
a KMS token, added.

  was:
ClientContext::get gets the context from CACHE via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI from the configuration, too. These would return the same KeyProvider 
regardless of current UGI.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user, via the above cache.

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.








[jira] [Updated] (HDFS-10757) KMSClientProvider combined with KeyProviderCache results in wrong UGI being used

2016-08-11 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-10757?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-10757:

Description: 
ClientContext::get gets the context from CACHE via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI from the configuration, too. These would return the same KeyProvider 
regardless of current UGI.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user, via the above cache.

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.

  was:
ClientContext::get gets the context from cache via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI stored in configuration, too.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user...

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.








[jira] [Created] (HDFS-10757) KMSClientProvider combined with KeyProviderCache results in wrong UGI being used

2016-08-11 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-10757:
---

 Summary: KMSClientProvider combined with KeyProviderCache results 
in wrong UGI being used
 Key: HDFS-10757
 URL: https://issues.apache.org/jira/browse/HDFS-10757
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin
Priority: Critical


ClientContext::get gets the context from cache via a config setting based name, 
then KeyProviderCache stored in ClientContext gets the key provider cached by 
URI stored in configuration, too.
KMSClientProvider caches the UGI (actualUgi) in ctor; that means in particular 
that all the users of DFS with KMSClientProvider in a process will get the KMS 
token (along with other credentials) of the first user...

Either KMSClientProvider shouldn't store the UGI, or one of the caches should 
be UGI-aware, like the FS object cache.
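
A minimal sketch of the pitfall from the caller's side (illustrative code, not 
KMSClientProvider internals; the path and config are assumptions):
{noformat}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class KmsUgiPitfall {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path encryptedFile = new Path("/enc-zone/data");  // hypothetical path
    for (String user : new String[] {"userA", "userB"}) {
      UserGroupInformation ugi = UserGroupInformation.createRemoteUser(user);
      ugi.doAs((PrivilegedExceptionAction<Void>) () -> {
        // The FS cache is UGI-aware, so each user gets its own FS instance,
        // but the key provider behind it is cached by URI only; the KMS call
        // for userB can therefore go out with userA's credentials.
        FileSystem.get(conf).open(encryptedFile).close();
        return null;
      });
    }
  }
}
{noformat}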






[jira] [Commented] (HDFS-10414) allow disabling trash on per-directory basis

2016-05-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15288060#comment-15288060
 ] 

Sergey Shelukhin commented on HDFS-10414:
-

1) Is it true for all APIs?
2) This might be hard to modify for existing tools or scripts that are outside 
of one's control.







[jira] [Commented] (HDFS-10414) allow disabling trash on per-directory basis

2016-05-17 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-10414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15287091#comment-15287091
 ] 

Sergey Shelukhin commented on HDFS-10414:
-

Suppose there's an ETL process consisting of a sequence of Hive queries and 
other tools that write intermediate data to an HDFS directory hierarchy; it 
would be nice to disable trash for the root of that hierarchy, so that the 
intermediate data is not preserved in the trash when it's deleted or moved to 
a different FS, for example.
However, we don't want to disable trash for the entire cluster, because there 
is also production data there for which it should be enabled.
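
The closest existing knob is per-delete rather than per-directory, e.g. from 
the shell (the flag is real, but every tool doing the deleting has to pass it, 
which is the problem):
{noformat}
$ hdfs dfs -rm -r -skipTrash /warehouse/staging/intermediate
{noformat}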







[jira] [Created] (HDFS-10414) allow disabling trash on per-directory basis

2016-05-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-10414:
---

 Summary: allow disabling trash on per-directory basis
 Key: HDFS-10414
 URL: https://issues.apache.org/jira/browse/HDFS-10414
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin


For ETL, it might be useful to disable trash for certain directories only, to 
avoid the overhead while keeping it enabled for the rest of the cluster.






[jira] [Resolved] (HDFS-9567) LlapServiceDriver can fail if only the packaged logger config is present

2015-12-16 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-9567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HDFS-9567.

Resolution: Invalid

Wrong project






[jira] [Created] (HDFS-9567) LlapServiceDriver can fail if only the packaged logger config is present

2015-12-16 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-9567:
--

 Summary: LlapServiceDriver can fail if only the packaged logger 
config is present
 Key: HDFS-9567
 URL: https://issues.apache.org/jira/browse/HDFS-9567
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin


I was incrementally updating my setup on some VM and didn't have the logger 
config file, so the packaged one was apparently picked up, which caused this:
{noformat}
java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path 
in absolute URI: 
jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
at org.apache.hadoop.fs.Path.initialize(Path.java:205)
at org.apache.hadoop.fs.Path.<init>(Path.java:171)
at 
org.apache.hadoop.hive.llap.cli.LlapServiceDriver.run(LlapServiceDriver.java:234)
at 
org.apache.hadoop.hive.llap.cli.LlapServiceDriver.main(LlapServiceDriver.java:58)
Caused by: java.net.URISyntaxException: Relative path in absolute URI: 
jar:file:/home/vagrant/llap/apache-hive-2.0.0-SNAPSHOT-bin/lib/hive-llap-server-2.0.0-SNAPSHOT.jar!/llap-daemon-log4j2.properties
at java.net.URI.checkPath(URI.java:1823)
at java.net.URI.<init>(URI.java:745)
at org.apache.hadoop.fs.Path.initialize(Path.java:202)
... 3 more
{noformat}
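
A standalone sketch of the parse failure (the jar path is illustrative): Path 
splits off the jar scheme and rebuilds a hierarchical URI, so the remaining 
file:/...jar!/... portion is rejected as a relative path in an absolute URI.
{noformat}
import org.apache.hadoop.fs.Path;

// Throws IllegalArgumentException wrapping URISyntaxException, matching the
// stack trace above: Path cannot represent an opaque jar: URI.
Path p = new Path("jar:file:/tmp/some.jar!/llap-daemon-log4j2.properties");
{noformat}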





[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: (was: HDFS-7878.05.patch)






[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.05.patch






[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-04-07 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.05.patch

Changed to a nullable field and added it to a couple more places.






[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14376519#comment-14376519
 ] 

Sergey Shelukhin commented on HDFS-7878:


[~cmccabe] are you ok with the latest patch?






[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-03-19 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.04.patch

Removed open, and removed the field from serialization.
I think inheritance is a much cleaner way to express an FS-specific field, 
compared to a null field in the shared FileStatus...






[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-19 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14369660#comment-14369660
 ] 

Sergey Shelukhin commented on HDFS-7878:


1) Sure.
2) That seems error-prone; the code operating on FileStatus will have to know 
whether fileId is supposed to be there (if it's not, it's OK for it to be null; 
if it's supposed to be there and is null, it's a bug); serialization will lose 
data. Writables don't work well with optional fields...
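
To spell out the Writable point: presence has to be hand-encoded, because the 
wire format has no field tags; a sketch (illustrative, not a patch excerpt):
{noformat}
// Inside a Writable with an optional fileId (Long): every writer and reader
// must agree on this manual presence flag, and old/new versions silently
// disagree on the wire format.
public void write(DataOutput out) throws IOException {
  out.writeBoolean(fileId != null);  // presence flag, encoded by hand
  if (fileId != null) {
    out.writeLong(fileId);
  }
}
{noformat}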






[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-03-17 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.03.patch

Attaching the patch with the class changes (it's actually the same as the very 
first patch, with some cleanup, javadoc, etc.).
Additionally, I added the call on DFS to open a file by ID.







[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14364104#comment-14364104
 ] 

Sergey Shelukhin commented on HDFS-7878:


Note: confusingly, HdfsFileStatus is not a FileStatus; it's an internal class.

[~cmccabe] can you clarify what you mean by two RPCs? The API makes one RPC.
Two RPCs will only happen if the user calls both get-ID and get-status, which 
is not necessary.







[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-16 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14363709#comment-14363709
 ] 

Sergey Shelukhin commented on HDFS-7878:


That is approximately what was done in the first version of the patch (see 
attached) and then replaced in favor of the new API... can you guys 
([~cmccabe] and [~jingzhao]) decide which way is better :)
I can add the open-by-id method then...




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-11 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14357646#comment-14357646
 ] 

Sergey Shelukhin commented on HDFS-7878:


[~cmccabe] ping?

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353579#comment-14353579
 ] 

Sergey Shelukhin commented on HDFS-7878:


ping?

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-03-09 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.02.patch

added javadoc

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353801#comment-14353801
 ] 

Sergey Shelukhin commented on HDFS-7878:


In particular, I wonder why the getFileStatus API is privileged and has to be 
consistent with getFileId.
If you call getFileStatus and open today, you can hit the same problem - 
status from one file, open from a different file.
The ID allows you to overcome this by getting the ID *first*, then using the 
ID-based path. Of course, if the ID is obtained separately there's no such 
guarantee, but there's no way to overcome that.
I don't care either way about the subclass vs. method approach.
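
To make the ordering concrete, here is a hedged sketch of the ID-first 
pattern; getFileId() and the inode-path form below are illustrative stand-ins, 
not the committed API:
{noformat}
// 1. Resolve the ID first; this pins the exact inode we looked at.
long id = fs.getFileId(new Path("/data/part-0"));   // illustrative API

// 2. Do everything else through the ID-based path. If the file is
//    overwritten in between, these calls should fail with
//    FileNotFoundException instead of silently using the new file.
Path byId = new Path("/.reserved/.inodes/" + id);
FileStatus status = fs.getFileStatus(byId);
FSDataInputStream in = fs.open(byId);
{noformat}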


 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14353766#comment-14353766
 ] 

Sergey Shelukhin commented on HDFS-7878:


For the file-opening case, can't the file be opened by getting the ID first 
and then using the ID-based path, as indicated above?

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.02.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7895) open and getFileInfo APIs treat paths inconsistently wrt protocol

2015-03-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7895:
---
Summary: open and getFileInfo APIs treat paths inconsistently wrt protocol  
(was: open and getFileInfo APIs treat paths inconsistently)

 open and getFileInfo APIs treat paths inconsistently wrt protocol
 -

 Key: HDFS-7895
 URL: https://issues.apache.org/jira/browse/HDFS-7895
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Sergey Shelukhin
Assignee: Jing Zhao
Priority: Minor

 When open() is called with regular HDFS path, hdfs://blah/blah/blah, it 
 appears to work.
 However, getFileInfo doesn't
 {noformat}
 Caused by: 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.InvalidPathException):
  Invalid path name Invalid file name: 
 hdfs://localhost:9000/apps/hive/warehouse/tpch_2.db/lineitem_orc/01_0
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4128)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
 {noformat}
 1) this seems inconsistent.
 2) not clear why the validation should reject what looks like a good HDFS 
 path. At least, client code should clean this stuff up on the way.
 [~prasanth_j] has the details, I just filed a bug so I could mention how 
 buggy HDFS is to [~jingzhao] :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7895) open and getFileInfo APIs treat paths inconsistently

2015-03-05 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-7895:
--

 Summary: open and getFileInfo APIs treat paths inconsistently
 Key: HDFS-7895
 URL: https://issues.apache.org/jira/browse/HDFS-7895
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin
Assignee: Jing Zhao
Priority: Minor


When open() is called with a regular HDFS path, hdfs://blah/blah/blah, it 
appears to work.
However, getFileInfo doesn't:
{noformat}
Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.InvalidPathException):
 Invalid path name Invalid file name: 
hdfs://localhost:9000/apps/hive/warehouse/tpch_2.db/lineitem_orc/01_0
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4128)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)

at org.apache.hadoop.ipc.Client.call(Client.java:1468)
at org.apache.hadoop.ipc.Client.call(Client.java:1399)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
{noformat}

1) This seems inconsistent.
2) It's not clear why the validation should reject what looks like a valid 
HDFS path. At the least, client code should clean the path up on the way.

[~prasanth_j] has the details; I just filed a bug so I could mention how buggy 
HDFS is to [~jingzhao] :)
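
For anyone trying to reproduce this, a rough sketch of the two call shapes 
(the path below is made up; the real one is in the stack trace above):
{noformat}
FileSystem fs = FileSystem.get(URI.create("hdfs://localhost:9000"),
    new Configuration());
Path qualified = new Path("hdfs://localhost:9000/apps/hive/warehouse/t/f");

fs.open(qualified).close();                  // appears to work

// the client-level call passes the full URI string to the NN, which is
// what trips the InvalidPathException shown in the trace above
((DistributedFileSystem) fs).getClient().getFileInfo(qualified.toString());
{noformat}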




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7895) open and getFileInfo APIs treat paths inconsistently

2015-03-05 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7895:
---
Affects Version/s: 2.6.0

 open and getFileInfo APIs treat paths inconsistently
 

 Key: HDFS-7895
 URL: https://issues.apache.org/jira/browse/HDFS-7895
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 2.6.0
Reporter: Sergey Shelukhin
Assignee: Jing Zhao
Priority: Minor

 When open() is called with regular HDFS path, hdfs://blah/blah/blah, it 
 appears to work.
 However, getFileInfo doesn't
 {noformat}
 Caused by: 
 org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.InvalidPathException):
  Invalid path name Invalid file name: 
 hdfs://localhost:9000/apps/hive/warehouse/tpch_2.db/lineitem_orc/01_0
 at 
 org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getFileInfo(FSNamesystem.java:4128)
 at 
 org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.getFileInfo(NameNodeRpcServer.java:838)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.getFileInfo(ClientNamenodeProtocolServerSideTranslatorPB.java:821)
 at 
 org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:619)
 at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:962)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2039)
 at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2035)
 at java.security.AccessController.doPrivileged(Native Method)
 at javax.security.auth.Subject.doAs(Subject.java:415)
 at 
 org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1628)
 at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2033)
 at org.apache.hadoop.ipc.Client.call(Client.java:1468)
 at org.apache.hadoop.ipc.Client.call(Client.java:1399)
 at 
 org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
 at com.sun.proxy.$Proxy16.getFileInfo(Unknown Source)
 at 
 org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:752)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
 at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:606)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
 at 
 org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
 at com.sun.proxy.$Proxy17.getFileInfo(Unknown Source)
 at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:1988)
 {noformat}
 1) this seems inconsistent.
 2) not clear why the validation should reject what looks like a good HDFS 
 path. At least, client code should clean this stuff up on the way.
 [~prasanth_j] has the details, I just filed a bug so I could mention how 
 buggy HDFS is to [~jingzhao] :)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-05 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349859#comment-14349859
 ] 

Sergey Shelukhin commented on HDFS-7878:


[~jingzhao] actually, getFileId already normalizes the file path (responding 
to the gtalk discussion).
So this is ready for a +1 ;)

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Assigned] (HDFS-7878) API - expose an unique file identifier

2015-03-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin reassigned HDFS-7878:
--

Assignee: Sergey Shelukhin

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-03-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.patch

This patch exposes the fileId via the normal FileStatus.
I could add a separate API instead, which would be a smaller change, but it 
would grow the API surface... please advise.
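
To make the trade-off concrete, the two shapes under discussion look roughly 
like this to a caller; both names below are illustrative, not the committed 
API:
{noformat}
// (a) ride on the normal FileStatus: callers reuse the status they
//     already fetch, and downcast to reach the ID
FileStatus st = fs.getFileStatus(path);
long id = ((FileStatusWithId) st).getFileId();   // hypothetical subclass

// (b) separate API: one extra method, FileStatus stays untouched
long id2 = dfs.getFileId(path);                  // hypothetical method
{noformat}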

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
 Attachments: HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347653#comment-14347653
 ] 

Sergey Shelukhin commented on HDFS-7878:


[~jingzhao] can you please review?

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
 Attachments: HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-03-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Status: Patch Available  (was: Open)

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
 Attachments: HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7878) API - expose an unique file identifier

2015-03-04 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7878:
---
Attachment: HDFS-7878.01.patch

Updated to just have the API...

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-04 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14347889#comment-14347889
 ] 

Sergey Shelukhin commented on HDFS-7878:


This QA run was for the old patch...

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin
Assignee: Sergey Shelukhin
 Attachments: HDFS-7878.01.patch, HDFS-7878.patch


 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345824#comment-14345824
 ] 

Sergey Shelukhin commented on HDFS-7878:


HdfsFileStatus does not inherit from FileStatus.
Do you mean dfs.getClient().getFileInfo(path)?

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin

 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7878) API - expose an unique file identifier

2015-03-03 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-7878:
--

 Summary: API - expose an unique file identifier
 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin


See HDFS-487.
Even though that one is resolved as a duplicate, the ID is actually not 
exposed by the JIRA it supposedly duplicates.
The INode ID for the file should be easy to expose; alternatively, the ID 
could be derived from block IDs, to account for appends...

This is useful e.g. as a per-file cache key, to make sure the cache stays 
correct when a file is overwritten.
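
A short sketch of the cache-key use mentioned above; getFileId() stands in 
for whatever accessor the API ends up exposing:
{noformat}
// An overwritten file gets a new inode ID, so keying the cache on
// (path, id) makes entries for the old contents stop matching.
long id = status.getFileId();                 // hypothetical accessor
String cacheKey = path.toString() + "@" + id;
{noformat}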




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7878) API - expose an unique file identifier

2015-03-03 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7878?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14345930#comment-14345930
 ] 

Sergey Shelukhin commented on HDFS-7878:


Can you make it public? Or better yet, add an API to FileSystem.

 API - expose an unique file identifier
 --

 Key: HDFS-7878
 URL: https://issues.apache.org/jira/browse/HDFS-7878
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin

 See HDFS-487.
 Even though that is resolved as duplicate, the ID is actually not exposed by 
 the JIRA it supposedly duplicates.
 INode ID for the file should be easy to expose; alternatively ID could be 
 derived from block IDs, to account for appends...
 This is useful e.g. for cache key by file, to make sure cache stays correct 
 when file is overwritten.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7825) HdfsDataInputStream::read(ByteBuffer) method doesn't conform to its API

2015-02-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334103#comment-14334103
 ] 

Sergey Shelukhin commented on HDFS-7825:


[~hagleitn] [~sseth] fyi :)

 HdfsDataInputStream::read(ByteBuffer) method doesn't conform to its API
 ---

 Key: HDFS-7825
 URL: https://issues.apache.org/jira/browse/HDFS-7825
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin

 ByteBufferReadable::read(ByteBuffer) javadoc says:
 {noformat}
 After a successful call, buf.position() and buf.limit() should be unchanged, 
 and therefore any data can be immediately read from buf. buf.mark() may be 
 cleared or updated. 
 {noformat}
 I have the following code: 
 {noformat}
 ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
 int pos = directBuf.position();
 int count = file.read(directBuf);
 if (count < 0) throw new EOFException();
 if (directBuf.position() != pos) {
   RecordReaderImpl.LOG.info("Warning - position mismatch from "
       + file.getClass() + ": after reading " + count + ", expected "
       + pos + " but got " + directBuf.position());
 }
 {noformat}
 and I get:
 {noformat}
 15/02/23 15:30:56 [pool-4-thread-1] INFO orc.RecordReaderImpl : Warning - 
 position mismatch from class 
 org.apache.hadoop.hdfs.client.HdfsDataInputStream: after reading 6, expected 
 0 but got 6
 {noformat}
 So the position is changed, unlike the API doc indicates.
 Also, while I haven't verified yet, it may be that the 0-length read is not 
 handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HDFS-7825) read(ByteBuffer) method doesn't conform to its API

2015-02-23 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-7825:
--

 Summary: read(ByteBuffer) method doesn't conform to its API
 Key: HDFS-7825
 URL: https://issues.apache.org/jira/browse/HDFS-7825
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin


ByteBufferReadable::read(ByteBuffer) javadoc says:
{noformat}
After a successful call, buf.position() and buf.limit() should be unchanged, 
and therefore any data can be immediately read from buf. buf.mark() may be 
cleared or updated. 
{noformat}

I have the following code: 
{noformat}
ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
int pos = directBuf.position();

int count = file.read(directBuf);
if (count < 0) throw new EOFException();
if (directBuf.position() != pos) {
  RecordReaderImpl.LOG.info("Warning - position mismatch from "
      + file.getClass() + ": after reading " + count + ", expected "
      + pos + " but got " + directBuf.position());
}
{noformat}

and I get:

{noformat}
15/02/23 15:30:56 [pool-4-thread-1] INFO orc.RecordReaderImpl : Warning - 
position mismatch from class org.apache.hadoop.hdfs.client.HdfsDataInputStream: 
after reading 6, expected 0 but got 6
{noformat}
So the position is changed, contrary to what the API doc indicates.

Also, while I haven't verified this yet, it may be that a 0-length read is not 
handled properly.
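
Until the contract is settled, a defensive caller can assume channel-style 
semantics (position advances by the bytes read, which is what the log above 
shows) and flip explicitly before consuming; a sketch:
{noformat}
ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
int count = file.read(directBuf);
if (count < 0) throw new EOFException();
directBuf.flip();   // limit = bytes read, position = 0, ready to consume
{noformat}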



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-7825) HdfsDataInputStream::read(ByteBuffer) method doesn't conform to its API

2015-02-23 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14334098#comment-14334098
 ] 

Sergey Shelukhin commented on HDFS-7825:


[~jingzhao] can you take a look?

 HdfsDataInputStream::read(ByteBuffer) method doesn't conform to its API
 ---

 Key: HDFS-7825
 URL: https://issues.apache.org/jira/browse/HDFS-7825
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin

 ByteBufferReadable::read(ByteBuffer) javadoc says:
 {noformat}
 After a successful call, buf.position() and buf.limit() should be unchanged, 
 and therefore any data can be immediately read from buf. buf.mark() may be 
 cleared or updated. 
 {noformat}
 I have the following code: 
 {noformat}
 ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
 int pos = directBuf.position();
 int count = file.read(directBuf);
 if (count < 0) throw new EOFException();
 if (directBuf.position() != pos) {
   RecordReaderImpl.LOG.info("Warning - position mismatch from "
       + file.getClass() + ": after reading " + count + ", expected "
       + pos + " but got " + directBuf.position());
 }
 {noformat}
 and I get:
 {noformat}
 15/02/23 15:30:56 [pool-4-thread-1] INFO orc.RecordReaderImpl : Warning - 
 position mismatch from class 
 org.apache.hadoop.hdfs.client.HdfsDataInputStream: after reading 6, expected 
 0 but got 6
 {noformat}
 So the position is changed, unlike the API doc indicates.
 Also, while I haven't verified yet, it may be that the 0-length read is not 
 handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HDFS-7825) HdfsDataInputStream::read(ByteBuffer) method doesn't conform to its API

2015-02-23 Thread Sergey Shelukhin (JIRA)

 [ 
https://issues.apache.org/jira/browse/HDFS-7825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin updated HDFS-7825:
---
Summary: HdfsDataInputStream::read(ByteBuffer) method doesn't conform to 
its API  (was: read(ByteBuffer) method doesn't conform to its API)

 HdfsDataInputStream::read(ByteBuffer) method doesn't conform to its API
 ---

 Key: HDFS-7825
 URL: https://issues.apache.org/jira/browse/HDFS-7825
 Project: Hadoop HDFS
  Issue Type: Bug
Reporter: Sergey Shelukhin

 ByteBufferReadable::read(ByteBuffer) javadoc says:
 {noformat}
 After a successful call, buf.position() and buf.limit() should be unchanged, 
 and therefore any data can be immediately read from buf. buf.mark() may be 
 cleared or updated. 
 {noformat}
 I have the following code: 
 {noformat}
 ByteBuffer directBuf = ByteBuffer.allocateDirect(len);
 int pos = directBuf.position();
 int count = file.read(directBuf);
 if (count < 0) throw new EOFException();
 if (directBuf.position() != pos) {
   RecordReaderImpl.LOG.info("Warning - position mismatch from "
       + file.getClass() + ": after reading " + count + ", expected "
       + pos + " but got " + directBuf.position());
 }
 {noformat}
 and I get:
 {noformat}
 15/02/23 15:30:56 [pool-4-thread-1] INFO orc.RecordReaderImpl : Warning - 
 position mismatch from class 
 org.apache.hadoop.hdfs.client.HdfsDataInputStream: after reading 6, expected 
 0 but got 6
 {noformat}
 So the position is changed, unlike the API doc indicates.
 Also, while I haven't verified yet, it may be that the 0-length read is not 
 handled properly.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HDFS-5916) provide API to bulk delete directories/files

2014-02-10 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896902#comment-13896902
 ] 

Sergey Shelukhin commented on HDFS-5916:


1-2-3 are all up to you; for the case I have in mind it should operate like a 
sequence of regular deletes: for (1) probably best-effort, for (2) no, for (3) 
non-atomically. But that could be controlled by parameters.
For (4) - what do other operations do? As far as I recall, some of them can 
recover.

Can you provide details on how to batch multiple RPC calls into one for this 
case? We currently use the FileSystem/DistributedFileSystem interface.
The workaround wouldn't work, due to legacy users and due to the fact that the 
files/dirs are already under the same parent path - it's just that we don't 
want to delete all of them. E.g. from /path/A, /path/B, /path/C and /path/D we 
only want to delete B and D (with longer lists in practice, of course).
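
For context, the status quo is a client-side loop with one NN round-trip per 
entry; a bulk call would collapse it into a single RPC. A sketch of today's 
pattern, assuming we only want B and D:
{noformat}
// one delete RPC per path; non-atomic and best-effort by construction
for (Path p : Arrays.asList(new Path("/path/B"), new Path("/path/D"))) {
  fs.delete(p, true /* recursive */);
}
{noformat}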

 provide API to bulk delete directories/files
 

 Key: HDFS-5916
 URL: https://issues.apache.org/jira/browse/HDFS-5916
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin

 It would be nice to have an API to delete directories and files in bulk - for 
 example, when deleting Hive partitions or HBase regions in large numbers, the 
 code could avoid many trips to NN. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Created] (HDFS-5916) provide API to bulk delete directories/files

2014-02-09 Thread Sergey Shelukhin (JIRA)
Sergey Shelukhin created HDFS-5916:
--

 Summary: provide API to bulk delete directories/files
 Key: HDFS-5916
 URL: https://issues.apache.org/jira/browse/HDFS-5916
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin


It would be nice to have an API to delete directories and files in bulk - for 
example, when deleting Hive partitions or HBase regions in large numbers, the 
code could avoid many trips to NN. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HDFS-5916) provide API to bulk delete directories/files

2014-02-09 Thread Sergey Shelukhin (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-5916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13896195#comment-13896195
 ] 

Sergey Shelukhin commented on HDFS-5916:


I mean the programmatic API (e.g. on the FileSystem class) for an arbitrary 
list of directories (which happen to share a common parent sub-tree in these 
cases, but don't have to, I guess), i.e.:
List<Path> dirList = ...; FileSystem fs = ...; fs.deleteAll(dirList);

 provide API to bulk delete directories/files
 

 Key: HDFS-5916
 URL: https://issues.apache.org/jira/browse/HDFS-5916
 Project: Hadoop HDFS
  Issue Type: Improvement
Reporter: Sergey Shelukhin

 It would be nice to have an API to delete directories and files in bulk - for 
 example, when deleting Hive partitions or HBase regions in large numbers, the 
 code could avoid many trips to NN. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)