[jira] [Reopened] (HADOOP-10434) Is it possible to use "df" to calculate the dfs usage instead of "du"

2016-12-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-10434:
--

Reopening in order to re-resolve this as Duplicate rather than Fixed.

> Is it possible to use "df" to calculate the dfs usage instead of "du"
> -
>
> Key: HADOOP-10434
> URL: https://issues.apache.org/jira/browse/HADOOP-10434
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.3.0
>Reporter: MaoYuan Xian
>Priority: Minor
>  Labels: BB2015-05-TBR
> Attachments: HADOOP-10434-1.patch
>
>
> When we run a datanode on a machine with a large disk volume, we find that the "du" 
> operations issued by org.apache.hadoop.fs.DU's DURefreshThread cost a lot of disk 
> performance.
> As we use the whole disk for HDFS storage, it is possible to calculate volume 
> usage via the "df" command instead. Would it make sense to add a "df" option for usage 
> calculation in HDFS 
> (org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice)?
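
For illustration only, a minimal sketch of the df-style accounting being asked about
(hypothetical plain-Java code, not the attached patch): if the partition is dedicated
to HDFS block storage, usage can be derived from filesystem-level statistics instead
of recursively walking the block directories.

{code}
// Hypothetical sketch, not the attached patch: df-style accounting for a volume
// that is dedicated to HDFS block storage.
import java.io.File;

public class DfBasedUsage {
  /** Approximate used bytes on the partition backing the given directory. */
  static long usedBytes(File volumeDir) {
    // Equivalent to what "df" reports: capacity minus free space.
    return volumeDir.getTotalSpace() - volumeDir.getFreeSpace();
  }

  public static void main(String[] args) {
    File volume = new File(args[0]); // e.g. one dfs.datanode.data.dir entry
    System.out.println("approx used bytes: " + usedBytes(volume));
  }
}
{code}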



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-10434) Is it possible to use "df" to calculate the dfs usage instead of "du"

2016-12-18 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-10434.
--
Resolution: Duplicate

> Is it possible to use "df" to calculate the dfs usage instead of "du"
> -
>
> Key: HADOOP-10434
> URL: https://issues.apache.org/jira/browse/HADOOP-10434
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 2.3.0
>Reporter: MaoYuan Xian
>Priority: Minor
>  Labels: BB2015-05-TBR
> Fix For: 2.8.0
>
> Attachments: HADOOP-10434-1.patch
>
>
> When we run a datanode on a machine with a large disk volume, we find that the "du" 
> operations issued by org.apache.hadoop.fs.DU's DURefreshThread cost a lot of disk 
> performance.
> As we use the whole disk for HDFS storage, it is possible to calculate volume 
> usage via the "df" command instead. Would it make sense to add a "df" option for usage 
> calculation in HDFS 
> (org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.BlockPoolSlice)?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13817) Add a finite shell command timeout to ShellBasedUnixGroupsMapping

2016-11-14 Thread Harsh J (JIRA)
Harsh J created HADOOP-13817:


 Summary: Add a finite shell command timeout to 
ShellBasedUnixGroupsMapping
 Key: HADOOP-13817
 URL: https://issues.apache.org/jira/browse/HADOOP-13817
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.6.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


ShellBasedUnixGroupsMapping runs various {{id}} commands via the 
ShellCommandExecutor module without a timeout set (it is set to 0, which implies 
infinite).

If such a command hangs for a long time on the OS end due to an unresponsive 
groups backend or other reasons, it also blocks the handlers that use it on the 
NameNode (or other services that use this class). That inadvertently causes odd 
timeout troubles on the client end, where it is forced to retry (only to likely 
run into such hangs again with every attempt until at least one command 
returns).

It would be helpful to have a finite command timeout after which we give up 
on the command and return the result equivalent of "no groups found".
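
A rough sketch of the proposed behaviour follows (assumed command and names, not the
final patch): run the lookup through ShellCommandExecutor with a finite timeout and
treat a timeout or failure as "no groups".

{code}
// Rough sketch with assumed names (not the final patch): bound the group lookup
// and fall back to an empty group list instead of blocking a handler forever.
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.hadoop.util.Shell.ShellCommandExecutor;

public class TimedGroupLookup {
  static List<String> getGroups(String user, long timeoutMs) {
    ShellCommandExecutor exec = new ShellCommandExecutor(
        new String[] { "id", "-Gn", user },
        null /* working dir */, null /* env */, timeoutMs);
    try {
      exec.execute();
      // Group names are whitespace-separated on stdout.
      return Arrays.asList(exec.getOutput().trim().split("\\s+"));
    } catch (IOException e) {
      // The command failed or timed out: behave as if no groups were found.
      return Collections.emptyList();
    }
  }
}
{code}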



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-8134) DNS claims to return a hostname but returns a PTR record in some cases

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8134.
-
Resolution: Not A Problem
  Assignee: (was: Harsh J)

This hasn't proven to be a problem of late. Closing as stale.

> DNS claims to return a hostname but returns a PTR record in some cases
> --
>
> Key: HADOOP-8134
> URL: https://issues.apache.org/jira/browse/HADOOP-8134
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: util
>Affects Versions: 0.23.0
>Reporter: Harsh J
>Priority: Minor
>
> Per Shrijeet on HBASE-4109:
> {quote}
> If you are using an interface other than 'default' (literally that 
> keyword), DNS.java's getDefaultHost will return a string which has a 
> trailing period at the end. The javadoc of reverseDns in DNS.java (see 
> below) seems to conflict with what that function is actually doing. 
> It is returning a PTR record while claiming it returns a hostname. A PTR 
> record always has a period at the end; RFC: 
> http://irbs.net/bog-4.9.5/bog47.html
> We call DNS.getDefaultHost in more than one place and treat the result as 
> the actual hostname.
> Quoting HRegionServer for example
> String machineName = DNS.getDefaultHost(conf.get(
> "hbase.regionserver.dns.interface", "default"), conf.get(
> "hbase.regionserver.dns.nameserver", "default"));
> We may want to sanitize the string returned from the DNS class, or, better, take 
> the path of overhauling the way we do DNS name matching all over.
> {quote}
> While HBase has worked around the issue, we should fix the methods that 
> aren't doing what they intended. Two options:
> 1. We fix the method. This may be an 'incompatible change', but I do not know 
> who outside of us uses the DNS classes.
> 2. We fix HDFS's DN at the calling end, because it is affected by the 
> trailing period in its reporting back to the NN as well (this just affects NN->DN 
> web links, non-critical).
> For 2, we can close this and open an HDFS JIRA.
> Thoughts?
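
As a purely illustrative sketch of the sanitization suggested above (not necessarily 
how this was ultimately handled), stripping the trailing period is straightforward:

{code}
// Illustrative only: strip the trailing period a PTR-style answer carries before
// treating the result as a plain hostname.
public class HostnameSanitizer {
  static String stripTrailingDot(String name) {
    return (name != null && name.endsWith("."))
        ? name.substring(0, name.length() - 1)
        : name;
  }
}
{code}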



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-7505) EOFException in RPC stack should have a nicer error message

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-7505.
-
Resolution: Duplicate
  Assignee: (was: Harsh J)

This seems to have been taken care of (in part) via HADOOP-7346.

> EOFException in RPC stack should have a nicer error message
> ---
>
> Key: HADOOP-7505
> URL: https://issues.apache.org/jira/browse/HADOOP-7505
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Affects Versions: 0.23.0
>Reporter: Eli Collins
>Priority: Minor
>
> Lots of user logs involve a user running mismatched versions, and for one 
> reason or another they get an EOFException instead of a proper version mismatch 
> exception. We should be able to catch this at appropriate points and produce a 
> nicer exception message explaining that it's a possible version mismatch, or 
> that they're trying to connect to the incorrect port.
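
A sketch of the kind of wrapping being asked for (illustrative only; the helper and 
its name are assumptions, not actual Hadoop IPC code): catch the bare EOFException 
near the RPC read path and rethrow it with a hint about the likely causes.

{code}
// Illustrative sketch only (assumed helper, not actual Hadoop IPC code).
import java.io.EOFException;
import java.io.IOException;

public class RpcEofTranslator {
  static IOException withHint(EOFException eof, String serverAddress) {
    return new IOException("Unexpected end of stream while reading from "
        + serverAddress + "; this often indicates a client/server version"
        + " mismatch or a connection to the wrong port.", eof);
  }
}
{code}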



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-8579) Websites for HDFS and MapReduce both send users to video training resource which is non-public

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8579?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8579.
-
Resolution: Not A Problem
  Assignee: (was: Harsh J)

This does not appear to be a problem after the project re-merge.

> Websites for HDFS and MapReduce both send users to video training resource 
> which is non-public
> --
>
> Key: HADOOP-8579
> URL: https://issues.apache.org/jira/browse/HADOOP-8579
> Project: Hadoop Common
>  Issue Type: Bug
> Environment: website
>Reporter: David L. Willson
>Priority: Minor
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The main pages for HDFS and MapReduce send new users to an unavailable training 
> resource.
> These two pages:
> http://hadoop.apache.org/mapreduce/
> http://hadoop.apache.org/hdfs/
> Link to this page:
> http://vimeo.com/3584536
> That page is not public, not shared with all registered Vimeo users, and I 
> see nothing indicating how to ask for access to the resource.
> Please make the videos public, or remove the link of disappointment.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-8863) Eclipse plugin may not be working on Juno due to changes in it

2016-10-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8863.
-
Resolution: Won't Fix
  Assignee: (was: Harsh J)

The Eclipse plugin has formally been removed from the project.

> Eclipse plugin may not be working on Juno due to changes in it
> --
>
> Key: HADOOP-8863
> URL: https://issues.apache.org/jira/browse/HADOOP-8863
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: contrib/eclipse-plugin
>Affects Versions: 1.2.0
>Reporter: Harsh J
>
> We need to debug/investigate why it is so.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-13515) Redundant transitionToActive call can cause a NameNode to crash

2016-08-18 Thread Harsh J (JIRA)
Harsh J created HADOOP-13515:


 Summary: Redundant transitionToActive call can cause a NameNode to 
crash
 Key: HADOOP-13515
 URL: https://issues.apache.org/jira/browse/HADOOP-13515
 Project: Hadoop Common
  Issue Type: Bug
  Components: ha
Affects Versions: 2.5.0
Reporter: Harsh J
Priority: Minor


The situation in parts is similar to HADOOP-8217, but the cause is different 
and so is the result.

Consider this situation:

- At the beginning NN1 is Active, NN2 is Standby
- ZKFC1 faces a ZK disconnect (not a session timeout, just a socket disconnect) 
and thereby reconnects

{code}
2016-08-11 07:00:46,068 INFO org.apache.zookeeper.ClientCnxn: Client session 
timed out, have not heard from server in 4000ms for sessionid 
0x4566f0c97500bd9, closing socket connection and attempting reconnect
2016-08-11 07:00:46,169 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
disconnected. Entering neutral mode...
…
2016-08-11 07:00:46,610 INFO org.apache.hadoop.ha.ActiveStandbyElector: Session 
connected.
{code}

- The reconnection on ZKFC1 triggers the elector code, and the elector 
re-run finds that NN1 should be the new active (a redundant decision, because NN1 
is already active)

{code}
2016-08-11 07:00:46,615 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
Checking for any old active which needs to be fenced...
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Old 
node exists: …
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: But old 
node has our own data, so don't need to fence it.
{code}

- ZKFC1 sets the new ZK data and fires a transitionToActive RPC call to NN1

{code}
2016-08-11 07:00:46,630 INFO org.apache.hadoop.ha.ActiveStandbyElector: Writing 
znode /hadoop-ha/nameservice1/ActiveBreadCrumb to indicate that the local node 
is the most recent active...
2016-08-11 07:00:46,649 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 175: 
Call -> nn01/10.10.10.10:8022: transitionToActive {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
{code}

- While the transitionToActive call is in progress at NN1, but not yet 
complete, the ZK session of ZKFC1 is timed out by the ZK quorum, and a 
watch notification is sent to ZKFC2

{code}
2016-08-11 07:01:00,003 DEBUG org.apache.zookeeper.ClientCnxn: Got notification 
sessionid:0x4566f0c97500bde
2016-08-11 07:01:00,004 DEBUG org.apache.zookeeper.ClientCnxn: Got WatchedEvent 
state:SyncConnected type:NodeDeleted 
path:/hadoop-ha/nameservice1/ActiveStandbyElectorLock for sessionid 
0x4566f0c97500bde
{code}

- ZKFC2 responds by fencing NN1 into standby, which succeeds (NN1 hasn't handled 
the transitionToActive call yet as it is busy, but it handles the 
transitionToStandby before it)

{code}
2016-08-11 07:01:00,013 INFO org.apache.hadoop.ha.ActiveStandbyElector: 
Checking for any old active which needs to be fenced...
2016-08-11 07:01:00,018 INFO org.apache.hadoop.ha.ZKFailoverController: Should 
fence: NameNode at nn01/10.10.10.10:8022
2016-08-11 07:01:00,020 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: 
Call -> nn01/10.10.10.10:8022: transitionToStandby {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
2016-08-11 07:01:03,880 DEBUG org.apache.hadoop.ipc.ProtobufRpcEngine: Call: 
transitionToStandby took 3860ms
{code}

- ZKFC2 then marks NN2 as active, and NN2 begins its transition (it is in the 
midst of it, not yet done, at this point)

{code}
2016-08-11 07:01:03,894 INFO org.apache.hadoop.ha.ZKFailoverController: Trying 
to make NameNode at nn02/11.11.11.11:8022 active...
2016-08-11 07:01:03,895 TRACE org.apache.hadoop.ipc.ProtobufRpcEngine: 412: 
Call -> nn02/11.11.11.11:8022: transitionToActive {reqInfo { reqSource: 
REQUEST_BY_ZKFC }}
…
{code}

{code}
2016-08-11 07:01:09,558 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
…
2016-08-11 07:01:19,968 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
{code}

- In parallel, NN1 finally processes the transitionToActive request and 
becomes active

{code}
2016-08-11 07:01:13,281 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required 
for active state
…
2016-08-11 07:01:19,599 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
…
2016-08-11 07:01:19,602 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: 
Starting log segment at 5635
{code}

- NN2's active transition fails as a result of this parallel active transition 
on NN1, which completed right before NN2 tried to take over

{code}
2016-08-11 07:01:19,968 INFO 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Will take over writing 
edit logs at txnid 5635
2016-08-11 07:01:22,799 FATAL org.apache.hadoop.hdfs.server.namenode.NameNode: 
Error encountered requiring NN

[jira] [Created] (HADOOP-13056) Print expected values when rejecting a server's determined principal

2016-04-22 Thread Harsh J (JIRA)
Harsh J created HADOOP-13056:


 Summary: Print expected values when rejecting a server's 
determined principal
 Key: HADOOP-13056
 URL: https://issues.apache.org/jira/browse/HADOOP-13056
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Affects Versions: 2.5.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial


When a service principal that a client constructs from the server address does 
not match a provided pattern or the configured principal property, the error is 
very uninformative about the specific cause. Currently, the only error printed 
in both cases is:

{code}
 java.lang.IllegalArgumentException: Server has invalid Kerberos principal: 
hdfs/host.internal@REALM
{code}
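
A sketch of the improved message being proposed (the method and variable names here 
are illustrative, not the eventual patch): include the pattern and/or configured 
principal the check was performed against.

{code}
// Hypothetical sketch of the improved error; names are illustrative.
public class PrincipalCheckError {
  static IllegalArgumentException invalidPrincipal(String serverPrincipal,
      String configuredPattern, String configuredPrincipal) {
    return new IllegalArgumentException(
        "Server has invalid Kerberos principal: " + serverPrincipal
        + "; expected a principal matching pattern '" + configuredPattern
        + "' or the configured principal '" + configuredPrincipal + "'");
  }
}
{code}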



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-13051) Test for special characters in path being respected during globPaths

2016-04-22 Thread Harsh J (JIRA)
Harsh J created HADOOP-13051:


 Summary: Test for special characters in path being respected 
during globPaths
 Key: HADOOP-13051
 URL: https://issues.apache.org/jira/browse/HADOOP-13051
 Project: Hadoop Common
  Issue Type: Test
  Components: fs
Affects Versions: 3.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


On {{branch-2}}, the below is the (incorrect) behaviour today, where paths with 
special characters get dropped during globStatus calls:

{code}
bin/hdfs dfs -mkdir /foo
bin/hdfs dfs -touchz /foo/foo1
bin/hdfs dfs -touchz $'/foo/foo1\r'
bin/hdfs dfs -ls /foo/
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1^M
bin/hdfs dfs -ls '/foo/*'
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
{code}

Whereas trunk has the right behaviour, subtly fixed via the pattern library 
change of HADOOP-12436:

{code}
bin/hdfs dfs -mkdir /foo
bin/hdfs dfs -touchz /foo/foo1
bin/hdfs dfs -touchz $'/foo/foo1\r'
bin/hdfs dfs -ls /foo/
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1^M
bin/hdfs dfs -ls '/foo/*'
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1
-rw-r--r--   3 harsh supergroup  0 2016-04-22 17:35 /foo/foo1^M
{code}

(I've written a ^M explicitly above to indicate the presence of the intentional 
hidden carriage-return character.)

We should still add a simple test-case to cover this situation for future 
regressions.
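
A minimal sketch of such a test follows (illustrative, not the committed test; it 
uses the local filesystem so it stays self-contained, whereas the real test would 
more likely run against a MiniDFSCluster):

{code}
// Illustrative sketch of the regression test, not the committed one.
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.Test;

public class TestGlobSpecialCharacters {
  @Test
  public void testGlobRetainsSpecialCharacterPaths() throws Exception {
    FileSystem fs = FileSystem.getLocal(new Configuration());
    Path dir = new Path("/tmp/globtest-" + System.nanoTime());
    fs.mkdirs(dir);
    fs.create(new Path(dir, "foo1")).close();
    fs.create(new Path(dir, "foo1\r")).close(); // name ends with a carriage return
    FileStatus[] matches = fs.globStatus(new Path(dir, "*"));
    // Both entries must survive the glob; branch-2 used to drop the "\r" one.
    assertEquals(2, matches.length);
    fs.delete(dir, true);
  }
}
{code}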



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12970) Intermittent signature match failures in S3AFileSystem due to connection closure

2016-03-28 Thread Harsh J (JIRA)
Harsh J created HADOOP-12970:


 Summary: Intermittent signature match failures in S3AFileSystem 
due to connection closure
 Key: HADOOP-12970
 URL: https://issues.apache.org/jira/browse/HADOOP-12970
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.7.0
Reporter: Harsh J
Assignee: Harsh J


S3AFileSystem's use of the {{ObjectMetadata#clone()}} method inside the 
{{copyFile}} implementation may fail in circumstances where the connection used 
for obtaining the metadata is closed by the server (i.e. the response carries a 
{{Connection: close}} header). Because this header is not stripped away when 
the {{ObjectMetadata}} is created, and because we clone it for use in the next 
{{CopyObjectRequest}}, the copy request ends up carrying a {{Connection: close}} 
header as part of itself.

This causes signer-related exceptions because the client now includes the 
{{Connection}} header as part of the {{SignedHeaders}}, but the S3 server does 
not receive the same value for it ({{Connection}} headers are likely stripped 
away before the S3 server tries to match signature hashes), causing a failure 
like the below:

{code}
2016-03-29 19:59:30,120 DEBUG [s3a-transfer-shared--pool1-t35] 
org.apache.http.wire: >> "Authorization: AWS4-HMAC-SHA256 
Credential=XXX/20160329/eu-central-1/s3/aws4_request, 
SignedHeaders=accept-ranges;connection;content-length;content-type;etag;host;last-modified;user-agent;x-amz-acl;x-amz-content-sha256;x-amz-copy-source;x-amz-date;x-amz-metadata-directive;x-amz-server-side-encryption;x-amz-version-id,
 Signature=MNOPQRSTUVWXYZ[\r][\n]"
…
com.amazonaws.services.s3.model.AmazonS3Exception: The request signature we 
calculated does not match the signature you provided. Check your key and 
signing method. (Service: Amazon S3; Status Code: 403; Error Code: 
SignatureDoesNotMatch; Request ID: ABC), S3 Extended Request ID: XYZ
{code}

This is intermittent because the S3 server does not always add a {{Connection: 
close}} directive in its response, but whenever we receive it AND we clone it, 
the above exception occurs for the copy request. The copy request is 
often used in the context of FileOutputCommitter, when a lot of the MR attempt 
files on the {{s3a://}} destination filesystem are to be moved to their parent 
directories post-commit.

I've also submitted a fix upstream to the AWS Java SDK to strip out the 
{{Connection}} headers when dealing with {{ObjectMetadata}}, which is pending 
acceptance and release at https://github.com/aws/aws-sdk-java/pull/669. Until 
that release is available and can be used by us, we'll need to work around the 
clone approach by manually excluding the {{Connection}} header (not 
straightforward, because the {{metadata}} object is private with no mutable 
access). We can remove such a change in the future once a release with the 
upstream fix is available.
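
A sketch of such a workaround (the field selection below is illustrative, not the 
attached patch; the real {{copyFile}} would need to carry over every header S3A 
cares about):

{code}
// Illustrative workaround sketch: build a fresh ObjectMetadata for the copy
// request instead of clone(), so a server-supplied "Connection: close" header
// from the metadata response is not replayed on the CopyObjectRequest.
import com.amazonaws.services.s3.model.ObjectMetadata;

public class MetadataCloneWorkaround {
  static ObjectMetadata withoutConnectionHeader(ObjectMetadata src) {
    ObjectMetadata dst = new ObjectMetadata();
    dst.setContentLength(src.getContentLength());
    if (src.getContentType() != null) {
      dst.setContentType(src.getContentType());
    }
    if (src.getContentEncoding() != null) {
      dst.setContentEncoding(src.getContentEncoding());
    }
    if (src.getSSEAlgorithm() != null) {
      dst.setSSEAlgorithm(src.getSSEAlgorithm());
    }
    dst.setUserMetadata(src.getUserMetadata());
    return dst;
  }
}
{code}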



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12894) Add yarn.app.mapreduce.am.log.level to mapred-default.xml

2016-03-05 Thread Harsh J (JIRA)
Harsh J created HADOOP-12894:


 Summary: Add yarn.app.mapreduce.am.log.level to mapred-default.xml
 Key: HADOOP-12894
 URL: https://issues.apache.org/jira/browse/HADOOP-12894
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Affects Versions: 2.9.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial






--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-12549) Extend HDFS-7456 default generically to all pattern lookups

2015-11-03 Thread Harsh J (JIRA)
Harsh J created HADOOP-12549:


 Summary: Extend HDFS-7456 default generically to all pattern 
lookups
 Key: HADOOP-12549
 URL: https://issues.apache.org/jira/browse/HADOOP-12549
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc, security
Affects Versions: 2.7.1
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


In HDFS-7546 we added an hdfs-default.xml property to bring back the regular 
behaviour of trusting all principals (as was the case before HADOOP-9789). 
However, the change only targeted HDFS users, and only those that use the 
default-loading mechanism of the Configuration class (i.e. not {{new 
Configuration(false)}} users).

I'd like to propose adding the same default to the generic RPC client code 
as well, so the default affects all forms of clients equally.
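
To illustrate the {{new Configuration(false)}} caveat mentioned above (a small 
demonstration, assuming the dfs.namenode.kerberos.principal.pattern key added by 
HDFS-7546):

{code}
// Demonstration: Configuration(false) loads no default resources, so a
// *-default.xml entry never reaches such clients; only a default supplied in
// code (the second argument to get()) does.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;

public class DefaultLoadingDemo {
  public static void main(String[] args) {
    Configuration withDefaults = new HdfsConfiguration(); // registers hdfs-default.xml
    Configuration noDefaults = new Configuration(false);  // loads nothing
    String key = "dfs.namenode.kerberos.principal.pattern"; // key from HDFS-7546
    System.out.println("with defaults: " + withDefaults.get(key));
    System.out.println("no defaults  : " + noDefaults.get(key));      // null
    System.out.println("hardcoded    : " + noDefaults.get(key, "*")); // "*"
  }
}
{code}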



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-9461) JobTracker and NameNode both grant delegation tokens to non-secure clients

2015-03-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9461.
-
Resolution: Won't Fix

Not an issue on trunk/branch-2.

 JobTracker and NameNode both grant delegation tokens to non-secure clients
 --

 Key: HADOOP-9461
 URL: https://issues.apache.org/jira/browse/HADOOP-9461
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor

 If one looks at the logic added by MAPREDUCE-1516 in JobTracker.java's 
 isAllowedDelegationTokenOp() method, and applies the non-secure state of 
 UGI.isSecurityEnabled == false and authMethod == SIMPLE, the result is 
 true when the intention is false (due to the short-circuited conditionals).
 This allows non-secure JobClients to easily request and use 
 DelegationTokens, and causes unwanted errors to be printed in the JobTracker 
 when the renewer attempts to run. Ideally such clients ought to get an error 
 if they request a DT in non-secure mode.
 HDFS in both trunk and branch-1 has the same problem. Trunk MR 
 (HistoryServer) and YARN are, however, unaffected due to simpler, inlined 
 logic instead of reuse of this faulty method.
 Note that fixing this will break Oozie today, due to the merged logic of 
 OOZIE-734. Oozie will require a fix as well if this is to be fixed in 
 branch-1. As a result, I'm going to mark this as an Incompatible Change.
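
For reference, a paraphrased sketch of the short-circuit described above (not the 
verbatim JobTracker code):

{code}
// Paraphrased sketch, not the verbatim JobTracker code. With security disabled,
// securityEnabled is false, the whole condition short-circuits to false, and the
// method returns true: the token op is allowed even under SIMPLE auth.
public class TokenOpCheckSketch {
  static boolean isAllowedDelegationTokenOp(boolean securityEnabled,
                                            String authMethod) {
    if (securityEnabled
        && !"KERBEROS".equals(authMethod)
        && !"KERBEROS_SSL".equals(authMethod)
        && !"CERTIFICATE".equals(authMethod)) {
      return false;
    }
    return true; // securityEnabled == false, authMethod == "SIMPLE" lands here
  }
}
{code}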



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11512) Use getTrimmedStrings when reading serialization keys

2015-01-27 Thread Harsh J (JIRA)
Harsh J created HADOOP-11512:


 Summary: Use getTrimmedStrings when reading serialization keys
 Key: HADOOP-11512
 URL: https://issues.apache.org/jira/browse/HADOOP-11512
 Project: Hadoop Common
  Issue Type: Bug
  Components: conf
Affects Versions: 2.6.0
Reporter: Harsh J
Priority: Minor


In the file 
{{hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/serializer/SerializationFactory.java}},
 we read the IO_SERIALIZATIONS_KEY config via Configuration#getStrings(…), which 
does not trim the input. This could cause confusing user issues if someone 
manually overrides the key in the XML files/Configuration object without using 
the dynamic approach.

The call should instead use Configuration#getTrimmedStrings(…), so that 
whitespace is trimmed before the class names are looked up on the classpath.
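
A small demonstration of the difference between the two accessors (illustrative 
values):

{code}
// Illustrative demonstration only.
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;

public class TrimmedStringsDemo {
  public static void main(String[] args) {
    Configuration conf = new Configuration(false);
    conf.set("io.serializations",
        "org.apache.hadoop.io.serializer.WritableSerialization,\n"
        + "      org.apache.hadoop.io.serializer.JavaSerialization");
    // getStrings keeps the leading whitespace on the second entry, which would
    // then fail the classpath lookup; getTrimmedStrings strips it.
    System.out.println(Arrays.toString(conf.getStrings("io.serializations")));
    System.out.println(Arrays.toString(conf.getTrimmedStrings("io.serializations")));
  }
}
{code}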



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Created] (HADOOP-11488) Difference in default connection timeout for S3A FS

2015-01-18 Thread Harsh J (JIRA)
Harsh J created HADOOP-11488:


 Summary: Difference in default connection timeout for S3A FS
 Key: HADOOP-11488
 URL: https://issues.apache.org/jira/browse/HADOOP-11488
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 2.6.0
Reporter: Harsh J
Priority: Minor


The core-default.xml defines fs.s3a.connection.timeout as 5000, and the code 
under hadoop-tools/hadoop-aws defines it as 5.

We should update the former to 50s so it is picked up properly, as we're also 
noticing that 5s is often too low, especially in cases such as large DistCp 
operations (which fail with {{Read timed out}} errors from the S3 service).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: a friendly suggestion for developers when uploading patches

2014-12-04 Thread Harsh J
I've added you in as YongjunZhang. Please let me know if you are still
unable to edit after a relogin.

On Wed, Dec 3, 2014 at 1:43 AM, Yongjun Zhang yzh...@cloudera.com wrote:
 Thanks Allen, Andrew and Tsuyoshi.

 My wiki user name is YongjunZhang, I will appreciate it very much if
 someone can give me the permission to edit the wiki pages. Thanks.

 --Yongjun

 On Tue, Dec 2, 2014 at 11:04 AM, Andrew Wang andrew.w...@cloudera.com
 wrote:

 I just updated the wiki to say that the version number format is preferred.
 Yongjun, if you email out your wiki username, someone (?) can give you
 privs.

 On Tue, Dec 2, 2014 at 10:16 AM, Allen Wittenauer a...@altiscale.com
 wrote:

  I think people forget we have a wiki that documents this and other things
  ...
 
  https://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch
 
  On Dec 2, 2014, at 10:01 AM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com
  wrote:
 
   jiraNameId.[branchName.]revisionNum.patch*
  
   +1 for this format. Thanks for starting the discussion, Yongjun.
  
   - Tsuyoshi
  
   On Tue, Dec 2, 2014 at 9:34 AM, Yongjun Zhang yzh...@cloudera.com
  wrote:
   Thank you all for the feedback.
  
   About how many digits to use, I personally find it's not annoying to
  type
   one extra digit, but as long as we have the rev number, it achieves
 the
   goal of identifying individual patch.
  
   About the rest of the name, as long as we keep it the same for the
 same
   patch, it would work fine.
  
   This boils down to patch naming guideline:
  
   *jiraNameId.[branchName.]revisionNum.patch*
  
   - Example jiraNameId: HADOOP-1234, HDFS-4321
   - When the patch is targeted for trunk, then there is no need for
  the
   branchName portion, otherwise, specify the branchName accordingly.
  Example:
   branch1, branch2.
   - It's recommended to use three digits for revisionNum for
 better
   sorting of different versions of patches.
  
   Would anyone who has the privilege please help to modify the following
  page
  
   http://wiki.apache.org/hadoop/HowToContribute#Naming_your_patch
  
   accordingly?
  
   Thanks a lot.
  
   --Yongjun
  
   On Mon, Dec 1, 2014 at 10:22 AM, Colin McCabe cmcc...@alumni.cmu.edu
 
   wrote:
  
   On Wed, Nov 26, 2014 at 2:58 PM, Karthik Kambatla 
 ka...@cloudera.com
   wrote:
  
   Yongjun, thanks for starting this thread. I personally like Steve's
   suggestions, but think two digits should be enough.
  
   I propose we limit the restrictions to versioning the patches with
   version
   numbers and .patch extension. People have their own preferences for
  the
   rest of the name (e.g. MAPREDUCE, MapReduce, MR, mr, mapred) and I
  don't
   see a gain in forcing everyone to use one.
  
   Putting the suggestions (tight and loose) on the wiki would help new
   contributors as well.
  
  
   +1
  
   best,
   Colin
  
  
   On Wed, Nov 26, 2014 at 2:43 PM, Eric Payne
   erichadoo...@yahoo.com.invalid
  
   wrote:
  
   +1.The different color for newest patch doesn't work very well if
  you
   are color blind, so I do appreciate a revision number in the name.
  
From: Yongjun Zhang yzh...@cloudera.com
   To: common-dev@hadoop.apache.org
   Sent: Tuesday, November 25, 2014 11:37 PM
   Subject: Re: a friendly suggestion for developers when uploading
   patches
  
   Thanks Harsh for the info and Andrew for sharing the script. It
 looks
   that
   the script is intelligent enough to pick the latest attachment even
  if
   all
   attachments have the same name.
  
   Yet, I hope we use the following as the guideline for patch names:
  
   *projectName*-*jiraNum*-*revNum*.patch
  
  
   So we can easily identify individual patch revs.
  
   Thanks.
  
   --Yongjun
  
   On Tue, Nov 25, 2014 at 5:54 PM, Andrew Wang 
  andrew.w...@cloudera.com
  
   wrote:
  
   This might be a good time to mention my fetch-patch script, I use
 it
   to
   easily download the latest attachment on a jira:
  
   https://github.com/umbrant/dotfiles/blob/master/bin/fetch-patch
  
   On Tue, Nov 25, 2014 at 5:44 PM, Harsh J ha...@cloudera.com
  wrote:
  
   For the same filename, you can observe also that the JIRA colors
   the
   latest one to be different than the older ones automatically -
 this
   is
   what I rely on.
  
   On Sat, Nov 22, 2014 at 12:36 AM, Yongjun Zhang 
   yzh...@cloudera.com
  
   wrote:
   Hi,
  
   When I look at patches uploaded to jiras, from time to time I
   notice
   that
   different revisions of the patch is uploaded with the same patch
   file
   name,
   some time for quite a few times. It's confusing which is which.
  
   I'd suggest that as a guideline, we do the following when
   uploading a
   patch:
  
 - include a revision number in the patch file name.
 - include a comment, stating that a new patch is uploaded,
   including
   the
 revision number of the patch in the comment.
  
   This way, it's easier to refer to a specific version of a patch,
   and
   to
   know which

Re: a friendly suggestion for developers when uploading patches

2014-11-25 Thread Harsh J
For the same filename, you can observe also that the JIRA colors the
latest one to be different than the older ones automatically - this is
what I rely on.

On Sat, Nov 22, 2014 at 12:36 AM, Yongjun Zhang yzh...@cloudera.com wrote:
 Hi,

 When I look at patches uploaded to jiras, from time to time I notice that
 different revisions of the patch are uploaded with the same patch file name,
 sometimes quite a few times. It's confusing which is which.

 I'd suggest that as a guideline, we do the following when uploading a patch:

- include a revision number in the patch file name.
- include a comment, stating that a new patch is uploaded, including the
revision number of the patch in the comment.

 This way, it's easier to refer to a specific version of a patch, and to
 know which patch a comment is made about.

 Hope that makes sense to you.

 Thanks.

 --Yongjun



-- 
Harsh J


[jira] [Created] (HADOOP-11224) Improve error messages for all permission related failures

2014-10-23 Thread Harsh J (JIRA)
Harsh J created HADOOP-11224:


 Summary: Improve error messages for all permission related failures
 Key: HADOOP-11224
 URL: https://issues.apache.org/jira/browse/HADOOP-11224
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.2.0
Reporter: Harsh J
Priority: Trivial


If a bad file create request fails, you get a juicy error that almost fully 
self-describes the reason:

{code}Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 Permission denied: user=root, access=WRITE, 
inode=/:hdfs:supergroup:drwxr-xr-x{code}

However, if a setPermission fails, one only gets a vague:

{code}Caused by: 
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.AccessControlException):
 Permission denied{code}

It would be nicer if all forms of permission failures logged the accessed inode 
and current ownership and permissions in the same way.
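
A sketch of the desired setPermission failure message (hypothetical helper and 
variable names), mirroring the format the create path already uses:

{code}
// Hypothetical sketch: format a setPermission denial the same way as the create
// path, with the inode, its owner/group and its current permissions included.
import org.apache.hadoop.security.AccessControlException;

public class PermissionDeniedMessage {
  static AccessControlException denied(String user, String access, String path,
      String owner, String group, String perms) {
    return new AccessControlException("Permission denied: user=" + user
        + ", access=" + access
        + ", inode=" + path + ":" + owner + ":" + group + ":" + perms);
  }
}
{code}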



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HADOOP-8719) Workaround for kerberos-related log errors upon running any hadoop command on OSX

2014-07-11 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8719.
-

Resolution: Fixed

When this was committed, OS X was not a targeted platform for security or native 
support. If that has changed recently, let's revert this fix over a new JIRA - I 
see no issues with doing that. The fix here merely got rid of a verbose warning 
that appeared unnecessarily on unsecured pseudo-distributed clusters running on 
OS X.

Re-resolving. Thanks!

 Workaround for kerberos-related log errors upon running any hadoop command on 
 OSX
 -

 Key: HADOOP-8719
 URL: https://issues.apache.org/jira/browse/HADOOP-8719
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.0-alpha
 Environment: Mac OS X 10.7, Java 1.6.0_26
Reporter: Jianbin Wei
Priority: Trivial
 Fix For: 3.0.0

 Attachments: HADOOP-8719.patch, HADOOP-8719.patch, HADOOP-8719.patch, 
 HADOOP-8719.patch


 When starting Hadoop on OS X 10.7 (Lion) using start-all.sh, Hadoop logs 
 the following errors:
 2011-07-28 11:45:31.469 java[77427:1a03] Unable to load realm info from 
 SCDynamicStore
 Hadoop does seem to function properly despite this.
 The workaround takes only 10 minutes.
 There are numerous discussions about this:
 googling "Unable to load realm mapping info from SCDynamicStore" returns 1770 
 hits, and each one has many discussions.
 Assuming each discussion takes only 5 minutes, a 10-minute fix can save ~150 
 hours. This does not count the time spent searching for this issue and its 
 solution/workaround, which can easily reach (wasted) thousands of hours!



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HADOOP-10707) support bzip2 in python avro tool

2014-06-17 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-10707.
--

Resolution: Invalid

Moved to AVRO-1527

 support bzip2 in python avro tool
 -

 Key: HADOOP-10707
 URL: https://issues.apache.org/jira/browse/HADOOP-10707
 Project: Hadoop Common
  Issue Type: Improvement
  Components: tools
Reporter: Eustache
Priority: Minor
  Labels: avro

 The Python tool to decode avro files is currently missing support for bzip2 
 compression.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Edit permission to Hadoop Wiki page

2014-06-17 Thread Harsh J
Hi,

You should be able to edit pages on the wiki.apache.org/hadoop wiki as
your username's in there (thanks Steve!). Are you unable to? Let us
know.

On Tue, Jun 17, 2014 at 1:55 AM, Asokan, M maso...@syncsort.com wrote:
 I would like to update the page 
 http://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support with 
 my company's Hadoop related offerings.

 My Wiki user id is: masokan

 Can someone point out how I can get edit permission?

 Thanks in advance.

 -- Asokan






-- 
Harsh J


Re: I couldn't assign ticket to myself. Can someone add me to the contributors list.

2014-05-04 Thread Harsh J
Done and done. Looking forward to your contribution!

On Mon, May 5, 2014 at 12:33 AM, Anandha L Ranganathan
analog.s...@gmail.com wrote:
 Can someone add me to the contributors list so that I can assign
 the ticket to myself.

 https://issues.apache.org/jira/browse/YARN-1918

 Thanks
 Anand



-- 
Harsh J


[jira] [Created] (HADOOP-10572) Example NFS mount command must pass noacl as it isn't supported by the server yet

2014-05-03 Thread Harsh J (JIRA)
Harsh J created HADOOP-10572:


 Summary: Example NFS mount command must pass noacl as it isn't 
supported by the server yet
 Key: HADOOP-10572
 URL: https://issues.apache.org/jira/browse/HADOOP-10572
 Project: Hadoop Common
  Issue Type: Improvement
  Components: nfs
Affects Versions: 2.4.0
Reporter: Harsh J
Priority: Trivial


Use of the documented default mount command results in the below server-side 
WARN log event, because the client tries to locate the ACL program (#100227):

{code}
12:26:11.975 AM TRACE   org.apache.hadoop.oncrpc.RpcCall
Xid:-1114380537, messageType:RPC_CALL, rpcVersion:2, program:100227, version:3, 
procedure:0, credential:(AuthFlavor:AUTH_NONE), verifier:(AuthFlavor:AUTH_NONE)
12:26:11.976 AM TRACE   org.apache.hadoop.oncrpc.RpcProgram 
NFS3 procedure #0
12:26:11.976 AM WARNorg.apache.hadoop.oncrpc.RpcProgram 
Invalid RPC call program 100227
{code}

The client mount command must pass {{noacl}} to avoid this.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


Re: Wiki Edit Permission

2014-04-27 Thread Harsh J
User Zhijie Shen has been added to contributors group on wiki. Let us
know if you still face issues!


On Sat, Apr 26, 2014 at 12:20 AM, Zhijie Shen zs...@hortonworks.com wrote:

 To whom it may concern,

 would you mind granting me Wiki edit permission? My username is Zhijie
 Shen.

 Thanks,
 Zhijie

 --
 Zhijie Shen
 Hortonworks Inc.
 http://hortonworks.com/





-- 
Harsh J


Re: To be able to edit Hadoop distributions and commercial support page

2014-04-27 Thread Harsh J
Hello Amol,

Certainly - What is your wiki username, so we may add you to the can-edit
list?


On Thu, Apr 24, 2014 at 1:57 AM, Amol Kekre a...@datatorrent.com wrote:


 Can someone give me edit rights to the following page
 https://wiki.apache.org/hadoop/Distributions%20and%20Commercial%20Support

 DataTorrent is a Hadoop native platform and I want to maintain our blurb
 on it.

 Thanks,
 Amol




-- 
Harsh J


Re: Hadoop v1.8 data transfer protocol

2014-04-06 Thread Harsh J
There's been no Apache Hadoop release versioned v1.8 historically, nor
is one upcoming. Do you mean 0.18?

Either way, can you point to the specific code lines in BlockSender
which have you confused? The sendBlock and sendPacket methods would
interest you I assume, but they appear to be well constructed/named
internally and commented in a few important spots.

On Mon, Apr 7, 2014 at 6:39 AM, Dhaivat Pandya dhaivatpan...@gmail.com wrote:
 Hi,

 I'm trying to figure out how data is transferred between client and
 DataNode in Hadoop v1.8.

 This is my understanding so far:

 The client first fires an OP_READ_BLOCK request. The DataNode responds with
 a status code, checksum header, chunk offset, packet length, sequence
 number, the last packet boolean, the length and the data (in that order).

 However, I'm running into an issue. First of all, which of these lengths
 describes the length of the data? I tried both PacketLength and Length it
 seems that they leave data on the stream (I tried to cat a file with the
 numbers 1-1000 in it).

 Also, how does the DataNode signal the start of another packet? After
 Length number of bytes have been read, I assumed that the header would be
 repeated, but this is not the case (I'm not getting sane values for any of
 the fields of the header).

 I've looked through the DataXceiver, BlockSender, DFSClient
 (RemoteBlockReader) classes but I still can't quite grasp how this data
 transfer is conducted.

 Any help would be appreciated,

 Dhaivat Pandya



-- 
Harsh J


Re: Wiki Editing

2014-02-15 Thread Harsh J
Hey Steve,

(-user@)

Sorry on the delay, missed this one. You should be all set now - do
report problems here, if any. Thanks!

On Sat, Feb 15, 2014 at 2:47 PM, Steve Kallestad st...@tabtonic.com wrote:
 Quick note.  As of yet, I have not received write permissions on the Hadoop
 Wiki.



 My login name is SteveKallestad.



 I appreciate any help getting started.



 Thanks,

 Steve



 From: Arpit Agarwal [mailto:aagar...@hortonworks.com]
 Sent: Tuesday, February 11, 2014 10:37 AM
 To: common-dev@hadoop.apache.org
 Subject: Re: Wiki Editing



 +common-dev, bcc user



 Hi Steve,



 I'm wondering if someone wouldn't mind adding my user to the list so I can
 add my (small) contribution to the project.



 A wiki admin should be able to do this for you (a few of them are on this
 mailing list). Feel free to send a reminder to the list if no one has added
 you in a day or so.



 Additionally, I'd like to help update the maven site documentation to add
 some clarity, but I know I'll have to look into how to get going on that
 side of the street.  Correct me if I'm wrong, but the process there would be
 to submit bugs with a patch into Jira, and there is probably a utility
 somewhere that I can run which will ensure that whatever changes I propose
 meet the project standards.



 Documentation patches are always welcome. There is a test-patch.sh script in
 the source tree which can be used to validate your patch.



 Alternatively if you generate your patch against trunk you can cheat and
 click 'Submit Patch' in the Jira to have Jenkins validate the patch for you.
 To build and stage the site locally you can run something like mvn
 site:stage -DstagingDirectory=/tmp/myhadoopsite. This is useful to manually
 verify the formatting looks as expected.



 Thanks,

 Arpit



 On Tue, Feb 11, 2014 at 6:01 AM, One Box one...@tabtonic.com wrote:

 I wanted to contribute to the Wiki tonight, but once I created an account it
 shows that all of the pages are immutable.



 I never did receive an email confirmation, but it did allow me to log in.



 After reading through some of the help documentation, I saw that with some
 ASF projects you have to be added to a list of Wiki Editors manually in
 order to prevent spam.



 I'm wondering if someone wouldn't mind adding my user to the list so I can
 add my (small) contribution to the project.



 My login name is SteveKallestad.



 There is a page that spells out instructions for building from source on
 Windows.  I struggled a bit building  on Ubuntu.  I documented the process
 and I'd like to add it.



 Additionally, I'd like to help update the maven site documentation to add
 some clarity, but I know I'll have to look into how to get going on that
 side of the street.  Correct me if I'm wrong, but the process there would be
 to submit bugs with a patch into Jira, and there is probably a utility
 somewhere that I can run which will ensure that whatever changes I propose
 meet the project standards.



 Any help to get me going is appreciated.



 Thanks,

 Steve







-- 
Harsh J


Re: write permissions

2014-02-13 Thread Harsh J
Sorry about the delay - done. Can you retry now to see if you're able to? Let us
know if you face any issues.


On Fri, Feb 14, 2014 at 12:53 AM, Kevin Wincott ke...@sthenica.com wrote:

  please?

 On 04/02/14 12:25, Kevin Wincott wrote:

  Hello



 Please can I have write permissions on the wiki for user kevinwincott so
 that I may add us to the Hadoop Users list



 Kevin Wincott

 *Data Architect*

 T: 0800 471 4701

 www.sthenica.com






-- 
Harsh J


Re: Datanode registration, port number

2013-12-23 Thread Harsh J
Hi,

On Mon, Dec 23, 2013 at 9:41 AM, Dhaivat Pandya dhaivatpan...@gmail.com wrote:
 Hi,

 I'm currently trying to build a cache layer that should sit on top of the
 datanode. Essentially, the namenode should know the port number of the
 cache layer instead of that of the datanode (since the namenode then relays
 this information to the default HDFS client). All of the communication
 between the datanode and the namenode currently flows through my cache
 layer (including heartbeats, etc.)

Curious Q: What does your cache layer aim to do, btw? If it's a data
cache, have you checked out the design currently being implemented in
https://issues.apache.org/jira/browse/HDFS-4949?

 *First question*: is there a way to tell the namenode where a datanode
 should be? Any way to trick it into thinking that the datanode is on a port
 number where it actually isn't? As far as I can tell, the port number is
 obtained from the DatanodeId object; can this be set in the configuration
 so that the port number derived is that of the cache layer?

The NN receives a DN host and port from the DN directly. The DN sends
it whatever it's running on. See
https://github.com/apache/hadoop-common/blob/release-2.2.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L690

 I spent quite a bit of time on the above question and I could not find any
 sort of configuration option that would let me do that. So, I delved into
 the HDFS source code and tracked down the DatanodeRegistration class.
 However, I can't seem to find out *how* the NameNode figures out the
 Datanode's port number or if I could somehow change the packets to reflect
 the port number of cache layer?

See 
https://github.com/apache/hadoop-common/blob/release-2.2.0/hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java#L690
(as above) for how the DN emits it. And no, IMO, that (packet
changes) is not the right way to go about it if you're planning an
overhaul. It's easier and more supportable to make proper code changes
instead.

 *Second question: *how does the namenode
 figure out a newly-registered Datanode's port number?

Same as before. Registration sends the service addresses (so NN may
use them for sending to clients), beyond which the DN's heartbeats are
mere client-like connections to the NN, carried out on regular
ephemeral ports.

-- 
Harsh J


Re: How can I get FSNamesystem of running NameNode in cluster?

2013-12-09 Thread Harsh J
Hi Yoonmin,

Yes, your conclusions here are correct. The FSNamesystem is an object
internal to the NameNode server runtime.
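
For completeness, a minimal sketch of the test-only route via a minicluster
(illustrative; it assumes the hadoop-hdfs test artifacts are on the classpath):

{code}
// Illustrative, test-only: the live FSNamesystem is reachable from a
// MiniDFSCluster; there is no client-side API for it on a remote cluster.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hdfs.HdfsConfiguration;
import org.apache.hadoop.hdfs.MiniDFSCluster;
import org.apache.hadoop.hdfs.server.namenode.FSNamesystem;

public class FsNamesystemDemo {
  public static void main(String[] args) throws Exception {
    Configuration conf = new HdfsConfiguration();
    MiniDFSCluster cluster =
        new MiniDFSCluster.Builder(conf).numDataNodes(1).build();
    try {
      cluster.waitActive();
      FSNamesystem fsn = cluster.getNamesystem();
      System.out.println("total blocks: " + fsn.getBlocksTotal());
    } finally {
      cluster.shutdown();
    }
  }
}
{code}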

On Mon, Dec 9, 2013 at 8:49 PM, Yoonmin Nam rony...@dgist.ac.kr wrote:
 Oh, I see. However a minicluster cannot replace the namenode, right?
 I knew that the minicluster is for testing components of hadoop.

 Then, the only way of implementing features which use the namenode or
 datanode is inside the namenode or datanode itself.
 Am I right?

 Thanks!

 -Original Message-
 From: Daryn Sharp [mailto:da...@yahoo-inc.com]
 Sent: Monday, December 09, 2013 11:42 PM
 To: common-dev@hadoop.apache.org
 Subject: Re: How can I get FSNamesystem of running NameNode in cluster?

 Are you adding something internal to the NN?  If not, you cannot get the
 namesystem instance via a client unless you are using a minicluster object.

 Daryn

 On Dec 9, 2013, at 7:11 AM, Yoonmin Nam rony...@dgist.ac.kr wrote:

 I want to get a running instance of FSNamesystem of HDFS. However, it
 is somewhat complicated than I expected.

 If I can get NameNode instance of running cluster, then it can be
 solved because there is a method getNamespace().

 Is there anyone who know about this stuff?

 I thought that using Servlet stuff is not normal way to do this
 because my program is not web-application.

 Thanks!












-- 
Harsh J


[jira] [Resolved] (HADOOP-10002) Tool's config option wouldn't work on secure clusters

2013-09-28 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-10002?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-10002.
--

   Resolution: Duplicate
Fix Version/s: 2.0.3-alpha

Sorry about the noise. This should be fixed by HADOOP-9021 - turns out I wasn't 
looking at the right 2.0.x sources when debugging this.

 Tool's config option wouldn't work on secure clusters
 -

 Key: HADOOP-10002
 URL: https://issues.apache.org/jira/browse/HADOOP-10002
 Project: Hadoop Common
  Issue Type: Bug
  Components: security, util
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor
 Fix For: 2.0.3-alpha


 The Tool framework provides a way for clients to run without classpath 
 *-site.xml configs, by letting users pass a -conf file to parse into the 
 app's Configuration object.
 In a secure cluster config setup, such a runner will not work, because the 
 UserGroupInformation.isSecurityEnabled() check, which Server.java uses to 
 determine what form of communication to use, statically loads a {{new 
 Configuration()}} object during its initialization to inspect whether security 
 is turned on; this ignores the application's config object, tries to load from 
 the classpath, and ends up loading non-secure defaults.
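
For reference, a minimal sketch of the Tool/-conf pattern described above (an
illustrative client, not part of the original report):

{code}
// Illustrative Tool runner: "-conf my-site.xml" is parsed by ToolRunner into the
// Configuration handed to run(), without needing *-site.xml on the classpath.
import java.util.Arrays;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class ListRoot extends Configured implements Tool {
  @Override
  public int run(String[] args) throws Exception {
    // getConf() carries whatever the "-conf <file>" option supplied.
    FileSystem fs = FileSystem.get(getConf());
    System.out.println(Arrays.toString(fs.listStatus(new Path("/"))));
    return 0;
  }

  public static void main(String[] args) throws Exception {
    // e.g.: hadoop jar app.jar ListRoot -conf /path/to/cluster-site.xml
    System.exit(ToolRunner.run(new Configuration(), new ListRoot(), args));
  }
}
{code}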



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HADOOP-10002) Tool's config option wouldn't work on secure clusters

2013-09-27 Thread Harsh J (JIRA)
Harsh J created HADOOP-10002:


 Summary: Tool's config option wouldn't work on secure clusters
 Key: HADOOP-10002
 URL: https://issues.apache.org/jira/browse/HADOOP-10002
 Project: Hadoop Common
  Issue Type: Bug
  Components: security, util
Affects Versions: 2.0.6-alpha
Reporter: Harsh J
Priority: Minor


The Tool framework provides a way for clients to run without classpath 
*-site.xml configs, by letting users pass a -conf file to parse into the 
app's Configuration object.

In a secure cluster config setup, such a runner will not work, because the 
UserGroupInformation.isSecurityEnabled() check, which Server.java uses to 
determine what form of communication to use, statically loads a {{new 
Configuration()}} object during its initialization to inspect whether security 
is turned on; this ignores the application's config object, tries to load from 
the classpath, and ends up loading non-secure defaults.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: question about when do resource matching in YARN

2013-09-24 Thread Harsh J
Yes, but the heartbeat coupling isn't necessary, I think. One could
even use a ZK write/watch approach for faster assignment of regular
work?

On Tue, Sep 24, 2013 at 2:24 PM, Steve Loughran ste...@hortonworks.com wrote:
 On 21 September 2013 09:19, Sandy Ryza sandy.r...@cloudera.com wrote:

 I don't believe there is any reason scheduling decisions need to be coupled
 with NodeManager heartbeats.  It doesn't sidestep any race conditions
 because a NodeManager could die immediately after heartbeating.


 historically it's been done for scale: you don't need the JT reaching out to
 4K TTs just to give them work to do; instead, let them connect in anyway
 and get work that way. And once they start reporting in completions, they
 can be given more work. It's very biased towards "worker nodes talk
 to the master" over "master approaches workers".




-- 
Harsh J


Re: issue of building with native

2013-09-18 Thread Harsh J
/.m2/repository/org/codehaus/plexus/plexus-component-annotations/1.5.5/plexus-component-annotations-1.5.5.jar
 [ERROR] urls[23] = 
 file:/home/hhf/.m2/repository/org/sonatype/plexus/plexus-sec-dispatcher/1.3/plexus-sec-dispatcher-1.3.jar
 [ERROR] urls[24] = 
 file:/home/hhf/.m2/repository/org/sonatype/plexus/plexus-cipher/1.4/plexus-cipher-1.4.jar
 [ERROR] urls[25] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-sink-api/1.2/doxia-sink-api-1.2.jar
 [ERROR] urls[26] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-logging-api/1.2/doxia-logging-api-1.2.jar
 [ERROR] urls[27] = 
 file:/home/hhf/.m2/repository/junit/junit/3.8.1/junit-3.8.1.jar
 [ERROR] urls[28] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-core/1.2/doxia-core-1.2.jar
 [ERROR] urls[29] = 
 file:/home/hhf/.m2/repository/xerces/xercesImpl/2.9.1/xercesImpl-2.9.1.jar
 [ERROR] urls[30] = 
 file:/home/hhf/.m2/repository/xml-apis/xml-apis/1.3.04/xml-apis-1.3.04.jar
 [ERROR] urls[31] = 
 file:/home/hhf/.m2/repository/org/apache/httpcomponents/httpclient/4.0.2/httpclient-4.0.2.jar
 [ERROR] urls[32] = 
 file:/home/hhf/.m2/repository/org/apache/httpcomponents/httpcore/4.0.1/httpcore-4.0.1.jar
 [ERROR] urls[33] = 
 file:/home/hhf/.m2/repository/commons-codec/commons-codec/1.3/commons-codec-1.3.jar
 [ERROR] urls[34] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-xhtml/1.2/doxia-module-xhtml-1.2.jar
 [ERROR] urls[35] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-apt/1.2/doxia-module-apt-1.2.jar
 [ERROR] urls[36] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-xdoc/1.2/doxia-module-xdoc-1.2.jar
 [ERROR] urls[37] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-module-fml/1.2/doxia-module-fml-1.2.jar
 [ERROR] urls[38] = 
 file:/home/hhf/.m2/repository/javax/servlet/servlet-api/2.5/servlet-api-2.5.jar
 [ERROR] urls[39] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-decoration-model/1.2/doxia-decoration-model-1.2.jar
 [ERROR] urls[40] = 
 file:/home/hhf/.m2/repository/org/apache/maven/doxia/doxia-site-renderer/1.2/doxia-site-renderer-1.2.jar
 [ERROR] urls[41] = 
 file:/home/hhf/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar
 [ERROR] urls[42] = 
 file:/home/hhf/.m2/repository/org/apache/maven/shared/maven-doxia-tools/1.4/maven-doxia-tools-1.4.jar
 [ERROR] urls[43] = 
 file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-archiver/1.0/plexus-archiver-1.0.jar
 [ERROR] urls[44] = 
 file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-io/1.0/plexus-io-1.0.jar
 [ERROR] urls[45] = 
 file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-i18n/1.0-beta-7/plexus-i18n-1.0-beta-7.jar
 [ERROR] urls[46] = 
 file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-velocity/1.1.8/plexus-velocity-1.1.8.jar
 [ERROR] urls[47] = 
 file:/home/hhf/.m2/repository/org/codehaus/plexus/plexus-utils/1.5.10/plexus-utils-1.5.10.jar
 [ERROR] urls[48] = 
 file:/home/hhf/.m2/repository/org/mortbay/jetty/jetty/6.1.25/jetty-6.1.25.jar
 [ERROR] urls[49] = 
 file:/home/hhf/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20081211/servlet-api-2.5-20081211.jar
 [ERROR] urls[50] = 
 file:/home/hhf/.m2/repository/org/mortbay/jetty/jetty-util/6.1.25/jetty-util-6.1.25.jar
 [ERROR] urls[51] = 
 file:/home/hhf/.m2/repository/commons-lang/commons-lang/2.5/commons-lang-2.5.jar
 [ERROR] urls[52] = 
 file:/home/hhf/.m2/repository/commons-io/commons-io/1.4/commons-io-1.4.jar
 [ERROR] Number of foreign imports: 1
 [ERROR] import: Entry[import  from realm ClassRealm[maven.api, parent: null]]
 [ERROR]
 [ERROR] -: 
 org.sonatype.aether.graph.DependencyFilter



-- 
Harsh J


[jira] [Reopened] (HADOOP-9878) getting rid of all the 'bin/../' from all the paths

2013-08-27 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-9878:
-


 getting rid of all the 'bin/../' from all the paths
 ---

 Key: HADOOP-9878
 URL: https://issues.apache.org/jira/browse/HADOOP-9878
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf
Reporter: kaveh minooie
Priority: Trivial
 Fix For: 2.1.0-beta

   Original Estimate: 1m
  Remaining Estimate: 1m

 By simply replacing line 34 of libexec/hadoop-config.sh from:
 {quote}
 export HADOOP_PREFIX=`dirname $this`/..
 {quote}
 to
 {quote}
 export HADOOP_PREFIX=$( cd $config_bin/..; pwd -P )
 {quote}
 we can eliminate all the annoying 'bin/../' from the library paths and make
 the output of commands like ps a lot more readable, not to mention that the OS
 would do just a bit less work as well. I can post a patch for it as well if
 it is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9878) getting rid of all the 'bin/../' from all the paths

2013-08-27 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9878?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9878.
-

Resolution: Duplicate

 getting rid of all the 'bin/../' from all the paths
 ---

 Key: HADOOP-9878
 URL: https://issues.apache.org/jira/browse/HADOOP-9878
 Project: Hadoop Common
  Issue Type: Improvement
  Components: conf
Reporter: kaveh minooie
Priority: Trivial
   Original Estimate: 1m
  Remaining Estimate: 1m

 By simply replacing line 34 of libexec/hadoop-config.sh from:
 {quote}
 export HADOOP_PREFIX=`dirname $this`/..
 {quote}
 to
 {quote}
 export HADOOP_PREFIX=$( cd $config_bin/..; pwd -P )
 {quote}
 we can eliminate all the annoying 'bin/../' from the library paths and make
 the output of commands like ps a lot more readable, not to mention that the OS
 would do just a bit less work as well. I can post a patch for it as well if
 it is needed.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9346) Upgrading to protoc 2.5.0 fails the build

2013-08-13 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9346.
-

Resolution: Duplicate

Thanks for pinging Ravi. I'd discussed with Alejandro that this could be 
closed. Looks like we added a dupe link but failed to close. Closing now.

 Upgrading to protoc 2.5.0 fails the build
 -

 Key: HADOOP-9346
 URL: https://issues.apache.org/jira/browse/HADOOP-9346
 Project: Hadoop Common
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor
  Labels: protobuf
 Attachments: HADOOP-9346.patch


 Reported over the impala lists, one of the errors received is:
 {code}
 src/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ha/proto/ZKFCProtocolProtos.java:[104,37]
  can not find symbol.
 symbol: class Parser
 location: package com.google.protobuf
 {code}
 Worth looking into as we'll eventually someday bump our protobuf deps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: protobuf upgrade on Jenkins slaves causing build failures?

2013-08-12 Thread Harsh J
Hey Chris,

Yep, the protobuf version on the machine was upped. We have a parallel
ongoing discussion on the same lists under the title "Upgrade to
protobuf 2.5.0 for the 2.1.0 release", which you can follow for the
updates being done.

On Mon, Aug 12, 2013 at 11:00 PM, Chris Nauroth
cnaur...@hortonworks.com wrote:
 I'm curious if protobuf may have been upgraded to 2.5.0 on the Jenkins
 slaves, ahead of committing the Hadoop code's dependency upgrade to 2.5.0.
  We've started to see build failures due to cannot find symbol
 com.google.protobuf.Parser.  This is the earliest example I could find,
 which happened 8/11 10:31 AM:

 https://builds.apache.org/job/Hadoop-Yarn-trunk/298/

 The Parser class does not exist in 2.4.1, but it does exist in 2.5.0, which
 leads me to believe that the Jenkins machines were upgraded to start using
 a 2.5.0 protoc binary.

 Chris Nauroth
 Hortonworks
 http://hortonworks.com/



-- 
Harsh J


[jira] [Created] (HADOOP-9861) Invert ReflectionUtils' stack trace

2013-08-10 Thread Harsh J (JIRA)
Harsh J created HADOOP-9861:
---

 Summary: Invert ReflectionUtils' stack trace
 Key: HADOOP-9861
 URL: https://issues.apache.org/jira/browse/HADOOP-9861
 Project: Hadoop Common
  Issue Type: Improvement
  Components: util
Affects Versions: 2.0.5-alpha
Reporter: Harsh J


Often an MR task (as an example) may fail at the configure stage due to a 
misconfiguration or whatever, and the only thing a user gets by virtue of MR 
pulling limited bytes of the diagnostic error data is the top part of the 
stacktrace:

{code}
java.lang.RuntimeException: Error in configuring object
at 
org.apache.hadoop.util.ReflectionUtils.setJobConf(ReflectionUtils.java:93)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:64)
at 
org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:432)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:372)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
{code}

This is absolutely useless to a user, and he also goes ahead and blames the 
framework for having an issue, rather than thinking (non-intuitively) to go see 
the whole task log for the full trace, especially the last part.

Hundreds of times it's been a mere class that's missing, etc., but there's just too 
much pain involved here to troubleshoot.

It would be much, much better if we inverted the trace. For example, here's what 
Hive could return back if we did so, for a random problem I pulled from the web:

{code}
java.lang.RuntimeException: Error in configuring object
Caused by: java.lang.NullPointerException
at 
org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector.toString(StructObjectInspector.java:64)
at java.lang.String.valueOf(String.java:2826)
at java.lang.StringBuilder.append(StringBuilder.java:115)
at 
org.apache.hadoop.hive.ql.exec.UnionOperator.initializeOp(UnionOperator.java:110)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:451)
at 
org.apache.hadoop.hive.ql.exec.Operator.initializeChildren(Operator.java:407)
at 
org.apache.hadoop.hive.ql.exec.TableScanOperator.initializeOp(TableScanOperator.java:186)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at 
org.apache.hadoop.hive.ql.exec.MapOperator.initializeOp(MapOperator.java:563)
at org.apache.hadoop.hive.ql.exec.Operator.initialize(Operator.java:375)
at org.apache.hadoop.hive.ql.exec.ExecMapper.configure(ExecMapper.java:100)
... 22 more
{code}

This way the user can at least be sure which part is really failing, and not get 
lost trying to work their way through ReflectionUtils and up/down the trace.
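
As a rough illustration of the idea (a sketch only, not the actual ReflectionUtils change), a helper that walks the cause chain and prints it root-first could look something like this:

{code}
// Sketch: collect the cause chain and print it root-first, so the real
// failure still shows up even when only the first few lines survive
// truncation of the diagnostic data.
public class InvertedTrace {
  public static void printInverted(Throwable top) {
    java.util.Deque<Throwable> chain = new java.util.ArrayDeque<Throwable>();
    for (Throwable t = top; t != null; t = t.getCause()) {
      chain.push(t); // the root cause ends up on top of the deque
    }
    while (!chain.isEmpty()) {
      Throwable t = chain.pop();
      System.err.println(t);
      for (StackTraceElement frame : t.getStackTrace()) {
        System.err.println("\tat " + frame);
      }
    }
  }

  public static void main(String[] args) {
    Throwable root = new NullPointerException("whatever really failed");
    printInverted(new RuntimeException("Error in configuring object", root));
  }
}
{code}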

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Problem building branch-2

2013-05-24 Thread Harsh J
You seem to be using protoc-2.5.0, which is known not to work with
Hadoop yet: https://issues.apache.org/jira/browse/HADOOP-9346

Downgrading to 2.4.1 should let you go ahead.

On Sat, May 25, 2013 at 12:21 AM, Ralph Castain r...@open-mpi.org wrote:
 Hi folks

 I'm trying to build the head of branch-2 on a CentOS box and hitting a rash 
 of errors like the following (all from the protobuf support area):

 [ERROR] Failed to execute goal 
 org.apache.maven.plugins:maven-compiler-plugin:2.5.1:compile 
 (default-compile) on project hadoop-common: Compilation failure: Compilation 
 failure:
 [ERROR] 
 /home/common/hadoop/hadoop-common/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ipc/protobuf/RpcHeaderProtos.java:[278,37]
  error: cannot find symbol
 [ERROR] symbol:   class Parser
 [ERROR] location: package com.google.protobuf

 Per the BUILDING.txt instructions, I was using a command line of mvn install 
 -DskipTests from the top level directory.

 Any suggestions? I assume I must have some path incorrectly set or need to 
 build the sub-projects manually in some order, but I'm unsure of the nature 
 of the problem.

 Thanks
 Ralph




--
Harsh J


Re: [DISCUSS] - Committing client code to 3rd Party FileSystems within Hadoop Common

2013-05-23 Thread Harsh J
I think we do a fairly good work maintaining a stable and public FileSystem
and FileContext API for third-party plugins to exist outside of Apache
Hadoop but still be able to work well across versions.

The question of test pops up though, specifically that of testing against
trunk to catch regressions across various implementations, but it'd be much
work for us to also maintain glusterfs dependencies and mechanisms as part
of trunk.

We do provide trunk build snapshot artifacts publicly for downstream
projects to test against, which I think may help cover the continuous
testing concerns, if there are those.

Right now, I don't think the S3 FS we maintain really works all that well.
I also recall, per recent conversations on the lists, that AMZN has started
shipping their own library for a better implementation rather than
perfecting the implementation we have here (correct me if I am wrong but I
think the changes were not all contributed back). I see some work going on
for OpenStack's Swift, for which I think Steve also raised a similar
discussion here: http://search-hadoop.com/m/W1S5h2SrxlG, but I don't recall
if the conversation proceeded at the time.

What's your perspective as the releaser though? Would you not find
maintaining this outside easier, especially in terms of maintaining your
code for quicker releases, for both bug fixes and features - also given
that you can CI it against Apache Hadoop trunk at the same time?


On Thu, May 23, 2013 at 11:47 PM, Stephen Watt sw...@redhat.com wrote:

 (Resending - I think the first time I sent this out it got lost within all
 the ByLaws voting)

 Hi Folks

 My name is Steve Watt and I am presently working on enabling glusterfs to
 be used as a Hadoop FileSystem. Most of the work thus far has involved
 developing a Hadoop FileSystem plugin for glusterfs. I'm getting to the
 point where the plugin is becoming stable and I've been trying to
 understand where the right place is to host/manage/version it.

 Steve Loughran was kind enough to point out a few past threads in the
 community (such as
 http://lucene.472066.n3.nabble.com/Need-to-add-fs-shim-to-use-QFS-td4012118.html)
 that show a project disposition to move away from Hadoop Common containing
 client code (plugins) for 3rd party FileSystems. This makes sense and
 allows the filesystem plugin developer more autonomy as well as reduces
 Hadoop Common's dependence on 3rd Party libraries.

 Before I embark down that path, can the PMC/Committers verify that the
 preference is still to have client code for 3rd Party FileSystems hosted
 and managed outside of Hadoop Common?

 Regards
 Steve Watt




-- 
Harsh J


Re: Non existent config file to 'fs -conf'

2013-05-23 Thread Harsh J
The quiet behavior sorta goes all the way back to the very first
import of Nutch into Apache Incubator:
http://svn.apache.org/viewvc?view=revisionrevision=155829 and seems
to deal with being relaxed about not finding added resources other
than required defaults. The behavior has almost been the same for over
8 years now :-)

The quiet flag is code-settable, but the output it would produce is
pretty verbose. I suppose we can turn quiet mode off at the FsShell level,
while also making those "parsing…", etc. INFO-level logs into
checked-for DEBUG-level logs. Would that suffice?
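
For reference, here's a minimal sketch (assuming the behavior described below in this thread) of what flipping that code-settable flag would expose; the file name is just an illustration:

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class QuietModeCheck {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.setQuietMode(false); // the code-settable flag mentioned above
    conf.addResource(new Path("NONEXISTENT_FILE.xml"));
    // Resources are loaded lazily; the first lookup triggers the load and,
    // with quiet mode off, a missing file surfaces as a RuntimeException
    // instead of being silently ignored.
    System.out.println(conf.get("some.key"));
  }
}
{code}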

On Fri, May 24, 2013 at 12:52 AM, Ashwin Shankar asha...@yahoo-inc.com wrote:
 Hi,
 I'm working on
 HADOOP-9582 (https://issues.apache.org/jira/browse/HADOOP-9582) and I have a 
 question about the current implementation in hadoop-common.
 Here is a brief background about the bug: basically, if I give a non-existent 
 file to hadoop fs -conf NONEXISTENT_FILE,
 the current implementation never complains.
 But looking at the code (Configuration.loadResources()) it seems that in fact 
 we check if the input file exists and we throw an exception if the 'quiet' flag 
 is false.
 The problem is that the 'quiet' flag is always true.
 Can somebody explain the rationale behind this behavior? Would we break any 
 use case if we complain when a non-existent file is given as input?

 Why we want this fixed: say the user makes a typo and gives the wrong path; 
 the code is just going to ignore this, not complain,
 and use the default conf files (if the env variables are set). This would 
 confuse the user when he finds that the configs are different from what he 
 gave as input (the typo).
 Thoughts?

 Thanks,
 Ashwin




-- 
Harsh J


Re: [VOTE] Plan to create release candidate for 0.23.8

2013-05-17 Thread Harsh J
+1

On Sat, May 18, 2013 at 2:40 AM, Thomas Graves tgra...@yahoo-inc.com wrote:
 Hello all,

 We've had a few critical issues come up in 0.23.7 that I think warrants a
 0.23.8 release. The main one is MAPREDUCE-5211.  There are a couple of
 other issues that I want finished up and get in before we spin it.  Those
 include HDFS-3875, HDFS-4805, and HDFS-4835.  I think those are on track
 to finish up early next week.   So I hope to spin 0.23.8 soon after this
 vote completes.

 Please vote '+1' to approve this plan. Voting will close on Friday May
 24th at 2:00pm PDT.

 Thanks,
 Tom Graves




-- 
Harsh J


[jira] [Created] (HADOOP-9567) Provide auto-renewal for keytab based logins

2013-05-16 Thread Harsh J (JIRA)
Harsh J created HADOOP-9567:
---

 Summary: Provide auto-renewal for keytab based logins
 Key: HADOOP-9567
 URL: https://issues.apache.org/jira/browse/HADOOP-9567
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.0.0-alpha
Reporter: Harsh J
Priority: Minor


We do a renewal for cached tickets (obtained via kinit before using a Hadoop 
application) but we explicitly seem to avoid doing a renewal for keytab based 
logins (done from within the client code) when we could do that as well via a 
similar thread.
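
A bare-bones sketch of the kind of renewal thread meant here (the real change would live inside UserGroupInformation itself; the principal and keytab path below are placeholders):

{code}
import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

public class KeytabRelogin {
  public static void main(String[] args) throws IOException {
    // Placeholder principal and keytab path; real values would come from the
    // application's configuration.
    UserGroupInformation.loginUserFromKeytab(
        "service/host.example.com@EXAMPLE.COM",
        "/etc/security/keytabs/service.keytab");

    Thread renewer = new Thread(new Runnable() {
      public void run() {
        while (true) {
          try {
            Thread.sleep(60L * 60L * 1000L); // check roughly once an hour
            // Re-login from the keytab if the current TGT is close to expiry.
            UserGroupInformation.getLoginUser().checkTGTAndReloginFromKeytab();
          } catch (InterruptedException ie) {
            return;
          } catch (IOException ioe) {
            ioe.printStackTrace();
          }
        }
      }
    }, "keytab-renewal");
    renewer.setDaemon(true);
    renewer.start();
  }
}
{code}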

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Issue HADOOP-8905

2013-05-08 Thread Harsh J
Hi Steve,

Per my knowledge, no one is currently working on this or has planned
to. The request is also unassigned so you can go right ahead!

Do ping the common-dev@ with any review requests, or discussion
requests, should no one respond to the JIRA comments in time.

On Wed, May 8, 2013 at 4:19 AM,  sya...@stevendyates.com wrote:
 Hi Dev list,

 I am looking into implementing Add metrics for HTTP Server (HADOOP-8905) and
 would first like to seek clarification that no one else has covered this off
 to their knowledge within an existing JIRA or nobody has the intention to
 cover this off shortly.

 Kind Regards
 Steve




-- 
Harsh J


Re: Failing to run ant test on clean Hadoop branch-1 checkout

2013-04-27 Thread Harsh J
Hi Amit,

The common-dev list is more suited for Apache Hadoop
development-related questions, so I've moved it to that and bcc'd
user@. Each failed test also produces a log under the build directory
for the real reason of failure - can you also inspect that to
determine the reason behind the failures? If there are genuine bugs
from your analysis, and the failures are consistent, please do file a
JIRA as well.

On Sun, Apr 28, 2013 at 12:05 AM, Amit Sela am...@infolinks.com wrote:
 Hi all,

 I'm trying to run ant test on a clean Hadoop branch-1 checkout.
 ant works fine but when I run ant test I get a lot of failures:

 Test org.apache.hadoop.cli.TestCLI FAILED
 Test org.apache.hadoop.fs.TestFileUtil FAILED
 Test org.apache.hadoop.fs.TestHarFileSystem FAILED
 Test org.apache.hadoop.fs.TestUrlStreamHandler FAILED
 Test org.apache.hadoop.hdfs.TestAbandonBlock FAILED
 Test org.apache.hadoop.hdfs.TestBlocksScheduledCounter FAILED
 Test org.apache.hadoop.hdfs.TestDFSShell FAILED
 Test org.apache.hadoop.hdfs.TestDFSShellGenericOptions FAILED
 Test org.apache.hadoop.hdfs.TestDataTransferProtocol FAILED
 Test org.apache.hadoop.hdfs.TestDatanodeReport FAILED
 Test org.apache.hadoop.hdfs.TestDistributedFileSystem FAILED
 Test org.apache.hadoop.hdfs.TestFSInputChecker FAILED
 Test org.apache.hadoop.hdfs.TestFSOutputSummer FAILED
 Test org.apache.hadoop.hdfs.TestFileAppend FAILED
 Test org.apache.hadoop.hdfs.TestFileAppend2 FAILED
 Test org.apache.hadoop.hdfs.TestFileAppend3 FAILED
 Test org.apache.hadoop.hdfs.TestFileCorruption FAILED
 Test org.apache.hadoop.hdfs.TestFileStatus FAILED
 Test org.apache.hadoop.hdfs.TestGetBlocks FAILED
 Test org.apache.hadoop.hdfs.TestHDFSTrash FAILED
 Test org.apache.hadoop.hdfs.TestLease FAILED
 Test org.apache.hadoop.hdfs.TestLeaseRecovery FAILED
 Test org.apache.hadoop.hdfs.TestLocalDFS FAILED
 Test org.apache.hadoop.hdfs.TestMissingBlocksAlert FAILED
 Test org.apache.hadoop.hdfs.TestPread FAILED
 Test org.apache.hadoop.hdfs.TestQuota FAILED
 Test org.apache.hadoop.hdfs.TestRestartDFS FAILED

 and more

 I do get some warnings before the tests start:
 Clover not found. Code coverage reports disabled.
 warning: [options] bootstrap class path not set in conjunction with -source
 1.6
 3/04/27 21:29:44 INFO mortbay.log: Logging to
 org.slf4j.impl.Log4jLoggerAdapter(org.mortbay.log) via
 org.mortbay.log.Slf4jLog
 Trying to override old definition of task jsp-compile
 warning: [options] bootstrap class path not set in conjunction with -source
 1.6
 Note: Some input files use unchecked or unsafe operations.
 Note: Recompile with -Xlint:unchecked for details.

 Thanks,

 Amit.




-- 
Harsh J


[jira] [Resolved] (HADOOP-9510) DU command should provide a -h flag to display a more human readable format.

2013-04-25 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9510.
-

Resolution: Not A Problem

This is already available in the revamped shell apps under 2.x releases today; 
see 
http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/FileSystemShell.html#du

 DU command should provide a -h flag to display a more human readable format.
 

 Key: HADOOP-9510
 URL: https://issues.apache.org/jira/browse/HADOOP-9510
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Corey J. Nolet
Priority: Minor

 Would be useful to have the sizes print out as 500M or 3.4G instead of bytes 
 only.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9496) Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need HADOOP_CLASSPATH

2013-04-23 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9496.
-

   Resolution: Fixed
Fix Version/s: 2.0.5-beta

Committed revision 1471230 to fix this properly.

 Bad merge of HADOOP-9450 on branch-2 breaks all bin/hadoop calls that need 
 HADOOP_CLASSPATH 
 

 Key: HADOOP-9496
 URL: https://issues.apache.org/jira/browse/HADOOP-9496
 Project: Hadoop Common
  Issue Type: Bug
  Components: bin
Affects Versions: 2.0.5-beta
Reporter: Gopal V
Assignee: Harsh J
Priority: Critical
 Fix For: 2.0.5-beta

 Attachments: HADOOP-9496.patch


 Merge of HADOOP-9450 to branch-2 is broken for hadoop-config.sh
 on trunk
 http://svn.apache.org/viewvc/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh?r1=1453486r2=1469214pathrev=1469214
 vs on branch-2
 http://svn.apache.org/viewvc/hadoop/common/branches/branch-2/hadoop-common-project/hadoop-common/src/main/bin/hadoop-config.sh?r1=1390222r2=1469215
 This is breaking all hadoop client code which needs HADOOP_CLASSPATH to be 
 set correctly.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [VOTE] Release Apache Hadoop 2.0.4-alpha

2013-04-16 Thread Harsh J
+1

Built from source successfully, verified signatures, stood up a 1-node
cluster with CS, ran one Pi MR job, and the DistributedShell
application.

On Wed, Apr 17, 2013 at 6:10 AM, Sandy Ryza sandy.r...@cloudera.com wrote:
 +1 (non-binding)

 Built from source and ran sample jobs concurrently with the fair scheduler
 on a single node cluster.


 On Fri, Apr 12, 2013 at 2:56 PM, Arun C Murthy a...@hortonworks.com wrote:

 Folks,

 I've created a release candidate (RC2) for hadoop-2.0.4-alpha that I would
 like to release.

 The RC is available at:
 http://people.apache.org/~acmurthy/hadoop-2.0.4-alpha-rc2/
 The RC tag in svn is here:
 http://svn.apache.org/repos/asf/hadoop/common/tags/release-2.0.4-alpha-rc2

 The maven artifacts are available via repository.apache.org.

 Please try the release and vote; the vote will run for the usual 7 days.

 thanks,
 Arun


 --
 Arun C. Murthy
 Hortonworks Inc.
 http://hortonworks.com/






-- 
Harsh J


Re: [VOTE] Release Apache Hadoop 0.23.7

2013-04-16 Thread Harsh J
+1

Downloaded sources, built successfully, stood up a 1-node cluster and
ran a Pi MR job.

On Wed, Apr 17, 2013 at 2:27 AM, Hitesh Shah hit...@hortonworks.com wrote:
 +1.

 Downloaded source, built and ran a couple of sample jobs on a single node 
 cluster.

 -- Hitesh

 On Apr 11, 2013, at 12:55 PM, Thomas Graves wrote:

 I've created a release candidate (RC0) for hadoop-0.23.7 that I would like
 to release.

 This release is a sustaining release with several important bug fixes in
 it.

 The RC is available at:
 http://people.apache.org/~tgraves/hadoop-0.23.7-candidate-0/
 The RC tag in svn is here:
 http://svn.apache.org/viewvc/hadoop/common/tags/release-0.23.7-rc0/

 The maven artifacts are available via repository.apache.org.

 Please try the release and vote; the vote will run for the usual 7 days.

 thanks,
 Tom Graves





-- 
Harsh J


Re: git clone hadoop taking too much time almost 12 hrs

2013-04-10 Thread Harsh J
I once blogged about cloning big repositories after experiencing how
mammoth Android's repos were:
http://www.harshj.com/2010/08/29/a-less-known-thing-about-cloning-git-repositories/

Try a git clone with a --depth=1 option, to reduce total download by not
getting all the history objects. This would have some side-effects vs. a
regular clone, but should be fine for contributions.


On Wed, Apr 10, 2013 at 11:53 PM, mugisha moses mossp...@gmail.com wrote:

 The whole repo is like 290 mb  so make sure you have a decent internet
 connection


 On Wed, Apr 10, 2013 at 9:03 PM, maisnam ns maisnam...@gmail.com wrote:

  Thanks Andrew for your suggestion,I will clone it from the mirror.
 
  Regards
  Niranjan Singh
 
 
  On Wed, Apr 10, 2013 at 11:04 PM, Andrew Wang andrew.w...@cloudera.com
  wrote:
 
   Hi Niranjan,
  
   Try doing your initial clone from the github mirror instead, I found it
  to
   be much faster:
  
   https://github.com/apache/hadoop-common
  
   I use the apache git for subsequent pulls.
  
   Best,
   Andrew
  
  
   On Tue, Apr 9, 2013 at 6:15 PM, maisnam ns maisnam...@gmail.com
 wrote:
  
Hi,
   
I am trying to execute  - git clone git://
git.apache.org/hadoop-common.git so that I could setup a development
environment for Hadoop under the Eclipse IDE but it is taking too
 much
time.
   
Can somebody let me know why it is taking too much time, I have a
 high
speed internet connection and I don't think connectivity is the issue
   here.
   
Thanks
Niranjan Singh
   
  
 




-- 
Harsh J


Re: Building Hadoop from source code

2013-04-09 Thread Harsh J
 with ant so
 the
   ant
 command I gave is  ant clean compile bin-package. Don't forget
 to
download
 ivy.jar and copy into you ant home/ lib folder. Once the build
 is
 triggered
 Hadoop should get built along with the changes you made.

 If , I am not mistaken , you modified some hadoop files say
 BlockLocation.java, in your
 Hadoopx.x\src\core\org\apache\hadoop\fs\BlockLocation.java.

 The jar will be in
 Hadoopx.x\build\hadoop-0.20.3-dev-core.jar(In
  my
 version)

 Hope this clears your doubt.

 Regards
 Niranjan Singh


 On Tue, Apr 9, 2013 at 1:38 PM, Mohammad Mustaqeem
 3m.mustaq...@gmail.comwrote:

  @Steve
  I am new to Hadoop developement.
  Can you please tell me, whats is the location of tar file??
 
 
  On Tue, Apr 9, 2013 at 12:09 AM, Steve Loughran 
ste...@hortonworks.com
  wrote:
 
   On 8 April 2013 16:08, Mohammad Mustaqeem 
3m.mustaq...@gmail.com
  wrote:
  
Please, tell what I am doing wrong??
Whats the problem??
   
  
   a lot of these seem to be network-related tests. You can
 turn
off all
 the
   tests; look in BUILDING.TXT at the root of the source tree
  for
the
  various
   operations, then add -DskipTests to the end of every
 command,
such as
  
   mvn package -Pdist -Dtar -DskipTests
  
   to build the .tar packages
  
mvn package -Pdist -Dtar -DskipTests
  -Dmaven.javadoc.skip=true
   to turn off the javadoc creation too, for an even faster
  build
  
 
 
 
  --
  *With regards ---*
  *Mohammad Mustaqeem*,
  M.Tech (CSE)
  MNNIT Allahabad
  9026604270
 




 --
 *With regards ---*
 *Mohammad Mustaqeem*,
  M.Tech (CSE)
 MNNIT Allahabad
 9026604270



   
   
--
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270
   
   
   
   
   
   
--
*With regards ---*
*Mohammad Mustaqeem*,
M.Tech (CSE)
MNNIT Allahabad
9026604270
   
   
   
  
  
   --
   http://www.lingcc.com
  
 
 
 
  --
  *With regards ---*
  *Mohammad Mustaqeem*,
  M.Tech (CSE)
  MNNIT Allahabad
  9026604270
 



 --
 http://www.lingcc.com




 --
 *With regards ---*
 *Mohammad Mustaqeem*,
 M.Tech (CSE)
 MNNIT Allahabad
 9026604270



--
Harsh J


[jira] [Created] (HADOOP-9461) JobTracker and NameNode both grant delegation tokens to non-secure clients

2013-04-06 Thread Harsh J (JIRA)
Harsh J created HADOOP-9461:
---

 Summary: JobTracker and NameNode both grant delegation tokens to 
non-secure clients
 Key: HADOOP-9461
 URL: https://issues.apache.org/jira/browse/HADOOP-9461
 Project: Hadoop Common
  Issue Type: Bug
  Components: security
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


If one looks at the logic MAPREDUCE-1516 added in JobTracker.java's 
isAllowedDelegationTokenOp() method, and applies the non-secure state of 
UGI.isSecurityEnabled == false and authMethod == SIMPLE, the return result is 
true when the intention is false (due to the short-circuited conditionals).

This is allowing non-secure JobClients to easily request and use 
DelegationTokens and cause unwanted errors to be printed in the JobTracker when 
the renewer attempts to run. Ideally such clients ought to get an error if they 
request a DT in non-secure mode.

HDFS in trunk and branch-1 both have the same problem. Trunk MR 
(HistoryServer) and YARN are, however, unaffected due to simpler, inlined 
logic instead of reuse of this faulty method.

Note that fixing this will break Oozie today, due to the merged logic of 
OOZIE-734. Oozie will require a fix as well if this is to be fixed in branch-1. 
As a result, I'm going to mark this as an Incompatible Change.
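
To illustrate the shape of the problem, a simplified sketch (not the actual JobTracker source):

{code}
// Simplified sketch of the conditional shape, not the actual JobTracker code.
public class DelegationTokenCheck {
  // Buggy shape: with security off, the first clause short-circuits the rest,
  // so SIMPLE-auth clients end up treated as allowed.
  static boolean isAllowedBuggy(boolean securityEnabled, String authMethod) {
    return !(securityEnabled
        && !"KERBEROS".equals(authMethod)
        && !"KERBEROS_SSL".equals(authMethod)
        && !"CERTIFICATE".equals(authMethod));
  }

  // Intended shape: refuse delegation token ops outright when security is off.
  static boolean isAllowedFixed(boolean securityEnabled, String authMethod) {
    if (!securityEnabled) {
      return false;
    }
    return "KERBEROS".equals(authMethod)
        || "KERBEROS_SSL".equals(authMethod)
        || "CERTIFICATE".equals(authMethod);
  }

  public static void main(String[] args) {
    // A non-secure client using SIMPLE auth:
    System.out.println(isAllowedBuggy(false, "SIMPLE")); // true  (the bug)
    System.out.println(isAllowedFixed(false, "SIMPLE")); // false (the intent)
  }
}
{code}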

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Error in first build : Cannot run program protoc: CreateProcess error=2

2013-04-03 Thread Harsh J
I'm not sure if trunk works with 2.5.x protoc yet. Try again with
2.4.1 please? I remember filing a JIRA for this, will hunt and send in
a bit.

On Wed, Apr 3, 2013 at 6:47 PM, Chandrashekhar Kotekar
shekhar.kote...@gmail.com wrote:
 Hi,

 Just now I have downloaded Hadoop source code. I have successfully run mvn
 clean target but while trying mvn install target I am getting following
 error :

 *[INFO] Executed tasks
 [INFO]
 [INFO] --- hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) @
 hadoop-common ---
 [WARNING] [protoc,
 --java_out=D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\target\generated-sources\java,
 -ID:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\HAServiceProtocol.proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\IpcConnectionContext.proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtobufRpcEngine.proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtocolInfo.proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\RpcHeader.proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\Security.proto,
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ZKFCProtocol.proto]
  failed: java.io.IOException: Cannot run program protoc: CreateProcess
 error=2, The system cannot find the file specified
 [ERROR] protoc compiler error
 [INFO]
 
 [INFO] Reactor Summary:*

 I have copied exe file of Protoc and added path to that exe file in PATH
 variable. When I run echo %PATH% I get output something like below:

 *C:\Users\shekharecho %PATH%*
 *C:\Program Files\TortoiseSVN\bin;C:\Program
 Files\Java\jdk1.7.0_11\bin;D:\softwares\apache-maven-3.0.5\bin;D:\softwares\protoc-2.5.0-win32;D:\softwares\apache-ant-1.9.0\bin;
 *

 So I think I have successfully put protoc in my path.

 I would like to know why this error is coming. Request someone to please
 help.


 Thanks and Regards,
 Chandrash3khar K0tekar
 Mobile - 8884631122



-- 
Harsh J


Re: Error in first build : Cannot run program protoc: CreateProcess error=2

2013-04-03 Thread Harsh J
So this is the JIRA for tracking the protoc 2.5 issue:
https://issues.apache.org/jira/browse/HADOOP-9346.

Regarding the msbuild trouble, your issue is simply that you perhaps
do not have Visual Studio 2010 Professional/Windows SDK installed?
Apache Hadoop trunk has Windows-specific build profiles in it since it
now supports the Windows platform, but that support has its own build
requirements. Check out the section Building on Windows under your
checkout/clone's BUILDING.txt or
http://svn.apache.org/repos/asf/hadoop/common/trunk/BUILDING.txt.

I'm not currently sure how to turn the auto-detection off to make it
compile as it would on Linux minus natives (like how it used to,
before the feature drop), but perhaps there should be a way to allow
that.

On Wed, Apr 3, 2013 at 8:16 PM, Chandrashekhar Kotekar
shekhar.kote...@gmail.com wrote:
 Thanks a lot for your help.

 You were right. The problem was with protoc version 2.5 only. I downloaded and
 added protoc version 2.4 and now that error is gone. However, now I am stuck
 at this new error. Now Maven is not able to find msbuild. The error is as
 follows:

 [ERROR] Failed to execute goal org.codehaus.mojo:exec-maven-plugin:1.2:exec
 (compile-ms-winutils) on project hadoop-common: Command execution failed.
 Cannot run program msbuild (in directory
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common):
 CreateProcess error=2, The system cannot find the file specified - [Help 1]


 Can you please help with this error? I googled for this kind of error but
 couldn't find anything related to it.

 One more thing I would like to know: how do I search the mailing list for
 this kind of error?

 I think many people before me must have faced this type of error.
 If I could search the mailing list or some forum related to Hadoop, then I
 would not need to disturb all the people on the mailing list for this kind of
 trivial error, and people could concentrate on more important stuff.



 Regards,
 Chandrash3khar K0tekar
 Mobile - 8884631122


 On Wed, Apr 3, 2013 at 7:21 PM, Harsh J ha...@cloudera.com wrote:

 I'm not sure if trunk works with 2.5.x protoc yet. Try again with
 2.4.1 please? I remember filing a JIRA for this, will hunt and send in
 a bit.

 On Wed, Apr 3, 2013 at 6:47 PM, Chandrashekhar Kotekar
 shekhar.kote...@gmail.com wrote:
  Hi,
 
  Just now I have downloaded Hadoop source code. I have successfully run
 mvn
  clean target but while trying mvn install target I am getting
 following
  error :
 
  *[INFO] Executed tasks
  [INFO]
  [INFO] --- hadoop-maven-plugins:3.0.0-SNAPSHOT:protoc (compile-protoc) @
  hadoop-common ---
  [WARNING] [protoc,
 
 --java_out=D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\target\generated-sources\java,
 
 -ID:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\HAServiceProtocol.proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\IpcConnectionContext.proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtobufRpcEngine.proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ProtocolInfo.proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\RpcHeader.proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\Security.proto,
 
 D:\HadoopSource\hadoop-trunk\hadoop-common-project\hadoop-common\src\main\proto\ZKFCProtocol.proto]
   failed: java.io.IOException: Cannot run program protoc: CreateProcess
  error=2, The system cannot find the file specified
  [ERROR] protoc compiler error
  [INFO]
  
  [INFO] Reactor Summary:*
 
  I have copied exe file of Protoc and added path to that exe file in PATH
  variable. When I run echo %PATH% I get output something like below:
 
  *C:\Users\shekharecho %PATH%*
  *C:\Program Files\TortoiseSVN\bin;C:\Program
 
 Files\Java\jdk1.7.0_11\bin;D:\softwares\apache-maven-3.0.5\bin;D:\softwares\protoc-2.5.0-win32;D:\softwares\apache-ant-1.9.0\bin;
  *
 
  So I think I have successfully put protoc in my path.
 
  I would like to know why this error is coming. Request someone to please
  help.
 
 
  Thanks and Regards,
  Chandrash3khar K0tekar
  Mobile - 8884631122



 --
 Harsh J




-- 
Harsh J


Re: Hadoop Eclipse plug-in distribution

2013-04-03 Thread Harsh J
Hi Rafael,

We do have a new project developing the plugin separately. Please
check out and join the development efforts at
http://hdt.incubator.apache.org (Hadoop Development Tools).

On Thu, Apr 4, 2013 at 2:01 AM, Rafael Medeiros Teixeira
rafaelmed...@gmail.com wrote:
 Hi,

 First of all I'm not sure if this is the correct audience for my question,
 apologies if it is not.

 I have seen many accounts of Hadoop developers who had a hard time figuring
 out how to compile and install the Hadoop plug-in for Eclipse. My question
 is: are there any reasons, technical or otherwise, to distribute the
 plug-in as source code with Hadoop releases instead of as an Eclipse
 project? Having the plug-in as a separate Eclipse project would allow
 installling it via update site, which is the most common way of installing
 plug-ins. I also think it makes sense from a development standpoint, since
 contributors to the eclipse plugin are rarely the same ones that contribute
 to MapReduce.

 Regards,
 Rafael M.



-- 
Harsh J


Re: Reading partition for reducer

2013-04-01 Thread Harsh J
The question should be more specific here: Do you want to process a
map's sorted total output or do you want to pre-process a whole
partition (i.e. all data pertaining to one reducer)? The former would be
more ideal inside MapTask.java, the latter in ReduceTask.java.

On Mon, Apr 1, 2013 at 5:36 PM, Vikas Jadhav vikascjadha...@gmail.com wrote:
 Hello

 I want the output of the mapper to be processed before it is sent to the
 reducer.

 At what point should I hook in my processing code?


 I guess it is the ReduceTask.java file.

 If anyone knows anything regarding this, please help me.


 Thank You.


 --
 *
 *
 *

 Thanx and Regards*
 * Vikas Jadhav*



-- 
Harsh J


Re: Rack Awareness

2013-03-25 Thread Harsh J
Have you locally tested the script? What do you mean by 'not work'? Do
you not see it loaded, not see it sending back proper values, etc. -
what exactly?

P.s. What version?

P.s. User questions are not to be sent to the developer/issue lists.
They should be sent just to u...@hadoop.apache.org. Thanks!

On Mon, Mar 25, 2013 at 3:50 PM, preethi ganeshan
preethiganesha...@gmail.com wrote:
 Hi,
 I wanted to know how to use a network topology script. I have set
 net.topology.script.file.name to the topology script, but it does not
 work. What else must be done?

 Thank you
 Regards ,
 Preethi



-- 
Harsh J


Re: shuffling one intermediate pair to more than one reducer

2013-03-24 Thread Harsh J
This one is rather easy. Not sure why you'd open a JIRA for a request.
JIRA is to be used for feature requests and bug/improvement requests.

Also, wasn't this discussed some time back by you? See
http://search-hadoop.com/?q=%22pair+from+mapper+to+multiple+reducer%22
for the many replies of solutions you already received. If you
disregarded them all for some reason, please state so when following
up with the same question all over again.

P.s. Please do not cross post to all lists you know of. Use one list
per question, based on relevancy, and help avoid confusion.

On Mon, Mar 25, 2013 at 10:33 AM, Vikas Jadhav vikascjadha...@gmail.com wrote:
 Hello

 I have a use case where I want to shuffle the same pair to more than one reducer.
 Has anyone tried this, or can anyone give a suggestion on how to implement it?


 I have created a JIRA for the same:
 https://issues.apache.org/jira/browse/MAPREDUCE-5063

 Thank you.
 --


 Thanx and Regards
  Vikas Jadhav



--
Harsh J


Re: the part of the intermediate output fed to a reducer

2013-03-23 Thread Harsh J
Hi,

On Sun, Mar 24, 2013 at 12:00 AM, preethi ganeshan
preethiganesha...@gmail.com wrote:
 Hey all,
 I am working on a project that schedules data-local reduce tasks.

Great, are you planning to contribute it upstream too? See
https://issues.apache.org/jira/browse/MAPREDUCE-199. I'm also hoping
you're working on trunk and not the maintenance branch branch-1, which
is very outdated with where MR is today.

 However, I wanted to know if there is a way, using MapTask.java, to keep track
 of the
 inputs and the size of the input to every reducer. In other words, what code do
 I add to get the size of the intermediate output that is fed to a reduce
 task before the reduce task begins?

Change the thinking here a bit: a map does not feed a reduce (i.e. it's
not a push). A reduce consumes a map's output after its completion (the
map task JVM may terminate for all it cares). Upon a map's completion,
its counters are available centrally (i.e. at the ApplicationMaster),
which the reduce task can poll for sizes (it may already be doing
this).

--
Harsh J


[jira] [Resolved] (HADOOP-2781) Hadoop/Groovy integration

2013-03-22 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-2781?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-2781.
-

Resolution: Won't Fix

Closing per the comment below; this has been inactive for a couple of years now:

bq. Grool was a dead end.

Possible alternatives (given FlumeJava's mention): Apache Crunch - 
http://crunch.apache.org and/or Cascading - http://cascading.org.

 Hadoop/Groovy integration
 -

 Key: HADOOP-2781
 URL: https://issues.apache.org/jira/browse/HADOOP-2781
 Project: Hadoop Common
  Issue Type: New Feature
 Environment: Any
Reporter: Ted Dunning
 Attachments: trunk.tgz


 This is a place-holder issue to hold initial release of the groovy 
 integration for hadoop.
 The goal is to be able to write very simple map-reduce programs in just a few 
 lines of code in a functional style.  Word count should be less than 5 lines 
 of code! 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9424) The hadoop jar invocation should include the passed jar on the classpath as a whole

2013-03-21 Thread Harsh J (JIRA)
Harsh J created HADOOP-9424:
---

 Summary: The hadoop jar invocation should include the passed jar 
on the classpath as a whole
 Key: HADOOP-9424
 URL: https://issues.apache.org/jira/browse/HADOOP-9424
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Assignee: Harsh J
Priority: Minor


When you have a case such as this:

{{X.jar - Classes = Main, Foo}}
{{Y.jar - Classes = Bar}}

With implementation details such as:

* Main references Bar and invokes a public, static method on it.
* Bar does a class lookup to find Foo (Class.forName(Foo)).

Then when you do a {{HADOOP_CLASSPATH=Y.jar hadoop jar X.jar Main}}, Bar's 
method fails with a ClassNotFound exception because of the way RunJar runs.

RunJar extracts the passed jar and includes its contents on the ClassLoader of 
its current thread, but the {{Class.forName(…)}} call from another class does 
not check that class loader and hence cannot find the class, as it's not on any 
classpath it is aware of.

The hadoop jar script should ideally add the passed jar argument to 
the CLASSPATH before RunJar is invoked, for the above case to pass.
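
A simplified sketch of the lookup difference (Foo here is the hypothetical class from above, not a real Hadoop class):

{code}
// Simplified sketch; "Foo" is the hypothetical class from the description.
public class ForNameSketch {
  static Class<?> lookup(String name) {
    try {
      // Plain Class.forName() resolves against the caller's defining loader
      // (Bar's, loaded from Y.jar via HADOOP_CLASSPATH) and so misses classes
      // that RunJar unpacked from X.jar.
      return Class.forName(name);
    } catch (ClassNotFoundException first) {
      try {
        // The thread context class loader is the one RunJar set up, so this
        // fallback (or putting X.jar on the CLASSPATH up front) would find Foo.
        return Class.forName(name, true,
            Thread.currentThread().getContextClassLoader());
      } catch (ClassNotFoundException second) {
        return null;
      }
    }
  }

  public static void main(String[] args) {
    System.out.println(lookup(args.length > 0 ? args[0] : "Foo"));
  }
}
{code}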

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Help on Hadoop wiki page

2013-03-21 Thread Harsh J
Thanks Robin, I've added your ID into the contributor's group. You should
be able to make the edits yourself now - please go ahead.


On Thu, Mar 21, 2013 at 6:41 PM, Robin Schumacher ro...@datastax.comwrote:

 Harsh - thanks for getting back to me and pointing my request to the right
 place. My ID is RobinSchumacher.

 Regards,

 --Robin


 On Wed, Mar 20, 2013 at 8:33 PM, Harsh J ha...@cloudera.com wrote:

 Hi Robin,

 Moving this to common-dev as thats the right list to send this to. Can
 you pass us your Apache Hadoop Wiki user ID so we can add you in as a
 contributor and you can edit this in yourself (and make any other
 contributions as well)?

 On Wed, Mar 20, 2013 at 8:06 PM, Robin Schumacher ro...@datastax.com
 wrote:
  I apologize in advance if there is another address I should be sending
 this
  to, but could someone please add DataStax to the commercial support
 page on
  the wiki (http://wiki.apache.org/hadoop
  /Distributions%20and%20Commercial%20Support)?
 
  Below is the text we'd like used. Please let us know if you have any
  questions or need any of the text changed.
 
  Thanks in advance.
 
  Robin Schumacher
  VP Products, DataStax
 
 
  DataStax provides a distribution of Hadoop that is fully integrated
  with Apache Cassandra
  (https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra)
  and Apache Solr
  (https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr)
  in its DataStax Enterprise platform
  (https://datastax.com/what-we-offer/products-services/datastax-enterprise).
  DataStax Enterprise is completely free to use for development
  environments
  with no restrictions. In addition, DataStax supplies OpsCenter
  (https://datastax.com/what-we-offer/products-services/datastax-opscenter) for
  visual management and monitoring, along with expert support
  (https://datastax.com/what-we-offer/products-services/support), training
  (https://datastax.com/what-we-offer/products-services/training), and consulting
  services (https://datastax.com/what-we-offer/products-services/consulting)
  for Hadoop, Cassandra, and Solr.



 --
 Harsh J




 --
 http://www.datastax.com/events/cassandrasummit2013




-- 
Harsh J


Re: Help on Hadoop wiki page

2013-03-20 Thread Harsh J
Hi Robin,

Moving this to common-dev as that's the right list to send this to. Can
you pass us your Apache Hadoop Wiki user ID so we can add you in as a
contributor and you can edit this yourself (and make any other
contributions as well)?

On Wed, Mar 20, 2013 at 8:06 PM, Robin Schumacher ro...@datastax.com wrote:
 I apologize in advance if there is another address I should be sending this
 to, but could someone please add DataStax to the commercial support page on
 the wiki (http://wiki.apache.org/hadoop
 /Distributions%20and%20Commercial%20Support)?

 Below is the text we'd like used. Please let us know if you have any
 questions or need any of the text changed.

 Thanks in advance.

 Robin Schumacher
 VP Products, DataStax


 DataStax provides a distribution of Hadoop that is fully integrated with
 Apache Cassandra
 (https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-cassandra)
 and Apache Solr
 (https://datastax.com/what-we-offer/products-services/datastax-enterprise/apache-solr)
 in its DataStax Enterprise platform
 (https://datastax.com/what-we-offer/products-services/datastax-enterprise).
 DataStax Enterprise is completely free to use for development environments
 with no restrictions. In addition, DataStax supplies OpsCenter
 (https://datastax.com/what-we-offer/products-services/datastax-opscenter) for
 visual management and monitoring, along with expert support
 (https://datastax.com/what-we-offer/products-services/support), training
 (https://datastax.com/what-we-offer/products-services/training), and consulting
 services (https://datastax.com/what-we-offer/products-services/consulting)
 for Hadoop, Cassandra, and Solr.



-- 
Harsh J


[jira] [Resolved] (HADOOP-6942) Ability for having user's classes take precedence over the system classes for tasks' classpath

2013-03-19 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-6942.
-

Resolution: Duplicate

Fixed via MAPREDUCE-1938. Closing as dupe.

 Ability for having user's classes take precedence over the system classes for 
 tasks' classpath
 --

 Key: HADOOP-6942
 URL: https://issues.apache.org/jira/browse/HADOOP-6942
 Project: Hadoop Common
  Issue Type: Improvement
  Components: scripts
Affects Versions: 0.22.0
Reporter: Krishna Ramachandran
 Attachments: HADOOP-6942.y20.patch, hadoop-common-6942.patch


 Fix bin/hadoop script to facilitate mapred-1938

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: how to define new InputFormat with streaming?

2013-03-17 Thread Harsh J
The issue is that Streaming expects the old/stable MR API
(org.apache.hadoop.mapred.InputFormat) as its input format class, but your
WholeFileInputFormat is using the new MR API
(org.apache.hadoop.mapreduce.lib.input.InputFormat). Using the older form
will let you pass.

This has nothing to do with your version/distribution of Hadoop.


On Fri, Mar 15, 2013 at 4:28 PM, Steve Loughran ste...@hortonworks.comwrote:

 On 15 March 2013 09:18, springring springr...@126.com wrote:

   Hi,
 
   my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define new
  InputFormat in hadoop book , but there is error
  class org.apache.hadoop.streaming.WholeFileInputFormat not
  org.apache.hadoop.mapred.InputFormat
 
  Hadoop version is 0.20, but the streaming still depend on 0.10 mapred
 api?
 


 1. please don't spam all the lists
 2. grab a later version of the apache releases if you want help on them on
 these mailing lists, or go to the cloudera lists, where they will probably
 say upgrade to CDH 4.x before asking questions.

 thanks




-- 
Harsh J


Re: Re: how to define new InputFormat with streaming?

2013-03-17 Thread Harsh J
It isn't as easy as changing that import line:

 package org.apache.hadoop.mapred.lib.input does not exist

The right package is package org.apache.hadoop.mapred.
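
For reference, a bare-bones old-API skeleton (a sketch only; the class names mirror the book's example and the record reader is simplified for illustration):

{code}
import java.io.IOException;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileSplit;
import org.apache.hadoop.mapred.InputSplit;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.RecordReader;
import org.apache.hadoop.mapred.Reporter;

public class WholeFileInputFormat
    extends FileInputFormat<NullWritable, BytesWritable> {

  @Override
  protected boolean isSplitable(FileSystem fs, Path file) {
    return false; // one record per file, so never split
  }

  @Override
  public RecordReader<NullWritable, BytesWritable> getRecordReader(
      InputSplit split, JobConf job, Reporter reporter) throws IOException {
    return new WholeFileRecordReader((FileSplit) split, job);
  }

  static class WholeFileRecordReader
      implements RecordReader<NullWritable, BytesWritable> {
    private final FileSplit split;
    private final JobConf conf;
    private boolean processed = false;

    WholeFileRecordReader(FileSplit split, JobConf conf) {
      this.split = split;
      this.conf = conf;
    }

    public boolean next(NullWritable key, BytesWritable value) throws IOException {
      if (processed) {
        return false;
      }
      // Read the whole file as a single record value.
      byte[] contents = new byte[(int) split.getLength()];
      Path file = split.getPath();
      FileSystem fs = file.getFileSystem(conf);
      FSDataInputStream in = null;
      try {
        in = fs.open(file);
        IOUtils.readFully(in, contents, 0, contents.length);
        value.set(contents, 0, contents.length);
      } finally {
        IOUtils.closeStream(in);
      }
      processed = true;
      return true;
    }

    public NullWritable createKey() { return NullWritable.get(); }
    public BytesWritable createValue() { return new BytesWritable(); }
    public long getPos() { return processed ? split.getLength() : 0; }
    public void close() { }
    public float getProgress() { return processed ? 1.0f : 0.0f; }
  }
}
{code}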

On Mon, Mar 18, 2013 at 7:22 AM, springring springr...@126.com wrote:
 Thanks.
 I modified the Java file with the old mapred API, but there is still an error:

  javac -classpath 
 /usr/lib/hadoop/hadoop-core-0.20.2-cdh3u3.jar:/usr/lib/hadoop/lib/* -d class9 
 ./*.java
 ./WholeFileInputFormat.java:16: error: package 
 org.apache.hadoop.mapred.lib.input does not exist
 import org.apache.hadoop.mapred.lib.input.*;

 Is it because hadoop-0.20.2-cdh3u3 does not include the mapred API?






 At 2013-03-17 14:22:43,Harsh J ha...@cloudera.com wrote:
The issue is that Streaming expects the old/stable MR API
(org.apache.hadoop.mapred.InputFormat) as its input format class, but your
WholeFileInputFormat is using the new MR API
(org.apache.hadoop.mapreduce.lib.input.InputFormat). Using the older form
will let you pass.

This has nothing to do with your version/distribution of Hadoop.


On Fri, Mar 15, 2013 at 4:28 PM, Steve Loughran ste...@hortonworks.comwrote:

 On 15 March 2013 09:18, springring springr...@126.com wrote:

   Hi,
 
   my hadoop version is Hadoop 0.20.2-cdh3u3 and I want to define new
  InputFormat in hadoop book , but there is error
  class org.apache.hadoop.streaming.WholeFileInputFormat not
  org.apache.hadoop.mapred.InputFormat
 
  Hadoop version is 0.20, but the streaming still depend on 0.10 mapred
 api?
 


 1. please don't spam all the lists
 2. grab a later version of the apache releases if you want help on them on
 these mailing lists, or go to the cloudera lists, where they will probably
 say upgrade to CDH 4.x before asking questions.

 thanks




--
Harsh J



--
Harsh J


Re: [VOTE] Plan to create release candidate for 0.23.7

2013-03-16 Thread Harsh J
+1


On Sat, Mar 16, 2013 at 12:19 AM, Karthik Kambatla ka...@cloudera.comwrote:

 +1 (non-binding)

 On Fri, Mar 15, 2013 at 9:12 AM, Robert Evans ev...@yahoo-inc.com wrote:

  +1
 
  On 3/13/13 11:31 AM, Thomas Graves tgra...@yahoo-inc.com wrote:
 
  Hello all,
  
  I think enough critical bug fixes have went in to branch-0.23 that
  warrant another release. I plan on creating a 0.23.7 release by the end
  March.
  
  Please vote '+1' to approve this plan.  Voting will close on Wednesday
  3/20 at 10:00am PDT.
  
  Thanks,
  Tom Graves
  (release manager)
 
 




-- 
Harsh J


Re: [VOTE] Plan to create release candidate Monday 3/18

2013-03-16 Thread Harsh J
+1


On Fri, Mar 15, 2013 at 9:53 PM, Robert Evans ev...@yahoo-inc.com wrote:

 +1

 On 3/10/13 10:38 PM, Matt Foley ma...@apache.org wrote:

 Hi all,
 I have created branch-1.2 from branch-1, and propose to cut the first
 release candidate for 1.2.0 on Monday 3/18 (a week from tomorrow), or as
 soon thereafter as I can achieve a stable build.
 
 Between 1.1.2 and the current 1.2.0, there are 176 patches!!  Draft
 release
 notes are available at .../branch-1.2/src/docs/releasenotes.html in the
 sources.
 
 Any non-destabilizing patches committed to branch-1.2 during the coming
 week (and of course also committed to branch-1) will be included in the
 RC.
  However, at this point I request that any big new developments not yet in
 branch-1.2 be targeted for 1.3.
 
 Release plans have to be voted on too, so please vote '+1' to approve this
 plan.  Voting will close on Sunday 3/17 at 8:30pm PDT.
 
 Thanks,
 --Matt
 (release manager)




-- 
Harsh J


Re: Technical question on Capacity Scheduler.

2013-03-05 Thread Harsh J
The CS does support running jobs in parallel. Are you observing just
the UI, or are you also noticing FIFO behavior in the logs, where assignments
can be seen with timestamps?

On Wed, Mar 6, 2013 at 9:03 AM, Jagmohan Chauhan
simplefundumn...@gmail.com wrote:
 Hi All

 Can someone please reply to my queries?

 On Sun, Mar 3, 2013 at 5:47 PM, Jagmohan Chauhan simplefundumn...@gmail.com
 wrote:

 Thanks Harsh.

 I have a few more questions.

 Q1: I found in my experiments using the CS that, for any user, the next job
 does not start until the current one is finished. Is this true, are there
 any exceptions, and if true, why is it so? I did not find any such
 condition in the implementation of the CS.

 Q2: The concept of reserved slots applies only if speculative execution
 is on. Am I correct? If yes, then the code dealing with reserved slots won't
 be executed if speculative execution is off?

 PS: I am working on MRv1.


 On Sun, Mar 3, 2013 at 2:41 AM, Harsh J ha...@cloudera.com wrote:

 On Sun, Mar 3, 2013 at 1:41 PM, Jagmohan Chauhan 
 simplefundumn...@gmail.com
  wrote:

   Hi
 
  I am going through the Capacity Scheduler implementation. There is one
  thing i did not understand clearly.
 

 Are you reading the YARN CapacityScheduler or the older, MRv1 one? I'd
 suggest reading the newer one for any implementation or research goals,
 for
 it to be more current and future-applicable.


  1. Does the o ff-switch task refers to a task in which data has to be
  fetched over the network. It means its not node-local ?
 

 Off-switch would imply off-rack, i.e. not node local, nor rack-local.


  2. Does off-switch task  includes only the tasks for which map input
 has to
  be fetched from a node on a different rack across the switch or it also
  includes task where data has to be fetched from another node on same
 rack
  on same switch?
 

 A task's input split is generally supposed to define all locations of
 available inputs. If the CS is unable to schedule to any of those
 locations, nor their racks, then it schedules an off-rack (see above) task
 which has to pull the input from a different rack.


 
  --
  Thanks and Regards
  Jagmohan Chauhan
  MSc student,CS
  Univ. of Saskatchewan
  IEEE Graduate Student Member
 
  http://homepage.usask.ca/~jac735/
 

 Feel free to post any further impl. related questions! :)

 --
 Harsh J




 --
 Thanks and Regards
 Jagmohan Chauhan
 MSc student,CS
 Univ. of Saskatchewan
 IEEE Graduate Student Member

 http://homepage.usask.ca/~jac735/




 --
 Thanks and Regards
 Jagmohan Chauhan
 MSc student,CS
 Univ. of Saskatchewan
 IEEE Graduate Student Member

 http://homepage.usask.ca/~jac735/



--
Harsh J


Re: [Vote] Merge branch-trunk-win to trunk

2013-03-04 Thread Harsh J
Thanks Suresh. Regarding where; we can state it on
http://wiki.apache.org/hadoop/HowToContribute in the test-patch
section perhaps.

+1 on the merge.

On Mon, Mar 4, 2013 at 11:39 PM, Suresh Srinivas sur...@hortonworks.com wrote:
 On Sun, Mar 3, 2013 at 8:50 PM, Harsh J ha...@cloudera.com wrote:

 Have we agreed (and stated it somewhere proper) that a -1 obtained for
 a Windows CI build for a test-patch will not block the ongoing work
 (unless it is Windows specific) and patches may still be committed to
 trunk despite that?


 This thread is long and possibly hard to follow. Yes, I and several others
 have
 stated that for now it is okay to commit even if Windows precommit build
 posts -1.


 I'm +1 if someone can assert and add the above into the formal
 guidelines. I'd still prefer that Windows does its releases separately
 as that ensures more quality for its audience and better testing
 periods (and wouldn't block anything), but we can come to that iff we
 are unable to maintain the currently proposed model.


 Which do you think is the right place to add this?

 At this time we are voting for merging into trunk. I prefer having a single
 release
 that supports both Linux and Windows. Based on working on Windows support,
 I think this is doable and should not hold up releases for Linux.



--
Harsh J


Re: Technical question on Capacity Scheduler.

2013-03-03 Thread Harsh J
On Sun, Mar 3, 2013 at 1:41 PM, Jagmohan Chauhan simplefundumn...@gmail.com
 wrote:

  Hi

 I am going through the Capacity Scheduler implementation. There is one
 thing I did not understand clearly.


Are you reading the YARN CapacityScheduler or the older, MRv1 one? I'd
suggest reading the newer one for any implementation or research goals,
as it is more current and future-applicable.


 1. Does the off-switch task refer to a task in which data has to be
 fetched over the network? Does that mean it's not node-local?


Off-switch would imply off-rack, i.e. neither node-local nor rack-local.


 2. Does an off-switch task include only the tasks for which map input has to
 be fetched from a node on a different rack across the switch, or does it also
 include tasks where data has to be fetched from another node on the same rack,
 on the same switch?


A task's input split is generally supposed to define all locations of
available inputs. If the CS is unable to schedule to any of those
locations or their racks, then it schedules an off-rack (see above) task
which has to pull the input from a different rack.
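
To make the node-local / rack-local / off-switch terminology concrete, here is
a minimal illustrative sketch (not the actual CS code; all names are made up)
of how a scheduler can classify an assignment from the hosts and racks named
by a task's input split:

import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class LocalityExample {
  enum Locality { NODE_LOCAL, RACK_LOCAL, OFF_SWITCH }

  static Locality classify(String host, String rack,
                           Set<String> splitHosts, Set<String> splitRacks) {
    if (splitHosts.contains(host)) return Locality.NODE_LOCAL;  // data on this node
    if (splitRacks.contains(rack)) return Locality.RACK_LOCAL;  // data elsewhere on this rack
    return Locality.OFF_SWITCH;                                 // data on another rack
  }

  public static void main(String[] args) {
    Set<String> hosts = new HashSet<String>(Arrays.asList("node1", "node2"));
    Set<String> racks = new HashSet<String>(Arrays.asList("/rack1"));
    System.out.println(classify("node3", "/rack2", hosts, racks)); // OFF_SWITCH
  }
}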



 --
 Thanks and Regards
 Jagmohan Chauhan
 MSc student,CS
 Univ. of Saskatchewan
 IEEE Graduate Student Member

 http://homepage.usask.ca/~jac735/


Feel free to post any further impl. related questions! :)

-- 
Harsh J


Re: [Vote] Merge branch-trunk-win to trunk

2013-03-03 Thread Harsh J
Have we agreed (and stated it somewhere proper) that a -1 obtained for
a Windows CI build for a test-patch will not block the ongoing work
(unless it is Windows specific) and patches may still be committed to
trunk despite that?

I'm +1 if someone can assert and add the above into the formal
guidelines. I'd still prefer that Windows does its releases separately
as that ensures more quality for its audience and better testing
periods (and wouldn't block anything), but we can come to that iff we
are unable to maintain the currently proposed model.

On Mon, Mar 4, 2013 at 7:39 AM, Tsuyoshi OZAWA ozawa.tsuyo...@gmail.com wrote:
 +1 (non-binding),

 Windows support is attractive for lots of users.
 From the point of view of a Hadoop developer, Matt said that CI supports
 cross-platform testing, and it's quite a reasonable condition to merge.

 Thanks,
 Tsuyoshi



--
Harsh J


[jira] [Created] (HADOOP-9346) Upgrading to protoc 2.5.0 fails the build

2013-02-28 Thread Harsh J (JIRA)
Harsh J created HADOOP-9346:
---

 Summary: Upgrading to protoc 2.5.0 fails the build
 Key: HADOOP-9346
 URL: https://issues.apache.org/jira/browse/HADOOP-9346
 Project: Hadoop Common
  Issue Type: Task
Reporter: Harsh J
Priority: Minor


Reported over the Impala lists; one of the errors received is:

{code}
src/hadoop-common-project/hadoop-common/target/generated-sources/java/org/apache/hadoop/ha/proto/ZKFCProtocolProtos.java:[104,37]
 can not find symbol.
symbol: class Parser
location: package com.google.protobuf
{code}

Worth looking into, as we'll eventually bump our protobuf deps.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: [Vote] Merge branch-trunk-win to trunk

2013-02-27 Thread Harsh J
 since, the Hadoop team in Microsoft have
 made significant progress in the following areas:
 (PS: Some of these items are already included in Suresh's email, but
 including again for completeness)

 - Command-line scripts for the Hadoop surface area
 - Mapping the HDFS permissions model to Windows
 - Abstracted and reconciled mismatches around differences in Path
 semantics in Java and Windows
 - Native Task Controller for Windows
 - Implementation of a Block Placement Policy to support cloud
 environments, more specifically Azure.
 - Implementation of Hadoop native libraries for Windows (compression
 codecs, native I/O)
 - Several reliability issues, including race conditions, intermittent test
 failures, and resource leaks.
 - Several new unit test cases written for the above changes

 In the process, we have closely engaged with the Apache open source
 community and have got great support and assistance from the community
in
 terms of contributing fixes, code review comments and commits.

 In addition, the Hadoop team at Microsoft has also made good progress in
 other projects including Hive, Pig, Sqoop, Oozie, HCat and HBase. Many
of
 these changes have already been committed to the respective trunks with
 help from various committers and contributors. It is great to see the
 commitment of the community to support multiple platforms, and we look
 forward to the day when a developer/customer is able to successfully
deploy
 a complete solution stack based on Apache Hadoop releases.

 Next Steps:

 All of the above changes are part of the Windows Azure HDInsight and
 HDInsight Server products from Microsoft. We have successfully
on-boarded
 several internal customers and have been running production workloads on
 Windows Azure HDInsight. Our vision is to create a big data platform
based
 on Hadoop, and we are committed to helping make Hadoop a world-class
 solution that anyone can use to solve their biggest data challenges.

 As an immediate next step, we would like to have a discussion around how
 we can ensure that the quality of the mainline Hadoop branches on
Windows
 is maintained. To this end, we would like to get to the state where we
have
 pre-checkin validation gates and nightly test runs enabled on Windows.
If
 you have any suggestions around this, please do send an email.  We are
 committed to helping sustain the long-term quality of Hadoop on both
Linux
 and Windows.

 We sincerely thank the community for their contribution and support so
 far. And hope to continue having a close engagement in the future.

 -Microsoft HDInsight Team


 -Original Message-
 From: Suresh Srinivas [mailto:sur...@hortonworks.com]
 Sent: Thursday, February 7, 2013 5:42 PM
 To: common-dev@hadoop.apache.org; yarn-...@hadoop.apache.org;
 hdfs-...@hadoop.apache.org; mapreduce-...@hadoop.apache.org
 Subject: Heads up - merge branch-trunk-win to trunk

 The support for Hadoop on Windows was proposed in HADOOP-8079
 https://issues.apache.org/jira/browse/HADOOP-8079 almost a year ago.
The
 goal was to make Hadoop natively integrated, full-featured, and
performance
 and scalability tuned on Windows Server or Windows Azure.
 We are happy to announce that a lot of progress has been made in this
 regard.

 Initial work started in a feature branch, branch-1-win, based on
branch-1.
 The details related to the work done in the branch can be seen in
 CHANGES.txt

http://svn.apache.org/viewvc/hadoop/common/branches/branch-1-win/CHANGES.
branch-1-win.txt?view=markup
 .
 This work has been ported to a branch, branch-trunk-win, based on trunk.
 Merge patch for this is available on
 HADOOP-8562https://issues.apache.org/jira/browse/HADOOP-8562
 .

 Highlights of the work done so far:
 1. Necessary changes in Hadoop to run natively on Windows. These changes
 handle differences in platforms related to path names, process/task
 management etc.
 2. Addition of winutils tools for managing file permissions and
ownership,
 user group mapping, hardlinks, symbolic links, chmod, disk utilization,
and
 process/task management.
 3. Added cmd scripts equivalent to existing shell scripts
 hadoop-daemon.sh, start and stop scripts.
 4. Addition of block placement policy implementation to support cloud
 environments, more specifically Azure.

 We are very close to wrapping up the work in branch-trunk-win and
getting
 ready for a merge. Currently the merge patch is passing close to 100% of
 unit tests on Linux. Soon I will call for a vote to merge this branch
into
 trunk.

 Next steps:
 1. Call for vote to merge branch-trunk-win to trunk, when the work
 completes and precommit build is clean.
 2. Start a discussion on adding Jenkins precommit builds on windows and
 how to integrate that with the existing commit process.

 Let me know if you have any questions.

 Regards,
 Suresh




--
http://hortonworks.com/download/




--
Harsh J


Re: APIs to move data blocks within HDFS

2013-02-22 Thread Harsh J
There are no filesystem (i.e. client) level APIs to do this, but the
Balancer tool of HDFS does exactly this. Reading its sources should
let you understand what kinda calls you need to make to reuse the
balancer protocol and achieve what you need.

In trunk, the balancer is at
hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/balancer/Balancer.java

HTH, and feel free to ask any relevant follow up questions.
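
If it helps as a starting point, the public client API will at least show you
where each block of a file currently lives (a minimal sketch against the 1.0.x
FileSystem API; the path argument is just an example). Actually moving a
replica is done through the datanode-level block transfer operations that
Balancer.java drives, not through FileSystem:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.BlockLocation;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import java.util.Arrays;

public class ShowBlockLocations {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    FileStatus status = fs.getFileStatus(new Path(args[0]));
    // One BlockLocation per block range, listing the datanodes holding replicas.
    for (BlockLocation loc : fs.getFileBlockLocations(status, 0, status.getLen())) {
      System.out.println(loc.getOffset() + "+" + loc.getLength()
          + " -> " + Arrays.toString(loc.getHosts()));
    }
    fs.close();
  }
}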

On Fri, Feb 22, 2013 at 11:43 PM, Karthiek C karthi...@gmail.com wrote:
 Hi,

 Are there any APIs to move data blocks in HDFS from one node to another
 *after* they have been added to HDFS? Also, can we write some sort of
 pluggable module (like a scheduler) that controls how data gets placed in
 the Hadoop cluster? I am working with the hadoop-1.0.3 version and I couldn't
 find any filesystem APIs available to do that.

 PS: I am working on a research project where we want to investigate how to
 optimally place data in hadoop.

 Thanks,
 Karthiek



--
Harsh J


[jira] [Created] (HADOOP-9322) LdapGroupsMapping doesn't seem to set a timeout for its directory search

2013-02-21 Thread Harsh J (JIRA)
Harsh J created HADOOP-9322:
---

 Summary: LdapGroupsMapping doesn't seem to set a timeout for its 
directory search
 Key: HADOOP-9322
 URL: https://issues.apache.org/jira/browse/HADOOP-9322
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Priority: Minor


We don't appear to be setting a timeout via 
http://docs.oracle.com/javase/6/docs/api/javax/naming/directory/SearchControls.html#setTimeLimit(int)
 before we search with 
http://docs.oracle.com/javase/6/docs/api/javax/naming/directory/DirContext.html#search(javax.naming.Name,%20java.lang.String,%20javax.naming.directory.SearchControls).

This may occasionally lead to some unwanted NN pauses due to lock-holding on 
the operations that do group lookups. Defining a timeout is better than relying 
on 0 (infinite wait).
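
A minimal sketch of the intended change (illustrative only; the exact config key 
name and default for the timeout would be decided in the patch):

{code}
import javax.naming.directory.SearchControls;

SearchControls controls = new SearchControls();
controls.setSearchScope(SearchControls.SUBTREE_SCOPE);
// Bound the directory search; today's behaviour is equivalent to 0,
// i.e. wait indefinitely. 10 seconds here is purely an illustrative value.
controls.setTimeLimit(10 * 1000);
{code}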

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Compile and deploy source code for Hadoop 1.0.4

2013-02-09 Thread Harsh J
Hi Trupti,

Welcome! My responses inline.

On Sat, Feb 9, 2013 at 7:59 PM, Trupti Gaikwad trups.gaik...@gmail.com wrote:
 Hi,

 I want to work on release 1.0.4 source code. As per Hadoop
 wiki HowToContribute, I can download source code from trunk or from release
 1.0.4 tag.

Although I do not know your goal here, note that the trunk is the best
place to do dev work if your goal is also to get your work accepted at
the end. We allow 1.x to continue receiving improvements but refuse
divergence in features compared to trunk and and the ongoing branch-2
releases. Just something to consider!

 1. Source code from hadoop/common/trunk with revision 1397701 corresponding
 to release 1.0.4:
 I downloaded the source with svn revision 1397701 mentioned in release tag.
 My source code compiles, however the tar file created by the build does not
 contain the start-mapred.sh file. It does contain start-yarn.sh. Even if the
 source revision is old, why am I not getting start-mapred.sh? I really don't
 want to use the ResourceManager or NodeManager to run my MapReduce job. How
 can I start the JobTracker and TaskTracker?

Unfortunately SVN revisions aren't exactly what you think they are.
What you need is to actually check out a tag, not a revision. To get a
1.0.4 tag checked out from the Apache SVN repository, your command
could be:

$ svn checkout http://svn.apache.org/repos/asf/hadoop/common/tags/release-1.0.4/
hadoop-1.0.4
$ cd hadoop-1.0.4/

Likewise, if you want to work on the tip of the 1.x branch instead,
check out the branch branch-1:

$ svn checkout http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1/
hadoop-1
$ cd hadoop-1/

 2. Source code from tag release 1.0.4:
 Hadoop wiki also mentions that, If I want to work against any specific
 release then I will have to download release tag.
 I copied my code to src and tried to build it. However, my code is not
 compiling because I developed it in the above hadoop-common project. I am
 getting compilation errors because there are inconsistencies in the
 org.apache.hadoop.fs.FileSystem interface. Shall I develop my class by
 implementing the interfaces provided in release 1.0.4?

You're attempting to build trunk (accidentally, in your case). See
above for getting proper 1.x code.

However, if you still wish to build trunk, whose build system is
different from the older 1.x system, some simple notes for building
trunk can be found here:
http://wiki.apache.org/hadoop/QwertyManiac/BuildingHadoopTrunk

 So
 1. How to get all projects from hadoop-common?
 2. What is the correct way to compile and deploy any changes in core for
 release 1.0.4?

I believe I've answered both questions in the above inlines. Do feel
free to post any further questions you have!

--
Harsh J


Re: pre-historic record IO stuff, is this used anywhere?

2013-02-08 Thread Harsh J
Hadoop streaming is also tied to recordio as it is today:
https://issues.apache.org/jira/browse/MAPREDUCE-3303, but it can be
removed per Klaas.

On Sat, Feb 9, 2013 at 6:48 AM, Alejandro Abdelnur t...@cloudera.com wrote:
 This seems to be used only in tests in common and in a standalone class in
 streaming tests.

 What is the purpose of these classes, as they don't seem to be used in
 any of the source that ends up in Hadoop?

 hadoop-common-project/hadoop-common/src/test/ddl/buffer.jr
 hadoop-common-project/hadoop-common/src/test/ddl/int.jr
 hadoop-common-project/hadoop-common/src/test/ddl/string.jr
 hadoop-common-project/hadoop-common/src/test/ddl/test.jr
 hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/FromCpp.java
 hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/RecordBench.java
 hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/TestBuffer.java
 hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/TestRecordIO.java
 hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/TestRecordVersioning.java
 hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/record/ToCpp.java
 hadoop-tools/hadoop-streaming/src/test/java/org/apache/hadoop/typedbytes/TestIO.java


 I've deleted the above classes, cleaned up the common POM (not to compile
 the JR files) and everything compiles fine.

 To me all this is dead code, if so, can we nuke them?

 Thx

 --
 Alejandro



--
Harsh J


[jira] [Reopened] (HADOOP-9241) DU refresh interval is not configurable

2013-01-29 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J reopened HADOOP-9241:
-


Thanks Nicholas; I have reverted HADOOP-9241 from trunk and branch-2. I will 
attach a proper patch now.

 DU refresh interval is not configurable
 ---

 Key: HADOOP-9241
 URL: https://issues.apache.org/jira/browse/HADOOP-9241
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 2.0.3-alpha

 Attachments: HADOOP-9241.patch


 While the {{DF}} class's refresh interval is configurable, the {{DU}}'s 
 isn't. We should ensure both be configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9257) HADOOP-9241 changed DN's default DU interval to 1m instead of 10m accidentally

2013-01-28 Thread Harsh J (JIRA)
Harsh J created HADOOP-9257:
---

 Summary: HADOOP-9241 changed DN's default DU interval to 1m 
instead of 10m accidentally
 Key: HADOOP-9257
 URL: https://issues.apache.org/jira/browse/HADOOP-9257
 Project: Hadoop Common
  Issue Type: Bug
  Components: util
Affects Versions: 2.0.3-alpha
Reporter: Harsh J
Assignee: Harsh J


Suresh caught this on HADOOP-9241:

{quote}
Even for trivial jiras, I suggest getting the code review done before 
committing the code. Such changes are easy and quick to review.
In this patch, did DU interval become 1 minute instead of 10 minutes?
{code}
-this(path, 600000L);
-//10 minutes default refresh interval
+this(path, conf.getLong(CommonConfigurationKeys.FS_DU_INTERVAL_KEY,
+CommonConfigurationKeys.FS_DU_INTERVAL_DEFAULT));


+  /** See <a href="{@docRoot}/../core-default.html">core-default.xml</a> */
+  public static final String  FS_DU_INTERVAL_KEY = "fs.du.interval";
+  /** Default value for FS_DU_INTERVAL_KEY */
+  public static final long    FS_DU_INTERVAL_DEFAULT = 60000;
{code}
{quote}
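
For reference, once the default is corrected, a deployment that wants a different 
refresh cadence can still override it through the key introduced above (minimal 
sketch; interval is in milliseconds, 10 minutes shown, directory path illustrative):

{code}
Configuration conf = new Configuration();
// fs.du.interval is in milliseconds; this restores the historical 10 minute refresh.
conf.setLong(CommonConfigurationKeys.FS_DU_INTERVAL_KEY, 10 * 60 * 1000L);
DU du = new DU(new File("/data/01/dfs/dn"), conf);
{code}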

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9241) DU refresh interval is not configurable

2013-01-24 Thread Harsh J (JIRA)
Harsh J created HADOOP-9241:
---

 Summary: DU refresh interval is not configurable
 Key: HADOOP-9241
 URL: https://issues.apache.org/jira/browse/HADOOP-9241
 Project: Hadoop Common
  Issue Type: Improvement
Affects Versions: 2.0.2-alpha
Reporter: Harsh J
Priority: Trivial


While the {{DF}} class's refresh interval is configurable, the {{DU}}'s isn't. 
We should ensure both be configurable.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9243) Some improvements to the mailing lists webpage for lowering unrelated content rate

2013-01-24 Thread Harsh J (JIRA)
Harsh J created HADOOP-9243:
---

 Summary: Some improvements to the mailing lists webpage for 
lowering unrelated content rate
 Key: HADOOP-9243
 URL: https://issues.apache.org/jira/browse/HADOOP-9243
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Harsh J
Priority: Minor


From Steve on HADOOP-9329:

{quote}
* could you add a bit of text to say user@ is not the place to discuss 
installation problems related to any third party products that install some 
variant of Hadoop on people's desktops and servers. You're the one who ends up 
having to bounce off all the CDH-related queries -it would help you too.
* For the new Invalid JIRA link to paste into JIRA issues about this, I point 
to the distributions and Commercial support page on the wiki -something similar 
on the mailing lists page would avoid having to put any specific vendor links 
into the mailing lists page, and support a higher/more open update process. See 
http://wiki.apache.org/hadoop/InvalidJiraIssues
{quote}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9239) Move the general@ description to the end of lists in the mailing lists web page

2013-01-23 Thread Harsh J (JIRA)
Harsh J created HADOOP-9239:
---

 Summary: Move the general@ description to the end of lists in the 
mailing lists web page
 Key: HADOOP-9239
 URL: https://issues.apache.org/jira/browse/HADOOP-9239
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Harsh J
Priority: Minor


We have users unnecessarily subscribing to and abusing the general@ list, mainly 
because of its presence as the first option on the page 
http://hadoop.apache.org/mailing_lists.html, and secondarily because of its name.

This is to at least address the first one, which is causing growing pain to its 
subscribers. Let's move it to the bottom of the presented list of lists.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Hadoop datajoin package

2013-01-15 Thread Harsh J
Ah, my bad. The two appear to be different things. I am not aware of any
work being done for that datajoin tool package; also not sure if it's really
used out there.


On Tue, Jan 15, 2013 at 4:51 PM, Hemanth Yamijala yhema...@gmail.comwrote:

 Thanks, Harsh.

 Where does this fit in then ?


 http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-tools/hadoop-datajoin/src/main/java/org/apache/hadoop/contrib/utils/join/

 Is it to be deprecated and removed ?

 Thanks
 Hemanth


 On Mon, Jan 14, 2013 at 8:08 PM, Harsh J ha...@cloudera.com wrote:

  Already done and available in trunk and 2.x releases today:
 
 
 http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapreduce/lib/join/
 
 
  On Mon, Jan 14, 2013 at 7:44 PM, Hemanth Yamijala yhema...@gmail.com
  wrote:
 
   On the user list, there was a question about the Hadoop datajoin
 package.
   Specifically, its dependency on the old API.
  
   Is this package still in use ? Should we file a JIRA to migrate it to
 the
   new API ?
  
   Thanks
   hemanth
  
 
 
 
  --
  Harsh J
 




-- 
Harsh J


[jira] [Resolved] (HADOOP-8274) In pseudo or cluster model under Cygwin, tasktracker can not create a new job because of symlink problem.

2013-01-14 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8274.
-

Resolution: Won't Fix

For Windows, since the mainstream branch does not support it actively, I am 
closing this as a Won't Fix.

I'm certain the same issue does not happen on the branch-1-win 1.x branch (or 
the branch-trunk-win branch), and I urge you to use that instead if you wish to 
continue using Windows for development or other usage. Find the 
Windows-optimized sources at 
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-1-win/ or 
http://svn.apache.org/repos/asf/hadoop/common/branches/branch-trunk-win/.

 In pseudo or cluster model under Cygwin, tasktracker can not create a new job 
 because of symlink problem.
 -

 Key: HADOOP-8274
 URL: https://issues.apache.org/jira/browse/HADOOP-8274
 Project: Hadoop Common
  Issue Type: Bug
Affects Versions: 0.20.205.0, 1.0.0, 1.0.1, 0.22.0
 Environment: windows7+cygwin 1.7.11-1+jdk1.6.0_31+hadoop 1.0.0
Reporter: tim.wu

 The standalone mode is OK. But in pseudo-distributed or cluster mode, it always
 throws errors, even when I just run the wordcount example.
 HDFS works fine, but the tasktracker cannot create threads (JVMs) for a new job.
 The directory under /logs/userlogs/job-/attempt-/ is empty.
 The reason looks to be that on Windows, Java cannot recognize a symlink to a
 folder as a folder.
 The detail description is as following,
 ==
 First, the error log of tasktracker is like:
 ==
 12/03/28 14:35:13 INFO mapred.JvmManager: In JvmRunner constructed JVM ID: 
 jvm_201203280212_0005_m_-1386636958
 12/03/28 14:35:13 INFO mapred.JvmManager: JVM Runner 
 jvm_201203280212_0005_m_-1386636958 spawned.
 12/03/28 14:35:17 INFO mapred.JvmManager: JVM Not killed 
 jvm_201203280212_0005_m_-1386636958 but just removed
 12/03/28 14:35:17 INFO mapred.JvmManager: JVM : 
 jvm_201203280212_0005_m_-1386636958 exited with exit code -1. Number of tasks 
 it ran: 0
 12/03/28 14:35:17 WARN mapred.TaskRunner: 
 attempt_201203280212_0005_m_02_0 : Child Error
 java.io.IOException: Task process exit with nonzero status of -1.
 at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
 12/03/28 14:35:21 INFO mapred.TaskTracker: addFreeSlot : current free slots : 
 2
 12/03/28 14:35:24 INFO mapred.TaskTracker: LaunchTaskAction (registerTask): 
 attempt_201203280212_0005_m_02_1 task's state:UNASSIGNED
 12/03/28 14:35:24 INFO mapred.TaskTracker: Trying to launch : 
 attempt_201203280212_0005_m_02_1 which needs 1 slots
 12/03/28 14:35:24 INFO mapred.TaskTracker: In TaskLauncher, current free 
 slots : 2 and trying to launch attempt_201203280212_0005_m_02_1 which 
 needs 1 slots
 12/03/28 14:35:24 WARN mapred.TaskLog: Failed to retrieve stdout log for 
 task: attempt_201203280212_0005_m_02_0
 java.io.FileNotFoundException: 
 D:\cygwin\home\timwu\hadoop-1.0.0\logs\userlogs\job_201203280212_0005\attempt_201203280212_0005_m_02_0\log.index
  (The system cannot find the path specified)
 at java.io.FileInputStream.open(Native Method)
 at java.io.FileInputStream.init(FileInputStream.java:120)
 at 
 org.apache.hadoop.io.SecureIOUtils.openForRead(SecureIOUtils.java:102)
 at 
 org.apache.hadoop.mapred.TaskLog.getAllLogsFileDetails(TaskLog.java:188)
 at org.apache.hadoop.mapred.TaskLog$Reader.init(TaskLog.java:423)
 at 
 org.apache.hadoop.mapred.TaskLogServlet.printTaskLog(TaskLogServlet.java:81)
 at 
 org.apache.hadoop.mapred.TaskLogServlet.doGet(TaskLogServlet.java:296)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
 at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
 at 
 org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
 at 
 org.apache.hadoop.http.HttpServer$QuotingInputFilter.doFilter(HttpServer.java:835)
 at 
 org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
 at 
 org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
 at 
 org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
 at 
 org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
 at 
 org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
 at 
 org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
 at 
 org.mortbay.jetty.handler.ContextHandlerCollection.handle

[jira] [Resolved] (HADOOP-7386) Support concatenated bzip2 files

2012-12-10 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-7386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-7386.
-

Resolution: Duplicate

Thanks for confirming! Resolving as dupe.

 Support concatenated bzip2 files
 

 Key: HADOOP-7386
 URL: https://issues.apache.org/jira/browse/HADOOP-7386
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Allen Wittenauer
Assignee: Karthik Kambatla

 HADOOP-6835 added the framework and direct support for concatenated gzip 
 files.  We should do the same for bzip files.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: SPEC files?

2012-12-04 Thread Harsh J
The right branch is branch-0.3 for Bigtop. You can get more
information upstream at Apache Bigtop itself
(http://bigtop.apache.org).

Branch 0.3 of the same URL Steve posted:
https://github.com/apache/bigtop/tree/branch-0.3/bigtop-packages/src/rpm/hadoop

On Tue, Dec 4, 2012 at 11:17 PM, Steve Loughran ste...@hortonworks.com wrote:
 The RPMs are being built with bigtop;

 grab it from here
 https://github.com/apache/bigtop/tree/master/bigtop-packages/src/rpm/hadoop

 I'm not sure which branch to use for hadoop-1.1.1; let me check that

 On 4 December 2012 17:24, Michael Johnson m...@michaelpjohnson.com wrote:

 Hello All,

 I've browsed the common-dev list (the last six months of it anyway) and
 haven't seen this request. So here it goes: Does anyone have an SRPM/SPEC
 file for building the Hadoop 1.1.1 binaries? I found an old 0.20.0 SPEC on
 the internet, and before I attempt to create one I thought I'd ask here.
 Any help would be greatly appreciated.

 Sincerely,
 Michael Johnson
 m...@michaelpjohnson.com




-- 
Harsh J


Re: Do we support contatenated/splittable bzip2 files in branch-1?

2012-12-03 Thread Harsh J
Thanks Yu, I would appreciate it if you can post your observations over
https://issues.apache.org/jira/browse/HADOOP-7386.

On Mon, Dec 3, 2012 at 9:22 PM, Yu Li car...@gmail.com wrote:
 Hi Harsh,

 Thanks a lot for the information!

 My fault for not looking into HADOOP-4012 carefully; I will try and verify
 whether HADOOP-7823 has resolved the issue on both the write and read side, and
 report back.

 On 3 December 2012 19:42, Harsh J ha...@cloudera.com wrote:

 Hi Yu Li,

 The JIRA HADOOP-7823 backported support for splitting Bzip2 files plus
 MR support for it, into branch-1, and it is already available in the
 1.1.x releases out currently.

 Concatenated Bzip2 files, i.e., HADOOP-7386, is not implemented yet
 (AFAIK), but Chris over HADOOP-6335 suggests that HADOOP-4012 may have
 fixed it - so can you try and report back?

 On Mon, Dec 3, 2012 at 3:19 PM, Yu Li car...@gmail.com wrote:
  Dear all,
 
  About splitting support for bzip2, I checked on the JIRA list and found
  HADOOP-7386 marked as Won't fix; I also found some work done in
  branch-0.21(also in trunk), say HADOOP-4012 and MAPREDUCE-830, but not
  integrated/migrated into branch-1, so I guess we don't support
 concatenated
  bzip2 in branch-1, correct? If so, is there any special reason? Many
 thanks!
 
  --
  Best Regards,
  Li Yu



 --
 Harsh J




 --
 Best Regards,
 Li Yu



-- 
Harsh J


Re: Hadoop in ubuntu 12.04

2012-12-02 Thread Harsh J
Hi,

Please use u...@hadoop.apache.org for usage oriented questions. The
common-dev list is for developers and contributors working on the
Hadoop project.

It is certainly possible to run Hadoop for development purposes on
almost any Linux distribution out there. You can follow Michael Noll's
single node installation guide on your Ubuntu, viewable at
http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/

On Sun, Dec 2, 2012 at 7:31 PM, Twaha Daudi udde...@gmail.com wrote:
 Dear all,
 I would like to test Hadoop on an Ubuntu 12.04 laptop with a 500GB HDD and
 4GB of RAM. Is it possible to run it on a laptop?
 Sorry for the silly question.
 thank you
 cheers
 huu



-- 
Harsh J


[jira] [Resolved] (HADOOP-8301) Common (hadoop-tools) side of MAPREDUCE-4172

2012-12-01 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-8301?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-8301.
-

Resolution: Won't Fix

Patches were too broad and have gone stale. Will address these kinds of issues 
over separate, smaller, and more narrowly scoped JIRAs in the future.

Closing out parent JIRA MAPREDUCE-4172, and hence closing out this.

 Common (hadoop-tools) side of MAPREDUCE-4172
 

 Key: HADOOP-8301
 URL: https://issues.apache.org/jira/browse/HADOOP-8301
 Project: Hadoop Common
  Issue Type: Task
  Components: build
Affects Versions: 3.0.0
Reporter: Harsh J
Assignee: Harsh J

 Patches on MAPREDUCE-4172 (for MR-relevant projects) that need to run off 
 of the Hadoop Common project for Hadoop QA.
 One sub-task per hadoop-tools submodule will be added here for reviews.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: Mailing list admin?

2012-11-28 Thread Harsh J
Is there no one amongst us who has list administrative rights? There are
quite a few problems affecting all users of the lists that need to be
addressed. Can I be granted rights (or at least be told how to get
them), to do the moderation/administration work myself?

On Wed, Oct 24, 2012 at 10:24 AM, Harsh J ha...@cloudera.com wrote:
 Ping?

 On Thu, Oct 18, 2012 at 5:25 PM, Harsh J ha...@cloudera.com wrote:
 Hey project devs,

 Can someone let me know who the MLs admin is? INFRA suggested that
 instead of going to them, I could reach out to the admin group local
 to the project itself (I didn't know we had admins locally).

 P.s. Happy to volunteer to administrate as well.

 Please ping me directly,
 Thanks,
 Harsh J



 --
 Harsh J



-- 
Harsh J


Re: Mailing list admin?

2012-11-28 Thread Harsh J
Thanks Doug, I'll do this. Sorry I didn't investigate enough to find
out about the owner lists!

On Wed, Nov 28, 2012 at 10:30 PM, Doug Cutting cutt...@apache.org wrote:
 The moderators of this list can be contacted at common-dev-owner at
 hadoop.apache.org.

 If you'd like to become a moderator, please send a message to apmail
 at apache.org asking to become a moderator.  CC private at
 hadoop.apache.org to keep the PMC in the loop.

 http://www.apache.org/dev/committers.html#mailing-list-moderators

 Doug

 On Wed, Nov 28, 2012 at 4:23 AM, Harsh J ha...@cloudera.com wrote:
 Is there no one amongst us who has list administrative rights? There are
 quite a few problems affecting all users of the lists that need to be
 addressed. Can I be granted rights (or at least be told how to get
 them), to do the moderation/administration work myself?

 On Wed, Oct 24, 2012 at 10:24 AM, Harsh J ha...@cloudera.com wrote:
 Ping?

 On Thu, Oct 18, 2012 at 5:25 PM, Harsh J ha...@cloudera.com wrote:
 Hey project devs,

 Can someone let me know who the MLs admin is? INFRA suggested that
 instead of going to them, I could reach out to the admin group local
 to the project itself (I didn't know we had admins locally).

 P.s. Happy to volunteer to administrate as well.

 Please ping me directly,
 Thanks,
 Harsh J



 --
 Harsh J



 --
 Harsh J



-- 
Harsh J


Re: Anybody know how to configure SSH for eclipse plugin

2012-11-27 Thread Harsh J
Hi,

The Eclipse plugin communicates directly with the RPC servers of NN, DNs,
and JT. It does not, in any way, use SSH to do this.

If your machine can't set up a fully functional connection to the cluster,
then the Eclipse plugin will currently not work and you have to fall back
to doing the manual compile-jar -> scp -> invoke cycle, or rely on a
local MR cluster (pseudo-distributed) or the local job runner (standalone)
to test your programs.

There is a proposal for splitting out the Eclipse plugin code to its own
project, at http://wiki.apache.org/incubator/HadoopDevelopmentToolsProposal,
and adding further enhancements to it. Perhaps when this project is set up,
you can request such a feature, where the plugin may itself automate the
jar production plus scp and invoke it at the other end - for such scenarios.


On Wed, Nov 28, 2012 at 1:25 AM, yiyu jia jia.y...@gmail.com wrote:

 Hi Glen,

 Thanks a lot for the response! I tried last night by setting up an SSH tunnel
 for certain ports (like 9000 and 9001). Still not successful. My SSH
 communication among servers is set to be passwordless with certificate
 files.

 I suspect the Eclipse plugin does not support SSH with a password or
 certificate files. Can anybody confirm whether my suspicion is right or
 wrong?

 thanks a lot!

 Yiyu

 On Tue, Nov 27, 2012 at 9:44 AM, Glen Mazza gma...@talend.com wrote:

  Unsure, perhaps better to ask on the u...@hadoop.apache.org list.
 
  Glen
 
 
  On 11/26/2012 07:17 PM, yiyu jia wrote:
 
  Hi,
 
  Anybody tell me how to configure SSH for eclipse plugin? I guess eclipse
  plugin use SSH to connect with Map/Reduce locations. But, I found that
 it
  always use my local machine' s account name to connect with hadoop host
  servers.
 
  thanks and regards,
 
  Yiyu
 
 
 
 
 
  --
  Glen Mazza
  Talend Community Coders - coders.talend.com
  blog: www.jroller.com/gmazza
 
 


 --
 **
 * Mr. Jia Yiyu*
 *   *
 * Email: jia.y...@gmail.com  *
 *   *
 * Web: http://yiyujia.blogspot.com/*
 ***




-- 
Harsh J


Re: Anybody know how to configure SSH for eclipse plugin

2012-11-27 Thread Harsh J
Interesting, so this used to be present but isn't anymore? In any case, the
new project and the members behind it are your best bet for further
enhancements.


On Wed, Nov 28, 2012 at 2:09 AM, yiyu jia jia.y...@gmail.com wrote:

 hi Harsh,

 Thank you for the help! I think I was confused yesterday. But I was partly
 misled by the comments in the HadoopServer.java file:


  * <p>
  * This class does not create any SSH connection anymore. Tunneling must be
  * setup outside of Eclipse for now (using Putty or <tt>ssh -D&lt;port&gt;
  * &lt;host&gt;</tt>)
  *

 thanks again!

 yiyu


 On Tue, Nov 27, 2012 at 3:16 PM, Harsh J ha...@cloudera.com wrote:

  Hi,
 
  The Eclipse plugin communicates directly with the RPC servers of NN, DNs,
  and JT. It does not, in any way, use SSH to do this.
 
  If your machine can't setup a fully functional connection to the cluster,
  then the Eclipse plugin will currently not work and you have to fall back
  into doing the manual compile-jar - scp - invoke cycle, or rely on a
  local MR cluster (pseudo-distributed) or the local job runner
 (standalone)
  to test your programs.
 
  There is a proposal for splitting out the Eclipse plugin code to its own
  project, at
  http://wiki.apache.org/incubator/HadoopDevelopmentToolsProposal,
  and adding further enhancements to it. Perhaps when this project is
 setup,
  you can request for such a feature where the plugin may itself automate
 the
  jar production plus scp and invoke it at the other end - for such
  scenarios.
 
 
  On Wed, Nov 28, 2012 at 1:25 AM, yiyu jia jia.y...@gmail.com wrote:
 
   Hi Glen,
  
   Thanks a lot for response! I tried last night by setup ssh tunnel for
   certain ports (like 9000 and 9001). Still not successful. My SSH
   communication among servers are set to be passwordless with
 certification
   file.
  
   I suspect eclipse plugin does not support SSH with password or
   certification files. Anybody can confirm if my suspection is right or
   wrong?
  
   thanks a lot!
  
   Yiyu
  
   On Tue, Nov 27, 2012 at 9:44 AM, Glen Mazza gma...@talend.com wrote:
  
Unsure, perhaps better to ask on the u...@hadoop.apache.org list.
   
Glen
   
   
On 11/26/2012 07:17 PM, yiyu jia wrote:
   
Hi,
   
Anybody tell me how to configure SSH for eclipse plugin? I guess
  eclipse
plugin use SSH to connect with Map/Reduce locations. But, I found
 that
   it
always use my local machine' s account name to connect with hadoop
  host
servers.
   
thanks and regards,
   
Yiyu
   
   
   
   
   
--
Glen Mazza
Talend Community Coders - coders.talend.com
blog: www.jroller.com/gmazza
   
   
  
  
   --
   **
   * Mr. Jia Yiyu*
   *   *
   * Email: jia.y...@gmail.com  *
   *   *
   * Web: http://yiyujia.blogspot.com/*
   ***
  
 
 
 
  --
  Harsh J
 



 --
 **
 * Mr. Jia Yiyu*
 *   *
 * Email: jia.y...@gmail.com  *
 *   *
 * Web: http://yiyujia.blogspot.com/*
 ***




-- 
Harsh J


[jira] [Resolved] (HADOOP-9091) Allow daemon startup when at least 1 (or configurable) disk is in an OK state.

2012-11-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9091.
-

Resolution: Fixed

This feature is already available in all our current releases via the DN volume 
failure toleration properties. Please see 
https://issues.apache.org/jira/browse/HDFS-1592.

Resolving as not a problem. Please update to an inclusive release to have this 
addressed in your environment.
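
For reference, the toleration knob looks like this (minimal sketch via the Java 
Configuration API; in practice it is usually set in hdfs-site.xml):

{code}
Configuration conf = new Configuration();
// Keep the DataNode running as long as no more than 2 of the volumes
// listed in dfs.data.dir have failed.
conf.setInt("dfs.datanode.failed.volumes.tolerated", 2);
{code}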

 Allow daemon startup when at least 1 (or configurable) disk is in an OK state.
 --

 Key: HADOOP-9091
 URL: https://issues.apache.org/jira/browse/HADOOP-9091
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 0.20.2
Reporter: Jelle Smet
  Labels: features, hadoop

 The given example is of datanode disk definitions but should be applicable to 
 any configuration where a list of disks is provided.
 I have multiple local disks defined for a datanode:
 property
 namedfs.data.dir/name
 value/data/01/dfs/dn,/data/02/dfs/dn,/data/03/dfs/dn,/data/04/dfs/dn,/data/05/dfs/dn,/data/06/dfs/dn/value
 finaltrue/final
 /property
 When one of those disks breaks and is unmounted then the mountpoint (such as 
 /data/03 in this example) becomes a regular directory which doesn't have the 
 valid permissions and possibly the directory structure Hadoop is expecting.
 When this situation happens, the datanode fails to restart because of this, 
 even though we actually have enough disks in an OK state to proceed.  The only way 
 around this is to alter the configuration and omit that specific disk 
 configuration.
 In my opinion, it would be more practical to let Hadoop daemons start when at 
 least 1 disk/partition in the provided list is in a usable state.  This 
 prevents having to roll out custom configurations for systems which 
 temporarily have a disk (and therefore a directory layout) missing.  This might 
 also be made configurable so that at least X partitions out of the available 
 ones must be in an OK state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HADOOP-9066) Sorting for FileStatus[]

2012-11-26 Thread Harsh J (JIRA)

 [ 
https://issues.apache.org/jira/browse/HADOOP-9066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Harsh J resolved HADOOP-9066.
-

Resolution: Invalid

Since HADOOP-8934 is already adding FileStatus-data-based sorting in a place 
that matters, and this JIRA seems to just add a simple example of utilizing 
FileStatus comparators, I am resolving this as Invalid at the moment, as the 
example isn't of much value so far (given that the Javadoc for FileStatus is 
already clear, and there's no use-case for this in MR, etc.).
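
For anyone landing here looking for the pattern, a plain Comparator is all that 
is needed to sort a FileStatus[] (minimal sketch, sorting by modification time; 
the path is just an example):

{code}
FileStatus[] statuses = fs.listStatus(new Path("/user/example"));
Arrays.sort(statuses, new Comparator<FileStatus>() {
  public int compare(FileStatus a, FileStatus b) {
    // ascending by modification time
    return Long.valueOf(a.getModificationTime()).compareTo(
        Long.valueOf(b.getModificationTime()));
  }
});
{code}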

 Sorting for FileStatus[]
 

 Key: HADOOP-9066
 URL: https://issues.apache.org/jira/browse/HADOOP-9066
 Project: Hadoop Common
  Issue Type: Improvement
 Environment: java7 , RedHat9 , Hadoop 0.20.2 
 ,eclipse-jee-juno-linux-gtk.tar.gz
Reporter: david king
  Labels: patch
 Attachments: ConcreteFileStatusAscComparable.java, 
 ConcreteFileStatusDescComparable.java, FileStatusComparable.java, 
 FileStatusTool.java, TestFileStatusTool.java


   I will submit a patch with a FileStatusTool that is used to sort FileStatus by a 
 Comparator; the Comparator can not only be implemented by the user, but the 
 example code can also be used.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HADOOP-9068) Reuse (and not duplicate) globbing logic between FileSystem and FileContext

2012-11-20 Thread Harsh J (JIRA)
Harsh J created HADOOP-9068:
---

 Summary: Reuse (and not duplicate) globbing logic between 
FileSystem and FileContext
 Key: HADOOP-9068
 URL: https://issues.apache.org/jira/browse/HADOOP-9068
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 2.0.0-alpha
Reporter: Harsh J


FileSystem's globbing code is currently duplicated in FileContext.Util class. 
We should reuse the implementation rather than maintain two pieces of it.
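
For context, the duplicated logic sits behind the same user-facing call in both 
APIs (minimal usage sketch; the glob pattern is illustrative):

{code}
// FileSystem flavour
FileStatus[] a = fs.globStatus(new Path("/logs/2012-11-*/part-*"));
// FileContext flavour, currently backed by a separate copy of the same logic
FileStatus[] b = fc.util().globStatus(new Path("/logs/2012-11-*/part-*"));
{code}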

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


Re: which part of Hadoop is responsible of distributing the input file fragments to datanodes?

2012-11-11 Thread Harsh J
Assuming you speak of the HDFS file-writing code, look at DFSClient
and its utilization of DFSOutputStream (see the write(…) areas).
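
If it helps to have a concrete entry point to trace, a plain client-side write
like the one below ends up inside DFSClient/DFSOutputStream, which is where
blocks are requested from the namenode and streamed to the chosen datanodes
(minimal sketch; the path is just an example):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class WritePathExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);   // a DistributedFileSystem for hdfs:// URIs
    FSDataOutputStream out = fs.create(new Path("/tmp/example.txt"));
    out.write("hello".getBytes());          // buffered into packets by DFSOutputStream
    out.close();                            // completes the file on the namenode
    fs.close();
  }
}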

On Sun, Nov 11, 2012 at 4:36 PM, salmakhalil salma_7...@hotmail.com wrote:


 Hi,

 I am trying to find the part of Hadoop that is responsible for distributing
 the input file fragments to the datanodes. I need to understand the source
 code that is responsible for distributing the input files.

 Can anyone help me find this part of the code? I tried to read the
 namenode.java file but I could not find anything that can help me.

 Thanks in advance,
 Salam



 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/which-part-of-Hadoop-is-responsible-of-distributing-the-input-file-fragments-to-datanodes-tp4019530.html
 Sent from the Hadoop lucene-dev mailing list archive at Nabble.com.



-- 
Harsh J

