Re: [VOTE] Merge feature branch YARN-5355 (Timeline Service v2) to trunk

2017-08-30 Thread Colin McCabe
The "git" way of doing things would be to rebase the feature branch on master (trunk) and then commit the patch stack. Squashing the entire feature into a 10 MB megapatch is the "svn" way of doing things. The svn workflow evolved because merging feature branches back to trunk was really painful i

Re: [DISCUSS] Branches and versions for Hadoop 3

2017-08-28 Thread Colin McCabe
On Mon, Aug 28, 2017, at 14:22, Allen Wittenauer wrote: > > > On Aug 28, 2017, at 12:41 PM, Jason Lowe wrote: > > > > I think this gets back to the "if it's worth committing" part. > > This brings us back to my original question: > > "Doesn't this place an undue burden on the contr

Re: inotify

2016-07-05 Thread Colin McCabe
I think it makes sense to have an AddBlockEvent. It seems like we could provide something like the block ID, block pool ID, and genstamp, as well as the inode ID and path of the file which the block was added to. Clearly, we cannot provide the length, since we don't know how many bytes the client

Re: HDFS Block compression

2016-07-05 Thread Colin McCabe
We have discussed this in the past. I think the single biggest issue is that HDFS doesn't understand the schema of the data which is stored in it. So it may not be aware of what compression scheme would be most appropriate for the application and data. While it is true that HDFS doesn't allow ra

Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Colin McCabe
> I did a google search, but was not able to find a thread like that. > Thanks in advance. Hmm, perhaps I was thinking of the release vote process. Can anyone confirm? It would be nice if this information could appear on the bylaws page... best, Colin > > Thanks > Anu > >

Re: [DISCUSS] Increased use of feature branches

2016-06-13 Thread Colin McCabe
On Sun, Jun 12, 2016, at 05:06, Steve Loughran wrote: > > On 10 Jun 2016, at 20:37, Anu Engineer wrote: > > > > I actively work on two branches (Diskbalancer and ozone) and I agree with > > most of what Sangjin said. > > There is an overhead in working with branches, there are both technical >

Re: Compile proto

2016-05-10 Thread Colin McCabe
Hi Kun Ren, You have to add your new proto file to the relevant pom.xml file. best, Colin On Fri, May 6, 2016, at 13:04, Kun Ren wrote: > Hi Genius, > > I added a new proto into the > HADOOP_DIR/hadoop-common-project/hadoop-common/src/main/proto, > > however,every time when I run the following

Re: Another thought on client-side support of HDFS federation

2016-05-02 Thread Colin McCabe
Hi Tianyi HE, Thanks for sharing this! This reminds me of the httpfs daemon. This daemon basically sits in front of an HDFS cluster and accepts requests, which it serves by forwarding them to the underlying HDFS instance. There is some documentation about it here: https://hadoop.apache.org/docs

Re: 2.7.3 release plan

2016-04-04 Thread Colin McCabe
I agree that HDFS-8578 should be a prerequisite for backporting HDFS-8791. I think we are overestimating the number of people affected by HDFS-8791, and underestimating the disruption that would be caused by a layout version upgrade in a dot release. As Andrew, Sean, and others in the thread poin

Re: Revive HADOOP-2705?

2015-12-18 Thread Colin McCabe
s_quickly > > One of the conclusions: > > "Minimize I/O operations by reading an array at a time, not a byte at > a time. An 8Kbyte array is a good size." > > > On Tue, Dec 15, 2015 at 3:41 PM, Colin McCabe wrote: >> Hi David, >> >> Do you have benchmar

Re: Revive HADOOP-2705?

2015-12-15 Thread Colin McCabe
Hi David, Do you have benchmarks to justify changing this configuration? best, Colin On Wed, Dec 9, 2015 at 8:05 AM, dam6923 . wrote: > Hello! > > A while back, Java 1.6, the size of the internal file-reading > buffers were bumped-up to 8192 bytes. > > http://grepcode.com/file/reposito

Re: DISCUSS: is the order in FS.listStatus() required to be sorted?

2015-06-16 Thread Colin McCabe
On Tue, Jun 16, 2015 at 3:02 AM, Steve Loughran wrote: > >> On 15 Jun 2015, at 21:22, Colin P. McCabe wrote: >> >> One possibility is that we could randomize the order of returned >> results in HDFS (at least within a given batch of results returned >> from the NN). This is similar to how the Go
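The idea of randomizing the order of results within one batch could be sketched roughly as follows. This is only an illustration, not HDFS code: the class and method names are hypothetical, and a plain list of path strings stands in for a batch of listing entries returned from the NN.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ShuffledListingSketch {
    // Returns a shuffled copy of one batch of results; the input is left untouched.
    // (Hypothetical helper; a real change would live on the server side.)
    static List<String> shuffleBatch(List<String> batch, Random rng) {
        List<String> copy = new ArrayList<>(batch);
        Collections.shuffle(copy, rng);
        return copy;
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("/a", "/b", "/c", "/d");
        // The order printed here varies from run to run by design.
        System.out.println(shuffleBatch(batch, new Random()));
    }
}
```

The point of a scheme like this is that callers which silently rely on sorted output would tend to fail early in testing instead of depending on an ordering that was never promised.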

Re: HDFS audit log

2015-05-05 Thread Colin McCabe
I think HDFS INotify is a better choice if you need: * guaranteed backwards compatibility * rapid and unambiguous parsing (via protobuf) * clear Java API for retrieving the data (I.e. not rsync on a text file) * ability to resume reading at a given point if the consumer process fails We are using

Re: fsck output compatibility question with regard to HDFS-7281

2015-05-05 Thread Colin McCabe
How about just having a --json option for the fsck command? That's what we did in Ceph for some command line tools. It would make the output easier to consume and easier to provide compatibility for. Colin On Apr 28, 2015 12:32 PM, "Allen Wittenauer" wrote: > > A lot of the summary information

Re: upstream jenkins build broken?

2015-03-11 Thread Colin McCabe
> Hortonworks > http://hortonworks.com/ > > > > > > > On 3/11/15, 2:10 PM, "Colin McCabe" wrote: > >>Is there a maven plugin or setting we can use to simply remove >>directories that have no executable permissions on them? Clearly we >>have the

Re: upstream jenkins build broken?

2015-03-11 Thread Colin McCabe
Is there a maven plugin or setting we can use to simply remove directories that have no executable permissions on them? Clearly we have the permission to do this from a technical point of view (since we created the directories as the jenkins user), it's simply that the code refuses to do it. Othe

Re: 2.7 status

2015-02-17 Thread Colin McCabe
+1 for starting thinking about releasing 2.7 soon. Re: building Windows binaries. Do we release binaries for all the Linux and UNIX architectures? I thought we didn't. It seems a little inconsistent to release binaries just for Windows, but not for those other architectures and OSes. I wonder

Re: max concurrent connection to HDFS name node

2015-02-12 Thread Colin McCabe
The NN can do somewhere around 30,000 - 50,000 RPCs per second currently, depending on configuration. In general you do not want to have extremely high NN RPC traffic, because it will slow things down. You might consider re-architecting your application to do more DN traffic and less NN traffic, i

Re: NFSv3 Filesystem Connector

2015-01-14 Thread Colin McCabe
Why not just use LocalFileSystem with an NFS mount (or several)? I read through the README but I didn't see that question answered anywhere. best, Colin On Tue, Jan 13, 2015 at 1:35 PM, Gokul Soundararajan wrote: > Hi, > > We (Jingxin Feng, Xing Lin, and I) have been working on providing a > F

Re: Symbolic links disablement

2014-12-31 Thread Colin McCabe
As far as I know, nobody is working on this at the moment. There are a lot of issues that would need to be worked through before we could enable symlinks in production. We never quite agreed on the semantics of how symlinks should work... for example, some people advocated that listing a director

Re: Switching to Java 7

2014-12-08 Thread Colin McCabe
On Mon, Dec 8, 2014 at 7:46 AM, Steve Loughran wrote: > On 8 December 2014 at 14:58, Ted Yu wrote: > >> Looks like there was still OutOfMemoryError : >> >> >> https://builds.apache.org/job/Hadoop-Hdfs-trunk/1964/testReport/junit/org.apache.hadoop.hdfs.server.namenode.snapshot/TestRenameWithSnapsh

Re: Thinking ahead to hadoop-2.7

2014-12-08 Thread Colin McCabe
On Fri, Dec 5, 2014 at 11:15 AM, Karthik Kambatla wrote: > It would be nice to cut the branch for the next "feature" release (not just > Java 7) in the first week of January, so we can get the RC out by the end > of the month? > > Yesterday, this came up in an offline discussion on ATS. Given peop

Re: Why do reads take as long as replicated writes?

2014-11-10 Thread Colin McCabe
I strongly suggest benchmarking a modern version of Hadoop rather than Hadoop 1.x. The native CRC stuff from HDFS-3528 greatly reduces CPU consumption on the read path. I wrote about some other read path optimizations in Hadoop 2.x here: http://www.club.cc.cmu.edu/~cmccabe/d/2014.04_ApacheCon_HDF

Re: Guava

2014-11-10 Thread Colin McCabe
I'm usually an advocate for getting rid of unnecessary dependencies (cough, jetty, cough), but a lot of the things in Guava are really useful. Immutable collections, BiMap, Multisets, Arrays#asList, the stuff for writing hashCode() and equals(), String#Joiner, the list goes on. We particularly us

Re: builds failing on H9 with "cannot access java.lang.Runnable"

2014-10-03 Thread Colin McCabe
>> all the slaves are getting re-booted give it some more time >> >> -giri >> >> On Fri, Oct 3, 2014 at 1:13 PM, Ted Yu wrote: >> >>> Adding builds@ >>> >>> On Fri, Oct 3, 2014 at 1:07 PM, Colin McCabe >>> wrote: >>&

builds failing on H9 with "cannot access java.lang.Runnable"

2014-10-03 Thread Colin McCabe
It looks like builds are failing on the H9 host with "cannot access java.lang.Runnable" Example from https://builds.apache.org/job/PreCommit-HDFS-Build/8313/artifact/patchprocess/trunkJavacWarnings.txt : [INFO] [INFO] BUILD

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-24 Thread Colin McCabe
e absolutely need to address it before the merge to 2.6. We are starting to see a lot of users of HDFS-4949, and I want to make sure that there is a reasonable story for using both features at the same time. Let's continue this discussion on HDFS-6919 and HDFS-6988 and see if we can come up w

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-24 Thread Colin McCabe
een better to float the idea of a merge on the JIRA before actually calling it, to avoid having discussions like this where we are racing the clock. thanks, Colin > > On Tue, Sep 23, 2014 at 6:09 PM, Colin McCabe > wrote: > >> This seems like a really aggressive timeframe fo

Re: [VOTE] Merge HDFS-6581 to trunk - Writing to replicas in memory.

2014-09-23 Thread Colin McCabe
This seems like a really aggressive timeframe for a merge. We still haven't implemented: * Checksum skipping on read and write from lazy persisted replicas. * Allowing mmaped reads from the lazy persisted data. * Any eviction strategy other than LRU. * Integration with cache pool limits (how do H

Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Colin McCabe
On Fri, Sep 19, 2014 at 9:41 AM, Vinayakumar B wrote: > Thanks Colin for the detailed explanation. > > On Fri, Sep 19, 2014 at 9:38 PM, Colin McCabe > wrote: >> >> On Thu, Sep 18, 2014 at 11:06 AM, Vinayakumar B > wrote: >> > bq. I don't know about the

Re: [DISCUSS] Allow continue reading from being-written file using same stream

2014-09-19 Thread Colin McCabe
On Thu, Sep 18, 2014 at 11:06 AM, Vinayakumar B wrote: > bq. I don't know about the merits of this, but I do know that native > filesystems > implement this by not raising the EOF exception on the seek() but only on > the read ... some of the non-HDFS filesystems Hadoop support work this way. Pre

Re: In hindsight... Re: Thinking ahead to hadoop-2.6

2014-09-15 Thread Colin McCabe
On Mon, Sep 15, 2014 at 10:48 AM, Allen Wittenauer wrote: > > It’s now September. With the passage of time, I have a lot of doubts > about this plan and where that trajectory takes us. > > * The list of changes that are already in branch-2 scare the crap out of any > risk adverse person

Re: Updates on migration to git

2014-08-27 Thread Colin McCabe
Thanks for making this happen, Karthik and Daniel. Great job. best, Colin On Tue, Aug 26, 2014 at 5:59 PM, Karthik Kambatla wrote: > Yes, we have requested for force-push disabled on trunk and branch-* > branches. I didn't test it though :P, it is not writable yet. > > > On Tue, Aug 26, 2014 at

Re: HDFS-6902 FileWriter should be closed in finally block in BlockReceiver#receiveBlock()

2014-08-25 Thread Colin McCabe
Let's discuss this on the JIRA. I think Tsuyoshi OZAWA's solution is good. Colin On Thu, Aug 21, 2014 at 7:08 AM, Ted Yu wrote: > bq. else there is a memory leak > > Moving call of close() would prevent the leak. > > bq. but then this code snippet could be java and can be messy > > The code is

Re: [DISCUSS] Switch to log4j 2

2014-08-18 Thread Colin McCabe
On Fri, Aug 15, 2014 at 8:50 AM, Aaron T. Myers wrote: > Not necessarily opposed to switching logging frameworks, but I believe we > can actually support async logging with today's logging system if we wanted > to, e.g. as was done for the HDFS audit logger in this JIRA: > > https://issues.apache.

Re: [VOTE] Migration from subversion to git for version control

2014-08-11 Thread Colin McCabe
+1. best, Colin On Fri, Aug 8, 2014 at 7:57 PM, Karthik Kambatla wrote: > I have put together this proposal based on recent discussion on this topic. > > Please vote on the proposal. The vote runs for 7 days. > >1. Migrate from subversion to git for version control. >2. Force-push to be

Re: Finding file size during block placement

2014-07-25 Thread Colin McCabe
On Wed, Jul 23, 2014 at 8:15 AM, Arjun wrote: > Hi, > > I want to write a block placement policy that takes the size of the file > being placed into account. Something like what is done in CoHadoop or BEEMR > paper. I have the following questions: > > Hadoop uses a stream metaphor. So at the tim

Re: [DISCUSS] Assume Private-Unstable for classes that are not annotated

2014-07-25 Thread Colin McCabe
+1. Colin On Tue, Jul 22, 2014 at 2:54 PM, Karthik Kambatla wrote: > Hi devs > > As you might have noticed, we have several classes and methods in them that > are not annotated at all. This is seldom intentional. Avoiding incompatible > changes to all these classes can be considerable baggage.

Re: [Vote] Merge The HDFS XAttrs Feature Branch (HDFS-2006) to Trunk

2014-05-20 Thread Colin McCabe
Great job, guys. +1. I don't think we need to finish libhdfs support before we merge (unless you want to). Colin On Wed, May 14, 2014 at 5:47 AM, Gangumalla, Uma wrote: > Hello HDFS Devs, > I would like to call for a vote to merge the HDFS Extended Attributes > (XAttrs) feature from the HDF

Re: In-Memory Reference FS implementations

2014-03-06 Thread Colin McCabe
NetFlix's Apache-licensed S3mper system provides consistency for an S3-backed store. http://techblog.netflix.com/2014/01/s3mper-consistency-in-cloud.html It would be nice to see this or something like it integrated with Hadoop. I fear that a lot of applications are not ready for eventual consiste

Re: [VOTE] Release Apache Hadoop 2.3.0

2014-02-11 Thread Colin McCabe
Looks good. +1, also non-binding. I downloaded the source tarball, checked md5, built, ran some unit tests, ran an HDFS cluster. cheers, Colin On Tue, Feb 11, 2014 at 6:53 PM, Andrew Wang wrote: > Thanks for putting this together Arun. > > +1 non-binding > > Downloaded source tarball > Verifie

Re: Is there a way to get a Block through block id?

2014-01-29 Thread Colin McCabe
> during fsck (NamenodeFsck). But still, if there's any method to directly > get the Block instance by blockid through any api, it would be great to > know. :-) > > On 21 January 2014 03:46, Colin McCabe wrote: > > > In order to uniquely identify a block in hadoop 2.

Re: Is there a way to get a Block through block id?

2014-01-20 Thread Colin McCabe
In order to uniquely identify a block in hadoop 2.2, you are going to need both a block and a block pool ID. You can construct a Block object with those two items. On Wed, Jan 15, 2014 at 8:46 AM, Yu Li wrote: > Dear all, > > As titled, I actually have two questions here: > > 1. In current rele

Re: deadNodes in DFSInputStream

2013-12-31 Thread Colin McCabe
Take a look at HDFS-4273, which fixes some issues with the read retry logic. cheers, Colin On Tue, Dec 31, 2013 at 1:25 AM, lei liu wrote: > I use Hbase-0.94 and CDH-4.3.1 > When RegionServer read data from local datanode, if local datanode is dead, > the local datanode is added to deadNodes, and R

Re: ByteBuffer-based read API for pread

2013-12-31 Thread Colin McCabe
It's true that HDFS (and Hadoop generally) doesn't currently have a ByteBuffer-based pread API. There is a JIRA open for this issue, HDFS-3246. I do not know if implementing a ByteBuffer API for pread would be as big of a performance gain as implementing it for regular read. One issue is that wh

Re: Next releases

2013-12-06 Thread Colin McCabe
If 2.4 is released in January, I think it's very unlikely to include symlinks. There is still a lot of work to be done before they're usable. You can look at the progress on HADOOP-10019. For some of the subtasks, it will require some community discussion before any code can be written. For bet

Re: Deprecate BackupNode

2013-12-05 Thread Colin McCabe
+1 Colin On Dec 4, 2013 3:07 PM, "Suresh Srinivas" wrote: > It is almost an year a jira proposed deprecating backup node - > https://issues.apache.org/jira/browse/HDFS-4114. > > Maintaining it adds unnecessary work. As an example, when I added support > for retry cache there were bunch of code p

Re: Next releases

2013-11-14 Thread Colin McCabe
On Wed, Nov 13, 2013 at 10:10 AM, Arun C Murthy wrote: > > On Nov 12, 2013, at 1:54 PM, Todd Lipcon wrote: > >> On Mon, Nov 11, 2013 at 2:57 PM, Colin McCabe wrote: >> >>> To be honest, I'm not aware of anything in 2.2.1 that shouldn't be >>> the

Re: Next releases

2013-11-11 Thread Colin McCabe
HADOOP-10020 is a JIRA that disables symlinks temporarily. They will be disabled in 2.2.1 as well, if the plan is to have only minor fixes in that branch. To be honest, I'm not aware of anything in 2.2.1 that shouldn't be there. However, I have only been following the HDFS and common side of thi

Re: HDFS single datanode cluster issues

2013-11-07 Thread Colin McCabe
First of all, HDFS isn't really the right choice for single-node environments. I would recommend using LocalFileSystem in this case. If you're evaluating HDFS and only have one computer, it will really be better to run several VMs to see how it works, rather than running just one Datanode. You ar

Re: Replacing the JSP web UIs to HTML 5 applications

2013-11-01 Thread Colin McCabe
> > > backwards in terms of unit testing. >> > > > > >> > > > > I take a look at TestNamenodeJspHelper / TestDatanodeJspHelper / >> > > > > TestClusterJspHelper. It seems to me that we can merge these tests >> > > > >

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-28 Thread Colin McCabe
With 3 +1s, the vote passes. Thanks, all. best, Colin On Fri, Oct 25, 2013 at 4:01 PM, Colin McCabe wrote: > On Fri, Oct 25, 2013 at 10:07 AM, Suresh Srinivas > wrote: >> I posted a comment in the other thread about feature branch merges. >> >> My preference is to ma

Re: libhdfs portability

2013-10-28 Thread Colin McCabe
On Mon, Oct 28, 2013 at 4:24 PM, Kyle Sletmoe wrote: > I have written a WebHDFSClient and I do not believe that reusing > connections is enough to noticeably speed up transfers in my case. I did > some tests and on average it took roughly 14 minutes to transfer a 3.6 GB > file to an HDFS on my loc

Re: Replacing the JSP web UIs to HTML 5 applications

2013-10-28 Thread Colin McCabe
This is a really interesting project, Haohui. I think it will make our web UI much nicer. I have a few concerns about removing the old web UI, however: * If we're going to remove the old web UI, I think the new web UI has to have the same level of unit testing. We shouldn't go backwards in term

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-25 Thread Colin McCabe
other recent major features like HDFS-2802 (snapshots), >> > > > HDFS-347 (short-circuit reads via sharing file descriptors), and >> > > > HADOOP-8562 (Windows compatibility). In this thread, we've diverged >> > from >> > > > that process b

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-24 Thread Colin McCabe
orks.com/ >> > >> > >> > >> > On Fri, Oct 18, 2013 at 1:37 PM, Chris Nauroth > > >wrote: >> > >> > > +1 >> > > >> > > Sounds great! >> > > >> > > Regarding testing caching+federation, thi

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-18 Thread Colin McCabe
I don't see > branch-2 mentioned, so I assume that we're not voting on merge to branch-2 > yet. > > Before I cast my vote, can you please discuss whether or not it's feasible > to complete all of the above in the next 7 days? For the issues assigned > to me, I do ex

Re: [VOTE] Merge HDFS-4949 to trunk

2013-10-17 Thread Colin McCabe
+1. Thanks, guys. best, Colin On Thu, Oct 17, 2013 at 3:01 PM, Andrew Wang wrote: > Hello all, > > I'd like to call a vote to merge the HDFS-4949 branch (in-memory caching) > to trunk. Colin McCabe and I have been hard at work the last 3.5 months > implementing this feature

Re: Build Still Unstable: CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7 - Build # 21

2013-10-16 Thread Colin McCabe
Sorry for the noise. I posted to the wrong list. best, Colin On Wed, Oct 16, 2013 at 9:13 AM, Colin McCabe wrote: > This looks pretty similar to https://jira.cloudera.com/browse/CDH-10759 > > Probably need to take a look at this test to see why it's not managing > its

Re: Build Still Unstable: CDH5beta1-Hadoop-Common-2.1.0-CDH-JDK7-tests-JDK7-run-JDK7 - Build # 21

2013-10-16 Thread Colin McCabe
This looks pretty similar to https://jira.cloudera.com/browse/CDH-10759 Probably need to take a look at this test to see why it's not managing its threads correctly. Colin On Tue, Oct 15, 2013 at 8:37 AM, Jenkins wrote: > I offer a cookie, to whoever fixes me. See >

Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Colin McCabe
I don't think HADOOP-9972 is a must-do for the next Apache release, whatever version number it ends up having. It's just adding a new API, not changing any existing ones, and it can be done entirely in generic code. (The globber doesn't involve FileSystem or AFS subclasses). My understanding is

Re: 2.1.2 (Was: Re: [VOTE] Release Apache Hadoop 2.1.1-beta)

2013-10-02 Thread Colin McCabe
On Tue, Oct 1, 2013 at 8:59 PM, Arun C Murthy wrote: > Yes, sorry if it wasn't clear. > > As others seem to agree, I think we'll be better getting a protocol/api > stable GA done and then iterating on bugs etc. > > I'm not super worried about HADOOP-9984 since symlinks just made it to > branch-2

Re: symlink support in Hadoop 2 GA

2013-09-19 Thread Colin McCabe
What we're trying to get to here is a consensus on whether FileSystem#listStatus and FileSystem#globStatus should return symlinks __as_symlinks__. If 2.1-beta goes out with these semantics, I think we are not going to be able to change them later. That is what will happen in the "do nothing" scen

Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
The issue is not modifying existing APIs. The issue is that code has been written that makes assumptions that are incompatible with the existence of things that are not files or directories. For example, there is a lot of code out there that looks at FileStatus#isFile, and if it returns false, as

Re: symlink support in Hadoop 2 GA

2013-09-17 Thread Colin McCabe
I think it makes sense to finish symlinks support in the Hadoop 2 GA release. Colin On Mon, Sep 16, 2013 at 6:49 PM, Andrew Wang wrote: > Hi all, > > I wanted to broadcast plans for putting the FileSystem symlinks work > (HADOOP-8040) into branch-2.1 for the pending Hadoop 2 GA release. I think

Re: hdfs native build failing in trunk

2013-09-16 Thread Colin McCabe
The relevant line is: [exec] gcc: vfork: Resource temporarily unavailable Looks like the build slave was overloaded and could not create new processes? Colin On Mon, Sep 16, 2013 at 4:43 AM, Alejandro Abdelnur wrote: > It seems a commit of native code in YARN has triggered a native build in

Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-22 Thread Colin McCabe
On Wed, Aug 21, 2013 at 3:49 PM, Stack wrote: > On Wed, Aug 21, 2013 at 1:25 PM, Colin McCabe wrote: > >> St.Ack wrote: >> >> > + Once I figured where the logs were, found that JAVA_HOME was not being >> > exported (don't need this in hadoop-2.0.5 for inst

Re: [VOTE] Release Apache Hadoop 2.1.0-beta

2013-08-21 Thread Colin McCabe
St.Ack wrote: > + Once I figured where the logs were, found that JAVA_HOME was not being > exported (don't need this in hadoop-2.0.5 for instance). Adding an > exported JAVA_HOME to my running shell which don't seem right but it took > care of it (I gave up pretty quick on messing w/ > yarn.nodem

Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
Just to clarify, ext4 has the option to turn off journalling. ext3 does not. Not sure about reiser. Colin On Tue, Aug 20, 2013 at 12:42 PM, Colin McCabe wrote: > > If I've got the right idea about this at all? > > From the man page for wipe(1); > > "Journaling

Re: Secure deletion of blocks

2013-08-20 Thread Colin McCabe
> If I've got the right idea about this at all? From the man page for wipe(1); "Journaling filesystems (such as Ext3 or ReiserFS) are now being used by default by most Linux distributions. No secure deletion program that does filesystem-level calls can sanitize files on such filesystems, because

Re: Feature request to provide DFSInputStream subclassing mechanism

2013-08-08 Thread Colin McCabe
There is work underway to decouple the block layer and the namespace layer of HDFS from each other. Once this is done, block behaviors like the one you describe will be easy to implement. It's a use case very similar to the hierarchical storage management (HSM) use case that we've discussed befor

Re: I'm interested in working with HDFS-4680. Can somebody be a mentor?

2013-07-17 Thread Colin McCabe
think he wanted to do it incrementally. best, Colin McCabe On Wed, Jul 17, 2013 at 1:44 PM, Sreejith Ramakrishnan wrote: > Hey, > > I was originally researching options to work on ACCUMULO-1197. Basically, > it was a bid to pass trace functionality through the DFSClient. I discussed >

Re: data loss after cluster wide power loss

2013-07-08 Thread Colin McCabe
nivas wrote: > On Wed, Jul 3, 2013 at 8:12 AM, Colin McCabe wrote: > >> On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas >> wrote: >> > Dave, >> > >> > Thanks for the detailed email. Sorry I did not read all the details you >> had >> >

Re: data loss after cluster wide power loss

2013-07-03 Thread Colin McCabe
On Mon, Jul 1, 2013 at 8:48 PM, Suresh Srinivas wrote: > Dave, > > Thanks for the detailed email. Sorry I did not read all the details you had > sent earlier completely (on my phone). As you said, this is not related to > data loss related to HBase log and hsync. I think you are right; the rename

Re: dfs.datanode.socket.reuse.keepalive

2013-06-17 Thread Colin McCabe
source >> (DN threads) is likely to be more contended. >> >> -Todd >> >> On Fri, Jun 7, 2013 at 4:29 PM, Colin McCabe wrote: >> >>> Hi all, >>> >>> HDFS-941 added dfs.datanode.socket.reuse.keepalive. This allows >>> DataXcei

Re: Why is FileSystem.createNonRecursive deprecated?

2013-06-12 Thread Colin McCabe
This seems inconsistent. If the method is deprecated just because it's in org.apache.hadoop.FileSystem, shouldn't all FileSystem methods be marked as deprecated? On the other hand, a user opening up FileSystem.java would probably not realize that it is deprecated. The JavaDoc for the class itself

dfs.datanode.socket.reuse.keepalive

2013-06-07 Thread Colin McCabe
Hi all, HDFS-941 added dfs.datanode.socket.reuse.keepalive. This allows DataXceiver worker threads in the DataNode to linger for a second or two after finishing a request, in case the client wants to send another request. On the client side, HDFS-941 added a SocketCache, so that subsequent clien

Re: [jira] [Created] (HDFS-4824) FileInputStreamCache.close leaves dangling reference to FileInputStreamCache.cacheCleaner

2013-05-15 Thread Colin McCabe
Hi Shouvanik, Why not try asking the Talend community? Also, this question belongs on the user list. thanks, Colin On Wed, May 15, 2013 at 4:20 AM, Shouvanik Haldar < shouvanik.hal...@gmail.com> wrote: > Hi, > > I am facing a problem. > > I am using Talend for scheduling and running a job. Bu

Re: Is Hadoop SequenceFile binary safe?

2013-05-02 Thread Colin McCabe
It seems like we could just set up an escape sequence and make it actually binary-safe, rather than just probabilistically. The escape sequence would only be inserted when there would otherwise be confusion between data and a sync marker. best, Colin On Thu, May 2, 2013 at 3:26 AM, Hs wrote:
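One way an escape scheme of this kind might work is sketched below. This is not the SequenceFile format and not a proposed patch: the class, the method names, and the stuffing rule are all assumptions made for illustration. The rule here is byte stuffing: whenever the encoded output ends with all but the last byte of the sync marker, a stuffing byte is inserted, so the full marker can never appear inside escaped data. The sketch assumes a marker of at least two bytes.

```java
import java.util.Arrays;

class SyncEscapeSketch {
    // True if buf[0..len) ends with the first (marker.length - 1) bytes of marker.
    static boolean endsWithPrefix(byte[] buf, int len, byte[] marker) {
        int p = marker.length - 1;
        if (len < p) return false;
        for (int i = 0; i < p; i++) {
            if (buf[len - p + i] != marker[i]) return false;
        }
        return true;
    }

    // Stuffing byte: chosen to differ from the last two bytes of the marker,
    // so a stuffed byte can neither complete the marker nor end a marker prefix.
    static byte stuffByte(byte[] marker) {
        byte s = 0;
        while (s == marker[marker.length - 1] || s == marker[marker.length - 2]) s++;
        return s;
    }

    // Copies data, inserting a stuffing byte after every data byte that
    // completes the marker prefix in the output written so far.
    static byte[] escape(byte[] data, byte[] marker) {
        byte[] out = new byte[data.length * 2]; // at most one stuffed byte per data byte
        int len = 0;
        byte s = stuffByte(marker);
        for (byte b : data) {
            out[len++] = b;
            if (endsWithPrefix(out, len, marker)) out[len++] = s;
        }
        return Arrays.copyOf(out, len);
    }

    // Mirrors escape(): after a data byte completes the prefix, the next byte is dropped.
    static byte[] unescape(byte[] enc, byte[] marker) {
        byte[] out = new byte[enc.length];
        int len = 0;
        for (int i = 0; i < enc.length; i++) {
            out[len++] = enc[i];
            if (endsWithPrefix(enc, i + 1, marker)) i++; // skip the stuffed byte
        }
        return Arrays.copyOf(out, len);
    }

    static boolean contains(byte[] haystack, byte[] needle) {
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            if (Arrays.equals(Arrays.copyOfRange(haystack, i, i + needle.length), needle)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        byte[] marker = {1, 2, 3};
        byte[] data = {9, 1, 2, 3, 1, 2, 3};
        byte[] enc = escape(data, marker);
        System.out.println("marker in escaped data: " + contains(enc, marker));
        System.out.println("round trip ok: " + Arrays.equals(unescape(enc, marker), data));
    }
}
```

With a long random marker, the prefix would almost never occur in real data, so the overhead of a scheme like this should be negligible in the common case.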

Re: VOTE: HDFS-347 merge

2013-04-12 Thread Colin McCabe
win the votes finally. > > Does there need some additional configuration to enable these features? > > > > On Fri, Apr 12, 2013 at 2:05 AM, Colin McCabe >wrote: > > > The merge vote is now closed. With three +1s, it passes. > > > > thanks, > > Colin

Re: VOTE: HDFS-347 merge

2013-04-11 Thread Colin McCabe
in production scenarios. It is as > > functional as the old version and way easier to set up/configure. > > > > -Todd > > > > On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe > wrote: > > > >> Hi all, > >> > >> I think it's time

Re: testHDFSConf.xml

2013-04-10 Thread Colin McCabe
On Wed, Apr 10, 2013 at 10:16 AM, Jay Vyas wrote: > Hello HDFS brethren ! > > I've noticed that the testHDFSConf.xml has a lot of references to > supergroup. > > > https://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-hdfs-project/hadoop-hdfs/src/test/resources/testHDFSConf.xml > > 1) I wond

Re: VOTE: HDFS-347 merge

2013-04-08 Thread Colin McCabe
of the code in the branch, and > we have people now running this code in production scenarios. It is as > functional as the old version and way easier to set up/configure. > > -Todd > > On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe > wrote: > > > Hi all, > > > >

Re: VOTE: HDFS-347 merge

2013-04-08 Thread Colin McCabe
thers to follow. > > Tsz-Wo > > > > > > From: Suresh Srinivas > To: "hdfs-dev@hadoop.apache.org" > Sent: Wednesday, March 6, 2013 5:09 AM > Subject: Re: VOTE: HDFS-347 merge > > Thanks Colin. Will check it out as soon as I can. > > > On Tue,

Re: VOTE: HDFS-347 merge

2013-04-02 Thread Colin McCabe
On Mon, Apr 1, 2013 at 6:58 PM, Colin McCabe wrote: > On Mon, Apr 1, 2013 at 5:04 PM, Suresh Srinivas wrote: > >> Colin, >> >> For the record, the last email in the previous thread in ended with the >> following comment from Nicholas: >> > It is great t

Re: VOTE: HDFS-347 merge

2013-04-01 Thread Colin McCabe
inor style change, or renaming function X to Y, then I think we can easily do it after the merge. thanks, Colin > I did not see any response (unless I missed it). Can you please address it? > > Regards, > Suresh > > > On Mon, Apr 1, 2013 at 4:32 PM, Colin McCabe > wrote

VOTE: HDFS-347 merge

2013-04-01 Thread Colin McCabe
Hi all, I think it's time to merge the HDFS-347 branch back to trunk. It's been under review and testing for several months, and provides both a performance advantage, and the ability to use short-circuit local reads without compromising system security. Previously, we tried to merge this and th

Re: Heartbeat interval and timeout: why 3 secs and 10 min?

2013-03-13 Thread Colin McCabe
My understanding is that the 10 minute timeout helps to avoid replication storms, especially during startup. You might be interested in HDFS-3703, which adds a "stale" state which datanodes are placed into after 30 seconds of missing heartbeats. (This is an optional feature controlled by dfs.name
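The live / stale / dead distinction described here can be illustrated with a small classifier. This is a simplified sketch, not NameNode code: the class name and constants are hypothetical, and the thresholds are just the 30 second and 10 minute figures mentioned in the message.

```java
class HeartbeatStateSketch {
    enum State { LIVE, STALE, DEAD }

    // Thresholds taken from the figures in the message above (assumed, not read from config).
    static final long STALE_AFTER_MS = 30_000L;
    static final long DEAD_AFTER_MS = 10 * 60 * 1000L;

    // Classifies a datanode by the time elapsed since its last heartbeat.
    static State classify(long msSinceLastHeartbeat) {
        if (msSinceLastHeartbeat >= DEAD_AFTER_MS) return State.DEAD;
        if (msSinceLastHeartbeat >= STALE_AFTER_MS) return State.STALE;
        return State.LIVE;
    }

    public static void main(String[] args) {
        System.out.println(classify(3_000L));    // one heartbeat interval
        System.out.println(classify(45_000L));   // past the stale threshold
        System.out.println(classify(900_000L));  // past the dead threshold
    }
}
```

The intermediate state lets reads and writes avoid a node that has gone quiet without yet paying the cost of re-replicating its blocks.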

Re: VOTE: HDFS-347 merge

2013-03-05 Thread Colin McCabe
On Tue, Feb 26, 2013 at 5:09 PM, Suresh Srinivas wrote: >> >> Suresh, if you're willing to "support and maintain" HDFS-2246, do you >> have cycles to propose a patch to the HDFS-347 branch reintegrating >> HDFS-2246 with the simplifications you outlined? In your review, did >> you find anything el

Re: VOTE: HDFS-347 merge

2013-02-27 Thread Colin McCabe
Here is a compromise proposal, which hopefully will satisfy both sides: We keep the old block reader and have a configuration option that enables it. So in addition to dfs.client.use.legacy.blockreader, which we already have, we would have dfs.client.use.legacy.blockreader.local. Does that make s

Re: VOTE: HDFS-347 merge

2013-02-25 Thread Colin McCabe
On Sat, Feb 23, 2013 at 4:23 PM, Tsz Wo Sze wrote: > I still do not see a valid reason to remove HDFS-2246 immediately. Some > users may have insecure clusters and they don't want to change their > configuration. > > BTW, is Unix Domain Socket supported by all Unix-like systems? Does anyone >

Re: VOTE: HDFS-347 merge

2013-02-22 Thread Colin McCabe
On Thu, Feb 21, 2013 at 1:24 PM, Chris Douglas wrote: > On Wed, Feb 20, 2013 at 5:12 PM, Aaron T. Myers wrote: >> Given that the only substantive concerns with HDFS-347 seem to be about >> Windows support for local reads, for now we only merge this branch to >> trunk. Support for doing HDFS-2246

VOTE: HDFS-347 merge

2013-02-17 Thread Colin McCabe
S-347 on a number of clusters. This initial VOTE is to merge only into trunk. Just as we have done with our other recent merges, we will consider merging into branch-2 after the code has been in trunk for a few weeks. Please cast your vote by EOD Sunday 2/24. best, Colin McCabe

HDFS-347 (Short-circuit local reads with security)

2013-01-15 Thread Colin McCabe
, see Todd Lipcon's comment here: [2] best, Colin McCabe [1]. https://reviews.apache.org/r/8554/ [2]. https://issues.apache.org/jira/browse/HDFS-347?focusedCommentId=13551755&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13551755

Re: Release of Decompressor resources in CodecPool

2012-12-27 Thread Colin McCabe
I think that you're right. It looks like BuiltInGzipDecompressor, which is marked as DoNotPool, ends up owning some JNI-managed resources. In this case, just relying on the GC to get around to calling the finalizer isn't a great idea. I think you should open a JIRA. cheers, Colin On Mon, Dec

Re: Recovering fsImage from Namenode Logs

2012-12-27 Thread Colin McCabe
On Thu, Dec 20, 2012 at 12:33 AM, ishan chhabra wrote: > Unfortunately, the checkpoint image that I have has the deletes recorded. I > cannot use it. I do have an image that is 15 days old, which I am currently > running. > > I looked at the my logs and I have the filename, block allocated and > g

Re: FSDataInputStream.read returns -1 with growing file and never continues reading

2012-12-27 Thread Colin McCabe
Also, read() returning -1 is not an error, it's EOF. This is the same as for the regular Java InputStream. best, Colin On Thu, Dec 20, 2012 at 10:32 AM, Christoph Rupp wrote: > Thank you, Harsh. I appreciate it. > > 2012/12/20 Harsh J > >> Hi Christoph, >> >> If you use sync/hflush/hsync, the
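The -1 convention can be seen with any plain Java InputStream. The sketch below uses an in-memory stream rather than an HDFS stream, and the class and method names are made up for the example; it only shows that -1 ends the read loop normally instead of signalling a failure.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

class EofSketch {
    // Reads until read() returns -1 (end of stream) and returns the byte count.
    static long countBytes(InputStream in) throws IOException {
        byte[] buf = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buf)) != -1) { // -1 means EOF, not an error
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(new byte[10_000]);
        System.out.println("bytes read before EOF: " + countBytes(in));
    }
}
```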

Re: How to speedup test case running?

2012-10-22 Thread Colin McCabe
Hi, You can run a specific test with mvn eclipse -Dtest= I find that junit tests start more quickly when run within Eclipse. If you're interested, you can find instructions on setting up eclipse here: http://wiki.apache.org/hadoop/EclipseEnvironment cheers, Colin On Sun, Oct 21, 2012 at 7:00 P

Re: MiniDFSCluster

2012-09-05 Thread Colin McCabe
Hi Vlad, I think you might be on to something. File a JIRA? It should be a simple improvement, I think. cheers, Colin On Wed, Sep 5, 2012 at 10:42 AM, Vladimir Rozov wrote: > There are few methods on MiniDFSCluster class that are declared as static > (getBlockFile, getStorageDirPath), thoug

Re: validating user IDs

2012-06-11 Thread Colin McCabe
; the current OS limit? Even if this means detecting the OS version and > assuming its default limit. > > thx > > On Mon, Jun 11, 2012 at 3:57 PM, Colin McCabe wrote: > >> Hi all, >> >> I recently pulled the latest source, and ran a full build.  The >> comma
