Re: [NOTICE] hadoop subversion moving on Weds. 23 January

2008-01-22 Thread Doug Cutting
Reminder: this will happen at 10am PST tomorrow. Doug Doug Cutting wrote: As a part of the move to a TLP, our subversion repository will move. Let's plan the move for one week from today, next Wednesday, 23 January. The new repository will be: https://svn.apache.org/repos/asf/hadoop

Re: [NOTICE] hadoop subversion moving on Weds. 23 January

2008-01-23 Thread Doug Cutting
The move is complete. Please update your workspaces with the command: svn switch https://svn.apache.org/repos/asf/hadoop/core/trunk Doug Doug Cutting wrote: Reminder: this will happen at 10am PST tomorrow. Doug Doug Cutting wrote: As a part of the move to a TLP, our subversion repository

Re: Read MapFileOutputFormat output in ascending key order

2008-02-13 Thread Doug Cutting
Would one of the SequenceFile#merge() methods suffice? http://hadoop.apache.org/core/docs/current/api/org/apache/hadoop/io/SequenceFile.Sorter.html#merge(java.util.List,%20org.apache.hadoop.fs.Path) Doug Andrzej Bialecki wrote: Hi, Any suggestions how to do that? Let's say I have several part

Re: Read MapFileOutputFormat output in ascending key order

2008-02-13 Thread Doug Cutting
Andrzej Bialecki wrote: Hmm ... the idea was to avoid the cost of additional I/O, and read the parts directly as they are. If I understand it correctly, the Sorter.merge() needs to rewrite the files in order to merge them, which means a lot of I/O. It only rewrites things if there are more pa
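The goal raised in this thread, reading several already-sorted part files in one ascending pass without first rewriting them, can be sketched as a streaming k-way merge. The following is a minimal Python illustration, not Hadoop's SequenceFile.Sorter; the in-memory lists standing in for part files and the function name merged_in_key_order are assumptions made for the example.

```python
import heapq
from operator import itemgetter


def merged_in_key_order(parts):
    """Yield (key, value) pairs from several sorted parts in ascending key order.

    Each element of `parts` is an iterable of (key, value) pairs that is
    already sorted by key. heapq.merge holds one pending item per part,
    so the inputs are streamed rather than copied or rewritten.
    """
    return heapq.merge(*parts, key=itemgetter(0))


# Hypothetical stand-ins for three sorted part files.
part0 = [("apple", 1), ("melon", 4)]
part1 = [("banana", 2), ("pear", 5)]
part2 = [("cherry", 3)]

merged = list(merged_in_key_order([part0, part1, part2]))
```

The memory cost is one buffered record per part, which is why this kind of merge needs no intermediate output as long as every part can be held open at once.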

Re: InputFormat for tarball

2008-02-19 Thread Doug Cutting
Goel, Ankur wrote: Hi All, Is there an input format available for reading from tarballs (.tar.gz files)? Not at present. There is support for reading .gz files, but not .tar files. A problem is that there's no way to read a chunk of such archives without reading everything

Re: [Hadoop Wiki] Update of "PoweredBy" by BehnamRezaei

2008-02-21 Thread Doug Cutting
Apache Wiki wrote: The following page has been changed by BehnamRezaei: [ ...] + * [http://www.netseer.com NetSeer] - I just saw that there will be a talk about NetSeer tonight in LA: http://www.linux.ucla.edu/ Will the slides be posted publicly? Thanks! Doug

Re: problem in jira?

2008-02-22 Thread Doug Cutting
Tom White wrote: Speaking of Jira issues, the workflow actions for https://issues.apache.org/jira/browse/HADOOP-2845 don't include Submit Patch - anyone seen this before? When the status is "InProgress" only the assignee can transition to "Patch Available". So you need to assign it to yoursel

Re: mapreduce does the wrong thing with dfs permissions?

2008-02-26 Thread Doug Cutting
It looks like the 'mapreduce' user does not have permission to list the job directory. Can you provide 'ls' output of that directory? Have you altered permission settings at all in your configuration? Doug Michael Bieniosek wrote: In this job, the namenode, the jobtracker, and the job submi

Re: mapreduce does the wrong thing with dfs permissions?

2008-02-26 Thread Doug Cutting
Hairong Kuang wrote: Got the problem. We assume that tasktrackers and jobtracker run as the same user. Something we might better emphasize in the documentation? Doug

Re: add incompatible flag to Jira

2008-03-11 Thread Doug Cutting
Nigel Daley wrote: 1. an incompatible checkbox This really only passes the buck, doesn't it? The real problem is correctly identifying all the incompatible changes. We already have a mechanism, a section in CHANGES.txt, but we don't always manage to correctly list all the incompatible chan

Re: How can I tell that I have "permission denied" error returned to me

2008-03-24 Thread Doug Cutting
This is a shortcoming of Hadoop RPC. Ideally exceptions thrown on the server would be re-thrown on the client, but the concern is that their class might not exist there, so we instead transmit just the class name and the error string and do not attempt to re-throw the original exception and in

Re: How can I tell that I have "permission denied" error returned to me

2008-03-24 Thread Doug Cutting
Olga Natkovich wrote: Doug, thanks. Would what I proposed be reasonable short term? Yes, that should work. I think it would be reasonable to fix RPC so that, if the exception class exists on the client and has a constructor with a single String argument, then the original exception will be r
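The fix proposed here (rebuild the original exception on the client only when its class is available there and can be constructed from a single string, otherwise fall back to a generic wrapper) can be sketched as follows. This is a Python illustration of the idea, not Hadoop RPC code; RemoteError, PermissionDeniedError, KNOWN_EXCEPTIONS and rebuild_exception are all hypothetical names introduced for the example.

```python
class RemoteError(Exception):
    """Fallback used when the original exception cannot be rebuilt."""

    def __init__(self, class_name, message):
        super().__init__(f"{class_name}: {message}")
        self.class_name = class_name
        self.message = message


class PermissionDeniedError(Exception):
    """Hypothetical exception type known to both client and server."""


# Exception classes available on this client, keyed by the transmitted name.
KNOWN_EXCEPTIONS = {"PermissionDeniedError": PermissionDeniedError}


def rebuild_exception(class_name, message):
    """Return the original exception type if the client has it, else RemoteError."""
    exc_class = KNOWN_EXCEPTIONS.get(class_name)
    if exc_class is None:
        # The class does not exist on the client: keep name and message only.
        return RemoteError(class_name, message)
    try:
        # Mirrors the "constructor with a single String argument" condition.
        return exc_class(message)
    except TypeError:
        return RemoteError(class_name, message)


known = rebuild_exception("PermissionDeniedError", "permission denied: /user/x")
unknown = rebuild_exception("ServerOnlyError", "something failed")
```

With this shape, client code can catch the specific type when it is available and still sees the server-side class name on the fallback path.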

Re: Gigablast.com search engine- 10BILLION PAGES!

2008-06-19 Thread Doug Cutting
Ted Dunning wrote: One way that this sort of statement can come out of a marketing person's mouth is if you scan 10 billion pages, decide that 95% of them will never appear on any results list and only actually index 500 million. The classic way to boost your count by an order of magnitude is t

Re: Determining "Hidden" Files in Hadoop

2008-06-27 Thread Doug Cutting
Lincoln Ritter wrote: Is there a defined standard for hidden files or a public interface for determining file visibility? MapReduce's FileInputFormat, and its many subclasses, ignore files and directories whose names begin with either "." or "_". However FsShell's 'ls' and 'lsr' commands do
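The naming convention described above, where entries whose names begin with "." or "_" are skipped as MapReduce input, can be sketched as a small filter. This is a Python illustration of the rule only, not the FileInputFormat implementation; is_hidden, visible_inputs and the sample paths are assumptions made for the example.

```python
import posixpath


def is_hidden(path):
    """Return True if the final path component starts with '.' or '_'."""
    name = posixpath.basename(path.rstrip("/"))
    return name.startswith((".", "_"))


def visible_inputs(paths):
    """Keep only the paths that the naming rule does not treat as hidden."""
    return [p for p in paths if not is_hidden(p)]


# Hypothetical listing of a job output directory.
listing = [
    "/out/part-00000",
    "/out/_logs",
    "/out/.part-00000.crc",
    "/out/part-00001",
]
visible = visible_inputs(listing)
```

Only the last path component is tested, so a visible file inside a hidden directory would still pass this sketch unless the directory itself is filtered first.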

Re: Determining "Hidden" Files in Hadoop

2008-06-27 Thread Doug Cutting
Lincoln Ritter wrote: I can see from the private hiddenFileFilter (used by listPaths) that '.' and '_' prefixed stuff is considered hidden, I just want to make sure that this is "standard". Yes, it is standard for mapreduce input and output directories. I'm working on getting Nutch 0.9 worki

Re: Question on HiddenFileFilter Patch

2008-06-27 Thread Doug Cutting
Michael Gottesman wrote: Hello. I finished the hidden file filter patch off of the latest trunk (in case you are curious here it is in pastie http://pastie.org/223722). Please develop this as an issue in Jira, attaching patches there. http://wiki.apache.org/hadoop/HowToContribute Looking at y

Re: porting HDFS to Zookeeper??

2008-06-30 Thread Doug Cutting
Sangmin Lee wrote: I heard that there were plans to port HDFS to Zookeeper to get high availability. Is this true? If so, could anyone let me know about the status of this effort? I have heard folks talk of this, but have heard of no concrete plans yet. The radical approach would be to

RMI based on Hadoop RPC

2008-07-21 Thread Doug Cutting
FYI, Jason Rutherglen has implemented RMI based on Hadoop RPC. https://issues.apache.org/jira/browse/LUCENE-1336 Doug

Re: [jira] Issue Comment Edited: (HADOOP-3860) Compare name-node performance when journaling is performed into local hard-drives or nfs.

2008-07-30 Thread Doug Cutting
Konstantin, I can't easily tell what you changed in this comment without re-reading the whole thing. Can you please add a new comment summarizing your update? Editing Jira comments doesn't really work very well for folks who are trying to follow a discussion... Doug

[VOTE] Release Hadoop 0.18.0 (candidate 0)

2008-08-04 Thread Doug Cutting
I've created a candidate build for Hadoop 0.18.0. http://people.apache.org/~cutting/hadoop-0.18.0-candidate-0/ Should we release this? Doug

Re: [VOTE] Release Hadoop 0.18.0 (candidate 0)

2008-08-04 Thread Doug Cutting
Doug Cutting wrote: Should we release this? -1 I just realized that this does not yet include a pre-built libhdfs, nor does it include release notes. Doug

new Hadoop Core committer: Steve Loughran

2008-08-07 Thread Doug Cutting
I'm pleased to announce that the Hadoop PMC has voted to add Steve Loughran as a Hadoop Core committer. Welcome, Steve! Doug

on the road next three weeks

2008-08-08 Thread Doug Cutting
I'll be away from home the next 3 weeks, visiting the UK with my family. I will attempt to keep up with email, but will have less time for code reviews, etc. until I return on 2 September. Doug

Re: ACTION NEEDED: Hadoop 0.17 Release Notes

2008-04-16 Thread Doug Cutting
Nigel Daley wrote: As you know, we've added a release note field in Jira from which we can build a reasonable set of user facing release notes. I always thought that the message in CHANGES.txt should be written for end-users. Isn't that the case? If so, shouldn't we strive to improve & bett

Re: ACTION NEEDED: Hadoop 0.17 Release Notes

2008-04-16 Thread Doug Cutting
Nigel Daley wrote: We've discussed all this before when the new fields were proposed. I know. And I never really got it then. I'm not trying to be difficult. Honest! 2. CHANGES.txt. We include one message per JIRA issue here, organized into sections. These are meant to be user-readable

Re: ACTION NEEDED: Hadoop 0.17 Release Notes

2008-04-16 Thread Doug Cutting
Nigel Daley wrote: But how do you get around the noise? Users don't want to be presented with every little doc/test/bug/improvement. But there are some significant improvements and bug fixes they need to know about. The CHANGES.txt can't address that IMO. Can't we put a section at the top

ongoing · Wide Finder 2

2008-05-01 Thread Doug Cutting
Anyone want to play? The goal is to find a small program that quickly computes some statistics over 45GB of log data on a 32-core box. Hadoop seems like a good candidate. Streaming? Pig? Java? http://www.tbray.org/ongoing/When/200x/2008/05/01/Wide-Finder-2 Doug

Re: Parallelism of sorts

2008-05-05 Thread Doug Cutting
Brice Arnould wrote: I was asking myself if it could be a good idea to parallelize some of the algorithms of Hadoop, such as MergeSorter, for the case of a single job run on a multicore system. One can already exploit parallelism on a multicore system by using "pseudo-distributed" mode and in

Re: design Q #2: why the JSP pages?

2008-05-08 Thread Doug Cutting
The general rule has been that urls that produce HTML are written in JSP, and urls that generate data are written as servlets. I don't think this has ever been explicitly discussed, it's just the way we've gone. (Note that Hadoop's JSP's are compiled offline, so that we don't require a java c

Re: Design Q: logging/nesting of exceptions

2008-05-08 Thread Doug Cutting
Steve Loughran wrote: Is this all a deliberate decision, or just an accidental policy that can be changed if someone is prepared to go through the code and make the changes? I'd go with something closer to an accidental policy. My suspicion is that logging framework didn't print nested excep

Re: Design Q: logging/nesting of exceptions

2008-05-09 Thread Doug Cutting
Steve Loughran wrote: OK, that means when I encounter them I can change them. Yes, please. And if you're feeling ambitious, you could change a bunch of them wholesale as a distinct issue. Doug

Re: Proposal to add Zookeeper as a new Hadoop subproject

2008-05-09 Thread Doug Cutting
Owen O'Malley wrote: I'd like to propose Zookeeper, which is a distributed coordination service, roughly similar to Google's Chubby, as a new Hadoop subproject. I think Zookeeper would be a great basis for both Map/Reduce and HDFS high availability. The current version at http://zookeeper.sour

Re: svn commit: r691799 - in /hadoop/core/trunk: CHANGES.txt src/contrib/fairscheduler/src/test/org/apache/hadoop/mapred/TestFairScheduler.java

2008-09-05 Thread Doug Cutting
[EMAIL PROTECTED] wrote: HADOOP-4050. Fix TestFailScheduler to use absolute paths for the work directory. (Matei Zaharia via omalley) s/Fail/Fair/g? Doug

Re: svn commit: r692248 - in /hadoop/core/trunk: ./ conf/ src/contrib/fairscheduler/src/test/org/apache/hadoop/mapred/ src/core/org/apache/hadoop/security/ src/examples/org/apache/hadoop/examples/ src

2008-09-05 Thread Doug Cutting
[EMAIL PROTECTED] wrote: + + queue.name + default Shouldn't this name be prefixed with "mapred."? Also, it might better be named something like "mapred.job.default.queue.name", since it's the default value for a job parameter, no? Doug

Re: git clone of hadoop core?

2008-09-12 Thread Doug Cutting
FYI, Jukka Zitting has been exploring using Git with Apache's SVN over on the [EMAIL PROTECTED] mailing list. http://markmail.org/message/fzzy7nepk7olx5fl I don't know how his methods compare to Owen's & Pat's. Doug Chris K Wensel wrote: Just curious if there has been any progress/changes on

Re: git clone of hadoop core?

2008-09-12 Thread Doug Cutting
Performance of svn.eu.apache.org should be better since the load is generally lower on that server. Git can place big demands on svn, and is thus discouraged on the primary server and encouraged on the eu mirror. Doug Owen O'Malley wrote: Ok, using the link that Doug sent out, I've managed to

Re: [jira] Issue Comment Edited: (HADOOP-4108) FileSystem support for POSIX access method

2008-09-16 Thread Doug Cutting
Pete Wyckoff (JIRA) wrote: wyckoff edited comment on HADOOP-4108 at 9/16/08 2:15 PM: Please refrain from editing Jira comments and descriptions. It makes the discussion very hard to follow. Descriptions should describe the problem and not present solutions. Comments should discuss solution

prohibit jira comment edits

2008-09-16 Thread Doug Cutting
Nigel Daley wrote: Should we perhaps only permit project admins to edit & remove comments? +1 To elaborate on my rationale: comments log a discussion. If folks are permitted to edit and remove their comments then they can make the subsequent comments of others irrelevant. Folks should use

Re: [jira] Issue Comment Edited: (HADOOP-4108) FileSystem support for POSIX access method

2008-09-16 Thread Doug Cutting
Chris Douglas wrote: +1 to the overall sentiment (broadened to deleting attachments from JIRA), but I'm -1 to preventing editing of comments unless it has a grace period of at least 15 minutes. Can you retract email for 15 minutes? Each comment generates an email, and many people follow comm

Re: prohibit jira comment edits

2008-09-18 Thread Doug Cutting
Steve Loughran wrote: Lots of people in apache have full JIRA admin rights that could ignore your permission restrictions without even noticing. I think the change to make would be to prohibit *anyone* from editing a comment, even an admin, and permit deletion of comments only by admins. (O

Re: prohibit jira comment edits

2008-09-23 Thread Doug Cutting
Owen O'Malley wrote: I'm in favor of leaving edits enabled and just use social engineering. Why have you changed your mind? Why do you think social engineering will work now? It has not in the past. Despite regular complaints over the years, edits have proceeded at a steady pace: http:

Re: Exception for HADOOP-4006 to go into feature froze 0.19 branch.

2008-09-23 Thread Doug Cutting
+1 I'm okay with this going into the branch. Doug Raghu Angadi wrote: HADOOP-4006 does some code reorg, mainly HDFS. It was ready to be committed for 0.19 before the 5 pm deadline for 0.19 last Friday. But I messed it up and committed it around 5:37pm. 0.19 was branched around 5:10 and I didn't

Re: API changes for 0.18 to 0.19

2008-09-26 Thread Doug Cutting
Owen O'Malley wrote: I've uploaded the current API changes to people: http://people.apache.org/~omalley/hadoop-0.19.0-dev/jdiff/changes.html Please take a chance to look over the diffs for unintentional changes. These contain broken links, due to a bug in our use of jdiff. I've fixed that,

Re: [Hadoop Wiki] Update of "HowToRelease" by DougCutting

2008-09-26 Thread Doug Cutting
Nigel Daley wrote: Doug, does "jdiff.stable" also need to be edited in the build.xml or set on cmd line? Good point. It should be updated in trunk after we've made a release, but not merged to branches, so branches always point to a prior branch's release. (It can't be updated until the "st

RPC versioning

2008-10-03 Thread Doug Cutting
It has been proposed in the discussions defining Hadoop 1.0 that we extend our back-compatibility policy. http://wiki.apache.org/hadoop/Release1.0Requirements Currently we only attempt to promise that application code will run without change against compatible versions of Hadoop. If one has

Re: [jira] Issue Comment Edited: (HADOOP-4044) Create symbolic links in HDFS

2008-10-06 Thread Doug Cutting
Raghu Angadi (JIRA) wrote: edit: minor. my first edit since Sept 16th.. Sigh. Why? I have no idea what you changed, I read the first version of your comment and now decline to re-read your entire comment. If you want me to know what you've changed you should add a new comment amending you

Re: CRC32 performance

2008-10-06 Thread Doug Cutting
How are you profiling? I don't trust most profilers. Have you tried, e.g., disabling checksums and seeing how much performance is actually gained? For the local filesystem, you can easily disable checksums by binding file: URI's to RawLocalFileSystem in your configuration. Doug Bryan Duxb

Re: CRC32 performance

2008-10-06 Thread Doug Cutting
Bryan Duxbury wrote: I am profiling with YourKit on random reducers. I'm also running on HDFS, so I don't know how one would go about disabling CRCs. Hack the CRC-computing code to fill things with zeros? Doug

Re: CRC32 performance

2008-10-07 Thread Doug Cutting
be sure I understand what I'd have to do: if I make it stop computing CRCs altogether, I need to make changes in the datanode as well, right? To stop checking validity of CRCs? Will this break anything interesting and unexpected? On Oct 6, 2008, at 4:58 PM, Doug Cutting wrote: Bryan D

Re: Fwd: Jira bug

2008-10-14 Thread Doug Cutting
If there are multiple attachments, and their names leave any ambiguity, then I always select the "All" tab and choose the attachment from the end of the comment log rather than the attachment pane, to make sure I get the most recently added, since the comment log is chronologically ordered and

Re: [Hadoop Wiki] Update of "HowToRelease" by DougCutting

2008-10-20 Thread Doug Cutting
Nigel Daley wrote: Doug, just getting back to this. Does this look right? http://wiki.apache.org/hadoop/HowToRelease?action=diff Looks good to me. Thanks! Doug

Re: Multi-language serialization discussion

2008-10-24 Thread Doug Cutting
Bryan Duxbury wrote: I've been reading the discussion about what serialization/RPC project to use on http://wiki.apache.org/hadoop/Release1.0Requirements, and I thought I'd throw in a pro-Thrift vote. I've been thinking about this, and here's where I've come to: It's not just RPC. We need a

Re: Multi-language serialization discussion

2008-10-24 Thread Doug Cutting
Chad Walters wrote: Re-open that discussion and I imagine you might get some interested parties. I think I just did, no? Bumping up a level, rather than inventing a whole new set of Hadoop-specific RPC and serialization mechanisms Whatever we use, we'd probably end up recycling much of Had

Re: Multi-language serialization discussion

2008-10-27 Thread Doug Cutting
Ted Dunning wrote: I don't think that it would be a major inconvenience in any of the major scripting languages to change the meaning of "open" to mean that you must read the IDL for a file, generate a reading script, load that and now be ready to read. This is a scripting language after all.

Re: Multi-language serialization discussion

2008-10-27 Thread Doug Cutting
Pete Wyckoff wrote: Fyi - Hadoop already has this for Java - in hive/serde/DynamicSerDe. This is exactly that and gives one the ability to read and write thrift and non-thrift data without compilation. Is this what you mean? http://svn.apache.org/repos/asf/hadoop/core/trunk/src/contrib/hive/

Re: Multi-language serialization discussion

2008-10-28 Thread Doug Cutting
Sanjay Radia wrote: I like the self describing data for the reasons you have stated. Q. I assume that in many cases the reader of some serialized data is expecting a particular data-definition (or versions of it). In this case the reader has the expected data-definition that was generated from t

Re: Feature Designs and Test Plans

2008-11-20 Thread Doug Cutting
Nigel Daley wrote: I propose that before we commit issues marked as "New Feature", they must have: 1. a design doc attachment 2. a test plan attachment (Templates to be provided for both) Clarifying test expectations is a good idea in principle. I gather that what you're after in (1) is

commit Forrest output?

2008-12-01 Thread Doug Cutting
We currently re-generate PDF and HTML documentation whenever we commit a documentation patch, which creates huge commit messages that few read. This was originally done so that folks who check out the sources from subversion did not need to install forrest in order to read the documentation.

Re: Hadoop Beijing Meeting has successfully concluded! www.hadooper.cn is ready now.

2008-12-01 Thread Doug Cutting
It looks as though you have translated some Hadoop documentation. Is that right? It would be great to get your translations included in Hadoop distributions. Perhaps you could convert these to Forrest format and attach them to an issue in Jira? I note that the Apache HTTPD project has many

Re: commit Forrest output?

2008-12-01 Thread Doug Cutting
Tsz Wo (Nicholas), Sze wrote: If we do (1) or (2), what should we do for making sure that the patch is good? Hudson already builds the Forrest documentation for each patch. Also, anyone committing a documentation patch should run 'ant docs' and inspect the Forrest output before committing.

Re: commit Forrest output?

2008-12-01 Thread Doug Cutting
Chris Douglas wrote: If we do (1) or (2), it would also be helpful to add the generated docs to the svn and git ignore lists. -C I think we'd just change build.xml to put the generated docs in build/ not in docs/. They'd get bundled in docs/ only in releases. Doug

Re: Maven vs Ivy

2008-12-02 Thread Doug Cutting
Ivy adds a feature to Ant while Maven replaces Ant. If folks are happy with Ant, and seek just to add dependency management, then Ivy's a good choice. Doug Ashish Thusoo wrote: Folks, I found that sometime back there was some discussion on which of these to use in order to manage dependenc

Re: [VOTE] 0.20 freeze extension

2008-12-05 Thread Doug Cutting
I'd love to see HADOOP-1230 in a release sooner rather than later. +1 Doug Owen O'Malley wrote: I've got hadoop-1230 code complete and I'm working through the bugs and I'd really like to get it in to 0.20. Can we push back the freeze by a week to 12 Dec? Thanks, Owen

Re: Point hadoop.com to apache.hadoop.org instead of apache.hadoop.org/core?

2008-12-10 Thread Doug Cutting
Done. Doug Jeff Hammerbacher wrote: Hey, I might be the only one who navigates to Hadoop via http://www.hadoop.com, but it would be useful to me to have that URL redirect to the Hadoop project site instead of the core site. I'm not sure who controls that redirect, or how others feel, and this

Re: non-apache projects

2008-12-15 Thread Doug Cutting
My instinct would be to add a link next to the Support link, or perhaps even as a separate section on the Support page. I think the rules should be as on the Support page: alphabetical listing, no endorsement implied, etc. We should remove things which do not directly provide added value to H

Re: commit Forrest output?

2008-12-19 Thread Doug Cutting
Hemanth Yamijala wrote: Is there a consensus on this topic, agreeing to not committing Forrest generated documentation ? I saw no votes against, so I think so. I filed an issue for this. https://issues.apache.org/jira/browse/HADOOP-4920 Should I commit all the files, or only the ones that t

Re: short-circuiting HDFS reads

2009-01-07 Thread Doug Cutting
Please see https://issues.apache.org/jira/browse/HADOOP-4801. Doug Jun Rao wrote: Hi, Today, HDFS always reads through a socket even when the data is local to the client. This adds a lot of overhead, especially for warm reads. It should be possible for a dfs client to test if a block to be re

Re: short-circuiting HDFS reads

2009-01-08 Thread Doug Cutting
Chris K Wensel wrote: Any comments on the probability (currently) that reads by a Task are over the network vs. being "local", as seen in your tests? That is, are 10% of block reads over the network, or 90% of reads? Greater than 90% of map reads are typically local in a sort job, like 98-99%

Re: Should we continue to support Windows?

2009-01-23 Thread Doug Cutting
Konstantin Shvachko wrote: HADOOP-5114. The problem seems to be in ConnectException. The rest is just how it is revealed by different servers/clients. If this is indeed just a single bug that's causing 30 tests to fail, I'm not sure it's yet time to abandon Windows compatibility. It would su

Re: Should we continue to support Windows?

2009-01-26 Thread Doug Cutting
Steve Loughran wrote: 4. It looks like nobody bothers to test the release regularly on windows. Um, we test releases before they are released. We don't need to test them any more regularly than that. It would be good to test trunk and perhaps active branches more regularly so that we're les

Re: Zeroconf for hadoop

2009-01-26 Thread Doug Cutting
Owen O'Malley wrote: allssh -h node1000-3000 bin/hadoop-daemon.sh start tasktracker and it will use ssh in parallel to connect to every node between node1000 and node3000. Ours is a mess, but it would be great if someone contributed a script like that. *smile* It would be a one-line change

Re: Hadoop 0.19.1

2009-02-06 Thread Doug Cutting
Sanjay Radia wrote: For me the lesson is that large complex projects should be branched. We already maintain release branches. What's under discussion is the maintenance of feature branches. We do this today through patch files, merging each time they are applied. The proposal is to use a

Re: Hadoop 0.19.1

2009-02-06 Thread Doug Cutting
Sanjay Radia wrote: On Feb 6, 2009, at 10:35 AM, Doug Cutting wrote: Commits to a feature branch should not require reviews, since these are equivalent to updating a patch. Agree, but it would be wise for the community to get their feedback to the project team earlier rather than later

Re: Hadoop 0.19.1

2009-02-06 Thread Doug Cutting
Doug Cutting wrote: Commits to a feature branch will send a message to the dev list, like any other commit. And when folks commit to a feature branch, they should reference the Jira issue id, as in any other commit, so that folks browsing Jira can see the commits. What would be sweet is if

Re: Hadoop 0.19.1

2009-02-06 Thread Doug Cutting
Konstantin Shvachko wrote: +1. I agree: no review requirement for feature branches, and 1-5. I would add to this (6) merging a feature branch to an official branch goes through regular patch process, that is, a new jira is created with the patch attachment, which now goes through the review proce

Re: anybody knows an apache-license-compatible impl of Integer.parseInt?

2009-02-10 Thread Doug Cutting
Zheng Shao wrote: Does anybody know an implementation that I can use for hive (apache license)? http://svn.apache.org/viewvc/harmony/enhanced/classlib/trunk/modules/luni/src/main/java/java/lang/Integer.java?revision=732988 Doug

Re: Hadoop 0.19.1

2009-02-18 Thread Doug Cutting
Steve Loughran wrote: One thing about Git is that it has a more laid back notion of what is "trunk"; you'd end up with more a blurred distro, with -maybe- the Y! production scheme, the Cloudera branch, the steves-modified-branch, etc, etc -with people able to pick and choose which to merge in.

Re: Hadoop with case-preservation and case-insensitivity

2009-03-05 Thread Doug Cutting
Paul Sheer wrote: I have the requirement to use Hadoop with case-insensitivity and case-preservation ala Windows. I think you may have difficulty convincing folks that Hadoop should directly support this mode of operation, and it's also a bad idea to run a hacked version of HDFS, since that

Re: creating a branch for Hadoop-3628

2009-03-05 Thread Doug Cutting
Steve Loughran wrote: Following up the discussion we had recently on doing big changes via Git versus hadoop branches, can we try this out by creating a branch for the service lifecycle stuff, HADOOP-3628 https://issues.apache.org/jira/browse/HADOOP-3628 That sounds fine to me. +1 We need

Re: creating a branch for Hadoop-3628

2009-03-05 Thread Doug Cutting
Owen O'Malley wrote: I think we should keep the branch structure flat, with just hadoop/core/branches/HADOOP-3638 I thought you'd raise that right after I sent that last message. I agree: there are tools out of our control that we'd like to use (like git-svn & eclipse) that assume the branc

Re: Hadoop with case-preservation and case-insensitivity

2009-03-05 Thread Doug Cutting
Paul Sheer wrote: Sorry if I gave the impression that Hadoop ought to support this feature in general. No, I was only asking about my own setup and I'm happy to maintain my own private branch. You didn't imply that Hadoop ought to support it. But maintaining your own private branch is a bad

Re: svn commit: r752949 [1/3] - in /hadoop/core: branches/HADOOP-3628/src/test/org/apache/hadoop/cli/ branches/HADOOP-3628/src/test/org/apache/hadoop/hdfs/ trunk/conf/ trunk/ivy/ trunk/src/contrib/str

2009-03-12 Thread Doug Cutting
ste...@apache.org wrote: Added: hadoop/core/trunk/src/core/org/apache/hadoop/util/MockService.java Shouldn't MockService go in the test/ tree? Doug

Re: svn commit: r755965 [1/2] - in /hadoop/core/branches/HADOOP-3628: ./ ivy/ src/contrib/streaming/src/test/org/apache/hadoop/streaming/ src/core/org/apache/hadoop/conf/ src/core/org/apache/hadoop/ht

2009-03-19 Thread Doug Cutting
ste...@apache.org wrote: Added: hadoop/core/branches/HADOOP-3628/src/core/org/apache/hadoop/util/MockService.java Again I ask: shouldn't this go in the test/ tree? Doug

Re: Design for security in Hadoop

2009-03-20 Thread Doug Cutting
Amandeep Khurana wrote: http://www.soe.ucsc.edu/~akhurana/Hadoop_Security.pdf How does this relate to the current proposal in Jira? https://issues.apache.org/jira/browse/HADOOP-4343 Doug

Re: Design for security in Hadoop

2009-03-25 Thread Doug Cutting
Amandeep Khurana wrote: 1. The Jira covers only authentication using Kerberos. I don't think Kerberos is the best way to do it since I feel the scalability is limited. All keys have to be negotiated by the Kerberos server. The design in HADOOP-4343 seeks to minimize the number of key negotiatio

Re: New Committer

2009-03-25 Thread Doug Cutting
Hasan Yusuf wrote: I would like to start contributing to the Hadoop Project. I am very much new to JIRA so I am not sure how the assignment of bugs work. Looking over all the outstanding issues, I would like to work on issue *HADOOP-4802*. Can th

Re: Optimizing Hadoop MR with File Based File Systems

2009-05-06 Thread Doug Cutting
Jonathan Seidman wrote: We've created an implementation of FileSystem which allows us to use Sector (http://sector.sourceforge.net/) as the backing store for Hadoop. This implementation is functionally complete, and we can now run Hadoop MapReduce jobs against data stored in Sector. Please con

Re: project split coming soon!

2009-05-15 Thread Doug Cutting
Jim Kellerman (POWERSET) wrote: I have to agree with Dhruba. I don't see the need to split up committers (esp if they are on the PMC). PMC members are moot here, since all PMC members have permission to write anywhere in the Hadoop tree. So the question is whether all non-PMC committers to C

Re: project split coming soon!

2009-05-15 Thread Doug Cutting
Dhruba Borthakur wrote: The goal of my earlier email was to keep the committer community together instead of splitting. The goal of splitting the project is in part to split the community. The Core Jira traffic is currently higher than many folks can easily follow. By splitting the project w

Re: project split coming soon!

2009-05-18 Thread Doug Cutting
Steve Loughran wrote: Is one of the intents of the split to move things to different release cycles? Long-term, yes. The hope is that we can stabilize APIs, RPC wire formats, etc. to the degree that parts can be independently upgraded. Doug

Re: [VOTE] freeze date for Hadoop 0.21

2009-05-27 Thread Doug Cutting
Owen O'Malley wrote: I'd like to propose a code freeze and branch date of 7/31. One major exception is for HDFS file append, which I think we need in 0.21 and will take longer than that. So will we have append turned on by default at the time of the branch, so that bugfixes against it are per

Re: 0.19.2 release needed

2009-05-27 Thread Doug Cutting
Scott Carey wrote: I would like to see a 0.19.2 release soon. Any committer can build a release candidate and call a release vote. Are there any committers sympathetic to Scott who would like to volunteer to drive the 0.19.2 release forward? I think Y! skipped directly from 0.18 to 0.20, an

Re: Project split postponed until 6/1

2009-06-01 Thread Doug Cutting
Tom White wrote: I think we (committers) should aim to keep the queue short, by not leaving patches in the "Patch Available" state for more than a few days. +100 So I suggest that we take advantage of the project split to clear out the patch queue, and to keep it short in future. It might be

Re: more information about project split

2009-06-22 Thread Doug Cutting
Raghu Angadi wrote: I would like to receive the updates (at least the ones with comments) without having to watch each of them. +1 The full process should be logged in email. Doug

Re: more information about project split

2009-06-23 Thread Doug Cutting
Owen O'Malley wrote: I think the community is better served by having a mailing list that is dominated by people posting rather than a deluge of jira traffic. This is a somewhat false dichotomy: Jira messages are postings by people. Folks should not make changes in Jira without realizing thi

Re: PROJECTS SPLIT

2009-06-23 Thread Doug Cutting
Sharad Agarwal wrote: Also scripts are only available in common. Should we make an external link in hdfs and mapreduce for "bin" folder pointing to "common/trunk/bin" for now ? FWIW, the way Nutch has long handled this is that Hadoop included the scripts in the jar file, and the Nutch build.xm

Re: more information about project split

2009-06-24 Thread Doug Cutting
Amr Awadallah wrote: I can't set email filters for which jiras I am interested in getting full updates on, that would mean I have to set an additional filter for each jira ticket one by one, not very scalable. Is that what you are suggesting? I think all that Dhruba is suggesting is that if a dev

Re: more information about project split

2009-06-26 Thread Doug Cutting
Amr Awadallah wrote: To re-iterate, not all JIRAs are important to me, there are some key ones that I would like to get all updates on, and there are others that I would just like to check once in a while but don't really have capacity to be getting email updates for. How do we accommodate that?

Re: New subproject logos

2009-06-26 Thread Doug Cutting
I like them except I think they should be all lowercase, to be consistent with the style of the existing Hadoop logo. Doug Nigel Daley wrote: Here are some logos for the new subprojects http://www.flickr.com/photos/88199...@n00/3661433605/ Please vote +1 if you like 'em and -1 if you don't.

Re: Fwd: more information about project split

2009-06-26 Thread Doug Cutting
Nigel Daley wrote: Doug, given Owen's away this week, can you update the Jira config to implement (4). Until this is done, the patch testing process can't see when Jiras change states and thus nothing is getting tested. The first step is creating the -issues lists. I just filed a Jira for t
