[Hadoop Wiki] Update of "HowToCommit" by MartonElek
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToCommit" page has been changed by MartonElek:
https://wiki.apache.org/hadoop/HowToCommit?action=diff&rev1=42&rev2=43

Comment:
fix man doc generation (from ant to mvn)

  The end user documentation is maintained in the main repository (hadoop.git) and the results are committed to the hadoop-site repository during each release. The website itself is managed in the hadoop-site.git repository (both the source and the rendered form).
- To commit end-user documentation changes to trunk or a branch, ask the user to submit only changes made to the *.xml files in {{{src/docs}}}. Apply that patch, run {{{ant docs}}} to generate the html, and then commit. End-user documentation is only published to the web when releases are made, as described in HowToRelease.
+ To commit end-user documentation, create a patch as usual that modifies the content of the src/site directory of any Hadoop project (e.g. ./hadoop-common-project/hadoop-auth/src/site). You can regenerate the docs with {{{mvn site}}}. End-user documentation is only published to the web when releases are made, as described in HowToRelease.
- To commit changes to the website and re-publish them: {{{
+ To commit changes to the website and re-publish them:
+ {{{
  git clone https://gitbox.apache.org/repos/asf/hadoop-site.git -b asf-site
  #edit site under ./src
  hugo

@@ -101, +102 @@

  The commit will be reflected on the Apache Hadoop site automatically.
- Note: you can check the rendering locally with 'hugo serve && firefox http://localhost:1313'
+ Note: you can check the rendering locally with {{{hugo serve && firefox http://localhost:1313}}}

  == Patches that break HDFS, YARN and MapReduce ==

---
To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToCommit" by MartonElek
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToCommit" page has been changed by MartonElek:
https://wiki.apache.org/hadoop/HowToCommit?action=diff&rev1=41&rev2=42

Comment:
Update site generation

   * [[http://www.apache.org/dev/new-committers-guide.html|Apache New Committer Guide]]
   * [[http://www.apache.org/dev/committers.html|Apache Committer FAQ]]

- The first act of a new core committer is typically to add their name to the [[http://hadoop.apache.org/common/credits.html|credits]] page. This requires changing the XML source in http://svn.apache.org/repos/asf/hadoop/common/site/main/author/src/documentation/content/xdocs/who.xml. Once done, update the Hadoop website as described [[#Documentation|here]].
+ The first act of a new core committer is typically to add their name to the [[http://hadoop.apache.org/common/credits.html|credits]] page. This requires changing the site source in https://github.com/apache/hadoop-site/blob/asf-site/src/who.md. Once done, update the Hadoop website as described [[#Documentation|here]] (TL;DR: don't forget to regenerate the site with hugo, and commit the generated results, too).

  == Review ==

@@ -79, +79 @@

  <> Committing Documentation
- Hadoop's official documentation is authored using [[http://forrest.apache.org/|Forrest]]. To commit documentation changes you must have Apache Forrest installed, and set the forrest directory in your {{{$FORREST_HOME}}}. Note that the current version ([[http://archive.apache.org/dist/forrest/0.9/apache-forrest-0.9.tar.gz|0.9]]) works properly with Java 8. Documentation is of two types:
+ Hadoop's official documentation is authored using [[https://gohugo.io/|Hugo]]. To commit documentation changes you must have Hugo installed (a single binary is available for all platforms, and it is part of the usual package repositories: brew/pacman/yum...). Documentation is of two types:
+
  1. End-user documentation, versioned with releases; and,
- 1. The website. This is maintained separately in subversion, republished as it is changed.
+ 1. The website.
+
+ The end user documentation is maintained in the main repository (hadoop.git) and the results are committed to the hadoop-site repository during each release. The website itself is managed in the hadoop-site.git repository (both the source and the rendered form).

  To commit end-user documentation changes to trunk or a branch, ask the user to submit only changes made to the *.xml files in {{{src/docs}}}. Apply that patch, run {{{ant docs}}} to generate the html, and then commit. End-user documentation is only published to the web when releases are made, as described in HowToRelease.

  To commit changes to the website and re-publish them:
  {{{
- svn co https://svn.apache.org/repos/asf/hadoop/common/site
- cd site/main
- $FORREST_HOME/tools/ant/bin/ant -Dforrest.home=$FORREST_HOME # Newer versions of Ant do not work. Use the Ant bundled with Forrest.
- firefox publish/index.html # preview the changes
- svn stat # check for new pages
- svn add # add any new pages
- svn commit
+ git clone https://gitbox.apache.org/repos/asf/hadoop-site.git -b asf-site
+ #edit site under ./src
+ hugo
+ # add both the ./src and ./content directories (source and rendered version)
+ git add .
+ git commit
+ git push
  }}}

  The commit will be reflected on the Apache Hadoop site automatically.
+
+ Note: you can check the rendering locally with 'hugo serve && firefox http://localhost:1313'

  == Patches that break HDFS, YARN and MapReduce ==
[Hadoop Wiki] Update of "HowToRelease" by MartonElek
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToRelease" page has been changed by MartonElek:
https://wiki.apache.org/hadoop/HowToRelease?action=diff&rev1=100&rev2=101

Comment:
HADOOP-15205, dist profile is required to upload sources to the maven repo

  1. Push branch-X.Y.Z and the newly created tag to the remote repo.
  1. Deploy the maven artifacts, on your personal computer. Please be sure you have completed the prerequisite step of preparing the {{{settings.xml}}} file before the deployment. You might want to do this in private and clear your history file, as your gpg-passphrase is in clear text.
  {{{
- mvn deploy -Psign -DskipTests -DskipShade
+ mvn deploy -Psign,dist -DskipTests -DskipShade
  }}}
  1. Copy release files to a public place and ensure they are readable. Note that {{{home.apache.org}}} only supports SFTP, so this may be easier with a graphical SFTP client like Nautilus, Konqueror, etc.
  {{{
[Hadoop Wiki] Update of "HadoopJavaVersions" by AkiraAjisaka
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HadoopJavaVersions" page has been changed by AkiraAjisaka:
https://wiki.apache.org/hadoop/HadoopJavaVersions?action=diff&rev1=33&rev2=34

+ Moved to Confluence Wiki: https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Java+Versions
+
+ The following contents are deprecated.
+
  = Hadoop Java Versions =
  Version 2.7 and later of Apache Hadoop requires Java 7. It is built and tested on both OpenJDK and Oracle (HotSpot)'s JDK/JRE.
[Hadoop Wiki] Update of "HowToCommit" by AkiraAjisaka
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToCommit" page has been changed by AkiraAjisaka:
https://wiki.apache.org/hadoop/HowToCommit?action=diff&rev1=40&rev2=41

Comment:
Git repository is moved, changing the URL

  == Commit individual patches ==
- Hadoop uses git for the main source. The writable repo is at https://git-wip-us.apache.org/repos/asf/hadoop.git
+ Hadoop uses git for the main source. The writable repo is at https://gitbox.apache.org/repos/asf/hadoop.git

  Initial setup

  We try to keep our history linear and avoid merge commits. To this end, we highly recommend using git pull --rebase. In general, it is a good practice to have this ''always'' turned on. If you haven't done so already, you should probably run the following:
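The recommended command is cut off at the end of this message. As a hedged sketch (the exact command is an assumption about the page's intent), repo-local settings like these would make `git pull` rebase by default, keeping history linear as recommended:

```shell
# Hedged sketch: the exact recommended command is truncated in the mail above,
# so these settings are an assumption about its intent: make `git pull` rebase
# instead of merge, so pulled history stays linear.
git init -q hadoop-scratch                                   # stand-in for your hadoop.git clone
git -C hadoop-scratch config pull.rebase true                # rebase on every `git pull`
git -C hadoop-scratch config branch.autosetuprebase always   # new branches inherit the behaviour
git -C hadoop-scratch config pull.rebase                     # prints: true
```

Setting these per-repository (rather than with `--global`) keeps the behaviour scoped to the Hadoop clone.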
[Hadoop Wiki] Update of "GitAndHadoop" by AkiraAjisaka
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "GitAndHadoop" page has been changed by AkiraAjisaka:
https://wiki.apache.org/hadoop/GitAndHadoop?action=diff&rev1=26&rev2=27

Comment:
Fix typo

  Content moved to https://cwiki.apache.org/confluence/display/HADOOP/Git+And+Hadoop
- Please email common-...@hadoop.apache.org for cwiki access. 
+ Please email common-...@hadoop.apache.org for cwiki access.
[Hadoop Wiki] Update of "HowToContribute" by AkiraAjisaka
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToContribute" page has been changed by AkiraAjisaka:
https://wiki.apache.org/hadoop/HowToContribute?action=diff&rev1=119&rev2=120

Comment:
Fix url

  = How to Contribute to Hadoop =
- Content moved to Confluence - https://cwiki.apache.org/confluence/display/HADOOP/HowToContribute
+ Content moved to Confluence - https://cwiki.apache.org/confluence/display/HADOOP/How+To+Contribute

  Email common-...@hadoop.apache.org if you need write access to the cwiki.
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Books" page has been changed by Packt Publishing:
https://wiki.apache.org/hadoop/Books?action=diff&rev1=54&rev2=55

  # Please don't have tracking URLs. We'll only cut them.
  }}}

+ === Hands-On Big Data Processing with Hadoop 3 (Video) ===
+
+ '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hands-big-data-processing-hadoop-3-video|Hands-On Big Data Processing with Hadoop 3 (Video)]]
+
+ '''Author:''' Sudhanshu Saxena
+
+ '''Publisher:''' Packt
+
+ '''Date of Publishing:''' October 2018
+
+ Perform real-time data analytics, stream and batch processing on your application using Hadoop
+
  === Modern Big Data Processing with Hadoop ===

  '''Name:''' [[https://www.amazon.com/dp/B0787KY8RH/|Modern Big Data Processing with Hadoop]]
[Hadoop Wiki] Update of "HowToRelease" by AkiraAjisaka
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToRelease" page has been changed by AkiraAjisaka:
https://wiki.apache.org/hadoop/HowToRelease?action=diff&rev1=99&rev2=100

  {{{
  svn ci -m "Publishing the bits for release ${version}"
  }}}
+ 1. Usually the binary tarball is larger than 300MB, so it cannot be directly uploaded to the distribution directory. Use the dev directory (https://dist.apache.org/repos/dist/dev/hadoop/) first, and then move it to the distribution directory with {{{svn move}}}.
  1. Update upstream branches to make them aware of this new release:
  1. Copy and commit the CHANGES.md and RELEASENOTES.md:
  {{{
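The {{{svn move}}} step above is a server-side move, so the large tarball is not re-transferred. This sketch only assembles the command for review and prints it; the version number and the target subdirectory are illustrative assumptions, so nothing is actually moved when you run it:

```shell
# Hedged sketch: assemble (but do not run) the server-side `svn move` that
# promotes the binary tarball from the dev tree to the release tree.
# VERSION and the release subdirectory are illustrative assumptions.
VERSION=3.1.1
DEV="https://dist.apache.org/repos/dist/dev/hadoop"
REL="https://dist.apache.org/repos/dist/release/hadoop/common"
CMD="svn move -m \"Publishing the bits for release ${VERSION}\" ${DEV}/hadoop-${VERSION} ${REL}/hadoop-${VERSION}"
echo "$CMD"   # review, then run it yourself; the move happens on the server, no re-upload
```

Because both arguments are repository URLs, Subversion performs the move as a single server-side commit.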
[Hadoop Wiki] Update of "HowToRelease" by MartonElek
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToRelease" page has been changed by MartonElek:
https://wiki.apache.org/hadoop/HowToRelease?action=diff&rev1=98&rev2=99

  mvn versions:set -DnewVersion=X.Y.Z
  }}}
- Note: Please also update the hadoop.version property in the root pom.xml (see HADOOP-15369)
+ Note: Please also update the hadoop.version property in the root pom.xml and hadoop.assemblies.version in hadoop-project/pom.xml (see HADOOP-15369)
+
+ {{{
+ mvn versions:set-property -Dproperty=hadoop.version -DnewVersion=X.Y.Z
+ mvn versions:set-property -Dproperty=hadoop.assemblies.version -DnewVersion=X.Y.Z
+ }}}

  Now, for any branches in {trunk, branch-X, branch-X.Y, branch-X.Y.Z} that have changed, push them to the remote repo taking care of any conflicts.
[Hadoop Wiki] Update of "HowToRelease" by MartonElek
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToRelease" page has been changed by MartonElek:
https://wiki.apache.org/hadoop/HowToRelease?action=diff&rev1=97&rev2=98

Comment:
Reminder to change hadoop.version

  mvn versions:set -DnewVersion=X.Y.Z
  }}}
+ Note: Please also update the hadoop.version property in the root pom.xml (see HADOOP-15369)
+
  Now, for any branches in {trunk, branch-X, branch-X.Y, branch-X.Y.Z} that have changed, push them to the remote repo taking care of any conflicts.

  {{{
  git push
  }}}
[Hadoop Wiki] Update of "HowToRelease" by MartonElek
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "HowToRelease" page has been changed by MartonElek:
https://wiki.apache.org/hadoop/HowToRelease?action=diff&rev1=96&rev2=97

Comment:
Update site generation part.

  1. effect the release of artifacts by selecting the staged repository and then clicking {{{Release}}}
  1. If there were multiple RCs, simply drop the staging repositories corresponding to failed RCs.
  1. Wait 24 hours for release to propagate to mirrors.
- 1. Edit the website.
+ 1. Edit the website (generic docs about the new website generation can be found [[https://cwiki.apache.org/confluence/display/HADOOP/How+to+generate+and+push+ASF+web+site+after+HADOOP-14163|here]])
   1. Checkout the website if you haven't already
   {{{
-  svn co https://svn.apache.org/repos/asf/hadoop/common/site/main hadoop-common-site
+  git clone https://gitbox.apache.org/repos/asf/hadoop-site.git -b asf-site
   }}}
-  1. Update the documentation links in {{{author/src/documentation/content/xdocs/site.xml}}}.
-  1. Update the release news in {{{author/src/documentation/content/xdocs/releases.xml}}}.
-  1. Update the news on the home page {{{author/src/documentation/content/xdocs/index.xml}}}.
+  1. [[https://gohugo.io/getting-started/installing/|Install hugo]] if you haven't already (TL;DR: apt-get install / pacman -S / brew install hugo)
+  1. Create the new release announcement
+  {{{
+ cat << EOF > src/release/${VERSION}.md
+ ---
+ title: Release ${VERSION} available
+ date: 201X-XX-XX
+ linked: true
+ ---
+
+ This is the first stable release of Apache Hadoop TODO line. It contains TODO bug fixes, improvements and enhancements since TODO.
+
+ Users are encouraged to read the [overview of major changes][1] since TODO.
+ For details of 435 bug fixes, improvements, and other enhancements since the previous TODO release,
+ please check the [release notes][2] and [changelog][3], which detail the changes since TODO.
+
+ [1]: /docs/r${VERSION}/index.html
+ [2]: http://hadoop.apache.org/docs/r${VERSION}/hadoop-project-dist/hadoop-common/release/${VERSION}/RELEASENOTES.${VERSION}.html
+ [3]: http://hadoop.apache.org/docs/r${VERSION}/hadoop-project-dist/hadoop-common/release/${VERSION}/CHANGES.${VERSION}.html
+ EOF
+ }}}
+  1. Note: update all the TODOs and the date. '''Don't use a date in the future''', as it won't be rendered.
+  1. Remove the {{{linked: true}}} line from the previous release file, e.g. from src/release/3.0.0.md. Docs/downloads of the releases with {{{linked: true}}} will be linked from the menu.
-  1. Copy the new release docs to svn and update the {{{docs/current}}} link, by doing the following:
+  1. Add the docs and update the {{{content/docs/current}}} link, by doing the following:
   {{{
-  cd publish/docs
+  cd content/docs
   tar xvf /path/to/hadoop-${version}-site.tar.gz
   # Update current2, current, stable and stable2 as needed.
   # For example
@@ -191, +226 @@
   ln -s current2 current
   }}}
   1. Similarly update the symlinks for stable if need be.
-  1. Add the documentation changes.
+  1. Check the rendering of the new site: {{{hugo serve && firefox http://localhost:1313}}}
+  1. Regenerate the site, review it, then commit it per the instructions in HowToCommit. (The generated HTML files should also be committed; both src and the rendered site are in the same repo.)
   {{{
+ hugo
+ git add .
+ git commit
+ git push
-  svn add publish/docs/r${version}
-  }}}
-  1. Regenerate the site, review it, then commit it per the instructions in HowToCommit.
-  {{{
-  svn commit -m "Updated site for release X.Y.Z."
   }}}
  1. Send announcements to the user and developer lists once the site changes are visible.
  1. --(In JIRA, close issues resolved in the release. Disable mail notifications for this bulk change.)-- Recommend '''not''' closing, since it prevents JIRAs from being edited and makes it more difficult to track backports.
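The announcement step above can be exercised locally before touching the real site repo. In this sketch the version, date, and body text are placeholders (assumptions for illustration, not real release data):

```shell
# Hedged sketch of the release-announcement step above; VERSION, the date and
# the body text are illustrative placeholders, not real release data.
VERSION=3.1.1
mkdir -p src/release
cat << EOF > "src/release/${VERSION}.md"
---
title: Release ${VERSION} available
date: 2018-01-01
linked: true
---

Placeholder body: fill in the TODOs from the wiki template above.

[1]: /docs/r${VERSION}/index.html
EOF
grep "^title:" "src/release/${VERSION}.md"   # prints: title: Release 3.1.1 available
```

Because the heredoc delimiter is unquoted, `${VERSION}` is expanded when the file is written, which is exactly how the wiki template injects the version into the front matter.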
[Hadoop Wiki] Update of "ContributorsGroup" by SteveLoughran
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "ContributorsGroup" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/ContributorsGroup?action=diff&rev1=118&rev2=119

Comment:
remove "Packt Publishing" because of random Python book spam

   * OtisGospodnetic
   * OwenOMalley
   * Pacoffre
-  * Packt Publishing
   * PatrickHunt
   * PatrickKling
   * Paul Broenen
[Hadoop Wiki] Update of "Packt Publishing" by SteveLoughran
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Packt Publishing" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/Packt%20Publishing

Comment:
tell packt publishing they've been locked out

New page:
Hi, if this is your account, I've locked you out from editing for a while, because that last book about Python and tk was clearly not Hadoop related.

I'll re-enable you in a month or two, or you can work out my email address and we can discuss what is acceptable.

thanks.

SteveLoughran
[Hadoop Wiki] Update of "Books" by SteveLoughran
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Books" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/Books?action=diff&rev1=53&rev2=54

Comment:
cut a random python book out and about to lock down packt for a bit as punishment

  {{{#!wiki comment/dotted
  Attention people adding new entries.
+ # Only reference books about Hadoop and related programs, not random PHP stuff.
  # Please include publishing date and version of Hadoop the book is relevant to.
  # Please write this in a neutral voice, not "this book will help you", as that implies that the ASF has opinions on the matter. Someone will just edit the claims out.
@@ -15, +16 @@
  # Please don't have tracking URLs. We'll only cut them.
  }}}

- === Python GUI programming with Tkinter ===
-
- '''Name:''' [[https://www.amazon.com/dp/1788835883/|Python GUI Programming with Tkinter]]
-
- '''Author:''' Alan D. Moore
-
- '''Publisher:''' Packt
-
- '''Date of Publishing:''' May 2018
-
- Find out how to create visually stunning and feature-rich applications by empowering Python's built-in Tkinter GUI toolkit
-
  === Modern Big Data Processing with Hadoop ===

  '''Name:''' [[https://www.amazon.com/dp/B0787KY8RH/|Modern Big Data Processing with Hadoop]]
[Hadoop Wiki] Update of "GithubIntegration" by ArpitAgarwal
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "GithubIntegration" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/GithubIntegration?action=diff&rev1=2&rev2=3

Comment:
Removing content and leaving link to cwiki where new content resides.

- = Github Setup and Pull Requests (PRs) =
+ Content moved to https://cwiki.apache.org/confluence/display/HADOOP/GitHub+Integration
- There are several ways to set up Git for committers and contributors. Contributors can safely set up Git any way they choose, but committers should take extra care since they can push new commits to the trunk at Apache, and various policies there make backing out mistakes problematic. To keep the commit history clean, take note of the use of `--squash` below when merging into `apache/trunk`.
+ Please email common-...@hadoop.apache.org for cwiki access.
- == Git setup for Committers ==
-
- This describes setup for one local repo and two remotes. It allows you to push the code on your machine to either your Github repo or to git-wip-us.apache.org. You will want to fork github's apache/hadoop to your own account on github; this will enable Pull Requests of your own. Cloning this fork locally will set up "origin" to point to your remote fork on github as the default remote. So if you perform `git push origin trunk` it will go to github.
-
- To attach to the apache git repo do the following:
-
- {{{
- git remote add apache https://git-wip-us.apache.org/repos/asf/hadoop.git
- }}}
-
- To check your remote setup:
-
- {{{
- git remote -v
- }}}
-
- you should see something like this:
-
- {{{
- origin    https://github.com/your-github-id/hadoop.git (fetch)
- origin    https://github.com/your-github-id/hadoop.git (push)
- apache    https://git-wip-us.apache.org/repos/asf/hadoop.git (fetch)
- apache    https://git-wip-us.apache.org/repos/asf/hadoop.git (push)
- }}}
-
- Now if you want to experiment with a branch, everything, by default, points to your github account because `origin` is the default remote. You can work as normal using only github until you are ready to merge with the apache remote. Some conventions will integrate with Apache Jira ticket numbers.
-
- {{{
- git checkout -b feature/hadoop-<jira> # typically <jira> is a Jira ticket number
- #do some work on the branch
- git commit -a -m "doing some work"
- git push origin feature/hadoop-<jira> # notice pushing to **origin** not **apache**
- }}}
-
- Once you are ready to commit to the apache remote, you can merge and push them directly, or better yet create a PR.
-
- We recommend creating new branches under `feature/` to help group ongoing work, especially now that, as of November 2015, forced updates are disabled on ASF branches. We hope to reinstate that ability on feature branches to aid development.
-
- == How to create a PR (committers) ==
-
- Push your branch to Github:
-
- {{{
- git checkout feature/hadoop-<jira>
- git rebase apache/trunk # to make it apply to the current trunk
- git push origin feature/hadoop-<jira>
- }}}
-
- 1. Go to your `feature/hadoop-<jira>` branch on Github. Since you forked it from Github's `apache/hadoop`, it will default any PR to go to `apache/trunk`.
- 1. Click the green "Compare, review, and create pull request" button.
- 1. You can edit the to and from for the PR if it isn't correct. The "base fork" should be `apache/hadoop` unless you are collaborating separately with one of the committers on the list. The "base" will be trunk. Don't submit a PR to one of the other branches unless you know what you are doing. The "head fork" will be your forked repo and the "compare" will be your `feature/hadoop-<jira>` branch.
- 1. Click the "Create pull request" button and name the request "HADOOP-<jira>", all caps. This will connect the comments of the PR to the mailing list and Jira comments.
-
- From now on the PR lives on github's `apache/hadoop` repository. You use the commenting UI there.
-
- If you are looking for a review or sharing with someone else, say so in the comments, but don't worry about automated merging of your PR; you will have to do that later. The PR is tied to your branch, so you can respond to comments, make fixes, and commit them from your local repo. They will appear on the PR page and be mirrored to Jira and the mailing list. When you are satisfied and want to push it to Apache's remote repo, proceed with Merging a PR.
-
- == How to create a PR (contributors) ==
-
- Create pull requests: [[https://help.github.com/articles/creating-a-pull-request|GitHub PR docs]].
-
- Pull requests are made to the apache/hadoop repository on Github. In the Github UI you should pick the trunk branch to target the PR, as described for committers. This will be reviewed and commented on, so the merge is not automatic. This can be used for discussing a contribution in progress.
-
- == Merging a PR (yours or contributors) ==
-
- Start with reading -
[Hadoop Wiki] Update of "GitAndHadoop" by ArpitAgarwal
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "GitAndHadoop" page has been changed by ArpitAgarwal:
https://wiki.apache.org/hadoop/GitAndHadoop?action=diff&rev1=25&rev2=26

Comment:
Removing content and leaving link to cwiki where new content resides.

- = Git And Hadoop =
+ Content moved to https://cwiki.apache.org/confluence/display/HADOOP/Git+And+Hadoop
- A lot of people use Git with Hadoop because they have their own patches to make to Hadoop, and Git helps them manage it.
+ Please email common-...@hadoop.apache.org for cwiki access.
-
-  * GitHub provide some good lessons on git at [[http://learn.github.com]]
-  * Apache serves up read-only Git versions of their source at [[http://git.apache.org/]]. Committers can commit changes to the writable Git repository. See HowToCommit
-
- This page tells you how to work with Git. See HowToContribute for instructions on building and testing Hadoop.
- <>
-
- == Key Git Concepts ==
- The key concepts of Git:
-
-  * Git doesn't store changes, it snapshots the entire source tree. Good for fast switch and rollback, bad for binaries. (As an enhancement, if a file hasn't changed, it doesn't re-replicate it.)
-  * Git stores all "events" as SHA1-checksummed objects; you have deltas, tags and commits, where a commit describes the status of items in the tree.
-  * Git is very branch-centric; you work in your own branch off local or central repositories.
-  * You had better enjoy merging.
-
- == Checking out the source ==
-
- You need a copy of git on your system. Some IDEs ship with Git support; this page assumes you are using the command line.
-
- Clone a local Git repository from the Apache repository. The Hadoop subprojects (common, HDFS, and MapReduce) live inside a combined repository called `hadoop.git`.
-
- {{{
- git clone git://git.apache.org/hadoop.git
- }}}
-
- '''Committers:''' for read/write access use
- {{{
- https://git-wip-us.apache.org/repos/asf/hadoop.git
- }}}
-
- The total download is a few hundred MB, so the initial checkout process works best when the network is fast. Once downloaded, Git works offline, though you will need to perform your initial builds online so that the build tools can download dependencies.
-
- == Grafts for complete project history ==
-
- The Hadoop project has undergone some movement in where its component parts have been versioned. Because of that, commands like `git log --follow` need a little help. To graft the history back together into a coherent whole, insert the following contents into `hadoop/.git/info/grafts`:
-
- {{{
- # Project split
- 5128a9a453d64bfe1ed978cf9ffed27985eeef36 6c16dc8cf2b28818c852e95302920a278d07ad0c
- 6a3ac690e493c7da45bbf2ae2054768c427fd0e1 6c16dc8cf2b28818c852e95302920a278d07ad0c
- 546d96754ffee3142bcbbf4563c624c053d0ed0d 6c16dc8cf2b28818c852e95302920a278d07ad0c
- # Project un-split in new writable git repo
- a196766ea07775f18ded69bd9e8d239f8cfd3ccc 928d485e2743115fe37f9d123ce9a635c5afb91a
- cd66945f62635f589ff93468e94c0039684a8b6d 77f628ff5925c25ba2ee4ce14590789eb2e7b85b
- }}}
-
- You can then use commands like `git blame --follow` with success.
-
- == Forking onto GitHub ==
-
- You can create your own fork of the ASF project. This is required if you want to contribute patches by submitting pull requests. However, you can choose to skip this step and attach patch files directly on Apache Jiras.
-
-  1. Create a GitHub login at http://github.com/ ; add your public SSH keys
-  1. Go to https://github.com/apache/hadoop/
-  1. Click fork in the github UI. This gives you your own repository URL.
-  1. In the existing clone, add the new repository:
-  {{{git remote add -f github g...@github.com:MYUSERNAMEHERE/hadoop.git}}}
-
- This gives you a local repository with two remote repositories: {{{origin}}} and {{{github}}}. {{{origin}}} has the Apache branches, which you can update whenever you want to get the latest ASF version:
-
- {{{
- git checkout -b trunk origin/trunk
- git pull origin
- }}}
-
- Your own branches can be merged with trunk, and pushed out to GitHub. To generate patches for attaching to Apache JIRAs, check everything in to your specific branch, merge that with (a recently pulled) trunk, then diff the two:
- {{{ git diff trunk > ../hadoop-patches/HADOOP-XYX.patch }}}
-
- == Branching ==
-
- Git makes it easy to branch. The recommended process for working with Apache projects is: one branch per JIRA issue. That makes it easy to isolate development and track the development of each change. It does mean if you have your own branch that you release, one that merges in more than one issue, you have to invest some effort in merging everything in. Try not to make changes in different branches that are hard to merge, and learn your way round the git rebase command to handle changes across branches. Better yet: do not use rebase once you have created a chain of
[Hadoop Wiki] Update of "HowToContribute" by ArpitAgarwal
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToContribute" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/HowToContribute?action=diff=118=119 Comment: Removing content and leaving link to cwiki where new content resides. = How to Contribute to Hadoop = - This page describes the mechanics of ''how'' to contribute software to Apache Hadoop. For ideas about ''what'' you might contribute, please see the ProjectSuggestions page. + Content moved to Confluence - https://cwiki.apache.org/confluence/display/HADOOP/HowToContribute - <> + Email common-...@hadoop.apache.org if you need write access to the cwiki. - == Dev Environment Setup == - Here are some things you will need to build and test Hadoop. Be prepared to invest some time to set up a working Hadoop dev environment. Try getting the project to build and test locally first before you start writing code. - - === Get the source code === - First of all, you need the Hadoop source code. The official location for Hadoop is the Apache Git repository. See GitAndHadoop - - === Read BUILDING.txt === - Once you have the source code, we strongly recommend reading BUILDING.txt located in the root of the source tree. It has up to date information on how to build Hadoop on various platforms along with some workarounds for platform-specific quirks. The latest [[https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=blob;f=BUILDING.txt|BUILDING.txt]] for the current trunk can also be viewed on the web. - - - === Integrated Development Environment (IDE) === - You are free to use whatever IDE you prefer or your favorite text editor. Note that: - * Building and testing is often done on the command line or at least via the Maven support in the IDEs. - * Set up the IDE to follow the source layout rules of the project. 
- * Disable any added-value "reformat" and "strip trailing spaces" features, as they can create extra noise when reviewing patches. - - === Build Tools === - * A Java Development Kit. The Hadoop developers recommend [[http://java.com/|Oracle Java 8]]. You may also use [[http://openjdk.java.net/|OpenJDK]]. - * Google Protocol Buffers. Check out the ProtocolBuffers guide for help installing protobuf. - * [[http://maven.apache.org/|Apache Maven]] version 3 or later (for Hadoop 0.23+) - * The Java API javadocs. - Ensure these are installed by executing {{{mvn}}}, {{{git}}} and {{{javac}}} respectively. - - As the Hadoop builds use the external Maven repository to download artifacts, Maven needs to be set up with the proxy settings needed to make external HTTP requests. The first build of every Hadoop project needs internet connectivity to download Maven dependencies. - 1. Be online for that first build, on a good network - 1. To set the Maven proxy settings, see http://maven.apache.org/guides/mini/guide-proxies.html - 1. Because Maven doesn't pass proxy settings down to the Ant tasks it runs ([[https://issues.apache.org/jira/browse/HDFS-2381|HDFS-2381]]), some parts of the Hadoop build may fail. The fix for this is to pass down the Ant proxy settings in the build. Unix: {{{mvn $ANT_OPTS}}}; Windows: {{{mvn %ANT_OPTS%}}}. - 1. Tomcat is always downloaded, even when building offline. Setting {{{-Dtomcat.download.url}}} to a local copy and {{{-Dtomcat.version}}} to the version pointed to by the URL will avoid that download. - - - === Native libraries === - On Linux, you need the tools to create the native libraries: LZO headers, zlib headers, gcc, OpenSSL headers, cmake, protobuf dev tools, libtool, and the GNU autotools (automake, autoconf, etc).
- - For RHEL (and hence also CentOS): - {{{ - yum -y install lzo-devel zlib-devel gcc gcc-c++ autoconf automake libtool openssl-devel fuse-devel cmake - }}} - - For Debian and Ubuntu: - {{{ - apt-get -y install maven build-essential autoconf automake libtool cmake zlib1g-dev pkg-config libssl-dev libfuse-dev - }}} - - Native libraries are mandatory for Windows. For instructions see Hadoop2OnWindows. - - === Hardware Setup === - * Lots of RAM, especially if you are using a modern IDE. ECC RAM is recommended in large-RAM systems. - * Disk Space. Always handy. - * Network Connectivity. Hadoop tests are not guaranteed to all work if a machine does not have a network connection -and especially if it does not know its own name. - * Keep your computer's clock up to date via an NTP server, and set up the time zone correctly. This is good for avoiding change-log confusion. - - == Making Changes == - Before you start, send a message to the [[http://hadoop.apache.org/core/mailing_lists.html|Hadoop developer mailing list]], or file a bug report in [[Jira]]. Describe your proposed changes and check that they fit in with what others are doing and have planned for the project. Be patient, it may take folks a while to understand your requirements. If you want to
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=52=53 == Hadoop Videos == + === Hands-On Big Data Analysis with Hadoop 3 (Video) === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hands-big-data-analysis-hadoop-3-video|Hands-On Big Data Analysis with Hadoop 3 (Video)]] + + '''Author:''' Tomasz Lelek + + '''Publisher:''' Packt + + '''Date of Publishing:''' August 2018 + + Perform real-time data analytics with Hadoop + + === Hands-On Beginner’s Guide on Big Data and Hadoop 3 (Video) === '''Name:''' [[https://www.packtpub.com/application-development/hands-beginner%E2%80%99s-guide-big-data-and-hadoop-3-video|Hands-On Beginner’s Guide on Big Data and Hadoop 3 (Video)]] - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=51=52 == Hadoop Videos == + === Hands-On Beginner’s Guide on Big Data and Hadoop 3 (Video) === + + '''Name:''' [[https://www.packtpub.com/application-development/hands-beginner%E2%80%99s-guide-big-data-and-hadoop-3-video|Hands-On Beginner’s Guide on Big Data and Hadoop 3 (Video)]] + + '''Author:''' Milind Jagre + + '''Publisher:''' Packt + + '''Date of Publishing:''' July 2018 + + Effectively store, manage, and analyze large Datasets with HDFS, SQOOP, YARN, and MapReduce + + === Hadoop Administration and Cluster Management (Video) === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-administration-and-cluster-management-video|Hadoop Administration and Cluster Management (Video)]] - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "AmazonS3" by SteveLoughran
The "AmazonS3" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/AmazonS3?action=diff=23=24 Comment: purge down to the minimum, point people at troubleshooting, tell them not to mix JARs. = S3 Support in Apache Hadoop = + Apache Hadoop ships with a connector to S3 called "S3A", with the URL prefix "s3a:"; its previous connectors "s3" and "s3n" are deprecated and/or deleted from recent Hadoop versions. - [[http://aws.amazon.com/s3|Amazon S3]] (Simple Storage Service) is a data storage service. You are billed - monthly for storage and data transfer. Transfer between S3 and [[AmazonEC2]] instances in the same geographical location are free. Most importantly, the data is preserved when a transient Hadoop cluster is shut down - This makes use of S3 common in Hadoop clusters on EC2. It is also used sometimes for backing up remote cluster. - - Hadoop provides multiple filesystem clients for reading and writing to and from Amazon S3 or compatible service. + 1. Consult the [[http://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html|Latest Hadoop documentation]] for the specifics on using the S3A connector. + 1. For Hadoop 2.x releases, the latest [[https://github.com/apache/hadoop/blob/branch-2/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/index.md|troubleshooting documentation]]. + 1. For Hadoop 3.x releases, the latest [[https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/troubleshooting_s3a.md|troubleshooting documentation]]. - === Recommended: S3A (URI scheme: s3a://) - Hadoop 2.7+ === + == S3 Support in Amazon EMR == + Amazon's EMR Service is based upon Apache Hadoop, but contains modifications and their own closed-source S3 client.
Consult [[http://docs.aws.amazon.com/ElasticMapReduce/latest/ManagementGuide/emr-plan-file-systems.html|Amazon's documentation on this]]. + Only Amazon can provide support and/or field bug reports related to their S3 support. - '''S3A is the recommended S3 Client for Hadoop 2.7 and later''' - - A successor to the S3 Native, s3n:// filesystem, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema. - - S3A has been usable in production since Hadoop 2.7, and is undergoing active maintenance for enhanced security, scalability and performance. - - History - - 1. Hadoop 2.6: Initial Implementation: [[https://issues.apache.org/jira/browse/HADOOP-10400|HADOOP-10400]] - 2. Hadoop 2.7: Production Ready: [[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]] - 3. Hadoop 2.8: Performance, robustness and security [[https://issues.apache.org/jira/browse/HADOOP-11694|HADOOP-11694]] - 4. Hadoop 2.9: Even more features: [[https://issues.apache.org/jira/browse/HADOOP-13204|HADOOP-13204]] - - July 2016: For details of ongoing work on S3a, consult [[www.slideshare.net/HadoopSummit/hadoop-cloud-storage-object-store-integration-in-production|Hadoop & Cloud Storage: Object Store Integration in Production]] - - '''important:''' S3A requires the exact version of the amazon-aws-sdk against which Hadoop was built (and is bundled with). If you try to upgrade the library by dropping in a later version, things will break. - === Unmainteained: S3N FileSystem (URI scheme: s3n://) === + == Important: Classpath setup == + 1. The S3A connector is implemented in the hadoop-aws JAR. If it is not on the classpath: stack trace. + 1. 
Do not attempt to mix a "hadoop-aws" version with other hadoop artifacts from different versions. They must be from exactly the same release. Otherwise: stack trace. + 1. The S3A connector depends on AWS SDK JARs. If they are not on the classpath: stack trace. + 1. Do not attempt to use an Amazon S3 SDK JAR different from the one which the hadoop version was built with. Otherwise: stack trace highly likely. + 1. The normative list of dependencies of a specific version of the hadoop-aws JAR is stored in Maven, which can be viewed on [[http://mvnrepository.com/artifact/org.apache.hadoop/hadoop-aws|mvnrepository]]. - '''S3N is the S3 Client for Hadoop 2.6 and earlier. From Hadoop 2.7+, switch to s3a''' - - A native filesystem for reading and writing regular files on S3. With this filesystem you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The S3N code is stable and widely used, but is not adding any new features (which is why it remains stable). - - S3N requires a compatible version of the jets3t
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=50=51 == Hadoop Videos == + === Hadoop Administration and Cluster Management (Video) === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-administration-and-cluster-management-video|Hadoop Administration and Cluster Management (Video)]] + + '''Author:''' Gurmukh Singh + + '''Publisher:''' Packt + + '''Date of Publishing:''' May 2018 + + Planning, deploying, managing, monitoring and performance-tuning your Hadoop cluster with Apache Hadoop + + === Solving 10 Hadoop'able Problems (Video) === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/solving-10-hadoopable-problems-video|Solving 10 Hadoop'able Problems (Video)]] @@ -487, +500 @@ Need solutions to your big data problems? Here are 10 real-world projects demonstrating problems solved using Hadoop === Learn By Example: Hadoop, MapReduce for Big Data problems (Video) === - + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/learn-example-hadoop-mapreduce-big-data-problems-video|Learn By Example: Hadoop, MapReduce for Big Data problems (Video)]] '''Author:''' Loonycorn - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=49=50 # Please don't have tracking URLs. We'll only cut them. }}} + === Python GUI programming with Tkinter === + + '''Name:''' [[https://www.amazon.com/dp/1788835883/|Python GUI Programming with Tkinter]] + + '''Author:''' Alan D. Moore + + '''Publisher:''' Packt + + '''Date of Publishing:''' May 2018 + + Find out how to create visually stunning and feature-rich applications by empowering Python's built-in TKinter GUI toolkit + === Modern Big Data Processing with Hadoop === '''Name:''' [[https://www.amazon.com/dp/B0787KY8RH/|Modern Big Data Processing with Hadoop]] @@ -24, +36 @@ '''Publisher:''' Packt '''Date of Publishing:''' March 2018 + + A comprehensive guide to design, build and execute effective Big Data strategies using Hadoop === Deep Learning with Hadoop === - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToReleasePreDSBCR" by KonstantinShvachko
The "HowToReleasePreDSBCR" page has been changed by KonstantinShvachko: https://wiki.apache.org/hadoop/HowToReleasePreDSBCR?action=diff=89=90 Comment: Change the deploy command so that it uploads source and javadoc artifacts to Nexus. 1. --(Use [[https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder|this Jenkins job]] to create the final release files)-- Create final release files {{{ - mvn clean package -Psrc -Pdist -Pnative -Dtar -DskipTests + mvn clean deploy -Psign,src,dist,native -Dtar -DskipTests - mvn deploy -Psign -DskipTests mvn site site:stage -DskipTests }}} + 1. Make sure that on [[https://repository.apache.org|Nexus]] all artifacts have corresponding sources and javadoc jars. 1. Copy release files to the distribution directory 1. Check out the corresponding svn repo if need be {{{
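The sources/javadoc check described in the update above can be rehearsed locally before inspecting Nexus. A sketch only — the helper name and jar names are made up, and the directory of staged jars is an assumption:

```shell
# Hypothetical helper: verify each main jar in a directory has the
# -sources.jar and -javadoc.jar companions that Nexus expects.
check_companions() {
  dir=$1; missing=0
  for jar in "$dir"/*.jar; do
    # Skip the companion jars themselves (and test jars)
    case "$jar" in *-sources.jar|*-javadoc.jar|*-tests.jar) continue ;; esac
    base=${jar%.jar}
    for c in "${base}-sources.jar" "${base}-javadoc.jar"; do
      [ -f "$c" ] || { echo "MISSING: $c"; missing=1; }
    done
  done
  return $missing
}

# Self-contained demo on a throwaway directory (artifact names illustrative):
demo=$(mktemp -d)
touch "$demo/hadoop-common-X.Y.Z.jar" \
      "$demo/hadoop-common-X.Y.Z-sources.jar" \
      "$demo/hadoop-common-X.Y.Z-javadoc.jar"
check_companions "$demo" && echo "all companions present"
```

Pointing the helper at the real build output would flag any artifact missing its companions before the staging repository is closed.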
[Hadoop Wiki] Update of "Books" by Packt Publishing
The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=48=49 === Modern Big Data Processing with Hadoop === - '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/modern-big-data-processing-hadoop|Modern Big Data Processing with Hadoop]] + '''Name:''' [[https://www.amazon.com/dp/B0787KY8RH/|Modern Big Data Processing with Hadoop]] '''Author:''' V. Naresh Kumar, Prashant Shindgikar
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=47=48 # Please don't have tracking URLs. We'll only cut them. }}} + === Modern Big Data Processing with Hadoop === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/modern-big-data-processing-hadoop|Modern Big Data Processing with Hadoop]] + + '''Author:''' V. Naresh Kumar, Prashant Shindgikar + + '''Publisher:''' Packt + + '''Date of Publishing:''' March 2018 === Deep Learning with Hadoop === - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "PoweredBy" by XingWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "PoweredBy" page has been changed by XingWang: https://wiki.apache.org/hadoop/PoweredBy?action=diff=444=445 Comment: added Moesif.com. * ''Automatic PDF creation & IR '' * ''2 node cluster (Windows Vista/CYGWIN, & CentOS) for developing MapReduce programs. '' + * ''[[https://www.moesif.com/|Moesif API Insights]] '' + * ''We use Hadoop for ETL and processing time series event data for alerts/notifications along with visualizations for frontend.'' + * ''2 master nodes and 6 data nodes running on Azure using HDInsight'' + * ''[[http://www.mylife.com/|MyLife]] '' * ''18 node cluster (Quad-Core AMD Opteron 2347, 1TB/node storage) '' * ''Powers data for search and aggregation '' - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=46=47 == Hadoop Videos == + + === Solving 10 Hadoop'able Problems (Video) === + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/solving-10-hadoopable-problems-video|Solving 10 Hadoop'able Problems (Video)]] + + '''Author:''' Tomasz Lelek + + '''Publisher:''' Packt + + '''Date of Publishing:''' February 2018 + + Need solutions to your big data problems? Here are 10 real-world projects demonstrating problems solved using Hadoop + === Learn By Example: Hadoop, MapReduce for Big Data problems (Video) === - + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/learn-example-hadoop-mapreduce-big-data-problems-video|Learn By Example: Hadoop, MapReduce for Big Data problems (Video)]] '''Author:''' Loonycorn - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=45=46 == Hadoop Videos == + === Learn By Example: Hadoop, MapReduce for Big Data problems (Video) === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/learn-example-hadoop-mapreduce-big-data-problems-video|Learn By Example: Hadoop, MapReduce for Big Data problems (Video)]] + + '''Author:''' Loonycorn + + '''Publisher:''' Packt + + '''Date of Publishing:''' Jan 2018 + + A hands-on workout in Hadoop, MapReduce and the art of thinking "parallel" === The Ultimate Hands-on Hadoop (Video) === - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/Books?action=diff=44=45 Comment: revert. That's Hbase, not hadoop. And I'm thinking we should cut all videos out from here == Hadoop Videos == - === Learn by Example : HBase - The Hadoop Database (Video) === - '''Name:''' [[https://www.packtpub.com/application-development/learn-example-hbase-hadoop-database-video|Learn by Example : HBase - The Hadoop Database (Video)]] - - '''Author:''' Loonycorn - - '''Publisher:''' Packt - - '''Date of Publishing:''' December 2017 - - 25 solved examples to get you up to speed with HBase === The Ultimate Hands-on Hadoop (Video) === - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=43=44 == Hadoop Videos == + === Learn by Example : HBase - The Hadoop Database (Video) === + + '''Name:''' [[https://www.packtpub.com/application-development/learn-example-hbase-hadoop-database-video|Learn by Example : HBase - The Hadoop Database (Video)]] + + '''Author:''' Loonycorn + + '''Publisher:''' Packt + + '''Date of Publishing:''' December 2017 + + 25 solved examples to get you up to speed with HBase + === The Ultimate Hands-on Hadoop (Video) === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/ultimate-hands-hadoop-video | The Ultimate Hands-on Hadoop (Video)]] - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToReleasePreDSBCR" by KonstantinShvachko
The "HowToReleasePreDSBCR" page has been changed by KonstantinShvachko: https://wiki.apache.org/hadoop/HowToReleasePreDSBCR?action=diff=88=89 1. Update the symlinks to current2 and stable2. The release directory usually contains just two releases, the most recent from two branches. 1. Commit the changes (it requires a PMC privilege) {{{ + svn add hadoop-${version} svn ci -m "Publishing the bits for release ${version}" }}} 1. In [[https://repository.apache.org|Nexus]]
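The symlink step above can be sketched as follows; the checkout path and version are placeholders, and the demo runs in a throwaway directory rather than the real dist/release checkout:

```shell
# Sketch only: refresh the current2/stable2 symlinks so they point at the
# newly added release directory before the `svn ci`.
set -e
dist=$(mktemp -d)          # stand-in for the dist/release/hadoop/common checkout
version=X.Y.Z.demo         # hypothetical release version
mkdir "$dist/hadoop-$version"
ln -sfn "hadoop-$version" "$dist/current2"
ln -sfn "hadoop-$version" "$dist/stable2"
readlink "$dist/stable2"   # prints hadoop-X.Y.Z.demo
```

In the real checkout, the `svn add` and `svn ci` from the update above would follow, so the new directory and retargeted links are committed together.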
[Hadoop Wiki] Update of "HowToReleasePreDSBCR" by KonstantinShvachko
The "HowToReleasePreDSBCR" page has been changed by KonstantinShvachko: https://wiki.apache.org/hadoop/HowToReleasePreDSBCR?action=diff=87=88 git tag -s rel/release-X.Y.Z -m "Hadoop X.Y.Z release" git push origin rel/release-X.Y.Z }}} - 1. Use [[https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder|this Jenkins job]] to create the final release files + 1. --(Use [[https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder|this Jenkins job]] to create the final release files)-- + Create final release files + {{{ + mvn clean package -Psrc -Pdist -Pnative -Dtar -DskipTests + mvn deploy -Psign -DskipTests + mvn site site:stage -DskipTests + }}} 1. Copy release files to the distribution directory 1. Check out the corresponding svn repo if need be {{{
[Hadoop Wiki] Update of "HowToCommit" by EricYang
The "HowToCommit" page has been changed by EricYang: https://wiki.apache.org/hadoop/HowToCommit?action=diff=39=40 1. Set the assignee if it is not set. If you cannot set the contributor to the assignee, you need to add the contributor into the Contributors role in the project. Please see [[#Roles|Adding Contributors role]] for the details. This How-to-commit [[http://www.youtube.com/watch?v=txW3m7qWdzw=youtu.be|video]] has guidance on the commit process, albeit using svn. Most of the process is still the same, except that we now use git instead. + + Merging a feature branch + When merging a feature branch to trunk, use the no-fast-forward option ({{{--no-ff}}}) so the merge is recorded as a single commit that summarizes the feature branch. The detailed commit history remains on the feature branch. + {{{
+ # Start a new feature
+ git checkout -b new-feature trunk
+ # Edit some files
+ git add <changed-files>
+ git commit -m "Start a feature"
+ # Edit some files
+ git add <changed-files>
+ git commit -m "Finish a feature"
+ # Merge in the new-feature branch
+ git checkout trunk
+ git merge --no-ff new-feature
+ }}} <> Committing Documentation
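The effect of {{{--no-ff}}} can be seen in a throwaway repository (a self-contained sketch; branch and commit names are illustrative, and git must be on the PATH):

```shell
# Demo in a temporary repo: --no-ff records the feature as one merge commit
# on trunk, while the detailed history stays on the feature branch.
set -e
repo=$(mktemp -d); cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name Dev
git checkout -qb trunk
echo base > f; git add f; git commit -qm "base"
git checkout -qb new-feature
echo one >> f; git commit -qam "Start a feature"
echo two >> f; git commit -qam "Finish a feature"
git checkout -q trunk
git merge --no-ff -m "Merge branch 'new-feature'" new-feature
# The first-parent view of trunk shows only the base commit plus one
# merge commit, even though the branch contained two commits:
git log --first-parent --oneline
```

Without {{{--no-ff}}}, git would fast-forward trunk here (trunk is an ancestor of new-feature) and the two feature commits would land directly on trunk's history.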
[Hadoop Wiki] Update of "HowToReleasePreDSBCR" by KonstantinShvachko
The "HowToReleasePreDSBCR" page has been changed by KonstantinShvachko: https://wiki.apache.org/hadoop/HowToReleasePreDSBCR?action=diff=86=87 }}} + 1. Verify that CHANGES.txt reflects all relevant commits since the previous release. Add and commit missing ones to CHANGES.txt. = Branching = When releasing Hadoop X.Y.Z, the following branching changes are required. Note that a release can match more than one of the following if-conditions. For a major release, one needs to make the changes for minor and point releases as well. Similarly, a new minor release is also a new point release.
[Hadoop Wiki] Update of "PoweredBy" by DavidTing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "PoweredBy" page has been changed by DavidTing: https://wiki.apache.org/hadoop/PoweredBy?action=diff=443=444 * ''Data mining '' * ''Machine learning '' + * ''[[https://fquotes.com/|FQuotes]] '' + * ''We use Hadoop for analyzing quotes, quote authors and quote topics. '' + * ''[[http://freestylers.jp/|Freestylers]] - Image retrieval engine '' * ''We, the Japanese company Freestylers, use Hadoop to build the image processing environment for image-based product recommendation system mainly on Amazon EC2, from April 2009. '' * ''Our Apache Hadoop environment produces the original database for fast access from our web application. '' - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Ozone" by ArpitAgarwal
The "Ozone" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/Ozone?action=diff=5=6 - The ''Ozone Quick Start Guide'' has moved to the [[https://cwiki.apache.org/confluence/display/HADOOP/Ozone|Apache Confluence wiki]]. + The [[https://cwiki.apache.org/confluence/display/HADOOP/Ozone|Ozone Quick Start Guide]] has moved to the ''Apache Confluence wiki''.
[Hadoop Wiki] Update of "Ozone" by ArpitAgarwal
The "Ozone" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/Ozone?action=diff=4=5 - <> + The ''Ozone Quick Start Guide'' has moved to the [[https://cwiki.apache.org/confluence/display/HADOOP/Ozone|Apache Confluence wiki]]. - = Introduction = - Ozone is an Object Store for Hadoop that is currently under development. See the Ozone Apache Jira [[https://issues.apache.org/jira/browse/HDFS-7240|HDFS-7240]] for more details. Ozone is currently in a prototype phase. - - This wiki page is intended as a guide for Ozone contributors. - - = Compiling Ozone = - Set up your development environment if you haven't done so already ([[https://wiki.apache.org/hadoop/HowToContribute|Instructions here]]). Switch to the HDFS-7240 branch, apply the in-progress patch for [[https://issues.apache.org/jira/browse/HDFS-10363|HDFS-10363]] and build a Hadoop distribution as usual. - - = Configuration = - Create a new ozone-site.xml file in your Hadoop configuration directory and add the following settings for a bare minimal configuration. - - {{{
- <configuration>
-   <property>
-     <name>ozone.enabled</name>
-     <value>true</value>
-   </property>
-   <property>
-     <name>ozone.handler.type</name>
-     <value>local</value>
-   </property>
-   <property>
-     <name>ozone.scm.client.address</name>
-     <value>127.0.0.1:9860</value>
-   </property>
- </configuration>
- }}} - - The default client port is 9860 and the default service port is 9861. These ports are used by clients and DataNodes respectively to connect to the StorageContainerManager service. - - These port numbers can be changed with the `ozone.scm.client.address` and `ozone.scm.datanode.address` settings respectively. - - = Starting Services = - Format the HDFS NameNode and start the NameNode and DataNode services as usual. Then stop the NameNode and start the Ozone StorageContainerManager using the shell command - {{{ - $ hdfs --daemon start scm - }}} - - The requirement to first start then stop the NameNode will be fixed soon.
- - = Performing Ozone REST operations = - [[https://issues.apache.org/jira/secure/attachment/12799549/ozone_user_v0.pdf|Ozone REST API specification]]
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=95=96 svn commit -m "Updated site for release X.Y.Z." }}} 1. Send announcements to the user and developer lists once the site changes are visible. - 1. In JIRA, close issues resolved in the release. Disable mail notifications for this bulk change. + 1. --(In JIRA, close issues resolved in the release. Disable mail notifications for this bulk change.)-- Recommend '''not''' closing, since it prevents JIRAs from being edited and makes it more difficult to track backports. = See Also = * [[http://www.apache.org/dev/release.html|Apache Releases FAQ]]
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=94=95 Comment: More detailed instructions on how to bulk update JIRA versions = Preparation = 1. If you have not already done so, [[http://www.apache.org/dev/release-signing.html#keys-policy|append your code signing key]] to the [[https://dist.apache.org/repos/dist/release/hadoop/common/KEYS|KEYS]] file. Once you commit your changes, they will automatically be propagated to the website. Also [[http://www.apache.org/dev/release-signing.html#keys-policy|upload your key to a public key server]] if you haven't. End users use the KEYS file (along with the [[http://www.apache.org/dev/release-signing.html#web-of-trust|web of trust]]) to validate that releases were done by an Apache committer. For more details on signing releases, see [[http://www.apache.org/dev/release-signing.html|Signing Releases]] and [[http://www.apache.org/dev/mirror-step-by-step.html?Step-By-Step|Step-By-Step Guide to Mirroring Releases]]. - 1. Bulk update JIRA to unassign from this release all issues that are open non-blockers + 1. Bulk update JIRA to unassign from this release all issues that are open non-blockers. This is involved since you can only bulk change issues within the same project, so minimally requires four bulk changes for each of HADOOP, HDFS, MAPREDUCE, and YARN. Editing the "Target Version/s" field is also a blind write, so you need to be careful not to lose any other fix versions that are set. For updating 3.0.0-beta1 to 3.0.0, the process looked like this: + 1. Start with this query: + {{{ + project in (HADOOP, HDFS, YARN, MAPREDUCE) AND "Target Version/s" = 3.0.0-beta1 and statusCategory != Done + }}} + 1. Filter this list down until it's only issues with a Target Version of just "3.0.0-beta1". 
My query ended up looking like: + {{{ + project in (HADOOP, HDFS, YARN, MAPREDUCE) AND "Target Version/s" = 3.0.0-beta1 and "Target Version/s" not in (2.9.0, 2.8.3, 2.8.2) AND statusCategory != Done + }}} + 1. Do the bulk update for each project individually to set the target version to 3.0.0. + 1. Check the query for the next most common set of target versions and again filter it down: + {{{ + project in (HADOOP, HDFS, YARN, MAPREDUCE) AND "Target Version/s" = 3.0.0-beta1 and "Target Version/s" = 2.9.0 and statusCategory != Done + project in (HADOOP, HDFS, YARN, MAPREDUCE) AND "Target Version/s" = 3.0.0-beta1 and "Target Version/s" = 2.9.0 and "Target Version/s" not in (2.8.2, 2.8.3) and statusCategory != Done + }}} + 1. Do the bulk update for each project individually to set the target version field to (3.0.0, 2.9.0). + 1. Return to the original query. If there aren't too many, update the remaining straggler issues by hand (faster than doing the bulk edits): + {{{ + project in (HADOOP, HDFS, YARN, MAPREDUCE) AND "Target Version/s" = 3.0.0-beta1 and statusCategory != Done + }}} + 1. Send follow-up notification to the developer list that this was done. 1. To deploy artifacts to the Apache Maven repository, create {{{~/.m2/settings.xml}}}: {{{
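The {{{~/.m2/settings.xml}}} content is cut off above. As an illustrative sketch only — the {{{<id>}}} values here are assumptions and must match the ids in the distributionManagement section of the Hadoop POMs (the Apache parent POM conventionally uses {{{apache.releases.https}}} and {{{apache.snapshots.https}}}), and the credentials are placeholders:

```xml
<settings>
  <servers>
    <!-- Placeholders: use your ASF LDAP credentials; consider an encrypted
         password via Maven's master-password mechanism instead of clear text. -->
    <server>
      <id>apache.releases.https</id>
      <username>YOUR-ASF-ID</username>
      <password>YOUR-ASF-PASSWORD</password>
    </server>
    <server>
      <id>apache.snapshots.https</id>
      <username>YOUR-ASF-ID</username>
      <password>YOUR-ASF-PASSWORD</password>
    </server>
  </servers>
</settings>
```

Maven matches each {{{<server>}}} entry to a repository by id at deploy time, so a mismatched id silently falls back to anonymous (and rejected) uploads.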
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=42=43 Comment: New video course added == Hadoop Videos == + === The Ultimate Hands-on Hadoop (Video) === + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/ultimate-hands-hadoop-video | The Ultimate Hands-on Hadoop (Video)]] + '''Author:''' Frank Kane + + '''Publisher:''' Packt + + '''Date of Publishing:''' June 2017 + + Design distributed systems that manage Big Data using Hadoop and related technologies. + === Getting Started with Hadoop 2.x (Video) === '''Name:''' [[https://www.packtpub.com/networking-and-servers/getting-started-hadoop-2x-video|Getting Started with Hadoop 2.x (Video)]] - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=93=94 Comment: Fix skipShade profile for deploy step 1. Push branch-X.Y.Z and the newly created tag to the remote repo. 1. Deploy the maven artifacts, on your personal computer. Please be sure you have completed the prerequisite step of preparing the {{{settings.xml}}} file before the deployment. You might want to do this in private and clear your history file as your gpg-passphrase is in clear text. {{{ - mvn deploy -Psign -DskipTests -DskipShading + mvn deploy -Psign -DskipTests -DskipShade }}} 1. Copy release files to a public place and ensure they are readable. Note that {{{home.apache.org}}} only supports SFTP, so this may be easier with a graphical SFTP client like Nautilus, Konqueror, etc. {{{
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=41=42 Comment: Added new video '''Publisher:''' Manning '''Date of Publishing (est.):''' October 2015 - - + + == Hadoop Videos == + === Getting Started with Hadoop 2.x (Video) === + + '''Name:''' [[https://www.packtpub.com/networking-and-servers/getting-started-hadoop-2x-video|Getting Started with Hadoop 2.x (Video)]] + + '''Author:''' A K M Zahiduzzaman + + '''Publisher:''' Packt + + '''Date of Publishing:''' April 30, 2017 + + Build a strong foundation by exploring Hadoop ecosystem with real-world examples. + === Taming Big Data with MapReduce and Hadoop - Hands On! (Video) === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/taming-big-data-mapreduce-and-hadoop-hands-video|Taming Big Data with MapReduce and Hadoop - Hands On! (Video)]] @@ -463, +475 @@ '''Date of Publishing:''' September 12, 2016 Master the art of processing Big Data using Hadoop and MapReduce with the help of real-world examples. - + + Hadoop in Action introduces the subject and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming. - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "S3ABadRequest" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "S3ABadRequest" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/S3ABadRequest Comment: Initial set of Bad Request causes New page: a5a4867f3b HADOOP-14120 = Troubleshooting S3A Bad Request Errors = The S3A client can see the error message "Bad Request" for many reasons —it is the standard response from Amazon S3 if it could not satisfy the request *for any reason*. The main issues are covered in the [[http://hadoop.apache.org/docs/current//hadoop-aws/tools/hadoop-aws/index.html#Troubleshooting_S3A|Troubleshooting S3A]] section of the hadoop-aws module's documentation. == Common Causes of Bad Request Error Messages == === Credentials === * Your credentials are wrong. * Somehow the credentials have not been set properly before the S3A Filesystem instance was created. As a single instance per bucket is created per-JVM, the first configuration used to connect to a bucket is the one used thereafter. * You've been trying to set the credentials in the URI, but got the URL-escaping wrong. Stop trying to do that, it's a security disaster. Embrace per-bucket configuration. * You are trying to use per-bucket configuration for the credentials, but got the bucket name wrong there. * You are using session credentials, and the session has expired. === Endpoints === * You are trying to use a V4 auth endpoint without declaring the endpoint of that region in the {{{fs.s3a.endpoint}}}. * You are trying to use a V3 auth endpoint but have set up S3 to use an explicit V4 auth endpoint. As they do not redirect to the central endpoint, you must declare the relevant endpoint explicitly. * You are trying to use a private S3 service but have forgotten to set the {{{fs.s3a.endpoint}}}; AWS is rejecting your private login. 
* You are trying to talk to a private S3 service but somehow it is talking to an HTTP page rather than an implementation of the S3 REST API. === Encryption === * You are trying to use SSE-C with a key that cannot decrypt the remote data. * You are trying to work with a bucket which is configured to require encryption, but the client doesn't use it. === Classpath === * A version of Joda-time incompatible with the JVM is on the classpath. It must be version 2.9.1 or later. === System === * The client machine doesn't know when it is. Check the clock and the timezone settings. * Your DNS setup is returning the wrong IP address for the endpoint. * Your network is a mess. As you can see, there is a wide variety of possible causes, spread across: credential setup, endpoint configuration, system configuration and other aspects of the S3A client. We are hampered in helping diagnose this by the need to keep those credentials secret. == Logging at lower levels == The AWS SDK and the Apache HTTP components can be configured to log at more detail, as can S3A itself. {{{ log4j.logger.org.apache.hadoop.fs.s3a=DEBUG log4j.logger.com.amazonaws.request=DEBUG log4j.logger.org.apache.http=DEBUG log4j.logger.org.apache.http.wire=ERROR }}} Be aware that logging HTTP headers may leak sensitive AWS account information, so the output should not be shared. - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
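The per-bucket configuration recommended in the credentials section above is expressed with {{{fs.s3a.bucket.<bucketname>.*}}} properties (available in recent Hadoop versions) in {{{core-site.xml}}}. A sketch with placeholder values — the bucket name, keys and endpoint below are made up:

```xml
<!-- Per-bucket settings: apply only to s3a://my-bucket/ -->
<property>
  <name>fs.s3a.bucket.my-bucket.access.key</name>
  <value>YOUR_ACCESS_KEY</value>
</property>
<property>
  <name>fs.s3a.bucket.my-bucket.secret.key</name>
  <value>YOUR_SECRET_KEY</value>
</property>
<!-- Declaring the region endpoint avoids the V4-auth failures above -->
<property>
  <name>fs.s3a.bucket.my-bucket.endpoint</name>
  <value>s3.eu-central-1.amazonaws.com</value>
</property>
```

Keeping credentials here, rather than URL-escaped into the URI, also avoids the security problem called out above.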
[Hadoop Wiki] Update of "PoweredBy" by RemySaissy
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "PoweredBy" page has been changed by RemySaissy: https://wiki.apache.org/hadoop/PoweredBy?action=diff=440=441 Comment: Criteo company description updated. * ''[[http://criteo.com|Criteo]] - Criteo is a global leader in online performance advertising '' * ''[[http://labs.criteo.com/blog|Criteo R]] uses Hadoop as a consolidated platform for storage, analytics and back-end processing, including Machine Learning algorithms '' - * ''We currently have a dedicated cluster of 1117 nodes, 39PB storage, 75TB RAM, 22000 cores running full steam 24/7, and growing by the day '' - * ''Each node has 24 HT cores, 96GB RAM, 42TB HDD '' - * ''Hardware and platform management is done through [[http://www.getchef.com/|Chef]], we run YARN '' - * ''We run a mix of ad-hoc Hive queries for BI, [[http://www.cascading.org/|Cascading]] jobs, raw mapreduce jobs, and streaming [[http://www.mono-project.com/|Mono]] jobs, as well as some Pig '' - * ''To be delivered in Q2 2015 a second cluster of 600 nodes, each 48HT cores, 256GB RAM, 96TB HDD '' + * ''We have 5 clusters in total, 2 of which are production, each with a corresponding pre-production and an experimental one '' + * ''More than 47,896 cores in ~2,560 machines running Hadoop (> 4,300 machines by the end of 2017) '' + * ''Our main cluster: 1,353 machines (24 cores w 15*6TB disk & 256GB RAM) '' +* ''Growth to ~3,000 machines by the end of 2017 '' + * ''We run a mix of '' +* ''Ad-hoc Hive queries for BI '' +* ''Cascading/Scalding jobs '' +* ''Mapreduce jobs '' +* ''Spark jobs '' +* ''Streaming Mono jobs '' * ''[[http://www.crs4.it|CRS4]] '' * ''Hadoop deployed dynamically on subsets of a 400-node cluster '' - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Misty" by SteveLoughran
Dear wiki user, You have subscribed to a wiki page "Hadoop Wiki" for change notification. The page "Misty" has been deleted by SteveLoughran: https://wiki.apache.org/hadoop/Misty?action=diff=1=2 Comment: junk user page - ##master-page:HomepageTemplate - #format wiki - #language en - == @``ME@ == - Email: <> - ## You can even more obfuscate your email address by adding more uppercase letters followed by a leading and trailing blank. - - ... - - - CategoryHomepage - - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=40=41 Comment: Added new book }}} + === Deep Learning with Hadoop === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/deep-learning-hadoop|Deep Learning with Hadoop]] + + '''Author:''' Dipayan Dev + + '''Publisher:''' Packt + + '''Date of Publishing:''' February 2017 + + Build, implement and scale distributed deep learning models for large-scale datasets. + === Hadoop Blueprints === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-blueprints|Hadoop Blueprints]] - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToSetupYourDevelopmentEnvironment" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToSetupYourDevelopmentEnvironment" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/HowToSetupYourDevelopmentEnvironment?action=diff=34=35 Comment: add the details on OSX install, especially protoc setup now that homebrew 1.x doesn't support protobuf 2.5 This page describes how to get your environment setup and is IDE agnostic. = Requirements = - * Java 6 or 7 - * Maven + * Java 7 or 8 (Branch 2) or Java 8 (trunk) + * Maven 3.3 or later * Your favorite IDE + * Protobuf 2.5.0 = Setup Your Development Environment in Linux = - The instructions below talk about how to get an environment setup using the command line to build, control source, and test. These instructions are therefore IDE independent. Take a look at EclipseEnvironment for instructions on how to configure Eclipse to build, control source, and test. If you prefer ItelliJ IDEA, then take a look [[HadoopUnderIDEA| here]] + The instructions below talk about how to get an environment setup using the command line to build, control source, and test. These instructions are therefore IDE independent. Take a look at EclipseEnvironment for instructions on how to configure Eclipse to build, control source, and test. If you prefer IntelliJ IDEA, then take a look [[HadoopUnderIDEA| here]] - * Choose a good place to put your code. You will eventually use your source code to run Hadoop, so choose wisely. For example ~/code/hadoop. + * Choose a good place to put your code. You will eventually use your source code to run Hadoop, so choose wisely. For example {{{~/code/hadoop}}}. - * Get the source. This is documented in HowToContribute. Put the source in ~/code/hadoop (or whatever you chose) so that you have ~/code/hadoop/hadoop-common + * Get the source. This is documented in HowToContribute. 
Put the source in {{{~/code/hadoop}}} (or whatever you chose) so that you have {{{~/code/hadoop/hadoop-common}}} - * cd into ''hadoop-common'', or whatever you named the directory + * cd into {{{hadoop-common}}}, or whatever you named the directory - * attempt to run ''mvn install'' + * attempt to run {{{mvn install}}} . To build without tests: {{{mvn install -DskipTests}}} * If you get any strange errors (other than JUnit test failures and errors), then consult the ''Build Errors'' section below. * follow GettingStartedWithHadoop to learn how to run Hadoop. * If you run into any problems, refer to the ''Runtime Errors'' below, along with the troubleshooting document here: TroubleShooting + + = Setup Your Development Environment in OSX = + + + The Linux instructions match, except that: + + XCode is needed for the command line compiler and other tools. + + + protobuf 2.5.0 needs to be built by hand, as macports and homebrew no longer ship that version. + + Follow the building-from-source instructions in [[http://sleepythread.blogspot.co.uk/2013/11/installing-protoc-25x-compiler-google.html|Installing protoc 2.5.x compiler on mac]] ''but change the URL for the protobuf archive to [[https://github.com/google/protobuf/releases/download/v2.5.0/protobuf-2.5.0.tar.gz]]''. + + To verify that protobuf is correctly installed, the command {{{protoc --version}}} must print out the string {{{libprotoc 2.5.0}}}. + = Run HDFS in pseudo-distributed mode from the dev tree =
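The {{{protoc --version}}} check above is easy to script as part of an environment sanity check; a small sketch (the helper names are illustrative, not an existing tool):

```python
import re
import subprocess

def protoc_version(output):
    """Parse the version out of `protoc --version` output,
    which looks like 'libprotoc 2.5.0'."""
    m = re.match(r"libprotoc\s+(\d+(?:\.\d+)*)", output.strip())
    return m.group(1) if m else None

def check_protoc(required="2.5.0"):
    """Return True if the protoc on PATH is exactly the required version."""
    out = subprocess.run(["protoc", "--version"],
                         capture_output=True, text=True).stdout
    return protoc_version(out) == required

print(protoc_version("libprotoc 2.5.0"))  # -> 2.5.0
```

An exact-match check is deliberate here: the build needs protobuf 2.5.0 specifically, so "2.5.0 or later" would pass versions that break the build.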
[Hadoop Wiki] Update of "Ozone" by ArpitAgarwal
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Ozone" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/Ozone?action=diff=3=4 The requirement to first start then stop the NameNode will be fixed soon. = Performing Ozone REST operations = - Content arriving soon. + [[https://issues.apache.org/jira/secure/attachment/12799549/ozone_user_v0.pdf|Ozone REST API specification]]
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=92=93 1. Update the news on the home page {{{author/src/documentation/content/xdocs/index.xml}}}. 1. Copy the new release docs to svn and update the {{{docs/current}}} link, by doing the following: {{{ - tar xvf /www/www.apache.org/dist/hadoop/core/hadoop-${version}/hadoop-${version}.tar.gz - cp -rp hadoop-${version}/share/doc/hadoop publish/docs/r${version} - rm -r hadoop-${version} cd publish/docs + tar xvf /path/to/hadoop-${version}-site.tar.gz # Update current2, current, stable and stable2 as needed. # For example rm current2 current - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "PoweredBy" by DavidTing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "PoweredBy" page has been changed by DavidTing: https://wiki.apache.org/hadoop/PoweredBy?action=diff=439=440 * ''Each (commodity) node has 8 cores and 12 TB of storage. '' * ''We are heavy users of both streaming as well as the Java APIs. We have built a higher level data warehousing framework using these features called Hive (see the http://hadoop.apache.org/hive/). We have also developed a FUSE implementation over HDFS. '' - * ''[[http://www.follownews.com/|FollowNews]] '' + * ''[[https://www.follownews.com/|FollowNews]] '' * ''We use Hadoop for storing logs, news analysis, tag analysis. '' * ''[[http://www.foxaudiencenetwork.com|FOX Audience Network]] '' @@ -437, +437 @@ * ''Apache Hive, Apache Avro, Apache Kafka, and other bits and pieces... '' * ''We use these things for discovering People You May Know and [[http://www.linkedin.com/careerexplorer/dashboard|other]] [[http://inmaps.linkedinlabs.com/|fun]] [[http://www.linkedin.com/skills/|facts]]. '' + * ''[[https://www.livebet.com|LiveBet]] '' + * ''We use Hadoop for storing logs, odds analysis, markets analysis. '' + * ''[[http://www.lookery.com|Lookery]] '' * ''We use Hadoop to process clickstream and demographic data in order to create web analytic reports. '' * ''Our cluster runs across Amazon's EC2 infrastructure and makes use of the streaming module to use Python for most operations. '' - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=91=92 1. Push branch-X.Y.Z and the newly created tag to the remote repo. 1. Deploy the maven artifacts, on your personal computer. Please be sure you have completed the prerequisite step of preparing the {{{settings.xml}}} file before the deployment. You might want to do this in private and clear your history file as your gpg-passphrase is in clear text. {{{ - mvn deploy -DskipTests + mvn deploy -Psign -DskipTests -DskipShading }}} 1. Copy release files to a public place and ensure they are readable. Note that {{{home.apache.org}}} only supports SFTP, so this may be easier with a graphical SFTP client like Nautilus, Konqueror, etc. {{{ - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=90=91 }}} 1. While it should fail {{{create-release}}} if there are issues, doublecheck the rat log to find and fix any potential licensing issues. {{{ - grep 'Rat check' target/artifacts/mvn_apache_rat.log + grep 'Rat check' patchprocess/mvn_apache_rat.log }}} 1. Check that release files look ok - e.g. install it somewhere fresh and run examples from tutorial, do a fresh build, read the release notes looking for WARNINGs, etc. 1. Set environment variable version for later steps. {{{export version=X.Y.Z-RCN}}} - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "ConnectionRefused" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "ConnectionRefused" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/ConnectionRefused?action=diff=16=17 Comment: fix name 1. If you are using a Hadoop-based product from a third party, -please use the support channels provided by the vendor. 1. Please do not file bug reports related to your problem, as they will be closed as [[http://wiki.apache.org/hadoop/InvalidJiraIssues|Invalid]] - See also [[http://serverfault.com/questions/725262/what-causes-the-connection-refused-message|Stack Overflow]] + See also [[http://serverfault.com/questions/725262/what-causes-the-connection-refused-message|Server Fault]] None of these are Hadoop problems, they are hadoop, host, network and firewall configuration issues. As it is your cluster, [[YourNetworkYourProblem|only you can find out and track down the problem.]]
[Hadoop Wiki] Update of "ConnectionRefused" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "ConnectionRefused" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/ConnectionRefused?action=diff=15=16 Comment: ref to stack overflow 1. If you are using a Hadoop-based product from a third party, -please use the support channels provided by the vendor. 1. Please do not file bug reports related to your problem, as they will be closed as [[http://wiki.apache.org/hadoop/InvalidJiraIssues|Invalid]] - None of these are Hadoop problems, they are host, network and firewall configuration issues. As it is your cluster, [[YourNetworkYourProblem|only you can find out and track down the problem.]] + See also [[http://serverfault.com/questions/725262/what-causes-the-connection-refused-message|Stack Overflow]] + None of these are Hadoop problems, they are hadoop, host, network and firewall configuration issues. As it is your cluster, [[YourNetworkYourProblem|only you can find out and track down the problem.]] + - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "ConnectionRefused" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "ConnectionRefused" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/ConnectionRefused?action=diff=13=14 Comment: link to ambari port list If the application or cluster is not working, and this message appears in the log, then it is more serious. + The exception text declares both the hostname and the port to which the connection failed. The port can be used to identify the service. For example, port 9000 is the HDFS port. Consult the [[https://ambari.apache.org/1.2.5/installing-hadoop-using-ambari/content/reference_chap2.html|Ambari port reference]], and/or those of the supplier of your Hadoop management tools. + 1. Check that the hostname the client is using is correct. If it's in a Hadoop configuration option, examine it carefully and try doing a ping by hand. 1. Check that the IP address the client is trying to talk to for that hostname is correct. + 1. Make sure the destination address in the exception isn't 0.0.0.0 -this means that you haven't actually configured the client with the real address for that service, and instead it is picking up the server-side property telling it to listen on every port for connections. - 1. Make sure the destination address in the exception isn't 0.0.0.0 -this means that you haven't actually configured the client with the real address for that. - service, and instead it is picking up the server-side property telling it to listen on every port for connections. 1. If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken. 1. Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this). 1. 
Check that the port the client is trying to talk to matches the port on which the server is offering the service.
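The checklist in the two ConnectionRefused edits above (name resolution, wildcard and loopback addresses, then the port itself) can be strung together in a tiny client-side probe. This is a sketch, not a Hadoop tool — the host and port are whatever your cluster configuration and the exception text say:

```python
import socket

def diagnose(host, port):
    """Walk the basic ConnectionRefused checklist for one host:port."""
    try:
        ip = socket.gethostbyname(host)  # does the hostname resolve at all?
    except socket.gaierror:
        return "DNS: cannot resolve %s" % host
    if ip == "0.0.0.0":
        # Client has picked up a server-side wildcard bind address by mistake
        return "Config error: %s resolves to 0.0.0.0" % host
    if ip.startswith("127."):
        # Only fine if the service really is on this machine (/etc/hosts trap)
        print("Note: %s resolves to loopback (%s)" % (host, ip))
    try:
        with socket.create_connection((host, port), timeout=5):
            return "OK: %s:%d is accepting connections" % (host, port)
    except OSError as e:
        return "Connect failed for %s:%d: %s" % (host, port, e)
```

Run it with the hostname and port from the exception text; a "Connect failed" on a loopback address usually points at the /etc/hosts problem called out above.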
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=39=40 Comment: URL change Unlock the power of your data with Hadoop 2.X ecosystem and its data warehousing techniques across large data sets. === Hadoop Explained (Free eBook Download) === - '''Name:''' [[https://www.packtpub.com/packt/free-ebook/hadoop-explained|Hadoop Explained]] + '''Name:''' [[https://www.packtpub.com/packt/free-ebook/hadoop-explained-2|Hadoop Explained]] '''Author:''' Aravind Shenoy - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "GitAndHadoop" by ArpitAgarwal
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "GitAndHadoop" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/GitAndHadoop?action=diff=24=25 Comment: Remove some obsolete instructions == Forking onto GitHub == - You can create your own fork of the ASF project, put in branches and stuff as you desire. GitHub prefer you to explicitly fork their copies of Hadoop. + You can create your own fork of the ASF project. This is required if you want to contribute patches by submitting pull requests. However you can choose to skip this step and attach patch files directly on Apache Jiras. 1. Create a GitHub login at http://github.com/ ; Add your public SSH keys + 1. Go to https://github.com/apache/hadoop/ + 1. Click fork in the github UI. This gives you your own repository URL. - 1. Go to http://github.com/apache and search for the Hadoop and other Apache projects you want (avro is handy alongside the others) - 1. For each project, fork in the github UI. This gives you your own repository URL which you can then clone locally with {{{git clone}}} - 1. For each patch, branch. - - At the time of writing (December 2009), GitHub was updating its copy of the Apache repositories every hour. As the Apache repositories were updating every 15 minutes, provided these frequencies are retained, a GitHub-fork derived version will be at worst 1 hour and 15 minutes behind the ASF's Git repository. If you are actively developing on Hadoop, especially committing code into the Git repository, that is too long -work off the Apache repositories instead. - - 1. Clone the read-only repository from Github (their recommendation) or from Apache (the ASF's recommendation) - 1. in that clone, rename that repository "apache": {{{git remote rename origin apache}}} - 1. Log in to [http://github.com] - 1. Create a new repository (e.g hadoop-fork) - 1. In the existing clone, add the new repository : + 1. 
In the existing clone, add the new repository: {{{git remote add -f github g...@github.com:MYUSERNAMEHERE/hadoop.git}}} - This gives you a local repository with two remote repositories: "apache" and "github". Apache has the trunk branch, which you can update whenever you want to get the latest ASF version: + This gives you a local repository with two remote repositories: {{{origin}}} and {{{github}}}. {{{origin}}} has the Apache branches, which you can update whenever you want to get the latest ASF version: {{{ - git checkout trunk - git pull apache + git checkout -b trunk origin/trunk + git pull origin }}} - Your own branches can be merged with trunk, and pushed out to git hub. To generate patches for submitting as JIRA patches, check everything in to your specific branch, merge that with (a recently pulled) trunk, then diff the two: + Your own branches can be merged with trunk, and pushed out to GitHub. To generate patches for attaching to Apache JIRAs, check everything in to your specific branch, merge that with (a recently pulled) trunk, then diff the two: - {{{ git diff --no-prefix trunk > ../hadoop-patches/HADOOP-XYX.patch }}} + {{{ git diff trunk > ../hadoop-patches/HADOOP-XYX.patch }}} - - If you are working deep in the code it's not only convenient to have a directory full of patches to the JIRA issues, it's convenient to have that directory a git repository that is pushed to a remote server, such as [[https://github.com/steveloughran/hadoop-patches|this example]]. Why? It helps you move patches from machine to machine without having to do all the updating and merging. From a pure-git perspective this is wrong: it loses history, but for a mixed workflow it doesn't matter so much. == Branching == - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "GitAndHadoop" by ArpitAgarwal
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "GitAndHadoop" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/GitAndHadoop?action=diff=23=24 Comment: Remove obsolete svn-bridge migration info. }}} You can then use commands like `git blame --follow` with success. - - == Migrating private branches to the new git commit history == - - The migration from svn to git changed the commit ids for anyone tracking the history of the project via the svn to git bridge. This means that private forks/branches will not rebase to the new versions. Follow the MigratingPrivateGitBranches instructions. - == Forking onto GitHub ==
[Hadoop Wiki] Update of "HowToContribute" by AkiraAjisaka
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToContribute" page has been changed by AkiraAjisaka: https://wiki.apache.org/hadoop/HowToContribute?action=diff=117=118 Comment: Update Java version from 7 to 8. * Disable any added value "reformat" and "strip trailing spaces" features as it can create extra noise when reviewing patches. === Build Tools === - * A Java Development Kit. The Hadoop developers recommend [[http://java.com/|Oracle Java 7]]. You may also use [[http://openjdk.java.net/|OpenJDK]]. + * A Java Development Kit. The Hadoop developers recommend [[http://java.com/|Oracle Java 8]]. You may also use [[http://openjdk.java.net/|OpenJDK]]. * Google Protocol Buffers. Check out the ProtocolBuffers guide for help installing protobuf. * [[http://maven.apache.org/|Apache Maven]] version 3 or later (for Hadoop 0.23+) * The Java API javadocs. - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "HowToRelease" by SangjinLee
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by SangjinLee: https://wiki.apache.org/hadoop/HowToRelease?action=diff=89=90 ## page was copied from HowToReleasePostMavenization ''This page is prepared for Hadoop Core committers. You need committer rights to create a new Hadoop Core release.'' - These instructions have been updated to use dev-support/bin/create-release. Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization and [[HowToReleasePreDSBCR]] + These instructions have been updated to use dev-support/bin/create-release. Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization and [[HowToReleasePreDSBCR]]. For releasing from the 2.6.x or the 2.7.x line, you'll need to consult [[HowToReleasePreDSBCR]] to find applicable steps. <> - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-commits-h...@hadoop.apache.org
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=38=39 Comment: Book links added }}} + === Hadoop Blueprints === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-blueprints|Hadoop Blueprints]] + + '''Authors:''' Anurag Shrivastava, Tanmay Deshpande + + '''Publisher:''' Packt + + '''Date of Publishing:''' September 2016 + + Use Hadoop to solve business problems by learning from a rich set of real-life case studies. + === Hadoop: Data Processing and Modelling === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-data-processing-and-modelling|Hadoop: Data Processing and Modelling]] @@ -423, +435 @@ '''Date of Publishing (est.):''' October 2015 + + + == Hadoop Videos == + + + === Taming Big Data with MapReduce and Hadoop - Hands On! (Video) === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/taming-big-data-mapreduce-and-hadoop-hands-video|Taming Big Data with MapReduce and Hadoop - Hands On! (Video)]] + + '''Author:''' Frank Kane + + '''Publisher:''' Packt + + '''Date of Publishing:''' September 12, 2016 + + Master the art of processing Big Data using Hadoop and MapReduce with the help of real-world examples. + + Hadoop in Action introduces the subject and shows how to write programs in the MapReduce style. It starts with a few easy examples and then moves quickly to show Hadoop use in more complex data analysis tasks. Included are best practices and design patterns of MapReduce programming.
[Hadoop Wiki] Update of "AmazonS3" by YongjunZhang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "AmazonS3" page has been changed by YongjunZhang: https://wiki.apache.org/hadoop/AmazonS3?action=diff=21=22 === Unmaintained: S3N FileSystem (URI scheme: s3n://) === - '''S3A is the S3 Client for Hadoop 2.6 and earlier. From Hadoop 2.7+, switch to s3a''' + '''S3N is the S3 Client for Hadoop 2.6 and earlier. From Hadoop 2.7+, switch to s3a''' A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The S3N code is stable and widely used, but is not adding any new features (which is why it remains stable).
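Since the S3N-to-S3A switch described above is purely a URL-scheme change, it can be sketched in shell; the bucket and object path below are hypothetical examples, not from the wiki page:

```shell
# s3n:// to s3a:// is a pure scheme swap; the object path is unchanged.
# Bucket and key below are hypothetical examples.
old_url="s3n://my-bucket/data/input.csv"
new_url="s3a://${old_url#s3n://}"   # strip the s3n:// prefix, re-add s3a://
echo "$new_url"
```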
[Hadoop Wiki] Trivial Update of "HowToContribute" by QwertyManiac
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToContribute" page has been changed by QwertyManiac: https://wiki.apache.org/hadoop/HowToContribute?action=diff=116=117 Comment: Add gcc-c++ to RHEL instructions For RHEL (and hence also CentOS): {{{ - yum -y install lzo-devel zlib-devel gcc autoconf automake libtool openssl-devel fuse-devel cmake + yum -y install lzo-devel zlib-devel gcc gcc-c++ autoconf automake libtool openssl-devel fuse-devel cmake }}} For Debian and Ubuntu:
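As a quick local sanity check of the diff above, the difference between the old and new RHEL package lists can be computed in plain POSIX shell; this is only an illustrative sketch, not part of the documented build setup:

```shell
# Old and new yum package lists copied from the diff above;
# compute which packages were added.
old="lzo-devel zlib-devel gcc autoconf automake libtool openssl-devel fuse-devel cmake"
new="lzo-devel zlib-devel gcc gcc-c++ autoconf automake libtool openssl-devel fuse-devel cmake"
added=""
for p in $new; do
  case " $old " in
    *" $p "*) ;;                # already present in the old list
    *) added="$added $p" ;;     # newly added package
  esac
done
echo "added:$added"
```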
[Hadoop Wiki] Trivial Update of "HowToContribute" by QwertyManiac
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToContribute" page has been changed by QwertyManiac: https://wiki.apache.org/hadoop/HowToContribute?action=diff=116=117 Comment: Add missing cmake to RHEL instructions
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=37=38 Comment: Book added }}} + === Hadoop: Data Processing and Modelling === + + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-data-processing-and-modelling|Hadoop: Data Processing and Modelling]] + + '''Authors:''' Garry Turkington, Tanmay Deshpande, Sandeep Karanth + + '''Publisher:''' Packt + + '''Date of Publishing:''' August 2016 + + Unlock the power of your data with Hadoop 2.X ecosystem and its data warehousing techniques across large data sets. + === Hadoop Explained (Free eBook Download) === '''Name:''' [[https://www.packtpub.com/packt/free-ebook/hadoop-explained|Hadoop Explained]]
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=36=37 Comment: Added a free eBook }}} + === Hadoop Explained (Free eBook Download) === + '''Name:''' [[https://www.packtpub.com/packt/free-ebook/hadoop-explained|Hadoop Explained]] + + '''Author:''' Aravind Shenoy + + '''Publisher:''' Packt Publishing + + Learn how MapReduce organizes and processes large sets of data and discover the advantages of Hadoop - from scalability to security, see how Hadoop handles huge amounts of data with care + === Hadoop Real-World Solutions Cookbook- Second Edition === '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-real-world-solutions-cookbook-second-edition|Hadoop Real-World Solutions Cookbook- Second Edition]]
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=88=89 Comment: Update website build instructions to point to HowToCommit instead {{{ svn add publish/docs/r${version} }}} - 1. Regenerate the site, review it, then commit it. + 1. Regenerate the site, review it, then commit it per the instructions in HowToCommit. {{{ - ant -Dforrest.home=$FORREST_HOME -Djava5.home=/usr/local/jdk1.5 - firefox publish/index.html + + svn commit -m "Updated site for release X.Y.Z." }}} 1. Send announcements to the user and developer lists once the site changes are visible.
[Hadoop Wiki] Update of "HowToCommit" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToCommit" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToCommit?action=diff=37=38 Comment: update fix version instructions 1. Cherry-pick the changes to other appropriate branches via {{{git cherry-pick -x }}}. The -x option records the source commit, and reuses the original commit message. Resolve any conflicts. 1. If the conflicts are major, it is preferable to produce a new patch for that branch, review it separately and commit it. When committing an edited patch to other branches, please follow the same steps and make sure to include the JIRA number and description of changes in the commit message. 1. When backporting to branch-2.7 or older branches, we need to update CHANGES.txt. - 1. Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but please only set a single fix version, the earliest release in which the change will appear. '''Special case'''- when committing to a ''non-mainline'' branch (such as branch-0.22 or branch-0.23 ATM), please set fix-version to either 2.x.x or 3.x.x appropriately too. + 1. Resolve the issue as fixed, thanking the contributor. Follow the rules specified at [[https://hadoop.apache.org/versioning.html|Apache Hadoop Release Versioning]] for how to set fix versions appropriately, it's important for tracking purposes with concurrent release lines. 1. Set the assignee if it is not set. If you cannot set the contributor to the assignee, you need to add the contributor into Contributors role in the project. Please see [[#Roles|Adding Contributors role]] for the detail. This How-to-commit [[http://www.youtube.com/watch?v=txW3m7qWdzw=youtu.be|video]] has guidance on the commit process, albeit using svn. Most of the process is still the same, except that we now use git instead. 
[Hadoop Wiki] Update of "Roadmap" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Roadmap" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/Roadmap?action=diff=61=62 For more details on how releases are created, see HowToRelease. == Hadoop 3.x Releases == + === Planned for hadoop-3.0.0 === + * HADOOP + * Classpath isolation on by default [[https://issues.apache.org/jira/browse/HADOOP-11656|HADOOP-11656]] + * HDFS + * YARN + * MAPREDUCE + + - === hadoop-3.0 === + === hadoop-3.0.0-alpha1 === * HADOOP * Move to JDK8+ - * Classpath isolation on by default [[https://issues.apache.org/jira/browse/HADOOP-11656|HADOOP-11656]] * Shell script rewrite [[https://issues.apache.org/jira/browse/HADOOP-9902|HADOOP-9902]] * Move default ports out of ephemeral range [[https://issues.apache.org/jira/browse/HDFS-9427|HDFS-9427]] * HDFS * Removal of hftp in favor of webhdfs [[https://issues.apache.org/jira/browse/HDFS-5570|HDFS-5570]] * Support for more than two standby NameNodes [[https://issues.apache.org/jira/browse/HDFS-6440|HDFS-6440]] * Support for Erasure Codes in HDFS [[https://issues.apache.org/jira/browse/HDFS-7285|HDFS-7285]] + * Intra-datanode balancer [[https://issues.apache.org/jira/browse/HDFS-1312|HDFS-1312]] * YARN + * YARN Timeline Service v.2 [[https://issues.apache.org/jira/browse/YARN-2928|YARN-2928]] * MAPREDUCE * Derive heap size or mapreduce.*.memory.mb automatically [[https://issues.apache.org/jira/browse/MAPREDUCE-5785|MAPREDUCE-5785]]
[Hadoop Wiki] Trivial Update of "HowToRelease" by VinodKumarVavilapalli
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by VinodKumarVavilapalli: https://wiki.apache.org/hadoop/HowToRelease?action=diff=87=88 1. Check if the release year for Web UI footer is updated (the property {{{}}} in {{{hadoop-project/pom.xml}}}). If not, create a JIRA to update the property value to the right year, and propagate the fix from trunk to all necessary branches. Considering the voting time needed before publishing, it's better to use the year of (current time + voting time) here, to be consistent with the publishing time. 1. In JIRA, ensure that only issues in the "Fixed" state have a "Fix Version" set to release X.Y.Z. - 1. In JIRA, "release" the version, setting the date to the expected end-of-vote date. Visit the "Administer Project" page, then the "Manage versions" page. You need to have the "Admin" role in HADOOP, HDFS, MAPREDUCE, and YARN. This ensures that the release notes and changes file have the correct date to match the actual release date. 1. Verify that $HOME/.gpg defaults to the key listed in the KEYS file. 1. For the Apache release, use a Docker- and Internet-capable machine to build the release candidate with {{{create-release}}}. Unless the {{{--logdir}}} is given, logs will be in the {{{patchprocess/}}} directory. Artifacts will be in the target/artifacts NOTE: This will take quite a while, since it will download and build the entire source tree, including documentation and native components, from scratch to avoid maven repository caching issues hiding issues with the source release. {{{ @@ -117, +116 @@ = Publishing = In 5 days if [[http://hadoop.apache.org/bylaws#Decision+Making|the release vote passes]], the release may be published. - + 1. In JIRA, "release" the version, setting the date to the end-of-vote date. Visit the "Administer Project" page, then the "Manage versions" page.
You need to have the "Admin" role in HADOOP, HDFS, MAPREDUCE, and YARN. 1. Set environment variable version for later steps. {{{export version=X.Y.Z}}} 1. Tag the release. Do it from the release branch and push the created tag to the remote repository: {{{
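The naming conventions behind the tagging steps above can be sketched in shell; the version value is the same X.Y.Z placeholder the page uses, and the {{{release-}}} tag prefix matches the {{{git tag -s release-$version}}} command shown elsewhere on the page:

```shell
# Placeholder version, as in `export version=X.Y.Z` above.
version="X.Y.Z"
tag="release-${version}"            # tag name format used by the git tag step
tarball="hadoop-${version}.tar.gz"  # release artifact name format
echo "$tag $tarball"
```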
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by AndrewWang: https://wiki.apache.org/hadoop/HowToRelease?action=diff=86=87 Comment: update instructions for uploading to home.apache.org {{{ mvn deploy -DskipTests }}} - 1. Copy release files to a public place and ensure they are readable. + 1. Copy release files to a public place and ensure they are readable. Note that {{{home.apache.org}}} only supports SFTP, so this may be easier with a graphical SFTP client like Nautilus, Konqueror, etc. {{{ - ssh home.apache.org mkdir public_html/hadoop-${version} - scp -p hadoop-${version}*.tar.gz* home.apache.org:public_html/hadoop-${version} - ssh home.apache.org chmod -R a+r public_html/hadoop-${version} + sftp home.apache.org + > cd public_html + > mkdir hadoop-${version} + > put -r /home/hadoop/hadoop-${version} + + > bye }}} 1. Log into [[https://repository.apache.org|Nexus]], select "{{{Staging}}} Repositories" from the left navigation pane, select the check-box against the specific hadoop repository, and {{{close}}} the release. 1. Call a release vote on common-dev at hadoop.apache.org. It's usually a good idea to start the release vote on Monday so that people will have a chance to verify the release candidate during the week. [[https://www.mail-archive.com/common-dev@hadoop.apache.org/msg13339.html|Example]]
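Before uploading release files, it is worth verifying their checksums locally. The sketch below uses a stand-in file rather than a real release artifact, and the {{{.sha512}}} file naming is an assumption, not something specified by the page:

```shell
# Create a stand-in "tarball" so the sketch is self-contained.
printf 'demo contents' > hadoop-X.Y.Z.tar.gz
# Record its SHA-512 checksum, then verify the file against it.
sha512sum hadoop-X.Y.Z.tar.gz > hadoop-X.Y.Z.tar.gz.sha512
sha512sum -c hadoop-X.Y.Z.tar.gz.sha512
```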
[Hadoop Wiki] Update of "AmazonS3" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "AmazonS3" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/AmazonS3?action=diff=20=21 Comment: lots more on S3a and why to use it, warnings of state of s3n and deprecation of s3 = S3 Support in Apache Hadoop = [[http://aws.amazon.com/s3|Amazon S3]] (Simple Storage Service) is a data storage service. You are billed - monthly for storage and data transfer. Transfer between S3 and [[AmazonEC2]] instances in the same geographical location are free. This makes use of - S3 attractive for Hadoop users who run clusters on EC2. + monthly for storage and data transfer. Transfer between S3 and [[AmazonEC2]] instances in the same geographical location are free. Most importantly, the data is preserved when a transient Hadoop cluster is shut down. + + This makes the use of S3 common in Hadoop clusters on EC2. It is also used sometimes for backing up remote clusters. Hadoop provides multiple filesystem clients for reading and writing to and from Amazon S3 or a compatible service. - === S3 Native FileSystem (URI scheme: s3n) === - A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The S3N code is stable and widely used, but is not adding any new features (which is why it remains stable). S3N requires a suitable version of the jets3t JAR on the classpath. + === Recommended: S3A (URI scheme: s3a://) - Hadoop 2.7+ === - === S3A (URI scheme: s3a) === + '''S3A is the recommended S3 Client for Hadoop 2.7 and later''' A successor to the S3 Native, s3n:// filesystem, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more.
The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema. - S3A has been considered usable in production since Hadoop 2.7, and is undergoing active maintenance for enhanced security, scalability and performance. + S3A has been usable in production since Hadoop 2.7, and is undergoing active maintenance for enhanced security, scalability and performance. - '''important:''' S3A requires the exact version of the amazon-aws-sdk against which Hadoop was built (and is bundled with). History - === S3 Block FileSystem (URI scheme: s3) === + 1. Hadoop 2.6: Initial Implementation: [[https://issues.apache.org/jira/browse/HADOOP-10400|HADOOP-10400]] + 2. Hadoop 2.7: Production Ready: [[https://issues.apache.org/jira/browse/HADOOP-11571|HADOOP-11571]] + 3. Hadoop 2.8: Performance, robustness and security [[https://issues.apache.org/jira/browse/HADOOP-11694|HADOOP-11694]] + 4. Hadoop 2.9: Even more features: [[https://issues.apache.org/jira/browse/HADOOP-13204|HADOOP-13204]] + July 2016: For details of ongoing work on S3a, consult [[www.slideshare.net/HadoopSummit/hadoop-cloud-storage-object-store-integration-in-production|Hadoop & Cloud Storage: Object Store Integration in Production]] + + '''important:''' S3A requires the exact version of the amazon-aws-sdk against which Hadoop was built (and is bundled with). If you try to upgrade the library by dropping in a later version, things will break. + + + === Unmaintained: S3N FileSystem (URI scheme: s3n://) === + + '''S3A is the S3 Client for Hadoop 2.6 and earlier. From Hadoop 2.7+, switch to s3a''' + + A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop.
The S3N code is stable and widely used, but is not adding any new features (which is why it remains stable). + + S3N requires a compatible version of the jets3t JAR on the classpath. + + Since Hadoop 2.6, all work on S3 integration has been with S3A. S3N is not maintained except for security risks —this helps guarantee security. Most bug reports against S3N will be closed as WONTFIX and the text "use S3A". Please switch to S3A if you can -and do try it before filing bug reports against S3N. + + + === (Deprecated) S3 Block FileSystem (URI scheme: s3://) === + + '''S3 is deprecated and will be removed from Hadoop 2.3''' + - '''important:''' this section covers the s3:// filesystem support inside Apache Hadoop. The one in Amazon EMR is different —see the details at the bottom of this page. + '''important:''' this section covers the s3:// filesystem support from the Apache Software Foundation. The one in Amazon EMR is different —see the details at the bottom of this page. A block-based filesystem backed by S3. Files are stored
[Hadoop Wiki] Update of "LibHDFS" by AkiraAjisaka
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "LibHDFS" page has been changed by AkiraAjisaka: https://wiki.apache.org/hadoop/LibHDFS?action=diff=12=13 Comment: Fix broken link to libhdfs test cases <> = Examples = - The [[http://svn.apache.org/viewvc/hadoop/core/trunk/src/c++/libhdfs/hdfs_test.c|test cases]] for libhdfs provide some good examples on how to use libhdfs. + The [[https://git-wip-us.apache.org/repos/asf?p=hadoop.git;a=tree;f=hadoop-hdfs-project/hadoop-hdfs-native-client/src/main/native/libhdfs-tests|test cases]] for libhdfs provide some good examples on how to use libhdfs. <>
[Hadoop Wiki] Update of "HowToCommit" by AkiraAjisaka
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToCommit" page has been changed by AkiraAjisaka: https://wiki.apache.org/hadoop/HowToCommit?action=diff=36=37 Comment: Committer need to update CHANGES.txt when backporting to branch-2.7 or older branches. 1. '''Push changes to remote repo:''' Build and run a test to ensure it is all still kosher. Push the changes to the remote (main) repo using {{{git push }}}. 1. '''Backporting to other branches:''' If the changes were to trunk, we might want to apply them to other appropriate branches. 1. Cherry-pick the changes to other appropriate branches via {{{git cherry-pick -x }}}. The -x option records the source commit, and reuses the original commit message. Resolve any conflicts. - 1. If the conflicts are major, it is preferable to produce a new patch for that branch, review it separately and commit it. When committing an edited patch to other branches, please follow the same steps and make sure to include the JIRA number and description of changes in the commit message. + 1. If the conflicts are major, it is preferable to produce a new patch for that branch, review it separately and commit it. When committing an edited patch to other branches, please follow the same steps and make sure to include the JIRA number and description of changes in the commit message. + 1. When backporting to branch-2.7 or older branches, we need to update CHANGES.txt. 1. Resolve the issue as fixed, thanking the contributor. Always set the "Fix Version" at this point, but please only set a single fix version, the earliest release in which the change will appear. '''Special case'''- when committing to a ''non-mainline'' branch (such as branch-0.22 or branch-0.23 ATM), please set fix-version to either 2.x.x or 3.x.x appropriately too. 1. Set the assignee if it is not set. 
If you cannot set the contributor to the assignee, you need to add the contributor into Contributors role in the project. Please see [[#Roles|Adding Contributors role]] for the detail.
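The effect of the {{{-x}}} flag described in the backporting steps above can be demonstrated in a throwaway repository; the JIRA number, commit messages, and branch name below are hypothetical:

```shell
# Build a throwaway repo with a trunk-style commit and a backport branch.
git init -q backport-demo && cd backport-demo
git config user.email demo@example.com
git config user.name "Demo User"
git commit -q --allow-empty -m "base"
git branch branch-X                          # hypothetical release branch
git commit -q --allow-empty -m "HADOOP-9999. Example fix"
fix=$(git rev-parse HEAD)
# Backport with -x: the source commit hash is recorded in the message.
git checkout -q branch-X
git cherry-pick -x --allow-empty "$fix" >/dev/null
git log -1 --format=%B
```

The final log output keeps the original commit message and appends a "(cherry picked from commit ...)" line, which is what makes backports traceable to their trunk commits.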
[Hadoop Wiki] Update of "HowToReleasePreDSBCR" by SomeOtherAccount
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToReleasePreDSBCR" page has been changed by SomeOtherAccount: https://wiki.apache.org/hadoop/HowToReleasePreDSBCR?action=diff=83=84 ## page was renamed from HowToReleasePostMavenizationWithGit ## page was copied from HowToReleasePostMavenization ''This page is prepared for Hadoop Core committers. You need committer rights to create a new Hadoop Core release.'' + + + '''WARNING: These instructions use the ASF Jenkins servers to build a release artifact. This is against the ASF release policies!''' The current version of this page is available at HowToRelease
[Hadoop Wiki] Trivial Update of "HowToRelease" by SomeOtherAccount
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by SomeOtherAccount: https://wiki.apache.org/hadoop/HowToRelease?action=diff=85=86 1. In JIRA, ensure that only issues in the "Fixed" state have a "Fix Version" set to release X.Y.Z. 1. In JIRA, "release" the version, setting the date to the expected end-of-vote date. Visit the "Administer Project" page, then the "Manage versions" page. You need to have the "Admin" role in HADOOP, HDFS, MAPREDUCE, and YARN. This ensures that the release notes and changes file have the correct date to match the actual release date. 1. Verify that $HOME/.gpg defaults to the key listed in the KEYS file. - 1. On a Docker- and Internet- capable machine, build the release candidate with {{{create-release}}}. Unless the {{{--logdir}}} is given, logs will be in the {{{patchprocess/}}} directory. Artifacts will be in the target/artifacts NOTE: This will take quite a while, since it will download and build the entire source tree, including documentation and native components, from scratch to avoid maven repository caching issues hiding issues with the source release. + 1. For the Apache release, use a Docker- and Internet-capable machine to build the release candidate with {{{create-release}}}. Unless the {{{--logdir}}} is given, logs will be in the {{{patchprocess/}}} directory. Artifacts will be in the target/artifacts NOTE: This will take quite a while, since it will download and build the entire source tree, including documentation and native components, from scratch to avoid maven repository caching issues hiding issues with the source release. {{{ dev-support/bin/create-release --asfrelease --docker --dockercache }}}
[Hadoop Wiki] Trivial Update of "HowToRelease" by SomeOtherAccount
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by SomeOtherAccount: https://wiki.apache.org/hadoop/HowToRelease?action=diff=84=85 cp target/artifacts/RELEASENOTES.md hadoop-common-project/hadoop-common/src/site/markdown/release/${version}/RELEASENOTES.${version}.md cp target/artifacts/CHANGES.md hadoop-common-project/hadoop-common/src/site/markdown/release/${version}/CHANGES.${version}.md }}} - 1. Update {{{hadoop-project-dist/pom.xml}}} to point to this new stable version of the API and commit the change. + 1. Copy the jdiff xml files for this version to their appropriate directory. + {{{ + cp hadoop-hdfs-project/hadoop-hdfs/target/site/jdiff/xml/Apache_Hadoop_HDFS_${version}.xml hadoop-hdfs-project/hadoop-hdfs/dev-support/jdiff + }}} + 1. Update {{{hadoop-project-dist/pom.xml}}} {{{ X.Y.Z }}}
[Hadoop Wiki] Trivial Update of "HowToRelease" by SomeOtherAccount
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by SomeOtherAccount: https://wiki.apache.org/hadoop/HowToRelease?action=diff=83=84 ## page was copied from HowToReleasePostMavenization ''This page is prepared for Hadoop Core committers. You need committer rights to create a new Hadoop Core release.'' - These instructions have been updated to use dev-support/bin/create-release. Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization and HowToReleasePreDSBCR + These instructions have been updated to use dev-support/bin/create-release. Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization and [[HowToReleasePreDSBCR]] <> - '''READ ALL OF THESE INSTRUCTIONS THOROUGHLY BEFORE PROCEEDING! + '''READ ALL OF THESE INSTRUCTIONS THOROUGHLY BEFORE PROCEEDING! ''' - ''' = Preparation = 1. If you have not already done so, [[http://www.apache.org/dev/release-signing.html#keys-policy|append your code signing key]] to the [[https://dist.apache.org/repos/dist/release/hadoop/common/KEYS|KEYS]] file. Once you commit your changes, they will automatically be propagated to the website. Also [[http://www.apache.org/dev/release-signing.html#keys-policy|upload your key to a public key server]] if you haven't. End users use the KEYS file (along with the [[http://www.apache.org/dev/release-signing.html#web-of-trust|web of trust]]) to validate that releases were done by an Apache committer. For more details on signing releases, see [[http://www.apache.org/dev/release-signing.html|Signing Releases]] and [[http://www.apache.org/dev/mirror-step-by-step.html?Step-By-Step|Step-By-Step Guide to Mirroring Releases]]. 
@@ -71, +70 @@ mvn versions:set -DnewVersion=X.Y.Z }}} - Now, for any branches in {trunk, branch-X, branch-X.Y, branch-X.Y.Z} that have changed, push them to the remote repo taking care of any conflicts. {{{ @@ -87, +85 @@ 1. On a Docker- and Internet- capable machine, build the release candidate with {{{create-release}}}. Unless the {{{--logdir}}} is given, logs will be in the {{{patchprocess/}}} directory. Artifacts will be in the target/artifacts NOTE: This will take quite a while, since it will download and build the entire source tree, including documentation and native components, from scratch to avoid maven repository caching issues hiding issues with the source release. {{{ dev-support/bin/create-release --asfrelease --docker --dockercache - }}} + }}} 1. While it should fail {{{create-release}}} if there are issues, doublecheck the rat log to find and fix any potential licensing issues. {{{ grep 'Rat check' target/artifacts/mvn_apache_rat.log - }}} + }}} 1. Check that release files look ok - e.g. install it somewhere fresh and run examples from tutorial, do a fresh build, read the release notes looking for WARNINGs, etc. 1. Set environment variable version for later steps. {{{export version=X.Y.Z-RCN}}} 1. Tag the release candidate: {{{ git tag -s release-$version -m "Release candidate - $version" - }}} + }}} 1. Push branch-X.Y.Z and the newly created tag to the remote repo. 1. Deploy the maven artifacts, on your personal computer. Please be sure you have completed the prerequisite step of preparing the {{{settings.xml}}} file before the deployment. You might want to do this in private and clear your history file as your gpg-passphrase is in clear text. {{{ @@ -135, +133 @@ svn ci -m "Publishing the bits for release ${version}" }}} 1. Update upstream branches to make them aware of this new release: -1. Copy and commit the CHANGES.md and RELEASENOTES.md: + 1. 
Copy and commit the CHANGES.md and RELEASENOTES.md: -{{{ + {{{ cp target/artifacts/RELEASENOTES.md hadoop-common-project/hadoop-common/src/site/markdown/release/${version}/RELEASENOTES.${version}.md cp target/artifacts/CHANGES.md hadoop-common-project/hadoop-common/src/site/markdown/release/${version}/CHANGES.${version}.md -}}} + }}} -1. Update {{{hadoop-project-dist/pom.xml}}} to point to this new stable version of the API and commit the change. + 1. Update {{{hadoop-project-dist/pom.xml}}} to point to this new stable version of the API and commit the change. -{{{ + {{{ X.Y.Z -}}} + }}} 1. In [[https://repository.apache.org|Nexus]] 1. effect the release of artifacts by selecting the staged repository and then clicking {{{Release}}} 1. If there were multiple RCs, simply drop the staging repositories corresponding to failed RCs.
[Hadoop Wiki] Update of "HowToRelease" by SomeOtherAccount
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by SomeOtherAccount: https://wiki.apache.org/hadoop/HowToRelease?action=diff=83=84 Comment: Rewrite based upon the new dev-support/bin/create-release script ## page was copied from HowToReleasePostMavenization ''This page is prepared for Hadoop Core committers. You need committer rights to create a new Hadoop Core release.'' - These instructions have been updated to use dev-support/bin/create-release. Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization and HowToReleasePreDSBCR + These instructions have been updated to use dev-support/bin/create-release. Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization and [[HowToReleasePreDSBCR]] <> - '''READ ALL OF THESE INSTRUCTIONS THOROUGHLY BEFORE PROCEEDING! + '''READ ALL OF THESE INSTRUCTIONS THOROUGHLY BEFORE PROCEEDING! ''' - ''' = Preparation = 1. If you have not already done so, [[http://www.apache.org/dev/release-signing.html#keys-policy|append your code signing key]] to the [[https://dist.apache.org/repos/dist/release/hadoop/common/KEYS|KEYS]] file. Once you commit your changes, they will automatically be propagated to the website. Also [[http://www.apache.org/dev/release-signing.html#keys-policy|upload your key to a public key server]] if you haven't. End users use the KEYS file (along with the [[http://www.apache.org/dev/release-signing.html#web-of-trust|web of trust]]) to validate that releases were done by an Apache committer. For more details on signing releases, see [[http://www.apache.org/dev/release-signing.html|Signing Releases]] and [[http://www.apache.org/dev/mirror-step-by-step.html?Step-By-Step|Step-By-Step Guide to Mirroring Releases]]. 
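The KEYS step above can be sketched as a pair of gpg commands. The key ID here is a placeholder, and the exact flags follow the common Apache convention for building KEYS entries rather than anything this page mandates:

```shell
# Append a code-signing key to the KEYS file (sketch; the key ID is a
# placeholder for your own key).
append_key() {
  # KEYS-file convention: a human-readable listing, then the
  # ASCII-armored public key.
  gpg --list-sigs "$1" && gpg --armor --export "$1"
}
# Usage (commented out so the sketch has no side effects):
#   append_key "0xDEADBEEF" >> KEYS && svn commit -m "Add key" KEYS
```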
@@ -71, +70 @@ mvn versions:set -DnewVersion=X.Y.Z }}} - Now, for any branches in {trunk, branch-X, branch-X.Y, branch-X.Y.Z} that have changed, push them to the remote repo taking care of any conflicts. {{{ @@ -87, +85 @@ 1. On a Docker- and Internet- capable machine, build the release candidate with {{{create-release}}}. Unless the {{{--logdir}}} is given, logs will be in the {{{patchprocess/}}} directory. Artifacts will be in the target/artifacts NOTE: This will take quite a while, since it will download and build the entire source tree, including documentation and native components, from scratch to avoid maven repository caching issues hiding issues with the source release. {{{ dev-support/bin/create-release --asfrelease --docker --dockercache - }}} + }}} 1. While it should fail {{{create-release}}} if there are issues, doublecheck the rat log to find and fix any potential licensing issues. {{{ grep 'Rat check' target/artifacts/mvn_apache_rat.log - }}} + }}} 1. Check that release files look ok - e.g. install it somewhere fresh and run examples from tutorial, do a fresh build, read the release notes looking for WARNINGs, etc. 1. Set environment variable version for later steps. {{{export version=X.Y.Z-RCN}}} 1. Tag the release candidate: {{{ git tag -s release-$version -m "Release candidate - $version" - }}} + }}} 1. Push branch-X.Y.Z and the newly created tag to the remote repo. 1. Deploy the maven artifacts, on your personal computer. Please be sure you have completed the prerequisite step of preparing the {{{settings.xml}}} file before the deployment. You might want to do this in private and clear your history file as your gpg-passphrase is in clear text. {{{ @@ -135, +133 @@ svn ci -m "Publishing the bits for release ${version}" }}} 1. Update upstream branches to make them aware of this new release: -1. Copy and commit the CHANGES.md and RELEASENOTES.md: + 1. 
Copy and commit the CHANGES.md and RELEASENOTES.md: -{{{ + {{{ cp target/artifacts/RELEASENOTES.md hadoop-common-project/hadoop-common/src/site/markdown/release/${version}/RELEASENOTES.${version}.md cp target/artifacts/CHANGES.md hadoop-common-project/hadoop-common/src/site/markdown/release/${version}/CHANGES.${version}.md -}}} + }}} -1. Update {{{hadoop-project-dist/pom.xml}}} to point to this new stable version of the API and commit the change. + 1. Update {{{hadoop-project-dist/pom.xml}}} to point to this new stable version of the API and commit the change. -{{{ + {{{ X.Y.Z -}}} + }}} 1. In [[https://repository.apache.org|Nexus]] 1. effect the release of artifacts by selecting the staged repository and then clicking {{{Release}}} 1. If there were multiple RCs, simply drop the staging repositories corresponding to failed RCs. - To unsubscribe, e-mail: common-commits-unsubscr...@hadoop.apache.org For additional commands, e-mail:
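The tag-and-push steps above follow a fixed naming convention (`release-$version` tags, `branch-X.Y.Z` branches). A small sketch, where the version value is only an example and the git commands are commented out so nothing is pushed:

```shell
# Sketch of the RC tagging steps above; 3.1.0-RC0 is an example value
# of the X.Y.Z-RCN convention.
export version=3.1.0-RC0
rc_tag() { printf 'release-%s\n' "$1"; }     # tag name used by the steps
# git tag -s "$(rc_tag "$version")" -m "Release candidate - $version"
# git push origin "branch-${version%-RC*}" "$(rc_tag "$version")"
echo "signed tag would be: $(rc_tag "$version")"
```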
[Hadoop Wiki] Update of "UnixShellScriptProgrammingGuide" by SomeOtherAccount
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "UnixShellScriptProgrammingGuide" page has been changed by SomeOtherAccount: https://wiki.apache.org/hadoop/UnixShellScriptProgrammingGuide?action=diff=20=21 Comment: More dynamic subcommands updates ## page was renamed from ShellScriptProgrammingGuide = Introduction = - With [[https://issues.apache.org/jira/browse/HADOOP-9902|HADOOP-9902]], the shell script code base has been refactored, with common functions and utilities put into a shell library (hadoop-functions.sh). Here are some tips and tricks to get the most out of using this functionality: = The Skeleton = - All properly built shell scripts contain the following sections: 1. `hadoop_usage` function that contains an alphabetized list of subcommands and their description. This is used when the user directly asks for help, a command line syntax error, etc. - 2. `HADOOP_LIBEXEC_DIR` configured. This should be the location of where `hadoop-functions.sh`, `hadoop-config.sh`, etc, are located. + 1. `HADOOP_LIBEXEC_DIR` configured. This should be the location of where `hadoop-functions.sh`, `hadoop-config.sh`, etc, are located. - 3. `HADOOP_NEW_CONFIG=true`. This tells the rest of the system that the code being executed is aware that it is using the new shell API and it will call the routines it needs to call on its own. If this isn't set, then several default actions that were done in Hadoop 2.x and earlier are executed and several key parts of the functionality are lost. + 1. `HADOOP_NEW_CONFIG=true`. This tells the rest of the system that the code being executed is aware that it is using the new shell API and it will call the routines it needs to call on its own. If this isn't set, then several default actions that were done in Hadoop 2.x and earlier are executed and several key parts of the functionality are lost. - 4. `$HADOOP_LIBEXEC_DIR/abc-config.sh` is executed, where abc is the subproject. 
HDFS scripts should call `hdfs-config.sh`. MAPRED scripts should call `mapred-config.sh`. YARN scripts should call `yarn-config.sh`. Everything else should call `hadoop-config.sh`. This does a lot of standard initialization, processes standard options, etc. This is also what provides override capabilities for subproject specific environment variables. For example, the system will normally ignore `yarn-env.sh`, but `yarn-config.sh` will activate those settings. + 1. `$HADOOP_LIBEXEC_DIR/abc-config.sh` is executed, where abc is the subproject. HDFS scripts should call `hdfs-config.sh`. MAPRED scripts should call `mapred-config.sh`. YARN scripts should call `yarn-config.sh`. Everything else should call `hadoop-config.sh`. This does a lot of standard initialization, processes standard options, etc. This is also what provides override capabilities for subproject specific environment variables. For example, the system will normally ignore `yarn-env.sh`, but `yarn-config.sh` will activate those settings. - 5. At this point, this is where the majority of your code goes. Programs should process the rest of the arguments and do whatever their script is supposed to do. + 1. At this point, this is where the majority of your code goes. Programs should process the rest of the arguments and do whatever their script is supposed to do. - 6. Before executing a Java program (preferably via hadoop_java_exec) or giving user output, call `hadoop_finalize`. This finishes up the configuration details: adds the user class path, fixes up any missing Java properties, configures library paths, etc. + 1. Before executing a Java program (preferably via hadoop_java_exec) or giving user output, call `hadoop_finalize`. This finishes up the configuration details: adds the user class path, fixes up any missing Java properties, configures library paths, etc. - 7. Either an `exit` or an `exec`. 
This should return 0 for success and 1 or higher for failure. - = Adding a Subcommand to an Existing Script = + = Adding a Subcommand to an Existing Script (NOT hadoop-tools-based) = - In order to add a new subcommand, there are two things that need to be done: 1. Add a line to that script's `hadoop_usage` function that lists the name of the subcommand and what it does. This should be alphabetized. - 2. Add an additional entry in the case conditional. Depending upon what is being added, several things may need to be done: + 1. Add an additional entry in the case conditional. Depending upon what is being added, several things may need to be done: + a. Set the `HADOOP_CLASSNAME` to the Java method. b. Add $HADOOP_CLIENT_OPTS to $HADOOP_OPTS (or, for YARN apps, $YARN_CLIENT_OPTS to $YARN_OPTS) if this is an interactive application or for some other reason
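The skeleton steps above can be sketched as one minimal script. Everything named `myapp` is hypothetical; the `hadoop_*` helpers and `*-config.sh` files are the ones the guide names, and the guarded sourcing is only a device to keep the sketch readable and runnable outside a Hadoop install:

```shell
#!/usr/bin/env bash
# Minimal sketch of the skeleton above for a hypothetical "myapp" script.

hadoop_usage() {                       # 1. alphabetized subcommand list
  echo "Usage: myapp COMMAND"
  echo "  version     print the version"
}

# 2. where hadoop-functions.sh, hadoop-config.sh, etc. live
HADOOP_LIBEXEC_DIR="${HADOOP_LIBEXEC_DIR:-/usr/lib/hadoop/libexec}"
# 3. declare that this script knows about the new shell API
HADOOP_NEW_CONFIG=true
# 4. standard init; also activates subproject *-env.sh overrides
if [ -r "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh" ]; then
  . "${HADOOP_LIBEXEC_DIR}/hadoop-config.sh"
fi

# 5. process the remaining arguments
case "${1:-}" in
  version) HADOOP_CLASSNAME=org.apache.hadoop.util.VersionInfo ;;
  *)       hadoop_usage ;;
esac

# 6./7. a real script would finish with:
#   hadoop_finalize
#   hadoop_java_exec myapp "${HADOOP_CLASSNAME}" "$@"   # exit 0 = success
echo "would launch: ${HADOOP_CLASSNAME:-<none>}"
```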
[Hadoop Wiki] Update of "HowToCommit" by AkiraAjisaka
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToCommit" page has been changed by AkiraAjisaka: https://wiki.apache.org/hadoop/HowToCommit?action=diff=34=35 Comment: Fix how to commit changes to the website 1. End-user documentation, versioned with releases; and, 1. The website. This is maintained separately in subversion, republished as it is changed. - To commit end-user documentation changes to trunk or a branch, ask the user to submit only changes made to the *.xml files in {{{src/docs}}}. Apply that patch, run {{{ant docs}}} to generate the html, and then commit. End-user documentation is only published to the web when releases are made, as described in HowToRelease. + To commit end-user documentation changes to trunk or a branch, ask the user to submit only changes made to the *.xml files in {{{src/docs}}}. Apply that patch, run {{{ant docs}}} to generate the html, and then commit. End-user documentation is only published to the web when releases are made, as described in HowToRelease. To commit changes to the website and re-publish them: {{{ svn co https://svn.apache.org/repos/asf/hadoop/common/site @@ -75, +75 @@ svn stat # check for new pages svn add # add any new pages svn commit - ssh people.apache.org - cd /www/hadoop.apache.org/common - svn up }}} + The commit will be reflected on Apache Hadoop site automatically. - Changes to website (''via svn up'') might take up to an hour to be reflected on Apache Hadoop site. - == Patches that break HDFS, YARN and MapReduce ==
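Gathered into one sequence, the publish workflow from the diff above looks like this. The commit message is a placeholder and the svn commands are commented out; the sketch only defines the repository URL:

```shell
# Website publish flow per the diff above; per that diff the live site
# updates automatically after commit (no ssh step needed any more).
SITE_REPO="https://svn.apache.org/repos/asf/hadoop/common/site"
# svn co "$SITE_REPO" && cd site
# ...edit pages...
# svn stat                      # check for new pages
# svn add <new pages>           # add any new pages
# svn commit -m "Update site"   # placeholder message
echo "site repo: $SITE_REPO"
```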
[Hadoop Wiki] Trivial Update of "Ozone" by ArpitAgarwal
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Ozone" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/Ozone?action=diff=2=3 + <> + = Introduction = Ozone is an Object Store for Hadoop that is currently under development. See the Ozone Apache Jira [[https://issues.apache.org/jira/browse/HDFS-7240|HDFS-7240]] for more details. Ozone is currently in a prototype phase.
[Hadoop Wiki] Trivial Update of "Ozone" by ArpitAgarwal
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Ozone" page has been changed by ArpitAgarwal: https://wiki.apache.org/hadoop/Ozone?action=diff=1=2 This wiki page is intended as a guide for Ozone contributors. = Compiling Ozone = - Setup your development environment if you haven't done so already ([[https://wiki.apache.org/hadoop/HowToContribute|Instructions here]]). Switch to the HDFS-7240 branch and build a Hadoop distribution as usual. + Setup your development environment if you haven't done so already ([[https://wiki.apache.org/hadoop/HowToContribute|Instructions here]]). Switch to the HDFS-7240 branch, apply the in-progress patch for [[https://issues.apache.org/jira/browse/HDFS-10363|HDFS-10363]] and build a Hadoop distribution as usual. = Configuration = Create a new ozone-site.xml file in your Hadoop configuration directory and add the following settings for a bare minimal configuration.
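The build steps above can be sketched as follows. The branch and JIRA numbers come from the page; the patch file name and the mvn flags are typical Hadoop-build usage, not taken from the page, so treat them as assumptions:

```shell
# Ozone build sketch (build commands commented out; patch file name and
# mvn flags are assumptions, not from the page).
# git checkout HDFS-7240
# git apply HDFS-10363.patch           # hypothetical local copy of the patch
# mvn package -Pdist -DskipTests -Dtar
ozone_conf_dir() { echo "${HADOOP_CONF_DIR:-/etc/hadoop/conf}"; }
echo "put ozone-site.xml in: $(ozone_conf_dir)"
```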
[Hadoop Wiki] Update of "Defining Hadoop" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Defining Hadoop" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/Defining%20Hadoop?action=diff=17=18 Comment: review and update, prefix Hadoop with Apache in more places Derivative works may choose to declare that they are ''Powered by Apache Hadoop''. Please see our [[http://www.apache.org/foundation/marks/faq/#poweredby|FAQ entry on Powered By naming styles]]. - There have been cases in the past where this policy has been unclear, and some products were named like ''XYZ distribution of Hadoop''. Such existing vendors of derivative works have been required to change their product names to become compliant with the current Apache Trademark Policy - most are in the process of doing so. No other supplier of derivative works of Apache Hadoop may describe their products in such a way. - == Domain Names == The use of the name ''Hadoop'' in domain names is covered by the [[http://www.apache.org/foundation/marks/domains.html| Apache Third Party Domain Name Branding Policy]]. @@ -45, +43 @@ * The definition of the signatures of the Hadoop interfaces and classes is the Apache Source tree, under revision control. * The definition of semantics of the Hadoop interfaces and classes is the Apache Source tree, including its test classes. - * The verification that the actual semantics of an Apache Hadoop release is compatible with the expected semantics is that the test suites in the Apache codebase pass, and that Hadoop users within the open source community have tested the release running at production scale in their datacentres. + * The verification that the actual semantics of an Apache Hadoop release is compatible with the expected semantics is that the test suites in the Apache codebase pass, and that Hadoop users within the open source community have tested the release running at production scale in their datacenters. 
* Bug reports can highlight incompatibility with expectations of community users, and once incorporated into tests form part of the compatibility testing. * Beta testing of forthcoming releases of Apache Hadoop is of great value in finding unexpected problems, and so not only benefits the product, it benefits the beta testers, who can be more confident that their code will work in the final release. * The Hadoop source tree has annotations to mark any interface as Public or Private, and Stable vs Unstable, independently of the Java public/private annotations. @@ -84, +82 @@ "Automotive Hadoop" is a trademark of Joe's Automotive." - Bad: Unless this is for a wrench or other product completely unrelated to computer software, this is a clear infringement on Apache's Hadoop registered mark. + This is a clear infringement on Apache's Hadoop registered mark, a mark held in many countries. === INAPPROPRIATE: Camshaft: it's a Hadoop for the Automotive industry === - It's good that Joe has created his own product name and brand, but saying "a Hadoop" is trouble. If it does contain Hadoop-related artifacts, then it breaks the trademark rules. If it doesn't contain ASF code, then it falls foul of the Generic Trademark problem: the ASF don't want their products to be generified, and will send a note reminding Joe of their rights and obligations. + It's good that Joe has created his own product name and brand, but saying "a Hadoop" is trouble. If it does contain Apache Hadoop-related artifacts, then it breaks the trademark rules. If it doesn't contain ASF code, then it falls foul of the Generic Trademark problem: the ASF don't want their products to be generified, and will send a note reminding Joe of their rights and obligations. === APPROPRIATE: Camshaft: Joe's datamining solution for the Automotive industry === @@ -96, +94 @@ Good: it defines a new product "Camshaft", and opts to use the Apache Hadoop brand to emphasize its heritage. The marketing text sells the product. 
- === APPROPRIATE: Automotive Joe's "Hadoop for Automotive Engineers" === + === APPROPRIATE: Automotive Joe's "Apache Hadoop for Automotive Engineers" === "Continuing Automotive Joe's best selling series, including the popular titles "Spark Gap tuning" and "Datacenter fabric: architecture and implementation", the book "Hadoop for Automotive Engineers" explains Apache Hadoop in an easy and practical way. As with the rest of the series, the cover is designed to be easy to wipe oil off. " - Good: provided it credits Apache properly inside, this appears to be a good book title. Furthermore, because it's the "Automotive Joe" book series, and not "Automotive Joe's Hadoop" series, the series doesn't infringe anything. Please see our [[http://www.apache.org/foundation/marks/faq/#booktitle|FAQ entry on using Apache marks in book titles]]. + Good: provided it credits Apache properly, this
[Hadoop Wiki] Update of "AmazonS3" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "AmazonS3" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/AmazonS3?action=diff=19=20 Comment: update s3a docs, callout AWS, change heading levels + = S3 Support in Apache Hadoop = + [[http://aws.amazon.com/s3|Amazon S3]] (Simple Storage Service) is a data storage service. You are billed - monthly for storage and data transfer. Transfer between S3 and [[AmazonEC2]] is free. This makes use of + monthly for storage and data transfer. Transfer between S3 and [[AmazonEC2]] instances in the same geographical location are free. This makes use of S3 attractive for Hadoop users who run clusters on EC2. Hadoop provides multiple filesystem clients for reading and writing to and from Amazon S3 or compatible service. - S3 Native FileSystem (URI scheme: s3n):: + === S3 Native FileSystem (URI scheme: s3n) === - A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The disadvantage is the 5GB limit on file size imposed by S3. + A native filesystem for reading and writing regular files on S3. The advantage of this filesystem is that you can access files on S3 that were written with other tools. Conversely, other tools can access files written using Hadoop. The S3N code is stable and widely used, but is not adding any new features (which is why it remains stable). S3N requires a suitable version of the jets3t JAR on the classpath. - S3A (URI scheme: s3a):: - A successor to the S3 Native, s3n fs, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. 
The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema. + === S3A (URI scheme: s3a) === + + A successor to the S3 Native, s3n:// filesystem, the S3a: system uses Amazon's libraries to interact with S3. This allows S3a to support larger files (no more 5GB limit), higher performance operations and more. The filesystem is intended to be a replacement for/successor to S3 Native: all objects accessible from s3n:// URLs should also be accessible from s3a simply by replacing the URL schema. + + S3A has been considered usable in production since Hadoop 2.7, and is undergoing active maintenance for enhanced security, scalability and performance. + + '''important:''' S3A requires the exact version of the amazon-aws-sdk against which Hadoop was built (and is bundled with). + - S3 Block FileSystem (URI scheme: s3):: + === S3 Block FileSystem (URI scheme: s3) === + + '''important:''' this section covers the s3:// filesystem support inside Apache Hadoop. The one in Amazon EMR is different —see the details at the bottom of this page. + - A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools. + A block-based filesystem backed by S3. Files are stored as blocks, just like they are in HDFS. This permits efficient implementation of renames. This filesystem requires you to dedicate a bucket for the filesystem - you should not use an existing bucket containing files, or write other files to the same bucket. 
The files stored by this filesystem can be larger than 5GB, but they are not interoperable with other S3 tools. Nobody is/should be uploading data to S3 via this scheme any more; it will eventually be removed from Hadoop entirely. Consider it (as of May 2016) deprecated. + S3 can be used as a convenient repository for data input to and output for analytics applications using either S3 filesystem. Data in S3 outlasts Hadoop clusters on EC2, so they should be where persistent data must be kept. Note that by using S3 as an input you lose the data locality optimization, which may be significant. The general best practice is to copy in data using `distcp` at the start of a workflow, then copy it out at the end, using the transient HDFS in between. - = History = + == History == * The S3 block filesystem was introduced in Hadoop 0.10.0 ([[http://issues.apache.org/jira/browse/HADOOP-574|HADOOP-574]]). * The S3 native filesystem was introduced in Hadoop 0.18.0 ([[http://issues.apache.org/jira/browse/HADOOP-930|HADOOP-930]]) and rename support was added in Hadoop 0.19.0
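The copy-in/copy-out pattern described above can be sketched with `distcp`. Bucket and path names are placeholders, and the credential property names are the usual s3a ones from general Hadoop usage rather than from this page:

```shell
# distcp-based copy-in / copy-out around a job (bucket and paths are
# placeholders; credentials normally live in core-site.xml as
# fs.s3a.access.key / fs.s3a.secret.key).
# hadoop distcp s3a://my-bucket/input  hdfs:///tmp/job/input
# ...run the analytics job against hdfs:///tmp/job...
# hadoop distcp hdfs:///tmp/job/output s3a://my-bucket/output
s3a_url() { printf 's3a://%s/%s\n' "$1" "$2"; }   # URL form used above
echo "example input URL: $(s3a_url my-bucket input)"
```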
[Hadoop Wiki] Trivial Update of "PoweredBy" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "PoweredBy" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/PoweredBy?action=diff=438=439 Comment: add a title with term "Apache Hadoop"; use "commercial support" as linktext for distributions and commercial support + = Powered by Apache Hadoop = + - This page documents an alphabetical list of institutions that are using Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in [[Distributions and Commercial Support]]. Please include details about your cluster hardware and size. Entries without this may be mistaken for spam references and deleted.'' '' + This page documents an alphabetical list of institutions that are using Apache Hadoop for educational or production uses. Companies that offer services on or based around Hadoop are listed in [[Distributions and Commercial Support|Commercial Support]]. Please include details about your cluster hardware and size. Entries without this may be mistaken for spam references and deleted.'' '' To add entries you need write permission to the wiki, which you can get by subscribing to the common-...@hadoop.apache.org mailing list and asking for permissions on the wiki account username you've registered yourself as. If you are using Apache Hadoop in production you ought to consider getting involved in the development process anyway, by filing bugs, testing beta releases, reviewing the code and turning your notes into shared documentation. Your participation in this process will ensure your needs get met. @@ -70, +72 @@ * ''[[http://atxcursions.com/|ATXcursions]] '' * ''Two applications that are side products/projects of a local tour company: 1. Sentiment analysis of review websites and social media data. Targeting the tourism industry. 2. 
Marketing tool that analyzes the most valuable/useful reviewers from sites like Tripadvisor and Yelp as well as social media. Lets marketers and business owners find community members most relevant to their businesses. '' - * ''Using Apache Hadoop, HDFS, Hive, and HBase.'' + * ''Using Apache Hadoop, HDFS, Hive, and HBase.'' * ''3 node cluster, 4 cores, 4GB RAM.'' @@ -88, +90 @@ * ''35 Node Cluster '' * ''We have been running our cluster with no downtime for over 2 ½ years and have successfully handled over 75 Million files on a 64 GB Namenode with 50 TB cluster storage. '' * ''We are heavy MapReduce and Apache HBase users and use Apache Hadoop with Apache HBase for semi-supervised Machine Learning, AI R, Image Processing & Analysis, and Apache Lucene index sharding using katta. '' - + * ''[[http://www.beebler.com|Beebler]] '' * ''14 node cluster (each node has: 2 dual core CPUs, 2TB storage, 8GB RAM) '' * ''We use Apache Hadoop for matching dating profiles '' @@ -421, +423 @@ * ''[[http://www.legolas-media.com|Legolas Media]] '' * ''[[http://www.linkedin.com|LinkedIn]] '' - * ''We have multiple grids divided up based upon purpose. + * ''We have multiple grids divided up based upon purpose. * ''Hardware: '' * ''~800 Westmere-based HP SL 170x, with 2x4 cores, 24GB RAM, 6x2TB SATA '' * ''~1900 Westmere-based SuperMicro X8DTT-H, with 2x6 cores, 24GB RAM, 6x2TB SATA ''
[Hadoop Wiki] Update of "UnknownHost" by SteveLoughran
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "UnknownHost" page has been changed by SteveLoughran: https://wiki.apache.org/hadoop/UnknownHost?action=diff=9=10 Comment: mention unknown localhost a. The hostname in the configuration files (such as {{{core-site.xml}}}) is misspelled. 1. The hostname in the configuration files (such as {{{core-site.xml}}}) is confused with the hostname of another service. For example, you are using the hostname of the YARN Resource Manager in the {{{fs.defaultFS}}} configuration option to define the namenode. 1. A worker node thinks it has a given name which it reports to the NameNode and JobTracker, but that isn't the name that the network team gave it, so it isn't resolvable. + 1. If it is happening in service startup, it means the hostname of that service (HDFS, YARN, etc) cannot be found in {{{/etc/hosts}}}; the service will fail to start as it cannot determine which network card/address to use. 1. The calling machine is on a different subnet from the target machine, and short names are being used instead of fully qualified domain names (FQDNs). 1. You are running in a cloud infrastructure and the destination machine is no longer there. It may have been deleted from the DNS records, or, due to some race condition, something is trying to talk to a host that hasn't been created yet.
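The failure modes listed above mostly reduce to name resolution. A few quick checks, using standard tools only (the FQDN in the comment is a placeholder):

```shell
# Quick resolution checks for the UnknownHost causes listed above.
host=$(hostname)                       # what this node thinks it is called
getent hosts "$host"   >/dev/null || echo "WARN: '$host' does not resolve (check /etc/hosts and DNS)"
getent hosts localhost >/dev/null || echo "WARN: 'localhost' missing from /etc/hosts"
# For the cross-subnet case, prefer FQDNs in core-site.xml, e.g.
#   hdfs://namenode.example.com:8020   (placeholder hostname)
```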
[Hadoop Wiki] Update of "Books" by Packt Publishing
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "Books" page has been changed by Packt Publishing: https://wiki.apache.org/hadoop/Books?action=diff=35=36 }}} + === Hadoop Real-World Solutions Cookbook- Second Edition === + '''Name:''' [[https://www.packtpub.com/big-data-and-business-intelligence/hadoop-real-world-solutions-cookbook-second-edition|Hadoop Real-World Solutions Cookbook- Second Edition]] + + '''Author:''' Tanmay Deshpande + + '''Publisher:''' Packt Publishing + + '''Date of Publishing:''' March 2016 + + The book covers recipes that are based on the latest versions of Apache Hadoop 2.X, YARN, Hive, Pig, Sqoop, Flume, Apache Spark, Mahout etc. + === Hadoop Security: Protecting Your Big Data Platform === '''Name:''' [[https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details|Hadoop Security: Protecting Your Big Data Platform]]
[Hadoop Wiki] Update of "ZooKeeper/HowToContribute" by PatrickHunt
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "ZooKeeper/HowToContribute" page has been changed by PatrickHunt: https://wiki.apache.org/hadoop/ZooKeeper/HowToContribute?action=diff=10=11 + = This page is deprecated - please see our new home at https://cwiki.apache.org/confluence/display/ZOOKEEPER = + = How to Contribute to ZooKeeper = This page describes the mechanics of ''how'' to contribute software to ZooKeeper. For ideas about ''what'' you might contribute, please see the [[ZooKeeper/ProjectSuggestions| ProjectSuggestions page]].
[Hadoop Wiki] Update of "ZooKeeper" by PatrickHunt
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "ZooKeeper" page has been changed by PatrickHunt: https://wiki.apache.org/hadoop/ZooKeeper?action=diff=29=30 + = This page is deprecated - please see our new home at https://cwiki.apache.org/confluence/display/ZOOKEEPER = + + == General Information == ZooKeeper: Because coordinating distributed systems is a Zoo
[Hadoop Wiki] Update of "HowToRelease" by XiaoChen
Dear Wiki user, You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification. The "HowToRelease" page has been changed by XiaoChen: https://wiki.apache.org/hadoop/HowToRelease?action=diff=81=82 Comment: Add 1 step at the beginning of 'Creating the release candidate', according to HADOOP-12768. = Creating the release candidate (X.Y.Z-RC) = These steps need to be performed to create the ''N''th RC for X.Y.Z, where ''N'' starts from 0. + 1. Check if the release year for Web UI footer is updated (the property {{{}}} in {{{hadoop-project/pom.xml}}}). If not, create a jira to update the property value to the right year, and propagate the fix from trunk to all necessary branches. Considering the voting time needed before publishing, it's better to use the year of (current time + voting time) here, to be consistent with the publishing time. 1. Run mvn rat-check and fix any errors {{{ mvn apache-rat:check
[Hadoop Wiki] Update of "SocketException" by SteveLoughran
The "SocketException" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/SocketException?action=diff=2=3

Comment:
java.net.SocketException: Permission denied

  Remember: These are [[YourNetworkYourProblem|your network configuration problems]]. Only you can fix them.
+
+ == Permission denied ==
+
+ This can arise if the service is configured to listen on a port numbered less than 1024, but is not running as a user with the appropriate permissions.
+
+ {{{
+ 2016-03-22 15:26:18,905 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in secureMain
+ java.net.SocketException: Permission denied
+         at sun.nio.ch.Net.bind0(Native Method)
+         at sun.nio.ch.Net.bind(Net.java:433)
+         at sun.nio.ch.Net.bind(Net.java:425)
+         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
+         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
+         at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
+         at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:522)
+         at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1196)
+         at io.netty.channel.ChannelHandlerInvokerUtil.invokeBindNow(ChannelHandlerInvokerUtil.java:108)
+         at io.netty.channel.DefaultChannelHandlerInvoker.invokeBind(DefaultChannelHandlerInvoker.java:214)
+         at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:208)
+         at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:1003)
+         at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:216)
+         at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:357)
+         at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:322)
+         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:356)
+         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:703)
+         at io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:137)
+         at java.lang.Thread.run(Thread.java:745)
+ 2016-03-22 15:26:18,907 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
+ 2016-03-22 15:26:18,908 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:
+ /
+ }}}
+
+ Fixes: either run the service (here, the Datanode) as a user with permissions, or change the service configuration to use a higher numbered port.
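The rule behind this error can be sketched in shell; the helper name and port numbers below are mine, not from the page. On Linux, ports below 1024 are privileged by default: binding them needs root or the CAP_NET_BIND_SERVICE capability (the threshold is tunable via net.ipv4.ip_unprivileged_port_start).

```shell
#!/bin/sh
# Hypothetical helper: does binding this port need elevated privileges?
# Assumes the default Linux privileged range of ports 1..1023.
needs_privilege() {
    [ "$1" -lt 1024 ]
}

if needs_privilege 1004; then   # e.g. a service configured onto a low port
    echo "low port: run as root, grant CAP_NET_BIND_SERVICE, or move the port"
fi
needs_privilege 9866 || echo "high port: any user may bind"
```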
[Hadoop Wiki] Update of "ConnectionRefused" by SteveLoughran
The "ConnectionRefused" page has been changed by SteveLoughran:
https://wiki.apache.org/hadoop/ConnectionRefused?action=diff=12=13

Comment:
subdomains

  If the application or cluster is not working, and this message appears in the log, then it is more serious.
- 1. Check the hostname the client using is correct. If it's in a Hadoop configuration option: examine it carefully, try doing an ping by hand
+ 1. Check that the hostname the client is using is correct. If it's in a Hadoop configuration option, examine it carefully and try doing a ping by hand.
  1. Check that the IP address the client is trying to talk to for the hostname is correct.
- 1. Make sure the destination address in the exception isn't 0.0.0.0 -this means that you haven't actually configured the client with the real address for that
+ 1. Make sure the destination address in the exception isn't 0.0.0.0; this means that you haven't actually configured the client with the real address for that service, and instead it is picking up the server-side property telling it to listen on every port for connections.
  1. If the error message says the remote service is on "127.0.0.1" or "localhost" that means the configuration file is telling the client that the service is on the local server. If your client is trying to talk to a remote system, then your configuration is broken.
- 1. Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this)
+ 1. Check that there isn't an entry for your hostname mapped to 127.0.0.1 or 127.0.1.1 in /etc/hosts (Ubuntu is notorious for this).
  1. Check that the port the client is trying to talk to matches the one the server is offering a service on.
  1. On the server, try a {{{telnet localhost }}} to see if the port is open there.
  1. On the client, try a {{{telnet }}} to see if the port is accessible remotely.
  1. Try connecting to the server/port from a different machine, to see if it is just the single client misbehaving.
+ 1. If your client and the server are in different subdomains, it may be that the configuration of the service is only publishing the basic hostname, rather than the Fully Qualified Domain Name. The client in the different subdomain can then unintentionally attempt to connect to a host in the local subdomain, and fail.
  1. If you are using a Hadoop-based product from a third party, please use the support channels provided by the vendor.
  1. Please do not file bug reports related to your problem, as they will be closed as [[http://wiki.apache.org/hadoop/InvalidJiraIssues|Invalid]]
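The /etc/hosts check from that list is easy to script. This is a sketch only: the function name, hostnames, and hosts-file contents are fabricated for the demo.

```shell
#!/bin/sh
# Hypothetical check: warn if a hostname is mapped to a loopback address
# (127.0.0.1 or the Ubuntu 127.0.1.1 trap) in a hosts file.
check_hosts_mapping() {
    host="$1"; file="$2"
    if grep "^127\.0\.[01]\.1[[:space:]]" "$file" | grep -qw "$host"; then
        echo "WARNING: $host is mapped to loopback in $file"
    else
        echo "OK: $host has no loopback mapping in $file"
    fi
}

# Demo against a fabricated hosts file:
hosts=$(mktemp)
printf '127.0.0.1 localhost\n127.0.1.1 node1.example.com node1\n' > "$hosts"
check_hosts_mapping node1 "$hosts"
rm -f "$hosts"
```

In real use you would call it as `check_hosts_mapping "$(hostname)" /etc/hosts`.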
[Hadoop Wiki] Update of "HowToReleasePre2.8" by AndrewWang
The "HowToReleasePre2.8" page has been changed by AndrewWang:
https://wiki.apache.org/hadoop/HowToReleasePre2.8?action=diff=81=82

## page was renamed from HowToReleasePostMavenizationWithGit
## page was copied from HowToReleasePostMavenization

  ''This page is prepared for Hadoop Core committers. You need committer rights to create a new Hadoop Core release.''
+
+ The current version of this page is available at HowToRelease
  These instructions have been updated for Hadoop 2.5.1 and later releases to reflect the changes to version-control (git), build-scripts and mavenization.
[Hadoop Wiki] Update of "HowToRelease" by AndrewWang
The "HowToRelease" page has been changed by AndrewWang:
https://wiki.apache.org/hadoop/HowToRelease?action=diff=80=81

Comment:
Remove manual CHANGES.txt related steps

  ## page was copied from HowToReleasePostMavenization
  ''This page is prepared for Hadoop Core committers. You need committer rights to create a new Hadoop Core release.''
- These instructions have been updated for Hadoop 2.5.1 and later releases to reflect the changes to version-control (git), build-scripts and mavenization.
+ These instructions have been updated for Hadoop 2.8.0 and later releases to reflect the changes to version-control (git), build-scripts and mavenization.
- Earlier versions of this document are at HowToReleaseWithSvnAndAnt and HowToReleasePostMavenization
+ Earlier versions of this document are at HowToReleaseWithSvnAndAnt, HowToReleasePostMavenization and HowToReleasePre2.8

<>

@@ -32, +32 @@

  = Branching =
  When releasing Hadoop X.Y.Z, the following branching changes are required. Note that a release can match more than one of the following if-conditions. For a major release, one needs to make the changes for minor and point releases as well. Similarly, a new minor release is also a new point release.
- 1. Add the release X.Y.Z to CHANGES.txt files if it doesn't already exist (leave the date as unreleased for now). Commit these changes to any '''live''' upstream branch. For example, if you are handling 2.6.2, commit the changes to trunk, branch-2, branch-2.6, and branch-2.7 (provided branch-2.7 is an active branch).
- {{{
- git commit -a -m "Adding release X.Y.Z to CHANGES.txt"
- }}}
  1. If this is a new major release (i.e., Y = 0 and Z = 0)
   1. Create a new branch (branch-X) for all releases in this major release.
   1. Update the version on trunk to (X+1).0.0-SNAPSHOT

@@ -100, +96 @@

  mvn apache-rat:check
  }}}
  1. Set environment variable version for later steps. {{{export version=X.Y.Z-RCN}}}
- 1. Set the release date for X.Y.Z to the current date in each CHANGES.txt file in branch-X.Y.Z and commit the changes.
- {{{
- git commit -a -m "Set the release date for $version"
- }}}
  1. Tag the release candidate:
  {{{
  git tag -s release-$version -m "Release candidate - $version"
  }}}

@@ -139, +131 @@

  = Publishing =
  In 5 days if [[http://hadoop.apache.org/bylaws#Decision+Making|the release vote passes]], the release may be published.
- 1. Update the release date in CHANGES.txt to the final release vote passage date, reflecting the one in branch-X.Y.Z, on all live upstream branches (e.g., trunk, branch-X, branch-X.Y). Commit and push those changes.
- {{{
- git commit -a -m "Set the release date for X.Y.Z"
- }}}
  1. Tag the release. Do it from the release branch and push the created tag to the remote repository:
  {{{
  git tag -s rel/release-X.Y.Z -m "Hadoop X.Y.Z release"
  git push origin rel/release-X.Y.Z
  }}}
- 1. Use [[https://builds.apache.org/job/HADOOP2_Release_Artifacts_Builder|this Jenkins job]] to create the final release files
  1. Copy release files to the distribution directory
   1. Check out the corresponding svn repo if need be
   {{{
   svn co https://dist.apache.org/repos/dist/release/hadoop/common/ hadoop-dist
   }}}
-  1. Generate new .mds files referring to the final release tarballs and not the RCs
   1. Copy the release files to hadoop-dist/hadoop-${version}
   1. Update the symlinks to current2 and stable2. The release directory usually contains just two releases, the most recent from two branches.
   1. Commit the changes (it requires a PMC privilege)
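The "generate new digest files referring to the final release tarballs and not the RCs" step can be sketched as below. This uses sha512sum rather than the page's older .mds format, and the tarball is a fabricated stand-in, not a real release artifact.

```shell
#!/bin/sh
# Sketch: write a digest file that records the FINAL tarball name,
# so it verifies after download. File names are fabricated for the demo.
set -e
dist=$(mktemp -d)
tarball="hadoop-X.Y.Z.tar.gz"
printf 'fake release payload\n' > "$dist/$tarball"

# cd first so the digest file contains the bare file name, not a local path:
( cd "$dist" && sha512sum "$tarball" > "$tarball.sha512" )   # macOS: shasum -a 512

cat "$dist/$tarball.sha512"
```

A downloader can then check the artifact with `sha512sum -c hadoop-X.Y.Z.tar.gz.sha512` from the same directory.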
[Hadoop Wiki] Update of "HowToCommit" by AndrewWang
The "HowToCommit" page has been changed by AndrewWang:
https://wiki.apache.org/hadoop/HowToCommit?action=diff=33=34

Comment:
remove CHANGES.txt step, now autogenerated

  Committing a patch
  When you commit a patch, please follow these steps:
- 1. '''CHANGES.txt:''' Add an entry in CHANGES.txt, at the end of the appropriate section. This should include the JIRA issue ID, and the name of the contributor. Attribution in CHANGES.txt should fall under the earliest release that is receiving the patch, and it should be consistent across all live branches. If the patch is targeted to 2.8.0, then its CHANGES.txt entry would go in the 2.8.0 section on trunk and branch-2. If the patch is targeted to 2.7.2, then its CHANGES.txt entry would go in the 2.7.2 section on trunk, branch-2 and branch-2.7. When backporting a patch that was previously committed for a later branch, please update its CHANGES.txt entry on all branches for accuracy. Suppose a patch initially targets 2.8.0, but then later becomes a candidate for 2.7.2. On the initial commit, it would have been listed under the 2.8.0 section on trunk and branch-2. After the decision to backport to 2.7.2, go back and update CHANGES.txt on all branches to match reality, moving it to the 2.7.2 section on trunk, branch-2 and branch-2.7.
  1. '''Commit locally:''' Commit the change locally to the appropriate branch (should be ''trunk'' if it is not a feature branch) using {{{git commit -a -m }}}. The commit message should include the JIRA issue id, along with a short description of the change and the name of the contributor if it is not you. ''Note:'' Be sure to get the issue id right, as this causes JIRA to link to the change in git (use the issue's "All" tab to see these). Verify all the changes are included in the commit using {{{git status}}}. If there are any remaining changes (previously missed files), please commit them and squash these commits into one using {{{git rebase -i}}}.
  1. '''Pull latest changes from remote repo:''' Pull in the latest changes from the remote branch using {{{git pull --rebase}}} (--rebase is not required if you have set up git pull to always --rebase). Verify this didn't cause any merge commits using {{{git log [--pretty=oneline]}}}
  1. '''Push changes to remote repo:''' Build and run a test to ensure it is all still kosher. Push the changes to the remote (main) repo using {{{git push }}}.
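The commit/rebase/push steps above can be exercised end-to-end against a throwaway local bare repository, which makes the flow safe to run anywhere. The repository paths, branch name, JIRA id, and author below are all fabricated for the demo.

```shell
#!/bin/sh
# Sketch of the commit workflow: commit locally, pull --rebase, push.
set -e
work=$(mktemp -d)
git init -q --bare "$work/remote.git"
git clone -q "$work/remote.git" "$work/clone" 2>/dev/null
cd "$work/clone"
git config user.email dev@example.com
git config user.name "Demo Dev"

# Seed the remote so there is history to rebase against:
echo base > file.txt
git add file.txt
git commit -q -m "HADOOP-99998. Base commit (fabricated)"
git push -q origin HEAD:master

# 1. Commit locally, JIRA issue id first in the message:
echo change >> file.txt
git commit -aq -m "HADOOP-99999. Example change (contributed by Demo Dev)"

# 2. Pull the latest remote changes with --rebase (avoids merge commits):
git pull -q --rebase origin master

# 3. Push to the remote repo:
git push -q origin HEAD:master

# The remote now has the change at its tip:
git --git-dir="$work/remote.git" log --format=%s -n 1
```

The same shape works against the real gitbox remote; only the clone URL and branch differ.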
[Hadoop Wiki] Update of "Roadmap" by AndrewWang
The "Roadmap" page has been changed by AndrewWang:
https://wiki.apache.org/hadoop/Roadmap?action=diff=60=61

   * Move default ports out of ephemeral range [[https://issues.apache.org/jira/browse/HDFS-9427|HDFS-9427]]
  * HDFS
   * Removal of hftp in favor of webhdfs [[https://issues.apache.org/jira/browse/HDFS-5570|HDFS-5570]]
+  * Support for more than two standby NameNodes [[https://issues.apache.org/jira/browse/HDFS-6440|HDFS-6440]]
+  * Support for Erasure Codes in HDFS [[https://issues.apache.org/jira/browse/HDFS-7285|HDFS-7285]]
  * YARN
  * MAPREDUCE
   * Derive heap size or mapreduce.*.memory.mb automatically [[https://issues.apache.org/jira/browse/MAPREDUCE-5785|MAPREDUCE-5785]]

@@ -65, +67 @@

  === hadoop-2.9 ===
  * HADOOP
  * HDFS
-  * Support for Erasure Codes in HDFS [[https://issues.apache.org/jira/browse/HDFS-7285|HDFS-7285]]
  * YARN
  * MAPREDUCE
[Hadoop Wiki] Update of "Roadmap" by AndrewWang
The "Roadmap" page has been changed by AndrewWang:
https://wiki.apache.org/hadoop/Roadmap?action=diff=59=60

  * Move to JDK8+
  * Classpath isolation on by default [[https://issues.apache.org/jira/browse/HADOOP-11656|HADOOP-11656]]
  * Shell script rewrite [[https://issues.apache.org/jira/browse/HADOOP-9902|HADOOP-9902]]
+ * Move default ports out of ephemeral range [[https://issues.apache.org/jira/browse/HDFS-9427|HDFS-9427]]
  * HDFS
   * Removal of hftp in favor of webhdfs [[https://issues.apache.org/jira/browse/HDFS-5570|HDFS-5570]]
  * YARN