Re: East coast bigtop hackday/microtalks; Any interest?
If it's live streamed, I'll forward the feed to some folks.

Artem Ervits
Data Analyst
New York Presbyterian Hospital

From: Jay Vyas [mailto:jayunit...@gmail.com]
Sent: Wednesday, November 20, 2013 08:15 AM
To: user@bigtop.apache.org
Subject: Re: East coast bigtop hackday/microtalks; Any interest?

Artem: we have the Hartford scalable computation meetup that I run. Should we do the first one in Stamford? That's about halfway between, and I'm more than happy to coordinate.

On Wed, Nov 20, 2013 at 12:03 AM, Konstantin Boudnik c...@apache.org wrote:

And as I said before - we'll do a hangout video feed to simulate a presence effect ;) I will send the URL once we get closer to the event.

Cos

On Sun, Nov 17, 2013 at 06:14PM, Roman Shaposhnik wrote:

On Tue, Nov 12, 2013 at 8:55 AM, Jay Vyas jayunit...@gmail.com wrote:

Hi folks. Is anyone interested in attending a Bigtop meetup either in Connecticut or Massachusetts, to coincide with the one Kons is planning in California? I'm sure some folks around here are using Bigtop either for testing or development... maybe in the New York or Boston areas? I'm in Hartford, so either location is valid for me... Connecticut also would work (of course) as it's central to both. :)

Would be really nice to see an East Coast Bigtop meetup. In general, for all the remote folks -- we typically do have quite a lively IRC presence on #bigtop during our meetups/hackathons. It is not quite like being in the same room, but way better than nothing.

Thanks,
Roman.

--
Jay Vyas
http://jayunit100.blogspot.com

This electronic message is intended to be for the use only of the named recipient, and may contain information that is confidential or privileged. If you are not the intended recipient, you are hereby notified that any disclosure, copying, distribution or use of the contents of this message is strictly prohibited. If you have received this message in error or are not the named recipient, please notify us immediately by contacting the sender at the electronic mail address noted above, and delete and destroy all copies of this message. Thank you.
Re: Getting Started Guide? (and some installation issues)
Hi Steve,

Can you send me your private email? Then I can send you my configuration so far: HBase, partly Hive, partly Pig (not much, but still something ;)).

I think an experienced specialist needs a few hours at most to configure Bigtop. A newbie like me needs some days at least ;). I want to start a pseudo-distributed node with HBase, Pig, Giraph, Solr, Mahout and Hue. Flume is also interesting. What are you aiming for?

Best regards,
Ivo

P.S.: If you want Spark, be very careful when installing it: you have to specify the repository, or you may get some Spark language instead. At least that was the case on Ubuntu.

On Wednesday, 20 November 2013, i.frankov wrote:

Hi Steve,

I am not at home. I will send you my status tonight.

Cheers,
Ivo

Sent from Samsung Mobile
Re: Getting Started Guide? (and some installation issues)
Hi Ivo,

I'm also working, to start with, on a pseudo-distributed node. I've got a CentOS 6.x Amazon EC2 instance and am using FreeNX for the remote desktop. So far there seem to be several services that aren't started after the BigTop install. Here's the output of 'service --status-all':

    Flume NG agent is running [ OK ]
    Hadoop datanode is running [ OK ]
    Hadoop journalnode is running [ OK ]
    Hadoop namenode is running [ OK ]
    Hadoop secondarynamenode is running [ OK ]
    Hadoop zkfc is dead and pid file exists [FAILED]
    Hadoop httpfs is running [ OK ]
    Hadoop historyserver is dead and pid file exists [FAILED]
    Hadoop nodemanager is dead and pid file exists [FAILED]
    Hadoop proxyserver is dead and pid file exists [FAILED]
    Hadoop resourcemanager is running [ OK ]
    hald (pid 1031) is running...
    HBase master daemon is dead and pid file exists [FAILED]
    hbase-regionserver is not running.
    HBase rest daemon is running [ OK ]
    HBase thrift daemon is running [ OK ]
    HCatalog server is running [ OK ]
    Hive Metastore is dead and pid file exists [FAILED]
    Hive Server is running [ OK ]
    Hive Server2 is dead and pid file exists [FAILED]
    WEBHCat server is running [ OK ]
    …
    Spark master is not running [FAILED]
    Spark worker is not running [FAILED]
    spice-vdagentd is stopped
    Sqoop Server is running [ OK ]

Am I correct in assuming that all of these services should be working properly after a BigTop install?

As far as the individual components go, I've been working through the "Running various Bigtop components" wiki page:
https://cwiki.apache.org/confluence/display/BIGTOP/Running+various+Bigtop+components

Pig and HBase are working so far, but Hive fails:

    hive> create table doh(id int);
    FAILED: Error in metadata: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.metastore.HiveMetaStoreClient
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

This is after changing permissions to be world writable on:

* /tmp
* /user/hive/warehouse (actually, BigTop set that permission correctly)
* /var/lib/hive/metastore/metastore_db/ (CentOS filesystem)

Any ideas? I notice that Hive Server 2 'is dead and pid file exists'; that looks like a good place to start.

Regards,
- Steve

--
Illation Pty Ltd
8/350 Collins Street
Melbourne 3000
T: +61 3 8399 9442 x100
M: +61 4 0096 4240

From: ivaylo frankov i.fran...@googlemail.com
Reply-To: user@bigtop.apache.org
Date: Thursday, 21 November 2013 3:04
To: user@bigtop.apache.org
Subject: Re: Getting Started Guide? (and some installation issues)

Hi Steve,

Can you send me your private email? Then I can send you my configuration so far: HBase, partly Hive, partly Pig (not much, but still something ;)).

I think an experienced specialist needs a few hours at most to configure Bigtop. A newbie like me needs some days at least ;). I want to start a pseudo-distributed node with HBase, Pig, Giraph, Solr, Mahout and Hue. Flume is also interesting. What are you aiming for?

Best regards,
Ivo

P.S.: If you want Spark, be very careful when installing it: you have to specify the repository, or you may get some Spark language instead. At least that was the case on Ubuntu.

On Wednesday, 20 November 2013, i.frankov wrote:

Hi Steve,

I am not at home. I will send you my status tonight.

Cheers,
Ivo

Sent from Samsung Mobile
Hive MetaStore Start Error
Tracing down the Hive Metastore error, it seems that the Thrift Server can't create a socket. Any ideas?

Cheers,
- Steve

2013-11-21 07:08:48,056 ERROR metastore.HiveMetaStore (HiveMetaStore.java:main(4242)) - Metastore Thrift Server threw an exception...
org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083.
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:93)
    at org.apache.thrift.transport.TServerSocket.<init>(TServerSocket.java:75)
    at org.apache.hadoop.hive.metastore.TServerSocketKeepAlive.<init>(TServerSocketKeepAlive.java:34)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.startMetaStore(HiveMetaStore.java:4282)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.main(HiveMetaStore.java:4239)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:212)
Re: Getting Started Guide? (and some installation issues)
Is there a 'getting started' guide?

Beyond just installation, most of our documentation is very developer-centric, I'm afraid. What there is can be found on our wiki: https://cwiki.apache.org/confluence/display/BIGTOP/Index

Something that will describe the filesystem and configuration file conventions?

Bigtop is a distribution of other open-source projects, so there is no single configuration system. The file conventions vary from project to project. However, Bigtop does not modify much about how the configuration files work, so I would refer you to the upstream projects for details of their configuration files (e.g. http://hadoop.apache.org, http://hbase.apache.org).

In particular the existence of these conf.empty directories is confusing.

The conf.dist and conf.empty directories provide some default or template configuration files. You should create a directory at the same level for your own configuration, perhaps conf.steven. There is a symlink for each component at /etc/<component>/conf. This symlink, through a system called alternatives, eventually points to the currently active configuration for that component. Once you have modified the configuration to suit your needs, you can make it the active configuration using the alternatives command. See here for its documentation: http://linux.about.com/library/cmd/blcmdl8_alternatives.htm

For example, if you look at the /etc/hadoop/conf symlink, you will probably find that it points to /etc/alternatives/hadoop-conf. You can see how the alternatives are configured, and point the configuration to your new folder, like this:

    alternatives --display hadoop-conf
    alternatives --set hadoop-conf /etc/hadoop/conf.steven

Is Hue supposed to be configured separately, or is BigTop supposed to do that?

As I recall, the misconfigurations that are reported at startup are things like services not running (like Oozie, etc.). Once you configure and start those services, these warnings should disappear. For other warnings, post them here and we'll see if we can help you.

What is the target time to set-up a Hadoop installation via BigTop?

Not sure what to tell you here. I regularly set up pseudo-distributed Hadoop installations in minutes with little more than yum install hadoop-conf-pseudo, sudo service hadoop-hdfs-namenode init and a reboot. If you're using a bunch of other services on a fully-distributed cluster and you're completely new to this, I would expect it to take hours or days to get everything running. Bigtop also maintains puppet code that will configure everything with a pretty good default configuration and have your cluster working pretty much out-of-the-box. Maybe this is a good option for you?

Can you send me your private email and I will be able to send you my configuration up to now.

As I mentioned, our documentation is very developer-centric, and as Steven is showing, some user-centric documentation would be a huge help to the community. Could I persuade you to share what you've learned on the mailing list, or perhaps on the wiki, so others can benefit?
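The alternatives steps described in the reply above can be sketched as a small dry-run helper. It only prints the commands, so nothing changes on the system until you run them yourself as root. Several things here are assumptions and not from the thread: the --install step (alternatives typically refuses to --set a path that has not been registered first), the priority value of 50, copying from a template directory, and the "<component>-conf" naming, which is generalized from the hadoop-conf example and may not hold for every component.

```shell
#!/bin/sh
# Sketch: print the commands that would switch a component's active
# configuration directory via alternatives. Nothing is executed.
# Assumptions (not from the thread): the alternatives name is
# "<component>-conf", and the priority of 50 is an arbitrary example.
plan_conf_switch() {
    component=$1              # e.g. hadoop
    suffix=$2                 # e.g. steven -> /etc/hadoop/conf.steven
    template=${3:-conf.dist}  # packaged directory to copy from
    priority=${4:-50}         # hypothetical priority value
    conf_dir="/etc/${component}/conf.${suffix}"
    name="${component}-conf"

    # Start from the packaged defaults rather than an empty directory.
    echo "cp -r /etc/${component}/${template} ${conf_dir}"
    # Register the new directory as a candidate for the conf symlink.
    echo "alternatives --install /etc/${component}/conf ${name} ${conf_dir} ${priority}"
    # Make it the active configuration.
    echo "alternatives --set ${name} ${conf_dir}"
    # Show the result so the change can be checked.
    echo "alternatives --display ${name}"
}

# Print the plan for the hadoop example, starting from conf.empty.
plan_conf_switch hadoop steven conf.empty
```

Printing instead of executing keeps the helper safe to try on any machine; once the output looks right for your layout, paste the lines into a root shell.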
FW: Hive Clues
Gents,

Happy to discuss the problems and resolutions here so there's a community record. Below are a few clues on the Hive problems.

- SteveN

From: Steven Nunez steven.nu...@illation.com
Date: Thursday, 21 November 2013 9:32
To: i.fran...@googlemail.com
Subject: Hive Clues

Hi Ivo,

I don't know how far you got with Hive, but I have found some clues as to why my configuration isn't working. I sent a message to the group at large regarding some socket bind errors. After some searching, I found a post on Hive 2 installation (http://danieladeniji.wordpress.com/2013/05/24/technical-hadoopcloudera-cdhhive-v2-installation/) that suggests /etc/hive/conf/hive-site.xml needs a hive.metastore.uris property entry. This didn't work for me, but I'll try a reboot and some other measures to be sure.

Have you worked out what these conf.empty, conf.dist, conf.* directories are for? Is there a convention for their usage?

Regards,
- SteveN
CentOS Out of Box Install Summary
Gents,

Below is a summary of the results of an out-of-the-box CentOS/EC2 BigTop 0.70.0 install. It lists all the components I need for the project I'm writing about.

What would be useful somewhere on the wiki is a list of known issues and a page of possible resolutions. This could be as easy as taking this list and adding a third column, 'workaround', with a page on how to fix each issue. It could also be used as a QA page of sorts, on the assumption that all of the components are supposed to work out of the box (it looks like some of the init.d scripts aren't quite right either, judging by the Oozie line below).

Cheers,
- SteveN

    Hadoop datanode is running [ OK ]
    Hadoop journalnode is running [ OK ]
    Hadoop namenode is running [ OK ]
    Hadoop secondarynamenode is running [ OK ]
    Hadoop zkfc is dead and pid file exists [FAILED]
    Hadoop httpfs is running [ OK ]
    Hadoop historyserver is dead and pid file exists [FAILED]
    Hadoop nodemanager is dead and pid file exists [FAILED]
    Hadoop proxyserver is dead and pid file exists [FAILED]
    Hadoop resourcemanager is running [ OK ]
    hald (pid 1041) is running...
    HBase master daemon is dead and pid file exists [FAILED]
    hbase-regionserver is not running.
    HBase rest daemon is running [ OK ]
    HBase thrift daemon is running [ OK ]
    HCatalog server is running [ OK ]
    Hive Metastore is dead and pid file exists [FAILED]
    Hive Server is running [ OK ]
    Hive Server2 is dead and pid file exists [FAILED]
    not running but /var/run/oozie/oozie.pid exists.
    Spark master is not running [FAILED]
    Spark worker is not running [FAILED]
    spice-vdagentd is stopped
    Sqoop Server is running [ OK ]
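As a starting point for the known-issues list suggested above, the status output can be reduced to just the unhealthy entries. This is a minimal sketch, assuming the init scripts mark problems with "FAILED", "is dead" or "not running" as they do in the summary; the function name failed_services is made up for illustration.

```shell
#!/bin/sh
# Sketch: filter `service --status-all` output down to unhealthy entries.
# Reads the status text on stdin, prints one service description per line.
failed_services() {
    # Keep lines the init scripts mark as failed, dead, or not running,
    # then strip the trailing [FAILED] tag so the list reads cleanly.
    grep -E 'FAILED|is dead|not running' |
        sed -E 's/[[:space:]]*\[FAILED\][[:space:]]*$//'
}

# Demo on a few lines copied from the summary above.
failed_services <<'EOF'
Hadoop namenode is running [ OK ]
Hadoop zkfc is dead and pid file exists[FAILED]
hbase-regionserver is not running.
Hive Server is running [ OK ]
Spark master is not running[FAILED]
EOF
```

On a live machine the input would come from the real command, for example: service --status-all 2>&1 | failed_services. Entries such as "spice-vdagentd is stopped" are deliberately left out, since a stopped service is not necessarily a broken one.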
Re: Getting Started Guide? (and some installation issues)
Is this puppet code in the install? If so, where? I'm thinking I should perhaps start working with a src version of BigTop, but I was really hoping to spend as little time as possible on configuring my Hadoop ecosystem and maximize problem solving. Seems I keep getting deeper into the rabbit warren.

From: Sean Mackrory mackror...@gmail.com
Reply-To: user@bigtop.apache.org
Date: Thursday, 21 November 2013 10:10
To: user@bigtop.apache.org
Subject: Re: Getting Started Guide? (and some installation issues)

Bigtop also maintains puppet code that will configure everything with a pretty good default configuration and have your cluster working pretty much out-of-the-box. Maybe this is a good option for you?
Probably Bugs in Hive Install
Gents,

The summary of this is: the system is reporting the metastore as not running (nor Hive Server 2), yet Hive is now working. See the email below, where I was going to request diagnostic help, only to discover that the example now works, for reasons unknown.

Cheers,
- SteveN

-- Original Message --

I've got a bit closer. The metastore and hive server aren't running:

    org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:9083
    org.apache.thrift.transport.TTransportException: Could not create ServerSocket on address 0.0.0.0/0.0.0.0:1

Because something is already bound to those ports (from netstat):

    tcp 0 0 0.0.0.0:1 0.0.0.0:* LISTEN 3071/java
    tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN 2827/java

Whatever is listening on those ports doesn't speak HTTP, and 1 just closes a telnet connection straight away. Grepping through /etc doesn't produce much of value, or anything I don't already know:

    nunez$ grep -R 9083 *
    alternatives/hive-conf/hive-site.xml: <value>thrift://localhost:9083</value>
    grep: alternatives/oozie-tomcat-conf/webapps/oozie/ext-2.2: No such file or directory
    default/hive-hcatalog-server:export METASTORE_PORT=9083
    hive/conf.dist/hive-site.xml: <value>thrift://localhost:9083</value>
    hive/conf/hive-site.xml: <value>thrift://localhost:9083</value>
    grep: oozie/tomcat-deployment.http/webapps/oozie/ext-2.2: No such file or directory
    grep: oozie/tomcat-deployment.https/webapps/oozie/ext-2.2: No such file or directory
    grep: oozie/tomcat-deployment/webapps/oozie/ext-2.2: No such file or directory

I modified the hive-site.xml file based on a blog post I found earlier, but the port was bound even before that change.

So, does anyone have any ideas about who is listening on these ports and why? I would have thought them reserved for Hive, but some Java program thinks otherwise. This of course is assuming that my Hive failure is caused by something related:

STOP - THIS IS STRANGE.

So, what was going to appear here is the Hive failure I reported yesterday (http://mail-archives.apache.org/mod_mbox/bigtop-user/201311.mbox/%3cceb36003.217ff%25steve.nu...@illation.com%3e). However, upon repeating the commands, it now works:

    hive> create table doh(id int);
    OK
    Time taken: 5.344 seconds

So, I guess the question now is: why is this working when 'service --status-all' reports that the services are failed, and there are the above two socket errors in the log files?

Cheers,
- SteveN

P.S. Interesting that grep is reporting some kind of Oozie filesystem errors. Oozie is also failing, but I haven't got around to looking at it yet. I wonder if these errors are part of the reason though.
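One way to narrow down who owns a port is to pull the PID/program column out of the netstat listing and then inspect that process. The sketch below assumes the column layout of netstat -tlnp on Linux (local address in column 4, state in column 6, PID/program in column 7); listeners_on_port is a made-up helper name.

```shell
#!/bin/sh
# Sketch: read `netstat -tlnp` output on stdin and print the PID/program
# column for every TCP listener on the given port.
# Assumption: Linux net-tools column layout (see lead-in).
listeners_on_port() {
    # Match the port at the end of the local address so that, for
    # example, 19083 is not mistaken for 9083.
    awk -v port="$1" '$6 == "LISTEN" && $4 ~ (":" port "$") { print $7 }'
}

# Demo on a cleaned-up copy of the listener line from the message above.
echo 'tcp 0 0 0.0.0.0:9083 0.0.0.0:* LISTEN 2827/java' | listeners_on_port 9083
```

On a live system the input would be: sudo netstat -tlnp | listeners_on_port 9083, followed by ps -o args= -p <PID> to see the full java command line. Since the grep output shows /etc/default/hive-hcatalog-server exporting METASTORE_PORT=9083 and the HCatalog server is reported as running, it may be that process holding the port, which would also explain why Hive works while the separate metastore service cannot start. That is a guess from the listing, not something confirmed in the thread.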
0.70.0 Mahout Examples
Anyone got the Mahout examples running? The ones from the Running Various Bigtop Components page (https://cwiki.apache.org/confluence/display/BIGTOP/Running+various+Bigtop+components)? I get:

    /usr/lib/hadoop-hdfs/bin/hdfs: line 34: /usr/lib/hadoop-hdfs/bin/../libexec/hdfs-config.sh: No such file or directory
    /usr/lib/hadoop-hdfs/bin/hdfs: line 150: cygpath: command not found
    /usr/lib/hadoop-hdfs/bin/hdfs: line 191: exec: : not found

Why is this looking for a cygpath? All the other Mahout examples seem to work.

- Steve
Re: Best Way to Determine Package Versions
I look in bigtop.mk after uncompressing the source tarball or zip file which corresponds to the version I am using.

On Wed, Nov 20, 2013 at 8:44 PM, Steven Núñez steven.nu...@illation.com wrote:

Gents,

What is the best way to determine the particular version of a BigTop package? A command for this would be very useful.

This particular use case involves trying out the Oozie component according to the Running Various BigTop Components wiki page (https://cwiki.apache.org/confluence/display/BIGTOP/Running+various+Bigtop+components). The instructions for Oozie appear to be out of date, and the Oozie website http://oozie.apache.org has different configurations for different versions. It seems that the Oozie instructions on that page are particularly out of date.

Regards,
- SteveN
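Looking through bigtop.mk by hand can be shortened with a one-line filter. This is only a sketch: it assumes the file sets versions on lines of the form NAME_BASE_VERSION=value (for example a variable named OOZIE_BASE_VERSION), which is an assumption about the file layout, so check it against your copy. The function name component_versions is hypothetical.

```shell
#!/bin/sh
# Sketch: list "<COMPONENT> <version>" pairs from a bigtop.mk-style file.
# Assumption: versions are set on lines like NAME_BASE_VERSION=value;
# lines that do not match that shape are ignored.
component_versions() {
    sed -n -E 's/^([A-Z0-9_]+)_BASE_VERSION[[:space:]]*=[[:space:]]*(.*)$/\1 \2/p' "$1"
}

# Usage (path is an example; point it at your unpacked source tree):
#   component_versions ./bigtop.mk
#   component_versions ./bigtop.mk | grep '^OOZIE '
```

This reports the upstream versions the release was built from, which is what you need to pick the right upstream documentation. It says nothing about what is actually installed on a given machine.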
Re: Best Way to Determine Package Versions
On 11/20/2013 08:44 PM, Steven Núñez wrote:

Gents,

What is the best way to determine the particular version of a BigTop package? A command for this would be very useful.

This particular use case involves trying out the Oozie component according to the Running Various BigTop Components wiki page (https://cwiki.apache.org/confluence/display/BIGTOP/Running+various+Bigtop+components). The instructions for Oozie appear to be out of date, and the Oozie website http://oozie.apache.org has different configurations for different versions. It seems that the Oozie instructions on that page are particularly out of date.

Regards,
- SteveN

I assume you are referring to an installed package.

On an RPM-based GNU/Linux distribution:

    rpm -qi <package name>
    rpm -qf <file or directory>

Note that you can also use yum (yum info, yum list).

For deb-based distributions, I don't have one handy right now, but the following link should help you in that regard:
http://www.debian.org/doc/manuals/debian-faq/ch-pkgtools.en.html

Thanks,
Bruno
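The rpm and deb queries above can be wrapped so the same command works on either family. A minimal sketch, assuming rpm or dpkg-query is on the PATH; the function name pkg_version is made up, and the package names in the usage comment are examples that may differ on your system.

```shell
#!/bin/sh
# Sketch: print "<name> <version>" for an installed package on either an
# RPM-based or a deb-based system. Returns 1 if the package is not found.
pkg_version() {
    pkg=$1
    if command -v rpm >/dev/null 2>&1 && rpm -q "$pkg" >/dev/null 2>&1; then
        # RPM family: version and release from the package database.
        rpm -q --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' "$pkg"
    elif command -v dpkg-query >/dev/null 2>&1 &&
         dpkg-query -W "$pkg" >/dev/null 2>&1; then
        # Debian family: the equivalent query through dpkg.
        dpkg-query -W -f='${Package} ${Version}\n' "$pkg"
    else
        echo "$pkg: not installed (or no rpm/dpkg-query found)" >&2
        return 1
    fi
}

# Usage (package names are examples):
#   for p in hadoop hive oozie; do pkg_version "$p"; done
```

To go from a file to its package, rpm -qf <file> as shown above, or dpkg -S <file> on Debian-family systems, gives the package name to feed into this helper.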