Just to give some feedback to the list: fsck.ext4 reported some size-accounting errors on the filesystem. Because of that, file creation was failing even though the filesystem was not full according to df -h, and that is what was causing the issues for Hadoop.

Everything recovered well after that.

JM
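For anyone hitting the same symptom, a quick way to tell whether the volume itself (rather than raw space) is the problem is to compare byte and inode usage and then run a read-only consistency check. This is only a sketch: the mount point and device name below are placeholders for whatever your DataNode volume actually is.

    # "No space left on device" with free space in df -h often points at
    # inode exhaustion or filesystem-level corruption rather than a full disk.
    df -h /mnt
    df -i /mnt

    # Read-only check first (no changes are written); unmount the volume
    # or run this from a maintenance window.
    umount /mnt
    fsck.ext4 -f -n /dev/xvdb

    # Only if errors are reported, let fsck repair and then remount.
    fsck.ext4 -f -y /dev/xvdb
    mount /dev/xvdb /mnt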
On May 2, 2013, 17:47, "Andrew Purtell" <[email protected]> wrote:

> By "expensive" I mean "seriously?"
>
> On Thu, May 2, 2013 at 2:32 PM, Michael Segel <[email protected]> wrote:
>
> > On May 2, 2013, at 4:18 PM, Andrew Purtell <[email protected]> wrote:
> >
> > > Sorry, hit send too soon. I would recommend the following instance types:
> > >
> > > hi1.4xlarge: Expensive but it has a comfortable level of resources and will perform
> >
> > Yeah, at a spot price of $3.00 an hour per server?
> > It's expensive and fast. Note that you will want to up the number of slots from the default 2 that are set up. ;-)
> > More tuning is recommended. (Oops! That's for EMR, not just EC2.)
> >
> > > hs1.8xlarge: This is what you might see in a typical data center Hadoop deployment, also expensive
> > > m2.2xlarge/m2.4xlarge: Getting up to the amount of RAM you want for caching in "big data" workloads
> > > m1.xlarge: Less CPU but more RAM than c1.xlarge, so safer
> > > c1.xlarge: Only if you really know what you are doing and need to be cheap
> > > Anything lesser endowed: Never
> > >
> > > You may find that, relative to AWS charges for a hi1.4xlarge, some other hosting option for the equivalent would be more economical.
> > >
> > > On Thu, May 2, 2013 at 2:12 PM, Andrew Purtell <[email protected]> wrote:
> > >
> > > > > OS is Ubuntu 12.04 and instance type is c1.medium
> > > >
> > > > Eeek!
> > > >
> > > > You shouldn't use less than c1.xlarge for running Hadoop+HBase on EC2. A c1.medium has only 1.7 GB of RAM in total.
> > > >
> > > > On Thu, May 2, 2013 at 1:53 PM, Loic Talon <[email protected]> wrote:
> > > >
> > > > > Hi Andrew,
> > > > > Thanks for those responses.
> > > > >
> > > > > The server has been deployed by Cloudera Manager.
> > > > > OS is Ubuntu 12.04 and instance type is c1.medium.
> > > > > Instance stores are used, not EBS.
> > > > >
> > > > > Is it possible that this problem is a memory problem?
> > > > > Because when the region server has been started I have in stdout.log:
> > > > >
> > > > > Thu May 2 17:01:10 UTC 2013
> > > > > using /usr/lib/jvm/j2sdk1.6-oracle as JAVA_HOME
> > > > > using 4 as CDH_VERSION
> > > > > using as HBASE_HOME
> > > > > using /run/cloudera-scm-agent/process/381-hbase-REGIONSERVER as HBASE_CONF_DIR
> > > > > using /run/cloudera-scm-agent/process/381-hbase-REGIONSERVER as HADOOP_CONF_DIR
> > > > > using as HADOOP_HOME
> > > > >
> > > > > But when I have the problem, I have in stdout.log:
> > > > >
> > > > > Thu May 2 17:01:10 UTC 2013
> > > > > using /usr/lib/jvm/j2sdk1.6-oracle as JAVA_HOME
> > > > > using 4 as CDH_VERSION
> > > > > using as HBASE_HOME
> > > > > using /run/cloudera-scm-agent/process/381-hbase-REGIONSERVER as HBASE_CONF_DIR
> > > > > using /run/cloudera-scm-agent/process/381-hbase-REGIONSERVER as HADOOP_CONF_DIR
> > > > > using as HADOOP_HOME
> > > > > #
> > > > > # java.lang.OutOfMemoryError: Java heap space
> > > > > # -XX:OnOutOfMemoryError="kill -9 %p"
> > > > > # Executing /bin/sh -c "kill -9 20140"...
> > > > >
> > > > > Thanks
> > > > >
> > > > > Loic
> > > > >
> > > > > Loïc TALON
> > > > > [email protected] <http://teads.tv/>
> > > > > Video Ads Solutions
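Given the OutOfMemoryError above, the immediate knob is the RegionServer heap, although on a c1.medium there is very little headroom to give it. A minimal sketch, assuming a plain hbase-env.sh deployment; with Cloudera Manager the equivalent is the RegionServer Java heap setting in the service configuration, and the 1 GB figure is only illustrative:

    # hbase-env.sh: pin the RegionServer heap explicitly.
    # The "kill -9" seen in stdout.log is the pre-configured
    # -XX:OnOutOfMemoryError="kill -9 %p" handler firing once the heap fills up.
    export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xms1g -Xmx1g"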
> > > > > 2013/5/2 Andrew Purtell <[email protected]>
> > > > >
> > > > > > Every instance type except t1.micro has a certain number of instance storage (locally attached disk) volumes available, 1, 2, or 4 depending on type.
> > > > > >
> > > > > > As you probably know, you can use or create AMIs backed by instance-store, in which the OS image is constructed on locally attached disk by a parallel fetch process from slices of the root volume image stored in S3, or backed by EBS, in which case the OS image is an EBS volume and attached over the network, like a SAN.
> > > > > >
> > > > > > If you launch an Amazon Linux instance-store backed instance, the first "ephemeral" local volume will be automatically attached on /media/ephemeral0. That's where that term comes from; it's a synonym for instance-store. (You can, by the way, tell CloudInit via directives sent over instance data to mount all of them.)
> > > > > >
> > > > > > If you have an EBS backed instance, the default is to NOT attach any of these volumes.
> > > > > >
> > > > > > If you are launching your instance with the Amazon Web console, in the volume configuration part you can set up instance-store aka "ephemeral" mounts, whether it is instance-store backed or EBS backed.
> > > > > >
> > > > > > Sorry I can't get into more background on this. Hope it helps.
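To make the ephemeral-volume point concrete, here is a minimal sketch of preparing one by hand after boot, for example for DataNode block storage. The device name (/dev/xvdb) and mount point are assumptions; they vary by instance type and AMI, and cloud-init mount directives can achieve the same thing declaratively.

    # Find the instance-store (ephemeral) device.
    lsblk

    # Format and mount it; data here does not survive a stop/terminate,
    # which is why these volumes are called "ephemeral".
    mkfs.ext4 /dev/xvdb
    mkdir -p /mnt
    mount -o noatime /dev/xvdb /mnt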
> > > > > > On Thu, May 2, 2013 at 1:23 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > >
> > > > > > > Hi Andrew,
> > > > > > >
> > > > > > > No, this AWS instance is configured with instance stores too.
> > > > > > >
> > > > > > > What do you mean by "ephemeral"?
> > > > > > >
> > > > > > > JM
> > > > > > >
> > > > > > > 2013/5/2 Andrew Purtell <[email protected]>
> > > > > > >
> > > > > > > > Oh, I have faced issues with Hadoop on AWS personally. :-) But not this one. I use instance-store aka "ephemeral" volumes for DataNode block storage. Are you by chance using EBS?
> > > > > > > >
> > > > > > > > On Thu, May 2, 2013 at 1:10 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > > > >
> > > > > > > > > But that's weird. This instance is running on AWS. If there were issues with Hadoop and AWS, I think some other people would have faced them before me.
> > > > > > > > >
> > > > > > > > > OK, I will move the discussion to the Hadoop mailing list since it seems to be more related to Hadoop vs the OS.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > JM
> > > > > > > > >
> > > > > > > > > 2013/5/2 Andrew Purtell <[email protected]>
> > > > > > > > >
> > > > > > > > > > > 2013-05-02 14:02:41,063 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException: Premature EOF: no length prefix available
> > > > > > > > > >
> > > > > > > > > > The DataNode aborted the block transfer.
> > > > > > > > > >
> > > > > > > > > > > 2013-05-02 14:02:41,063 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-238-38-193.eu-west-1.compute.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.238.38.193:39831 dest: /10.238.38.193:50010 java.io.FileNotFoundException: /mnt/dfs/dn/current/BP-1179773663-10.238.38.193-1363960970263/current/rbw/blk_7082931589039745816_1955950.meta (Invalid argument)
> > > > > > > > > > >         at java.io.RandomAccessFile.open(Native Method)
> > > > > > > > > > >         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
> > > > > > > > > >
> > > > > > > > > > This looks like the native (OS level) side of RAF got EINVAL back from create() or open(). Go from there.
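One way to act on the "go from there" advice is to watch the failing syscall on the DataNode itself. A rough sketch, assuming strace is installed and <datanode-pid> is the DataNode's process id; the directory comes from the log above:

    # Trace file-open syscalls from the DataNode and look for the
    # .meta create under /mnt/dfs/dn coming back with EINVAL.
    strace -f -e trace=open,creat -p <datanode-pid>

    # Sanity-check the directory the failing path lives under.
    ls -ld /mnt/dfs/dn/current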
> > > > > > > > > > On Thu, May 2, 2013 at 12:27 PM, Jean-Marc Spaggiari <[email protected]> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > Any idea what can be the cause of a "Premature EOF: no length prefix available" error?
> > > > > > > > > > >
> > > > > > > > > > > 2013-05-02 14:02:41,063 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
> > > > > > > > > > > java.io.EOFException: Premature EOF: no length prefix available
> > > > > > > > > > >         at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:171)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1105)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.nextBlockOutputStream(DFSOutputStream.java:1039)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:487)
> > > > > > > > > > > 2013-05-02 14:02:41,064 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning BP-1179773663-10.238.38.193-1363960970263:blk_7082931589039745816_1955950
> > > > > > > > > > > 2013-05-02 14:02:41,068 INFO org.apache.hadoop.hdfs.DFSClient: Excluding datanode 10.238.38.193:50010
> > > > > > > > > > >
> > > > > > > > > > > I'm getting that on a server start. Logs are split correctly, coprocessors deployed correctly, and then I'm getting this exception. It's excluding the datanode, and because of that almost everything remaining is failing.
> > > > > > > > > > >
> > > > > > > > > > > There is only one server in this "cluster"... But even so, it should be working. There is one master, one RS, one NN and one DN. On an AWS host.
> > > > > > > > > > >
> > > > > > > > > > > At the same time, on the Hadoop datanode side, I'm getting this:
> > > > > > > > > > >
> > > > > > > > > > > 2013-05-02 14:02:41,063 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1179773663-10.238.38.193-1363960970263:blk_7082931589039745816_1955950 received exception java.io.FileNotFoundException: /mnt/dfs/dn/current/BP-1179773663-10.238.38.193-1363960970263/current/rbw/blk_7082931589039745816_1955950.meta (Invalid argument)
> > > > > > > > > > > 2013-05-02 14:02:41,063 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: ip-10-238-38-193.eu-west-1.compute.internal:50010:DataXceiver error processing WRITE_BLOCK operation src: /10.238.38.193:39831 dest: /10.238.38.193:50010
> > > > > > > > > > > java.io.FileNotFoundException: /mnt/dfs/dn/current/BP-1179773663-10.238.38.193-1363960970263/current/rbw/blk_7082931589039745816_1955950.meta (Invalid argument)
> > > > > > > > > > >         at java.io.RandomAccessFile.open(Native Method)
> > > > > > > > > > >         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.server.datanode.ReplicaInPipeline.createStreams(ReplicaInPipeline.java:187)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.server.datanode.BlockReceiver.<init>(BlockReceiver.java:199)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:457)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opWriteBlock(Receiver.java:103)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:67)
> > > > > > > > > > >         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
> > > > > > > > > > >         at java.lang.Thread.run(Thread.java:662)
> > > > > > > > > > >
> > > > > > > > > > > Does it sound more like a Hadoop issue than an HBase one?
> > > > > > > > > > >
> > > > > > > > > > > JM
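Because this pipeline has only a single DataNode, excluding it leaves nothing to fall back on, so once the underlying volume is fixed it is worth confirming from the HDFS side that the DataNode is live again and the namespace is healthy. A small sketch, assuming CDH-style commands run as the hdfs user:

    # The DataNode should show up as live, with sensible capacity figures.
    sudo -u hdfs hdfs dfsadmin -report

    # Check for missing or corrupt blocks left over from the failed writes.
    sudo -u hdfs hdfs fsck /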
>
> --
> Best regards,
>
>    - Andy
>
> Problems worthy of attack prove their worth by hitting back. - Piet Hein
> (via Tom White)
