2010/5/17 Tatsuya Kawano
>
> Hi,
>
> On 05/17/2010, at 11:50 PM, Todd Lipcon wrote:
>
> > 2010/5/16 Tatsuya Kawano
> >
> >> 2. On Hadoop trunk, I'd prefer not to hflush() every single put, but
> >> rely on un-flushed replicas on HDFS nodes
[...] staged for CDH3 that will also make the performance of
this quite competitive by pipelining hflushes - basically it has little to
no effect on throughput, but only a few ms penalty on each write.
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
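The trade-off Todd describes (per-put hflush vs. batched, pipelined flushes) can be sketched in plain Python. This is an illustrative model only, not the HBase or HDFS API; `BufferedWriter` and the numbers are invented:

```python
# Illustrative sketch (not the HBase/HDFS API): syncing once per batch
# instead of once per record pays the flush cost far less often.

class BufferedWriter:
    """Accumulates records and syncs only when the buffer fills."""

    def __init__(self, buffer_size, sync):
        self.buffer_size = buffer_size
        self.sync = sync        # callable standing in for an hflush-style sync
        self.buffer = []

    def put(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.buffer_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.sync(list(self.buffer))
            self.buffer.clear()

syncs = []
writer = BufferedWriter(buffer_size=100, sync=syncs.append)
for i in range(1000):
    writer.put(i)
writer.flush()
print(len(syncs))  # 10 syncs for 1000 records, instead of 1000
```

Pipelining goes one step further: the sync is issued without waiting for the previous one to complete, which is why the per-write penalty stays at a few ms.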
[...] want to put 4G of data in memory!
-Todd
> -Original Message-
> From: Todd Lipcon [mailto:t...@cloudera.com]
> Sent: Sat 5/15/2010 3:51 AM
> To: hbase-user@hadoop.apache.org
> Subject: Re: Using HBase on other file systems
>
> On Fri, May 14, 2010 at 2:15 PM, Gibbon, Robert, VF-Group wrote:
> >
> > We've rolled back to 0.20.3, preferring to bear those ills we have than
> > fly to others we know not of. YMMV, our cluster was messed up when we
> > started.
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
Using HBase on other file systems
> >
> > You really want to run HBase backed by Eucalyptus' Walrus? What do you
> > have behind that?
> >
> > > From: Gibbon, Robert, VF-Group
> > > Subject: RE: Using HBase on other file systems
> > [...]
> > > NB. I checked out running HBase over Walrus (an AWS S3
> > > clone): bork - you want me to file a Jira on that?
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
[...] code (including HBase's, so it will take me
> some time...) and see what exactly happens in our scenario, because from my
> current knowledge the jstack outputs don't mean enough to me.
>
>
>
> Friso
>
>
>
>
> On May 13, 2010, at 7:09 PM, Todd Lipcon wrote:
[...]
>   on <0x2aaab364c9d0> (a java.lang.Object)
>   at org.apache.hadoop.hbase.util.Sleeper.sleep(Sleeper.java:89)
>   - locked <0x2aaab364c9d0> (a java.lang.Object)
>   at org.apache.hadoop.hbase.Chore.run(Chore.java:76)
>
> I still have no clue what ha[...]
[...]
>> region servers as well. The inputs to the periodic MapReduce jobs are very
>> small (ranging from a few KB to several MB) and thus do not contain many
>> records. I know this is not very efficient to do in MapReduce and would be
>> faster if inserted in-process by the importer because of MapReduce's startup
>> overhead, but we are setting up this architecture of importers and insertion
>> for anticipated larger loads (up to 80 million records per day).
>>
>> Does anyone have a clue about what happens? Or where to look for further
>> investigation?
>>
>> Thanks a lot!
>>
>>
>> Cheers,
>> Friso
>>
>>
>
--
Todd Lipcon
Software Engineer, Cloudera
[...]tion: Block blk_-26696347696547536_49275 is not valid.
>
> So let me know if I should change the log level and whether I need to enable
> NN clienttrace. I don't think I'll have to wait long for this problem to
> reappear. It seems to be happening almost every day now.
>
[...]ten
> >> to W nodes, which is configurable. In case of HBase, replication is taken
> >> care of by the filesystem (HDFS). When a region is flushed to disk,
> >> HDFS replicates the HFiles (in which the data for the regions is stored).
> >> For more details on how this works, read the Bigtable paper and
> >> http://www.larsgeorge.com/2009/10/hbase-architecture-101-storage.html.
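A toy model of the W-way replication described above, in Python. The node names and round-robin placement policy are invented for illustration; real HDFS placement is rack-aware:

```python
# Toy model of "written to W nodes": each block gets W replicas on distinct
# nodes. Node names and the round-robin policy are invented for illustration;
# real HDFS placement is rack-aware.

def place_replicas(block_id, nodes, w):
    """Choose w distinct nodes for a block, rotating by block id."""
    n = len(nodes)
    return [nodes[(block_id + i) % n] for i in range(w)]

nodes = ["dn1", "dn2", "dn3", "dn4", "dn5"]
for block_id in range(3):
    print(block_id, place_replicas(block_id, nodes, w=3))
```

With w=3 (the HDFS default replication factor), losing any single node still leaves two replicas of every block.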
> >
> >
--
Todd Lipcon
Software Engineer, Cloudera
[...] migrated to the new regionserver)
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
[...] the problem may hit us again
> soon because it has happened twice in the past two days now.
>
> -James
>
>
> On Sat, May 8, 2010 at 12:30 AM, Todd Lipcon wrote:
>
> > If you can grep for '4841840178880951849' as well
> > as /hbase/users/73382377/data/312780071564432169 ac[...]
[...] another block/file/datanode/region
> > > server? We're using 3x replication in HDFS, and we have 8 data nodes
> > > which double as our region servers.
> > > 3. Are there any best practices for achieving high availability in an
> > > HBase cluster? How can I configure the system to gracefully (and
> > > automatically) handle these types of problems?
> >
>
> Let us know what your Hadoop version is and then we can figure out more on
> the issues above.
> Thanks James,
> St.Ack
> P.S. It's an eight-node cluster on what kind of hw? (You've probably said
> in the past and I can dig through mail -- just say -- and then what kind
> of loading are you applying? Ditto if you've said this already.)
>
--
Todd Lipcon
Software Engineer, Cloudera
[...]We're currently shooting for a
> > > > prerelease date of mid-July), but your requirements seem to match
> > > > closely what we are building at the moment.
> > > >
> > > > Lily sources will be released under an Apache license from
> > > www.lilycms.org
> > > >
> > > > Cheers,
> > > >
> > > > Steven.
> > > > --
> > > > Steven Noels                            http://outerthought.org/
> > > > Outerthought                    Open Source Java & XML
> > > > stevenn at outerthought.org             Makers of the Daisy CMS
> > > >
> > >
> > > A simple alternative to secondary indexes is to store the table a
> > > second
> > > time:
> > >
> > > Key -> Value
> > > and
> > > Value -> Key
> > >
> > > With this design you can search on the key or the value quickly. The
> > > trade-off is that a single logical insert becomes multiple physical
> > > inserts, and keeping the two tables consistent falls on the user.
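The dual-table pattern above can be sketched with plain Python dicts standing in for the two HBase tables; all names here are illustrative:

```python
# Plain-dict sketch of the dual-table pattern: every logical insert writes
# both directions, so lookups by value are as cheap as lookups by key.
# Keeping the two "tables" consistent is the caller's responsibility,
# exactly as the post warns.

forward = {}   # key -> value  (the main table)
inverse = {}   # value -> key  (the index table)

def insert(key, value):
    # One logical insert becomes two physical writes.
    forward[key] = value
    inverse[value] = key

insert("user42", "alice@example.com")
print(forward["user42"])             # lookup by key
print(inverse["alice@example.com"])  # lookup by value
```

In HBase the two writes go to two tables and are not atomic, which is where the integrity burden on the user comes from.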
>
--
Todd Lipcon
Software Engineer, Cloudera
> >>>
> >>> val get = new Get(rowId)
> >>>
> >>> val lock = table.lockRow(rowId) // will expire in one minute
> >>> try {
> >>>   if (table.exists(get)) {
> >>>     throw new DuplicateRowException("Tried to insert a duplicate row: "
> >>>       + Bytes.toString(rowId))
> >>>   } else {
> >>>     val put = new Put(rowId, lock)
> >>>     put.add(family, qualifier, value)
> >>>     table.put(put)
> >>>   }
> >>> } finally {
> >>>   table.unlockRow(lock)
> >>> }
> >>> }
> >>> ===
> >>>
> >>> Thanks,
> >>>
> >>> --
> >>> 河野 達也
> >>> Tatsuya Kawano (Mr.)
> >>> Tokyo, Japan
> >>>
> >>> twitter: http://twitter.com/tatsuya6502
>
--
Todd Lipcon
Software Engineer, Cloudera
[...] into two tables; one has several indexes and I'm
> >>>> loading into three column families, the other has no indexes and one
> >>>> column family. Both tables currently have only two regions each.
> >>>>
> >>>> The regionserver that serves the indexed table's regions is using the
> >>>> most CPU but is 87% idle. The other servers are all at ~90% idle. There
> >>>> is no IO wait. The perl processes are barely ticking over. Java on the
> >>>> most "loaded" server is using about 50-60% of one CPU.
> >>>>
> >>>> Normally when I do a load in a pseudo-distributed hbase (my development
> >>>> platform), perl's speed is the limiting factor and it uses about 85% of
> >>>> a CPU. In this cluster they are using only 5-10% of a CPU as they are
> >>>> all waiting on thrift (hbase). When I run only 1 process on the cluster,
> >>>> perl uses much more of a CPU, maybe 70%.
> >>>>
> >>>> Any tips or help in getting the speed/scalability up would be great.
> >>>> Please let me know if you need any other info.
> >>>>
> >>>> As I send this - it looks like the main table has split again and is
> >>>> being served by three regionservers. My performance is going up a bit
> >>>> (now 35 rows/sec/table per process), but still seems like I'm not
> >>>> using the full potential of even the limited EC2 system: no IO wait
> >>>> and lots of idle CPU.
> >>>>
> >>>>
> >>>> many thanks
> >>>> -chris
> >>>>
--
Todd Lipcon
Software Engineer, Cloudera
[...] of this technology and updates are released quite
> frequently. There are also JVM updates, kernel updates, etc., so the system
> must be resistant to the loss of one of the nodes. (We already managed to
> cause a disaster with only one node; see my earlier post to the group.)
>
> Todd, if you need more logs drop me an email. I can provide you with
> all logs from hbase and hadoop.
>
>
> Thanks,
> Michal.
>
--
Todd Lipcon
Software Engineer, Cloudera
[...] able to include it in CDH3 (no
promises though, stability comes first!).
-Todd
>
>
> 2010/4/26 Todd Lipcon
>
> > On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey
> > wrote:
> >
> > > Let me preface this by saying that you all know much better than I do
> > > w[...]
> > [...] like NIO"). I'd like my entire table to be distributed across region
> > servers, so that random reads are quickly served by a region server
> > without having to transfer a block from HDFS. Is this the right
> > approach? I would have thought that some sort of memory-mapped region
> > file would be perfect for this. Anyway, just looking to understand the
> > best practice(s).
> >
> >
> > -geoff
> >
> >
> >
>
--
Todd Lipcon
Software Engineer, Cloudera
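What Geoff describes is roughly what a regionserver-side block cache provides: hot blocks stay in memory, so repeat random reads skip HDFS. A toy LRU sketch in Python; the class and names are invented for illustration, not HBase's actual implementation:

```python
from collections import OrderedDict

# Toy LRU block cache (invented names; not HBase's implementation): blocks
# read recently are served from memory, and only misses go to the slow path.

class BlockCache:
    def __init__(self, capacity, fetch):
        self.capacity = capacity
        self.fetch = fetch          # slow path, e.g. reading from HDFS
        self.cache = OrderedDict()
        self.misses = 0

    def read(self, block_id):
        if block_id in self.cache:
            self.cache.move_to_end(block_id)  # refresh LRU position
            return self.cache[block_id]
        self.misses += 1
        data = self.fetch(block_id)
        self.cache[block_id] = data
        if len(self.cache) > self.capacity:
            self.cache.popitem(last=False)    # evict least recently used
        return data

cache = BlockCache(capacity=2, fetch=lambda b: "data-" + b)
cache.read("blk1")
cache.read("blk2")
cache.read("blk1")  # served from memory
print(cache.misses)  # 2
```

The practical difference from a memory-mapped file is that the cache holds decoded, recently used blocks bounded by a capacity, rather than relying on the OS page cache for the whole region file.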
[...] hbase-daemon.sh stop regionserver
on 3 of the nodes? Are you doing all three at once or in quick succession?
I'd like to try to reproduce your problem so we can get it fixed for 0.20.5.
Thanks
-Todd
>
>
> 2010/4/26 Todd Lipcon :
> > Hi Michal,
> >
> > What version[...]
[...]explain
> what happened when we shut down 3/6 region servers? Why did the cluster get
> into an inconsistent state with so many missing regions? Is this such an
> unusual situation that HBase can't handle it?
>
> Thanks,
> Michal
>
--
Todd Lipcon
Software Engineer, Cloudera
> > > String type = "c";
> > > if (userID == null) {
> > >   userID = EventUtils.extractAgentID(line);
> > >   type = "a";
> > > }
> > > if (userID != null) {
> > >   containedUser = true;
> > >   int attempt[...]
[...] 200 concurrent reducers, each
> > of which writes into HBase, with 32,000 row flush buffers.
>
Do you really mean 200 concurrent reducers?? That is to say 100 reducers
per box? I would recommend that only if you have a 100+ core machine... not
likely.
FYI typical values for reduce slots on dual quad core Nehalem with
hyperthreading (ie 16 logical cores) are in the range of 8-10, not 100!
-Todd
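The arithmetic behind Todd's point, sketched in Python. The two-box count is inferred from "100 reducers per box"; the slot numbers come from the text above:

```python
# The arithmetic behind the advice: 200 reducers spread over 2 boxes (the
# box count is inferred from "100 reducers per box") versus the 8-10 reduce
# slots typical for a 16-logical-core node.

def reducers_per_box(total_reducers, boxes):
    return total_reducers // boxes

per_box = reducers_per_box(200, 2)
typical_slots = range(8, 11)   # 8-10 slots per node

print(per_box)                        # 100
print(per_box // max(typical_slots))  # ~10x oversubscribed
```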
--
Todd Lipcon
Software Engineer, Cloudera
On Mon, Apr 5, 2010 at 11:41 AM, Andrew Purtell wrote:
> The below from Patrick is not uncommon to encounter.
>
> The "commodity hardware" talk around MR and BigTable is a bit of a joke --
> you can do that if you can afford 1,000s or 10,000s of commodity components
> custom assembled. Hadoop+HBa[...]
[...]st here also. Please don't
> > >> take offense to this - I promise I'm not trying to disparage anyone!
> > >>
> > >> Thanks in advance for comments!
> > >>
> > >> Jim
> > >> --
> > >>
> > >>
> > >
>
>
--
Todd Lipcon
Software Engineer, Cloudera
[...] feedback has been incorporated, this document will serve both as user
documentation and as a guide for what we should be building towards and
testing.
Thanks!
-Todd
--
Todd Lipcon
Software Engineer, Cloudera
On Tue, 29 Jul 2008, Billy Pearson wrote:
Considering hbase has fewer coders and testers, I do not think we can hope to
be able to keep up with hadoop's release speed.
I think we should skip over 0.18.0 and tag 0.3.0 to go with 0.19.0 since it
is about 50% done and we have not started with 0.3.0.
I[...]
Hi William,
For TextInputFormat, the keys produced by the input format will be
LongWritables equal to the byte offset at which each line starts in the
file (not the line number).
Hope that helps
-Todd
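A small Python sketch of the keys TextInputFormat produces: the byte offset where each line begins, not a line number. The helper and sample input are illustrative:

```python
# Sketch of TextInputFormat's keys: the LongWritable key for each record is
# the byte offset where the line begins in the file. Helper and input are
# illustrative.

def line_offsets(data: bytes):
    """Return (byte_offset, line) pairs, mimicking TextInputFormat keys."""
    pairs = []
    pos = 0
    for line in data.split(b"\n"):
        pairs.append((pos, line))
        pos += len(line) + 1  # +1 for the newline delimiter
    return pairs

data = b"alpha\nbeta\ngamma"
for offset, line in line_offsets(data):
    print(offset, line.decode())
```

Offsets rather than line numbers are what let Hadoop split a file at arbitrary byte boundaries without scanning from the start.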
On Thu, 5 Jun 2008, William Clay Moody wrote:
Newbie Question:
I am trying to write a MapReduce from a text file in HDFS into an HBa[...]