Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-25 Thread Bradford Stephens
Hey all, Just writing a quick note of thanks, we had another solid group of people show up! As always, we learned quite a lot about interesting use cases for Hadoop, Lucene, and the rest of the Apache 'Cloud Stack'. I couldn't get it taped, but we talked about: -Scaling Lucene with Katta and

Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-23 Thread Bradford Stephens
Greetings, I've gotten a few replies on this, but I'd really like to know who else is coming. Just send me a quick note :) Cheers, Bradford On Mon, Jun 22, 2009 at 5:40 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey all, just a friendly reminder that this is Wednesday! I hope to

Re: Can you tell if a particular mapper was data local ?

2009-06-23 Thread Bradford Stephens
(Correct me if I'm wrong), but I think you can tell through the Hadoop Web UI -- it'll show a count of the map tasks that were data-local. You can then click on that to see a list of all the tasks there, and drill down to see which nodes those tasks ran on. On Tue, Jun 23, 2009 at 6:37 PM, Suratna

THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-22 Thread Bradford Stephens
Hey all, just a friendly reminder that this is Wednesday! I hope to see everyone there again. Please let me know if there's something interesting you'd like to talk about -- I'll help however I can. You don't even need a PowerPoint presentation -- there are plenty of whiteboards. I'll try to have a video

Re: [ANN] HBase 0.20.0-alpha available for download

2009-06-16 Thread Bradford Stephens
Oh sweet. This will be a most excellent party. On Tue, Jun 16, 2009 at 10:23 PM, stack st...@duboce.net wrote: An alpha version of HBase 0.20.0 is available for download at: http://people.apache.org/~stack/hbase-0.20.0-alpha/ We are making this release available to preview what is coming in

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bradford Stephens
and the lessons we've learned. The next meetup will be June 24th. Be there, or be... boring :) Cheers, Bradford On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bradford Stephens
Sorry, no videos this time. The conversation wasn't very structured... next month I'll record it :) On Wed, Jun 3, 2009 at 1:59 PM, Bhupesh Bansal bban...@linkedin.com wrote: Great Bradford, Can you post some videos if you have some ? Best Bhupesh On 6/3/09 11:58 AM, Bradford Stephens

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-05-19 Thread Bradford Stephens
these after the presentations, and I'll record what we've learned in a wiki and share that with the rest of us. Looking forward to meeting you all! Cheers, Bradford On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing to join

Re: Free Training at 2009 Hadoop Summit

2009-05-11 Thread Bradford Stephens
Hey there, I notice this is already sold out -- any chance of more openings? :) Cheers, Bradford On Tue, May 5, 2009 at 6:25 PM, Christophe Bisciglia christo...@cloudera.com wrote: Just wanted to follow up on this and let everyone know that Cloudera and Y! are teaming up to offer two day-long

Re: What do we call Hadoop+HBase+Lucene+Zookeeper+etc....

2009-05-05 Thread Bradford Stephens
, 2009 at 7:00 AM, Steve Loughran ste...@apache.org wrote: Bradford Stephens wrote: Hey all, I'm going to be speaking at OSCON about my company's experiences with Hadoop and Friends, but I'm having a hard time coming up with a name for the entire software ecosystem. I'm thinking of calling

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-20 Thread Bradford Stephens
mh...@informatics.jax.org wrote: Same here, sadly there isn't much call for Lucene user groups in Maine. It would be nice though ^^ Matt Amin Mohammed-Coleman wrote: I would love to come but I'm afraid I'm stuck in rainy old England :( Amin On 18 Apr 2009, at 01:08, Bradford Stephens

Re: Using the Stanford NLP with hadoop

2009-04-18 Thread Bradford Stephens
Greetings, There's a way you can distribute files along with your MR job as part of a payload, or you could save the file in the same spot on every machine of your cluster with some rsyncing, and hard-code loading it. This may be of some help:

Re: fyi: A Comparison of Approaches to Large-Scale Data Analysis: MapReduce vs. DBMS Benchmarks

2009-04-17 Thread Bradford Stephens
There's definitely a false dichotomy in this paper, and I think it's a tad disingenuous. It's titled "A Comparison Of Approaches To Large Scale Data Analysis", when it should be titled "A Comparison of Parallel RDBMSs to MapReduce for RDBMS-specific problems". There's little surprise that the people

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Bradford Stephens
OK, we've got 3 people... that's enough for a party? :) Surely there must be dozens more of you guys out there... c'mon, accelerate your knowledge! Join us in Seattle! On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing

Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford

2009 Hadoop Summit?

2009-01-29 Thread Bradford Stephens
Hey there, I was just wondering if there are plans for another Hadoop Summit this year? I went last March and learned quite a bit -- I'm excited to see what new things people have done since then. Cheers, Bradford

Avoiding Newline Problems in Hadoop Streaming + StreamXMLRecordReader

2008-05-21 Thread Bradford Stephens
Greetings, I have an interesting problem I'm trying to solve. I currently store a bunch of webpages in a large XML file in Hadoop. I'm trying to parse information out of these webpages using a complex C# program that I have running on Mono (I'm in a Linux environment). Therefore, I'm using Hadoop

Re: Hadoop cluster build, machine specs

2008-04-04 Thread Bradford Stephens
Greetings, It really depends on your budget. What are you looking to spend? $5k? $20k? Hadoop is about bringing the calculations to your data, so the more machines you can have, the better. In general, I'd recommend Dual-Core Opterons and 2-4 GB of RAM with an SATA hard drive. My company just

Re: hadoop 0.15.3 r612257 freezes on reduce task

2008-03-28 Thread Bradford Stephens
Hey everyone, I'm having a similar problem: Map output lost, rescheduling: getMapOutput(task_200803281212_0001_m_00_2,0) failed : org.apache.hadoop.util.DiskChecker$DiskErrorException: Could not find task_200803281212_0001_m_00_2/file.out.index in any of the configured local directories

Re: hadoop 0.15.3 r612257 freezes on reduce task

2008-03-28 Thread Bradford Stephens
Also, I'm running hadoop 0.16.1 :) On Fri, Mar 28, 2008 at 1:23 PM, Bradford Stephens [EMAIL PROTECTED] wrote: Hey everyone, I'm having a similar problem: Map output lost, rescheduling: getMapOutput(task_200803281212_0001_m_00_2,0) failed : org.apache.hadoop.util.DiskChecker

Re: hadoop 0.15.3 r612257 freezes on reduce task

2008-03-28 Thread Bradford Stephens
you please check what your mapred.local.dir is set to? Devaraj. -Original Message- From: Bradford Stephens [mailto:[EMAIL PROTECTED] Sent: Saturday, March 29, 2008 1:54 AM To: core-user@hadoop.apache.org Cc: [EMAIL PROTECTED] Subject: Re: hadoop 0.15.3 r612257 freezes

Re: Amazon S3 questions

2008-03-01 Thread Bradford Stephens
What sort of performance hit is there for using S3 vs. a local cluster? On Sat, Mar 1, 2008 at 1:09 PM, Steve Sapovits [EMAIL PROTECTED] wrote: One other note: When you use S3 URIs, you get a port out of range error on startup but that doesn't appear to be fatal. I spent a few hours on

Re: MapReduce usage with Lucene Indexing

2008-01-24 Thread Bradford Stephens
I'm actually going to be doing something similar, with Nutch. I just started learning about Hadoop this week, so I'm interested in what everyone has to say :) On Jan 24, 2008 5:00 PM, roger dimitri [EMAIL PROTECTED] wrote: Hi, I am very new to Hadoop, and I have a project where I need to use