April Seattle Hadoop/Scalability/NoSQL Meetup: Cassandra, Science, More!

2010-04-21 Thread Bradford Stephens
., Seattle, WA 98109-5210 Hope to see you there! And we're always open to suggestions. -- Bradford Stephens, Founder, Drawn to Scale drawntoscalehq.com 727.697.7528 http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data

Re: Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-25 Thread Bradford Stephens
Thanks for coming, everyone! We had around 25 people. A *huge* success, for Seattle. And a big thanks to 10gen for sending Richard. Can't wait to see you all next month. On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: The Seattle Hadoop/Scalability/NoSQL

Seattle Hadoop/Scalability/NoSQL Meetup Tonight!

2010-02-24 Thread Bradford Stephens
The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup is tonight! We're going to have a guest speaker from MongoDB :) As always, it's at the University of Washington, Allen Computer Science building, Room 303 at 6:45pm. You can find a map here:

Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-25 Thread Bradford Stephens
Hey all, Just writing a quick note of thanks, we had another solid group of people show up! As always, we learned quite a lot about interesting use cases for Hadoop, Lucene, and the rest of the Apache 'Cloud Stack'. I couldn't get it taped, but we talked about: -Scaling Lucene with Katta and

Re: THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-23 Thread Bradford Stephens
Greetings, I've gotten a few replies on this, but I'd really like to know who else is coming. Just send me a quick note :) Cheers, Bradford On Mon, Jun 22, 2009 at 5:40 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Hey all, just a friendly reminder that this is Wednesday! I hope to

THIS WEEK: PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-22 Thread Bradford Stephens
Hey all, just a friendly reminder that this is Wednesday! I hope to see everyone there again. Please let me know if there's something interesting you'd like to talk about -- I'll help however I can. You don't even need a PowerPoint presentation -- there are many whiteboards. I'll try to have a video

PNW Hadoop / Apache Cloud Stack Users' Meeting, Wed Jun 24th, Seattle

2009-06-15 Thread Bradford Stephens
Greetings, On the heels of our smashing success last month, we're going to be convening the Pacific Northwest (Oregon and Washington) Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the 24th. The meeting should start at 6:45, organized chats will end around 8:00, and then there

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bradford Stephens
and the lessons we've learned. The next meetup will be June 24th. Be there, or be... boring :) Cheers, Bradford On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-06-03 Thread Bradford Stephens
Sorry, no videos this time. The conversation wasn't very structured... next month I'll record it :) On Wed, Jun 3, 2009 at 1:59 PM, Bhupesh Bansal bban...@linkedin.com wrote: Great Bradford, Can you post some videos if you have some ? Best Bhupesh On 6/3/09 11:58 AM, Bradford Stephens

PNW Hadoop + Apache Cloud Stack Meetup, Wed. May 27th:

2009-05-26 Thread Bradford Stephens
. Looking forward to meeting you all! Cheers, Bradford Stephens

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-05-19 Thread Bradford Stephens
these after the presentations, and I'll record what we've learned in a wiki and share that with the rest of us. Looking forward to meeting you all! Cheers, Bradford On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing to join

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-20 Thread Bradford Stephens
mh...@informatics.jax.org wrote: Same here, sadly there isn't much call for Lucene user groups in Maine. It would be nice though ^^ Matt Amin Mohammed-Coleman wrote: I would love to come but I'm afraid I'm stuck in rainy old England :( Amin On 18 Apr 2009, at 01:08, Bradford Stephens

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-18 Thread Bradford Stephens
, we've got 3 people... that's enough for a party? :) Surely there must be dozens more of you guys out there... c'mon, accelerate your knowledge! Join us in Seattle! On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing

Re: Seattle / PNW Hadoop + Lucene User Group?

2009-04-17 Thread Bradford Stephens
OK, we've got 3 people... that's enough for a party? :) Surely there must be dozens more of you guys out there... c'mon, accelerate your knowledge! Join us in Seattle! On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens bradfordsteph...@gmail.com wrote: Greetings, Would anybody be willing

Seattle / PNW Hadoop + Lucene User Group?

2009-04-16 Thread Bradford Stephens
Greetings, Would anybody be willing to join a PNW Hadoop and/or Lucene User Group with me in the Seattle area? I can donate some facilities, etc. -- I also always have topics to speak about :) Cheers, Bradford

Re: The Future of Nutch

2009-03-27 Thread Bradford Stephens
Hey there, Just chiming in that we use the complete Nutch + Hadoop + Lucene stack -- we download pages, index them for keywords, and then do heavy Semantic Parsing on it to produce BI data. We also use a lot of plug-ins for parsing and ranking information. What we don't use is the 'built-in GUI

Nutch on Hadoop 0.19?

2009-01-27 Thread Bradford Stephens
Greetings, We're running Nutch on the latest Hadoop distributed with it (0.16.1), but we're missing some features that we could really use. How goes the progress with getting it to work on 0.19? Is there a large suite of patches needed? Perhaps a wiki article to explain it? :) --Bradford

Re: svn nutch with hadoop 0.17

2008-05-23 Thread Bradford Stephens
Greetings, I've actually tried to do something similar, and ran into some of the same issues as you. If there's a plan to migrate to hadoop .17, I'll chip in as well. On Fri, May 23, 2008 at 2:43 PM, Chris Anderson [EMAIL PROTECTED] wrote: Hey all, We're experimenting with Nutch on a Hadoop

Injector / Generator fails with can't find rules...

2008-05-16 Thread Bradford Stephens
Greetings, I'm running the latest trunk of Nutch 0.9 with some patches (like 467 which fixed the small injection list issue). I'm on Ubuntu Server 8.04. For several weeks, my installation has been running correctly. Now, however, if I try to crawl a list of URLs using the bin/nutch crawl command,

Re: Injector / Generator fails with can't find rules...

2008-05-16 Thread Bradford Stephens
Greetings again: On a hunch, I double-checked the filters.xml -- it turns out rsync had not updated anything on the datanodes, so all the URLs were still being filtered out :) Crisis averted! Cheers, Bradford On Fri, May 16, 2008 at 2:12 PM, Bradford Stephens [EMAIL PROTECTED] wrote: Greetings

Re: Cache URL Rewriting Not Working...

2008-04-28 Thread Bradford Stephens
Greetings (again), I'm wondering if anyone has had any insight into this yet? :) I've hunted through config files and haven't come up with anything. Cheers, Bradford On Fri, Apr 25, 2008 at 12:10 PM, Bradford Stephens [EMAIL PROTECTED] wrote: Greetings, I've noticed that links to other

Cache URL Rewriting Not Working...

2008-04-25 Thread Bradford Stephens
Greetings, I've noticed that links to other pages in my Nutch cache which have been fetched are not working. For instance, if the link I'm following is originally: http://www.site.com/directory/page.html The cache will send me to: http://localhost:8080/directory/page.html When it *should* (I'd

Running other Hadoop Tasks on Nutch Servers?

2008-04-24 Thread Bradford Stephens
Greetings, I'm totally loving our new Hadoop/Nutch cluster. It's so rewarding to work with :) We're trying to figure out the best way to utilize the servers we have been allocated for our Nutch/Hadoop project. Currently, we do two things: 1. Crawl and index pages using Nutch/Hadoop, and then 2.

Re: Efficiently Finding the Segment of a Single URL

2008-04-17 Thread Bradford Stephens
It's nice to know the simple approach worked. Thanks for your help! On Thu, Apr 17, 2008 at 1:07 AM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Bradford Stephens wrote: Well, I figured out something on my own, and it seems to work. I modified Selector to have a SegmentName property, which

Re: Efficiently Finding the Segment of a Single URL

2008-04-16 Thread Bradford Stephens
in the metadata field. Am I missing anything? :) On Wed, Apr 16, 2008 at 11:21 AM, Bradford Stephens [EMAIL PROTECTED] wrote: Greetings, I've decided to follow this method: One way is to keep track of segment names in CrawlDb (in CrawlDatum.metaData). This requires modifications to Generator

Re: Efficiently Finding the Segment of a Single URL

2008-04-15 Thread Bradford Stephens
Andrzej, Thanks for the ideas! I will let you know what I implement. Cheers, Bradford On Mon, Apr 14, 2008 at 11:29 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote: Bradford Stephens wrote: Greetings, Thanks for the suggestion! Unfortunately, doing a search with the Query seems to do

Efficiently Finding the Segment of a Single URL

2008-04-14 Thread Bradford Stephens
Greetings, I'm currently working on some code that allows one to determine which segment a single URL is in. This will be rather useful because I need to sometimes get the content of a single URL that Nutch has collected, and parse it with some tools I have developed. I was thinking about using

Re: Efficiently Finding the Segment of a Single URL

2008-04-14 Thread Bradford Stephens
. For that, why not just run a Query (TermQuery) a la url:http://. Of course, that URL will have to be analyzed the same way as Nutch analyzes it for indexing purposes. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Bradford

Re: Slow Crawl Speed and Tika Error Media type alias already exists: text/xml

2008-04-09 Thread Bradford Stephens
was the setting for the delay between subsequent requests to the same server? (ah, probably doesn't matter if you let 3 threads hit the same server concurrently) Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - Original Message From: Bradford Stephens

Nutch Remote Access API

2008-04-09 Thread Bradford Stephens
Greetings, As part of our Nutch rollout strategy, we need a way to access Nutch from remote machines. I've looked through the API a bit and some of the example code in TestDistributedSearch, but haven't found what I need yet. Basically, we need to do two things 1. Access an exact page we've

Slow Crawl Speed and Tika Error Media type alias already exists: text/xml

2008-04-04 Thread Bradford Stephens
Greetings, I'm running Nutch 0.9 and Hadoop on 5 new, fast servers connected to multiple T-3 lines. Although it works fine, the fetch portion of the crawls seems to be awfully slow. The status message at one point is 157 pages, 1 errors, 1.7 pages/s, 487 kb/s. Less than one page a second seems

Difficulty w/ Distributed Crawl with Separate Nutch/Hadoop

2008-04-03 Thread Bradford Stephens
Greetings, I'm using Hadoop for more than just Nutch, so I decided to separate the two, following the instructions I found here: http://www.mail-archive.com/nutch-user@lucene.apache.org/msg10225.html It seems to be mostly working -- I'm running Nutch 0.9 on Hadoop 0.16.1. I'm running it on one

Re: Difficulty w/ Distributed Crawl with Separate Nutch/Hadoop

2008-04-03 Thread Bradford Stephens
Greetings, Scratch this, it appears that I had changed a config file and was no longer running in distributed mode. Trying to fix the issue I have with NUTCH-503 now :) On Thu, Apr 3, 2008 at 10:42 AM, Bradford Stephens [EMAIL PROTECTED] wrote: Greetings, I'm using Hadoop for more than just

Re: Running Nutch on existing Hadoop installation

2008-03-30 Thread Bradford Stephens
Yes, we are. On Sat, Mar 29, 2008 at 4:21 AM, Developer Developer [EMAIL PROTECTED] wrote: Are you setting on multi node environment? On Fri, Mar 28, 2008 at 8:04 PM, Bradford Stephens [EMAIL PROTECTED] wrote: Greetings, What should it take to run Nutch on an existing Hadoop

Running Nutch on existing Hadoop installation

2008-03-28 Thread Bradford Stephens
Greetings, What should it take to run Nutch on an existing Hadoop installation? I plan on using HBase on my Hadoop cluster as well as Nutch, so I'd like to keep everything as 'separate' as possible. I would think the only thing I need to do from a fresh Nutch install is edit hadoop-env.sh and

Recrawling without deleting crawl directory

2008-03-13 Thread Bradford Stephens
Greetings, A coworker and I are experimenting with Nutch in anticipation of a pretty large rollout at our company. However, we seem to be stuck on something -- after the crawler is finished, we can't manually re-crawl into the same directory/index! It says Directory already exists when we try to

Re: Setting nutch/hadopp multi node environment on a SAN device.

2008-03-11 Thread Bradford Stephens
I would concur with the above. Correct me if I'm wrong, but the paradigm of Hadoop/Nutch is such that it needs local, commodity machines on local racks. You should bring the computation to the data, not the other way around. This generally precludes SANs, even with Fibre Channel connectivity :)