., Seattle, WA 98109-5210
Hope to see you there! And we're always open to suggestions.
--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528
http://www.drawntoscalehq.com -- The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data
Thanks for coming, everyone! We had around 25 people. A *huge*
success, for Seattle. And a big thanks to 10gen for sending Richard.
Can't wait to see you all next month.
On Wed, Feb 24, 2010 at 2:15 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
The Seattle Hadoop/Scalability/NoSQL (yeah, we vary the title) meetup
is tonight! We're going to have a guest speaker from MongoDB :)
As always, it's at the University of Washington, Allen Computer
Science building, Room 303 at 6:45pm. You can find a map here:
Hey all,
Just writing a quick note of thanks, we had another solid group of
people show up! As always, we learned quite a lot about interesting
use cases for Hadoop, Lucene, and the rest of the Apache 'Cloud
Stack'.
I couldn't get it taped, but we talked about:
-Scaling Lucene with Katta and
Greetings,
I've gotten a few replies on this, but I'd really like to know who
else is coming. Just send me a quick note :)
Cheers,
Bradford
On Mon, Jun 22, 2009 at 5:40 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
Hey all, just a friendly reminder that this is Wednesday! I hope to see
everyone there again. Please let me know if there's something interesting
you'd like to talk about -- I'll help however I can. You don't even need a
Powerpoint presentation -- there are many whiteboards. I'll try to have a video
Greetings,
On the heels of our smashing success last month, we're going to be
convening the Pacific Northwest (Oregon and Washington)
Hadoop/HBase/Lucene/etc. meetup on the last Wednesday of June, the
24th. The meeting should start at 6:45, organized chats will end
around 8:00, and then there
and
the lessons we've learned.
The next meetup will be June 24th. Be there, or be... boring :)
Cheers,
Bradford
On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
Greetings,
Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
with me in the Seattle
Sorry, no videos this time. The conversation wasn't very structured... next
month I'll record it :)
On Wed, Jun 3, 2009 at 1:59 PM, Bhupesh Bansal bban...@linkedin.com wrote:
Great Bradford,
Can you post some videos if you have some ?
Best
Bhupesh
On 6/3/09 11:58 AM, Bradford Stephens
Looking forward to meeting you all!
Cheers,
Bradford Stephens
these after the presentations, and
I'll record what we've learned in a wiki and share that with the rest of
us.
Looking forward to meeting you all!
Cheers,
Bradford
On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
Greetings,
Would anybody be willing to join
mh...@informatics.jax.org wrote:
Same here, sadly there isn't much call for Lucene user groups in Maine. It
would be nice though ^^
Matt
Amin Mohammed-Coleman wrote:
I would love to come but I'm afraid I'm stuck in rainy old England :(
Amin
On 18 Apr 2009, at 01:08, Bradford Stephens
OK, we've got 3 people... that's enough for a party? :)
Surely there must be dozens more of you guys out there... c'mon,
accelerate your knowledge! Join us in Seattle!
On Thu, Apr 16, 2009 at 3:27 PM, Bradford Stephens
bradfordsteph...@gmail.com wrote:
Greetings,
Would anybody be willing
Greetings,
Would anybody be willing to join a PNW Hadoop and/or Lucene User Group
with me in the Seattle area? I can donate some facilities, etc. -- I
also always have topics to speak about :)
Cheers,
Bradford
Hey there,
Just chiming in that we use the complete Nutch + Hadoop + Lucene stack
-- we download pages, index them for keywords, and then do heavy
Semantic Parsing on it to produce BI data. We also use a lot of
plug-ins for parsing and ranking information.
What we don't use is the 'built-in GUI
Greetings,
We're running Nutch on the latest Hadoop distributed with it (0.16.1),
but we're missing some features that we could really use. How goes
the progress with getting it to work on 0.19? Is there a large suite
of patches needed? Perhaps a wiki article to explain it? :)
--Bradford
Greetings,
I've actually tried to do something similar, and ran into some of the
same issues as you. If there's a plan to migrate to hadoop .17, I'll
chip in as well.
On Fri, May 23, 2008 at 2:43 PM, Chris Anderson [EMAIL PROTECTED] wrote:
Hey all,
We're experimenting with Nutch on a Hadoop
Greetings,
I'm running the latest trunk of Nutch 0.9 with some patches (like 467
which fixed the small injection list issue). I'm on Ubuntu Server
8.04. For several weeks, my installation has been running correctly.
Now, however, if I try to crawl a list of URLs using the bin/nutch
crawl command,
Greetings again:
On a hunch, I double-checked the filters.xml -- it turns out rsync had
not updated anything on the datanodes, so all the URLs were still
being filtered out :) Crisis averted!
Cheers,
Bradford
On Fri, May 16, 2008 at 2:12 PM, Bradford Stephens
[EMAIL PROTECTED] wrote:
Greetings
Greetings (again),
I'm wondering if anyone has had any insight into this yet? :) I've
hunted through config files and haven't come up with anything.
Cheers,
Bradford
On Fri, Apr 25, 2008 at 12:10 PM, Bradford Stephens
[EMAIL PROTECTED] wrote:
Greetings,
I've noticed that links to other
Greetings,
I've noticed that links to other pages in my Nutch cache which have
been fetched are not working. For instance, if the link I'm following
is originally:
http://www.site.com/directory/page.html
The cache will send me to:
http://localhost:8080/directory/page.html
When it *should* (I'd
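The behavior the poster expects can be illustrated with plain JDK URI resolution. This is a hypothetical sketch of the *desired* link handling, not Nutch's actual cache servlet code; the class and method names are made up for illustration:

```java
import java.net.URI;

public class CacheLinkFix {
    // A link found inside a cached page should resolve against the page's
    // *original* base URL, not the cache server's host. The reported bug
    // resolves links against http://localhost:8080/ instead.
    static String resolveAgainstOrigin(String originalPage, String link) {
        return URI.create(originalPage).resolve(link).toString();
    }

    public static void main(String[] args) {
        // Desired behavior: the relative link stays on www.site.com.
        System.out.println(resolveAgainstOrigin(
                "http://www.site.com/directory/page.html", "other.html"));
        // The bug described above would instead send the browser to
        // http://localhost:8080/directory/other.html
    }
}
```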
Greetings,
I'm totally loving our new Hadoop/Nutch cluster. It's so rewarding to
work with :)
We're trying to figure out the best way to utilize the servers we have
been allocated for our Nutch/Hadoop project. Currently, we do two
things:
1. Crawl and index pages using Nutch/Hadoop, and then
2.
It's nice to know the simple approach worked. Thanks for your help!
On Thu, Apr 17, 2008 at 1:07 AM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Bradford Stephens wrote:
Well, I figured out something on my own, and it seems to work. I
modified Selector to have a SegmentName property, which
in the metadata field.
Am I missing anything? :)
On Wed, Apr 16, 2008 at 11:21 AM, Bradford Stephens
[EMAIL PROTECTED] wrote:
Greetings,
I've decided to follow this method: One way is to keep track of
segment names in CrawlDb (in CrawlDatum.metaData). This requires
modifications to Generator
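The idea quoted above (keeping the segment name in CrawlDatum.metaData so a single URL can be mapped to its segment without scanning every segment) can be sketched in plain Java. This is a toy stand-in, not the Nutch API: the metadata key and class names here are hypothetical, and in Nutch the per-URL metadata lives in the CrawlDb, not an in-memory map.

```java
import java.util.HashMap;
import java.util.Map;

public class SegmentLookup {
    // Stand-in for CrawlDatum.metaData: per-URL metadata written at
    // generate/fetch time by the modified Generator described above.
    private final Map<String, Map<String, String>> crawlDb = new HashMap<>();

    // Record which segment a URL was fetched into.
    public void recordSegment(String url, String segmentName) {
        crawlDb.computeIfAbsent(url, k -> new HashMap<>())
               .put("_segment_", segmentName); // key name is hypothetical
    }

    // Later, look up the segment for a single URL directly.
    public String segmentOf(String url) {
        Map<String, String> meta = crawlDb.get(url);
        return meta == null ? null : meta.get("_segment_");
    }
}
```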
Andrzej,
Thanks for the ideas! I will let you know what I implement.
Cheers,
Bradford
On Mon, Apr 14, 2008 at 11:29 PM, Andrzej Bialecki [EMAIL PROTECTED] wrote:
Bradford Stephens wrote:
Greetings,
Thanks for the suggestion! Unfortunately, doing a search with the
Query seems to do
Greetings,
I'm currently working on some code that allows one to determine which
segment a single URL is in. This will be rather useful because I need
to sometimes get the content of a single URL that Nutch has collected,
and parse it with some tools I have developed.
I was thinking about using
. For that,
why not just run a Query (TermQuery) a la url:http://. Of course, that
URL will have to be analyzed the same way as Nutch analyzes it for indexing
purposes.
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Bradford
was the setting for the delay between subsequent requests to the same
server? (ah, probably doesn't matter if you let 3 threads hit the same server
concurrently)
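The settings Otis is alluding to live in nutch-site.xml. A sketch of the relevant fragment follows; the property names match the Nutch of that era, but verify them against your conf/nutch-default.xml, and the values here are only examples:

```xml
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value> <!-- seconds between requests to the same server -->
</property>
<property>
  <name>fetcher.threads.per.host</name>
  <value>1</value> <!-- allowing more than 1 bypasses the polite delay -->
</property>
```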
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- Original Message
From: Bradford Stephens
Greetings,
As part of our Nutch rollout strategy, we need a way to access Nutch
from remote machines. I've looked through the API a bit and some of
the example code in TestDistributedSearch, but haven't found what I
need yet.
Basically, we need to do two things
1. Access an exact page we've
Greetings,
I'm running Nutch 0.9 and Hadoop on 5 new, fast servers connected to a
multiple T-3 line. Although it works fine, the fetch portion of the
crawls seems to be awfully slow. The status message at one point is
157 pages, 1 errors, 1.7 pages/s, 487 kb/s. Less than one page a
second seems
Greetings,
I'm using Hadoop for more than just Nutch, so I decided to separate
the two, following the instructions I found here:
http://www.mail-archive.com/nutch-user@lucene.apache.org/msg10225.html
It seems to be mostly working -- I'm running Nutch 0.9 on Hadoop
0.16.1. I'm running it on one
Greetings,
Scratch this, it appears that I had changed a config file and was no
longer running in distributed mode. Trying to fix the issue I have
with NUTCH-503 now :)
On Thu, Apr 3, 2008 at 10:42 AM, Bradford Stephens
[EMAIL PROTECTED] wrote:
Greetings,
I'm using Hadoop for more than just
Yes, we are.
On Sat, Mar 29, 2008 at 4:21 AM, Developer Developer
[EMAIL PROTECTED] wrote:
Are you running this in a multi-node environment?
On Fri, Mar 28, 2008 at 8:04 PM, Bradford Stephens
[EMAIL PROTECTED] wrote:
Greetings,
What should it take to run Nutch on an existing Hadoop
Greetings,
What should it take to run Nutch on an existing Hadoop installation? I
plan on using HBase on my Hadoop cluster as well as Nutch, so I'd like
to keep everything as 'separate' as possible.
I would think the only thing I need to do from a fresh Nutch install
is edit hadoop-env.sh and
Greetings,
A coworker and I are experimenting with Nutch in anticipation of a
pretty large rollout at our company. However, we seem to be stuck on
something -- after the crawler is finished, we can't manually re-crawl
into the same directory/index! It says "Directory already exists" when
we try to
I would concur with the above. Correct me if I'm wrong, but the
paradigm of Hadoop/Nutch is such that it needs local, commodity
machines on local racks. You should bring the computation to the data,
not the other way around. This generally precludes SANs, even with
fiber channel connectivity :)