date:20070412

DummySSLProtocolSocketFactory problem, please help me!!!! 2

2007-04-12 Thread Gavino Marras

I have a problem with Nutch 0.8.1 in the DummySSLProtocolSocketFactory class (org.apache.nutch.protocol.httpclient plugin). I have to index pages from a web site served over HTTPS that uses authentication and sessions. My problem is with the management of the sessions.

Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread wangxu

Has anybody thought of replacing CrawlDb with some kind of relational DB, MySQL for example? CrawlDb is so difficult to manipulate. I often need to edit several entries in CrawlDb, but waiting for the MapReduce job would cost too much time.

Re: Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread Nuther

Hi, wangxu. You wrote on 13 April 2007, 1:03:31: Has anybody thought of replacing CrawlDb with some kind of relational DB, MySQL for example? CrawlDb is so difficult to manipulate. I often need to edit several entries in CrawlDb, but that would cost too much waiting for the

Re: Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread Andrzej Bialecki

wangxu wrote: Has anybody thought of replacing CrawlDb with some kind of relational DB, MySQL for example? CrawlDb is so difficult to manipulate. I often need to edit several entries in CrawlDb, but waiting for the MapReduce job would cost too much time. Please make the following

Re: Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread Sami Siren

wangxu wrote: Has anybody thought of replacing CrawlDb with some kind of relational DB, MySQL for example? CrawlDb is so difficult to manipulate. I often need to edit several entries in CrawlDb, but waiting for the MapReduce job would cost too much time. Once when I was young

Runing a nutch crawler on Eclipse

2007-04-12 Thread Tanmoy Kumar Mukherjee

Hi. I am having problems running the Nutch crawler in Eclipse after following the tutorial on the Nutch wiki. It says it cannot build the project. Can anyone suggest a good tool? Tanmoy

problem parsing HTML

2007-04-12 Thread Ian Holsman

Hi. I'm trying to figure out how Nutch actually extracts the links out of a piece of HTML. I'm getting confused about what parts TagSoup, NekoHTML, and parse-html play in all this. From what I can see, the regular expression it is using to extract the link is slightly off, but I'm not

Re: Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread Dennis Kubes

Andrzej Bialecki wrote: wangxu wrote: Has anybody thought of replacing CrawlDb with some kind of relational DB, MySQL for example? CrawlDb is so difficult to manipulate. I often need to edit several entries in CrawlDb, but waiting for the MapReduce job would cost too much time.

Re: problem parsing HTML

2007-04-12 Thread Dennis Kubes

It happens in org.apache.nutch.parse.html.DOMContentUtils.getOutlinks(), which is called from org.apache.nutch.parse.html.HtmlParser. Running some simple tests on your fragment below, I get no outlink for this. What version of Nutch are you running? Dennis Kubes Ian Holsman wrote: Hi. I'm
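As a rough illustration of the kind of outlink extraction discussed in this thread, the sketch below collects link targets by walking parsed HTML elements rather than by applying a regular expression. It is not Nutch's code and does not reproduce DOMContentUtils.getOutlinks(); the names LinkCollector and extract_outlinks are hypothetical, and it assumes that only <a href="..."> elements count as outlinks.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collects (absolute_url, anchor_text) pairs."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._href = None   # target of the <a> element currently open
        self._text = []     # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative targets against the page URL.
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_outlinks(html, base_url):
    """Return the outlinks found in html, resolved against base_url."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    page = '<p><a href="/docs/a.html">Docs</a></p>'
    for url, text in extract_outlinks(page, "http://example.com/index.html"):
        print(url, text)
```

Feeding a suspect header fragment through a small extractor like this can help narrow down whether a bad link comes from the markup itself or from the parser in use.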

Re: Runing a nutch crawler on Eclipse

2007-04-12 Thread Dennis Kubes

I run the crawler through Nutch all the time. What are the specific errors that you are getting? Dennis Kubes Tanmoy Kumar Mukherjee wrote: Hi. I am having problems running the Nutch crawler in Eclipse after following the tutorial on the Nutch wiki. It says it cannot build

Re: problem parsing HTML

2007-04-12 Thread Ian Holsman

Hi Dennis, thanks for the fast response. I'm running the SVN head. I'll try narrowing it down a bit further. What led me to suspect this was looking at what the fetcher was fetching. It could be that we had some bad HTML on our servers, but it's a standard header area. regards

RE: Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread Howie Wang

Please make the following test using your favorite relational DB:
* create a table with 300 mln rows and 10 columns of mixed type
* select 1 mln rows, sorted by some value
* update 1 mln rows to different values
If you find that these operations take less time than with
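For readers who want to try the three operations listed in the test above, here is a minimal sketch at a much smaller scale, using SQLite from the Python standard library. Everything about it is an assumption made for illustration: the table name, the column names, the row counts (30,000 rows and batches of 100 instead of 300 mln and 1 mln), and the choice of SQLite itself. Timings from a run this small say nothing about behaviour at the sizes discussed in the thread.

```python
import random
import sqlite3
import time


def run_benchmark(total_rows=30_000, batch=100, seed=0):
    """Time create, sorted select, and update on a small SQLite table."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    # Ten columns of mixed type; all names are hypothetical.
    conn.execute(
        "CREATE TABLE entries ("
        "id INTEGER PRIMARY KEY, url TEXT, status INTEGER, score REAL, "
        "fetch_time INTEGER, retries INTEGER, fetch_interval REAL, "
        "signature BLOB, modified INTEGER, meta TEXT)"
    )
    timings = {}

    # 1. Create: fill the table with generated rows.
    start = time.perf_counter()
    rows = (
        (i, f"http://example.com/{i}", rng.randint(0, 5), rng.random(),
         rng.randint(0, 10**9), rng.randint(0, 3), rng.random() * 30,
         i.to_bytes(16, "big"), rng.randint(0, 10**9), f"meta-{i}")
        for i in range(total_rows)
    )
    conn.executemany("INSERT INTO entries VALUES (?,?,?,?,?,?,?,?,?,?)", rows)
    conn.commit()
    timings["create"] = time.perf_counter() - start

    # 2. Select: fetch a batch of rows sorted by one value.
    start = time.perf_counter()
    selected = conn.execute(
        "SELECT id FROM entries ORDER BY score LIMIT ?", (batch,)
    ).fetchall()
    timings["select"] = time.perf_counter() - start

    # 3. Update: give those rows different values (status 99 is unused above).
    start = time.perf_counter()
    conn.executemany(
        "UPDATE entries SET status = 99, score = ? WHERE id = ?",
        [(rng.random(), row_id) for (row_id,) in selected],
    )
    conn.commit()
    timings["update"] = time.perf_counter() - start

    total = conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0]
    updated = conn.execute(
        "SELECT COUNT(*) FROM entries WHERE status = 99"
    ).fetchone()[0]
    conn.close()
    return {"rows": total, "selected": len(selected),
            "updated": updated, "timings": timings}


if __name__ == "__main__":
    print(run_benchmark())
```

Raising total_rows and batch toward the figures in the message, and pointing the connection at an on-disk database instead of ":memory:", would bring the sketch closer to the test as stated.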

RE: Have anybody thought of replacing CrawlDb with any kind of Rational DB?

2007-04-12 Thread Howie Wang

Sorry about the previous crappily formatted message. In brief, my point was that a relational DB might perform better for small niche users, and you also get the flexibility of SQL. No more writing custom code to tweak the webdb. Howie

13 matches