I am developing an application based on Twitter feeds, so about 90% of the URLs will be shortened URLs. It is not practical for me to convert all of these to their actual URLs by hand. Is there any other solution for this?
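For example, I could write something like the sketch below (untested, plain java.net, no Nutch involved; assuming the shorteners answer with a 301/302 and a Location header) to expand each URL before seeding the crawl, but with this volume I was hoping there is a built-in way:

    import java.net.HttpURLConnection;
    import java.net.URL;

    public class ShortUrlResolver {

        // Ask the shortener for its redirect target without following it.
        public static String resolve(String shortUrl) throws Exception {
            HttpURLConnection conn =
                (HttpURLConnection) new URL(shortUrl).openConnection();
            conn.setInstanceFollowRedirects(false); // read the Location header instead
            conn.setRequestMethod("HEAD");          // headers only, no body needed
            conn.connect();
            String location = conn.getHeaderField("Location");
            conn.disconnect();
            return location != null ? location : shortUrl; // not a redirect: keep as-is
        }

        public static void main(String[] args) throws Exception {
            System.out.println(resolve("http://is.gd/Jt32Cf"));
        }
    }

(A real version would also have to loop, in case one shortener points at another.)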
Thanks and regards,
Arjun Kumar Reddy

On Wed, Jan 26, 2011 at 7:09 PM, Estrada Groups <[email protected]> wrote:

> You probably have to literally click on each URL to get the URL it's
> referencing. Those are URL shorteners and probably won't play nicely with a
> crawler because of the redirection.
>
> Adam
>
> Sent from my iPhone
>
> On Jan 26, 2011, at 8:02 AM, Arjun Kumar Reddy <[email protected]> wrote:
>
> > Hi list,
> >
> > I have given the set of urls as
> >
> > http://is.gd/Jt32Cf
> > http://is.gd/hS3lEJ
> > http://is.gd/Jy1Im3
> > http://is.gd/QoJ8xy
> > http://is.gd/e4ct89
> > http://is.gd/WAOVmd
> > http://is.gd/lhkA69
> > http://is.gd/3OilLD
> > ..... 43 such urls
> >
> > And I have run the crawl command bin/nutch crawl urls/ -dir crawl -depth 3
> >
> > arjun@arjun-ninjas:~/nutch$ bin/nutch readdb crawl/crawldb -stats
> > CrawlDb statistics start: crawl/crawldb
> > Statistics for CrawlDb: crawl/crawldb
> > TOTAL urls: 43
> > retry 0: 43
> > min score: 1.0
> > avg score: 1.0
> > max score: 1.0
> > status 3 (db_gone): 1
> > status 4 (db_redir_temp): 1
> > status 5 (db_redir_perm): 41
> > CrawlDb statistics: done
> >
> > When I am trying to read the content from the segments, the content block is
> > empty for every record.
> >
> > Can you please tell me where I can get the content of these urls.
> >
> > Thanks and regards,
> > Arjun Kumar Reddy
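P.S. Reading the stats again, 41 of the 43 URLs ended up as db_redir_perm, so it seems the fetcher only recorded the redirects and the actual content lives at the target URLs. If I understand the configuration correctly, setting http.redirect.max in conf/nutch-site.xml should make the fetcher follow redirects immediately instead of recording them for a later round. A sketch of what I mean (assuming Nutch 1.x property names):

    <property>
      <name>http.redirect.max</name>
      <value>3</value>
      <description>Follow up to 3 redirects immediately instead of
      recording the target URLs for a later fetch round.</description>
    </property>

Is that the right knob, or should I rather increase the crawl depth so the redirect targets get fetched in the next round?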

