cha wrote:
First of all thanks for your reply.
you're welcome.
I'm really confused, pardon me.
I don't know whether I need to put the given code in a new class in the
Nutch directory.
Do I have to import other classes or packages? Is there anything I need to take
care of?
I suggest you download Eclipse and set up the project using the tutorial on the
Nutch wiki called "running nutch on eclipse". Then, for example, create a new
class in the org.apache.nutch.tools package and paste the previously
mentioned code into it.
// Here fs is an instance of FileSystem and seqFile is a Path to
// the crawldb.
MapFile.Reader reader = new MapFile.Reader (fs, seqFile, conf);
Then, in the loop, change the line below from
out.println(key);
to
out.println("<url><loc>" + key + "</loc></url>");
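One caveat with the change above: URLs in a crawldb often contain characters such as `&` in their query strings, and writing them unescaped produces XML that is not well-formed. Below is a minimal, self-contained sketch of the formatting step only. The class `SitemapEntries` and its methods are hypothetical helpers, not part of Nutch or Hadoop, and the sample URLs stand in for the keys read in the loop (for a real key you would pass `key.toString()`).

```java
import java.io.PrintWriter;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Nutch): formats URLs as <url><loc>...</loc></url> lines.
class SitemapEntries {

    // Escape the characters that are special in XML so that URLs
    // containing '&', '<', '>' or quotes stay well-formed.
    static String escapeXml(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break;
                case '<':  sb.append("&lt;");   break;
                case '>':  sb.append("&gt;");   break;
                case '"':  sb.append("&quot;"); break;
                case '\'': sb.append("&apos;"); break;
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    // Build a single entry for one URL.
    static String toEntry(String url) {
        return "<url><loc>" + escapeXml(url) + "</loc></url>";
    }

    // Write one entry per URL. In the real tool the URLs would be the
    // keys read from the crawldb inside the loop.
    static void writeEntries(Iterable<String> urls, PrintWriter out) {
        for (String url : urls) {
            out.println(toEntry(url));
        }
        out.flush();
    }

    public static void main(String[] args) {
        // Sample data only; replace with the keys from the crawldb.
        List<String> urls = Arrays.asList(
                "http://www.example.com/",
                "http://www.example1.com/?a=1&b=2");
        writeEntries(urls, new PrintWriter(System.out));
    }
}
```

In the loop, the line would then become something like `out.println(SitemapEntries.toEntry(key.toString()));`, assuming `key` is the key object filled in by the reader.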
I have tried creating a new separate class in the Nutch directory, but it gives
lots of errors related to packages/classes not found. I am still trying to
figure out what's wrong there.
Secondly, how will I be able to read the URLs from the crawldb once the class is
running? I have no idea how I should figure it out.
How can I fit the output of my URLs into some XML format, i.e.:
<url>
<loc>http://www.example.com/</loc>
</url>
<url>
<loc>http://www.example1.com/</loc>
</url>
...........
So can you please explain how I should do this?
Thanks a lot for your time..
Well, there is nothing more I can do except write the code myself : )
You can first try to become more familiar with Java programming if need be.
Good luck
Cheers,
Cha
Enis Soztutar wrote:
cha wrote:
Thanks Enis,
I am getting some idea from that.
Can you tell me in which class I should implement that?
I don't have Hadoop installed on my box.
Just make a new class in Nutch and put the code there : ) As long as
you have the Hadoop jar in your classpath, you do not need to check out the
Hadoop codebase.
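If you get "class not found" errors, a quick way to see whether the Hadoop jar is actually on the classpath is to try loading a few of the classes the snippet needs. This is only a rough sketch: `ClasspathCheck` is a hypothetical helper, and the class names listed are the Hadoop classes used in the code above.

```java
// Hypothetical helper: reports whether the classes needed by the
// crawldb reader snippet can be found on the current classpath.
class ClasspathCheck {

    // Returns true if the named class can be located, without initializing it.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] needed = {
            "org.apache.hadoop.io.MapFile",
            "org.apache.hadoop.fs.FileSystem",
            "org.apache.hadoop.conf.Configuration"
        };
        for (String name : needed) {
            System.out.println(name + (isOnClasspath(name) ? " found" : " MISSING"));
        }
    }
}
```

Run it with the same classpath you use to compile your new class; any line marked MISSING means the corresponding jar is not being picked up.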
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general