Berlin Buzzwords - early registration extended

2010-04-08 Thread Isabel Drost
Hello, we would like to invite everyone interested in data storage, analysis and search to join us for two days, June 7th and 8th, in Berlin for an in-depth, technical, developer-focused conference located in the heart of Europe. Presentations will range from beginner-friendly introductions on

[VOTE RESULTS] Nutch to become a top-level project (TLP)

2010-04-08 Thread Andrzej Bialecki
Hi all, I'm happy to announce that this vote is closed and the proposal has passed with 4 +1 binding votes and 0 -1 binding votes - in fact, there were only +1-s both from the committers and the community. Thanks to all who expressed their opinion - we will now proceed with the remaining formal

how to parse html files while crawling

2010-04-08 Thread cefurkan0 cefurkan0
I can successfully crawl web sites with the bin/nutch crawl command, but I also want to save the parsed HTML files. How can I do that? Thanks.

how to retrieve only content text not html text

2010-04-08 Thread cefurkan0 cefurkan0
I can successfully retrieve the source page from a segment with this: bin/nutch readseg -dump crawl_folder/segments/segment_folder_name/ extract_folder_name -nofetch -nogenerate -noparse -noparsedata -noparsetext (I don't know how to include all the segment folders, so I'd appreciate it if you could tell me). So this brings

Re: About Apache Nutch 1.1 Final Release

2010-04-08 Thread Mattmann, Chris A (388J)
Hi there, Well, as soon as we have 3 +1 binding VOTEs. Right now I'm the only PMC member who has VOTE'd +1 on the release. Hopefully in the next few days someone will have a chance to check... Cheers, Chris On 4/8/10 8:54 PM, yhdelgado yhdelg...@estudiantes.uci.cu wrote: Hi. I have a