I have a couple of very basic questions about Luke and indexes in general. Answers to any of these questions would be much appreciated:

1. In the Luke overview tab, what does "Index version" refer to?

2. Also in the overview tab, if "Has Deletions?" is "Yes", what are the possible sources of the deletions? Dedup? Manual deletions through Luke?

3. Is there any way (with Luke or otherwise) to get a file listing all of the docs in an index? Basically, is there an index equivalent of this command (which outputs all the URLs in a segment):

   bin/nutch org.apache.nutch.pagedb.FetchListEntry -dumpurls segmentsDir

4. Finally, my last question is the one I'm most perplexed by: I called "bin/nutch segread -list -dir" for a particular segments directory and found out that one directory had 93 entries. BUT, when I opened up the index of that segment in Luke, there were only 23 documents (and 3 deletions)! Where did the rest of the URLs go?

Thanks ahead of time for any helpful suggestions,
Bryan
