svn commit: r264964 - /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java
Author: jerome
Date: Wed Aug 31 01:04:52 2005
New Revision: 264964

URL: http://svn.apache.org/viewcvs?rev=264964&view=rev
Log:
No more NullPointerException while logging the doc language if none

Modified:
    lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java?rev=264964&r1=264963&r2=264964&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/indexer/IndexSegment.java Wed Aug 31 01:04:52 2005
@@ -145,8 +145,9 @@
 
         // add the document to the index
         NutchAnalyzer analyzer = AnalyzerFactory.get(doc.get("lang"));
-        LOG.info(" Indexing [" + doc.getField("url").stringValue() +
-                 "] with analyzer " + analyzer + " (" + doc.getField("lang").stringValue() + ")");
+        LOG.info(" Indexing [" + doc.getField("url").stringValue() + "]" +
+                 " with analyzer " + analyzer +
+                 " (" + doc.get("lang") + ")");
         //LOG.info(" Doc is " + doc);
         writer.addDocument(doc, analyzer);
         if (count > 0 && count % LOG_STEP == 0) {
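For illustration only: the change above logs the language with doc.get("lang") instead of doc.getField("lang").stringValue(). The following is a minimal sketch of why that avoids the exception. It assumes hypothetical stand-in classes: Doc, Field, describeOld and describeNew are invented here and are not the Lucene or Nutch classes. The only behavior it borrows from Lucene is that getField(name) and get(name) both return null when the document has no field of that name.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-ins for illustration; these are NOT the Lucene classes. */
public class NullSafeLangLogging {

    /** Stand-in for a stored field holding a string value. */
    static final class Field {
        private final String value;
        Field(String value) { this.value = value; }
        String stringValue() { return value; }
    }

    /** Stand-in for a document: a map from field name to field. */
    static final class Doc {
        private final Map<String, Field> fields = new HashMap<>();

        void add(String name, String value) {
            fields.put(name, new Field(value));
        }

        /** Returns the field, or null when the document has no such field. */
        Field getField(String name) {
            return fields.get(name);
        }

        /** Returns the field's string value, or null when the field is absent. */
        String get(String name) {
            Field f = fields.get(name);
            return f == null ? null : f.stringValue();
        }
    }

    /** Old pattern: dereferences the field, so it throws if "lang" is missing. */
    static String describeOld(Doc doc) {
        return "lang=" + doc.getField("lang").stringValue();
    }

    /** New pattern: string concatenation renders a null reference as "null". */
    static String describeNew(Doc doc) {
        return "lang=" + doc.get("lang");
    }

    public static void main(String[] args) {
        Doc doc = new Doc();
        doc.add("url", "http://example.org/"); // no "lang" field is added

        System.out.println(describeNew(doc));
        try {
            describeOld(doc);
        } catch (NullPointerException e) {
            System.out.println("old pattern threw NullPointerException");
        }
    }
}
```

With a document that has no language field, the new pattern still produces a log line, while the old one fails before anything is logged.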
svn commit: r265020 - /lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
Author: jerome
Date: Wed Aug 31 04:38:28 2005
New Revision: 265020

URL: http://svn.apache.org/viewcvs?rev=265020&view=rev
Log:
Fixes some typo (analySer => analyZer)

Modified:
    lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java?rev=265020&r1=265019&r2=265020&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/analysis/AnalyzerFactory.java Wed Aug 31 04:38:28 2005
@@ -44,7 +44,7 @@
 
   private final static Map CACHE = new HashMap();
 
-  private final static NutchAnalyzer DEFAULT_ANALYSER =
+  private final static NutchAnalyzer DEFAULT_ANALYZER =
                                             new NutchDocumentAnalyzer();
 
 
@@ -60,22 +60,22 @@
 
 
   /**
-   * Returns the appropriate {@link Analyser} implementation given a language
-   * code.
+   * Returns the appropriate {@link NutchAnalyzer analyzer} implementation
+   * given a language code.
    *
-   * <p>NutchAnalyser extensions should define the attribute lang. The first
+   * <p>NutchAnalyzer extensions should define the attribute lang. The first
    * plugin found whose lang attribute equals the specified lang parameter is
    * used. If none match, then the {@link NutchDocumentAnalyzer} is used.
    */
   public static NutchAnalyzer get(String lang) {
 
-    NutchAnalyzer analyzer = DEFAULT_ANALYSER;
+    NutchAnalyzer analyzer = DEFAULT_ANALYZER;
     Extension extension = getExtension(lang);
     if (extension != null) {
       try {
         analyzer = (NutchAnalyzer) extension.getExtensionInstance();
       } catch (PluginRuntimeException pre) {
-        analyzer = DEFAULT_ANALYSER;
+        analyzer = DEFAULT_ANALYZER;
       }
     }
     return analyzer;
[Nutch Wiki] Update of FrontPage by AndrzejBialecki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by AndrzejBialecki:
http://wiki.apache.org/nutch/FrontPage

------------------------------------------------------------------------------
  ||DissectingTheNutchCrawler by MattKangas ||Add, View, or Do tasks from the TaskList ||
  ||HowToContribute|| ||
+ ||[Committer's Rules]|| ||
  ||[Release HOWTO]|| ||
  ||[Website Update HOWTO]|| ||
[Nutch Wiki] Update of Committer's Rules by AndrzejBialecki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change notification.

The following page has been changed by AndrzejBialecki:
http://wiki.apache.org/nutch/Committer's_Rules

New page:
= Commits and Release Engineering =

Committers should follow these guidelines when deciding which branch to commit a patch to, and when to commit it.

== Branches and Release Engineering ==

 1. The SVN repository consists of the following areas:
  a. '''trunk''' (equivalent to CVS HEAD), where the current development code base is found. This area is not guaranteed to be in a usable state at all times: occasional breakage may occur, and some parts of the code base may not work properly, or at all. This is the area for developers, the bleeding edge, and it is usually not suitable for stable production use. Average users are discouraged from using it unless they need functionality available only here and are prepared to face some hardships (such as the lack of documentation, the need to set up a development environment, bugs, etc.).
  a. '''Release-x.x''' branches, where the code from each release is kept for further maintenance. These areas contain code that is considered stable, i.e. at the point of release it was known to be working well, ''within the limits of functionality available for that release''. The code here is also maintained in a well-working state for a certain period after the release, but only minor fixes are applied, in order to provide a solid product with the functionality of the given release. Normally, no new functionality should be added to the maintenance branches. It is unacceptable to introduce changes to such a branch that would break compatibility with earlier code within the same branch.
  a. Any other temporary branches (e.g. mapred), which serve as a temporary repository for work to be merged with the trunk at a later stage. You should not expect anything functional here, unless the developers explicitly ask for help in testing and integration.
 2. The trunk is the area where active current development occurs. New features and enhancements are first committed here.
  a. This requirement helps to minimize the risk of losing new features and enhancements somewhere on the branches, because as time goes on it becomes more and more difficult to forward-port them from past branches to the trunk.
  a. If some changes are invasive and would result in prolonged periods of breakage, they probably need more development time before they are integrated with the trunk. If you want other developers to join you in the work, it's a good idea to put these changes on a temporary branch, to be merged with the trunk later.
 3. Important features or fixes that will benefit the majority of users can be back-ported to release branches after they have been committed to the trunk (if appropriate). The back-porting process should involve extensive testing to ensure that the code on the Release branch remains stable and production-quality. It is unacceptable to commit code that breaks the build process or is known to be unstable. Users expect the Release branches to be stable and to work with production quality at all times.

== Backward compatibility ==

== Committer's checklist ==

Things to check before commit.
svn commit: r265503 - in /lucene/nutch/trunk/src: java/org/apache/nutch/clustering/ java/org/apache/nutch/fs/ java/org/apache/nutch/mapReduce/ java/org/apache/nutch/parse/ java/org/apache/nutch/protoc
Author: jerome
Date: Wed Aug 31 08:17:11 2005
New Revision: 265503

URL: http://svn.apache.org/viewcvs?rev=265503&view=rev
Log:
Merged 0.7 branch changes 240321:240453 into trunk

Modified:
    lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java
    lucene/nutch/trunk/src/java/org/apache/nutch/fs/NutchFileSystem.java
    lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/FileSplit.java
    lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/MapOutputFile.java
    lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/RecordReader.java
    lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/package.html
    lucene/nutch/trunk/src/java/org/apache/nutch/parse/Parse.java
    lucene/nutch/trunk/src/java/org/apache/nutch/protocol/Content.java
    lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ProtocolException.java
    lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ResourceGone.java
    lucene/nutch/trunk/src/java/org/apache/nutch/protocol/ResourceMoved.java
    lucene/nutch/trunk/src/java/org/apache/nutch/protocol/RetryLater.java
    lucene/nutch/trunk/src/java/org/apache/nutch/searcher/Hits.java
    lucene/nutch/trunk/src/java/org/apache/nutch/segment/SegmentReader.java
    lucene/nutch/trunk/src/java/org/apache/nutch/util/Daemon.java
    lucene/nutch/trunk/src/plugin/languageidentifier/src/java/org/apache/nutch/analysis/lang/LanguageIdentifier.java
    lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/DOMBuilder.java
    lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html/XMLCharacterRecognizer.java
    lucene/nutch/trunk/src/plugin/protocol-httpclient/src/java/org/apache/nutch/protocol/httpclient/DummySSLProtocolSocketFactory.java

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java?rev=265503&r1=265502&r2=265503&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/clustering/OnlineClusterer.java Wed Aug 31 08:17:11 2005
@@ -23,8 +23,8 @@
  * algorithms.
  *
  * <p>By the term <b>online</b> search results clustering we will understand
- * a clusterer that works on a set of {@link Hit}s retrieved for a user's query
- * and produces a set of {@link Clusters} that can be displayed to help
+ * a clusterer that works on a set of {@link HitDetails} retrieved for a user's
+ * query and produces a set of {@link HitsCluster} that can be displayed to help
  * the user gain insight in the topics found in the result.</p>
  *
  * <p>Other clustering options include predefined categories and off-line

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/fs/NutchFileSystem.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/fs/NutchFileSystem.java?rev=265503&r1=265502&r2=265503&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/fs/NutchFileSystem.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/fs/NutchFileSystem.java Wed Aug 31 08:17:11 2005
@@ -80,8 +80,8 @@
     return getNamed(NutchConf.get().get("fs.default.name", "local"));
   }
 
-  /** Returns a name for this filesystem, suitable to pass to {@link
-   * NutchFileSystem#getNamed(String).*/
+  /** Returns a name for this filesystem, suitable to pass to
+   * {@link NutchFileSystem#getNamed(String)}.*/
   public abstract String getName();
 
   /** Returns a named filesystem.  Names are either the string "local" or a

Modified: lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/FileSplit.java
URL: http://svn.apache.org/viewcvs/lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/FileSplit.java?rev=265503&r1=265502&r2=265503&view=diff
==============================================================================
--- lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/FileSplit.java (original)
+++ lucene/nutch/trunk/src/java/org/apache/nutch/mapReduce/FileSplit.java Wed Aug 31 08:17:11 2005
@@ -25,9 +25,12 @@
 import org.apache.nutch.io.UTF8;
 import org.apache.nutch.fs.NutchFileSystem;
 
-/** A section of an input file.  Returned by {@link
- * InputFormat#getSplits(File[], int)} and passed to
- * InputFormat#getRecordReader(FileSplit). */
+/**
+ * A section of an input file.
+ * Returned by {@link InputFormat#getSplits(NutchFileSystem, JobConf, int)}
+ * and passed to
+ * {@link InputFormat#getRecordReader(NutchFileSystem, FileSplit, JobConf)}.
+ */
 public class FileSplit implements Writable {
   private File file;
   private long start;

Modified: