Author: dogacan
Date: Sat Jun 16 03:33:24 2007
New Revision: 547901
URL: http://svn.apache.org/viewvc?view=rev&rev=547901
Log:
NUTCH-495 - Unnecessary delays in Fetcher2.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher2.java
Modified
Author: dogacan
Date: Fri Jun 15 03:51:23 2007
New Revision: 547610
URL: http://svn.apache.org/viewvc?view=rev&rev=547610
Log:
Added myself (Dogacan Güney) to the list of committers.
Modified:
lucene/nutch/trunk/site/credits.html
lucene/nutch/trunk/site/credits.pdf
lucene/nutch/trunk
Author: dogacan
Date: Mon Jun 18 11:13:15 2007
New Revision: 548429
URL: http://svn.apache.org/viewvc?view=rev&rev=548429
Log:
NUTCH-489 - URLFilter-suffix management of the url path when the url contains
some query parameters.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk
Author: dogacan
Date: Tue Jun 19 02:21:21 2007
New Revision: 548666
URL: http://svn.apache.org/viewvc?view=rev&rev=548666
Log:
NUTCH-502 - Bug in SegmentReader causes infinite loop.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/segment
Author: dogacan
Date: Thu Jun 21 08:15:32 2007
New Revision: 549507
URL: http://svn.apache.org/viewvc?view=rev&rev=549507
Log:
NUTCH-471 - Fix synchronization in NutchBean creation.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/searcher
Author: dogacan
Date: Sun Jun 24 02:28:41 2007
New Revision: 550188
URL: http://svn.apache.org/viewvc?view=rev&rev=550188
Log:
NUTCH-468 - Scoring filter should distribute score to all outlinks at once.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch
Author: dogacan
Date: Sun Jun 24 03:04:30 2007
New Revision: 550196
URL: http://svn.apache.org/viewvc?view=rev&rev=550196
Log:
NUTCH-504 - Parsing during fetching is broken.
Added:
lucene/nutch/trunk/src/testresources/fetch-test-site/exception.html
Modified:
lucene/nutch/trunk/CHANGES.txt
Author: dogacan
Date: Wed Jun 27 00:05:52 2007
New Revision: 551081
URL: http://svn.apache.org/viewvc?view=rev&rev=551081
Log:
NUTCH-474 - Replace usage of ObjectWritable with something based on
GenericWritable.
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/crawl/NutchWritable.java
Author: dogacan
Date: Wed Jun 27 05:46:05 2007
New Revision: 551147
URL: http://svn.apache.org/viewvc?view=rev&rev=551147
Log:
NUTCH-498 - Use Combiner in LinkDb to increase speed of linkdb generation.
Contributed by Espen Amble Kolstad.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene
Author: dogacan
Date: Wed Jul 11 03:54:37 2007
New Revision: 555237
URL: http://svn.apache.org/viewvc?view=rev&rev=555237
Log:
NUTCH-505 - Outlink urls should be validated.
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/net/UrlValidator.java
Modified:
lucene/nutch/trunk/CHANGES.txt
Author: dogacan
Date: Wed Jul 11 08:30:29 2007
New Revision: 555307
URL: http://svn.apache.org/viewvc?view=rev&rev=555307
Log:
NUTCH-510 - IndexMerger delete working dir. Contributed by Enis.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/indexer
Author: dogacan
Date: Fri Jul 13 10:20:44 2007
New Revision: 556072
URL: http://svn.apache.org/viewvc?view=rev&rev=556072
Log:
NUTCH-513 - suffix-urlfilter.txt does not have a template.
Added:
lucene/nutch/trunk/conf/suffix-urlfilter.txt.template
- copied unchanged from r556068,
lucene
Author: dogacan
Date: Mon Jul 16 23:19:06 2007
New Revision: 556824
URL: http://svn.apache.org/viewvc?view=rev&rev=556824
Log:
NUTCH-515 - Next fetch time is set incorrectly.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/crawl/CrawlDatum.java
Author: dogacan
Date: Tue Jul 17 08:16:40 2007
New Revision: 556946
URL: http://svn.apache.org/viewvc?view=rev&rev=556946
Log:
NUTCH-506 - Delegate compression to Hadoop.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher
Author: dogacan
Date: Wed Jul 18 10:59:59 2007
New Revision: 557342
URL: http://svn.apache.org/viewvc?view=rev&rev=557342
Log:
NUTCH-517 - build encoding should be UTF-8. Contributed by Enis.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/default.properties
Modified: lucene
Author: dogacan
Date: Wed Jul 18 11:04:26 2007
New Revision: 557344
URL: http://svn.apache.org/viewvc?view=rev&rev=557344
Log:
NUTCH-518 - Fix OpicScoringFilter to respect scoring filter chaining.
Contributed by Enis.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src
Author: dogacan
Date: Thu Jul 26 01:10:38 2007
New Revision: 559742
URL: http://svn.apache.org/viewvc?view=rev&rev=559742
Log:
NUTCH-516 - Next fetch time is not set when it is a
CrawlDatum.STATUS_FETCH_GONE. Contributed by Emmanuel Joke.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene
Author: dogacan
Date: Thu Jul 26 01:44:33 2007
New Revision: 559754
URL: http://svn.apache.org/viewvc?view=rev&rev=559754
Log:
NUTCH-525 - DeleteDuplicates generates ArrayIndexOutOfBoundsException when
trying to rerun dedup on a segment. Contributed by Vishal Shah.
Modified:
lucene/nutch
Author: dogacan
Date: Mon Jul 30 12:02:27 2007
New Revision: 561092
URL: http://svn.apache.org/viewvc?view=rev&rev=561092
Log:
NUTCH-514 - Indexer should only index pages with fetch status SUCCESS.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch
Author: dogacan
Date: Wed Aug 1 07:50:51 2007
New Revision: 561816
URL: http://svn.apache.org/viewvc?view=rev&rev=561816
Log:
Plugin summary-lucene's plugin.xml contained a link to non-existent
lucene-highlighter jar. Updated plugin.xml to point to new jar.
Modified:
lucene/nutch/trunk/src
Author: dogacan
Date: Wed Aug 8 00:33:23 2007
New Revision: 563777
URL: http://svn.apache.org/viewvc?view=rev&rev=563777
Log:
NUTCH-535 - ParseData's contentMeta accumulates unnecessary values during parse.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache
Modified: lucene/nutch/trunk/src/java/org/apache/nutch/tools/DmozParser.java
URL:
http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/tools/DmozParser.java?view=diff&rev=563894&r1=563893&r2=563894
==
Added: lucene/nutch/trunk/conf/domain-suffixes.xsd
URL:
http://svn.apache.org/viewvc/lucene/nutch/trunk/conf/domain-suffixes.xsd?rev=568053&view=auto
==
--- lucene/nutch/trunk/conf/domain-suffixes.xsd (added)
+++
Author: dogacan
Date: Tue Aug 21 03:50:07 2007
New Revision: 568053
URL: http://svn.apache.org/viewvc?rev=568053&view=rev
Log:
NUTCH-439 - Top Level Domains Indexing / Scoring. Contributed by Enis.
Added:
lucene/nutch/trunk/conf/domain-suffixes.xml
lucene/nutch/trunk/conf/domain
Author: dogacan
Date: Mon Sep 10 12:40:20 2007
New Revision: 574344
URL: http://svn.apache.org/viewvc?rev=574344&view=rev
Log:
NUTCH-550 - Parse fails if db.max.outlinks.per.page is -1.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/parse
Author: dogacan
Date: Mon Sep 10 12:45:22 2007
New Revision: 574346
URL: http://svn.apache.org/viewvc?rev=574346&view=rev
Log:
NUTCH-546 - file URL are filtered out by the crawler.
Added:
lucene/nutch/trunk/src/plugin/urlfilter-validator/
lucene/nutch/trunk/src/plugin/urlfilter-validator
Author: dogacan
Date: Tue Sep 11 03:50:15 2007
New Revision: 574545
URL: http://svn.apache.org/viewvc?rev=574545&view=rev
Log:
Java 5 Compatibility fix for NUTCH-546.
Modified:
lucene/nutch/trunk/src/plugin/urlfilter-validator/src/java/org/apache/nutch/urlfilter/validator/UrlValidator.java
Author: dogacan
Date: Mon Sep 24 01:27:34 2007
New Revision: 578703
URL: http://svn.apache.org/viewvc?rev=578703&view=rev
Log:
NUTCH-529 - NodeWalker.skipChildren doesn't work for more than 1 child.
Contributed by Emmanuel Joke.
Added:
lucene/nutch/trunk/src/test/org/apache/nutch/util
Author: dogacan
Date: Wed Sep 26 07:02:48 2007
New Revision: 579656
URL: http://svn.apache.org/viewvc?rev=579656&view=rev
Log:
NUTCH-25 - needs 'character encoding' detector. Mostly contributed by Doug
Cook. Some parts are contributed by Marcin Okraszewski and Renaud Richardet.
Also fixes NUTCH
Author: dogacan
Date: Wed Sep 26 23:49:26 2007
New Revision: 579922
URL: http://svn.apache.org/viewvc?rev=579922&view=rev
Log:
Java 5 compatibility fix for NUTCH-25. Contributed by Ned Rockson.
Modified:
lucene/nutch/trunk/src/plugin/parse-html/src/java/org/apache/nutch/parse/html
Author: dogacan
Date: Sat Sep 29 04:02:01 2007
New Revision: 580572
URL: http://svn.apache.org/viewvc?rev=580572&view=rev
Log:
Yet another java5 compatibility fix for NUTCH-25. Updates unit test.
Modified:
lucene/nutch/trunk/src/test/org/apache/nutch/util/TestEncodingDetector.java
Modified
Author: dogacan
Date: Mon Oct 8 03:58:11 2007
New Revision: 582775
URL: http://svn.apache.org/viewvc?rev=582775&view=rev
Log:
NUTCH-508 - ${hadoop.log.dir} and ${hadoop.log.file} are not propagated to the
tasktracker. Contributed by Mathijs Homminga and Emmanuel Joke.
Modified:
lucene/nutch
Author: dogacan
Date: Mon Oct 29 07:57:19 2007
New Revision: 589654
URL: http://svn.apache.org/viewvc?rev=589654&view=rev
Log:
NUTCH-501 - Implement a different caching mechanism for objects cached in
configuration.
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/util/ObjectCache.java
Author: dogacan
Date: Thu Nov 8 05:18:05 2007
New Revision: 593151
URL: http://svn.apache.org/viewvc?rev=593151&view=rev
Log:
NUTCH-547 - Redirection handling: YahooSlurp's algorithm.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher
Author: dogacan
Date: Thu Nov 8 07:32:11 2007
New Revision: 593200
URL: http://svn.apache.org/viewvc?rev=593200&view=rev
Log:
NUTCH-548 - Last commit failed to upgrade some of the plugins. This commit
removes all instances of Outlink(..,..,Configuration) calls.
Modified:
lucene/nutch/trunk
Author: dogacan
Date: Thu Nov 8 07:08:47 2007
New Revision: 593186
URL: http://svn.apache.org/viewvc?rev=593186&view=rev
Log:
NUTCH-548 - Move URLNormalizer from Outlink to ParseOutputFormat. Contributed
by Emmanuel Joke.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src
Author: dogacan
Date: Thu Nov 8 11:13:37 2007
New Revision: 593263
URL: http://svn.apache.org/viewvc?rev=593263&view=rev
Log:
NUTCH-494 - FindBugs: CrawlDbReader and DeleteDuplicates.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/crawl
Author: dogacan
Date: Thu Nov 8 11:09:06 2007
New Revision: 593261
URL: http://svn.apache.org/viewvc?rev=593261&view=rev
Log:
NUTCH-538 - Delete unused classes under o.a.n.util.
Removed:
lucene/nutch/trunk/src/java/org/apache/nutch/util/FibonacciHeap.java
lucene/nutch/trunk/src/java/org
Author: dogacan
Date: Mon Feb 25 01:38:12 2008
New Revision: 630779
URL: http://svn.apache.org/viewvc?rev=630779&view=rev
Log:
NUTCH-567 - Proper (?) handling of URIs in TagSoup.
Added:
lucene/nutch/trunk/src/plugin/parse-html/lib/tagsoup-1.2.jar (with props)
Removed:
lucene/nutch/trunk
Author: dogacan
Date: Sat Sep 20 10:05:03 2008
New Revision: 697395
URL: http://svn.apache.org/viewvc?rev=697395&view=rev
Log:
NUTCH-639 - Change LuceneDocumentWrapper visibility from private to protected
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache
Author: dogacan
Date: Mon Sep 22 04:08:09 2008
New Revision: 697781
URL: http://svn.apache.org/viewvc?rev=697781&view=rev
Log:
NUTCH-651 - Remove bin/{start|stop}-balancer.sh from svn tracking
Removed:
lucene/nutch/trunk/bin/start-balancer.sh
lucene/nutch/trunk/bin/stop-balancer.sh
Author: dogacan
Date: Mon Sep 22 09:43:33 2008
New Revision: 697896
URL: http://svn.apache.org/viewvc?rev=697896&view=rev
Log:
NUTCH-633 - ParseSegment no longer allow reparsing.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher/Fetcher.java
Author: dogacan
Date: Wed Sep 24 01:50:15 2008
New Revision: 698469
URL: http://svn.apache.org/viewvc?rev=698469&view=rev
Log:
NUTCH-651 second part. Also add bin/{start|stop}-balancer.sh to svn ignore.
Modified:
lucene/nutch/trunk/bin/ (props changed)
Propchange: lucene/nutch/trunk/bin
Author: dogacan
Date: Wed Sep 24 01:52:19 2008
New Revision: 698471
URL: http://svn.apache.org/viewvc?rev=698471&view=rev
Log:
NUTCH-653 - Upgrade to hadoop 0.18
Added:
lucene/nutch/trunk/lib/hadoop-0.18.1-core.jar (with props)
lucene/nutch/trunk/lib/jets3t-0.6.0.jar (with props
Author: dogacan
Date: Thu Oct 2 02:05:22 2008
New Revision: 701045
URL: http://svn.apache.org/viewvc?rev=701045&view=rev
Log:
NUTCH-654 - urlfilter-regex's main does not work
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/plugin/urlfilter-regex/src/java/org/apache/nutch
Author: dogacan
Date: Mon Jan 12 04:50:02 2009
New Revision: 733712
URL: http://svn.apache.org/viewvc?rev=733712&view=rev
Log:
Added a test domain-urlfilter conf file so that it doesn't filter everything
Added:
lucene/nutch/trunk/src/test/domain-urlfilter.txt
Added: lucene/nutch/trunk/src
Author: dogacan
Date: Mon Jan 12 05:30:28 2009
New Revision: 733744
URL: http://svn.apache.org/viewvc?rev=733744&view=rev
Log:
Unrelated change went in accidentally in NUTCH-442. Reverting to old version.
Modified:
lucene/nutch/trunk/src/plugin/build.xml
Modified: lucene/nutch/trunk/src
Author: dogacan
Date: Mon Jan 12 05:37:23 2009
New Revision: 733747
URL: http://svn.apache.org/viewvc?rev=733747&view=rev
Log:
NUTCH-652 - AdaptiveFetchSchedule#setFetchSchedule doesn't calculate fetch
interval correctly
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src
Author: dogacan
Date: Mon Jan 12 09:33:16 2009
New Revision: 733848
URL: http://svn.apache.org/viewvc?rev=733848&view=rev
Log:
Two more NUTCH-442 changes:
* Delete TestDistributedSearch for now
* Set reduceSpeculativeExecution false for SolrIndexer
Removed:
lucene/nutch/trunk/src/test/org
Author: dogacan
Date: Mon Jan 19 09:09:47 2009
New Revision: 735748
URL: http://svn.apache.org/viewvc?rev=735748&view=rev
Log:
NUTCH-678 - Hadoop 0.19 requires an update of jets3t (julien nioche)
Added:
lucene/nutch/trunk/lib/jets3t-0.6.1.jar (with props)
Removed:
lucene/nutch/trunk/lib
Author: dogacan
Date: Wed Jan 21 05:09:48 2009
New Revision: 736307
URL: http://svn.apache.org/viewvc?rev=736307&view=rev
Log:
NUTCH-681 - parse-mp3 compilation problem. Patch by Wildan Maulana.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/plugin/parse-mp3/src/java/org
Author: dogacan
Date: Wed Jan 21 11:26:27 2009
New Revision: 736385
URL: http://svn.apache.org/viewvc?rev=736385&view=rev
Log:
NUTCH-676 - MapWritable is written inefficiently and confusingly.
Removed:
lucene/nutch/trunk/src/test/org/apache/nutch/crawl/TestMapWritable.java
Modified
Author: dogacan
Date: Wed Jan 21 11:41:55 2009
New Revision: 736388
URL: http://svn.apache.org/viewvc?rev=736388&view=rev
Log:
NUTCH-579 - Feed plugin only indexes one post per feed due to identical digest
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org/apache
Author: dogacan
Date: Sat Jan 24 10:28:37 2009
New Revision: 737325
URL: http://svn.apache.org/viewvc?rev=737325&view=rev
Log:
NUTCH-680 - Update external jars to latest versions
Updates:
nekohtml
lucene-highlighter
icu4j
jakarta-oro
Added:
lucene/nutch/trunk/lib/icu4j-4_0_1.LICENSE.txt
Author: dogacan
Date: Tue Jan 27 10:21:58 2009
New Revision: 738049
URL: http://svn.apache.org/viewvc?rev=738049&view=rev
Log:
NUTCH-680 - Remove pmd-ext jars for now
Removed:
lucene/nutch/trunk/lib/pmd-ext/
Author: dogacan
Date: Wed Jan 28 11:33:20 2009
New Revision: 738455
URL: http://svn.apache.org/viewvc?rev=738455view=rev
Log:
NUTCH-571 - parse-mp3 plugin doesn't always index album of mp3. Patch
by Joseph Chen.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/plugin
Author: dogacan
Date: Mon Mar 9 17:34:51 2009
New Revision: 751774
URL: http://svn.apache.org/viewvc?rev=751774&view=rev
Log:
NUTCH-684 - Dedup support for Solr
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrDeleteDuplicates.java
Modified:
lucene/nutch/trunk
Author: dogacan
Date: Thu Apr 2 12:46:47 2009
New Revision: 761271
URL: http://svn.apache.org/viewvc?rev=761271&view=rev
Log:
NUTCH-721 - Commit old fetcher as OldFetcher for now so that we can test
Fetcher2 performance.
Added:
lucene/nutch/trunk/src/java/org/apache/nutch/fetcher
Author: dogacan
Date: Sun Jun 7 17:12:18 2009
New Revision: 782412
URL: http://svn.apache.org/viewvc?rev=782412&view=rev
Log:
NUTCH-735 - crawl-tool.xml must be read before nutch-site.xml when invoked
using crawl command. Patch by Susam Pal.
Modified:
lucene/nutch/trunk/CHANGES.txt
Author: dogacan
Date: Tue Jun 30 07:09:14 2009
New Revision: 789591
URL: http://svn.apache.org/viewvc?rev=789591&view=rev
Log:
Remove dtd URL from xml in TestNodeWalker to prevent build failures for now.
Modified:
lucene/nutch/trunk/src/test/org/apache/nutch/util/TestNodeWalker.java
Modified
Author: dogacan
Date: Sun Aug 16 21:30:22 2009
New Revision: 804782
URL: http://svn.apache.org/viewvc?rev=804782&view=rev
Log:
Creating initial nutchbase branch.
Added:
lucene/nutch/branches/nutchbase/
- copied from r804781, lucene/nutch/trunk/
Author: dogacan
Date: Tue Aug 25 05:45:53 2009
New Revision: 807485
URL: http://svn.apache.org/viewvc?rev=807485&view=rev
Log:
Fetcher2 slow. Patch contributed by Julien Nioche.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/conf/nutch-default.xml
lucene/nutch/trunk/src
Author: dogacan
Date: Tue Sep 8 13:15:03 2009
New Revision: 812497
URL: http://svn.apache.org/viewvc?rev=812497&view=rev
Log:
NUTCH-702 - Lazy Instantiation of Metadata in CrawlDatum. Contributed by Julien
Nioche.
Modified:
lucene/nutch/trunk/CHANGES.txt
lucene/nutch/trunk/src/java/org