Re: [DISCUSS] Board resolution for Nutch as TLP

2010-04-10 Thread Dennis Kubes
are appointed to serve as the initial members of the Apache Nutch Project: • Andrzej Bialecki a...@... • Otis Gospodnetic o...@... • Dogacan Guney doga...@... • Dennis Kubes ku...@... • Chris Mattmann mattm...@... • Julien Nioche jnio

[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-12-14 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12790162#action_12790162 ] Dennis Kubes commented on NUTCH-768: The older jetty jar file was not removed

Re: Build failed in Hudson: Nutch-trunk #1011

2009-12-14 Thread Dennis Kubes
This is failing because of the older jetty jar being removed and the Jetty interface changes. I am currently working to fix the interfaces for the new Jetty version. Hope to have a patch committed later today and this should be back to normal. Dennis Apache Hudson Server wrote: See

[jira] Closed: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-12-01 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-768. -- Resolution: Fixed Weird. The hsqldb License file was the same checksum as that pulled from hadoop

[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-11-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12784066#action_12784066 ] Dennis Kubes commented on NUTCH-768: If no objections I will commit this tomorrow

Re: svn commit: r884075 - /lucene/nutch/trunk/src/java/org/apache/nutch/indexer/solr/SolrIndexer.java

2009-11-25 Thread Dennis Kubes
Oops. Sorry about that. a...@apache.org wrote: Author: ab Date: Wed Nov 25 12:44:34 2009 New Revision: 884075 URL: http://svn.apache.org/viewvc?rev=884075view=rev Log: Change access from private to public - this fixes Crawl.java breakage. Modified:

[jira] Updated: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-11-25 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-768: --- Attachment: NUTCH-768-1-20091125.patch I thought I was going to be able to do this without code

[jira] Created: (NUTCH-771) Add WebGraph classes to the bin/nutch script

2009-11-24 Thread Dennis Kubes (JIRA)
Environment: All, shell script Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.1 Currently the webgraph jobs are called on the command line by calling main methods on their classes. I propose to upgrade the bin/nutch shell script to allow calling these jobs

[jira] Commented: (NUTCH-768) Upgrade Nutch 1.0 to use Hadoop 0.20

2009-11-24 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12782172#action_12782172 ] Dennis Kubes commented on NUTCH-768: I have tested the upgrade with Hadoop 0.20

[jira] Assigned: (NUTCH-765) Allow Crawl class to call Either Solr or Lucene Indexer

2009-11-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes reassigned NUTCH-765: -- Assignee: Dennis Kubes Allow Crawl class to call Either Solr or Lucene Indexer

[jira] Resolved: (NUTCH-765) Allow Crawl class to call Either Solr or Lucene Indexer

2009-11-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-765. Resolution: Fixed Committed. Allow Crawl class to call Either Solr or Lucene Indexer

[jira] Closed: (NUTCH-765) Allow Crawl class to call Either Solr or Lucene Indexer

2009-11-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-765. -- Allow Crawl class to call Either Solr or Lucene Indexer

Re: Plugin Help

2009-11-14 Thread Dennis Kubes
It depends on how you are building and your classpath. Let's call your plugin myhtmlfilter. If running on a single server and you added it to your src/plugin/build.xml under the deploy section, a myhtmlfilter folder with the plugin should show up under the build/plugins folder upon build.

[jira] Updated: (NUTCH-765) Allow Crawl class to call Either Solr or Lucene Indexer

2009-11-12 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-765: --- Attachment: NUTCH-765-2009112-1.patch Allow Crawl class to call Either Solr or Lucene Indexer

[jira] Created: (NUTCH-765) Allow Crawl class to call Either Solr or Lucene Indexer

2009-11-12 Thread Dennis Kubes (JIRA)
: All Reporter: Dennis Kubes Priority: Minor Fix For: 1.1, 1.0.0 Attachments: NUTCH-765-2009112-1.patch Change to the crawl class to have a -solr option which will call the solr indexer instead of the lucene indexer. This also allows it to ignore dedup
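
A minimal sketch of the option dispatch described in NUTCH-765, assuming the `-solr` flag takes the Solr server URL as its argument (the snippet does not say). `IndexerChoice`, `solrUrl`, and `plan` are hypothetical names, not the actual `Crawl` class, and the truncated note about dedup is not modeled.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch: choose an indexing backend from crawl-style command-line arguments. */
public class IndexerChoice {

  /** Returns the URL that follows "-solr", or null when the flag is absent. */
  public static String solrUrl(List<String> args) {
    int i = args.indexOf("-solr");
    if (i < 0) {
      return null;
    }
    if (i + 1 >= args.size()) {
      throw new IllegalArgumentException("-solr requires a Solr server URL");
    }
    return args.get(i + 1);
  }

  /** Names the indexing step the crawl would run: Solr if requested, otherwise Lucene. */
  public static String plan(List<String> args) {
    String solr = solrUrl(args);
    return solr != null ? "solr:" + solr : "lucene";
  }

  public static void main(String[] argv) {
    System.out.println(plan(Arrays.asList(argv)));
  }
}
```

Keeping the Lucene path as the default means existing crawl command lines keep their old behavior when the flag is omitted.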

Re: Server suggestion

2009-07-25 Thread Dennis Kubes
My mistake, you're right. The last processing clusters we built were using Xeon quad cores, not i7s. The i7s were search servers which didn't need ECC memory. AFAICT, wikipedia is correct and the i7s don't yet support ECC. So my suggestion would be to stick with Xeon procs or something

Re: Nutch dev. plans

2009-07-17 Thread Dennis Kubes
Doğacan Güney wrote: On Fri, Jul 17, 2009 at 21:32, Andrzej Bialeckia...@getopt.org wrote: Doğacan Güney wrote: Hey list, On Fri, Jul 17, 2009 at 16:55, Andrzej Bialeckia...@getopt.org wrote: Hi all, I think we should be creating a sandbox area, where we can collaborate on various

Re: Ranking Scoring Algorithm Pseudocode

2009-05-31 Thread Dennis Kubes
There isn't any pseudocode for this. The code for the main algorithm is in the LinkRank class. It is similar in nature to PageRank except it has the ability to filter reciprocal links. If the Link Loops program is run it also has the ability to filter out link cycles, but that program is
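
LinkRank is described here only as PageRank-like with the option to filter reciprocal links. A minimal sketch of one reading of that filter, assuming links are plain (from, to) URL pairs and that a link is dropped whenever its reverse also exists. `ReciprocalFilter` and `Link` are hypothetical names; this is not the LinkRank source.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch: remove reciprocal (A->B and B->A) links before ranking. */
public class ReciprocalFilter {

  /** A directed link between two URLs. */
  public record Link(String from, String to) {}

  public static List<Link> dropReciprocal(List<Link> links) {
    Set<Link> all = new HashSet<>(links);
    List<Link> kept = new ArrayList<>();
    for (Link l : links) {
      // Keep the link only if the page it points to does not link straight back.
      if (!all.contains(new Link(l.to(), l.from()))) {
        kept.add(l);
      }
    }
    return kept;
  }
}
```

Dropping both directions of a reciprocal pair stops two pages from simply trading score with each other, which is the kind of link exchange such a filter is usually meant to discount.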

Re: Ranking Algorithms

2009-05-18 Thread Dennis Kubes
The answer is simple and not so simple at the same time. Last year we put in quite a bit of work to implement a stable PageRank-like algorithm into Nutch. This was released as the new scoring and indexing frameworks. That gives a good general relevancy score, but it is really a starting

Re: LinkRank why 10 iterations?

2009-03-27 Thread Dennis Kubes
You are running LinkRank on a comparatively small webgraph. LinkRank is meant, in principle, to be run on very large webgraphs, millions or perhaps 100s of millions of urls. On that scale 10 iterations was what we saw as a good default for the webgraph to converge while not taking an
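
To see why a fixed iteration count works as a default, here is a generic PageRank-style power iteration on an in-memory graph that runs a set number of passes and prints the largest score change per pass. It is a sketch, not Nutch's MapReduce LinkRank job; the damping factor of 0.85 and the even redistribution of score from pages without outlinks are assumptions.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a PageRank-style power iteration with a fixed iteration count. */
public class FixedIterationRank {

  public static final int DEFAULT_ITERATIONS = 10;  // the default discussed above
  public static final double DAMPING = 0.85;        // assumed; a common PageRank choice

  /** outlinks maps every node to the nodes it links to; every target must also be a key. */
  public static Map<String, Double> rank(Map<String, List<String>> outlinks, int iterations) {
    int n = outlinks.size();
    Map<String, Double> score = new HashMap<>();
    for (String node : outlinks.keySet()) {
      score.put(node, 1.0 / n);
    }
    for (int i = 0; i < iterations; i++) {
      Map<String, Double> next = new HashMap<>();
      double dangling = 0.0;                         // score held by pages with no outlinks
      for (String node : outlinks.keySet()) {
        next.put(node, 0.0);
      }
      for (Map.Entry<String, List<String>> e : outlinks.entrySet()) {
        double s = score.get(e.getKey());
        List<String> targets = e.getValue();
        if (targets.isEmpty()) {
          dangling += s;
        } else {
          for (String t : targets) {
            next.merge(t, s / targets.size(), Double::sum);  // split score across outlinks
          }
        }
      }
      double maxDelta = 0.0;
      for (String node : outlinks.keySet()) {
        double v = (1 - DAMPING) / n + DAMPING * (next.get(node) + dangling / n);
        maxDelta = Math.max(maxDelta, Math.abs(v - score.get(node)));
        next.put(node, v);
      }
      score = next;
      System.out.printf("iteration %d: max change %.6f%n", i + 1, maxDelta);
    }
    return score;
  }
}
```

On a small graph the printed change usually becomes negligible within a few passes, so extra iterations mostly cost time; on very large graphs each pass is a full job, which is why a fixed, modest count is a practical trade-off.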

[jira] Closed: (NUTCH-291) OpenSearchServlet should return date as well as lastModified

2009-03-25 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-291. -- Resolution: Fixed The open search servlet has been superseded by formatters for serving results in xml

[jira] Created: (NUTCH-729) NPE in FieldIndexer when BasicFields url doesn't exist

2009-03-25 Thread Dennis Kubes (JIRA)
Affects Versions: 0.9.0, 1.0.0 Environment: All Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.1 There is a NullPointerException during a logging call in FieldIndexer when there isn't a url for a document. Documents shouldn't be without

[jira] Updated: (NUTCH-729) NPE in FieldIndexer when BasicFields url doesn't exist

2009-03-25 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-729: --- Attachment: NUTCH-729-1-20090235.patch Simple patch. Changes the logging to use the key (which

Re: [VOTE] Release Apache Nutch 1.0

2009-03-25 Thread Dennis Kubes
+1, is this binding? :) Doğacan Güney wrote: Another non-binding +1 from me. Hope this one is a keeper :D On Mon, Mar 23, 2009 at 22:28, Sami Siren ssi...@gmail.com mailto:ssi...@gmail.com wrote: Hello, I have packaged the third release candidate for Apache Nutch 1.0 release

[jira] Created: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph

2009-03-25 Thread Dennis Kubes (JIRA)
Versions: 1.0.0 Environment: All Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.0.0, 1.1 For LinkRank, if there are no nodes to process, then a NullPointerException is thrown when trying to count number of nodes. -- This message

[jira] Updated: (NUTCH-730) NPE in LinkRank if no nodes with which to create the WebGraph

2009-03-25 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-730: --- Attachment: NUTCH-730-1-20090325.patch Throws a more detailed error message if there are no nodes
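
A minimal sketch of the fail-fast idea in NUTCH-730: check the node count up front and throw a descriptive error instead of letting an empty graph surface as a NullPointerException later. `NodeCount` and the message text are hypothetical; the real patch's wording is not shown here.

```java
import java.util.Collection;

/** Hypothetical sketch: fail fast with a clear message when the WebGraph has no nodes. */
public class NodeCount {

  public static long countNodes(Collection<String> nodes) {
    // A missing or empty node set would otherwise break later steps that divide by the count.
    if (nodes == null || nodes.isEmpty()) {
      throw new IllegalStateException(
          "No nodes found in the WebGraph; check that the WebGraph job ran and produced output");
    }
    return nodes.size();
  }
}
```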

Re: [VOTE] Release Apache Nutch 1.0

2009-03-08 Thread Dennis Kubes
Non-binding +1 too :) Sami Siren wrote: Hello, I have packaged the first release candidate for Apache Nutch 1.0 release at http://people.apache.org/~siren/nutch-1.0/rc0/ See the included CHANGES.txt file for details on release contents and latest changes. The release was made from tag:

Re: planning for nutch-1.0-rc1

2009-03-08 Thread Dennis Kubes
) at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65) at org.apache.nutch.indexer.field.FieldIndexer.main(FieldIndexer.java:275) In crawl/indexes is only _temporary folder. I will try to debug this but have problems with running nutch in eclipse Thanks, Bartosz Dennis Kubes pisze: I don't know

Re: planning for nutch-1.0-rc1

2009-03-06 Thread Dennis Kubes
NUTCH-578 was a while back but as I remember it worked fine. No objections to either including or pushing it. Dennis Sami Siren wrote: I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009 morning (EET). There are still some issues marked as fix for 1.0 in Jira. Neither of the

Re: planning for nutch-1.0-rc1

2009-03-06 Thread Dennis Kubes
I don't know if I would make this primary yet. I need to check what is causing this as it worked fine for me, in fact we currently have it in production. Also we would need to update the shell scripts to integrate this more tightly. Dennis Bartosz Gadzimski wrote: Sami Siren pisze:

[jira] Commented: (NUTCH-477) Extend URLFilters to support different filtering chains

2009-02-23 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12675907#action_12675907 ] Dennis Kubes commented on NUTCH-477: Same here. I am not against having extra

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2009-01-23 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-666: --- Affects Version/s: (was: 1.0.0) 1.1 Fix Version/s: (was: 1.0.0

[jira] Commented: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2009-01-23 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12666484#action_12666484 ] Dennis Kubes commented on NUTCH-666: It is ok to move to 1.1. Analysis plugins

Re: Site update

2009-01-05 Thread Dennis Kubes
http://www.mail-archive.com/d...@forrest.apache.org/msg15136.html This might help. Dennis Andrzej Bialecki wrote: Otis Gospodnetic wrote: Below is what it spits out. I'm not sure what the cause is. I did try forrest seed forrest validate as prescribed at

[jira] Closed: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2009-01-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-594. -- Serve Nutch search results in multiple formats including XML and JSON

[jira] Commented: (NUTCH-572) Scoring and redirected Urls

2009-01-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12660394#action_12660394 ] Dennis Kubes commented on NUTCH-572: I would like to close this issue. Redirect

[jira] Issue Comment Edited: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2008-12-30 Thread Dennis Kubes (JIRA)
- Key: NUTCH-594 URL: https://issues.apache.org/jira/browse/NUTCH-594 Project: Nutch Issue Type: New Feature Environment: all Reporter: Dennis Kubes Assignee: Dennis Kubes

[jira] Commented: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2008-12-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12659825#action_12659825 ] Dennis Kubes commented on NUTCH-594: JSON-Lib and EZMorph are both under Apache

[jira] Updated: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2008-12-30 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: NUTCH-594-4-20081230.patch Final patch. Adds the ability to stop summaries from being

[jira] Resolved: (NUTCH-668) Domain URL Filter

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-668. Resolution: Fixed Committed with revision 729958. Domain URL Filter

[jira] Updated: (NUTCH-594) Serve Nutch search results in XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: ezmorph-1.0.6.jar ezmorph jar required for framework Serve Nutch search results in XML

[jira] Updated: (NUTCH-594) Serve Nutch search results in XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: NUTCH-594-3-20081229.patch A completely reworked framework with extension point

[jira] Updated: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Summary: Serve Nutch search results in multiple formats including XML and JSON (was: Serve Nutch

[jira] Updated: (NUTCH-594) Serve Nutch search results in XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: commons-beanutils-1.8.0.jar commons beanutils Serve Nutch search results in XML

[jira] Updated: (NUTCH-594) Serve Nutch search results in XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: commons-collections-3.2.1.jar commons collections Serve Nutch search results in XML

[jira] Updated: (NUTCH-594) Serve Nutch search results in XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: json-lib-2.2.2-jdk15.jar json lib jar Serve Nutch search results in XML and JSON

[jira] Updated: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: (was: NUTCH-594-3-20081229.patch) Serve Nutch search results in multiple formats

[jira] Updated: (NUTCH-594) Serve Nutch search results in multiple formats including XML and JSON

2008-12-29 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-594: --- Attachment: NUTCH-594-3-20081229.patch Fixed some things. Added the ability to set mime output type

Re: [jira] Commented: (NUTCH-675) Reduce tasks do not report their status and are killed by jobtracker

2008-12-22 Thread Dennis Kubes
This is old. It has been fixed in more recent versions of hadoop and nutch. Otis Gospodnetic (JIRA) wrote: [ https://issues.apache.org/jira/browse/NUTCH-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12658610#action_12658610 ] Otis Gospodnetic

[jira] Commented: (NUTCH-668) Domain URL Filter

2008-12-19 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12658118#action_12658118 ] Dennis Kubes commented on NUTCH-668: Anybody have a problem if I commit this today

Re: File system

2008-12-16 Thread Dennis Kubes
the data inside those files (like html pages) I can find no algorithm available by nutch, nor the process used to store the data. Do you know if it is possible to extract using lucene? Dennis Kubes-2 wrote: The nutch databases are either SequenceFile or MapFile formats which store key

Re: File system

2008-12-15 Thread Dennis Kubes
The nutch databases are either SequenceFile or MapFile formats which store key and value pairs. Their keys and values are Writable implementations which translate an object into its byte equivalent and vice versa. Data and index files are MapFile format. Data is a SequenceFile, index is an
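
A minimal sketch of the Writable contract described above. Hadoop's `org.apache.hadoop.io.Writable` declares `write(DataOutput)` and `readFields(DataInput)`; to keep the example runnable without Hadoop on the classpath, it uses a local stand-in interface with the same two methods. `UrlScore` and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class WritableSketch {

  /** Same two methods as Hadoop's Writable interface. */
  public interface WritableLike {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
  }

  /** Hypothetical value type: a URL and its score. */
  public static class UrlScore implements WritableLike {
    public String url = "";
    public float score;

    public void write(DataOutput out) throws IOException {
      out.writeUTF(url);      // fields are written in a fixed order...
      out.writeFloat(score);
    }

    public void readFields(DataInput in) throws IOException {
      url = in.readUTF();     // ...and must be read back in exactly the same order
      score = in.readFloat();
    }
  }

  /** Serializes a value to bytes and reads it back into a fresh instance. */
  public static UrlScore roundTrip(UrlScore value) {
    try {
      ByteArrayOutputStream bytes = new ByteArrayOutputStream();
      value.write(new DataOutputStream(bytes));
      UrlScore copy = new UrlScore();
      copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
      return copy;
    } catch (IOException e) {
      throw new UncheckedIOException(e);  // not expected for in-memory streams
    }
  }
}
```

This byte-level round trip is what lets a SequenceFile store arbitrary keys and values: the file only ever sees bytes, and the Writable class knows how to turn them back into an object.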

[jira] Closed: (NUTCH-448) Allow Plugin Includes and Excludes from File

2008-12-09 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-448. -- Resolution: Later This was some old functionality that seemed good at the time. Not so much now

[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

2008-12-06 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12654154#action_12654154 ] Dennis Kubes commented on NUTCH-646: Not yet. I need to write up some serious

Domain URL filter Commit?

2008-12-05 Thread Dennis Kubes
Anybody have a problem with me committing the domain-urlfilter plugin in NUTCH-668? Dennis

[jira] Commented: (NUTCH-668) Domain URL Filter

2008-12-05 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653881#action_12653881 ] Dennis Kubes commented on NUTCH-668: I agree. Being able to search for tlds like .com

Builds are Failing

2008-12-04 Thread Dennis Kubes
After the upgrade to Hadoop, builds are failing because I think we have nutch set to build with Java 5 by default but I think Hadoop is built with Java 6 (At least the release version that I downloaded and used to upgrade Nutch). I know we aren't requiring Nutch to use Java 6 yet. This may

Re: Builds are Failing

2008-12-04 Thread Dennis Kubes
I take it back. Hadoop *requires* java 6 now as of 0.19. Which means we should be making changes to require Nutch to use java 6. Dennis Dennis Kubes wrote: After the upgrade to Hadoop, builds are failing because I think we have nutch set to build with Java 5 by default but I think Hadoop

[jira] Updated: (NUTCH-668) Domain URL Filter

2008-12-04 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-668: --- Attachment: NUTCH-668-2-20081204.patch Updated to include URLUtil methods that were missing. Sorry

[jira] Commented: (NUTCH-207) Bandwidth target for fetcher rather than a thread count

2008-12-04 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653404#action_12653404 ] Dennis Kubes commented on NUTCH-207: I think this would be an interesting addition

[jira] Closed: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-12-04 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-635. -- LinkAnalysis Tool for Nutch --- Key: NUTCH-635

[jira] Resolved: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-12-04 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-635. Resolution: Fixed Committed with revision 723441 LinkAnalysis Tool for Nutch

[jira] Commented: (NUTCH-646) New Indexing Framework for Nutch

2008-12-04 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12653489#action_12653489 ] Dennis Kubes commented on NUTCH-646: For the final version of this I have removed

[jira] Resolved: (NUTCH-646) New Indexing Framework for Nutch

2008-12-04 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-646. Resolution: Fixed Committed with revision 723447 New Indexing Framework for Nutch

[jira] Resolved: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-662. Resolution: Fixed Committed with revision 722475 Upgrade Nutch to use Lucene 2.4

[jira] Closed: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-663. -- Upgrade Nutch to use Hadoop 0.19 Key: NUTCH-663

[jira] Closed: (NUTCH-647) Resolve URLs tool

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-647. -- Resolve URLs tool - Key: NUTCH-647 URL: https

[jira] Resolved: (NUTCH-647) Resolve URLs tool

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-647. Resolution: Fixed Fix Version/s: 1.0.0 Committed with revision 722478 Resolve URLs tool

[jira] Resolved: (NUTCH-665) Search Load Testing Tool

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-665. Resolution: Fixed Committed with revision 722481 Search Load Testing Tool

[jira] Closed: (NUTCH-665) Search Load Testing Tool

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-665. -- Search Load Testing Tool Key: NUTCH-665 URL

[jira] Closed: (NUTCH-667) Input Format for working with Content in Hadoop Streaming

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes closed NUTCH-667. -- Input Format for working with Content in Hadoop Streaming

[jira] Resolved: (NUTCH-667) Input Format for working with Content in Hadoop Streaming

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes resolved NUTCH-667. Resolution: Fixed Committed with revision 722483 Input Format for working with Content in Hadoop

[jira] Created: (NUTCH-668) Domain URL Filter

2008-12-02 Thread Dennis Kubes (JIRA)
Domain URL Filter - Key: NUTCH-668 URL: https://issues.apache.org/jira/browse/NUTCH-668 Project: Nutch Issue Type: Improvement Affects Versions: 1.0.0 Environment: All Reporter: Dennis Kubes

[jira] Updated: (NUTCH-668) Domain URL Filter

2008-12-02 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-668: --- Attachment: NUTCH-668-1-20081202.patch Includes the DomainURLFilter and test files. Domains can
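
A minimal sketch of domain-based filtering, not the domain-urlfilter plugin's code. It follows the convention of Nutch's URLFilter extension point, where returning the URL keeps it and returning null rejects it, and it assumes allowed entries can be full hosts, registered domains, or bare suffixes such as `com`, since the issue text is cut off. `DomainFilterSketch` is a hypothetical name.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.Locale;
import java.util.Set;

/** Hypothetical sketch of a domain-based URL filter. */
public class DomainFilterSketch {

  private final Set<String> allowed;  // e.g. "apache.org", "com"

  public DomainFilterSketch(Set<String> allowed) {
    this.allowed = allowed;
  }

  /** Returns the URL if its host matches an allowed entry, or null to reject it. */
  public String filter(String urlString) {
    String host;
    try {
      host = new URI(urlString).getHost();
    } catch (URISyntaxException e) {
      return null;                    // unparseable URLs are rejected
    }
    if (host == null) {
      return null;
    }
    // Walk from the full host down to its last label:
    // lucene.apache.org -> apache.org -> org
    String candidate = host.toLowerCase(Locale.ROOT);
    while (true) {
      if (allowed.contains(candidate)) {
        return urlString;
      }
      int dot = candidate.indexOf('.');
      if (dot < 0) {
        return null;
      }
      candidate = candidate.substring(dot + 1);
    }
  }
}
```

Matching on whole labels rather than a raw string suffix keeps `notapache.org` from slipping through an `apache.org` entry.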

Re: Pending Commits for Nutch Issues

2008-11-27 Thread Dennis Kubes
Doğacan Güney wrote: Hi Dennis, On Wed, Nov 26, 2008 at 11:42 PM, Dennis Kubes [EMAIL PROTECTED] wrote: If nobody has a problem with them I would like to commit the following issues in the next day or two: NUTCH-663: Upgrade Nutch to the most recent Hadoop version (0.19) NUTCH-662: Upgrade

[jira] Updated: (NUTCH-665) Search Load Testing Tool

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-665: --- Attachment: NUTCH-665-20081126-1.patch Search load testing tool. Search Load Testing Tool

[jira] Updated: (NUTCH-647) Resolve URLs tool

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-647: --- Attachment: NUTCH-647-2-20081126.patch Updated patch. Resolve URLs tool

[jira] Created: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)
: Improvement Affects Versions: 1.0.0 Environment: All Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.0.0 Add analysis plugins for czech, greek, japanese, chinese, korean, dutch, russian, and thai. Also includes a new Language Identifier

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-666: --- Attachment: NUTCH-666-1-20081126.patch Part one of patch. This includes the new analyzers

[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-663: --- Attachment: NUTCH-663-1-20081126.patch Updates jar and native files Upgrade Nutch to use Hadoop

[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-663: --- Attachment: hadoop-0.19.0-core.jar Hadoop core jar Upgrade Nutch to use Hadoop 0.18.2

[jira] Commented: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650982#action_12650982 ] Dennis Kubes commented on NUTCH-663: hadoop 0.19 was released. I am integrating

[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-663: --- Summary: Upgrade Nutch to use Hadoop 0.19 (was: Upgrade Nutch to use Hadoop 0.18.2) change to 0.19

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-666: --- Attachment: (was: NUTCH-666-1-20081126.patch) Analysis plugins for multiple language and new

[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-663: --- Attachment: NUTCH-663-1-20081126.patch Updated patch to include API changes in Nutch classes

[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-663: --- Attachment: (was: NUTCH-663-1-20081126.patch) Upgrade Nutch to use Hadoop 0.19

[jira] Updated: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-635: --- Attachment: (was: NUTCH-635-8-20080818.patch) LinkAnalysis Tool for Nutch

[jira] Updated: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-635: --- Attachment: NUTCH-635-9-20081126.patch Updated final patch for new link analysis framework. I am

[jira] Created: (NUTCH-667) Input Forma for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)
Versions: 1.0.0 Environment: All Reporter: Dennis Kubes Assignee: Dennis Kubes Priority: Minor Fix For: 1.0.0 This is a ContentAsText input format that replaces line endings with spaces, which allows Nutch content to be used more effectively inside
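
Hadoop Streaming hands records to external mappers as lines of text, so a page containing newlines would be split into several bogus records. A minimal sketch of the flattening step, assuming a "key TAB value" line per record; `ContentAsTextSketch` and its methods are hypothetical, not the NUTCH-667 input format itself.

```java
/** Hypothetical sketch: flatten page content to one line so each page is one streaming record. */
public class ContentAsTextSketch {

  /** Replaces every CRLF, CR, or LF line ending with a single space. */
  public static String flatten(String content) {
    return content.replaceAll("\r\n|\r|\n", " ");
  }

  /** Builds one "url<TAB>text" line for a streaming mapper's stdin. */
  public static String toStreamingLine(String url, String content) {
    return url + "\t" + flatten(content);
  }
}
```

Matching `\r\n` before the single-character alternatives turns a Windows line ending into one space rather than two.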

[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-666: --- Attachment: NUTCH-666-1-20081126.patch Fixed patch. Now includes the changes to AnalyzerFactory

[jira] Updated: (NUTCH-667) Input Forma for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-667: --- Attachment: NUTCH-667-1-20081126.patch Input format for working with hadoop streaming. Input Forma

[jira] Updated: (NUTCH-667) Input Format for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-667: --- Summary: Input Format for working with Content in Hadoop Streaming (was: Input Forma for working

[jira] Updated: (NUTCH-646) New Indexing Framework for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-646: --- Attachment: NUTCH-646-2-20081126.patch Updated indexing patch. New Indexing Framework for Nutch

[jira] Commented: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-25 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650713#action_12650713 ] Dennis Kubes commented on NUTCH-663: @buddha1021 The 1.0 release for Nutch has some

[jira] Commented: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-11-23 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12650009#action_12650009 ] Dennis Kubes commented on NUTCH-662: We had been running in production for about a month

[jira] Created: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-11-21 Thread Dennis Kubes (JIRA)
Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.0.0 Upgrade nutch to use Lucene 2.4. This release changes the lucene file format. New indexes created by this lucene version will NOT be readable by older versions. Lucene 2.4 can read and update older index

[jira] Created: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-21 Thread Dennis Kubes (JIRA)
Reporter: Dennis Kubes Assignee: Dennis Kubes Fix For: 1.0.0 Upgrade Nutch to use a newer hadoop, version 0.18.2. This includes performance improvements, bug fixes, and new functionality. Changes some current APIs. -- This message is automatically generated by JIRA

[jira] Updated: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-11-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-662: --- Attachment: lucene-misc-2.4.0.jar Upgrade Nutch to use Lucene 2.4

[jira] Commented: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-11-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12649679#action_12649679 ] Dennis Kubes commented on NUTCH-662: The upgrade to Lucene 2.4 causes a weird problem

[jira] Updated: (NUTCH-662) Upgrade Nutch to use Lucene 2.4

2008-11-21 Thread Dennis Kubes (JIRA)
[ https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dennis Kubes updated NUTCH-662: --- Attachment: lucene-analyzers-2.4.0.jar Upgrade Nutch to use Lucene 2.4
