are appointed to serve as the initial members of the
Apache Nutch Project:
• Andrzej Bialecki a...@...
• Otis Gospodnetic o...@...
• Dogacan Guney doga...@...
• Dennis Kubes ku...@...
• Chris Mattmann mattm...@...
• Julien Nioche jnio
[
https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12790162#action_12790162
]
Dennis Kubes commented on NUTCH-768:
The older jetty jar file was not removed
This is failing because of the older jetty jar being removed and the
Jetty interface changes. I am currently working to fix the interfaces
for the new Jetty version. Hope to have a patch committed later today
and this should be back to normal.
Dennis
Apache Hudson Server wrote:
See
[
https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-768.
--
Resolution: Fixed
Weird. The hsqldb License file was the same checksum as that pulled from
hadoop
[
https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12784066#action_12784066
]
Dennis Kubes commented on NUTCH-768:
If no objections I will commit this tomorrow
Oops. Sorry about that.
a...@apache.org wrote:
Author: ab
Date: Wed Nov 25 12:44:34 2009
New Revision: 884075
URL: http://svn.apache.org/viewvc?rev=884075&view=rev
Log:
Change access from private to public - this fixes Crawl.java breakage.
Modified:
[
https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-768:
---
Attachment: NUTCH-768-1-20091125.patch
I thought I was going to be able to do this without code
Environment: All, shell script
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.1
Currently the webgraph jobs are called on the command line by calling main
methods on their classes. I propose to upgrade the bin/nutch shell script to
allow calling these jobs
[
https://issues.apache.org/jira/browse/NUTCH-768?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12782172#action_12782172
]
Dennis Kubes commented on NUTCH-768:
I have tested the upgrade with Hadoop 0.20
[
https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes reassigned NUTCH-765:
--
Assignee: Dennis Kubes
Allow Crawl class to call Either Solr or Lucene Indexer
[
https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-765.
Resolution: Fixed
Committed.
Allow Crawl class to call Either Solr or Lucene Indexer
[
https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-765.
--
Allow Crawl class to call Either Solr or Lucene Indexer
It depends on how you are building and your classpath. Let's call your
plugin myhtmlfilter. If running on a single server and you added it to
your src/plugin/build.xml under the deploy section, a myhtmlfilter
folder with the plugin should show up under the build/plugins folder
upon build.
[
https://issues.apache.org/jira/browse/NUTCH-765?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-765:
---
Attachment: NUTCH-765-2009112-1.patch
Allow Crawl class to call Either Solr or Lucene Indexer
: All
Reporter: Dennis Kubes
Priority: Minor
Fix For: 1.1, 1.0.0
Attachments: NUTCH-765-2009112-1.patch
Change to the crawl class to have a -solr option which will call the solr
indexer instead of the lucene indexer. This also allows it to ignore dedup
My mistake, you're right. The last processing clusters we built were
using Xeon quad cores, not i7s. The i7s were search servers which
didn't need ECC memory. AFAICT, Wikipedia is correct and the i7s don't
yet support ECC.
So my suggestion would be to stick with Xeon procs or something
Doğacan Güney wrote:
On Fri, Jul 17, 2009 at 21:32, Andrzej Bialecki a...@getopt.org wrote:
Doğacan Güney wrote:
Hey list,
On Fri, Jul 17, 2009 at 16:55, Andrzej Bialecki a...@getopt.org wrote:
Hi all,
I think we should be creating a sandbox area, where we can collaborate
on various
There isn't any pseudocode for this. The code for the main algorithm is
in the LinkRank class. It is similar in nature to PageRank except it
has the ability to filter reciprocal links. If the Link Loops program
is run it also has the ability to filter out link cycles, but that
program is
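As a rough illustration of the idea only, a PageRank-style iteration that ignores reciprocal links could be sketched as follows. This is not the code in the LinkRank class: the class name ReciprocalFilteredRank, the in-memory adjacency map, and the damping parameter are all assumptions made for the example, and link-cycle filtering is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch; not the Nutch LinkRank implementation. */
final class ReciprocalFilteredRank {

  /**
   * Runs a fixed number of PageRank-style iterations over an adjacency map,
   * ignoring any link a -> b when b also links back to a.
   */
  static Map<String, Double> rank(Map<String, Set<String>> outlinks,
                                  int iterations, double damping) {
    // Collect every node, whether it appears as a source or only as a target.
    Set<String> nodes = new HashSet<>(outlinks.keySet());
    for (Set<String> targets : outlinks.values()) {
      nodes.addAll(targets);
    }

    // Build the filtered graph: keep a link only if there is no link back.
    Map<String, Set<String>> filtered = new HashMap<>();
    for (String n : nodes) {
      Set<String> kept = new HashSet<>();
      Set<String> targets = outlinks.getOrDefault(n, Collections.<String>emptySet());
      for (String t : targets) {
        Set<String> back = outlinks.getOrDefault(t, Collections.<String>emptySet());
        if (!back.contains(n)) {
          kept.add(t);
        }
      }
      filtered.put(n, kept);
    }

    // Start from a uniform score.
    Map<String, Double> scores = new HashMap<>();
    for (String n : nodes) {
      scores.put(n, 1.0 / nodes.size());
    }

    for (int i = 0; i < iterations; i++) {
      Map<String, Double> next = new HashMap<>();
      for (String n : nodes) {
        next.put(n, (1.0 - damping) / nodes.size());
      }
      for (String n : nodes) {
        Set<String> targets = filtered.get(n);
        if (targets.isEmpty()) {
          continue; // simplification: dangling nodes do not redistribute score
        }
        double share = damping * scores.get(n) / targets.size();
        for (String t : targets) {
          next.merge(t, share, Double::sum);
        }
      }
      scores = next;
    }
    return scores;
  }
}
```

In this sketch, with links a -> b, b -> a and a -> c, only a -> c survives the filter, so c ends up with a higher score than a or b.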
The answer is simple and not so simple at the same time. Last year we
put in quite a bit of work to implement a stable PageRank-like algorithm
in Nutch. This was released as the new scoring and indexing
frameworks. That gives a good general relevancy score, but it is really
a starting
You are running LinkRank on a comparatively small webgraph. LinkRank
is meant, in principle, to be run on very large webgraphs, millions or
perhaps 100s of millions of urls. On that scale 10 iterations was what
we saw as a good default for the webgraph to converge while not taking
an
[
https://issues.apache.org/jira/browse/NUTCH-291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-291.
--
Resolution: Fixed
The open search servlet has been superseded by formatters for serving results
in xml
Affects Versions: 0.9.0, 1.0.0
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.1
There is a NullPointerException during a logging call in FieldIndexer when
there isn't a url for a document. Documents shouldn't be without
[
https://issues.apache.org/jira/browse/NUTCH-729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-729:
---
Attachment: NUTCH-729-1-20090235.patch
Simple patch. Changes the logging to use the key (which
+1, is this binding? :)
Doğacan Güney wrote:
Another non-binding +1 from me.
Hope this one is a keeper :D
On Mon, Mar 23, 2009 at 22:28, Sami Siren ssi...@gmail.com wrote:
Hello,
I have packaged the third release candidate for Apache Nutch 1.0
release
Versions: 1.0.0
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.0.0, 1.1
For LinkRank, if there are no nodes to process, then a NullPointerException is
thrown when trying to count number of nodes.
--
This message
[
https://issues.apache.org/jira/browse/NUTCH-730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-730:
---
Attachment: NUTCH-730-1-20090325.patch
Throws a more detailed error message if there are no nodes
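A guard of this kind might look like the sketch below. It is an illustration under assumptions, not the contents of the attached patch: the class and method names, the exception type, and the message text are hypothetical.

```java
/** Hypothetical sketch of a "no nodes" guard; not the NUTCH-730 patch. */
final class NodeCountGuard {

  /**
   * Returns the node count, or fails with a descriptive message when the
   * count is missing or zero, instead of letting a NullPointerException
   * surface later.
   */
  static long requireNodes(Long counted, String webGraphPath) {
    if (counted == null || counted == 0L) {
      throw new IllegalStateException(
          "No nodes to process in webgraph at " + webGraphPath
              + "; check that the webgraph was built and is not empty");
    }
    return counted;
  }
}
```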
Non-binding +1 too :)
Sami Siren wrote:
Hello,
I have packaged the first release candidate for Apache Nutch 1.0 release at
http://people.apache.org/~siren/nutch-1.0/rc0/
See the included CHANGES.txt file for details on release contents and
latest changes. The release was made from tag:
)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at
org.apache.nutch.indexer.field.FieldIndexer.main(FieldIndexer.java:275)
In crawl/indexes is only _temporary folder.
I will try to debug this but have problems with running nutch in eclipse
Thanks,
Bartosz
Dennis Kubes wrote:
I don't know
NUTCH-578 was a while back but as I remember it worked fine. No
objections to either including or pushing it.
Dennis
Sami Siren wrote:
I am planning to build the first rc for nutch 1.0 at Tue 3.3.2009
morning (EET). There are still some issues marked as fix for 1.0 in
Jira. Neither of the
I don't know if I would make this primary yet. I need to check what is
causing this as it worked fine for me, in fact we currently have it in
production. Also we would need to update the shell scripts to integrate
this more tightly.
Dennis
Bartosz Gadzimski wrote:
Sami Siren wrote:
[
https://issues.apache.org/jira/browse/NUTCH-477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12675907#action_12675907
]
Dennis Kubes commented on NUTCH-477:
Same here. I am not against having extra
[
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-666:
---
Affects Version/s: (was: 1.0.0)
1.1
Fix Version/s: (was: 1.0.0
[
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12666484#action_12666484
]
Dennis Kubes commented on NUTCH-666:
It is ok to move to 1.1.
Analysis plugins
http://www.mail-archive.com/d...@forrest.apache.org/msg15136.html
This might help.
Dennis
Andrzej Bialecki wrote:
Otis Gospodnetic wrote:
Below is what it spits out. I'm not sure what the cause is. I did
try forrest seed forrest validate as prescribed at
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-594.
--
Serve Nutch search results in multiple formats including XML and JSON
[
https://issues.apache.org/jira/browse/NUTCH-572?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12660394#action_12660394
]
Dennis Kubes commented on NUTCH-572:
I would like to close this issue. Redirect
-
Key: NUTCH-594
URL: https://issues.apache.org/jira/browse/NUTCH-594
Project: Nutch
Issue Type: New Feature
Environment: all
Reporter: Dennis Kubes
Assignee: Dennis Kubes
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12659825#action_12659825
]
Dennis Kubes commented on NUTCH-594:
JSON-lib and EZMorph are both under Apache
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: NUTCH-594-4-20081230.patch
Final patch. Adds the ability to stop summaries from being
[
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-668.
Resolution: Fixed
Committed with revision 729958.
Domain URL Filter
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: ezmorph-1.0.6.jar
ezmorph jar required for framework
Serve Nutch search results in XML
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: NUTCH-594-3-20081229.patch
A completely reworked framework with extension point
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Summary: Serve Nutch search results in multiple formats including XML and
JSON (was: Serve Nutch
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: commons-beanutils-1.8.0.jar
commons beanutils
Serve Nutch search results in XML
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: commons-collections-3.2.1.jar
commons collections
Serve Nutch search results in XML
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: json-lib-2.2.2-jdk15.jar
json lib jar
Serve Nutch search results in XML and JSON
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: (was: NUTCH-594-3-20081229.patch)
Serve Nutch search results in multiple formats
[
https://issues.apache.org/jira/browse/NUTCH-594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-594:
---
Attachment: NUTCH-594-3-20081229.patch
Fixed some things. Added the ability to set mime output type
This is old. It has been fixed in more recent versions of hadoop and nutch.
Otis Gospodnetic (JIRA) wrote:
[ https://issues.apache.org/jira/browse/NUTCH-675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658610#action_12658610 ]
Otis Gospodnetic
[
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658118#action_12658118
]
Dennis Kubes commented on NUTCH-668:
Anybody have a problem if I commit this today
the data inside those files (like html pages) I can
find no algorithm available by nutch, nor the process used to store the
data. Do you know if it is possible to extract using lucene?
Dennis Kubes-2 wrote:
The nutch databases are either SequenceFile or MapFile formats which
store key
The nutch databases are either SequenceFile or MapFile formats which
store key and value pairs. Their keys and values are Writable
implementations which translate an object into its byte equivalent and
vice versa.
Data and index files are MapFile format. Data is a SequenceFile, index
is an
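To make the Writable idea concrete, here is a minimal sketch of a record that turns itself into bytes and back. It uses only the JDK and mirrors the shape of Hadoop's Writable contract (write(DataOutput) and readFields(DataInput)) without depending on Hadoop. The PageRecord class and its fields are hypothetical and are not a Nutch type; reading a real Nutch database would go through Hadoop's SequenceFile or MapFile readers with the actual Nutch key and value classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/** Hypothetical record type that follows the shape of Hadoop's Writable. */
final class PageRecord {
  String url = "";
  long fetchTime;

  /** Serializes the fields in a fixed order. */
  void write(DataOutput out) throws IOException {
    out.writeUTF(url);
    out.writeLong(fetchTime);
  }

  /** Reads the fields back in the same order they were written. */
  void readFields(DataInput in) throws IOException {
    url = in.readUTF();
    fetchTime = in.readLong();
  }

  /** Object to bytes. */
  static byte[] toBytes(PageRecord record) {
    ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(buffer)) {
      record.write(out);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return buffer.toByteArray();
  }

  /** Bytes back to an object. */
  static PageRecord fromBytes(byte[] bytes) {
    PageRecord record = new PageRecord();
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
      record.readFields(in);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return record;
  }
}
```

The key point is that the reader must know the value class in order to decode the bytes, which is why the files cannot be interpreted as plain text.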
[
https://issues.apache.org/jira/browse/NUTCH-448?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-448.
--
Resolution: Later
This was some old functionality that seemed good at the time. Not so much now
[
https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12654154#action_12654154
]
Dennis Kubes commented on NUTCH-646:
Not yet. I need to write up some serious
Anybody have a problem with me committing the domain-urlfilter plugin in
NUTCH-668?
Dennis
[
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653881#action_12653881
]
Dennis Kubes commented on NUTCH-668:
I agree. Being able to search for tlds like .com
After the upgrade to Hadoop, builds are failing because I think we have
nutch set to build with Java 5 by default but I think Hadoop is built
with Java 6 (At least the release version that I downloaded and used to
upgrade Nutch).
I know we aren't requiring Nutch to use Java 6 yet. This may
I take it back. Hadoop *requires* java 6 now as of 0.19. Which means
we should be making changes to require Nutch to use java 6.
Dennis
Dennis Kubes wrote:
After the upgrade to Hadoop, builds are failing because I think we have
nutch set to build with Java 5 by default but I think Hadoop
[
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-668:
---
Attachment: NUTCH-668-2-20081204.patch
Updated to include URLUtil methods that were missing. Sorry
[
https://issues.apache.org/jira/browse/NUTCH-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653404#action_12653404
]
Dennis Kubes commented on NUTCH-207:
I think this would be an interesting addition
[
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-635.
--
LinkAnalysis Tool for Nutch
---
Key: NUTCH-635
[
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-635.
Resolution: Fixed
Committed with revision 723441
LinkAnalysis Tool for Nutch
[
https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12653489#action_12653489
]
Dennis Kubes commented on NUTCH-646:
For the final version of this I have removed
[
https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-646.
Resolution: Fixed
Committed with revision 723447
New Indexing Framework for Nutch
[
https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-662.
Resolution: Fixed
Committed with revision 722475
Upgrade Nutch to use Lucene 2.4
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-663.
--
Upgrade Nutch to use Hadoop 0.19
Key: NUTCH-663
[
https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-647.
--
Resolve URLs tool
-
Key: NUTCH-647
URL: https
[
https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-647.
Resolution: Fixed
Fix Version/s: 1.0.0
Committed with revision 722478
Resolve URLs tool
[
https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-665.
Resolution: Fixed
Committed with revision 722481
Search Load Testing Tool
[
https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-665.
--
Search Load Testing Tool
Key: NUTCH-665
URL
[
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes closed NUTCH-667.
--
Input Format for working with Content in Hadoop Streaming
[
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes resolved NUTCH-667.
Resolution: Fixed
Committed with revision 722483
Input Format for working with Content in Hadoop
Domain URL Filter
-
Key: NUTCH-668
URL: https://issues.apache.org/jira/browse/NUTCH-668
Project: Nutch
Issue Type: Improvement
Affects Versions: 1.0.0
Environment: All
Reporter: Dennis Kubes
[
https://issues.apache.org/jira/browse/NUTCH-668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-668:
---
Attachment: NUTCH-668-1-20081202.patch
Includes the DomainURLFilter and test files. Domains can
Doğacan Güney wrote:
Hi Dennis,
On Wed, Nov 26, 2008 at 11:42 PM, Dennis Kubes [EMAIL PROTECTED] wrote:
If nobody has a problem with them I would like to commit the following
issues in the next day or two:
NUTCH-663: Upgrade Nutch to the most recent Hadoop version (0.19)
NUTCH-662: Upgrade
[
https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-665:
---
Attachment: NUTCH-665-20081126-1.patch
Search load testing tool.
Search Load Testing Tool
[
https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-647:
---
Attachment: NUTCH-647-2-20081126.patch
Updated patch.
Resolve URLs tool
: Improvement
Affects Versions: 1.0.0
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.0.0
Add analysis plugins for czech, greek, japanese, chinese, korean, dutch,
russian, and thai. Also includes a new Language Identifier
[
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-666:
---
Attachment: NUTCH-666-1-20081126.patch
Part one of patch. This includes the new analyzers
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-663:
---
Attachment: NUTCH-663-1-20081126.patch
Updates jar and native files
Upgrade Nutch to use Hadoop
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-663:
---
Attachment: hadoop-0.19.0-core.jar
Hadoop core jar
Upgrade Nutch to use Hadoop 0.18.2
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650982#action_12650982
]
Dennis Kubes commented on NUTCH-663:
Hadoop 0.19 was released. I am integrating
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-663:
---
Summary: Upgrade Nutch to use Hadoop 0.19 (was: Upgrade Nutch to use
Hadoop 0.18.2)
change to 0.19
[
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-666:
---
Attachment: (was: NUTCH-666-1-20081126.patch)
Analysis plugins for multiple language and new
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-663:
---
Attachment: NUTCH-663-1-20081126.patch
Updated patch to include API changes in Nutch classes
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-663:
---
Attachment: (was: NUTCH-663-1-20081126.patch)
Upgrade Nutch to use Hadoop 0.19
[
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-635:
---
Attachment: (was: NUTCH-635-8-20080818.patch)
LinkAnalysis Tool for Nutch
[
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-635:
---
Attachment: NUTCH-635-9-20081126.patch
Updated final patch for new link analysis framework. I am
Versions: 1.0.0
Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
Fix For: 1.0.0
This is a ContentAsText input format that replaces line endings with spaces,
allowing Nutch content to be used more effectively inside
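The motivation is that line-oriented tools treat each line of input as one record, so page content with embedded newlines would be split across several records. The core transformation can be sketched as below. The ContentFlattener class is a hypothetical stand-in for illustration and is not the input format attached to this issue.

```java
/** Hypothetical helper; shows only the newline-to-space transformation. */
final class ContentFlattener {

  /**
   * Replaces each run of carriage returns and line feeds with a single
   * space, so that one document occupies exactly one line.
   */
  static String flatten(String content) {
    return content.replaceAll("[\\r\\n]+", " ");
  }
}
```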
[
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-666:
---
Attachment: NUTCH-666-1-20081126.patch
Fixed patch. Now includes the changes to AnalyzerFactory
[
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-667:
---
Attachment: NUTCH-667-1-20081126.patch
Input format for working with hadoop streaming.
Input Forma
[
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-667:
---
Summary: Input Format for working with Content in Hadoop Streaming (was:
Input Forma for working
[
https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-646:
---
Attachment: NUTCH-646-2-20081126.patch
Updated indexing patch.
New Indexing Framework for Nutch
[
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650713#action_12650713
]
Dennis Kubes commented on NUTCH-663:
@buddha1021
The 1.0 release for Nutch has some
[
https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650009#action_12650009
]
Dennis Kubes commented on NUTCH-662:
We had been running in production for about a month
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.0.0
Upgrade nutch to use Lucene 2.4. This release changes the lucene file format.
New indexes created by this lucene version will NOT be readable by older
versions. Lucene 2.4 can read and update older index
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Fix For: 1.0.0
Upgrade Nutch to use a newer hadoop, version 0.18.2. This includes performance
improvements, bug fixes, and new functionality. Changes some current APIs.
--
This message is automatically generated by JIRA
[
https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-662:
---
Attachment: lucene-misc-2.4.0.jar
Upgrade Nutch to use Lucene 2.4
[
https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12649679#action_12649679
]
Dennis Kubes commented on NUTCH-662:
The upgrade to Lucene 2.4 causes a weird problem
[
https://issues.apache.org/jira/browse/NUTCH-662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Dennis Kubes updated NUTCH-662:
---
Attachment: lucene-analyzers-2.4.0.jar
Upgrade Nutch to use Lucene 2.4