[jira] Updated: (NUTCH-664) Possibility to update already stored documents.

2008-11-26 Thread Andrzej Bialecki (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrzej Bialecki  updated NUTCH-664:


  Priority: Minor  (was: Major)
Issue Type: Wish  (was: New Feature)

There is no proposed design, so this is a Wish.

 Possibility to update already stored documents.
 ---

 Key: NUTCH-664
 URL: https://issues.apache.org/jira/browse/NUTCH-664
 Project: Nutch
  Issue Type: Wish
Reporter: Sergey Khilkov
Priority: Minor

 We have a huge index of stored documents. Fetching a page and merging 
 indexes every time some information about the page changes is an expensive 
 procedure, and the information can change 1-3 times per day. At the moment 
 we have to store the changed info in a database, but that causes lots of 
 problems with sorting, search restrictions and so on. Lucene itself allows 
 deleting a single document and adding a new one to an existing index. But 
 there is a problem with Hadoop: as I understand it, the Hadoop filesystem 
 cannot write at random positions. It would be a great feature if Nutch 
 were able to update an existing index.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (NUTCH-664) Possibility to update already stored documents.

2008-11-26 Thread Sergey Khilkov (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-664?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650912#action_12650912
 ] 

Sergey Khilkov commented on NUTCH-664:
--

Yes, it would be great to have a changeDocument() method in the IndexWriter 
class. Hope it's possible :)

 Possibility to update already stored documents.
 ---

 Key: NUTCH-664
 URL: https://issues.apache.org/jira/browse/NUTCH-664
 Project: Nutch
  Issue Type: Wish
Reporter: Sergey Khilkov
Priority: Minor




[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Davide (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651009#action_12651009
 ] 

Davide commented on NUTCH-563:
--

Hi,
is it possible to apply this code to Nutch 0.8.1 as well? Can you explain how?

Thanks

 Include custom fields in BasicQueryFilter
 -

 Key: NUTCH-563
 URL: https://issues.apache.org/jira/browse/NUTCH-563
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Reporter: julien nioche
Priority: Minor
 Fix For: 0.9.0

 Attachments: diff.BasicQueryFilter.dynamicFields.txt


 This patch allows additional fields to be included in the BasicQueryFilter 
 by specifying runtime parameters.  Any parameter matching the regular 
 expression (query\\.basic\\.(.+)\\.boost) will be added to the list of 
 fields used by the BQF, and the specified float value will be used as the 
 boost.
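
As a rough illustration of the parameter matching described above, the sketch 
below applies the same regular expression to a set of property names and 
collects a boost per field. The class name, the method name and the example 
property names are hypothetical and are not taken from the patch, which may 
read its configuration differently.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BoostParamSketch {
    // Same expression as in the issue description; group 1 is the field name.
    private static final Pattern BOOST_PARAM =
        Pattern.compile("query\\.basic\\.(.+)\\.boost");

    /** Returns field -> boost for every property whose name matches the pattern. */
    public static Map<String, Float> extractBoosts(Map<String, String> props) {
        Map<String, Float> boosts = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : props.entrySet()) {
            Matcher m = BOOST_PARAM.matcher(e.getKey());
            if (m.matches()) {
                boosts.put(m.group(1), Float.parseFloat(e.getValue()));
            }
        }
        return boosts;
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("query.basic.title.boost", "1.5");   // hypothetical property
        props.put("query.basic.author.boost", "2.0");  // hypothetical property
        props.put("searcher.dir", "crawl");            // ignored: name does not match
        System.out.println(extractBoosts(props));
    }
}
```

Because the pattern must match the whole property name, unrelated properties 
are skipped, and the captured group is used directly as the field name.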




[jira] Updated: (NUTCH-665) Search Load Testing Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-665:
---

Attachment: NUTCH-665-20081126-1.patch

Search load testing tool.

 Search Load Testing Tool
 

 Key: NUTCH-665
 URL: https://issues.apache.org/jira/browse/NUTCH-665
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0

 Attachments: NUTCH-665-20081126-1.patch


 A tool which spawns a number of threads and executes searches against 
 configured search servers.  This is used for light load testing of search 
 servers.
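
The general idea of spawning threads that each run a batch of searches can be 
sketched as follows. This is not the code from the attached patch: the class 
and method names are hypothetical, and the search call is a stand-in function 
rather than a real request to a search server.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class SearchLoadSketch {
    /**
     * Runs queriesPerThread searches on each of numThreads worker threads and
     * returns how many searches completed without throwing. The search
     * function is a hypothetical stand-in for a call to a search server.
     */
    public static int runLoad(int numThreads, int queriesPerThread,
                              Function<String, Integer> search) {
        ExecutorService pool = Executors.newFixedThreadPool(numThreads);
        try {
            List<Future<Integer>> results = new ArrayList<>();
            for (int t = 0; t < numThreads; t++) {
                final int id = t;
                Callable<Integer> worker = () -> {
                    int ok = 0;
                    for (int q = 0; q < queriesPerThread; q++) {
                        try {
                            search.apply("query-" + id + "-" + q);
                            ok++;
                        } catch (RuntimeException e) {
                            // A failed search is simply not counted.
                        }
                    }
                    return ok;
                };
                results.add(pool.submit(worker));
            }
            int total = 0;
            for (Future<Integer> f : results) {
                total += f.get(); // waits for that worker to finish
            }
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("load run aborted", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runLoad(4, 25, String::length); // fake search: no network call
        long ms = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " searches in " + ms + " ms");
    }
}
```

A real load test would replace the fake search with a request to the 
configured search servers and would probably record per-query latency as well.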




[jira] Updated: (NUTCH-647) Resolve URLs tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-647:
---

Attachment: NUTCH-647-2-20081126.patch

Updated patch.

 Resolve URLs tool
 -

 Key: NUTCH-647
 URL: https://issues.apache.org/jira/browse/NUTCH-647
 Project: Nutch
  Issue Type: New Feature
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Attachments: NUTCH-647-1-20080818.patch, NUTCH-647-2-20081126.patch


 A tool that takes a list of URLs and attempts to resolve their IP 
 addresses.  Useful for running after the fetcher has run to determine if DNS 
 problems exist.
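
A minimal sketch of the resolution step, assuming plain java.net lookups, is 
shown below. The class and method names are hypothetical; the attached patch 
may be structured quite differently (for example as a threaded or MapReduce 
job).

```java
import java.net.InetAddress;
import java.net.URI;
import java.net.UnknownHostException;
import java.util.Optional;

public class ResolveUrlsSketch {
    /** Extracts the host part of a URL, or null if the URL has none. */
    public static String hostOf(String url) {
        return URI.create(url).getHost();
    }

    /** Looks up the host's IP address; empty if there is no host or the lookup fails. */
    public static Optional<String> resolve(String url) {
        String host = hostOf(url);
        if (host == null) {
            return Optional.empty();
        }
        try {
            return Optional.of(InetAddress.getByName(host).getHostAddress());
        } catch (UnknownHostException e) {
            return Optional.empty(); // DNS problem for this host
        }
    }

    public static void main(String[] args) {
        String[] urls = { "http://example.org/page.html", "http://127.0.0.1/index.html" };
        for (String url : urls) {
            System.out.println(url + " -> " + resolve(url).orElse("UNRESOLVED"));
        }
    }
}
```

Hosts reported as UNRESOLVED after a fetch are the ones worth checking for DNS 
problems.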




[jira] Created: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)
Analysis plugins for multiple language and new Language Identifier Tool
---

 Key: NUTCH-666
 URL: https://issues.apache.org/jira/browse/NUTCH-666
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0


Add analysis plugins for Czech, Greek, Japanese, Chinese, Korean, Dutch, 
Russian, and Thai.  Also includes a new Language Identifier tool that uses the 
new indexing framework in NUTCH-646.




[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-666:
---

Attachment: NUTCH-666-1-20081126.patch

Part one of patch.  This includes the new analyzers for different languages.  
Part two will include the new language identifier tool.

 Analysis plugins for multiple language and new Language Identifier Tool
 ---

 Key: NUTCH-666
 URL: https://issues.apache.org/jira/browse/NUTCH-666
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: NUTCH-666-1-20081126.patch





[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: NUTCH-663-1-20081126.patch

Updates jar and native files

 Upgrade Nutch to use Hadoop 0.18.2
 --

 Key: NUTCH-663
 URL: https://issues.apache.org/jira/browse/NUTCH-663
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: hadoop-0.19-native.tar.gz, NUTCH-663-1-20081126.patch


 Upgrade Nutch to use a newer Hadoop, version 0.18.2.  This includes 
 performance improvements, bug fixes, and new functionality, and changes some 
 current APIs.




[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: hadoop-0.19.0-core.jar

Hadoop core jar

 Upgrade Nutch to use Hadoop 0.18.2
 --

 Key: NUTCH-663
 URL: https://issues.apache.org/jira/browse/NUTCH-663
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
 NUTCH-663-1-20081126.patch





[jira] Commented: (NUTCH-663) Upgrade Nutch to use Hadoop 0.18.2

2008-11-26 Thread Dennis Kubes (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12650982#action_12650982
 ] 

Dennis Kubes commented on NUTCH-663:


Hadoop 0.19 was released.  I am integrating it and should have a patch 
shortly.

 Upgrade Nutch to use Hadoop 0.18.2
 --

 Key: NUTCH-663
 URL: https://issues.apache.org/jira/browse/NUTCH-663
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0





[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Summary: Upgrade Nutch to use Hadoop 0.19  (was: Upgrade Nutch to use 
Hadoop 0.18.2)

Changed to 0.19 instead of 0.18.2.

 Upgrade Nutch to use Hadoop 0.19
 

 Key: NUTCH-663
 URL: https://issues.apache.org/jira/browse/NUTCH-663
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
 NUTCH-663-1-20081126.patch





[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-666:
---

Attachment: (was: NUTCH-666-1-20081126.patch)

 Analysis plugins for multiple language and new Language Identifier Tool
 ---

 Key: NUTCH-666
 URL: https://issues.apache.org/jira/browse/NUTCH-666
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: NUTCH-666-1-20081126.patch





[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: NUTCH-663-1-20081126.patch

Updated patch to include API changes in Nutch classes.

 Upgrade Nutch to use Hadoop 0.19
 

 Key: NUTCH-663
 URL: https://issues.apache.org/jira/browse/NUTCH-663
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
 NUTCH-663-1-20081126.patch





[jira] Updated: (NUTCH-663) Upgrade Nutch to use Hadoop 0.19

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-663:
---

Attachment: (was: NUTCH-663-1-20081126.patch)

 Upgrade Nutch to use Hadoop 0.19
 

 Key: NUTCH-663
 URL: https://issues.apache.org/jira/browse/NUTCH-663
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: hadoop-0.19-native.tar.gz, hadoop-0.19.0-core.jar, 
 NUTCH-663-1-20081126.patch





[jira] Updated: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-635:
---

Attachment: (was: NUTCH-635-8-20080818.patch)

 LinkAnalysis Tool for Nutch
 ---

 Key: NUTCH-635
 URL: https://issues.apache.org/jira/browse/NUTCH-635
 Project: Nutch
  Issue Type: New Feature
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch, 
 NUTCH-635-3-20080614.patch, NUTCH-635-4-20080615.patch, 
 NUTCH-635-5-20080620.patch, NUTCH-635-6-20080725.patch, 
 NUTCH-635-7-20080808.patch, NUTCH-635-9-20081126.patch


 This is a basic PageRank-style link analysis tool for Nutch which simulates 
 a sparse matrix using inlinks and outlinks and converges after a given 
 number of iterations.  This tool is meant to replace the current scoring 
 system in Nutch with a system that converges instead of exponentially 
 increasing scores.  Also includes a tool to create an outlinkdb.
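
To illustrate what an iterative, converging link score looks like, here is a 
small in-memory sketch of a PageRank-style computation over outlinks. It is 
not the code from the patch, which works on Nutch link data with MapReduce; 
the class name, the damping factor of 0.85 and the handling of pages without 
outlinks are assumptions made for the example.

```java
import java.util.Arrays;

public class LinkRankSketch {
    /**
     * outlinks[i] lists the pages that page i links to. Returns one score per
     * page after a fixed number of iterations; the scores always sum to 1.
     */
    public static double[] rank(int[][] outlinks, int iterations, double damping) {
        int n = outlinks.length;
        double[] score = new double[n];
        Arrays.fill(score, 1.0 / n);
        for (int it = 0; it < iterations; it++) {
            double[] next = new double[n];
            Arrays.fill(next, (1.0 - damping) / n);
            for (int page = 0; page < n; page++) {
                if (outlinks[page].length == 0) {
                    // Page without outlinks: spread its score over all pages.
                    for (int j = 0; j < n; j++) {
                        next[j] += damping * score[page] / n;
                    }
                } else {
                    double share = damping * score[page] / outlinks[page].length;
                    for (int target : outlinks[page]) {
                        next[target] += share;
                    }
                }
            }
            score = next;
        }
        return score;
    }

    public static void main(String[] args) {
        int[][] outlinks = { {1, 2}, {2}, {0} }; // 0->1, 0->2, 1->2, 2->0
        System.out.println(Arrays.toString(rank(outlinks, 20, 0.85)));
    }
}
```

Because each iteration redistributes a fixed total score, the values settle 
toward a stable distribution instead of growing with every pass.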




[jira] Updated: (NUTCH-635) LinkAnalysis Tool for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-635:
---

Attachment: NUTCH-635-9-20081126.patch

Updated final patch for new link analysis framework.  I am also going to write 
up some documentation on the wiki for how this new process works.

 LinkAnalysis Tool for Nutch
 ---

 Key: NUTCH-635
 URL: https://issues.apache.org/jira/browse/NUTCH-635
 Project: Nutch
  Issue Type: New Feature
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: NUTCH-635-1-20080612.patch, NUTCH-635-2-20080613.patch, 
 NUTCH-635-3-20080614.patch, NUTCH-635-4-20080615.patch, 
 NUTCH-635-5-20080620.patch, NUTCH-635-6-20080725.patch, 
 NUTCH-635-7-20080808.patch, NUTCH-635-9-20081126.patch





[jira] Created: (NUTCH-667) Input Forma for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)
Input Forma for working with Content in Hadoop Streaming


 Key: NUTCH-667
 URL: https://issues.apache.org/jira/browse/NUTCH-667
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0


This is a ContextAsText input format that replaces line endings with spaces, 
which allows Nutch content to be used more effectively inside Hadoop streaming 
jobs.  Streaming allows MapReduce jobs to be written in any language that can 
communicate with stdin and stdout.
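
The flattening step matters because streaming programs read their input one 
line per record, so content that contains line breaks would otherwise be split 
across several records. A minimal sketch of the idea follows; the class and 
method names are hypothetical, and the key, tab, value layout is the usual 
streaming convention rather than something confirmed by the patch.

```java
public class ContentAsTextSketch {
    /** Collapses each run of CR/LF characters into a single space. */
    public static String toSingleLine(String content) {
        return content.replaceAll("[\\r\\n]+", " ");
    }

    /** Builds one line-oriented record: key, a tab, then the flattened content. */
    public static String toStreamingRecord(String url, String content) {
        return url + "\t" + toSingleLine(content);
    }

    public static void main(String[] args) {
        String html = "<html>\r\n<body>Hello\nworld</body>\n</html>";
        System.out.println(toStreamingRecord("http://example.org/", html));
    }
}
```

A script reading stdin can then split each line at the first tab to recover 
the URL and the page content.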




[jira] Updated: (NUTCH-666) Analysis plugins for multiple language and new Language Identifier Tool

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-666:
---

Attachment: NUTCH-666-1-20081126.patch

Fixed patch.  Now includes the changes to AnalyzerFactory to allow multiple 
languages per plugin.

 Analysis plugins for multiple language and new Language Identifier Tool
 ---

 Key: NUTCH-666
 URL: https://issues.apache.org/jira/browse/NUTCH-666
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 1.0.0

 Attachments: NUTCH-666-1-20081126.patch





[jira] Updated: (NUTCH-667) Input Forma for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-667:
---

Attachment: NUTCH-667-1-20081126.patch

Input format for working with hadoop streaming.

 Input Forma for working with Content in Hadoop Streaming
 

 Key: NUTCH-667
 URL: https://issues.apache.org/jira/browse/NUTCH-667
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0

 Attachments: NUTCH-667-1-20081126.patch





[jira] Updated: (NUTCH-667) Input Format for working with Content in Hadoop Streaming

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-667:
---

Summary: Input Format for working with Content in Hadoop Streaming  (was: 
Input Forma for working with Content in Hadoop Streaming)

 Input Format for working with Content in Hadoop Streaming
 -

 Key: NUTCH-667
 URL: https://issues.apache.org/jira/browse/NUTCH-667
 Project: Nutch
  Issue Type: Improvement
Affects Versions: 1.0.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
Priority: Minor
 Fix For: 1.0.0

 Attachments: NUTCH-667-1-20081126.patch





Troubles while creating a plugin

2008-11-26 Thread Pau
Hello,
I am creating a plugin for Nutch that extends the QueryFilter.
I get a successful compilation with ant and ant war, but when I do a
search, I get the following exception:

26/11/2008 18:50:07 org.apache.catalina.core.StandardWrapperValve invoke
SEVERE: Servlet.service() for servlet jsp threw exception
java.lang.NoClassDefFoundError: org/apache/commons/codec/DecoderException
        at org.apache.tika.mime.MimeTypesReader.readMatch(MimeTypesReader.java:272)
        at org.apache.tika.mime.MimeTypesReader.readMatches(MimeTypesReader.java:221)
        at org.apache.tika.mime.MimeTypesReader.readMagic(MimeTypesReader.java:201)
        at org.apache.tika.mime.MimeTypesReader.readMimeType(MimeTypesReader.java:164)
        at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:138)
        at org.apache.tika.mime.MimeTypesReader.read(MimeTypesReader.java:121)
        at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:56)
        at org.apache.nutch.util.MimeUtil.<init>(MimeUtil.java:62)
        at org.apache.nutch.protocol.Content.<init>(Content.java:85)
        at org.apache.nutch.personalizedsearch.searcher.context.ContextQueryFilter.filter(ContextQueryFilter.java:55)
        at org.apache.nutch.searcher.QueryFilters.filter(QueryFilters.java:111)
        at org.apache.nutch.searcher.IndexSearcher.search(IndexSearcher.java:96)
        at org.apache.nutch.searcher.NutchBean.search(NutchBean.java:251)
        at org.apache.jsp.search_jsp._jspService(search_jsp.java:284)
        at org.apache.jasper.runtime.HttpJspBase.service(HttpJspBase.java:70)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.jasper.servlet.JspServletWrapper.service(JspServletWrapper.java:393)
        at org.apache.jasper.servlet.JspServlet.serviceJspFile(JspServlet.java:320)
        at org.apache.jasper.servlet.JspServlet.service(JspServlet.java:266)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:803)
        at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
        at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
        at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
        at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
        at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:263)
        at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
        at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:584)
        at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
        at java.lang.Thread.run(Thread.java:619)

The DecoderException class is in commons-codec-1.3.jar, so I added the jar
file to my plugin.xml:
   <runtime>
      <!-- As defined in build.xml this plugin will end up bundled as
           recommended.jar -->
      <library name="personalized-search.jar">
         <export name="*"/>
      </library>
      <library name="commons-codec-1.3.jar" />
   </runtime>

But the same error appears. Any idea on what I may be doing wrong?
Thanks.


[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Jasper Kamperman (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651093#action_12651093
 ] 

Jasper Kamperman commented on NUTCH-563:


Hi Davide,

I never tried to apply it to 0.8, sorry.

Jasper



 Include custom fields in BasicQueryFilter
 -

 Key: NUTCH-563
 URL: https://issues.apache.org/jira/browse/NUTCH-563
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Reporter: julien nioche
Priority: Minor
 Fix For: 0.9.0

 Attachments: diff.BasicQueryFilter.dynamicFields.txt





[Nutch Wiki] Update of PluginCentral by johnroman

2008-11-26 Thread Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on Nutch Wiki for change 
notification.

The following page has been changed by johnroman:
http://wiki.apache.org/nutch/PluginCentral

--
   * WritingPluginExample - A step-by-step example of how to write a plugin for 
the 0.7 branch. - updated by LucasBoullosa
   * [http://wiki.media-style.com/display/nutchDocu/Write+a+plugin Writing 
Plugins] - by Stefan
  
- == Plugins that Come with Nutch (0.7) ==
+ == Plugins that Come with Nutch (0.9) ==
  
  In order to get Nutch to use any of these plugins, you just need to edit your 
conf/nutch-site.xml file and add the name of the plugin to the list of 
plugin.includes.
  
@@ -24, +24 @@

   * '''parse-html''' - Parses HTML documents
   * '''parse-js''' - Parses Java``Script
   * '''parse-mp3''' - Parses MP3s
+  * '''parse-zip''' - Parses ZIP archives
+  * '''parse-mspowerpoint''' - Parses Microsoft Powerpoint files
   * '''parse-msword''' - Parses MS Word documents
+  * '''parse-msexcel''' - Parses MS Excel documents
   * '''parse-pdf''' - Parses PDFs
   * '''parse-rss''' - Parses RSS feeds
+  * '''parse-oo''' - Parses OpenOffice files
+  * '''parse-swf''' - Parses Shockwave Flash
   * '''parse-rtf''' - Parses RTF files
   * '''parse-text''' - Parses text documents
   * '''protocol-file''' - Retreives documents from the filesystem
@@ -47, +52 @@

   * '''lib-commons-httpclient'''
   * '''lib-http'''
   * '''lib-jakarta-poi'''
-  * '''lib-log4j'''
+  * '''lib-log4j''' 
-  * '''lib-lucene-analyzers'''
+  * '''lib-lucene-analyzers''' - Lucene analyzers
-  * '''lib-nekohtml'''
-  * '''lib-parsems'''
+  * '''lib-nekohtml''' - automatic tag balancer 
+  * '''lib-parsems''' - parse ms documents framework
   * '''parse-msexcel''' - Parses MS Excel documents
   * '''parse-mspowerpoint''' - Parses MS Powerpoint documents
   * '''parse-oo''' - Parses Open Office and Star Office documents 
(Extentsions: ODT, OTT, ODH, ODM, ODS, OTS, ODP, OTP, SXW, STW, SXC, STC, SXI, 
STI)


[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Davide (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651152#action_12651152
 ] 

Davide commented on NUTCH-563:
--

Hi Jasper,

could you explain how to apply it? I can't find the right file to apply the 
diff to.

Thanks a lot!

 Include custom fields in BasicQueryFilter
 -

 Key: NUTCH-563
 URL: https://issues.apache.org/jira/browse/NUTCH-563
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Reporter: julien nioche
Priority: Minor
 Fix For: 0.9.0

 Attachments: diff.BasicQueryFilter.dynamicFields.txt





[jira] Updated: (NUTCH-646) New Indexing Framework for Nutch

2008-11-26 Thread Dennis Kubes (JIRA)

 [ 
https://issues.apache.org/jira/browse/NUTCH-646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dennis Kubes updated NUTCH-646:
---

Attachment: NUTCH-646-2-20081126.patch

Updated indexing patch.

 New Indexing Framework for Nutch
 

 Key: NUTCH-646
 URL: https://issues.apache.org/jira/browse/NUTCH-646
 Project: Nutch
  Issue Type: New Feature
  Components: indexer
Affects Versions: 0.9.0
 Environment: All
Reporter: Dennis Kubes
Assignee: Dennis Kubes
 Fix For: 0.9.0, 1.0.0

 Attachments: arity-1.3.2.jar, NUTCH-646-1-20080818.patch, 
 NUTCH-646-2-20081126.patch


 New indexing framework for Nutch that provides a more generic field 
 abstraction consistent with Lucene index semantics.  Allows multiple MR jobs 
 to be created for different fields and those fields to be aggregated and 
 indexed in the end.  Overcomes limitations of the current indexer that limits 
 what databases are passed into the indexer.  Creates a new extension point as 
 well for field-filters for manipulation of fields during the indexing process.




[jira] Commented: (NUTCH-563) Include custom fields in BasicQueryFilter

2008-11-26 Thread Jasper Kamperman (JIRA)

[ 
https://issues.apache.org/jira/browse/NUTCH-563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651208#action_12651208
 ] 

Jasper Kamperman commented on NUTCH-563:


Hi Davide,

My laptop, which has nutch-0.9 on it, is in the shop, so I can't verify where 
that file is, but I think it is entirely possible that nutch-0.8 doesn't yet 
have a BasicQueryFilter.java file.

Sorry I can't be of more help. I'm CC'ing the original author of the patch, but 
he just became a father, so it might be a while until you hear from him :-).

Jasper



 Include custom fields in BasicQueryFilter
 -

 Key: NUTCH-563
 URL: https://issues.apache.org/jira/browse/NUTCH-563
 Project: Nutch
  Issue Type: New Feature
  Components: searcher
Reporter: julien nioche
Priority: Minor
 Fix For: 0.9.0

 Attachments: diff.BasicQueryFilter.dynamicFields.txt

